Manifests¶
Introduction¶
payu automatically generates and updates manifest files in the manifest
subdirectory in the control directory. The manifests are stored in YAML
format.
There are three manifests: manifest/exe.yaml
tracks executable files,
manifest/input.yaml
tracks input files and manifest/restart.yaml
tracks restart files.
Only files in the temporary work
directory are tracked by manifests. Any
files that are directly accessed from other locations in the filesystem
within models or other programs are not tracked
Manifest contents¶
The manifests store information about the files contained in the
work
directory of an experiment. In most cases those files are symbolically
linked from another location.
An example input manifest is shown below:
format: yamanifest
version: 1.0
---
work/INPUT/gotmturb.inp:
fullpath: /scratch/x00/aaa000/mom/input/bowl1/gotmturb.inp
hashes:
binhash: 1730d092cdc5d86e234d3749857ed318
md5: 3016ea3bccf1acd2c18eefdd6dbf02e9
work/INPUT/grid_spec.nc:
fullpath: /scratch/x00/aaa000/mom/input/bowl1/grid_spec.nc
hashes:
binhash: b79c406507e2b96725a08237e2165314
md5: f571a0106c4a2eba38e3c407335e8cca
work/INPUT/ocean_temp_salt.res.nc:
fullpath: /scratch/x00/aaa000/mom/input/bowl1/ocean_temp_salt.res.nc
hashes:
binhash: d70322dece2f10aaacf751254a2acee7
md5: f506e15417ed813fde3516a262ff35e5
The first section of the file specifes a format (yamanifest
) and a version
number (1.0
). The second section has a local path in the work
directory
as the key, and for each of these paths stores the location in the filesystem
(fullpath
) and two hashes, binhash
and md5
.
There are two hashes as binhash
is fast and size independent designed
just to detect if a file has changed. If the calculated binhash is not the same
as that stored in the manifest the slower but robust MD5 hash is calculated.
Whenever a hash changes the updated value is stored in the manifest file.
Experiment tracking¶
The manifest files are automatically added to the git repository that tracks changes to the experimental configuration. Each time the model is run the manifest is checked and changed hashes are updated, and any new files found are added to the manifest.
In this way manifests uniquely identify all executables, input and restart files for each model run.
Manifest updates¶
Each of the manifests is updated in a slightly different way which reflects the way the files are expected to change during an experiment.
The executable manifest is recalculated each time the model is run. Executables are generally fairly small in size and number, so there is very little overhead calculating full MD5 hashes. This also means there is no need to check that exectutable paths are still correct and also any changes to executables are automatically included in the manifest.
The restart manifest is also recalculated for every run as there is no expectation that restart (or pickup) files are ever the same between normal model runs.
The input manifest changes relatively rarely and can often contain a small number of very large files. It is this combination that can cause a significant time overhead if full MD5 hashes have to be computed for every run. By using binhash, a fast change-sensitive hash, these time consuming hashes only need be computed when a change has been detected. So the slow md5 hashes are recalculated as little as possible.
Manifest options¶
By default manifests just reflect the state of the model, and when files change the update hashes are saved in the manifest. These changes in the manifest files are then tracked with git.
There are some configuration options available to change this default behaviour.