Restarts

IFS-NEMO

After each SIM chunk, the resulting restart files are automatically moved into separate subdirectories inside a directory named after the chunk number. The restart directory is also automatically set up as a symbolic link to the current chunk's restart directory, allowing users to rerun any particular chunk at their convenience. In a regular execution, directory 1/ is empty (the first chunk starts from initial conditions), directory 2/ contains the restart files needed to start the second chunk, and so on.
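The layout described above can be sketched in Python as follows. This is a minimal illustration, not the workflow's actual code: the function names (archive_restarts, point_link), the symlink name restart, the component subdirectory names and the file-name glob patterns are all hypothetical. The key point it shows is that the restarts written by chunk N are stored under N+1/, because that is the chunk that will read them.

```python
import os
import shutil
import tempfile
from pathlib import Path


def archive_restarts(rundir: Path, restarts_root: Path, finished_chunk: int,
                     patterns: dict[str, str]) -> Path:
    """Move restarts written by `finished_chunk` into <restarts_root>/<finished_chunk + 1>/<component>/.

    `patterns` maps a hypothetical component subdirectory name to a glob pattern
    matching that component's restart files in the run directory.
    """
    target = restarts_root / str(finished_chunk + 1)
    for component, pattern in patterns.items():
        dest = target / component
        dest.mkdir(parents=True, exist_ok=True)
        for f in sorted(rundir.glob(pattern)):
            shutil.move(str(f), str(dest / f.name))
    return target


def point_link(link: Path, target: Path) -> None:
    """(Re)point the symlink `link` at `target` by renaming a fresh link over it."""
    tmp = link.with_name(link.name + ".tmp")
    if tmp.is_symlink():
        tmp.unlink()
    tmp.symlink_to(target.resolve(), target_is_directory=True)
    os.replace(tmp, link)  # rename(2) replaces the old link, not the directory it points to


def demo() -> tuple[Path, Path]:
    base = Path(tempfile.mkdtemp())
    rundir, restarts = base / "rundir", base / "restarts"
    rundir.mkdir()
    (restarts / "1").mkdir(parents=True)   # chunk 1: initial conditions, stays empty
    link = rundir / "restart"              # hypothetical link name
    point_link(link, restarts / "1")
    # Pretend chunk 1 wrote one restart file per component (hypothetical names).
    (rundir / "ifs_restart.bin").write_text("ifs")
    (rundir / "nemo_restart.nc").write_text("nemo")
    nxt = archive_restarts(rundir, restarts, 1, {"ifs": "ifs_*", "nemo": "nemo_*"})
    point_link(link, nxt)                  # chunk 2 now reads from restarts/2
    return link, nxt


if __name__ == "__main__":
    link, nxt = demo()
    print(link, "->", os.readlink(link))
```

Because every chunk keeps its own numbered directory, rerunning an older chunk only requires pointing the link back at that chunk's directory.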

In case of a failure:
- If the failure happens before the restarts are written, the retrial reuses the same restart files to initialize the chunk again.
- If the failure happens while the restart files are being written, the workflow detects this and uses the uncorrupted set of restart files.
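Both failure cases can be handled by one rule: resume from the newest restart set that is complete. The sketch below assumes a simple integrity check (every expected file exists and is non-empty); how the workflow really detects a partially written set is not described here, so is_complete, resume_chunk and the expected file names are hypothetical.

```python
import tempfile
from pathlib import Path


def is_complete(chunk_dir: Path, expected: list[str]) -> bool:
    """Hypothetical integrity check: every expected restart file exists and is non-empty."""
    return all((chunk_dir / name).is_file() and (chunk_dir / name).stat().st_size > 0
               for name in expected)


def resume_chunk(restarts_root: Path, expected: list[str]) -> int:
    """Return the chunk to (re)run after a failure.

    restarts_root/N holds the inputs of chunk N. A set that was only partly written
    (failure while writing restarts) is skipped, so the retrial falls back to the
    newest intact set. Directory 1/ is always valid: it starts from initial conditions.
    """
    chunks = sorted((int(p.name) for p in restarts_root.iterdir()
                     if p.is_dir() and p.name.isdigit()), reverse=True)
    for n in chunks:
        if n == 1 or is_complete(restarts_root / str(n), expected):
            return n
    return 1


def demo() -> list[int]:
    root = Path(tempfile.mkdtemp())
    expected = ["ifs_restart", "nemo_restart"]          # hypothetical file names
    (root / "1").mkdir()
    results = [resume_chunk(root, expected)]            # nothing run yet
    (root / "2").mkdir()
    for name in expected:
        (root / "2" / name).write_text("ok")
    results.append(resume_chunk(root, expected))        # chunk 1 finished cleanly
    (root / "3").mkdir()
    (root / "3" / "ifs_restart").write_text("ok")       # chunk 2 died mid-write...
    (root / "3" / "nemo_restart").touch()               # ...leaving an empty file
    results.append(resume_chunk(root, expected))
    return results


if __name__ == "__main__":
    print(demo())
```

In the last case the incomplete 3/ is ignored and chunk 2 is rerun from 2/, which covers both the "failed before writing" and the "failed while writing" situations.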

ICON

The restart mechanism in ICON is similar to that of IFS-NEMO: restart files are generated at the end of each simulation chunk. After each SIM chunk, the restart files are moved to a restarts folder inside the rundir directory, which collects the restarts of all chunks. Each file name includes the experiment ID, the name of the simulation being run, the resolution, the component (ocean or atmosphere) and the chunk end date, e.g. t062/rundir/restarts/t062_test-icon-r2b8_restart_oce_19900103T000000Z.mfr.

In a regular execution, the workflow starts ICON from the ocean initial conditions included in the input data. This mechanism is controlled by the variable initialize_fromrestart, which is set to False for the first chunk and True for all subsequent chunks. From the second chunk on, each chunk starts from the restart files generated by the previous chunk: the workflow creates symlinks in the model rundir that point to those files in the restarts folder.

In case of a failure:
- The retrial runs the same chunk again, using the latest restart files generated by the previous chunk or, for the first chunk, the initial conditions.
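A minimal sketch of the ICON logic, assuming that the restart name is built exactly as in the example path above, with the simulation name and resolution joined by a hyphen (inferred from that single example). The helper names restart_name and prepare_chunk and the link name restart_oce_DOM01.mfr are hypothetical; the link name the run script actually uses may differ. Only the ocean component is linked because the atmosphere's file tag is not given in the text.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def restart_name(expid: str, simulation: str, resolution: str,
                 component: str, chunk_end: datetime) -> str:
    """Rebuild a name like t062_test-icon-r2b8_restart_oce_19900103T000000Z.mfr."""
    stamp = chunk_end.strftime("%Y%m%dT%H%M%SZ")
    return f"{expid}_{simulation}-{resolution}_restart_{component}_{stamp}.mfr"


def prepare_chunk(rundir: Path, chunk: int, chunk_start: datetime, expid: str,
                  simulation: str, resolution: str, components=("oce",)) -> bool:
    """Return initialize_fromrestart; from chunk 2 on, also link the previous chunk's restarts.

    The restart written at the end of chunk N-1 is stamped with chunk N's start
    date, so that date selects the file to link.
    """
    initialize_fromrestart = chunk > 1
    if initialize_fromrestart:
        for comp in components:
            src = rundir / "restarts" / restart_name(expid, simulation, resolution,
                                                     comp, chunk_start)
            if not src.exists():
                raise FileNotFoundError(src)
            link = rundir / f"restart_{comp}_DOM01.mfr"  # hypothetical link name
            if link.is_symlink():
                link.unlink()
            link.symlink_to(src)
    return initialize_fromrestart


def demo() -> tuple[bool, bool, Path]:
    rundir = Path(tempfile.mkdtemp())
    first = prepare_chunk(rundir, 1, datetime(1990, 1, 1), "t062", "test-icon", "r2b8")
    # Pretend chunk 1 (ending 1990-01-03) wrote its multi-file ocean restart directory.
    name = restart_name("t062", "test-icon", "r2b8", "oce", datetime(1990, 1, 3))
    (rundir / "restarts" / name).mkdir(parents=True)
    second = prepare_chunk(rundir, 2, datetime(1990, 1, 3), "t062", "test-icon", "r2b8")
    return first, second, rundir / "restart_oce_DOM01.mfr"


if __name__ == "__main__":
    print(demo())
```

Raising when the expected restart is missing makes a broken chain visible immediately instead of letting the model silently start from the wrong state.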

IFS-FESOM

The IFS-FESOM restart mechanism follows a similar chunk-based approach but manages two sets of restart files: IFS restarts (waminfo, rcf) and FESOM restarts (the fesom_raw_restart/ directory, which contains fesom.clock). After each SIM chunk, the restart files are moved to a subdirectory identified by the chunk number. A symbolic link named current always points to the latest completed chunk's restart directory.

Before each chunk starts, the workflow creates backups (rcf-backup, waminfo-backup, fesom_raw_restart-backup) of the current restart files. On a retrial, these backups are automatically detected and restored, so the chunk can be re-executed without manual intervention. After a successful chunk, the restarts are moved to the next chunk’s directory, the backups in the current chunk are restored (to allow future retrials), and the current symlink is updated to point to the next chunk.
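The backup, retrial and advance cycle described above can be sketched as two steps: one run before every attempt and one after a successful chunk. The restart items and the -backup suffix come from the text. The placement of the current link inside the restarts root and the function names before_chunk and after_success are assumptions made for illustration.

```python
import os
import shutil
import tempfile
from pathlib import Path

# Restart items named in the text; fesom_raw_restart is a directory (holds fesom.clock).
ITEMS = ("rcf", "waminfo", "fesom_raw_restart")


def _remove(path: Path) -> None:
    if path.is_dir() and not path.is_symlink():
        shutil.rmtree(path)
    elif path.exists() or path.is_symlink():
        path.unlink()


def _copy(src: Path, dst: Path) -> None:
    _remove(dst)
    if src.is_dir():
        shutil.copytree(src, dst)
    else:
        shutil.copy2(src, dst)


def before_chunk(chunk_dir: Path) -> None:
    """Take <item>-backup copies, or restore from them if they already exist (retrial)."""
    for item in ITEMS:
        live, backup = chunk_dir / item, chunk_dir / f"{item}-backup"
        if backup.exists():
            _copy(backup, live)  # a previous attempt ran: roll the inputs back
        else:
            _copy(live, backup)  # first attempt: keep a pristine copy


def after_success(restarts_root: Path, chunk: int) -> Path:
    """Move updated restarts to the next chunk, restore this chunk's backups, repoint current."""
    cur, nxt = restarts_root / str(chunk), restarts_root / str(chunk + 1)
    nxt.mkdir(parents=True, exist_ok=True)
    for item in ITEMS:
        _remove(nxt / item)                            # stale copy from an earlier rerun
        shutil.move(str(cur / item), str(nxt / item))  # model output feeds the next chunk
        os.rename(cur / f"{item}-backup", cur / item)  # keep this chunk rerunnable
    tmp = restarts_root / "current.tmp"
    if tmp.is_symlink():
        tmp.unlink()
    tmp.symlink_to(nxt.name, target_is_directory=True)  # relative link: current -> <chunk+1>
    os.replace(tmp, restarts_root / "current")
    return nxt


def demo() -> tuple[Path, str, Path]:
    root = Path(tempfile.mkdtemp())
    c1 = root / "1"
    (c1 / "fesom_raw_restart").mkdir(parents=True)
    (c1 / "fesom_raw_restart" / "fesom.clock").write_text("t0")
    (c1 / "rcf").write_text("t0")
    (c1 / "waminfo").write_text("t0")
    before_chunk(c1)                       # first attempt: backups taken
    (c1 / "rcf").write_text("garbage")     # the attempt fails after touching rcf
    before_chunk(c1)                       # retrial: inputs restored from backups
    restored = (c1 / "rcf").read_text()
    for item in ("rcf", "waminfo"):        # the retrial succeeds and updates restarts
        (c1 / item).write_text("t1")
    (c1 / "fesom_raw_restart" / "fesom.clock").write_text("t1")
    return root, restored, after_success(root, 1)


if __name__ == "__main__":
    print(demo())
```

Restoring the backups in the finished chunk's directory, rather than deleting them, is what keeps every completed chunk rerunnable later.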

Restarted runs from another experiment are also supported. Setting RESTARTED_RUN: "true" and RESTART_FROM: "<source-expid>" in the configuration will cause the INI step to copy the restart files from the source experiment’s current/ symlink into chunk 1 of the new experiment using rsync. For runs where the start date differs from the original IFS start date, an inipath with date-redirection symlinks is created automatically.
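The copy performed by the INI step for a restarted run can be sketched as building a single rsync call from the two configuration keys. The directory layout <experiments_root>/<expid>/restarts/{current,1,...} and the function names are hypothetical stand-ins; only RESTARTED_RUN, RESTART_FROM, the source current/ link, chunk 1 and the use of rsync come from the text. The date-redirection inipath is not covered.

```python
import shlex
import subprocess
from pathlib import Path
from typing import Optional


def restart_copy_command(config: dict, experiments_root: Path, expid: str) -> Optional[list[str]]:
    """Return the rsync call that seeds chunk 1 from another experiment, or None.

    The layout <experiments_root>/<expid>/restarts/{current,1,...} is hypothetical.
    """
    # Accept both the quoted string "true" and a YAML boolean true.
    if str(config.get("RESTARTED_RUN", "false")).lower() != "true":
        return None
    source = experiments_root / config["RESTART_FROM"] / "restarts" / "current"
    target = experiments_root / expid / "restarts" / "1"
    # -a (archive) preserves permissions, times and symlinks; the trailing slash on
    # the source copies the contents of current/ into 1/ instead of creating 1/current/.
    return ["rsync", "-a", f"{source}/", f"{target}/"]


def run(cmd: list[str], dry_run: bool = True) -> None:
    """Print the command; only execute it (failing loudly) when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cfg = {"RESTARTED_RUN": "true", "RESTART_FROM": "a001"}  # hypothetical expids
    cmd = restart_copy_command(cfg, Path("/data/experiments"), "a002")
    if cmd is not None:
        run(cmd)
```

Copying from the source experiment's current/ link means the new run always picks up the latest completed chunk of the original experiment.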

In case of a failure:
- The retrial will automatically restore from the pre-run backups and rerun the chunk.
- See the FAQ for a step-by-step procedure for manual re-execution.