File and Directory Structure
This section provides an overview of the file and directory structure of the Climate DT Workflow project. Below is a tree view of the main files and directories, followed by a detailed explanation of their purpose.
Project Structure
.
├── catalog/ # Catalog of datasets or configurations
├── CHANGELOG.md # Log of changes made to the project
├── conf/ # Configuration files for the workflow
│ ├── additional_jobs/ # Definitions for additional jobs
│ │ ├── aqua-True.yml # AQUA job configuration
│ │ ├── backup-True.yml # Backup job configuration
│ │ └── ... # Other additional job configurations
│ ├── applications/ # Application-specific configurations
│ │ ├── container_versions.yml # Container version mappings
│ │ ├── energy_indicators/ # Directory for energy indicators application configuration
│ │ ├── request/ # Directory to set data request details for all apps
│ │ └── ... # Other application configurations
│ ├── bootstrap/ # Internal bootstrap configurations
│ │ └── include.yml # Entry point for loading workflow configuration
│ ├── data_gov/ # Data governance-related configurations
│ │ ├── production.yml # Production data governance rules
│ │ └── ... # Other data governance configurations
│ ├── defaults/ # Default configuration files
│ │ ├── defaults_model.yml # Default model configuration
│ │ └── ... # Other default configurations
│ ├── model/ # Model-specific configurations
│ │ ├── icon/ # ICON model configurations
│ │ ├── ifs-fesom/ # IFS-FESOM model configurations
│ │ └── ifs-nemo/ # IFS-NEMO model configurations
│ ├── simulation/ # Simulation-specific configurations
│ │ ├── control-ifs-nemo.yml # Control simulation for IFS-NEMO
│ │ └── ... # Other simulation configurations
│ └── platforms.yml # Platform-specific configurations
├── data-portfolio/ # Submodule: Data portfolio for model output
├── docs/ # Documentation source files
│ ├── source/ # Sphinx documentation source
│ │ ├── developers_guide/ # Developer guide documentation
│ │ ├── schemas/ # Schema documentation
│ │ └── users_guide/ # User guide documentation
├── dvc-cache-de340/ # Submodule: DVC cache for managing IFS inputdata
├── ifs-nemo/ # Submodule: IFS-NEMO model sources
├── lib/ # Shared libraries and utilities
│ ├── common/ # Common utility scripts
│ │ ├── checkers.sh # Script for validation checks
│ │ └── util.sh # General utility functions
│ ├── LUMI/ # LUMI-specific configurations
│ │ └── config.sh # Configuration script for LUMI
│ └── ... # Other platform-specific configurations
├── mains/ # Main configuration examples
│ ├── main_example_app.yml # Example configuration for applications workflow
│ └── ... # Other main configuration examples
├── Makefile # Build automation file
├── nemo/ # Submodule: NEMO model sources
├── pyproject.toml # Python project configuration
├── pytest.ini # Pytest configuration
├── README.md # Project overview and getting started guide
├── runscripts/ # Additional scripts for APPS, ICON, FDB
│ ├── dn/ # Data Notifier scripts
│ │ └── run_dn.py # Script to run the Data Notifier
│ ├── icon/ # ICON model runscripts
│ │ ├── control/ # Control runscripts for ICON
│ │ ├── historical/ # Historical runscripts for ICON
│ │ └── ... # Other ICON runscripts
│ ├── hydroland/ # Hydroland application scripts
│ │ ├── run_hydroland.sh # Script to run Hydroland
│ │ └── ... # Other Hydroland scripts
│ └── ... # Other application-specific scripts
├── setup.py # Python package setup script
├── templates/ # Main workflow jobs bash templates
│ ├── aqua/ # AQUA-specific templates
│ │ ├── aqua_analysis.sh # AQUA analysis script template
│ │ └── ... # Other AQUA templates
│ └── ... # Other templates
├── tests/ # Unit and integration tests
│ ├── bats_tests/ # BATS tests for job shell scripts
│ ├── schemas/ # JSON schema validation tests
│ │ ├── run.schema.json # Schema for the RUN section
│ │ └── ... # Other schema files
│ └── workflow_mock/ # Mock workflow tests
└── utils/ # Utility scripts and helper functions
    ├── logger.py # Logging utilities
    ├── update_changelog.sh # Script to update the changelog
    └── ... # Other utility scripts
Detailed Descriptions
Configuration System (conf/)
The conf/ directory contains all configuration files used by the workflow system to define behavior across different environments, models, and applications.
Model Configurations (conf/model/)
This directory contains settings specific to each climate model:
icon/: Configuration files for the ICON (ICOsahedral Nonhydrostatic) atmospheric model, including:
Model resolution settings depending on the processing unit
Wallclock and timestepping configurations
Physical parameterization options
Default request resolutions
ifs-fesom/: Configuration for the IFS-FESOM coupled model:
IFS model parameters depending on the processing unit
Wallclock, IO tasks, and additional setup configuration
ICMCL configurations
Default request resolutions
ifs-nemo/: Configuration for the IFS-NEMO coupled model:
IFS model parameters depending on the processing unit
Wallclock, IO tasks, and additional setup configuration
Default request resolutions
Application Configurations (conf/applications/)
Contains settings specific to downstream applications:
container_versions.yml: Maps application names to container versions.
default_gsv_request.yml: Configuration for the GSV, including:
Grid resolutions
Area to process
Method (e.g., nn)
opa/opa.yml: Configuration for the OPA, including:
Apps output directory
Retries
Platform-specific settings
energy_indicators/: Configurable physical parameters for the energy indicators application.
energy_offshore/: Configurable physical parameters for the energy offshore application.
hydroland/: Configurable physical parameters for the hydroland application.
hydromet/: Configurable physical parameters for the hydromet application.
wildfires_fwi/: Configurable physical parameters for the wildfires FWI application.
wildfires_wise/: Configurable physical parameters for the wildfires WISE application.
obsall/: Configurable physical parameters for the OBSALL application.
data/: Directory to set details of the data workflow.
request/: Directory to set data request details for all apps, with one file per app:
Contains detailed specifications for GSV and OPA requests
Defines parameters like activity, resolution, generation, and realization
Includes hardcoded settings for datasets, grids, and methods
Workflow Job Management (conf/additional_jobs/)
Defines auxiliary workflows that can be attached to the main simulation workflow:
aqua-True.yml: Configuration for the AQUA jobs:
Adds AQUA to the jobs definitions
Additional parameters for the analysis
dqc-True.yml: Configuration for DQC jobs:
Adds the DQC to the jobs definitions
Additional DQC parameters (e.g., profiles)
backup-True.yml: Configuration for backup jobs.
cleanup-True.yml: Configuration for cleanup jobs.
transfer-True.yml: Configuration for transfer jobs, which move FDB data between systems.
memory_checker-True.yml: Configuration for the memory checking job.
wipe-True.yml: Configuration for cleaning up transferred data.
size_checker-True.yml: Configuration for checking the size of generated data.
scaling-True.yml: Configuration for the performance scaling job.
energy_indicators-True.yml: Configuration for energy indicators jobs.
energy_offshore-True.yml: Configuration for Energy Offshore jobs.
hydroland-True.yml: Configuration for Hydroland jobs.
hydromet-True.yml: Configuration for Hydromet jobs.
wildfires_fwi-True.yml: Configuration for Wildfires FWI jobs.
wildfires_wise-True.yml: Configuration for Wildfires WISE jobs.
obsall-True.yml: Configuration for OBSALL jobs.
data-True.yml: Configuration for the data retrieval workflow.
postproc_hydroland-True.yml: Configuration for Hydroland post-processing jobs.
postproc_energy_indicators-True.yml: Configuration for energy_indicators post-processing jobs.
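The <name>-True.yml naming suggests that a file is picked up when a matching switch is set to true. This section does not describe how the workflow resolves these files, so the sketch below only illustrates the idea; the function name and the flag dictionary are hypothetical.

```python
from pathlib import PurePosixPath

# Directory taken from the tree above; everything else is a hypothetical illustration.
ADDITIONAL_JOBS_DIR = PurePosixPath("conf/additional_jobs")


def additional_job_files(flags):
    """Return the configuration file path for every switch set to True.

    `flags` maps a job name (e.g. "aqua") to a boolean. The file name is
    built as "<name>-<value>.yml", so only enabled jobs resolve to one of
    the "-True.yml" files listed above.
    """
    return [
        str(ADDITIONAL_JOBS_DIR / f"{name}-{enabled}.yml")
        for name, enabled in sorted(flags.items())
        if enabled
    ]


if __name__ == "__main__":
    print(additional_job_files({"aqua": True, "backup": False, "dqc": True}))
```

Under this assumption, disabling a job simply means no file is loaded for it, which would explain why only -True.yml variants exist.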
Bootstrap System (conf/bootstrap/)
include.yml: Bootstrap entry point that:
Establishes configuration loading order
Sets up environment-specific overrides
Initializes the workflow context
Data Governance (conf/data_gov/)
Defines data management policies and rules for each workflow mode:
production.yml: Production environment data governance rules for:
FDB keys (EXPVER: 0001, CLASS: d1, FDB Path, etc.)
Application specifics
research.yml: Research environment data governance rules for:
GSV Requests (expid, d1, FDB Path, etc.)
Application specifics
Additional files for other types of runs:
pre-production.yml
test.yml
operational-read
Default Settings (conf/defaults/)
Base configurations that can be overridden by more specific files:
defaults_model.yml: Default parameters for all models, including:
Model paths
Request and AQUA configurations
I/O settings
Additional files for other modes of the workflow:
defaults_end-to-end.yml
defaults_simless.yml
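The idea of defaults being overridden by more specific files can be illustrated with a generic recursive merge. This is a sketch of the general technique, not the workflow's actual loading code, and the keys in the example are hypothetical rather than taken from defaults_model.yml.

```python
import copy


def merge_config(base, override):
    """Recursively merge `override` into a copy of `base`.

    Nested dictionaries are merged key by key; any other value in
    `override` replaces the corresponding value from `base`.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


if __name__ == "__main__":
    # Hypothetical keys, used only to show the override behavior.
    defaults = {"MODEL": {"NAME": "ifs-nemo", "IO": {"TASKS": 4}}}
    specific = {"MODEL": {"IO": {"TASKS": 8}}}
    print(merge_config(defaults, specific))
```

Only the keys present in the more specific mapping change; everything else keeps its default value, and neither input is modified.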
Simulation Configurations (conf/simulation/)
Defines standard simulation types and scenarios:
control-ifs-nemo.yml: Configuration for IFS-NEMO control simulations, including:
Chunking information, calendar, etc.
IFS-NEMO Multi-IO Plans, RAPS conf, etc.
GSV Request definitions
control-r2b9-icon-1990.yml: Configuration for ICON 5km control simulations, including:
Chunking information, calendar, etc.
Simulation timestepping definitions
GSV Request definitions
ifs-fesom-control-tco79.yml: Configuration for IFS-FESOM tco79 control simulations, including:
Chunking information, calendar, etc.
RAPS, Portfolio, DQC settings
GSV Request definitions
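Chunking means splitting the simulated period into consecutive segments that run as separate jobs. The sketch below shows one way to compute month-based chunk boundaries; the function name, its parameters, and the use of the standard Gregorian calendar are assumptions made for illustration and do not reflect the keys used in the simulation files.

```python
from datetime import date


def month_chunks(start, end, months_per_chunk):
    """Split [start, end) into consecutive chunks of whole months.

    `start` is expected to fall on the first day of a month. The last
    chunk is shortened when the period is not a multiple of the chunk
    size. Returns (chunk_start, chunk_end) pairs with exclusive ends.
    """
    if months_per_chunk < 1:
        raise ValueError("months_per_chunk must be at least 1")
    chunks = []
    current = start
    while current < end:
        # Advance by the chunk size using a zero-based month count.
        total = current.year * 12 + (current.month - 1) + months_per_chunk
        following = date(total // 12, total % 12 + 1, 1)
        chunks.append((current, min(following, end)))
        current = following
    return chunks


if __name__ == "__main__":
    for chunk_start, chunk_end in month_chunks(date(1990, 1, 1), date(1990, 8, 1), 3):
        print(chunk_start, chunk_end)
```

With a start of 1990-01-01, an end of 1990-08-01, and three months per chunk, this yields two full chunks and a final one-month chunk.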
Platform Configurations (conf/platforms.yml)
platforms.yml: Platform-specific parameters for each computing platform (e.g., LUMI, MN5), including:
Queue names
Wallclock limits
SBATCH options
Component paths and versions
General Configuration (conf/general.yml)
general.yml: Contains global configuration settings for the workflow:
Defines paths for containers, scratch directories, and libraries
Sets up default directories for HPC projects and FDB (Field Database)
Includes general tools and ensemble versioning information
Job Templates (conf/jobs_<mode>.yml)
jobs_model.yml: Defines job configurations for model workflows:
Specifies dependencies, wallclock limits, and platform settings
jobs_simless.yml: Defines job configurations for simulation-less workflows:
Specifies dependencies, wallclock limits, and platform settings; does not include the SIM job
jobs_end-to-end.yml: Defines basic job configurations for end-to-end workflows:
Specifies dependencies, wallclock limits, and platform settings; includes jobs up to DN
jobs_apps.yml: Defines job configurations for application workflows:
Specifies dependencies, wallclock limits, and platform settings; includes jobs up to DN
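The jobs_<mode>.yml pattern implies one job definition file per workflow mode. A minimal sketch of that lookup follows; the function name is hypothetical, and the mode list is simply read off the four file names above.

```python
# Modes taken from the file names listed above.
KNOWN_MODES = ("model", "simless", "end-to-end", "apps")


def jobs_file_for(mode):
    """Return the job definition file for a workflow mode.

    Raises ValueError for a mode that has no jobs_<mode>.yml file in
    the list above.
    """
    if mode not in KNOWN_MODES:
        raise ValueError(f"unknown workflow mode: {mode!r}")
    return f"conf/jobs_{mode}.yml"


if __name__ == "__main__":
    print(jobs_file_for("simless"))
```

Rejecting unknown modes early gives a clearer error than a missing-file failure later in the run.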
GSV Configuration (conf/gsv.yml)
gsv.yml: Configuration file for the GSV:
Maps paths for grid definitions, weights, and test files
Defines GSV version
Defines model grid definitions paths
Libraries and Utilities (lib/)
Collection of shared scripts and libraries used across the workflow:
Common Utilities (lib/common/)
checkers.sh: Contains validation functions for:
Verifying configuration file integrity
Checking environment setup for required dependencies
Validating input data for workflows
util.sh: General utility functions for:
Logging and error handling
File system operations, such as creating directories or managing files
String manipulation and other helper functions
Platform-Specific Configurations (lib/LUMI/ and others)
LUMI/config.sh: Configuration settings specific to the LUMI supercomputing platform:
Defines queue configurations and module loading for LUMI
Sets up environment variables and I/O path mappings for workflows
Includes functions for loading Singularity containers and managing dependencies
MARENOSTRUM5/config.sh: Configuration for MareNostrum5:
Defines HPC-specific settings, such as module loading and queue names
Manages paths for input/output data and temporary directories
Runtime Components
Runscripts (runscripts/)
Contains scripts for ICON, applications, and other components:
icon/control/: ICON model execution scripts for control simulations:
Manages namelists, input data, and output paths
Handles processing of restart files and log files
Configures process mappings
Handles job submission for ICON control runs
icon/historical/: ICON model execution scripts for historical simulations:
Similar to control scripts but tailored for historical data processing
Manages specific configurations for historical runs
Handles job submission for ICON historical runs
icon/test/: ICON scripts for testing
dn/run_dn.py: Script for the Data Notifier, which:
Monitors for new data availability in the system
Triggers downstream workflows based on data readiness
Sends notifications to ensure workflow synchronization
hydroland/run_hydroland.sh: Script for the Hydroland application, which:
Processes meteorological inputs and runs hydrological models
Generates outputs such as river discharge and soil moisture
Manages restart files and log files to optimize memory usage
opa/run_opa.py: Script for the OPA (One Pass Algorithm)
wildfires_fwi/: Scripts for running the Wildfires FWI application
wildfires_wise/: Scripts for running the Wildfires WISE application
ensembles/: Scripts for ensemble simulations:
perturb_nemo_restart.py: Perturbs NEMO restart files for ensemble simulations
perturb_var.py: Perturbs specific variables for ensemble runs
energy_onshore/run_energy_onshore.py: Script for the Energy Onshore application:
Processes data and runs simulations for onshore energy applications
energy_offshore/run_energy_offshore.py: Script for the Energy Offshore application:
Processes data and runs simulations for offshore energy applications
FDB/: Scripts for managing the Field Database (FDB):
count_expected_messages.py: Counts expected messages in the FDB
yaml_to_mars.py: Converts YAML configurations to MARS requests
update_fdb_info.py: Updates FDB metadata and configurations
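To make the YAML-to-MARS conversion concrete, the sketch below formats an already loaded mapping as a MARS-style request string. It is not the code of yaml_to_mars.py: the function name is hypothetical, and the param values in the example are placeholders. The class and expver values echo the production keys mentioned earlier.

```python
def to_mars_request(verb, keys):
    """Format a mapping as a MARS-style request string.

    Scalar values are written as "key=value". Lists and tuples are
    joined with "/", the separator MARS requests use for multiple values.
    """
    items = list(keys.items())
    lines = [verb + ("," if items else "")]
    for index, (key, value) in enumerate(items):
        if isinstance(value, (list, tuple)):
            text = "/".join(str(v) for v in value)
        else:
            text = str(value)
        # Every line except the last one ends with a comma.
        suffix = "," if index < len(items) - 1 else ""
        lines.append(f"    {key}={text}{suffix}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder request; the param values are illustrative only.
    request = {"class": "d1", "expver": "0001", "param": ["167", "165"]}
    print(to_mars_request("retrieve", request))
```

Values are kept as strings so that identifiers with leading zeros, such as an expver of 0001, are not altered.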
Templates (templates/)
The templates/ directory contains the bash scripts that define the behavior of the workflow jobs and components. It also includes configuration files for the FDB and AQUA.
aqua/: Templates for AQUA-related workflows:
aqua_analysis.sh: Template for AQUA analysis jobs, which:
Performs quality analysis on model outputs, such as consistency checks
Generates diagnostic metrics and visualizations
Supports containerized execution for portability
aqua_push.sh: Template for pushing AQUA outputs to external storage or databases
lra_generator.sh: Template for generating LRA (Low Resolution Archive) configurations for AQUA
sim_ifs-nemo.sh: Template for IFS-NEMO simulation jobs, which:
Configures model-specific parameters, such as grid resolution and timestepping
Manages input/output paths and restart files
Executes the simulation in chunks to optimize resource usage
sim_icon.sh: Template for ICON simulation jobs, which:
Configures ICON-specific parameters, such as grid identifiers and refinement levels
Manages timestepping and coupling configurations
Handles restart files and output paths
sim_nemo.sh: Template for standalone NEMO simulation jobs, which:
Configures ocean model parameters, such as grid resolution and timestepping
Manages input data and restart files
Executes the simulation and handles output generation
application.sh: General template for running application-specific workflows, including:
Energy Onshore: Processes data and runs simulations for onshore energy applications
Energy Offshore: Processes data and runs simulations for offshore energy applications
Hydroland: Runs hydrological models and processes meteorological inputs
Wildfires FWI: Calculates fire weather indices for wildfire risk assessment
Wildfires WISE: Simulates wildfire spread and behavior
dqc.sh: Template for running Data Quality Checker (DQC) jobs, which:
Validates data compliance with predefined standards
Checks spatial completeness, consistency, and physical plausibility
dn.sh: Template for the Data Notifier (DN) service, which:
Monitors for new data availability
Triggers downstream workflows based on data readiness
Sends notifications to ensure workflow synchronization
transfer.sh: Template for transferring data between systems, which:
Manages data movement to and from HPC environments
Ensures data integrity during transfers
Supports integration with the Field Database (FDB)
remote_setup.sh: Template for setting up remote environments, which:
Loads HPC-specific configurations and modules
Prepares directories and dependencies for job execution
Handles compilation and installation of models
local_setup.sh: Template for setting up local environments, which:
Prepares the local directory structure for workflows
Validates configuration files and dependencies
Compresses and transfers project files to remote systems
ini.sh: Template for initializing simulations, which:
Configures initial conditions for models
Prepares namelists and other input files
Handles dependencies for simulation startup
synchronize.sh: Template for synchronizing data across systems, which:
Ensures consistency between local and remote environments
Manages file transfers and updates
wipe.sh: Template for cleaning up data that has already been transferred to the bridge:
Removes intermediate files generated during workflows
Frees up storage space on HPC systems
Testing Framework (tests/)
Comprehensive test suite for validating workflow components:
bats_tests/: BATS (Bash Automated Testing System) tests for shell scripts that:
Verify the functionality of job templates and utility scripts
Test error handling and edge cases in shell scripts
Ensure compatibility with different HPC environments
schemas/: JSON schema validation tests:
run.schema.json: Schema definition for validating the RUN section of the workflow configuration
Other schemas for validating model, application, and platform configurations
workflow_mock/: Mock tests for workflow execution that:
Simulate workflow scenarios without requiring full execution
Test job dependencies and sequencing logic
Validate the correctness of workflow configurations and outputs
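Checking job dependencies and sequencing amounts to confirming that the dependency graph has a valid execution order. The sketch below does this with Python's standard graphlib module. The dependency map is hypothetical: the job names echo the templates described earlier, but the real dependencies are defined in the conf/jobs_<mode>.yml files and may differ.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each job lists the jobs it waits for.
DEPENDENCIES = {
    "LOCAL_SETUP": [],
    "SYNCHRONIZE": ["LOCAL_SETUP"],
    "REMOTE_SETUP": ["SYNCHRONIZE"],
    "INI": ["REMOTE_SETUP"],
    "SIM": ["INI"],
    "DQC": ["SIM"],
}


def run_order(dependencies):
    """Return one valid execution order for the given dependency map.

    Raises graphlib.CycleError (a ValueError subclass) if the jobs
    depend on each other in a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(run_order(DEPENDENCIES))
```

A mock test in this style can assert both that an order exists and that a given job never precedes the jobs it depends on, without running any of the jobs.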
Documentation (docs/)
Comprehensive documentation for the project:
source/developers_guide/: Documentation for developers, including:
Details on the architecture and structure of the workflow
Guidelines for contributing to the project
API references for internal tools and libraries
source/users_guide/: Documentation for end users, including:
Tutorials for setting up and running experiments
Examples of workflow configurations for different use cases
Troubleshooting common issues
source/schemas/: Documentation of JSON schemas, including:
Schema definitions for workflow configuration files
Validation rules for ensuring configuration correctness
Example configurations for reference
Submodules
Data Management
catalog/: Catalog of available datasets and configurations for AQUA:
Contains metadata and configuration files for datasets
Defines paths and parameters for accessing and processing data
data-portfolio/: Submodule containing the data portfolio for model output:
Manages metadata and configurations for model-generated data
Ensures consistency and traceability of data across workflows
dvc-cache-de340/: DVC (Data Version Control) cache for managing IFS input data:
Stores versioned input data for reproducibility
Tracks changes to input datasets over time
Model Source Code
The project integrates multiple model codebases as Git submodules:
ifs-nemo/: Source code for the IFS-NEMO coupled Earth system model:
Combines the IFS atmospheric model with the NEMO ocean model
Includes configuration files, scripts, and source code for running coupled simulations
nemo/: Standalone NEMO ocean model source code:
Includes configuration files, scripts, and source code for running ocean-only simulations
Supports various resolutions and configurations for ocean modeling