File and Directory Structure
****************************

This section provides an overview of the file and directory structure of the Climate DT Workflow project. Below is a tree view of the main files and directories, followed by a detailed explanation of their purpose.

Project Structure
=================

.. code-block:: none

   .
   ├── catalog/                        # Catalog of datasets or configurations
   ├── CHANGELOG.md                    # Log of changes made to the project
   ├── conf/                           # Configuration files for the workflow
   │   ├── additional_jobs/            # Definitions for additional jobs
   │   │   ├── aqua-True.yml           # AQUA job configuration
   │   │   ├── backup-True.yml         # Backup job configuration
   │   │   └── ...                     # Other additional job configurations
   │   ├── applications/               # Application-specific configurations
   │   │   ├── container_versions.yml  # Container version mappings
   │   │   ├── energy_indicators/      # Directory for energy indicators application configuration
   │   │   ├── request/                # Directory to set data request details for all apps
   │   │   └── ...                     # Other application configurations
   │   ├── bootstrap/                  # Internal bootstrap configurations
   │   │   └── include.yml             # Entry point for loading workflow configuration
   │   ├── data_gov/                   # Data governance-related configurations
   │   │   ├── production.yml          # Production data governance rules
   │   │   └── ...                     # Other data governance configurations
   │   ├── defaults/                   # Default configuration files
   │   │   ├── defaults_model.yml      # Default model configuration
   │   │   └── ...                     # Other default configurations
   │   ├── model/                      # Model-specific configurations
   │   │   ├── icon/                   # ICON model configurations
   │   │   ├── ifs-fesom/              # IFS-FESOM model configurations
   │   │   └── ifs-nemo/               # IFS-NEMO model configurations
   │   ├── simulation/                 # Simulation-specific configurations
   │   │   ├── control-ifs-nemo.yml    # Control simulation for IFS-NEMO
   │   │   └── ...                     # Other simulation configurations
   │   └── platforms.yml               # Platform-specific configurations
   ├── data-portfolio/                 # Submodule: Data portfolio for model output
   ├── docs/                           # Documentation source files
   │   ├── source/                     # Sphinx documentation source
   │   │   ├── developers_guide/       # Developer guide documentation
   │   │   ├── schemas/                # Schema documentation
   │   │   └── users_guide/            # User guide documentation
   ├── dvc-cache-de340/                # Submodule: DVC cache for managing IFS input data
   ├── ifs-nemo/                       # Submodule: IFS-NEMO model sources
   ├── lib/                            # Shared libraries and utilities
   │   ├── common/                     # Common utility scripts
   │   │   ├── checkers.sh             # Script for validation checks
   │   │   └── util.sh                 # General utility functions
   │   ├── LUMI/                       # LUMI-specific configurations
   │   │   └── config.sh               # Configuration script for LUMI
   │   └── ...                         # Other platform-specific configurations
   ├── mains/                          # Main configuration examples
   │   ├── main_example_app.yml        # Example configuration for applications workflow
   │   └── ...                         # Other main configuration examples
   ├── Makefile                        # Build automation file
   ├── nemo/                           # Submodule: NEMO model sources
   ├── pyproject.toml                  # Python project configuration
   ├── pytest.ini                      # Pytest configuration
   ├── README.md                       # Project overview and getting started guide
   ├── runscripts/                     # Additional scripts for APPS, ICON, FDB
   │   ├── dn/                         # Data Notifier scripts
   │   │   └── run_dn.py               # Script to run the Data Notifier
   │   ├── icon/                       # ICON model runscripts
   │   │   ├── control/                # Control runscripts for ICON
   │   │   ├── historical/             # Historical runscripts for ICON
   │   │   └── ...                     # Other ICON runscripts
   │   ├── hydroland/                  # Hydroland application scripts
   │   │   ├── run_hydroland.sh        # Script to run Hydroland
   │   │   └── ...                     # Other Hydroland scripts
   │   └── ...                         # Other application-specific scripts
   ├── setup.py                        # Python package setup script
   ├── templates/                      # Main workflow jobs bash templates
   │   ├── aqua/                       # AQUA-specific templates
   │   │   ├── aqua_analysis.sh        # AQUA analysis script template
   │   │   └── ...                     # Other AQUA templates
   │   └── ...                         # Other templates
   ├── tests/                          # Unit and integration tests
   │   ├── bats_tests/                 # BATS tests for job shell scripts
   │   ├── schemas/                    # JSON schema validation tests
   │   │   ├── run.schema.json         # Schema for the RUN section
   │   │   └── ...                     # Other schema files
   │   └── workflow_mock/              # Mock workflow tests
   └── utils/                          # Utility scripts and helper functions
       ├── logger.py                   # Logging utilities
       ├── update_changelog.sh         # Script to update the changelog
       └── ...                         # Other utility scripts

Detailed Descriptions
=====================

Configuration System (``conf/``)
--------------------------------

The ``conf/`` directory contains all configuration files used by the workflow system to define behavior across different environments, models, and applications.

Model Configurations (``conf/model/``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This directory contains settings specific to each climate model:

- ``icon/``: Configuration files for the ICON (ICOsahedral Nonhydrostatic) atmospheric model, including:

  - Model resolution settings depending on the processing unit
  - Wallclock and timestepping configurations
  - Physical parameterization options
  - Default request resolutions

- ``ifs-fesom/``: Configuration for the IFS-FESOM coupled model:

  - IFS model parameters depending on the processing unit
  - Wallclock, IO tasks, and additional setup configuration
  - ICMCL configurations
  - Default request resolutions

- ``ifs-nemo/``: Configuration for the IFS-NEMO coupled model:

  - IFS model parameters depending on the processing unit
  - Wallclock, IO tasks, and additional setup configuration
  - Default request resolutions

Application Configurations (``conf/applications/``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Contains settings specific to downstream applications:

- ``container_versions.yml``: Maps application names to container versions.
- ``default_gsv_request.yml``: Configuration for the GSV, including:

  - Grid resolutions
  - Area to process
  - Method (e.g., nn)

- ``opa/opa.yml``: Configuration for the OPA, including:

  - Apps output directory
  - Retries
  - Platform-specific settings

- ``energy_indicators/``: Configurable physical parameters for the energy indicators application.
- ``energy_offshore/``: Configurable physical parameters for the energy offshore application.
- ``hydroland/``: Configurable physical parameters for the hydroland application.
- ``hydromet/``: Configurable physical parameters for the hydromet application.
- ``wildfires_fwi/``: Configurable physical parameters for the wildfires FWI application.
- ``wildfires_wise/``: Configurable physical parameters for the wildfires WISE application.
- ``obsall/``: Configurable physical parameters for the OBSALL application.
- ``data/``: Settings for the data workflow.
- ``request/``: Data request details for all apps, one file per app:

  - Contains detailed specifications for GSV and OPA requests
  - Defines parameters like activity, resolution, generation, and realization
  - Includes hardcoded settings for datasets, grids, and methods

Workflow Job Management (``conf/additional_jobs/``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Defines auxiliary workflows that can be attached to the main simulation workflow:

- ``aqua-True.yml``: Configuration for the AQUA jobs:

  - Adds AQUA to the job definitions
  - Sets additional parameters for the analysis

- ``dqc-True.yml``: Configuration for DQC jobs:

  - Adds the DQC to the job definitions
  - Sets additional DQC parameters (e.g., profiles)

- ``backup-True.yml``: Configuration for backup jobs.
- ``cleanup-True.yml``: Configuration for cleanup jobs.
- ``memory_checker-True.yml``: Configuration for the memory checking job.
- ``transfer-True.yml``: Configuration for transferring FDB data between systems.
- ``wipe-True.yml``: Configuration for cleaning up transferred data.
- ``size_checker-True.yml``: Configuration for checking the size of generated data.
- ``scaling-True.yml``: Configuration for performance scaling job.
- ``energy_indicators-True.yml``: Configuration for energy indicators jobs.
- ``energy_offshore-True.yml``: Configuration for jobs of Energy Offshore.
- ``hydroland-True.yml``: Configuration for jobs of Hydroland.
- ``hydromet-True.yml``: Configuration for jobs of Hydromet.
- ``wildfires_fwi-True.yml``: Configuration for jobs of Wildfires FWI.
- ``wildfires_wise-True.yml``: Configuration for jobs of Wildfires WISE.
- ``obsall-True.yml``: Configuration for jobs of OBSALL.
- ``data-True.yml``: Configuration for data retrieval workflow.
- ``postproc_hydroland-True.yml``: Configuration for post-processing jobs of Hydroland.
- ``postproc_energy_indicators-True.yml``: Configuration for post-processing jobs of energy_indicators.

Bootstrap System (``conf/bootstrap/``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- ``include.yml``: Bootstrap entry point that:

  - Establishes configuration loading order
  - Sets up environment-specific overrides
  - Initializes the workflow context

Data Governance (``conf/run_types/``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Defines data management policies and rules for each workflow mode:

- ``production.yml``: Production environment data governance rules for:

  - FDB keys (EXPVER: 0001, CLASS: d1, FDB Path, etc.)
  - Application specifics

- ``research.yml``: Research environment data governance rules for:

  - GSV Requests (expid, d1, FDB Path, etc.)
  - Application specifics

Additional files for other types of runs:

- ``pre-production.yml``
- ``test.yml``
- ``operational-read``

Default Settings (``conf/defaults/``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Base configurations that can be overridden by more specific files:

- ``defaults_model.yml``: Default parameters for all models, including:

  - Model paths
  - Request and AQUA configurations
  - I/O settings

Additional files for other modes of the workflow:

- ``defaults_end-to-end.yml``
- ``defaults_simless.yml``

Simulation Configurations (``conf/simulation/``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Defines standard simulation types and scenarios:

- ``control-ifs-nemo.yml``: Configuration for IFS-NEMO control simulations, including:

  - Chunking information, calendar, etc.
  - IFS-NEMO Multi-IO Plans, RAPS conf, etc.
  - GSV Request definitions

- ``control-r2b9-icon-1990.yml``: Configuration for ICON 5km control simulations, including:

  - Chunking information, calendar, etc.
  - Simulation timestepping definitions
  - GSV Request definitions

- ``ifs-fesom-control-tco79.yml``: Configuration for IFS-FESOM tco79 control simulations, including:

  - Chunking information, calendar, etc.
  - RAPS, Portfolio, DQC settings
  - GSV Request definitions

Platform Configurations (``conf/platforms.yml``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- ``platforms.yml``: Settings for different computing platforms:

  - Defines platform-specific parameters (e.g., LUMI, MN5)
  - Includes queue names
  - Wallclock limits
  - SBATCH options
  - Component paths and versions

General Configuration (``conf/general.yml``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- ``general.yml``: Contains global configuration settings for the workflow:

  - Defines paths for containers, scratch directories, and libraries
  - Sets up default directories for HPC projects and FDB (Field Database)
  - Includes general tools and ensemble versioning information

Job Templates (``conf/jobs_*.yml``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- ``jobs_model.yml``: Defines job configurations for model workflows:

  - Specifies dependencies, wallclock limits, and platform settings.

- ``jobs_simless.yml``: Defines job configurations for simulation-less workflows:

  - Specifies dependencies, wallclock limits, and platform settings; does not include SIM.

- ``jobs_end-to-end.yml``: Defines basic job configurations for end-to-end workflows:

  - Specifies dependencies, wallclock limits, and platform settings; includes jobs up to DN.

- ``jobs_apps.yml``: Defines job configurations for application workflows:

  - Specifies dependencies, wallclock limits, and platform settings; includes jobs up to DN.

GSV Configuration (``conf/gsv.yml``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- ``gsv.yml``: Configuration file for the GSV:

  - Maps paths for grid definitions, weights, and test files
  - Defines GSV version
  - Defines model grid definitions paths

Libraries and Utilities (``lib/``)
----------------------------------

Collection of shared scripts and libraries used across the workflow:

Common Utilities (``lib/common/``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- ``checkers.sh``: Contains validation functions for:

  - Verifying configuration file integrity
  - Checking environment setup for required dependencies
  - Validating input data for workflows

- ``util.sh``: General utility functions for:

  - Logging and error handling
  - File system operations, such as creating directories or managing files
  - String manipulation and other helper functions

Platform-Specific Configurations (``lib/LUMI/`` and others)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- ``LUMI/config.sh``: Configuration settings specific to the LUMI supercomputing platform:

  - Defines queue configurations and module loading for LUMI
  - Sets up environment variables and I/O path mappings for workflows
  - Includes functions for loading Singularity containers and managing dependencies

- ``MARENOSTRUM5/config.sh``: Configuration for MareNostrum5:

  - Defines HPC-specific settings, such as module loading and queue names
  - Manages paths for input/output data and temporary directories

Runtime Components
------------------

Runscripts (``runscripts/``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Contains scripts for ICON, applications, and other components:

- ``icon/``: ICON model runscripts:

  - ``control/``: ICON model execution scripts for control simulations:

    - Manages namelists, input data, and output paths
    - Handles processing of restart files and log files
    - Configures process mappings
    - Handles job submission for ICON control runs

  - ``historical/``: ICON model execution scripts for historical simulations:

    - Similar to control scripts but tailored for historical data processing
    - Manages specific configurations for historical runs
    - Handles job submission for ICON historical runs

  - ``test/``: ICON scripts for testing

- ``dn/run_dn.py``: Script for the Data Notifier, which:

  - Monitors for new data availability in the system
  - Triggers downstream workflows based on data readiness
  - Sends notifications to ensure workflow synchronization

- ``hydroland/run_hydroland.sh``: Script for the Hydroland application, which:

  - Processes meteorological inputs and runs hydrological models
  - Generates outputs such as river discharge and soil moisture
  - Manages restart files and log files to optimize memory usage

- ``opa/run_opa.py``: Script for the OPA (One Pass Algorithm)
- ``wildfires_fwi/``: Scripts for running the Wildfires FWI application
- ``wildfires_wise/``: Scripts for running the Wildfires WISE application
- ``ensembles/``: Scripts for ensemble simulations:

  - ``perturb_nemo_restart.py``: Perturbs NEMO restart files for ensemble simulations
  - ``perturb_var.py``: Perturbs specific variables for ensemble runs

- ``energy_onshore/run_energy_onshore.py``: Script for the Energy Onshore application:

  - Processes data and runs simulations for onshore energy applications

- ``energy_offshore/run_energy_offshore.py``: Script for the Energy Offshore application:

  - Processes data and runs simulations for offshore energy applications

- ``FDB/``: Scripts for managing the Field Database (FDB):

  - ``count_expected_messages.py``: Counts expected messages in the FDB
  - ``yaml_to_mars.py``: Converts YAML configurations to MARS requests
  - ``update_fdb_info.py``: Updates FDB metadata and configurations

Templates (``templates/``)
~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``templates/`` directory contains the bash scripts that define the behavior of the workflow jobs and components. It also includes configuration files for the FDB and AQUA.

- ``aqua/``: Templates for AQUA-related workflows:

  - ``aqua_analysis.sh``: Template for AQUA analysis jobs, which:

    - Performs quality analysis on model outputs, such as consistency checks
    - Generates diagnostic metrics and visualizations
    - Supports containerized execution for portability

  - ``aqua_push.sh``: Template for pushing AQUA outputs to external storage or databases
  - ``lra_generator.sh``: Template for generating LRA (Low Resolution Archive) configurations for AQUA

- ``sim_ifs-nemo.sh``: Template for IFS-NEMO simulation jobs, which:

  - Configures model-specific parameters, such as grid resolution and timestepping
  - Manages input/output paths and restart files
  - Executes the simulation in chunks to optimize resource usage

- ``sim_icon.sh``: Template for ICON simulation jobs, which:

  - Configures ICON-specific parameters, such as grid identifiers and refinement levels
  - Manages timestepping and coupling configurations
  - Handles restart files and output paths

- ``sim_nemo.sh``: Template for standalone NEMO simulation jobs, which:

  - Configures ocean model parameters, such as grid resolution and timestepping
  - Manages input data and restart files
  - Executes the simulation and handles output generation

- ``application.sh``: General template for running application-specific workflows, including:

  - **Energy Onshore**: Processes data and runs simulations for onshore energy applications
  - **Energy Offshore**: Processes data and runs simulations for offshore energy applications
  - **Hydroland**: Runs hydrological models and processes meteorological inputs
  - **Wildfires FWI**: Calculates fire weather indices for wildfire risk assessment
  - **Wildfires WISE**: Simulates wildfire spread and behavior

- ``dqc.sh``: Template for running Data Quality Checker (DQC) jobs, which:

  - Validates data compliance with predefined standards
  - Checks spatial completeness, consistency, and physical plausibility

- ``dn.sh``: Template for the Data Notifier (DN) service, which:

  - Monitors for new data availability
  - Triggers downstream workflows based on data readiness
  - Sends notifications to ensure workflow synchronization

- ``transfer.sh``: Template for transferring data between systems, which:

  - Manages data movement to and from HPC environments
  - Ensures data integrity during transfers
  - Supports integration with the Field Database (FDB)

- ``remote_setup.sh``: Template for setting up remote environments, which:

  - Loads HPC-specific configurations and modules
  - Prepares directories and dependencies for job execution
  - Handles compilation and installation of models

- ``local_setup.sh``: Template for setting up local environments, which:

  - Prepares the local directory structure for workflows
  - Validates configuration files and dependencies
  - Compresses and transfers project files to remote systems

- ``ini.sh``: Template for initializing simulations, which:

  - Configures initial conditions for models
  - Prepares namelists and other input files
  - Handles dependencies for simulation startup

- ``synchronize.sh``: Template for synchronizing data across systems, which:

  - Ensures consistency between local and remote environments
  - Manages file transfers and updates

- ``wipe.sh``: Template for cleaning up data that has already been transferred to the bridge:

  - Removes intermediate files generated during workflows
  - Frees up storage space on HPC systems

Testing Framework (``tests/``)
------------------------------

Comprehensive test suite for validating workflow components:

- ``bats_tests/``: BATS (Bash Automated Testing System) tests for shell scripts that:

  - Verify the functionality of job templates and utility scripts
  - Test error handling and edge cases in shell scripts
  - Ensure compatibility with different HPC environments

- ``schemas/``: JSON schema validation tests:

  - ``run.schema.json``: Schema definition for validating the ``RUN`` section of the workflow configuration
  - Other schemas for validating model, application, and platform configurations

- ``workflow_mock/``: Mock tests for workflow execution that:

  - Simulate workflow scenarios without requiring full execution
  - Test job dependencies and sequencing logic
  - Validate the correctness of workflow configurations and outputs

Documentation (``docs/``)
-------------------------

Comprehensive documentation for the project:

- ``source/developers_guide/``: Documentation for developers, including:

  - Details on the architecture and structure of the workflow
  - Guidelines for contributing to the project
  - API references for internal tools and libraries

- ``source/users_guide/``: Documentation for end users, including:

  - Tutorials for setting up and running experiments
  - Examples of workflow configurations for different use cases
  - Troubleshooting common issues

- ``source/schemas/``: Documentation of JSON schemas, including:

  - Schema definitions for workflow configuration files
  - Validation rules for ensuring configuration correctness
  - Example configurations for reference

Submodules
----------

Data Management
~~~~~~~~~~~~~~~

- ``catalog/``: Catalog of available datasets and configurations for AQUA:

  - Contains metadata and configuration files for datasets
  - Defines paths and parameters for accessing and processing data

- ``data-portfolio/``: Submodule containing the data portfolio for model output:

  - Manages metadata and configurations for model-generated data
  - Ensures consistency and traceability of data across workflows

- ``dvc-cache-de340/``: DVC (Data Version Control) cache for managing IFS input data:

  - Stores versioned input data for reproducibility
  - Tracks changes to input datasets over time

Model Source Code
~~~~~~~~~~~~~~~~~

The project integrates multiple model codebases as Git submodules:

- ``ifs-nemo/``: Source code for the IFS-NEMO coupled Earth system model:

  - Combines the IFS atmospheric model with the NEMO ocean model
  - Includes configuration files, scripts, and source code for running coupled simulations

- ``nemo/``: Standalone NEMO ocean model source code:

  - Includes configuration files, scripts, and source code for running ocean-only simulations
  - Supports various resolutions and configurations for ocean modeling
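
The override layering described for ``conf/defaults/`` (base values that more specific files can replace) can be pictured with a minimal sketch. It is illustrative only: the ``deep_merge`` helper and every key and value in the dictionaries are hypothetical, and the code does not reproduce how the workflow actually loads or combines its YAML files.

.. code-block:: python

   from copy import deepcopy


   def deep_merge(base, override):
       """Return a new dict in which values from ``override`` win over ``base``.

       Nested dictionaries are merged key by key; any other value is replaced.
       """
       merged = deepcopy(base)
       for key, value in override.items():
           if isinstance(value, dict) and isinstance(merged.get(key), dict):
               merged[key] = deep_merge(merged[key], value)
           else:
               merged[key] = deepcopy(value)
       return merged


   # Hypothetical stand-ins for parsed YAML content; these keys are invented.
   defaults = {"model": {"name": "base-model", "io_tasks": 4}, "run": {"type": "test"}}
   simulation = {"model": {"io_tasks": 8}}

   config = deep_merge(defaults, simulation)
   print(config)

In this sketch the more specific dictionary only needs to state the values it changes; everything else is inherited from the defaults, and the defaults themselves are left unmodified.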