Data governance

The FDB (Fields Data Base) is a domain-specific object store for meteorological data, described by a curated schema of scientifically meaningful metadata. In this workflow, the FDB stores the data produced by the climate models.

The applications read the data from the FDB. The keys used to write and read the data are defined under the REQUEST section of the YAML configuration files, and are based on the current schema.

The data in the FDB is encoded with a set of keys. The first layer of the schema currently used in Climate-DT consists of the following keys:

class=d1, dataset=climate-dt, activity, experiment, generation, model, realization, expver, stream=clte/wave, date

Most of these keys are governed by the workflow through the REQUEST keys:

In IFS-NEMO and IFS-FESOM, EXPERIMENT and ACTIVITY are set by RAPS. In the SIMULATION config files, the key %CONFIGURATION.RAPS_EXPERIMENT% specifies which experiment is run, and RAPS associates the corresponding ACTIVITY. To read the data back, the keys REQUEST.EXPERIMENT and REQUEST.ACTIVITY must then be set. In ICON, the values of REQUEST.EXPERIMENT and REQUEST.ACTIVITY are used directly by the model.

Note

The keys REQUEST.EXPERIMENT, REQUEST.ACTIVITY, REQUEST.GENERATION, REQUEST.MODEL, REQUEST.REALIZATION and REQUEST.RESOLUTION are used to read the data from the FDB.
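
As a rough sketch, the read keys listed above might appear under the REQUEST section of a configuration file as follows. All values shown are illustrative placeholders (assumptions, not taken from a real experiment) and must be replaced with values that are valid in the current schema:

REQUEST:
    EXPERIMENT: "hist"         # placeholder experiment name
    ACTIVITY: "CMIP6"          # placeholder; must match the activity associated with the experiment
    GENERATION: "1"            # placeholder
    MODEL: "IFS-NEMO"          # placeholder; the exact spelling expected by the schema may differ
    REALIZATION: "1"           # placeholder
    RESOLUTION: "high"         # placeholder

The remaining keys of the schema (class, dataset, expver, stream, date) are not shown here, since how they are set is not covered by this note.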

Information about the portfolio and the FDBs used in the workflow: https://wiki.eduuni.fi/spaces/cscRDIcollaboration/pages/578275221/Data+portfolios+and+FDB+documentation

Workflow types

Depending on the type of workflow, the data is stored in different FDBs and must be accessed in different ways. The access mode is selected with the RUN.TYPE key in the main.yml file. The allowed values are:

operational: The experiment writes the data using expver 0001 instead of the expid provided by Autosubmit. It writes to the HPC-FDB.
operational-read (previously production): The experiment reads the data from the HPC-FDB using the expid provided by Autosubmit. The data was previously written using expver 0001.
research: The experiment writes the data using the expid provided by Autosubmit. It writes to the HPC-FDB.
pre-production: The experiment writes the data using expver 0001 instead of the expid provided by Autosubmit. It writes to a local FDB.
test: The experiment writes (or reads) the data using the expid provided by Autosubmit. It writes to (or reads from) a local FDB.
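
For instance, a research run could be selected in main.yml with a fragment along these lines. This is a minimal sketch: any other keys that may live under RUN are omitted, and quoting the value is an assumption that follows the style of the other fragments on this page:

RUN:
    # One of: operational, operational-read, research, pre-production, test
    TYPE: "research"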

Data Bridge

If the data has already been transferred to the Data Bridge, it can be read by enabling the key APP.READ_FROM_DATA_BRIDGE in the main.yml file.

LUMI

To access the data in the Data Bridge, it is enough to point to the correct FDB_HOME. The workflow takes care of this when the following is set:

APP:
    READ_FROM_DATA_BRIDGE: "True"

MareNostrum 5

To access the data in the Data Bridge, the data must first be downloaded to the HPC through the client machine. The workflow takes care of this when the following is set:

APP:
    READ_FROM_DATA_BRIDGE: "True"
JOBS:
    DN:
        PLATFORM: "MARENOSTRUM5-READDATABRIDGE"

The MARENOSTRUM5-READDATABRIDGE platform is defined in the platforms.yml file and points to the client machine, which has access to the Data Bridge and can download the data to the HPC. The data is downloaded by the dn job. If this job runs on the client machine, it downloads the data to the HPC-FDB, where it can be accessed with the same FDB_HOME and keys as if it were stored in the production FDB.

Physically, the data is downloaded to: /gpfs/scratch/ehpc01/dte/fdb/gateway.