
Simulation and post-processing


In this chapter you will learn how to start a simulation and how to use the IPSL models and tools, from the beginning of the simulation to the post-processing of the outputs and the creation of diagrams.


1. Overview of IPSL running environment workflow

The main computing job automatically submits post-processing jobs at different frequencies during the simulation. Here is a diagram describing the job sequence:


2. Simulation - Computing part

2.1. Submitting your simulation

Once you have defined and set up your simulation, you can submit it. The submission commands are:

curie > ccc_msub Job_MYJOBNAME
ada   > llsubmit Job_MYJOBNAME

These commands return a job ID that you can use with the machine-specific batch commands to manage your job. Please refer to the environment page of your machine.
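If you switch between machines, a small wrapper can pick the right submission command. The bash sketch below is only an illustration: the function name `submit_command` is hypothetical, and only the machine names and the `ccc_msub`/`llsubmit` commands come from this page.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the batch submission command for a machine.
# Only the curie/ccc_msub and ada/llsubmit pairs come from this page.
submit_command() {
  case "$1" in
    curie) echo "ccc_msub" ;;
    ada)   echo "llsubmit" ;;
    *)     echo "unknown machine: $1" >&2; return 1 ;;
  esac
}

# Usage on the machine's login node (not executed here):
#   $(submit_command curie) Job_MYJOBNAME
```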

Before starting a simulation, it is very important to double-check that it was properly set up. We strongly encourage you to perform a short test before starting a long simulation.

The job you just submitted is the first element of a sequence of jobs. These jobs include the computing job itself, post-processing jobs (rebuild, pack, create_ts, create_se) and diagram jobs (monitoring, atlas), which are started at given frequencies.

If you recompile the model during a simulation, the new executable will be used from the next period of the running job.


2.2. Status of the running simulation

2.2.1. run.card during the simulation

A run.card file is created as soon as your simulation starts. It contains information about your simulation, in particular the PeriodState parameter which is:

2.2.2. Execution directory

2.2.3. Accounting mail

You receive a « Simulation Accounting » mail confirming that the simulation started correctly and indicating how many periods per job (PeriodNb) you can use to be efficient and how many computing hours the simulation will consume. For example:

Dear Jessica,

this mail will be sent once for the simulation CURFEV13 you recently submitted

The whole simulation will consume around 10074.036500 hours. To be compared with your project allocation.

The recommended PeriodNb for a 24 hours job seems to be around 38.117600. To be compare with the current setting (Job_CURFEV13 parameter) : PeriodNb=30

Greetings!
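The recommended PeriodNb in the mail is fractional, while the PeriodNb parameter of the job takes an integer. A minimal sketch, assuming that rounding down is the safe choice because the recommendation is the number of periods that fit in a 24-hour job; the helper name is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn the fractional PeriodNb suggested in the
# accounting mail into an integer for the PeriodNb= line of the job.
# Assumption: rounding down keeps the job within its 24-hour limit.
suggested_period_nb() {
  # awk's int() truncates, i.e. rounds down for positive values
  awk -v r="$1" 'BEGIN { print int(r) }'
}

suggested_period_nb 38.117600   # prints 38
```

With the mail above, this suggests raising PeriodNb from 30 to 38.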

2.3. End of the simulation

2.3.1. Messages received

Example of a message for a successfully completed simulation:

From : no-reply.tgcc@cea.fr 
Object : CURFEV13 completed

Dear Jessica,

 Simulation CURFEV13 is completed on supercomputer curie3820.
 Job started : 20000101
 Job ended   : 20001231
 Output files are available in /ccc/store/.../IGCM_OUT/IPSLCM5A/DEVT/pdControl/CURFEV13
 Files to be rebuild are temporarily available in /ccc/scratch/.../IGCM_OUT/IPSLCM5A/DEVT/pdControl/CURFEV13/REBUILD
 Pre-packed files are temporarily available in /ccc/scratch/.../IGCM_OUT/IPSLCM5A/DEVT/pdControl/CURFEV13
 Script files, Script Outputs and Debug files (if necessary) are available in /ccc/work/.../modipsl/config/IPSLCM5_v5/CURFEV13

Example of a message when the simulation failed:

From : no-reply.tgcc@cea.fr 
Object : CURFEV13 failed

Dear Jessica,

 Simulation CURFEV13 is failed on supercomputer curie3424.
 Job started : 20000101
 Job ended   : 20001231
 Output files are available in /ccc/store/.../IGCM_OUT/IPSLCM5A/DEVT/pdControl/CURFEV13
 Files to be rebuild are temporarily available in /ccc/scratch/.../IGCM_OUT/IPSLCM5A/DEVT/pdControl/CURFEV13/REBUILD
 Pre-packed files are temporarily available in /ccc/scratch/.../IGCM_OUT/IPSLCM5A/DEVT/pdControl/CURFEV13
 Script files, Script Outputs and Debug files (if necessary) are available in /ccc/work/.../modipsl/config/IPSLCM5_v5/CURFEV13

2.3.2. run.card at the end of a simulation

At the end of your simulation, the PeriodState parameter in the run.card file indicates whether the simulation was completed or aborted due to a Fatal error.
This file contains the following sections:

If the run.card file indicates a problem at the end of the simulation, check your Script_Output file for more details.
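To check the final state without opening the file, you can read PeriodState directly. The bash sketch below is not the libIGCM card reader; it assumes run.card uses `Key= value` lines grouped in `[Section]` blocks, that PeriodState sits in the `[Configuration]` section, and that a failed simulation is marked `Fatal`. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch (not libIGCM's own reader): print the value of KEY in
# [SECTION] of an INI-like card file such as run.card.
card_get() {
  local file=$1 section=$2 key=$3
  awk -v sec="[$section]" -v key="$key" '
    /^[ \t]*\[/ { in_sec = ($1 == sec); next }   # entering a new [Section]
    in_sec && $0 ~ ("^[ \t]*" key "[ \t]*=") {
      sub(/^[^=]*=/, "")              # drop "Key="
      gsub(/"/, "")                   # drop quotes
      gsub(/^[ \t]+|[ \t]+$/, "")     # trim surrounding blanks
      print
      exit
    }' "$file"
}

# Report the final state of a simulation from its run.card.
# Assumption: PeriodState is stored in the [Configuration] section.
check_run_state() {
  local state
  state=$(card_get "$1" Configuration PeriodState)
  case "$state" in
    Completed) echo "completed" ;;
    Fatal)     echo "aborted: check the Script_Output file" ;;
    *)         echo "state: $state" ;;
  esac
}

# Usage, from the submission directory:
#   check_run_state run.card
```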

2.3.3. Script_Output_JobName

A Script_Output_JobName file is created for each executed job. It contains the output log of the simulation job (list of the executed scripts, management of the I/O scripts).
This file consists mainly of three parts, each starting with one of the banners below:

#######################################
#       ANOTHER GREAT SIMULATION      #
#######################################

 1st part (copying and handling of the input and parameter files)

#######################################
#      DIR BEFORE RUN EXECUTION       #
#######################################

 2nd part (running the model)

#######################################
#       DIR AFTER RUN EXECUTION       #
#######################################

 3rd part (copying of the output files and launching of the post-processing steps (rebuild and pack))
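Because the three parts are delimited by fixed banner titles, you can extract a single part with awk. In this sketch (function name hypothetical), the first `####` line of each banner ends up at the end of the preceding part; part 3, for instance, shows the copying of outputs and the launching of rebuild and pack.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print part N (1, 2 or 3) of a Script_Output file,
# using the banner titles as delimiters.
script_output_part() {
  local file=$1 part=$2
  awk -v want="$part" '
    /ANOTHER GREAT SIMULATION/ { p = 1 }   # part 1: inputs and parameters
    /DIR BEFORE RUN EXECUTION/ { p = 2 }   # part 2: running the model
    /DIR AFTER RUN EXECUTION/  { p = 3 }   # part 3: outputs, rebuild, pack
    p == want' "$file"
}

# Usage:
#   script_output_part Script_Output_MYJOBNAME.0001 3 | less
```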

2.3.4. The output files

The output files are stored on the file servers. Their path follows a standardized nomenclature, IGCM_OUT/TagName/[SpaceName]/[ExperimentName]/JobName/, with subdirectories for the "Output" and "Analyse" directories of each component (e.g. ATM/Output, ATM/Analyse), as well as DEBUG, RESTART, ATLAS and MONITORING.
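The nomenclature can be expressed as a small path builder. This is a hypothetical helper; the example values come from the end-of-simulation mail on this page (IPSLCM5A, DEVT, pdControl, CURFEV13), and the result is relative to your archive root (e.g. /ccc/store/... at TGCC).

```shell
#!/usr/bin/env bash
# Hypothetical helper: IGCM_OUT/TagName/[SpaceName]/[ExperimentName]/JobName
# SpaceName and ExperimentName are optional: pass "" to omit them.
igcm_out_path() {
  local tag=$1 space=$2 exp=$3 job=$4
  local out="IGCM_OUT/$tag"
  if [ -n "$space" ]; then out="$out/$space"; fi
  if [ -n "$exp" ]; then out="$out/$exp"; fi
  echo "$out/$job"
}

# Component subdirectory, e.g. ATM/Output:
#   echo "$(igcm_out_path IPSLCM5A DEVT pdControl CURFEV13)/ATM/Output"
```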

Before the pack jobs run, the following directory structure is stored:

After the pack jobs have run (see diagram below), the following tree is stored:

2.3.4.1. Here is the storage directory structure of the output files produced at TGCC

2.3.4.2. Here is the storage directory structure of the output files produced at IDRIS

2.3.5. Debug/ directory

A Debug/ directory is created if the simulation crashed. This directory contains text files from each of the model components to help you find the reasons for the crash. See also the chapter on monitoring and debugging.

2.4. How to continue or restart a simulation?

  1. If you want to continue an existing simulation that has finished, change the simulation end date in the config.card file. Do not change the simulation start date.
  2. In the run.card file you must:
  3. In your job, change the output file number so that the job does not fail trying to overwrite an existing Script_Output file. By default it is Script_Output_NomJob_.0001; replace it with Script_Output_NomJob_.CumulPeriod (the CumulPeriod value is in run.card).
  4. If your simulation stopped in the middle of a month and you want to restart it, you must delete the files created during that month (pack period) from your archives ($CCCSTOREDIR/IGCM_OUT/etc...). You can use the scripts `modipsl/libIGCM/clean_month.job` and `modipsl/libIGCM/clean_year.job`:
     cd $SUBMIT_DIR        # i.e. modipsl/config/LMDZOR_v5/DIADEME
     cp ../../../libIGCM/clean_month.job . ; chmod 755 clean_month.job   # once and for all
     ./clean_month.job     # answer the questions
     # same for clean_year.job
     ccc_msub Job_EXP00    # on curie; on ada use: llsubmit Job_EXP00
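Step 3 can be automated by deriving the suffix from run.card. A sketch, assuming run.card contains a `CumulPeriod= N` line and that the suffix is zero-padded to four digits as in the default `.0001`; pass the part of the file name before the dot as the second argument. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<prefix>.<CumulPeriod as 4 digits>".
# Assumes run.card holds a line such as "CumulPeriod= 12".
next_script_output() {
  local runcard=$1 prefix=$2 cumul
  cumul=$(awk -F= '/^[ \t]*CumulPeriod[ \t]*=/ { gsub(/[ \t"]/, "", $2); print $2; exit }' "$runcard")
  printf '%s.%04d\n' "$prefix" "$cumul"
}

# Usage, from the submission directory:
#   next_script_output run.card Script_Output_EXP00
```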
    

3. Simulation - Post processing and diagram part

3.1. Post processing in config.card

In config.card, specify the kind and frequency of the post-processing.

#========================================================================
#D-- Post -
[Post]
#D- Do we rebuild parallel output, this flag determines
#D- frequency of rebuild submission (use NONE for DRYRUN=3)
RebuildFrequency=1Y
#D- frequency of pack post-treatment : DEBUG, RESTART, Output
PackFrequency=1Y
#D- Do we rebuild parallel output from archive (use NONE to use SCRATCHDIR as buffer)
RebuildFromArchive=NONE
#D- If you want to produce time series, this flag determines
#D- frequency of post-processing submission (NONE if you don't want)
TimeSeriesFrequency=10Y
#D- If you want to produce seasonal average, this flag determines
#D- the period of this average (NONE if you don't want)
SeasonalFrequency=10Y
#D- Offset for seasonal average first start dates ; same unit as SeasonalFrequency
#D- Useful if you do not want to consider the first X simulation years
SeasonalFrequencyOffset=0
#========================================================================

If you do not want any post-processing, set TimeSeriesFrequency and SeasonalFrequency to NONE.
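To estimate how many jobs a frequency setting will trigger, divide the simulation length by the frequency. The helper below is a hypothetical illustration that assumes one submission per full interval and only handles the `Y`, `M` and `NONE` forms; it does not reproduce libIGCM's own scheduling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: approximate number of submissions of a
# post-processing job over a simulation of SIM_YEARS years.
post_submissions() {
  local sim_years=$1 freq=$2 months
  case "$freq" in
    NONE) echo 0; return ;;
    *Y)   months=$(( ${freq%Y} * 12 )) ;;
    *M)   months=${freq%M} ;;
    *)    echo "unsupported frequency: $freq" >&2; return 1 ;;
  esac
  echo $(( sim_years * 12 / months ))   # full intervals only
}
```

For a 100-year run, `post_submissions 100 10Y` gives 10 (TimeSeriesFrequency=10Y), while `post_submissions 100 1Y` gives 100 (RebuildFrequency=1Y).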

3.2. Rebuild

Note: if JobType=DEV, RebuildFrequency is forced to the PeriodLength value, so one rebuild job is started per simulated period. This is discouraged for long simulations.

3.3. Concatenation of "PACK" outputs

The model outputs are concatenated before being stored on the archive servers. The concatenation frequency is set by the PackFrequency parameter; if this parameter is not set, RebuildFrequency is used instead.
This packing step is performed by the PACKRESTART and PACKDEBUG jobs (started by the main job) and the PACKOUTPUT job (started by the rebuild job).
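The fallback between the two frequencies behaves like a shell default value. A one-line sketch; the function name is hypothetical and its arguments mirror the config.card keys:

```shell
#!/usr/bin/env bash
# Hypothetical helper: PackFrequency if set and non-empty,
# otherwise RebuildFrequency.
effective_pack_frequency() {
  local pack=$1 rebuild=$2
  echo "${pack:-$rebuild}"
}
```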

3.3.1. How are the different kinds of output files treated ?

All the files listed below are archived or concatenated at the same frequency (PackFrequency):

3.4. Time Series

A time series is a file that contains a single variable over the whole simulation period (ChunckJob2D = NONE) or over a shorter period for 2D (ChunckJob2D = 100Y) or 3D (ChunckJob3D = 50Y) variables.

Example for LMDZ:

[OutputFiles]
List=   (histmth.nc,      ${R_OUT_ATM_O_M}/${PREFIX}_1M_histmth.nc,      Post_1M_histmth), \
...
[Post_1M_histmth]
Patches= (Patch_20091030_histcom_time_axis)
GatherWithInternal = (lon, lat, presnivs, time_counter, aire)
TimeSeriesVars2D = (bils, cldh, ... )
ChunckJob2D = NONE
TimeSeriesVars3D = ()
ChunckJob3D = NONE

The time series built from monthly (or daily) output files are stored on the archive server in the IGCM_OUT/TagName/[SpaceName]/[ExperimentName]/JobName/Composante/Analyse/TS_MO (or TS_DA) directories.

You can add variables to or remove variables from the TimeSeries lists according to your needs.

There are as many time series jobs as there are ChunckJob3D values. This can result in many create_ts jobs (automatically started by the computing sequence).
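To anticipate how many time-series files a chunk size implies, you can compute the number of chunks. This is a hypothetical sketch assuming that chunks tile the simulation from its start date, that a final partial chunk still counts, and that ChunckJob values are given in years:

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of chunks for a simulation of SIM_YEARS years
# and a ChunckJob2D/ChunckJob3D value such as 50Y or NONE.
ts_chunks() {
  local sim_years=$1 chunk=$2 size
  if [ "$chunk" = NONE ]; then
    echo 1          # one file over the whole simulation period
    return
  fi
  size=${chunk%Y}
  echo $(( (sim_years + size - 1) / size ))   # ceiling division
}
```

For example, `ts_chunks 200 50Y` gives 4 and `ts_chunks 200 NONE` gives 1.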

3.5. Monitoring and intermonitoring

The monitoring is a web-interface tool that visualizes the global mean over time of a set of key variables. You can access the monitoring at the dods address of your machine, followed by yourlogin/TagName/SpaceName/JobName. If you have a new account, you might need to contact the user support team of the computing center to activate your write access to dods.
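The monitoring address is the dods base address of your machine followed by the four path elements above. The base address is machine-specific and not given here, so the example uses a placeholder; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: BASE/yourlogin/TagName/SpaceName/JobName
# (a single trailing "/" on BASE is removed).
monitoring_url() {
  local base=${1%/} login=$2 tag=$3 space=$4 job=$5
  echo "$base/$login/$tag/$space/$job"
}

# Example with a placeholder base address:
#   monitoring_url "https://dods.example.org/work" jessica IPSLCM5A DEVT CURFEV13
```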

The key variables plotted in the monitoring are computed from the time series. The monitoring is updated at the TimeSeriesFrequency set in config.card, provided the time series were successfully produced. This lets you check the status of a simulation both while it is running and afterwards. By monitoring your simulations you can detect anomalies and evaluate the impact of the changes you have made. We suggest keeping a browser tab open on the monitoring page so that you can check your simulation frequently. If a few key variables start looking suspicious, you might want to stop your simulation; doing so will save computing time. Full documentation is available at http://wiki.ipsl.jussieu.fr/IGCMG/Outils/ferret/Monitoring.

Here is an example for the IPSLCM5A coupled model and a 10-year period. The first tab called Analysis Cards gives a summary of dates and execution times obtained from the config.card and run.card files. The second tab called Monitoring Board presents a monitoring table for the key variables (selecting one or more model components is optional).

3.5.1. Adding a variable to the monitoring

You can add or change the monitored variables by editing the monitoring configuration files. Default files are defined for each component.

The default monitoring files are located in ~compte_commun/atlas. For example, for LMDZ: monitoring01_lmdz_LMD9695.cfg

To change the monitoring, create a POST directory in your configuration, copy a .cfg file into it and edit it as needed. You will find two examples in special post processing.

Be careful: to compute a variable from two other variables, you must write the operation within parentheses:

#-----------------------------------------------------------------------------------------------------------------
#  field | files patterns | files additionnal | operations | title | units | calcul of area
#-----------------------------------------------------------------------------------------------------------------
 nettop_global | "tops topl"                  | LMDZ4.0_9695_grid.nc | "(tops[d=1]-topl[d=2])" | "TOA. total heat flux (GLOBAL)"         | "W/m^2"     | "aire[d=3]" 
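Each row of a monitoring .cfg file is a list of `|`-separated columns in the order given by the header comment. To check how a row is split, you can print one column; this awk-based sketch (hypothetical function name) trims surrounding blanks but keeps the quotes:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print column COL (1-based) of one "|"-separated
# monitoring .cfg row, with leading and trailing blanks removed.
cfg_column() {
  local row=$1 col=$2
  awk -F'|' -v c="$col" '{
    v = $c
    gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim surrounding blanks
    print v
  }' <<< "$row"
}

row=' nettop_global | "tops topl" | LMDZ4.0_9695_grid.nc | "(tops[d=1]-topl[d=2])" | "TOA. total heat flux (GLOBAL)" | "W/m^2" | "aire[d=3]" '
cfg_column "$row" 4    # prints "(tops[d=1]-topl[d=2])" including the quotes
```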

3.5.2. Inter Monitoring

3.5.3. Mini how-to: using the intermonitoring

Go to http://webservices.ipsl.fr/monitoring-1.20

3.5.4. How to save the intermonitoring permanently

The plots produced by the intermonitoring are kept for 15 days. During this time you can view them using the same link. To keep them permanently, proceed as follows:

3.6. Seasonal means

3.7. Atlas

3.8. Storing files like ATLAS, MONITORING and ANALYSE

The files produced by ATLAS, MONITORING, time series and seasonal means are stored in the following directories:

They are available through the dods servers at IDRIS and TGCC.

3.9. How to check that the post processing was successful

The post-processing output log files are:

In these directories you will find the output files of the rebuild, pack*, ts, se, atlas and monitoring jobs.

The scripts that transfer data to dods are run at the end of the monitoring job and at the end of each atlas job.