In this chapter you will learn how to start a simulation and how to use the IPSL models and tools, from the beginning of the simulation to the post processing and basic visualization of the outputs.
The main computing job automatically runs post processing jobs (at different frequencies) during the simulation. Here is a diagram describing the job sequence:
Once you have defined and set up your simulation, you can submit it. The run commands are:
irene > ccc_msub Job_MYJOBNAME
ada   > llsubmit Job_MYJOBNAME
These commands return a job number that can be used with the machine-specific tools to manage your job. Please refer to the environment page of your machine.
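As an illustration, the returned job number can be captured and reused with the machine-specific job management commands. The submission output below is mocked (the exact wording varies by machine and batch system version); the commands in the comments (ccc_mpp/ccc_mdel at TGCC, llq/llcancel at IDRIS) are the usual batch tools on those machines.

```shell
# Mocked submission output: ccc_msub typically prints a line containing the job id
submit_output="Submitted Batch Session 1234567"

# Extract the numeric job id from the message
jobid=$(echo "$submit_output" | grep -o '[0-9][0-9]*')
echo "job id: $jobid"

# With the id you can then manage the job, e.g.:
#   irene> ccc_mpp -u $USER     # list your running/queued jobs
#   irene> ccc_mdel $jobid      # kill the job
#   ada>   llq -u $USER         # list your jobs (LoadLeveler)
#   ada>   llcancel $jobid      # cancel the job
```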
Before starting a simulation it is very important to double check that it was properly set up. We strongly encourage you to perform a short test before starting a long simulation.
The job you just submitted is the first element of a sequence of jobs. This sequence includes the computing job itself, post processing jobs (rebuild, pack, create_ts, create_se) and visualization jobs (monitoring and atlas), which are started at given frequencies.
If you recompile the model during a simulation, the new executable will be used in the next period of the running job.
A run.card file is created as soon as your simulation starts. It contains information about your simulation, in particular the PeriodState parameter which is:
You will receive a « Simulation Accounting » mail indicating that the simulation started correctly, which PeriodNb value would be efficient, and how many computing hours the simulation will consume. For example:
Dear Jessica,

This mail will be sent once for the simulation CURFEV13 you recently submitted.
The whole simulation will consume around 10074.036500 hours, to be compared with your project allocation.
The recommended PeriodNb for a 24-hour job seems to be around 38.117600, to be compared with the current setting (Job_CURFEV13 parameter): PeriodNb=30

Greetings!
From: no-reply.tgcc@cea.fr
Object: CURFEV13 completed

Dear Jessica,

Simulation CURFEV13 is completed on supercomputer curie3820.
Job started: 20000101
Job ended:   20001231
Output files are available in /ccc/store/.../IGCM_OUT/IPSLCM5A/DEVT/pdControl/CURFEV13
Files to be rebuilt are temporarily available in /ccc/scratch/.../IGCM_OUT/IPSLCM5A/DEVT/pdControl/CURFEV13/REBUILD
Pre-packed files are temporarily available in /ccc/scratch/.../IGCM_OUT/IPSLCM5A/DEVT/pdControl/CURFEV13
Script files, Script Outputs and Debug files (if necessary) are available in /ccc/work/.../modipsl/config/IPSLCM5_v5/CURFEV13
From: no-reply.tgcc@cea.fr
Object: CURFEV13 failed

Dear Jessica,

Simulation CURFEV13 has failed on supercomputer curie3424.
Job started: 20000101
Job ended:   20001231
Output files are available in /ccc/store/.../IGCM_OUT/IPSLCM5A/DEVT/pdControl/CURFEV13
Files to be rebuilt are temporarily available in /ccc/scratch/.../IGCM_OUT/IPSLCM5A/DEVT/pdControl/CURFEV13/REBUILD
Pre-packed files are temporarily available in /ccc/scratch/.../IGCM_OUT/IPSLCM5A/DEVT/pdControl/CURFEV13
Script files, Script Outputs and Debug files (if necessary) are available in /ccc/work/.../modipsl/config/IPSLCM5_v5/CURFEV13
At the end of your simulation, the PeriodState parameter in run.card indicates whether the simulation was completed or aborted due to a fatal error.
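A quick way to check this is to extract PeriodState directly from run.card; a minimal sketch, using a mocked run.card excerpt based on the fields shown in this chapter:

```shell
# Mocked run.card excerpt (real files contain the full [Configuration] section)
cat > run.card <<'EOF'
[Configuration]
PeriodState="Completed"
CumulPeriod=12
EOF

# Extract the value of PeriodState, stripping the surrounding quotes
state=$(sed -n 's/^PeriodState="\(.*\)"$/\1/p' run.card)
echo "$state"
```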
This file contains the following sections:
[Configuration]
#lastPREFIX
OldPrefix=        # ---> Prefix of the last created files during the simulation = JobName + date of the last period. Used for the Restart
#Warning : OldPrefix not used anymore from libIGCM_v2.5.
#Compute date of loop
PeriodDateBegin=  # ---> start date of the next period to be simulated
PeriodDateEnd=    # ---> end date of the next period to be simulated
CumulPeriod=      # ---> number of already simulated periods
# State of Job "Start", "Running", "OnQueue", "Completed"
PeriodState="Completed"
SubmitPath=       # ---> Submission directory
[PostProcessing]
TimeSeriesRunning=n          # ---> indicates if the time series are running
TimeSeriesCompleted=20091231 # ---> indicates the date of the last time series produced by the post processing
[Log]
# Executables Size
LastExeSize=()
#---------------------------------
# CumulPeriod | PeriodDateBegin | PeriodDateEnd | RunDateBegin        | RunDateEnd          | RealCpuTime | UserCpuTime | SysCpuTime | ExeDate
# 1           | 20000101        | 20000131      | 2013-02-15T16:14:15 | 2013-02-15T16:27:34 | 798.33000   | 0.37000     | 3.05000    | ATM_Feb_15_16:13-OCE_Feb_15_15:56-CPL_Feb_15_15:43
# 2           | 20000201        | 20000228      | 2013-02-15T16:27:46 | 2013-02-15T16:39:44 | 718.16000   | 0.36000     | 3.39000    | ATM_Feb_15_16:13-OCE_Feb_15_15:56-CPL_Feb_15_15:43
If the run.card file indicates a problem at the end of the simulation, check your Script_Output file for more details.
A Script_Output_JobName file is created for each job executed. It contains the simulation job output log (list of the executed scripts, management of the I/O scripts).
This file mainly contains three parts, delimited as follows:
#######################################
#       ANOTHER GREAT SIMULATION      #
#######################################

1st part: copying and handling of the input and parameter files

#######################################
#       DIR BEFORE RUN EXECUTION      #
#######################################

2nd part: running the model

#######################################
#       DIR AFTER RUN EXECUTION       #
#######################################

3rd part: copying of output files and launching of the post processing steps (rebuild and pack)
The output files are stored on file servers. Their names follow a standardized nomenclature, IGCM_OUT/TagName/[SpaceName]/[ExperimentName]/JobName/, with separate "Output" and "Analyse" subdirectories for each component (e.g. ATM/Output, ATM/Analyse), plus DEBUG, RESTART, ATLAS and MONITORING.
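For instance, with the names used in the mail examples above, the nomenclature expands as follows (a sketch; the shell variables simply mirror the placeholders in the path):

```shell
# Placeholders from the nomenclature, filled with the values used in the
# mail examples of this chapter
TagName=IPSLCM5A
SpaceName=DEVT
ExperimentName=pdControl
JobName=CURFEV13

out_dir="IGCM_OUT/${TagName}/${SpaceName}/${ExperimentName}/${JobName}/"
echo "$out_dir"
# Component subdirectories then live under this path, e.g.:
#   ${out_dir}ATM/Output  and  ${out_dir}ATM/Analyse
```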
Prior to the pack execution, this directory structure is stored
After the pack execution (see diagram below), this tree is stored
A Debug/ directory is created if the simulation crashed. This directory contains text files from each of the model components to help you find the reasons for the crash. See also the chapter on monitoring and debugging.
cd $SUBMIT_DIR   # i.e. modipsl/config/LMDZOR_v5/DIADEME
cp ../../../libIGCM/clean_PeriodLength.job . ; chmod 755 clean_PeriodLength.job   # once and for all
./clean_PeriodLength.job   # answer the questions
# same for clean_latestPackperiod.job
ccc_msub Job_EXP00   # or llsubmit Job_EXP00
You must specify in config.card the kind and frequency of the post processing.
#========================================================================
#D-- Post -
[Post]
#D- Do we rebuild parallel output, this flag determines
#D- frequency of rebuild submission (use NONE for DRYRUN=3)
RebuildFrequency=NONE
#D- frequency of pack post-treatment : DEBUG, RESTART, Output
PackFrequency=1Y
#D- If you want to produce time series, this flag determines
#D- frequency of post-processing submission (NONE if you don't want)
TimeSeriesFrequency=10Y
#D- If you want to produce seasonal average, this flag determines
#D- the period of this average (NONE if you don't want)
SeasonalFrequency=10Y
#D- Offset for seasonal average first start dates ; same unit as SeasonalFrequency
#D- Useful if you do not want to consider the first X simulation's years
SeasonalFrequencyOffset=0
#========================================================================
If no post processing is desired, you must set both TimeSeriesFrequency and SeasonalFrequency to NONE.
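To verify at a glance which post processing is enabled, you can grep the [Post] section of config.card; a sketch using a mocked excerpt matching the example above:

```shell
# Mocked [Post] section of config.card
cat > config.card <<'EOF'
[Post]
RebuildFrequency=NONE
PackFrequency=1Y
TimeSeriesFrequency=10Y
SeasonalFrequency=10Y
SeasonalFrequencyOffset=0
EOF

# Show the two post processing frequencies; a value of NONE would mean disabled
grep -E '^(TimeSeries|Seasonal)Frequency=' config.card
```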
For almost all configurations, the rebuild phase is not needed: single files are written directly in parallel by XIOS.
The model outputs are concatenated before being stored on the archive servers. The concatenation frequency is set by the PackFrequency parameter (NONE means no concatenation, which is not recommended). If this parameter is not set, the rebuild frequency RebuildFrequency is used. This packing step is performed by the PACKRESTART and PACKDEBUG jobs (started by the main job) and the PACKOUTPUT job (started by the rebuild job or the main job).
All files listed below are archived or concatenated at the same frequency (PackFrequency).
A Time Series is a file containing a single variable, either over the whole simulation period (ChunckJob2D = NONE) or over a shorter period, for 2D (e.g. ChunckJob2D = 100Y) or 3D (e.g. ChunckJob3D = 50Y) variables.
Example for lmdz:
[OutputFiles]
List= (histmth.nc, ${R_OUT_ATM_O_M}/${PREFIX}_1M_histmth.nc, Post_1M_histmth), \
      ...
[Post_1M_histmth]
Patches= ()
GatherWithInternal = (lon, lat, presnivs, time_counter, aire)
TimeSeriesVars2D = (bils, cldh, ... )
ChunckJob2D = NONE
TimeSeriesVars3D = ()
ChunckJob3D = NONE
The Time Series coming from monthly (or daily) output files are stored on the archive server in the IGCM_OUT/TagName/[SpaceName]/[ExperimentName]/JobName/Composante/Analyse/TS_MO and TS_DA directories.
You can add or remove variables to the TimeSeries lists according to your needs.
There are as many time series jobs as there are ChunckJob3D values, which can result in a large number of create_ts jobs (all started automatically by the computing sequence).
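As a rough illustration of how the chunk settings translate into files: with a 500-year simulation and ChunckJob3D = 50Y, each 3D variable is split into ten 50-year files (the numbers here are hypothetical, chosen only for the example).

```shell
# Hypothetical numbers: a 500-year simulation chunked into 50-year 3D files
sim_years=500
chunk_years=50

# Integer division gives the number of chunk files per 3D variable
nchunks=$(( sim_years / chunk_years ))
echo "number of 3D chunk files per variable: $nchunks"
```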
The monitoring is a web-interface tool that visualizes the evolution of the global mean of a set of key variables over time. Access the monitoring through the esgf/thredds address of your machine, ending with yourlogin/TagName/SpaceName/JobName/MONITORING. If you have a new account, you might need to contact the assistance team at the computing center to activate your write access to esgf/thredds.
The key variables plotted in the monitoring are computed from the Time Series values. The monitoring is updated at the TimeSeriesFrequency set in config.card, provided the time series were successfully created. Monitoring your simulations lets you detect anomalies and evaluate the impact of changes you have made. We suggest keeping a browser tab open so you can check your simulation frequently: if a few key variables start looking suspicious, you can stop the simulation and save computing time. Full documentation is available at http://wiki.ipsl.jussieu.fr/IGCMG/Outils/ferret/Monitoring.
Here is an example for the IPSLCM5A coupled model over a 10-year period. Once you are in yourlogin/TagName/SpaceName/JobName/MONITORING, click on index.html. The first tab, Analysis Cards, gives a summary of dates and execution times obtained from the config.card and run.card files. The second tab, Monitoring Board, presents a monitoring table for the key variables (selecting one or more model components is optional).
You can add or change the variables to be monitored by editing the configuration files of the monitoring. Those files are defined by default for each component.
By default, the monitoring configuration files are located in ~compte_commun/atlas. For example, for LMDZ: monitoring01_lmdz_LMD9695.cfg
You can change the monitoring by creating a POST directory within your configuration: copy a .cfg file there and modify it as needed. You will find two examples in special post processing.
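The copy-and-edit step could look like the following sketch. The real source directory is ~compte_commun/atlas as mentioned above; it is mocked here (atlas_defaults, with a one-line .cfg) so the commands can be tried anywhere.

```shell
# Mock of the shared default directory (~compte_commun/atlas on the real machine)
mkdir -p atlas_defaults
echo "FreqTS=MO" > atlas_defaults/monitoring01_lmdz_LMD9695.cfg

# Create a POST directory inside your configuration and copy the .cfg there
mkdir -p POST
cp atlas_defaults/monitoring01_lmdz_LMD9695.cfg POST/
# ...then edit POST/monitoring01_lmdz_LMD9695.cfg to add or change variables
ls POST
```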
Be careful: to compute a variable from two variables, you must define the operation within parentheses:
#-----------------------------------------------------------------------------------------------------------------
# field        | files patterns | files additionnal | operations              | title                           | units   | calcul of area
#-----------------------------------------------------------------------------------------------------------------
nettop_global  | "tops topl"    | ""                | "(tops[d=1]-topl[d=2])" | "TOA. total heat flux (GLOBAL)" | "W/m^2" | "aire[d=1]"
tops_global    | "tops"         | ""                | "tops[d=1]"             | "tops Global"                   | "W/m²"  | "aire[d=1]"
#-----------------------------------------------------------------------------------------------------------------
# field        | files patterns | files additionnal    | operations              | title                           | units   | calcul of area
#-----------------------------------------------------------------------------------------------------------------
nettop_global  | "tops topl"    | LMDZ4.0_9695_grid.nc | "(tops[d=1]-topl[d=2])" | "TOA. total heat flux (GLOBAL)" | "W/m^2" | "aire[d=3]"
FreqTS=DA
#-----------------------------------------------------------------------------------------------------------------
# field        | files patterns | files additionnal    | operations              | title                           | units   | calcul of area
#-----------------------------------------------------------------------------------------------------------------
nettop_global  | "tops topl"    | LMDZ4.0_9695_grid.nc | "(tops[d=1]-topl[d=2])" | "TOA. total heat flux (GLOBAL)" | "W/m^2" | "aire[d=3]"
Go to http://webservices.ipsl.fr/interMonitoring
The plots produced by the intermonitoring are kept for 30 days; during this period you can view them through the same link. To keep them permanently, proceed as follows:
#                                             # IPSL (webservices)
#                                             # IDRIS (jeanzay)
#. /home/users/brock/.atlas_env_asterix_bash  # LSCE (asterix)
#                                             # IPSL (ciclad)
#                                             # TGCC (irene)

or add for ciclad:

. /home/brocksce/.atlas_env_ciclad1_bash
scriptname=./intermonit_CM6.jnl
# for ciclad:
cp -fR ${scriptname%%.jnl}_prod /prodigfs/ipslfs/http/dods/web/html/login/INTERMONITORING
# or for curie:
cp -fR ${scriptname%%.jnl}_prod /ccc/work/cont003/thredds/public/login/INTERMONITORING
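The `${scriptname%%.jnl}_prod` expression in the copy commands uses shell suffix stripping: `%%.jnl` removes the `.jnl` extension, so the production directory name is derived from the script name. A quick check:

```shell
scriptname=./intermonit_CM6.jnl

# %%.jnl strips the trailing ".jnl" from $scriptname, then "_prod" is appended
prod_dir="${scriptname%%.jnl}_prod"
echo "$prod_dir"
```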
use "/ccc/work/cont003/thredds/public/login/TagName/SpaceName/ExperimentName/JobName/MONITORING/files/($FILE)"
http://vesg.ipsl.upmc.fr/thredds/fileServer/work/login/INTERMONITORING/XXXXX_prod/index.html
The files produced by ATLAS, MONITORING, Time series and Seasonal means are stored in the following directories:
They are available through esgf/thredds server at IDRIS and at TGCC.
Since 2016, hermes.ipsl.upmc.fr has been the recommended tool to check and follow your simulations (computing and post processing jobs). More information here. In addition, the post processing output log files are:
In these directories, you will find the output files for the following jobs: rebuild, pack*, ts, se, atlas, monitoring.
Scripts to transfer data on esgf/thredds are run at the end of the monitoring job or at the end of each atlas job.