Frequently Asked Questions


Frequently (and not so frequently) Asked Questions

Table of contents

  1. FAQ : Setting up and performing a simulation
    1.1. How do I overwrite an existing simulation?
    1.2. How do I continue or restart a simulation?
    1.3. How do I set up a new experiment?
    1.4. How can I start from another simulation?
    1.5. How do I create the LMDZ histins.nc file?
  2. FAQ : Running the model
    2.1. How do I read the Script_Output file?
    2.2. The LMDZ parallelism and the Bands files
    2.3. How do I define the number of MPI jobs and the number of OpenMP threads?
    2.4. Why does the run.card file contain the keyword Fatal?
    2.5. How do I use a different version of libIGCM?
    2.6. How do I restart a simulation to recover missing output files?
  3. FAQ : Special configurations
    3.1. How do I create the initial conditions for LMDZOR?
    3.2. How do I deactivate STOMATE in IPSLCM5 or in LMDZOR?
    3.3. How do I perform a nudged run?
  4. FAQ : Post processing
    4.1. Where are post processing jobs run?
    4.2. How do I check that the post processing jobs were successful?
    4.3. How do I read/retrieve/use files on dods?
    4.4. How do I add a variable to the Time Series?
    4.5. How do I superimpose monitoring plots?
    4.6. How do I add a plot to the monitoring?
    4.7. How do I calculate seasonal means over 100 years?
  5. FAQ : Unix tricks
    5.1. How to delete a group of files using the find command?
    5.2. Allowing read-access to everybody
  6. FAQ : Miscellaneous
    6.1. How do I use TimeSeries_Checker.job to create files on $STORE when the output files are on DMNFS?
    6.2. How do I restart one simulation month which ran on $DMFDIR when the outputs are stored on $STORE?


1. FAQ : Setting up and performing a simulation

1.1. How do I overwrite an existing simulation?

  1. Delete the run.card file in your experiment directory.
  2. Delete the following directories:
    • at TGCC
      • $CCCSTOREDIR/IGCM_OUT/TagName/(...)/JobName
      • $CCCWORKDIR/IGCM_OUT/TagName/(...)/JobName
    • at IDRIS
      • $HOMEGAYA/IGCM_OUT/TagName/(...)/JobName
      • $WORKDIR/IGCM_OUT/TagName/(...)/JobName
  3. Delete the REBUILD/TagName/JobName directory (if it exists) in $SCRATCHDIR or in $WORKDIR.
  4. Delete the following directory:
    • at TGCC: $SCRATCHDIR/IGCM_OUT/TagName/(...)/JobName
    • at IDRIS, if you have changed the RUN_DIR_PATH variable, you must also delete the $WORKDIR/IGCM_OUT/TagName/(...)/JobName directory.
  5. Restart the job.
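
Before deleting anything, it can help to print the directories concerned. The sketch below is only an illustration: the function name igcm_cleanup_targets and its two arguments are hypothetical, and you must supply the relative path (TagName/(...)/JobName) yourself, since its intermediate levels depend on your experiment. It only prints paths and deletes nothing. The run.card file and the REBUILD directory still have to be handled separately.

```shell
# Print the IGCM_OUT directories listed above for one simulation.
# Nothing is deleted. Hypothetical helper, not part of libIGCM.
#   $1: computing center (TGCC or IDRIS)
#   $2: relative path TagName/(...)/JobName of your simulation
igcm_cleanup_targets() {
    center=$1
    rel=$2
    case "$center" in
        TGCC)
            for root in "$CCCSTOREDIR" "$CCCWORKDIR" "$SCRATCHDIR"; do
                echo "$root/IGCM_OUT/$rel"
            done
            ;;
        IDRIS)
            # $WORKDIR also covers the case where RUN_DIR_PATH was changed
            for root in "$HOMEGAYA" "$WORKDIR"; do
                echo "$root/IGCM_OUT/$rel"
            done
            ;;
        *)
            echo "unknown center: $center" >&2
            return 1
            ;;
    esac
}
```

Review the printed list first and remove the directories by hand once you are sure that they belong to the simulation you want to overwrite.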

1.2. How do I continue or restart a simulation?

See here.

1.3. How do I set up a new experiment?

See here.

1.4. How can I start from another simulation?

See here.

1.5. How do I create the LMDZ histins.nc file?

You have several options. The easiest one is to change the output frequency of one of the existing files. For instance, you can change the output frequency of the histhf.nc file to instantaneous without changing the file name. To do so, keep HF in WriteFrequency in config.card:

[ATM]
#
WriteFrequency="1M 1D HF"

Change the 3rd column of the phys_out_filetimesteps parameter in PARAM/output.def_OutLevel. OutLevel is chosen in lmdz.card and by default OutLevel=low. Specify 1800.s in PARAM/output.def_low if you want the output to be saved every 30 minutes:

phys_out_filetimesteps = 1.mth,  1.day,  1800.s,  0.125day, 0.125day, 1800.s

You can also change phys_out_filelevels in the 3rd column.
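
If you prefer not to edit the file by hand, the change to the 3rd column can be sketched with awk. This is only an illustration: the function name set_hf_timestep is hypothetical, and the sketch assumes that the values of phys_out_filetimesteps are comma-separated on a single line, as in the example above. It writes the modified content to standard output and leaves the file untouched.

```shell
# Print the content of an output.def_<OutLevel> file with the 3rd value of
# phys_out_filetimesteps replaced. Hypothetical helper, not part of libIGCM.
#   $1: path to the output.def_<OutLevel> file
#   $2: new value for the 3rd column (for example 1800.s)
set_hf_timestep() {
    awk -v new="$2" '
        /^[ \t]*phys_out_filetimesteps[ \t]*=/ {
            eq = index($0, "=")
            key = substr($0, 1, eq)                  # text up to and including "="
            n = split(substr($0, eq + 1), cols, ",") # comma-separated values
            if (n >= 3) cols[3] = "  " new           # replace the 3rd column only
            line = key
            for (i = 1; i <= n; i++) line = line cols[i] (i < n ? "," : "")
            print line
            next
        }
        { print }                                    # all other lines unchanged
    ' "$1"
}
```

For example, set_hf_timestep PARAM/output.def_low 1800.s > output.def_low.new produces a candidate file that you can compare with the original before replacing it.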

2. FAQ : Running the model

2.1. How do I read the Script_Output file?

At the end of each job execution, a corresponding Script_Output file is created.
Important: if your simulation stops, look for the keyword "IGCM_debug_CallStack" in this file. The line just before it gives more details on the problem that occurred.

More details can be found here.
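
The keyword and the line preceding it can be extracted with grep. A minimal sketch, where show_callstack is a hypothetical helper name and the -B option (lines of context before a match) is assumed to be supported by your grep, as it is in GNU and BSD grep:

```shell
# Print each line containing IGCM_debug_CallStack together with the line
# just before it. Hypothetical helper, not part of libIGCM.
#   $1: path to a Script_Output file
show_callstack() {
    grep -B 1 "IGCM_debug_CallStack" "$1"
}
```

Call it with the name of the Script_Output file of the job that stopped.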

2.2. The LMDZ parallelism and the Bands files

See here.

2.3. How do I define the number of MPI jobs and the number of OpenMP threads?

If you run your model in MPI mode only (without OpenMP) the number of MPI processes is defined in config.card by the JobNumProcTot parameter:

#-- Total Number of Processors
JobNumProcTot=32

If you run your model in hybrid mode (MPI-OpenMP), the number of MPI processes and the number of OpenMP threads are set in config.card in the section "Executable". For instance, for LMDZ : 16 MPI processes and 2 OpenMP threads.

ATM= (gcm.e, lmdz.x, 16MPI, 2OMP)

Note that the job header is different when OpenMP is used.
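
To get a feeling for the resources such a line requests, the sketch below extracts the two numbers from an Executable entry and multiplies them. It assumes that every MPI process runs the given number of OpenMP threads, so that the product is the number of cores used by the component. This is the usual convention for hybrid runs but is not stated on this page, and the function name hybrid_cores is hypothetical.

```shell
# Extract the MPI and OpenMP counts from an Executable entry of config.card
# and print their product. Hypothetical helper, not part of libIGCM.
#   $1: the entry, for example 'ATM= (gcm.e, lmdz.x, 16MPI, 2OMP)'
hybrid_cores() {
    line=$1
    mpi=$(printf '%s\n' "$line" | sed -n 's/.*[ ,(]\([0-9][0-9]*\)MPI.*/\1/p')
    omp=$(printf '%s\n' "$line" | sed -n 's/.*[ ,(]\([0-9][0-9]*\)OMP.*/\1/p')
    # Fail if either number is missing from the entry
    [ -n "$mpi" ] && [ -n "$omp" ] || return 1
    echo "$mpi MPI x $omp OMP = $((mpi * omp)) cores"
}
```

With the LMDZ example above, hybrid_cores 'ATM= (gcm.e, lmdz.x, 16MPI, 2OMP)' prints 16 MPI x 2 OMP = 32 cores.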

2.4. Why does the run.card file contain the keyword Fatal?

The keyword Fatal indicates that something went wrong in your simulation. Below is a list of the most common reasons:

  • a problem was encountered while copying the input files
  • the frequency settings in config.card are erroneous
  • run.card has not been deleted before resubmitting a simulation, or "OnQueue" has not been specified in run.card when continuing a simulation
  • a problem was encountered during the run
  • the disk quotas have been reached
  • a problem was encountered while copying the output files
  • a post processing job encountered a problem
    • pack_xxx has failed and caused the simulation to abort. In this case, look for the message STOP HERE INCLUDING THE COMPUTING JOB in the corresponding pack output file.
    • rebuild was not completed successfully

See the corresponding chapter about monitoring and debug for further information.
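
A quick way to test for the keyword from a script is sketched below. The function name run_card_is_fatal is hypothetical. The sketch only looks for the word Fatal anywhere in the file and does not try to interpret the rest of run.card.

```shell
# Succeed (exit status 0) if the given run.card contains the keyword Fatal.
# Hypothetical helper, not part of libIGCM.
#   $1: path to run.card
run_card_is_fatal() {
    grep -q "Fatal" "$1"
}
```

It can be used as: if run_card_is_fatal run.card; then echo "check Script_Output"; fi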

2.5. How do I use a different version of libIGCM?

libIGCM is constantly being updated. We recommend using the latest libIGCM tag. Here is what to do:

  • save the old libIGCM version (just in case)
  • get libIGCM (the symbolic links to the recommended version)
  • reinstall the post processing jobs
  • make sure that there has been no major change in AA_job, otherwise reinstall the main job
    cd modipsl
    mv libIGCM libIGCM_old
    svn checkout http://forge.ipsl.jussieu.fr/libigcm/svn/tags/libIGCM_v2.2 libIGCM
    diff libIGCM/AA_job libIGCM_old/AA_job
    util/ins_job
    

If you need revision X of the libIGCM trunk, change the "svn checkout" line to:

svn checkout -r X http://forge.ipsl.jussieu.fr/libigcm/svn/trunk/libIGCM libIGCM

If AA_job has been modified, you must:

  • move to the experiment directory,
  • delete or move the old jobs,
  • recreate the jobs using ins_job. MYCONFIG could be IPSLCM5_v5 or ORCHIDEE_OL, for example:
    cd ...../config/MYCONFIG/MYEXP
    mv Job_MYEXP OLDJOB                                              # save the old job
    ../../../util/ins_job
    # modify Job_MYEXP: NbPeriod, memory, ... as was done in OLDJOB
    

2.6. How do I restart a simulation to recover missing output files?

TO BE VALIDATED (2/21/2013)

This method shows how to rerun a complete simulation period in a different directory (REDO instead of DEVT/PROD).

Example: to rerun v3.historicalAnt1 and recompute a whole year (e.g. 1964), you must:

  • On the file server (CCCSTOREDIR), create the necessary RESTART file and the Bands file.
  • On the scratch disk ($SCRATCHDIR/IGCM_OUT), create the mesh_mask file
    ## Directory
    mkdir $CCCSTOREDIR/....IGCM_OUT/IPSLCM5A/REDO/historicalAnt/v3.historicalAnt1REDO
    cd $CCCSTOREDIR/....IGCM_OUT/IPSLCM5A/REDO/historicalAnt/v3.historicalAnt1REDO
    # RESTART
    mkdir -p RESTART ; cd RESTART
    ln -s ../../../PROD/historicalAnt/v3.historicalAnt1/RESTART/v3.historicalAnt1_19640831_restart.nc v3.historicalAnt1REDO_19640831_restart.nc
    # Bands
    mkdir -p ATM/Debug
    cd ATM/Debug
    ln -s ../../../../../PROD/historicalAnt/v3.historicalAnt1/ATM/Debug/v3.historicalAnt1_Bands_96x95x39_3prc.dat_3 v3.historicalAnt1REDO_Bands_96x95x39_3prc.dat_3
    
    mkdir $SCRATCHDIR/....IGCM_OUT/IPSLCM5A/REDO/historicalAnt/v3.historicalAnt1REDO
    cd $SCRATCHDIR/....IGCM_OUT/IPSLCM5A/REDO/historicalAnt/v3.historicalAnt1REDO
    # mesh_mask
    mkdir -p OCE/Output
    cd OCE/Output
    ln -s ../../../../../PROD/historicalAnt/v3.historicalAnt1/OCE/Output/v3.historicalAnt1_mesh_mask.nc v3.historicalAnt1REDO_mesh_mask.nc
    cd ../..
    
  • On the computing machine:
    • create a new directory
      cp -pr  v3.historicalAnt1 v3.historicalAnt1REDO
      
    • in this new directory, change the run.card file and set the following parameters to:
      OldPrefix= v3.historicalAnt1_19631231
      PeriodDateBegin= 1964-01-01
      PeriodDateEnd= 1964-01-31
      CumulPeriod= xxx # Specify the proper "cad" value, i.e. the same month in the run.card cookie (ARGENT)
      PeriodState= OnQueue
      
    • in the config.card file, set one pack period (1 year), disable all post processing, run rebuild month by month and specify PackFrequency:
      JobName=v3.historicalAnt1
      ...
      SpaceName=REDO
      ...
      DateEnd= 1964-12-31
      ...
      RebuildFrequency=1M
      PackFrequency=1Y
      ...
      TimeSeriesFrequency=NONE
      ...
      SeasonalFrequency=NONE
      
    • restart the simulation :
      vi run.card # check one more time
      vi Job_v3.historicalAnt1 # check the time parameters and names of the output scripts
      qsub Job_v3.historicalAnt1 
      
  • once the job is finished, check that the solver.stat files are identical. The solver.stat files are stored in the Debug directories:
    sdiff  OCE/Debug/v3.historicalAnt1REDO_19640901_19640930_solver.stat /dmnfs11/cont003/p86maf/IGCM_OUT/IPSLCM5A/PROD/historicalAnt/v3.historicalAnt1/OCE/Debug/v3.historicalAnt1_19640901_19640930_solver.stat
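
If you only need a yes/no answer rather than a side-by-side listing, cmp can be used instead of sdiff. A minimal sketch, with the hypothetical helper name same_solver_stat:

```shell
# Compare two solver.stat files byte by byte and print a short verdict.
# Hypothetical helper, not part of libIGCM.
#   $1, $2: paths to the two solver.stat files
same_solver_stat() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
        return 1
    fi
}
```

Pass it the REDO file and the original file used in the sdiff command above.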
    



3. FAQ : Special configurations

3.1. How do I create the initial conditions for LMDZOR?

For a few configurations such as LMDZOR and LMDZREPR, you must create initial and boundary conditions in advance. This is not necessary for coupled configurations such as IPSLCM5_v5.

For more information, see this chapter.

3.2. How do I deactivate STOMATE in IPSLCM5 or in LMDZOR?

The IPSLCM5 model has not been evaluated for these cases.

Here is how to do it.

3.3. How do I perform a nudged run?

This paragraph describes how to perform a nudged run for configurations that include LMDZ. To do so, you have to:

  • activate option ok_guide in the lmdz.card file (this option enables you to activate the corresponding flag_ in PARAM/guide.def)
  • check that the wind fields specified are contained in BoundaryFiles.

For example:

[BoundaryFiles]
List= ....\
      (/dmnfs/p24data/ECMWF96x72/AN${year}/u_ecmwf_${year}${month}.nc, u.nc)\
      (/dmnfs/p24data/ECMWF96x72/AN${year}/v_ecmwf_${year}${month}.nc, v.nc)\
  • choose the proper dates in config.card (pay attention to leap years)
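
For the leap-year check, the Gregorian rule can be sketched as follows. The function name is_leap_year is hypothetical. Whether leap days actually matter for your run depends on the calendar of your simulation and of the wind files, which this sketch does not know about.

```shell
# Succeed (exit status 0) if the given year is a leap year in the
# Gregorian calendar. Hypothetical helper, not part of libIGCM.
#   $1: year, without leading zeros
is_leap_year() {
    y=$1
    if [ $((y % 400)) -eq 0 ]; then return 0; fi   # 2000 is a leap year
    if [ $((y % 100)) -eq 0 ]; then return 1; fi   # 1900 is not
    [ $((y % 4)) -eq 0 ]                           # 1964 is
}
```

For example: if is_leap_year 1964; then echo "29 days in February"; fi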

4. FAQ : Post processing

4.1. Where are post processing jobs run?

libIGCM lets you run post processing jobs on the same machine as the main job. Post processing jobs could also be started on other machines dedicated to post processing, but this is no longer done.

Currently used machines:

Center   Computing machine   Post processing
CCRT     Titane              Titane, queue mono
TGCC     Curie               Curie, large node, -q xlarge
IDRIS    Ada                 Ada (ongoing)

4.2. How do I check that the post processing jobs were successful?

See here.

4.3. How do I read/retrieve/use files on dods?

  • At IDRIS, visit http://dodsp.idris.fr and select your login, your configuration, your simulation and the ATM component (then the Output or Analyse subdirectory) as well as ATLAS or MONITORING.
  • At CCRT, visit the following website:
  • Once you have found a NetCDF file (suffix .nc), you can download it by clicking on it, or you can analyze it with the DODS functions. To do so, add cgi-bin/nph-dods to the address right after the server name. For example:
     ciclad : ferret ...
     > use "http://dods.extra.cea.fr/cgi-bin/nph-dods/data/mon_login/..."
     > use "http://dodsp.idris.fr/cgi-bin/nph-dods/mon_login/..."
    

More information can be found here: http://dods.ipsl.jussieu.fr
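
The address rewriting shown in the two examples above can be sketched with sed. The function name dods_url is hypothetical, and the sketch assumes an http:// address in which cgi-bin/nph-dods goes immediately after the server name, as in those examples.

```shell
# Insert cgi-bin/nph-dods right after the server name of an http address.
# Hypothetical helper, written after the two example URLs above.
#   $1: address of a file as seen in the browser
dods_url() {
    printf '%s\n' "$1" | sed 's#^\(http://[^/]*\)/#\1/cgi-bin/nph-dods/#'
}
```

The resulting address is the one to give to the ferret use command.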

4.4. How do I add a variable to the Time Series?

See this section.

4.5. How do I superimpose monitoring plots?

Memo :

To select simulations from two centers or for two different logins, you must go back to step 1 and click on append directories to add new simulations.

4.6. How do I add a plot to the monitoring?

The answer to this question is here.

4.7. How do I calculate seasonal means over 100 years?

This feature is available with libIGCM_v1_10 since 12/13/2010.

In order to compute a seasonal mean over 100 years, check that all decades are on the file server (SE_checker). Then run the job create_multi_se on the post processing machine.

Note that an atlas for these 100 years will also be created. See the example for the 100-year atlas for piControl2 here : SE 2000 2099

  1. If not done yet, create a specific post processing directory. See the chapter on how to run or restart post processing jobs for details.
  2. Copy create_se.job, SE_checker.job and create_multi_se.job
  3. Check/change the following variables in create_se.job:
    libIGCM=${libIGCM:=.../POST_CMIP5/libIGCM_v1_10/modipsl/libIGCM}
    
  4. Check that all decades exist.
  5. Check/change the variables in SE_checker.job:
    libIGCM=${libIGCM:=.../POST_CMIP5/libIGCM_v1_10/modipsl/libIGCM} 
    SpaceName=${SpaceName:=PROD}
    ExperimentName=${ExperimentName:=piControl}
    JobName=${JobName:=piControlMR1}
    CARD_DIR=${CARD_DIR:=${CURRENT_DIR}}
    
  6. Run SE_Checker.job in interactive mode. It submits all the create_se.job jobs that are needed. For example:
     ./SE_Checker.job
    
    ====================================================
    Where do we run ? cesium21
    Linux cesium21 2.6.18-194.11.4.el5 #1 SMP Tue Sep 21 05:04:09 EDT 2010 x86_64
    ====================================================
    
    sys source cesium Intel X-64 lib.
    
    --Debug1--> DefineVariableFromOption : config_UserChoices
    --------------Debug3--> config_UserChoices_JobName=piControlMR1
    --------------Debug3--> config_UserChoices_CalendarType=noleap
    --------------Debug3--> config_UserChoices_DateBegin=1800-01-01
    --------------Debug3--> config_UserChoices_DateEnd=2099-12-31
    
    --Debug1--> DateBegin/End for SE : 1800_1809
    --Debug1--> ATM
    --Debug1--> SRF
    --Debug1--> SBG
    --Debug1--> OCE
    --Debug1--> ICE
    --Debug1--> MBG
    --Debug1--> CPL
    ...
    --Debug1--> DateBegin/End for SE : 2030_2039
    --Debug1--> ATM
    --Debug1--> 2 file(s) missing for ATM :
    --Debug1--> piControlMR1_SE_2030_2039_1M_histmth.nc
    --Debug1--> piControlMR1_SE_2030_2039_1M_histmthNMC.nc
    --Debug1--> SRF
    --Debug1--> 1 file(s) missing for SRF :
    --Debug1--> piControlMR1_SE_2030_2039_1M_sechiba_history.nc
    --Debug1--> SBG
    --Debug1--> 2 file(s) missing for SBG :
    --Debug1--> piControlMR1_SE_2030_2039_1M_stomate_history.nc
    --Debug1--> piControlMR1_SE_2030_2039_1M_stomate_ipcc_history.nc
    --Debug1--> OCE
    --Debug1--> 4 file(s) missing for OCE :
    --Debug1--> piControlMR1_SE_2030_2039_1M_grid_T.nc
    --Debug1--> piControlMR1_SE_2030_2039_1M_grid_U.nc
    --Debug1--> piControlMR1_SE_2030_2039_1M_grid_V.nc
    --Debug1--> piControlMR1_SE_2030_2039_1M_grid_W.nc
    --Debug1--> ICE
    --Debug1--> 1 file(s) missing for ICE :
    --Debug1--> piControlMR1_SE_2030_2039_1M_icemod.nc
    --Debug1--> MBG
    --Debug1--> 3 file(s) missing for MBG :
    --Debug1--> piControlMR1_SE_2030_2039_1M_ptrc_T.nc
    --Debug1--> piControlMR1_SE_2030_2039_1M_diad_T.nc
    --Debug1--> piControlMR1_SE_2030_2039_1M_dbio_T.nc
    --Debug1--> CPL
    --Debug1--> 2 file(s) missing for CPL :
    --Debug1--> piControlMR1_SE_2030_2039_1M_cpl_atm.nc
    --Debug1--> piControlMR1_SE_2030_2039_1M_cpl_oce.nc
    --------Debug2--> Submit create_se  for period 2030-2039
    IGCM_sys_MkdirWork : .../POST_CMIP5/piControl/piControlMR1/OutScript
    IGCM_sys_QsubPost : create_se
    Submitted Batch Session 179472
    ...
    
  7. Wait for the create_se jobs to be completed
  8. Copy create_multi_se.job
  9. Check/change the variables :
    libIGCM=${libIGCM:=.../POST_CMIP5/libIGCM_v1_10/modipsl/libIGCM}
    
  10. If needed, adjust the number of decades in config.card: default=50Y (i.e. 50 years). Add the following line to the POST section, i.e. at the end after the keyword [POST]
    MultiSeasonalFrequency=100Y
    
  11. Run the create_multi_se.job job:
    ccc_msub create_multi_se.job
  12. The years used for the calculation are those between DateEnd - MultiSeasonalFrequency and DateEnd, where DateEnd is the value set in config.card in the local directory.

The mean values are stored in the "Analyse" directories of each model component in the subdirectory SE_100Y (e.g. ATM/Analyse/SE_100Y).
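
The period covered can be worked out from the two settings. The sketch below assumes that the period ends with the year of DateEnd and contains exactly MultiSeasonalFrequency years, which is consistent with the SE 2000 2099 example for DateEnd=2099-12-31. The function name multi_se_years is hypothetical.

```shell
# Print the first and last year used for the multi-decadal seasonal mean.
# Hypothetical helper, not part of libIGCM.
#   $1: DateEnd as written in config.card (YYYY-MM-DD, no leading zeros in YYYY)
#   $2: MultiSeasonalFrequency (for example 100Y)
multi_se_years() {
    end_year=${1%%-*}    # 2099-12-31 -> 2099
    nyears=${2%Y}        # 100Y -> 100
    echo "$((end_year - nyears + 1)) $end_year"
}
```

For example, multi_se_years 2099-12-31 100Y prints 2000 2099.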

5. FAQ : Unix tricks

5.1. How to delete a group of files using the find command?

We also recommend reading the find manual.

Examples :

  • recursively delete all files under the current directory whose name contains DEMO:
    find . -name '*DEMO*' -exec rm -f {} \;
    
  • recursively delete all files under the current directory whose name contains DEMO, TEST or ENCORE (each file name is printed before it is deleted):
    find . \( -name "*DEMO*" -o -name "*TEST*"  -o -name "*ENCORE*" \) -print -exec rm -f {} \;
    
  • recursively count the files under the current directory:
    find . -type f | wc -l
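
Since rm -f cannot be undone, it can be worth listing the matching files before running one of the delete commands. A minimal sketch, where preview_matches is a hypothetical helper name:

```shell
# List the regular files under a directory whose name matches a pattern.
# Nothing is deleted. Hypothetical helper for illustration.
#   $1: directory to search
#   $2: file name pattern, quoted so that the shell does not expand it
preview_matches() {
    find "$1" -type f -name "$2" -print
}
```

For example, preview_matches . '*DEMO*' shows what the first delete command above would remove, restricted to regular files.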
    

5.2. Allowing read-access to everybody

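The chmod -R ugo+rX * command gives everybody read access to all files and subdirectories in the current directory (the capital X adds execute permission only to directories and to files that are already executable).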

6. FAQ : Miscellaneous

6.1. How do I use TimeSeries_Checker.job to create files on $STORE when the output files are on DMNFS?

  • You need libIGCM v1_12 to use DMNFS as input
  • Change TimeSeries_Checker.job to use STORE
  • Change create_ts.job to use STORE

6.1.1. Example for the rcp45 simulation


cd modipsl
mv libIGCM libIGCM.old
svn checkout http://forge.ipsl.jussieu.fr/libigcm/svn/tags/libIGCM_v1_12 libIGCM
util/ins_job

TimeSeries_Checker.job revision 658

line 169
#R_SAVE=${R_OUT}/${config_UserChoices_TagName}/${config_UserChoices_SpaceName}/${config_UserChoices_ExperimentName}/${config_Ensemble_EnsembleName}/${config_Ensemble_EnsembleDate}/${FreeName}

R_SAVE=${CCCSTOREDIR}/IGCM_OUT/${config_UserChoices_TagName}/${config_UserChoices_SpaceName}/${config_UserChoices_ExperimentName}/${FreeName}

create_ts.job revision 316

line 300
#DIRECTORY=${R_SAVE}/${comp}/Analyse/${TS_Dir}
DIRECTORY=${CCCSTOREDIR}/IGCM_OUT/IPSLCM5A/PROD/rcp45/v3.rcp45.strat/${comp}/Analyse/${TS_Dir}

line 768
#eval IGCM_sys_Put_Out ${file_out} \${R_OUT_${comp}}/Analyse/${TS_Dir}/${file_out}
IGCM_sys_Put_Out ${file_out} ${CCCSTOREDIR}/IGCM_OUT/IPSLCM5A/PROD/rcp45/v3.rcp45.strat/${comp}/Analyse/${TS_Dir}/${file_out}

line 780
#eval IGCM_sys_Put_Out ${file_out_YE} \${R_OUT_${comp}}/Analyse/TS_MO_YE/${file_out_YE}
IGCM_sys_Put_Out ${file_out_YE} ${CCCSTOREDIR}/IGCM_OUT/IPSLCM5A/PROD/rcp45/v3.rcp45.strat/${comp}/Analyse/TS_MO_YE/${file_out_YE}

6.2. How do I restart one simulation month which ran on $DMFDIR when the outputs are stored on $STORE?

Example: the past1000 simulation

Log in to titane: ssh titane.ccc.cea.fr

1455-12 to be rerun on STORE, original on dmnfs
1477-12 to be rerun on STORE, original on dmnfs
1517-11 to be rerun on STORE, original on dmnfs

cd $DMFDIR/IGCM_OUT/IPSLCM5A/TEST/PD_TEST

dmget LMCMP5/???/Restart/*14551130* LMCMP5/???/Restart/*14771130* LMCMP5/???/Restart/*15171030* LMCMP5/OCE/Output/LMCMP5_mesh_mask.nc LMCMP5/ATM/Debug/LMCMP5_Bands_96x95x39_26prc.dat_*

tar cvf $SCRATCHDIR/IGCM_OUT/IPSLCM5A/TEST/PD_TEST/RESTART.REDO.past1000.tar LMCMP5/???/Restart/*14551130* LMCMP5/???/Restart/*14771130* LMCMP5/???/Restart/*15171030* LMCMP5/OCE/Output/LMCMP5_mesh_mask.nc LMCMP5/ATM/Debug/LMCMP5_Bands_96x95x39_26prc.dat_*

cd $SCRATCHDIR/IGCM_OUT/IPSLCM5A/TEST/PD_TEST

tar xvf RESTART.REDO.past1000.tar

cd /work/cont003/p25khod/IPSLCM5A/modipsl/config/IPSLCM5A/LMCMP5_newlibIGCM.REDO

==> edit run.card

==> PackFrequency=NONE in config.card

==> qsub

#========================================================================
#D-- Post -
[Post]
#D- Do we rebuild parallel output, this flag determines
#D- frequency of rebuild submission (use NONE for DRYRUN=3)
RebuildFrequency=1Y
#D- Do we rebuild parallel output from archive (use NONE to use SCRATCHDIR as buffer)
RebuildFromArchive=NONE
# No PACK
PackFrequency=NONE
#D- If you want to produce time series, this flag determines
#D- frequency of post-processing submission (NONE if you don't want)
TimeSeriesFrequency=NONE
#D- If you want to produce seasonal average, this flag determines
#D- the period of this average (NONE if you don't want)
SeasonalFrequency=NONE
#D- Offset for seasonal average first start dates ; same unit as SeasonalFrequency
#D- Useful if you do not want to consider the first X years of the simulation
SeasonalFrequencyOffset=0