Changes between Version 45 and Version 46 of Doc/CheckDebug


Timestamp:
10/10/19 14:38:31 (5 years ago)
Author:
omamce
Comment:

--

  • Doc/CheckDebug

    v45 v46  
    3939#### TGCC #### 
    4040 
    41 You can use `ccc_mstat` on Irene. To see the available options and useful scripts, see [wiki:Doc/Env/Tgcc/Irene#Jobmanagercommands Working on Irene]. 
     41You can use `ccc_mstat` on Irene. To see the available options and useful scripts, see [wiki:Doc/Env/TgccIrene#Jobmanagercommands Working on Irene]. 
    4242 
    4343#### IDRIS #### 
     
    6868  1. Here is information about the main job; the information comes from the `config.card` and the `run.card` : 
    6969    * The first line returns the job name and the date when data was last saved to disk in the `run.card`. 
    70     * `DateBegin` - !DateEnd : start and end dates of the simulation as defined in `config.card`. 
     70    * `DateBegin` - `DateEnd` : start and end dates of the simulation as defined in `config.card`. 
    7171    * `PeriodState` : variable coming from the `run.card` giving the run's status : 
    7272      * `OnQueue`, `Waiting` : the run is queued ; 
     
    7474      * `Completed` : the run was completed successfully ; 
    7575      * `Fatal` : the run failed. 
    76     * `Current Period` : this variable from `run.card` shows which integration step (most often one month or one year) is being computed 
     76    * `Current Period` : this variable from `run.card` shows which integration step (most often one month or one year) is being computed. 
    7777    * `CumulPeriod` : variable from `run.card`. Number of the period being computed 
    7878    * `Pending Rebuilds, Nb | From | To` : number of files waiting to be rebuilt, and the dates of the oldest and the latest files. Most configurations use parallel I/O and no longer have rebuild steps. 
     
    9191  * `-u user` : starts the Checker for the simulation of another user 
    9292  * `-q` : silence mode 
    93   * `-j n` : displays n post processing jobs (10 by default) 
     93  * `-j n` : displays `n` post processing jobs (10 by default) 
    9494  * `-s` : looks for a simulation in $WORKDIR and adds it to its catalog of simulations before displaying the information 
    9595  * `-p path` : !!!absolute!!! path of the directory containing the `config.card` instead of the job_name. 
     
    102102  * For a given post processing job, the number of successfully-transferred files varies according to the date : this might mean that errors occurred.  
    103103[[NoteBox(warn, In some cases (such as for historical simulations where the COSP outputs are activated starting from 1979 ...) this behavior is normal!, 600px)]] 
    104   * A `PeriodState` to Fatal indicates that an error occurred either in the main job or in one of the post processing jobs. 
     104  * A `PeriodState` set to `Fatal` indicates that an error occurred either in the main job or in one of the post processing jobs. 
    105105  * If the number of rebuilds waiting is above... 
    106106 
    107107#### Good things to know #### 
    108108 
    109 During the first integration of a simulation using IPSLCM5, an additional rebuild file is transferred. This extra file is the NEMO "mesh_mask.nc" file. It is created and transferred only during the first step. It is then used for each "rebuild" of the NEMO output files to mask the variables.  
     109During the first integration of a simulation using IPSLCM5, an additional rebuild file is transferred. This extra file is the NEMO `mesh_mask.nc` file. It is created and transferred only during the first step. It is then used for each "rebuild" of the NEMO output files to mask the variables.  
    110110 
    111111## End of simulation ## 
    112 Once your simulation is finished you will receive an email saying that the simulation was "Completed" or that it "Failed" and two files will be created in the working directory of your experiment:  
     112Once your simulation is finished you will receive an email saying that the simulation was `Completed` or that it `Failed` and two files will be created in the working directory of your experiment:  
    113113  * [wiki:DocFsimu#run.cardattheendofasimulation run.card] 
    114114  * `Script_Output_JobName` 
    115115 
    116 A `Debug/` directory is created if the simulation failed in a way that is correctly diagnosed by libIGCM. This directory contains diagnostic text files for each model component. It won't be created if the job reaches the time limit and is stopped by the batch scheduler. 
    117  
    118 If the crash is not properly handeld by ligIGCM, you will find a lot of files in $RUN_DIR. In `Script_Output_JobName`, find the line starting with `IGCM_sys_Cd : ` and get the location of the RUN_DIR. 
     116A `Debug` directory is created if the simulation failed in a way that is correctly diagnosed by libIGCM. This directory contains diagnostic text files for each model component. It won't be created if the job reaches the time limit and is stopped by the batch scheduler. 
     117 
     118If the crash is not properly handled by libIGCM, you will find a lot of files in `$RUN_DIR`. In `Script_Output_JobName`, find the line starting with `IGCM_sys_Cd : ` and get the location of the RUN_DIR. 
    119119 
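The lookup described above can be sketched in Python. This is a minimal illustration, not part of libIGCM: it assumes the directory path directly follows the `IGCM_sys_Cd : ` prefix on the same line, and the helper name `find_cd_targets` and the sample paths are hypothetical.

```python
PREFIX = "IGCM_sys_Cd : "


def find_cd_targets(script_output_text):
    """Return the text that follows each 'IGCM_sys_Cd : ' prefix, in file order.

    Assumption: the directory path is the remainder of the line.
    """
    targets = []
    for line in script_output_text.splitlines():
        if line.startswith(PREFIX):
            targets.append(line[len(PREFIX):].strip())
    return targets


# Hypothetical excerpt of a Script_Output_JobName file.
sample = (
    "IGCM_sys_Mkdir : /scratch/hypothetical/RUN_DIR/12345\n"
    "IGCM_sys_Cd : /scratch/hypothetical/RUN_DIR/12345\n"
    "some other log line\n"
)

for path in find_cd_targets(sample):
    print(path)
```

On a real run you would pass the content of your own `Script_Output_JobName` file instead of `sample`, then check which of the returned directories is the RUN_DIR.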
    120120If the simulation was successfully completed output files will be stored in the following directory:  
     
    180180 
    181181 * if the file ends before the second part, possible reasons can be:  
    182    * you didn't delete the existing run.card file in case you wanted to overwrite the simulation;  
    183    * you didn't specify !OnQueue in the run.card file in case you wanted to continue the simulation;  
     182   * you didn't delete the existing `run.card` file in case you wanted to overwrite the simulation;  
     183   * you didn't specify `OnQueue` in the `run.card` file in case you wanted to continue the simulation;  
    184184   * one of the input files was missing (e.g. it doesn't exist, the machine has a problem,...);  
    185    * the frequencies (!RebuildFrequency, !PackFrequency ...) do not match !PeriodLength. 
     185   * the frequencies (`RebuildFrequency`, `PackFrequency` ...) do not match `PeriodLength`. 
    186186 
    187187 * if the file ends in the middle of the second part, it's most likely because you didn't request enough memory or CPU time. 
     
    214214}}} 
    215215 
    216 If there is a message indicating that the "restartphy.nc" file doesn't exist it means that the model simulation was completed but before the end date of your simulation. If this happens and if your model creates an output log other than the simulation output log, you must refer to this log. 
     216If there is a message indicating that the `restartphy.nc` file doesn't exist it means that the model simulation was completed but before the end date of your simulation. If this happens and if your model creates an output log other than the simulation output log, you must refer to this log. 
    217217For example, the output file of the ocean model is stored on the file server under this name: 
    218218{{{ 
     
    242242### The Debug directory ### 
    243243 
    244 If the simulation failed due to abnormal exit from the executable, a Debug/ directory is created in the working directory. It contains output text files of all model components for your configuration. You should read them to look for errors. For example : 
     244If the simulation failed due to abnormal exit from the executable, a `Debug` directory is created in the working directory. It contains output text files of all model components for your configuration. You should read them to look for errors. For example : 
    245245 
    246246 * `xxx_out_gcm.e_error` --> lmdz  text output 
     
    441441## Restarting the seasonal mean calculations ## 
    442442 
    443 [[NoteBox(tip, Transfer `config.card`\, `COMP`\, `POST`\, and `run.card` (post process part of the simulation only) in the `POST_REDO/` directory if you have not done so yet., 600px)]] 
     443[[NoteBox(tip, Transfer `config.card`\, `COMP`\, `POST`\, and `run.card` (post process part of the simulation only) in the `POST_REDO` directory if you have not done so yet., 600px)]] 
    444444 
    445445There are two methods:  
     
    500500# Optimization with Lucia # 
    501501 
    502 IPSLCM coupled model runs three executables (atmospehre, ocean and IO server) that use three separate sets of computing cores. The number of cores attributed to each one should be choose such as the execution times of each executable are as close as possible, to reduce the waiting time. 
     502The IPSLCM coupled model runs three executables (atmosphere, ocean and IO server) that use three separate sets of computing cores. The number of cores attributed to each one should be chosen so that the execution times of the executables are as close as possible, to reduce the waiting time. 
    503503 
    504504LUCIA is a tool implemented in OASIS that measures the execution and waiting times of each executable, and helps to tune the number of execution cores for each model. 
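To illustrate the balancing idea, here is a minimal Python sketch. It does not read LUCIA output: the timings are hypothetical values typed in by hand, and the function names are made up for the example. It only shows how execution and waiting times can be compared to spot the executable that spends the largest share of its time waiting.

```python
def waiting_fraction(compute_s, waiting_s):
    """Share of the total time an executable spends waiting (0.0 to 1.0)."""
    total = compute_s + waiting_s
    return waiting_s / total if total > 0 else 0.0


def most_idle(timings):
    """Return the executable with the largest waiting fraction.

    timings maps a name to a (compute_seconds, waiting_seconds) pair.
    """
    return max(timings, key=lambda name: waiting_fraction(*timings[name]))


# Hypothetical measurements, in seconds, for one simulated period.
timings = {
    "atmosphere": (900.0, 100.0),
    "ocean": (600.0, 400.0),
}

for name, (compute_s, waiting_s) in timings.items():
    print(name, round(waiting_fraction(compute_s, waiting_s), 2))
print("largest waiting share:", most_idle(timings))
```

An executable with a large waiting share finishes its step early and then waits for the others, so it is a candidate for giving up cores to the slower one.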
     
    528528@@ -117,6 +117,7 @@ 
    529529     #   To be changed 
    530      #   On Curie 
    531      #   /ccc/scratch/cont003/dsm/p86caub/LUCIA/lucia 
    532 +     /ccc/cont003/home/igcmg/igcmg/Tools/irene/lucia/lucia 
     530     #   On Irene 
     531+     ~igcmg/Tools/irene/lucia/lucia 
    533532     #   On Ada 
    534533     #   /linkhome/rech/psl/rpsl035/LUCIA/lucia