
Version 6 (modified by mafoipsl, 9 years ago)

--

Working on Ada


1. IDRIS users' manual

2. Commands to manage jobs on ada

  • A job's time limit is expressed in real (elapsed) time, but consumption is counted per processor: for example, 1 hour on 32 processors accounts for 32 hours. Be careful not to request too much time for a job running on 1 processor.
  • llsubmit --> submit a job
  • llcancel --> cancel a job
  • llq -u login --> lists all queued or running jobs belonging to the user login
  • Trick: parameterize the llq display to see the job names
    llq -u $(whoami) -f %jn %id %st %c %dq %h -W
    
  • Post-mortem: idrjar, or idrjar -l -j #jobid#, to obtain detailed information on finished jobs: memory, real time, efficiency, ...
  • Example of idrjar output:
    ada > idrjar
    |----------------------------------------------|
    |--- IDRIS/CNRS. Version du 18 février 2013 ---|
    |----------------------------------------------|
    
    Sorties concernant l'identifiant rpslxxx pour la période du
            ==> 01 juin 2013 au 19 juin 2013
    
    
     Owner   Job Name        JobId      Queue tEse  tCpu   #T   (%)   S
    ------- ----------- --------------- ----- ---- ------ --- ------- -
    rpslxxx ADA337      ada338.290170.0 c32t2  133   1232  32   28.95 C
    rpslxxx ADA337      ada338.290333.0 c32t2 5425 165141  32   95.13 C
    rpslxxx PACKDEBUG   ada338.290610.0 t2      11      2   1   18.18 C
    rpslxxx ADA337      ada338.290438.0 c32t2 5471 166878  32   95.32 C
    rpslxxx PACKRESTART ada338.290611.0 t2     182     25   1   13.74 C
    rpslxxx REBUILDWRK  ada338.290612.0 t2    1577    503   1   31.90 C
    rpslxxx PACKOUTPUT  ada338.290730.0 t2     114     43   1   37.72 C
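
The (%) column in the sample above is consistent with an efficiency computed as 100 x tCpu / (tEse x #T). This formula is inferred from the figures shown, not taken from the idrjar documentation, and the helper name efficiency is hypothetical. A minimal sketch to recompute it:

```shell
#!/bin/sh
# Hypothetical helper (not an IDRIS tool).
# Assumption: efficiency = 100 * tCpu / (tEse * #T), inferred from the sample.
# Usage: efficiency TESE TCPU NTASKS
efficiency() {
    LC_ALL=C awk -v e="$1" -v c="$2" -v t="$3" \
        'BEGIN { printf "%.2f\n", 100 * c / (e * t) }'
}

efficiency 133 1232 32       # first ADA337 line, prints 28.95
efficiency 5425 165141 32    # second ADA337 line, prints 95.13
efficiency 11 2 1            # PACKDEBUG line, prints 18.18
```

A low value on a 32-task job, as on the first line, suggests that the processors were mostly idle during the run.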
    

3. Example of a job to start an executable in MPI

Here is an example of a simple job that starts the executable orchidee_ol (the line starting gcm.e is commented out). The input files and the executable must be in the working directory before the executable is started.

#!/bin/ksh
# ######################
# ##   ADA IDRIS   ##
# ######################
# Job name
# @ job_name = test
# Job type
# @ job_type = parallel
# Standard output file
# @ output = Script_Output_test
# Error output file (the same)
# @ error = Script_Output_test
# Number of requested MPI processes
# @ total_tasks = 8
# Max. elapsed (wall-clock) time for the job, hh:mm:ss
# @ wall_clock_limit = 1:00:00
# Number of OpenMP/pthreads threads per MPI process (disabled here)
### @ parallel_threads = 4
# End of header
# @ queue

poe ./orchidee_ol
#poe ./gcm.e
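
Since 1 hour on 32 processors accounts for 32 hours, it can be useful to estimate the maximum a job may be charged before submitting it. The sketch below reads total_tasks and wall_clock_limit from a job header like the one above and multiplies them. The function name max_charged_hours is hypothetical, and the estimate ignores parallel_threads, which is an assumption about how hours are counted.

```shell
#!/bin/sh
# Hypothetical helper (not an IDRIS tool): upper bound on charged hours,
# computed as total_tasks * wall_clock_limit. Ignores parallel_threads.
# Usage: max_charged_hours JOBFILE
max_charged_hours() {
    LC_ALL=C awk '
        /^# *@ *total_tasks *=/ { tasks = $NF }
        /^# *@ *wall_clock_limit *=/ {
            # Convert hh:mm:ss (or mm:ss, or ss) to seconds
            n = split($NF, t, ":"); secs = 0
            for (i = 1; i <= n; i++) secs = secs * 60 + t[i]
        }
        END { printf "%.2f\n", tasks * secs / 3600 }
    ' "$1"
}

# Demonstration on a two-line header copied from the job above
job=$(mktemp)
cat > "$job" <<'EOF'
# @ total_tasks = 8
# @ wall_clock_limit = 1:00:00
EOF
max_charged_hours "$job"    # prints 8.00
rm -f "$job"
```

Lines commented out with ###, such as the parallel_threads line, are not matched by the patterns and are therefore skipped.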

4. Information on Ergon files from Ada

The mfls command on Ada provides information on the Ergon files.

5. Information on Ergon files from Adapp

The mfls command on Adapp provides information on the Ergon files.

Ergon files are visible from Adapp. Use $ARCHIVE to reach Ergon files.

6. Specificities libIGCM on Ada

At IDRIS, on Ada, output files are 'packed' by libIGCM_v2, i.e. they are grouped by period (in general 1 year) using tar, or ncrcat for NetCDF output files.

This has been the default setup at TGCC for a few months. At IDRIS it is a new feature, available since February 2013.
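
As a rough illustration of what grouping by period means, the sketch below packs the monthly text files of one year into a single tar archive. It is not libIGCM's own code: the function name pack_year and the file naming PREFIX_YYYYMM.txt are assumptions made for the example.

```shell
#!/bin/sh
# Illustration only, not libIGCM code.
# Assumption: monthly files are named PREFIX_YYYYMM.txt inside DIR.
# Usage: pack_year DIR PREFIX YEAR  -> creates DIR/PREFIX_YEAR.tar
pack_year() {
    dir=$1; prefix=$2; year=$3
    # The subshell keeps the caller's working directory unchanged
    ( cd "$dir" && tar -cf "${prefix}_${year}.tar" "${prefix}_${year}"[0-1][0-9].txt )
}

# For NetCDF files, the equivalent step would use the NCO record
# concatenator, for example:
#   ncrcat file_201301.nc file_201302.nc file_2013.nc

# Demonstration with two empty monthly files
demo=$(mktemp -d)
: > "$demo/run_201301.txt"
: > "$demo/run_201302.txt"
pack_year "$demo" run 2013
tar -tf "$demo/run_2013.tar"
rm -rf "$demo"
```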

The diagram below shows the different options offered by libIGCM. The 3rd option is currently activated by default at IDRIS. This option implies that files must be temporarily stored on the $WORKDIR space, which means that a large amount of storage is needed (at least 20 TB).

The diagram below details the added jobs pack_debug, pack_restart and pack_output as well as the directories those jobs use. Note that the files are temporarily stored in the $WORKDIR/IGCM_OUT directories before being grouped and sent to Ergon in the IGCM_OUT directories.

You will obtain annual output files with 12 monthly values in the Output/MO directory if you put PackFrequency=1Y in config.card. This is the default grouping period of most configurations but you can of course change it.

What you must remember:

  • The tool RunChecker.job is meant to help you monitor your simulations. It offers a synthetic view of the status of the different post-processing jobs.
  • The tool clean_year.job is meant to help you clean up a simulation back to the last successfully computed pack period.
  • If you detect anomalies and must rerun part of the simulation, you will have to rerun complete pack periods (e.g. filling a gap by running 1 month of simulation is out of the question).
  • The restart files are stored and grouped on Ergon in the directory IGCM_OUT/.../RESTART
  • The different output text-files are stored and grouped on Ergon in the directory IGCM_OUT/.../DEBUG
  • The output listings of the pack jobs stay on Ada in the directory $WORKDIR/IGCM_OUT/.../Out
  • If you set the SpaceName=TEST parameter in config.card, the pack jobs will not be started and your simulation will be stored as before in the $WORKDIR/IGCM_OUT directory. This can be very useful for short tests.
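
Before launching a simulation, the PackFrequency and SpaceName settings mentioned above can be checked with a small script. This is a sketch, not a libIGCM tool: the function names card_value and will_pack are hypothetical, and the sample file assumes that config.card holds plain KEY=value lines under [UserChoices] and [Post] sections.

```shell
#!/bin/sh
# Hypothetical helpers (not part of libIGCM).
# card_value FILE KEY -> value of the first "KEY=value" line in FILE
card_value() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | head -n 1
}

# will_pack FILE -> exit status 0 unless SpaceName is TEST
will_pack() {
    [ "$(card_value "$1" SpaceName)" != "TEST" ]
}

# Demonstration on a sample file (the section layout is an assumption)
card=$(mktemp)
cat > "$card" <<'EOF'
[UserChoices]
SpaceName=TEST
[Post]
PackFrequency=1Y
EOF
echo "PackFrequency: $(card_value "$card" PackFrequency)"
if will_pack "$card"; then
    echo "pack jobs will be started"
else
    echo "SpaceName=TEST: pack jobs will not be started"
fi
rm -f "$card"
```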

To learn more about this topic, you can read the documentation on Simulation and post-processing and on Monitor, debug and relaunching.

Finally, in case of panic, visit us or send your questions to the list platform-users.
