Since January 2013, you must specify in the job header the project whose computing time allocation will be charged:
#MSUB -A genxxx
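To check which projects you belong to and how many computing hours remain on each allocation, the ccc_myproject command can be used on the TGCC front-end (assuming it is available in your environment):

ccc_myproject    # lists your projects (genxxx) and the consumed/allocated hours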
The test QoS (Quality of Service) is a test queue. You can have a maximum of 2 jobs in the test queue; each of them is limited to 30 minutes and 35 nodes (= 560 tasks). In the job header you must add:
#MSUB -Q test
and change the time limit accordingly:
#MSUB -T 1800
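Putting the pieces together, a minimal header for a test-queue job could look like the following sketch (MyTestJob, the 32 tasks and gen**** are placeholders to adapt):

#MSUB -r MyTestJob    # job name (placeholder)
#MSUB -n 32           # number of MPI tasks; must fit within the 35-node limit
#MSUB -T 1800         # 30 minutes, the maximum for the test QoS
#MSUB -Q test         # test QoS
#MSUB -q standard     # thin nodes
#MSUB -A gen****      # project to charge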
To check the QoS parameters, use:
ccc_mqinfo
Name     Partition  Priority  MaxCPUs  SumCPUs  MaxNodes  MaxRun  MaxSub  MaxTime
-------  ---------  --------  -------  -------  --------  ------  ------  ----------
long     *                18     2048     4096                        32  3-00:00:00
normal   *                20                                          300  1-00:00:00
test     standard         40      560      560        35                2  00:30:00
/usr/bin/ccc_mpinfo
                     --------------CPUS------------   -------------NODES------------
PARTITION  STATUS     TOTAL   DOWN   USED   FREE       TOTAL   DOWN   USED   FREE     MpC  CpN  SpN  CpS  TpC
---------  ------    ------  -----  -----  -----      ------  -----  -----  -----    ----  ---  ---  ---  ---
standard   up          80352      0  76161   4191        5022      0   4724    298   4000   16    2    8    1
hybrid     up            264      0      0    264          33      0      0     33   2900    8    2    4    1
ccc_mstat -H 375309
  JobID    JobName   Partitio  ReqCPU  Account             Start                Timelimit   Elapsed     State       ExitCode
-------  ----------  --------  ------  ------------------  -------------------  ----------  ----------  ----------  --------
375309   v3.histor+  standard       0  gen0826@standard    2012-05-11T16:27:53  1-00:00:00  01:49:03    RUNNING     0:0
375309+  p86maf_ru+                32  gen0826@standard    2012-05-11T16:28:16              00:14:19    COMPLETED   0:0
375309+  p86maf_ru+                32  gen0826@standard    2012-05-11T16:42:47              00:12:54    COMPLETED   0:0
375309+  p86maf_ru+                32  gen0826@standard    2012-05-11T16:55:59              00:13:30    COMPLETED   0:0
375309+  p86maf_ru+                32  gen0826@standard    2012-05-11T17:09:31              00:13:22    COMPLETED   0:0
375309+  p86maf_ru+                32  gen0826@standard    2012-05-11T17:24:06              00:13:36    COMPLETED   0:0
375309+  p86maf_ru+                32  gen0826@standard    2012-05-11T17:37:54              00:13:31    COMPLETED   0:0
375309+  p86maf_ru+                32  gen0826@standard    2012-05-11T17:51:28              00:14:19    COMPLETED   0:0
375309+  p86maf_ru+                32  gen0826@standard    2012-05-11T18:05:57              00:10:59    RUNNING     0:0
> ccc_macct 698214

Jobid     : 698214
Jobname   : v5.historicalCMR4.452
User      : p86maf
Account   : gen2211@s+
Limits    : time = 1-00:00:00 , memory/task = Unknown
Date      : submit=06/09/2012 17:51:56, start=06/09/2012 17:51:57, end=07/09/2012 02:20:28
Execution : partition = standard , QoS = normal
Resources : ncpus = 53 , nnodes = 4
            Nodes=curie[2166,5964,6002,6176]

Memory /step
------
                       Resident (Mo)                       Virtual (Go)
JobID          Max   (Node:Task)      AveTask     Max   (Node:Task)      AveTask
------------  ------------------------ -------   -------------------------- -------
698214            0(          :  0)      0        0.00(          :  0)     0.00
698214.batch     25(curie2166 :  0)      0        0.00(curie2166 :  0)     0.00
698214.0        952(curie2166 :  0)      0        3.00(curie2166 :  1)     0.00
...
698214.23       952(curie2166 :  0)      0        3.00(curie2166 :  2)     0.00

Accounting / step
------------------
JobID         JobName       Ncpus  Nnodes  Ntasks  Elapsed   State      ExitCode
------------  ------------  -----  ------  ------  --------  ---------  --------
698214        v5.historic+     53       4          08:28:31  COMPLETED  0:0
698214.batch  batch             1       1       1  08:28:31  COMPLETED
698214.0      p86maf_run_+     53       4      53  00:20:53  COMPLETED
698214.1      p86maf_run_+     53       4      53  00:20:20  COMPLETED
...
698214.23     p86maf_run_+     53       4      53  00:21:06  COMPLETED
> ccc_macct 680580

Jobid     : 680580
Jobname   : v5.historicalCMR4
User      : p86maf
Account   : gen2211@s+
Limits    : time = 1-00:00:00 , memory/task = Unknown
Date      : submit=30/08/2012 17:10:06, start=01/09/2012 04:11:30, end=01/09/2012 04:42:48
Execution : partition = standard , QoS = normal
Resources : ncpus = 53 , nnodes = 5
            Nodes=curie[2097,2107,4970,5413,5855]

Memory /step
------
                       Resident (Mo)                       Virtual (Go)
JobID          Max   (Node:Task)      AveTask     Max   (Node:Task)      AveTask
------------  ------------------------ -------   -------------------------- -------
680580            0(          :  0)      0        0.00(          :  0)     0.00
680580.batch     28(curie2097 :  0)      0        0.00(curie2097 :  0)     0.00
680580.0        952(curie2097 :  0)      0        3.00(curie2097 :  1)     0.00
680580.1        316(curie2097 :  8)      0        2.00(curie2097 :  8)     0.00

Accounting / step
------------------
JobID         JobName       Ncpus  Nnodes  Ntasks  Elapsed   State        ExitCode
------------  ------------  -----  ------  ------  --------  -----------  --------
680580        v5.historic+     53       5          00:31:18  COMPLETED    0:9
680580.batch  batch             1       1       1  00:31:18  COMPLETED
680580.0      p86maf_run_+     53       5      53  00:19:48  COMPLETED
680580.1      p86maf_run_+     53       5      53  00:10:06  CANCELLED b+
Since April 2016, only thin nodes are available at TGCC. The job header must include #MSUB -q standard to use the thin nodes.
Using the SSD (the node-local /tmp) can speed up the rebuild job. It is very useful for medium and high resolution configurations such as IPSLCM5A-MR. You only have to change the header and RUN_DIR_PATH in rebuild.job. Note that the job will run faster, but its cost will be multiplied by a factor of 16 because a whole standard node (i.e. 16 CPUs) is dedicated to it. Beware of the size of /tmp (64 GB per node): for a configuration with very high resolution and very high output frequency, the /tmp of a standard node may be too small; in that case see below.
#MSUB -q standard    # thin nodes
#MSUB -x             # exclusive node
RUN_DIR_PATH=/tmp/REBUILD_DIR_MR_$$
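For illustration, the relevant part of a modified rebuild job header could look like this sketch (the job name, task count and time limit are placeholders; only the last three lines come from the snippet above):

#MSUB -r rebuild_MR                   # job name (placeholder)
#MSUB -n 1                            # rebuild uses few tasks, but the whole node is billed
#MSUB -T 14400                        # time limit in seconds (placeholder)
#MSUB -A gen****                      # project to charge
#MSUB -q standard                     # thin nodes
#MSUB -x                              # exclusive node
RUN_DIR_PATH=/tmp/REBUILD_DIR_MR_$$   # work in the node-local SSD /tmp (64 GB)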
Since October 2015 and libIGCM_v2.7, ins_job (libIGCM/ins_job) fills in the job header for you. Nevertheless, you can check it against the job header examples provided here.
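Once ins_job has completed the header, a quick way to review it is to list the MSUB directives of the generated job script (assuming it is named Job_MyJobName, following the libIGCM convention):

grep '^#MSUB' Job_MyJobName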
To launch a job with XXX MPI tasks:
#MSUB -r MyJob
#MSUB -o Script_Output_MyJob.000001    # standard output
#MSUB -e Script_Output_MyJob.000001    # error output
#MSUB -eo
#MSUB -n XXX                           # number of MPI tasks
#MSUB -T 86400                         # Wall clock limit (seconds)
#MSUB -q standard                      # thin nodes
#MSUB -A gen****

BATCH_NUM_PROC_TOT=$BRIDGE_MSUB_NPROC
Hybrid (MPI/OpenMP) versions are only available with _v6 configurations.
To launch a job with XXX MPI tasks and YYY OpenMP threads per task:
ATM= (gcm.e, lmdz.x, XXXMPI, YYYOMP)
#MSUB -r MyJob
#MSUB -o Script_Output_MyJob.000001    # standard output
#MSUB -e Script_Output_MyJob.000001    # error output
#MSUB -eo
#MSUB -n XXX                           # number of MPI tasks
#MSUB -c YYY                           # number of OMP threads per MPI task
#MSUB -T 86400                         # Wall clock limit (seconds)
#MSUB -q standard                      # thin nodes
#MSUB -A gen****

BATCH_NUM_PROC_TOT=XXX * YYY           # number of MPI tasks * OMP threads per task
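For example, with illustrative values of XXX=32 MPI tasks and YYY=4 OpenMP threads per task, the job uses 32 * 4 = 128 cores in total:

#MSUB -n 32                # XXX = 32 MPI tasks
#MSUB -c 4                 # YYY = 4 OpenMP threads per task
BATCH_NUM_PROC_TOT=128     # 32 * 4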
To launch a coupled-model job with XXX MPI tasks:
#MSUB -r MyCoupledJob
#MSUB -o Script_Output_MyCoupledJob.000001    # standard output
#MSUB -e Script_Output_MyCoupledJob.000001    # error output
#MSUB -eo
#MSUB -n XXX                                  # number of MPI tasks
#MSUB -T 86400                                # Wall clock limit (seconds)
#MSUB -q standard                             # thin nodes
#MSUB -A gen****

BATCH_NUM_PROC_TOT=$BRIDGE_MSUB_NPROC
Hybrid (MPI/OpenMP) versions are only available with _v6 configurations.
To launch a job with XXX (here 27) MPI tasks and YYY (here 4) OpenMP threads for LMDZ, ZZZ (here 19) MPI tasks for NEMO and SSS (here 1) XIOS server, i.e. XXX * YYY + ZZZ + SSS = 27 * 4 + 19 + 1 = 128 cores:
ATM= (gcm.e, lmdz.x, 27MPI, 4OMP)
SRF= ("", "")
SBG= ("", "")
OCE= (opa, opa.xx, 19MPI)
ICE= ("", "")
MBG= ("", "")
CPL= ("", "")
IOS= (xios_server.exe, xios.x, 1MPI)
#MSUB -r MyCoupledJob
#MSUB -o Script_Output_MyCoupledJob.000001    # standard output
#MSUB -e Script_Output_MyCoupledJob.000001    # error output
#MSUB -eo
#MSUB -n 128                                  # Number of cores (XXX * YYY + ZZZ + SSS)
#MSUB -x                                      # exclusive node
#MSUB -E '--cpu_bind=none'
#MSUB -T 86400                                # Wall clock limit (seconds)
#MSUB -q standard                             # thin nodes
#MSUB -A gen***
...
module load ddt
unset SLURM_SPANK_AUKS
echo "-np 1 ${DDTPATH}/bin/ddt-client ${TMPDIR_DEBUG}/oasis"   > run_file
echo "-np 26 ${DDTPATH}/bin/ddt-client ${TMPDIR_DEBUG}/lmdz.x" >> run_file
echo "-np 5 ${DDTPATH}/bin/ddt-client ${TMPDIR_DEBUG}/opa.xx"  >> run_file
ddt
ddt -start -n 51 -mpiargs "-rankfile rankfile.txt --tag-output \
    -np 20 -x KMP_STACKSIZE=3g -x KMP_LIBRARY=turnaround -x MKL_SERIAL=YES -x OMP_NUM_THREADS=4 ./lmdz.x : \
    -np 31 -x OMP_NUM_THREADS=1 ./opa.xx "
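The rankfile.txt used above maps each MPI rank to a node and to CPU slots; its exact content depends on the nodes allocated to the job. Purely as an illustrative sketch, assuming the Open MPI rankfile syntax and hypothetical node names:

rank 0=curie1000 slot=0-3     # lmdz.x rank 0: 4 cores for its OpenMP threads (hypothetical node)
rank 1=curie1000 slot=4-7     # lmdz.x rank 1
...
rank 20=curie1005 slot=0      # opa.xx ranks: 1 core each
rank 21=curie1005 slot=1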
slurmd[curie1006]: error: *** STEP 639264.5 KILLED AT 2012-08-01T17:00:29 WITH SIGNAL 15 ***
This error message means that the time limit was exceeded. To solve the problem, run clean_PeriodLength.job, increase the time limit (or decrease PeriodNb) and restart.
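A typical recovery sequence, sketched here with a hypothetical job name (run it from the experiment submission directory):

./clean_PeriodLength.job    # clean up the interrupted period
vi Job_MyExp                # increase the #MSUB -T value, or decrease PeriodNb (Job_MyExp is a placeholder)
ccc_msub Job_MyExp          # resubmit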
Problem:
srun: First task exited 600s ago
srun: tasks 0-40,42-45: running
srun: task 41: exited abnormally
srun: Terminating job step 438782.1
slurmd[curie1150]: *** STEP 438782.1 KILLED AT 2012-06-10T18:45:41 WITH SIGNAL 9 ***
slurmd[curie1151]: *** STEP 438782.1 KILLED AT 2012-06-10T18:45:41 WITH SIGNAL 9 ***
srun: Job step aborted: Waiting up to 2 seconds for job step to finish.
slurmd[curie1150]: *** STEP 438782.1 KILLED AT 2012-06-10T18:45:41 WITH SIGNAL 9 ***
slurmd[curie1151]: *** STEP 438782.1 KILLED AT 2012-06-10T18:45:41 WITH SIGNAL 9 ***
Solution:
Don't ask questions! Run clean_PeriodLength.job and restart the simulation.
The file systems $CCCWORKDIR, $CCCSTOREDIR and $SCRATCHDIR are fragile. The error messages look like:
Input/output error
Cannot send after transport endpoint shutdown
Don't ask questions, just resubmit the job.
/var/spool/slurmd/job637061/slurm_script: line 534: 458 Segmentation fault /bin/ksh -x ${TEMPO_SCRIPT}
If you get this kind of message, don't ask questions: just resubmit the job.
This message:
error: Batch job submission failed: Job violates accounting policy (job submit limit, user's size and/or time limits)
means that you have submitted too many jobs (wait for some jobs to finish and resubmit), that your headers are not properly written, or that you did not specify which GENCI project the computing time should be charged to. The ccc_mqinfo command returns the maximum number of jobs (currently: 300 for 24h-max jobs, 8 for 72h-max jobs and 2 for test jobs (30 min and max 8 nodes)):
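Before resubmitting, you can check how many of your jobs are already queued or running, for example by filtering the ccc_mstat output on your login:

ccc_mstat | grep $USER    # your running and pending jobs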
ccc_mqinfo
Name     Priority  MaxCPUs  MaxNodes  MaxRun  MaxSub  MaxTime
-------  --------  -------  --------  ------  ------  ----------
long           18     1024                 2       8  3-00:00:00
normal         20                                 300  1-00:00:00
test           40                  8               2  00:30:00
The computation of a user's priority is based on 3 cumulative criteria:
If your job is far down the waiting list and you are working on several projects, use the project with the least computing time used so far.
This computation is not fully satisfactory because we would prefer to encourage long simulations. We are looking for real examples of abnormal waiting situations; please take the time to give us your feedback.
Be careful with the quotas on /scratch! Monitor them with the ccc_quota command. Destroy the temporary directories created by jobs that ended too early and did not clean up the $SCRATCHDIR/TMPDIR_IGCM and $SCRATCHDIR/RUN_DIR directories. You should have a 20 TB quota on curie.
> ccc_quota
Disk quotas for user xxxx:

             ------------------ VOLUME --------------------   ------------------- INODE --------------------
 Filesystem       usage      soft      hard     grace          files      soft      hard     grace
 ----------       -----      ----      ----     -----          -----      ----      ----     -----
    scratch       3.53T       20T       20T         -         42.61k        2M        2M         -
      store           -         -         -         -         93.76k      100k      101k         -
       work     232.53G        1T      1.1T         -         844.8k      1.5M      1.5M         -
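A possible clean-up is sketched below (the directory names follow the libIGCM layout mentioned above; check what the directories contain before removing anything):

du -sh $SCRATCHDIR/TMPDIR_IGCM/* $SCRATCHDIR/RUN_DIR/*    # identify the biggest leftover directories
rm -rf $SCRATCHDIR/RUN_DIR/<old_run_directory>            # remove only directories of jobs that are definitely finished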
This message appears when the time limit is reached. Increase the requested time in the job header or reduce PeriodNb in your job to reduce the number of loop iterations.
Simulations with the IPSLCM5/IPSLCM6 coupled model are reproducible if you use the same Bands file for LMDZ. See the trusting TGCC/curie results on this web page: http://webservices.ipsl.jussieu.fr/trusting/