Opened 4 years ago
Closed 4 years ago
#2673 closed Request (fixed)
Make SETTE better
Reported by: | acc | Owned by: | acc |
---|---|---|---|
Priority: | low | Milestone: | 2021 WP |
Component: | tools | Version: | trunk |
Severity: | minor | Keywords: | |
Cc: |
Description
Context
Following on from discussions during the merge, there are some relatively quick improvements to make to the sette scripts to help with the management of multiple SETTE runs with different options. Mostly it is quick and simple stuff, but it involves sufficient code rearrangement to justify a development branch until all the changes have been checked.
Proposal
First stage is to move all the option setting into sette.sh and allow most to be set through command-line options. It will also be good to get sette.sh to record its setup each time. Here is a first draft of the planned functionality:
    ./sette.sh -n ORCA2_ICE_PISCES -v HALO2 -F -z -X -A -x RESTART
    ==================================
    -n: Configuration ORCA2_ICE_PISCES will be tested if it is available
    -v: HALO2 validation sub-directory requested
    -F: key_loop_fusion will not be activated
    -z: key_nosignedzero will NOT be activated
    -X: key_xios will not be activated
    -A: Tasks will be run in attached (SPMD) mode
    -x: RESTART tests requested
    Carrying out the following tests : RESTART
    requested by the command : ./sette.sh -n ORCA2_ICE_PISCES -v HALO2 -F -z -X -A -x RESTART
    USING_TIMING : yes
    USING_ICEBERGS : yes
    USING_EXTRA_HALO : yes
    USING_TILING : yes
    USING_NOSIGNED0 : no
    USING_QCO : yes
    USING_LOOP_FUSION : no
    USING_XIOS : no
    USING_MPMD : no
    USING_RK3 : no
    Common compile keys to be added : key_qco
    Common compile keys to be deleted : key_xios key_nosignedzero key_loop_fusion key_RK3
    Validation records to appear under: /work/n01/n01/acc/NEMO/2021/midmerge/dev_sette/NEMO_VALIDATION/HALO2
    ==================================
with a typical usage message of:
    ./sette.sh -h
    sette.sh with no arguments (in this case all configuration will be tested with default options)
    -T to set ln_timing false for all non-AGRIF configurations (default: true)
    -t set ln_tile false in all tests that support it (default: true)
    -e set nn_hls=1 (default: nn_hls=2)
    -i set ln_icebergs false (default: true)
    -z to remove the key_nosignedzero key (default: added)
    -q to remove the key_qco key (default: added)
    -X to remove the key_xios key (default: added)
    -F to remove the key_loop_fusion key (default: added)
    -Q to remove the key_RK3 key (currently a null-op since key_RK3 is not used)
    -A to run tests in attached (SPMD) mode (default: MPMD with key_xios)
    -n "CFG1_to_test CFG2_to_test ..." to test some specific configurations
    -x "TEST_type TEST_type ..." to specify particular types of test (RESTART is mandatory)
    -v "subdir" optional validation record subdirectory to be created below NEMO_VALIDATION_DIR
    -c to clean each configuration
    -s to synchronise the sette MY_SRC and EXP00 with the reference MY_SRC and EXPREF
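As a rough illustration of how a handful of these switches could be handled, here is a minimal POSIX `getopts` sketch. It is not the code in sette.sh: only a subset of the options is covered, the `USING_*` names are borrowed from the report shown above, and `TEST_CONFIGS`, `TEST_TYPES`, `VALIDATION_SUBDIR`, `DRY_RUN` and both function names are hypothetical.

```sh
#!/bin/sh
# Hypothetical sketch of command-line handling for a few sette.sh-style
# switches. Everything defaults to "on"; each switch turns one feature off.
parse_sette_opts() {
    USING_LOOP_FUSION=yes
    USING_XIOS=yes
    USING_MPMD=yes
    TEST_CONFIGS=""        # hypothetical name: value of -n
    TEST_TYPES=""          # hypothetical name: value of -x
    VALIDATION_SUBDIR=""   # hypothetical name: value of -v
    DRY_RUN=no             # hypothetical name: set by -d
    OPTIND=1               # allow the function to be called more than once
    while getopts "n:x:v:FXAd" opt "$@"; do
        case "$opt" in
            n) TEST_CONFIGS=$OPTARG ;;
            x) TEST_TYPES=$OPTARG ;;
            v) VALIDATION_SUBDIR=$OPTARG ;;
            F) USING_LOOP_FUSION=no ;;
            X) USING_XIOS=no ;;
            A) USING_MPMD=no ;;
            d) DRY_RUN=yes ;;
            *) echo "unknown option; see ./sette.sh -h" >&2; return 1 ;;
        esac
    done
}

# Record the resulting setup, in the spirit of the report shown above.
report_settings() {
    printf 'USING_LOOP_FUSION : %s\n' "$USING_LOOP_FUSION"
    printf 'USING_XIOS : %s\n' "$USING_XIOS"
    printf 'USING_MPMD : %s\n' "$USING_MPMD"
    [ "$DRY_RUN" = yes ] && echo "dryrun only: no tests performed"
    return 0
}

parse_sette_opts -F -v HALO1 -d && report_settings
```

Keeping the defaults and the parsing in one function means the recorded setup always reflects what was actually parsed, which is the point of having sette.sh log its own configuration.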
Comments and suggestions are welcome. Please add to this ticket and if the ideas are too complex you can do them yourselves once I add the associated branch information :)
Commit History (11)
Changeset | Author | Time | ChangeLog |
---|---|---|---|
14981 | acc | 2021-06-11T15:47:19+02:00 | #2673 . Reintegrate sette developments back onto main sette branch. This action closes #2673 |
14980 | acc | 2021-06-11T15:37:22+02:00 | #2673 . merge in changes from main sette branch to sette_ticket2673 |
14897 | acc | 2021-05-24T14:55:18+02:00 | Branch sette_ticket2673. Add a super_sette.sh suggestion for running multiple tests and a sette_eval.sh which is a stripped down version of sette_rpt.sh to perform just the check for differences between different sets. A key difference though is the -q (quiet mode) switch which reduces output to a single line evaluation. Examples will be added to #2673 |
14890 | acc | 2021-05-19T18:20:24+02:00 | Branch: sette_ticket2673. Minor adjustment of sette_list_avail_rev.sh to use MAIN as a default subdirectory. #2673 |
14888 | acc | 2021-05-19T15:46:48+02:00 | Branch: sette_ticket2673. Fixed too hasty edit in sette_rpt.sh and introduced new utility function: sette_fetch_inputs.sh. #2673 |
14884 | acc | 2021-05-18T20:49:47+02:00 | Branch: sette_ticket2673. Make sure a dry-run performs no action even if NEMO_VALIDATION_DIR does not exist. #2673 |
14883 | acc | 2021-05-18T20:34:36+02:00 | Branch: sette_ticket2673. Prompt for user confirmation rather than implicitly change options if a non-viable combination has been asked for. #2673 |
14874 | acc | 2021-05-17T21:03:23+02:00 | Branch: sette_ticket2673. Fix a bug in sette_rpt.sh and add an option to over-ride the reference revision number. See #2673 |
14867 | acc | 2021-05-14T18:51:03+02:00 | Branch: sette_ticket2673. Fully working version of revamped submission scripts. See #2673 for details of changes. Still some work to do to bring the associated scripts such as sette_list_avail_rev.sh in line with new design. |
14861 | acc | 2021-05-13T19:12:34+02:00 | First stage developments for #2673 (SETTE improvements). Not fully tested |
14860 | acc | 2021-05-13T18:34:21+02:00 | SETTE development branch associated with ticket #2673 |
Change History (26)
comment:1 Changed 4 years ago by acc
comment:2 Changed 4 years ago by acc
In 14861:
comment:3 follow-up: ↓ 4 Changed 4 years ago by francesca
It could be useful to add to the new SETTE an option to activate MPI3 neighbourhood collective communications, too.
It is controlled by the namelist option nn_comm in the &nammpp section.
Its default value is 1, for point-to-point communications; any other value activates MPI3 exchanges. So nn_comm=2 could be used.
comment:4 in reply to: ↑ 3 Changed 4 years ago by acc
Replying to francesca:
It could be useful to add to the new SETTE an option to activate MPI3 neighbourhood collective communications, too.
It is controlled by the namelist option nn_comm in the &nammpp section.
Its default value is 1, for point-to-point communications; any other value activates MPI3 exchanges. So nn_comm=2 could be used.
Ok, that is easy. I can do that :)
More of a problem is that makenemo insists on rebuilding the code each time with this new arrangement. One thought I had is that it is because del_keys often has content now (even though there is often nothing to remove). This may not be the issue but, whilst mk/Fadd_keys.sh explicitly doesn't touch the keys file if the key is already there, mk/Fdel_keys.sh is strange. Does anyone recall why it does this:
    echo "Removing keys in : ${NEW_CONF}"
    for i in ${list_del_key} ; do
      if [ "$(echo ${i} | grep -c key_nproc )" -ne 0 ]; then
        sed -e "s/key_nproc[ij]=.* //" ${CONFIG_DIR}/${NEW_CONF}/cpp_${NEW_CONF}.fcm \
          > ${CONFIG_DIR}/${NEW_CONF}/cpp_${NEW_CONF}.fcm.tmp
        mv ${CONFIG_DIR}/${NEW_CONF}/cpp_${NEW_CONF}.fcm.tmp ${CONFIG_DIR}/${NEW_CONF}/cpp_${NEW_CONF}.fcm
        echo " "
      elif [ "$(cat ${CONFIG_DIR}/${NEW_CONF}/cpp_${NEW_CONF}.fcm | grep -c "$i" )" -ne 0 ]; then
        sed -e "s/\b${i}\b//" ${CONFIG_DIR}/${NEW_CONF}/cpp_${NEW_CONF}.fcm \
          > ${CONFIG_DIR}/${NEW_CONF}/cpp_${NEW_CONF}.fcm.tmp
        mv ${CONFIG_DIR}/${NEW_CONF}/cpp_${NEW_CONF}.fcm.tmp ${CONFIG_DIR}/${NEW_CONF}/cpp_${NEW_CONF}.fcm
        echo "deleted key $i in ${NEW_CONF}"
      fi
    done
i.e. what is key_nproc all about?
comment:5 Changed 4 years ago by acc
Ok, this wasn't the issue. Adding an explicit report when the key is not present confirms that the key file is untouched:
    echo "Removing keys in : ${NEW_CONF}"
    for i in ${list_del_key} ; do
      if [ "$(echo ${i} | grep -c key_nproc )" -ne 0 ]; then
        sed -e "s/key_nproc[ij]=.* //" ${CONFIG_DIR}/${NEW_CONF}/cpp_${NEW_CONF}.fcm \
          > ${CONFIG_DIR}/${NEW_CONF}/cpp_${NEW_CONF}.fcm.tmp
        mv ${CONFIG_DIR}/${NEW_CONF}/cpp_${NEW_CONF}.fcm.tmp ${CONFIG_DIR}/${NEW_CONF}/cpp_${NEW_CONF}.fcm
        echo " "
      elif [ "$(cat ${CONFIG_DIR}/${NEW_CONF}/cpp_${NEW_CONF}.fcm | grep -c "$i" )" -ne 0 ]; then
        sed -e "s/\b${i}\b//" ${CONFIG_DIR}/${NEW_CONF}/cpp_${NEW_CONF}.fcm \
          > ${CONFIG_DIR}/${NEW_CONF}/cpp_${NEW_CONF}.fcm.tmp
        mv ${CONFIG_DIR}/${NEW_CONF}/cpp_${NEW_CONF}.fcm.tmp ${CONFIG_DIR}/${NEW_CONF}/cpp_${NEW_CONF}.fcm
        echo "deleted key $i in ${NEW_CONF}"
      else
        echo $i "is not present and does not need to be deleted"
      fi
    done
It was rebuilding because I had genuinely changed the keys. There is probably something to tidy up here though.
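One candidate for tidying, offered only as a sketch: in the loop quoted above the presence check (`grep -c "$i"`) matches substrings, whereas the `sed` expression uses `\b` word boundaries, so a key that only occurs as part of a longer key name would cause the file to be rewritten without any change. The hypothetical helper below (`del_key` is not part of mk/Fdel_keys.sh) uses a whole-word check so that the file is rewritten only when the key is really there. It assumes GNU sed for `\b`, as the quoted code does, and a grep that supports `-w`.

```sh
#!/bin/sh
# del_key FILE KEY
# Hypothetical helper: remove the first whole-word occurrence of KEY from
# FILE, and leave FILE completely untouched when KEY is not present.
del_key() {
    file=$1
    key=$2
    if grep -qw -- "$key" "$file"; then
        # Same temporary-file approach as the quoted loop.
        sed -e "s/\b${key}\b//" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
        echo "deleted key $key in $file"
    else
        echo "$key is not present and does not need to be deleted"
    fi
}
```

A call such as `del_key "${CONFIG_DIR}/${NEW_CONF}/cpp_${NEW_CONF}.fcm" key_RK3` would then only report when the key is absent, without rewriting the keys file.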
comment:6 Changed 4 years ago by acc
In 14867:
comment:7 Changed 4 years ago by acc
Some details on changeset: 14867
The full set of command-line options for sette.sh is now:
    ./sette.sh -h
    sette.sh with no arguments (in this case all configuration will be tested with default options)
    -T to set ln_timing false for all non-AGRIF configurations (default: true)
    -t set ln_tile false in all tests that support it (default: true)
    -e set nn_hls=1 (default: nn_hls=2)
    -i set ln_icebergs false (default: true)
    -C set nn_comm=1 (default: nn_comm=2 ==> use MPI3 collective comms)
    -z to remove the key_nosignedzero key (default: added)
    -q to remove the key_qco key (default: added)
    -X to remove the key_xios key (default: added)
    -F to remove the key_loop_fusion key (default: added)
    -Q to remove the key_RK3 key (currently a null-op since key_RK3 is not used)
    -A to run tests in attached (SPMD) mode (default: MPMD with key_xios)
    -n "CFG1_to_test CFG2_to_test ..." to test some specific configurations
    -x "TEST_type TEST_type ..." to specify particular types of test (RESTART is mandatory)
    -v "subdir" optional validation record subdirectory to be created below NEMO_VALIDATION_DIR
    -d to perform a dryrun to simply report what settings will be used
    -c to clean each configuration
    -s to synchronise the sette MY_SRC and EXP00 with the reference MY_SRC and EXPREF
and all namelist and key setting options are on by default. There is a dryrun option (-d) which will report what settings will be used, for example:
    ./sette.sh -i -T -e -F -t -v HALO1 -n "WED025 ISOMIP+" -d
    -i: ln_icebergs will be set to false
    -T: ln_timing will be set to false
    -e: nn_hls will be set to 1
    -F: key_loop_fusion will not be activated
    -t: ln_tile will be set to false
    -v: HALO1 validation sub-directory requested
    ==================================
    -n: Configurations WED025 ISOMIP+ will be tested if they are available
    Carrying out the following tests : RESTART REPRO PHYOPTS CORRUPT
    requested by the command : ./sette.sh -i -T -e -F -t -v HALO1 -n WED025 ISOMIP+ -d
    USING_TIMING : no
    USING_ICEBERGS : no
    USING_EXTRA_HALO : no
    USING_TILING : no
    USING_COLLECTIVES : yes
    USING_NOSIGNED0 : yes
    USING_QCO : yes
    USING_LOOP_FUSION : no
    USING_XIOS : yes
    USING_MPMD : yes
    USING_RK3 : no
    Common compile keys to be added : key_xios key_nosignedzero key_qco
    Common compile keys to be deleted : key_loop_fusion key_RK3
    Validation records to appear under: /work/n01/n01/acc/NEMO/2021/midmerge/dev_sette/NEMO_VALIDATION/HALO1
    ==================================
    dryrun only: no tests performed
Having all options on by default causes some issues since not all configurations can accept all choices. Notably, those using ln_linssh=.true. or ln_hpg_isf=.true. can't accept key_qco. The key is specifically removed for those configurations (in sette_reference-configurations.sh and sette_test-cases.sh) but it isn't good to have such changes buried in the scripts. It may be better to have another script which invokes sette.sh twice with different command-line options: once for those configurations not using key_qco and once for the rest. The top-level information reported by sette.sh would then accurately reflect the options that each configuration was run with.
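A minimal sketch of such a wrapper follows. It is only an illustration: the two configuration lists are taken from the MAIN versus NO_QCO comparison reported later in this ticket, while the `QCO` and `NO_QCO` sub-directory names, the `SETTE_CMD` variable and the `run_split` function are hypothetical.

```sh
#!/bin/sh
# Hypothetical wrapper: run sette.sh twice so that the settings reported at
# the top of each run match what every configuration in that run really used.
SETTE_CMD=${SETTE_CMD:-./sette.sh}

# Configurations that can be built with key_qco.
QCO_CONFIGS="ORCA2_ICE_PISCES AMM12 AGRIF_DEMO VORTEX OVERFLOW LOCK_EXCHANGE SWG"
# Configurations using ln_linssh or ln_hpg_isf, which cannot accept key_qco.
NOQCO_CONFIGS="GYRE_PISCES ORCA2_SAS_ICE ORCA2_OFF_PISCES WED025 ICE_AGRIF ISOMIP+"

run_split() {
    # "$@" : extra switches passed to both invocations, e.g. -d for a dry run
    $SETTE_CMD -v QCO -n "$QCO_CONFIGS" "$@" &&
    $SETTE_CMD -q -v NO_QCO -n "$NOQCO_CONFIGS" "$@"
}
```

Calling `run_split -d` first would show the settings for both halves without running any tests.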
param.cfg is now read only by sette.sh, which exports the variables to the other scripts.
One design choice that may be up for debate has been to change the arrangement of the validation records. The -v option allows a subdirectory to be defined for the results of the current set of tests. This subdirectory will reside below NEMO_VALIDATION_DIR (as defined in param.cfg). sette_rpt.sh now accepts the same -v argument for re-running the report. The layout of the reports has been changed in two ways:
- The directory order is now: machine-->revision-->config-->exp
- config names (in the records) no longer have the W prefix and have the trailing _ST stripped
Thus a typical layout for the example command shown above is:
    ./NEMO_VALIDATION
    |-HALO1
    |---X86_ARCHER2-Cray
    |-----14845+
    |-------AGRIF_DEMO
    |---------LONG
    |---------ORCA2
    |---------REPRO_2_8
    |---------REPRO_4_4
    |---------SHORT
    |-------AGRIF_DEMO_NOAGRIF
    |---------ORCA2
    |-------AMM12
    |---------LONG
    |---------REPRO_4_8
    |---------REPRO_8_4
    |---------SHORT
    |-------GYRE_PISCES
    |---------LONG
    |---------REPRO_2_4
    |---------REPRO_4_2
    |---------SHORT
    .
    .
    .
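The two layout changes can be sketched with plain parameter expansion. This is an illustration only: `record_name` and `record_dir` are hypothetical helpers, not functions from the sette scripts, and the old-style record name is assumed to have the form `W<config>_ST`.

```sh
#!/bin/sh
# record_name OLD_NAME
# Hypothetical helper: strip the leading W and the trailing _ST from an
# old-style record name (assumed form: W<config>_ST).
record_name() {
    name=${1#W}
    name=${name%_ST}
    printf '%s\n' "$name"
}

# record_dir ROOT SUBDIR MACHINE REVISION OLD_NAME EXP
# Assemble a path in the new order: machine --> revision --> config --> exp,
# below the -v sub-directory of the validation root.
record_dir() {
    printf '%s/%s/%s/%s/%s/%s\n' "$1" "$2" "$3" "$4" "$(record_name "$5")" "$6"
}

record_dir ./NEMO_VALIDATION HALO1 X86_ARCHER2-Cray 14845+ WAMM12_ST LONG
```

Note that the helper only makes sense for old-style names: a bare configuration name that happens to begin with W, such as WED025, must not be passed to it.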
This changeset has been tested and successfully runs the full suite. Some tidying up remains to be done and the associated scripts, such as sette_list_avail_rev.sh, still need to be adapted to the new layout. It should be complete enough to test more widely.
comment:8 Changed 4 years ago by acc
In 14869:
comment:9 Changed 4 years ago by acc
In 14870:
comment:10 Changed 4 years ago by acc
In 14873:
comment:11 Changed 4 years ago by acc
In 14874:
comment:12 Changed 4 years ago by acc
In 14876:
comment:13 Changed 4 years ago by acc
In 14878:
comment:14 Changed 4 years ago by acc
In 14879:
comment:15 Changed 4 years ago by acc
In 14881:
comment:16 Changed 4 years ago by acc
In 14883:
comment:17 Changed 4 years ago by acc
In 14884:
comment:18 Changed 4 years ago by acc
In 14887:
comment:19 Changed 4 years ago by acc
In 14888:
comment:20 Changed 4 years ago by acc
In 14890:
comment:21 Changed 4 years ago by acc
In 14893:
comment:22 Changed 4 years ago by acc
In 14895:
comment:23 Changed 4 years ago by acc
In 14897:
The super_sette.sh example suggests full SETTE testing with the default options and the closest equivalents with a single halo and without key_qco. The results of these tests go into the MAIN, HALO1 and NO_QCO subdirectories, respectively. Additional tests are also performed with ORCA2_ICE_PISCES only: without icebergs for nn_hls=1 and 2, and the default set without MPI3 collectives. These results go into the NO_ICB1, NO_ICB2 and NO_COLL subdirectories, respectively. sette_eval.sh can be used for a quick evaluation of differences. For example:
    ./sette_eval.sh -V MAIN -v HALO1 -R 14886
    Current code is : NEMO/trunk @ r14896 ( last change @ r14886 )
    SETTE evaluation for :
    NEMO/trunk @ r14886 (last changed revision)
    on X86_ARCHER2-Cray arch file
    !----result comparison check----!
    check result differences between :
    VALID directory : /work/n01/n01/acc/NEMO/2021/midmerge/dev_sette/NEMO_VALIDATION/HALO1 at rev 14886
    and
    REFERENCE directory : /work/n01/n01/acc/NEMO/2021/midmerge/dev_sette/NEMO_VALIDATION/MAIN at rev 14886
    GYRE_PISCES run.stat files are identical
    GYRE_PISCES tracer.stat files are identical
    ORCA2_ICE_PISCES run.stat files are DIFFERENT (results are different after 71 time steps)
    ORCA2_ICE_PISCES tracer.stat files are DIFFERENT (results are different after 73 time steps)
    ORCA2_OFF_PISCES tracer.stat files are DIFFERENT (results are different after 17 time steps)
    AMM12 run.stat files are identical
    ORCA2_SAS_ICE run.stat files are identical
    AGRIF_DEMO run.stat files are identical
    WED025 run.stat files are identical
    ISOMIP+ run.stat files are identical
    VORTEX run.stat files are identical
    ICE_AGRIF run.stat files are identical
    OVERFLOW run.stat files are identical
    LOCK_EXCHANGE run.stat files are identical
    SWG run.stat files are identical
or in "quiet mode":
    ./sette_eval.sh -V MAIN -v HALO1 -R 14886 -q
    3 differences from 13 matches.
It is also useful when tests have only been run with a few configurations. Take the ORCA2_ICE_PISCES only tests suggested by super_sette.sh, for example:
    ./sette_eval.sh -V MAIN -v NO_ICB2 -R 14886 -q
    2 differences from 1 matches. 0 missing from REFERENCE 12 missing from VALID
The two differences here are the run.stat and tracer.stat differences for ORCA2_ICE_PISCES. Reassuringly:
    ./sette_eval.sh -V NO_ICB2 -v NO_ICB1 -R 14886 -q
    0 differences from 1 matches. 12 missing from REFERENCE 12 missing from VALID
and:
    ./sette_eval.sh -V MAIN -v NO_COLL -R 14886 -q
    0 differences from 1 matches. 0 missing from REFERENCE 12 missing from VALID
TBI:
    ./sette_eval.sh -V MAIN -v NO_QCO -R 14886 -q
    7 differences from 12 matches. 0 missing from REFERENCE 1 missing from VALID
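The quiet-mode summaries above can be approximated with a short comparison loop. The sketch below is not the code in sette_eval.sh: `list_configs` and `eval_quiet` are hypothetical names, each argument is assumed to be a revision directory containing one sub-directory per configuration, and the stat files are assumed to sit in a `LONG` experiment directory. A "match" is counted per configuration present in both sets and a "difference" per stat file that differs, which is how the counts above appear to add up.

```sh
#!/bin/sh
# list_configs DIR_A DIR_B : names of the sub-directories found in either set
list_configs() {
    for d in "$1"/* "$2"/*; do
        [ -d "$d" ] && basename "$d"
    done | sort -u
}

# eval_quiet REFERENCE_DIR VALID_DIR : print a one-line comparison summary
eval_quiet() {
    ref=$1
    val=$2
    ndiff=0
    nmatch=0
    miss_ref=0
    miss_val=0
    for cfg in $(list_configs "$ref" "$val"); do
        if [ ! -d "$ref/$cfg" ]; then
            miss_ref=$((miss_ref + 1))
        elif [ ! -d "$val/$cfg" ]; then
            miss_val=$((miss_val + 1))
        else
            nmatch=$((nmatch + 1))
            for f in run.stat tracer.stat; do
                # Only compare files that exist in both sets.
                [ -f "$ref/$cfg/LONG/$f" ] && [ -f "$val/$cfg/LONG/$f" ] || continue
                cmp -s "$ref/$cfg/LONG/$f" "$val/$cfg/LONG/$f" || ndiff=$((ndiff + 1))
            done
        fi
    done
    echo "$ndiff differences from $nmatch matches. $miss_ref missing from REFERENCE $miss_val missing from VALID"
}
```

Unlike the real output, this sketch always prints the two "missing" counts, even when both are zero.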
comment:24 Changed 4 years ago by acc
A note on the differences observed between the MAIN and NO_QCO sets, namely:
    ./sette_eval.sh -V MAIN -v NO_QCO -R 14886 -q
    7 differences from 12 matches. 0 missing from REFERENCE 1 missing from VALID
or in full:
    check result differences between :
    VALID directory : /work/n01/n01/acc/NEMO/2021/midmerge/dev_sette/NEMO_VALIDATION/NO_QCO at rev 14886
    and
    REFERENCE directory : /work/n01/n01/acc/NEMO/2021/midmerge/dev_sette/NEMO_VALIDATION/MAIN at rev 14886
    GYRE_PISCES run.stat files are identical
    GYRE_PISCES tracer.stat files are identical
    ORCA2_ICE_PISCES run.stat files are DIFFERENT (results are different after 1 time steps)
    ORCA2_ICE_PISCES tracer.stat files are DIFFERENT (results are different after 2 time steps)
    ORCA2_OFF_PISCES tracer.stat files are identical
    AMM12 run.stat files are DIFFERENT (results are different after 1 time steps)
    ORCA2_SAS_ICE run.stat files are identical
    AGRIF_DEMO run.stat files are DIFFERENT (results are different after 1 time steps)
    WED025 run.stat files are identical
    ISOMIP+ run.stat files are identical
    VORTEX run.stat files are DIFFERENT (results are different after 1 time steps)
    ICE_AGRIF run.stat files are identical
    OVERFLOW run.stat files are DIFFERENT (results are different after 1 time steps)
    LOCK_EXCHANGE run.stat files are DIFFERENT (results are different after 3 time steps)
    SWG VALID directory at 14886 is MISSING
Sibylle has reminded me that differences are to be expected between QCO and non-QCO runs because of small differences in the way some of the vertical metrics are computed. The apparent inconsistency in the list above arises because key_qco is not applied to those configurations that use either ln_linssh=.true. or ln_hpg_isf=.true. In these cases (GYRE_PISCES, SAS, OFF, WED025, ICE_AGRIF and ISOMIP+) the same code is tested regardless of USING_QCO and so the results are identical. Finally, the absence of SWG from the NO_QCO set is also expected, since this configuration can only be run with key_qco.
comment:25 Changed 4 years ago by acc
In 14980:
comment:26 Changed 4 years ago by acc
- Owner set to acc
- Resolution set to fixed
- Status changed from new to closed
In 14981:
In 14860: