New URL for NEMO forge!   http://forge.nemo-ocean.eu

Since March 2022, along with the NEMO 4.2 release, code development has moved to a self-hosted GitLab.
This forge is now archived and remains online for historical reference.
#2673 (Make SETTE better) – NEMO

Opened 4 years ago

Closed 4 years ago

#2673 closed Request (fixed)

Make SETTE better

Reported by: acc
Owned by: acc
Priority: low
Milestone: 2021 WP
Component: tools
Version: trunk
Severity: minor
Keywords:
Cc:

Description

Context

Following on from discussions during the merge, there are some relatively quick improvements to make to the sette scripts to help with the management of multiple SETTE runs with different options. Mostly it is quick and simple stuff, but it involves sufficient code rearrangement to justify a development branch until all the changes have been checked.

Proposal

The first stage is to move all the option setting into sette.sh and allow most options to be set on the command line. It would also be good for sette.sh to record its setup each time. Here is a first draft of the planned functionality:

./sette.sh -n ORCA2_ICE_PISCES -v HALO2 -F -z -X -A -x RESTART
==================================
-n: Configuration ORCA2_ICE_PISCES will be tested if it is available

-v: HALO2 validation sub-directory requested

-F: key_loop_fusion will not be activated

-z: key_nosignedzero will NOT be activated

-X: key_xios will not be activated

-A: Tasks will be run in attached (SPMD) mode

-x: RESTART tests requested

Carrying out the following tests  : RESTART
requested by the command          : ./sette.sh -n ORCA2_ICE_PISCES -v HALO2 -F -z -X -A -x RESTART
USING_TIMING                      : yes
USING_ICEBERGS                    : yes
USING_EXTRA_HALO                  : yes
USING_TILING                      : yes
USING_NOSIGNED0                   : no
USING_QCO                         : yes
USING_LOOP_FUSION                 : no
USING_XIOS                        : no
USING_MPMD                        : no
USING_RK3                         : no
Common compile keys to be added   : key_qco
Common compile keys to be deleted : key_xios key_nosignedzero key_loop_fusion key_RK3
Validation records to appear under: /work/n01/n01/acc/NEMO/2021/midmerge/dev_sette/NEMO_VALIDATION/HALO2
==================================

with a typical usage message of:

./sette.sh -h
sette.sh with no arguments (in this case all configuration will be tested with default options)
-T to set ln_timing false for all non-AGRIF configurations (default: true)
-t set ln_tile false in all tests that support it (default: true)
-e set nn_hls=1 (default: nn_hls=2)
-i set ln_icebergs false (default: true)
-z to remove the key_nosignedzero key (default: added)
-q to remove the key_qco key (default: added)
-X to remove the key_xios key (default: added)
-F to remove the key_loop_fusion key (default: added)
-Q to remove the key_RK3 key (currently a null-op since key_RK3 is not used)
-A to run tests in attached (SPMD) mode (default: MPMD with key_xios)
-n "CFG1_to_test CFG2_to_test ..." to test some specific configurations
-x "TEST_type TEST_type ..." to specify particular types of test (RESTART is mandatory)
-v "subdir" optional validation record subdirectory to be created below NEMO_VALIDATION_DIR
-c to clean each configuration
-s to synchronise the sette MY_SRC and EXP00 with the reference MY_SRC and EXPREF
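
For anyone wanting to experiment with the idea before the branch is available, the flag handling described above could be sketched with getopts roughly as follows. This is only an illustration under assumptions: the function names parse_sette_opts and report_setup are hypothetical, only a subset of the flags is handled, and the real sette.sh may be organised quite differently.

```bash
# Hypothetical sketch of sette.sh-style option handling; NOT the real script.
parse_sette_opts() {
  # Defaults: every namelist and key option is on unless a flag turns it off.
  USING_TIMING=yes; USING_ICEBERGS=yes; USING_EXTRA_HALO=yes; USING_TILING=yes
  USING_NOSIGNED0=yes; USING_QCO=yes; USING_LOOP_FUSION=yes; USING_XIOS=yes
  USING_MPMD=yes
  TEST_CONFIGS=""; VALID_SUBDIR=""
  TEST_TYPES="RESTART REPRO PHYOPTS CORRUPT"
  OPTIND=1   # reset so the function can be called more than once
  while getopts ":n:x:v:TteizqXFA" opt; do
    case "$opt" in
      n) TEST_CONFIGS=$OPTARG ;;
      x) TEST_TYPES=$OPTARG ;;
      v) VALID_SUBDIR=$OPTARG ;;
      T) USING_TIMING=no ;;
      t) USING_TILING=no ;;
      e) USING_EXTRA_HALO=no ;;
      i) USING_ICEBERGS=no ;;
      z) USING_NOSIGNED0=no ;;
      q) USING_QCO=no ;;
      X) USING_XIOS=no ;;
      F) USING_LOOP_FUSION=no ;;
      A) USING_MPMD=no ;;
      :)  echo "option -$OPTARG needs an argument" >&2; return 1 ;;
      \?) echo "unknown option -$OPTARG" >&2; return 1 ;;
    esac
  done
}

# Record a few of the chosen settings in the style of the summary shown above.
report_setup() {
  printf '%-34s: %s\n' "Carrying out the following tests" "$TEST_TYPES"
  printf '%-34s: %s\n' "USING_LOOP_FUSION" "$USING_LOOP_FUSION"
  printf '%-34s: %s\n' "USING_XIOS" "$USING_XIOS"
}

parse_sette_opts -n ORCA2_ICE_PISCES -v HALO2 -F -z -X -A -x RESTART
report_setup
```

Keeping the defaults and the flag handling in one function means the recorded summary always reflects what was actually parsed.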

Comments and suggestions are welcome. Please add to this ticket and if the ideas are too complex you can do them yourselves once I add the associated branch information :)

Commit History (11)

Changeset  Author  Time  ChangeLog

14981  acc  2021-06-11T15:47:19+02:00

#2673 . Reintegrate sette developments back onto main sette branch. This action closes #2673

14980  acc  2021-06-11T15:37:22+02:00

#2673 . merge in changes from main sette branch to sette_ticket2673

14897  acc  2021-05-24T14:55:18+02:00

Branch sette_ticket2673. Add a super_sette.sh suggestion for running multiple tests and a sette_eval.sh which is a stripped down version of sette_rpt.sh to perform just the check for differences between different sets. A key difference though is the -q (quiet mode) switch which reduces output to a single line evaluation. Examples will be added to #2673

14890  acc  2021-05-19T18:20:24+02:00

Branch: sette_ticket2673. Minor adjustment of sette_list_avail_rev.sh to use MAIN as a default subdirectory. #2673

14888  acc  2021-05-19T15:46:48+02:00

Branch: sette_ticket2673. Fixed too hasty edit in sette_rpt.sh and introduced new utility function: sette_fetch_inputs.sh. #2673

14884  acc  2021-05-18T20:49:47+02:00

Branch: sette_ticket2673. Make sure a dry-run performs no action even if NEMO_VALIDATION_DIR does not exist. #2673

14883  acc  2021-05-18T20:34:36+02:00

Branch: sette_ticket2673. Prompt for user confirmation rather than implicitly change options if a non-viable combination has been asked for. #2673

14874  acc  2021-05-17T21:03:23+02:00

Branch: sette_ticket2673. Fix a bug in sette_rpt.sh and add an option to over-ride the reference revision number. See #2673

14867  acc  2021-05-14T18:51:03+02:00

Branch: sette_ticket2673. Fully working version of revamped submission scripts. See #2673 for details of changes. Still some work to do to bring the associated scripts such as sette_list_avail_rev.sh in line with new design.

14861  acc  2021-05-13T19:12:34+02:00

First stage developments for #2673 (SETTE improvements). Not fully tested

14860  acc  2021-05-13T18:34:21+02:00

SETTE development branch associated with ticket #2673

Change History (26)

comment:1 Changed 4 years ago by acc

In 14860:

comment:2 Changed 4 years ago by acc

In 14861:

comment:3 follow-up: Changed 4 years ago by francesca

It could be useful to add to the new SETTE an option to activate MPI3 neighbourhood collective communications, too.
It is controlled by the namelist option nn_comm in the &nammpp section.
Its default value is 1 for point-to-point communications; any other value activates MPI3 exchanges. So nn_comm=2 could be used.
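
As a rough illustration of what such an option would have to do, the namelist value could be switched with a small sed-based helper along these lines. The helper name set_nml_value and the sample namelist line are hypothetical; SETTE has its own namelist handling, which this sketch does not attempt to reproduce.

```bash
# Hypothetical helper (not part of SETTE): set VAR to VALUE in a namelist FILE.
set_nml_value() {
  local file=$1 var=$2 value=$3
  # Replace the first non-blank token after "VAR =", keeping spacing and comments.
  sed -E "s/^([[:space:]]*${var}[[:space:]]*=[[:space:]]*)[^[:space:]]+/\1${value}/" \
      "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Usage on a throw-away namelist fragment (contents made up for the example)
nml=$(mktemp)
printf '%s\n' '&nammpp' '   nn_comm     =    1      ! sample comment' '/' > "$nml"
set_nml_value "$nml" nn_comm 2
grep nn_comm "$nml"
```

Writing to a temporary file and moving it into place follows the same pattern used in the mk scripts, and avoids relying on in-place editing flags that differ between sed implementations.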

comment:4 in reply to: ↑ 3 Changed 4 years ago by acc

Replying to francesca:

It could be useful to add to the new SETTE an option to activate MPI3 neighbourhood collective communications, too.
It is controlled by the namelist option nn_comm in the &nammpp section.
Its default value is 1 for point-to-point communications; any other value activates MPI3 exchanges. So nn_comm=2 could be used.

Ok, that is easy. I can do that :)

More of a problem is that makenemo insists on rebuilding the code each time with this new arrangement. One thought I had is that it is because del_keys often has content now (even though there is often nothing to remove). This may not be the issue but, whilst mk/Fadd_keys.sh explicitly doesn't touch the keys file if the key is already there, mk/Fdel_keys.sh is strange. Does anyone recall why it does this:

echo "Removing keys in : ${NEW_CONF}"

for i in ${list_del_key} ; do

     if [ "$(echo ${i} | grep -c key_nproc )" -ne 0                                      ]; then
        sed -e "s/key_nproc[ij]=.* //" ${CONFIG_DIR}/${NEW_CONF}/cpp_${NEW_CONF}.fcm \
            >  ${CONFIG_DIR}/${NEW_CONF}/cpp_${NEW_CONF}.fcm.tmp
        mv ${CONFIG_DIR}/${NEW_CONF}/cpp_${NEW_CONF}.fcm.tmp ${CONFIG_DIR}/${NEW_CONF}/cpp_${NEW_CONF}.fcm
        echo " "
     elif [ "$(cat ${CONFIG_DIR}/${NEW_CONF}/cpp_${NEW_CONF}.fcm | grep -c "$i" )" -ne 0 ]; then
         sed -e "s/\b${i}\b//" ${CONFIG_DIR}/${NEW_CONF}/cpp_${NEW_CONF}.fcm \
             >  ${CONFIG_DIR}/${NEW_CONF}/cpp_${NEW_CONF}.fcm.tmp
         mv ${CONFIG_DIR}/${NEW_CONF}/cpp_${NEW_CONF}.fcm.tmp ${CONFIG_DIR}/${NEW_CONF}/cpp_${NEW_CONF}.fcm
         echo "deleted key $i in ${NEW_CONF}"
     fi

done

i.e. what is key_nproc all about?
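
Whatever the history, the mechanics of that first branch can be checked in isolation. It targets keys that carry a value, of the form key_nproci=... or key_nprocj=..., and because the pattern ends in a greedy ".* " it removes everything up to the last space on the line. The key line below is made up for the demonstration, and the narrower pattern is only a suggestion of what might be intended, not something taken from the mk scripts.

```bash
# Sample key line; the key values are made up for illustration.
line='bld::tool::fppkeys key_nproci=4 key_nprocj=2 key_xios key_qco'

# Pattern used in the first branch of the loop quoted above.
greedy=$(printf '%s\n' "$line" | sed -e "s/key_nproc[ij]=.* //")
echo "$greedy"    # prints: bld::tool::fppkeys key_qco

# Assumed alternative: stop at the first space so only key_nproc tokens go.
narrow=$(printf '%s\n' "$line" | sed -e "s/key_nproc[ij]=[^ ]* //g")
echo "$narrow"    # prints: bld::tool::fppkeys key_xios key_qco
```

So with the original pattern any keys listed between a key_nproc entry and the final key on the line would also be dropped, which is worth keeping in mind when tidying this script.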

comment:5 Changed 4 years ago by acc

Ok, this wasn't the issue. Adding an explicit report when the key is not present confirms that the key file is untouched:

echo "Removing keys in : ${NEW_CONF}"

for i in ${list_del_key} ; do

     if [ "$(echo ${i} | grep -c key_nproc )" -ne 0                                      ]; then
        sed -e "s/key_nproc[ij]=.* //" ${CONFIG_DIR}/${NEW_CONF}/cpp_${NEW_CONF}.fcm \
	    >  ${CONFIG_DIR}/${NEW_CONF}/cpp_${NEW_CONF}.fcm.tmp
        mv ${CONFIG_DIR}/${NEW_CONF}/cpp_${NEW_CONF}.fcm.tmp ${CONFIG_DIR}/${NEW_CONF}/cpp_${NEW_CONF}.fcm
        echo " "
     elif [ "$(cat ${CONFIG_DIR}/${NEW_CONF}/cpp_${NEW_CONF}.fcm | grep -c "$i" )" -ne 0 ]; then
         sed -e "s/\b${i}\b//" ${CONFIG_DIR}/${NEW_CONF}/cpp_${NEW_CONF}.fcm \
	     >  ${CONFIG_DIR}/${NEW_CONF}/cpp_${NEW_CONF}.fcm.tmp
         mv ${CONFIG_DIR}/${NEW_CONF}/cpp_${NEW_CONF}.fcm.tmp ${CONFIG_DIR}/${NEW_CONF}/cpp_${NEW_CONF}.fcm
         echo "deleted key $i in ${NEW_CONF}"
     else
       echo $i "is not present and does not need to be deleted"
     fi

done

It was rebuilding because I had genuinely changed the keys. There is probably something to tidy up here though.

comment:6 Changed 4 years ago by acc

In 14867:

comment:7 Changed 4 years ago by acc

Some details on changeset: 14867
The full set of command-line options for sette.sh is now:

./sette.sh -h
sette.sh with no arguments (in this case all configuration will be tested with default options)
-T to set ln_timing false for all non-AGRIF configurations (default: true)
-t set ln_tile false in all tests that support it (default: true)
-e set nn_hls=1 (default: nn_hls=2)
-i set ln_icebergs false (default: true)
-C set nn_comm=1 (default: nn_comm=2 ==> use MPI3 collective comms)
-z to remove the key_nosignedzero key (default: added)
-q to remove the key_qco key (default: added)
-X to remove the key_xios key (default: added)
-F to remove the key_loop_fusion key (default: added)
-Q to remove the key_RK3 key (currently a null-op since key_RK3 is not used)
-A to run tests in attached (SPMD) mode (default: MPMD with key_xios)
-n "CFG1_to_test CFG2_to_test ..." to test some specific configurations
-x "TEST_type TEST_type ..." to specify particular types of test (RESTART is mandatory)
-v "subdir" optional validation record subdirectory to be created below NEMO_VALIDATION_DIR
-d to perform a dryrun to simply report what settings will be used
-c to clean each configuration
-s to synchronise the sette MY_SRC and EXP00 with the reference MY_SRC and EXPREF

All namelist and key-setting options are on by default. There is a dry-run option (-d) which reports the settings that will be used, e.g.:

./sette.sh -i -T -e -F -t -v HALO1 -n "WED025 ISOMIP+" -d
-i: ln_icebergs will be set to false

-T: ln_timing will be set to false

-e: nn_hls will be set to 1

-F: key_loop_fusion will not be activated

-t: ln_tile will be set to false

-v: HALO1 validation sub-directory requested

==================================
-n: Configurations WED025 ISOMIP+ will be tested if they are available


Carrying out the following tests  : RESTART REPRO PHYOPTS CORRUPT
requested by the command          : ./sette.sh -i -T -e -F -t -v HALO1 -n WED025 ISOMIP+ -d
USING_TIMING                      : no
USING_ICEBERGS                    : no
USING_EXTRA_HALO                  : no
USING_TILING                      : no
USING_COLLECTIVES                 : yes
USING_NOSIGNED0                   : yes
USING_QCO                         : yes
USING_LOOP_FUSION                 : no
USING_XIOS                        : yes
USING_MPMD                        : yes
USING_RK3                         : no
Common compile keys to be added   : key_xios key_nosignedzero key_qco
Common compile keys to be deleted : key_loop_fusion key_RK3
Validation records to appear under: /work/n01/n01/acc/NEMO/2021/midmerge/dev_sette/NEMO_VALIDATION/HALO1
==================================

dryrun only: no tests performed

Having all options on by default causes some issues since not all configurations can accept all choices. Notably, those using ln_linssh=.true. or ln_hpg_isf=.true. can't accept key_qco. The key is specifically removed for those configurations (in sette_reference-configurations.sh and sette_test-cases.sh) but it isn't good to have such changes buried in the scripts. It may be better to have another script which invokes sette.sh twice with different command-line options (once for those configurations not using key_qco and once for the rest). The top-level information reported by sette.sh would then accurately reflect the options that each configuration was run with.
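
The two-pass idea could look something like the sketch below. It only prints the commands rather than running them, the function name sette_two_pass is hypothetical, and the MAIN and NO_QCO subdirectory names and the configuration lists passed in are placeholders for illustration.

```bash
# Hypothetical wrapper sketch: print the two sette.sh invocations instead of
# running them, so it is safe to try outside a NEMO checkout.
sette_two_pass() {
  local qco_cfgs=$1 no_qco_cfgs=$2
  # Pass 1: configurations that accept key_qco, default options.
  printf './sette.sh -n "%s" -v %s\n' "$qco_cfgs" MAIN
  # Pass 2: configurations that cannot use key_qco, so -q removes the key.
  printf './sette.sh -q -n "%s" -v %s\n' "$no_qco_cfgs" NO_QCO
}

sette_two_pass "ORCA2_ICE_PISCES AMM12" "GYRE_PISCES WED025 ISOMIP+"
```

With the split made at this level, each invocation's summary states the key_qco setting that was really used for every configuration it lists.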

param.cfg is now read only by sette.sh, which exports the variables needed by the other scripts.

One design choice that may be up for debate has been to change the arrangement of the validation records. The -v option allows a subdirectory to be defined for the results of the current set of tests. This subdirectory will reside below NEMO_VALIDATION_DIR (as defined in param.cfg). sette_rpt.sh now accepts the same -v argument for re-running the report. The layout of the reports has been changed in two ways:

  • The directory order is now: machine-->revision-->config-->exp
  • config names (in the records) no longer have the W prefix and have the trailing _ST stripped

Thus a typical layout for the example command shown above is:

./NEMO_VALIDATION
   |-HALO1
     |---X86_ARCHER2-Cray
       |-----14845+
         |-------AGRIF_DEMO
           |---------LONG
           |---------ORCA2
           |---------REPRO_2_8
           |---------REPRO_4_4
           |---------SHORT
         |-------AGRIF_DEMO_NOAGRIF
           |---------ORCA2
         |-------AMM12
           |---------LONG
           |---------REPRO_4_8
           |---------REPRO_8_4
           |---------SHORT
         |-------GYRE_PISCES
           |---------LONG
           |---------REPRO_2_4
           |---------REPRO_4_2
           |---------SHORT
.
.
.
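
For scripts that need to locate records in the new layout, the path construction can be sketched as below. The helper names record_name and record_dir are hypothetical, and the old-style name WGYRE_PISCES_ST is assumed from the description of the W prefix and _ST suffix rather than copied from the scripts.

```bash
# Hypothetical helpers illustrating the layout; not taken from the SETTE scripts.
record_name() {
  # Strip a leading W and a trailing _ST, e.g. WGYRE_PISCES_ST -> GYRE_PISCES.
  # Assumes the input really carries the W prefix (a bare WED025 would lose its W).
  local name=${1#W}
  echo "${name%_ST}"
}

record_dir() {
  # $1 validation dir, $2 -v subdirectory, $3 machine, $4 revision,
  # $5 old-style config name, $6 experiment
  echo "$1/$2/$3/$4/$(record_name "$5")/$6"
}

record_dir ./NEMO_VALIDATION HALO1 X86_ARCHER2-Cray 14845+ WGYRE_PISCES_ST LONG
# prints: ./NEMO_VALIDATION/HALO1/X86_ARCHER2-Cray/14845+/GYRE_PISCES/LONG
```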

This changeset has been tested and successfully runs the full suite. Some tidying up remains to be done and the associated scripts, such as sette_list_avail_rev.sh, still need to be adapted to the new layout. It should be complete enough to test more widely.

comment:8 Changed 4 years ago by acc

In 14869:

comment:9 Changed 4 years ago by acc

In 14870:

comment:10 Changed 4 years ago by acc

In 14873:

comment:11 Changed 4 years ago by acc

In 14874:

comment:12 Changed 4 years ago by acc

In 14876:

comment:13 Changed 4 years ago by acc

In 14878:

comment:14 Changed 4 years ago by acc

In 14879:

comment:15 Changed 4 years ago by acc

In 14881:

comment:16 Changed 4 years ago by acc

In 14883:

comment:17 Changed 4 years ago by acc

In 14884:

comment:18 Changed 4 years ago by acc

In 14887:

comment:19 Changed 4 years ago by acc

In 14888:

comment:20 Changed 4 years ago by acc

In 14890:

comment:21 Changed 4 years ago by acc

In 14893:

comment:22 Changed 4 years ago by acc

In 14895:

comment:23 Changed 4 years ago by acc

In 14897:

The super_sette.sh example suggests full SETTE testing with the default options and the closest equivalents with a single halo and without key_qco. The results of these tests go into the MAIN, HALO1 and NO_QCO subdirectories, respectively. Additional tests are also performed with ORCA2_ICE_PISCES only without icebergs for nn_hls = 1 and 2 and the default set without MPI3 collectives. These results go into the NO_ICB1, NO_ICB2 and NO_COLL subdirectories, respectively. sette_eval.sh can be used for a quick evaluation of differences. For example:

./sette_eval.sh -V MAIN -v HALO1 -R 14886

Current code is : NEMO/trunk @ r14896  ( last change @ r14886 )

SETTE evaluation for :

       NEMO/trunk @ r14886 (last changed revision)

       on X86_ARCHER2-Cray arch file


   !----result comparison check----!

check result differences between :
VALID directory : /work/n01/n01/acc/NEMO/2021/midmerge/dev_sette/NEMO_VALIDATION/HALO1 at rev 14886
and
REFERENCE directory : /work/n01/n01/acc/NEMO/2021/midmerge/dev_sette/NEMO_VALIDATION/MAIN at rev 14886

GYRE_PISCES           run.stat    files are identical
GYRE_PISCES           tracer.stat files are identical
ORCA2_ICE_PISCES      run.stat    files are DIFFERENT (results are different after  71  time steps)
ORCA2_ICE_PISCES      tracer.stat files are DIFFERENT (results are different after  73  time steps)
ORCA2_OFF_PISCES      tracer.stat files are DIFFERENT (results are different after  17  time steps)
AMM12                 run.stat    files are identical
ORCA2_SAS_ICE         run.stat    files are identical
AGRIF_DEMO            run.stat    files are identical
WED025                run.stat    files are identical
ISOMIP+               run.stat    files are identical
VORTEX                run.stat    files are identical
ICE_AGRIF             run.stat    files are identical
OVERFLOW              run.stat    files are identical
LOCK_EXCHANGE         run.stat    files are identical
SWG                   run.stat    files are identical

or in "quiet mode":

./sette_eval.sh -V MAIN -v HALO1 -R 14886 -q
3  differences from 13 matches.

It is also useful when tests have only been run with a few configurations. Take the ORCA2_ICE_PISCES only tests suggested by super_sette.sh, for example:

./sette_eval.sh -V MAIN -v NO_ICB2 -R 14886 -q
2  differences from 1 matches. 0 missing from REFERENCE 12 missing from VALID

The two differences here being the run.stat and tracer.stat differences for ORCA2_ICE_PISCES. Reassuringly:

./sette_eval.sh -V NO_ICB2 -v NO_ICB1 -R 14886 -q
0  differences from 1 matches. 12 missing from REFERENCE 12 missing from VALID

and:

./sette_eval.sh -V MAIN -v NO_COLL -R 14886 -q
0  differences from 1 matches. 0 missing from REFERENCE 12 missing from VALID

TBI:

./sette_eval.sh -V MAIN -v NO_QCO -R 14886 -q
7  differences from 12 matches. 0 missing from REFERENCE 1 missing from VALID
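
For reference, the kind of per-file check reported in the listings above can be approximated in a few lines of shell. This is a sketch only: compare_stat is a hypothetical name, it assumes one line per time step in each file, and it reports the number of leading lines that agree rather than reproducing the exact wording or counting used by sette_eval.sh.

```bash
# Hypothetical sketch of a run.stat / tracer.stat comparison.
compare_stat() {
  local ref=$1 val=$2 a b n=0
  if cmp -s "$ref" "$val"; then
    echo "files are identical"
    return 0
  fi
  # Walk both files in step and count the leading lines that agree.
  while IFS= read -r a <&3 && IFS= read -r b <&4; do
    [ "$a" = "$b" ] || break
    n=$((n + 1))
  done 3<"$ref" 4<"$val"
  echo "files are DIFFERENT (first $n lines agree)"
}

# Usage with two throw-away files that differ on their third line
ref=$(mktemp); val=$(mktemp)
printf '%s\n' 'step 1 a' 'step 2 b' 'step 3 c' > "$ref"
printf '%s\n' 'step 1 a' 'step 2 b' 'step 3 X' > "$val"
compare_stat "$ref" "$val"
```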

comment:24 Changed 4 years ago by acc

A note on the differences observed between the MAIN and NO_QCO sets, namely:

./sette_eval.sh -V MAIN -v NO_QCO -R 14886 -q
7  differences from 12 matches. 0 missing from REFERENCE 1 missing from VALID

or in full:

check result differences between :
VALID directory : /work/n01/n01/acc/NEMO/2021/midmerge/dev_sette/NEMO_VALIDATION/NO_QCO at rev 14886
and
REFERENCE directory : /work/n01/n01/acc/NEMO/2021/midmerge/dev_sette/NEMO_VALIDATION/MAIN at rev 14886

GYRE_PISCES           run.stat    files are identical
GYRE_PISCES           tracer.stat files are identical
ORCA2_ICE_PISCES      run.stat    files are DIFFERENT (results are different after  1  time steps)
ORCA2_ICE_PISCES      tracer.stat files are DIFFERENT (results are different after  2  time steps)
ORCA2_OFF_PISCES      tracer.stat files are identical
AMM12                 run.stat    files are DIFFERENT (results are different after  1  time steps)
ORCA2_SAS_ICE         run.stat    files are identical
AGRIF_DEMO            run.stat    files are DIFFERENT (results are different after  1  time steps)
WED025                run.stat    files are identical
ISOMIP+               run.stat    files are identical
VORTEX                run.stat    files are DIFFERENT (results are different after  1  time steps)
ICE_AGRIF             run.stat    files are identical
OVERFLOW              run.stat    files are DIFFERENT (results are different after  1  time steps)
LOCK_EXCHANGE         run.stat    files are DIFFERENT (results are different after  3  time steps)
SWG                          VALID     directory at 14886 is MISSING

Sibylle has reminded me that differences are to be expected between QCO and non-QCO runs because of small differences in the way some of the vertical metrics are computed. The apparent inconsistency in the list above arises because key_qco is not applied to those configurations that use either ln_linssh=.true. or ln_hpg_isf=.true. In these cases (GYRE_PISCES, SAS, OFF, WED025, ICE_AGRIF and ISOMIP+) the same code is tested regardless of USING_QCO and so the results are identical. Finally, SWG missing from the NO_QCO set is also expected since this configuration can only be run with key_qco.
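
One way to make that rule visible rather than buried would be to test a configuration's namelist directly. The sketch below is hypothetical: SETTE removes the key explicitly in its scripts, and the function name qco_incompatible and the single-line namelist used here are assumptions for illustration.

```bash
# Hypothetical check: succeed if a namelist switches on an option that, per
# this ticket, cannot be combined with key_qco.
qco_incompatible() {
  grep -Eiq '^[[:space:]]*(ln_linssh|ln_hpg_isf)[[:space:]]*=[[:space:]]*\.true\.' "$1"
}

# Usage on a throw-away namelist fragment
nml=$(mktemp)
printf '%s\n' '   ln_linssh   = .true.' > "$nml"
if qco_incompatible "$nml"; then
  echo "key_qco would be removed"
else
  echo "key_qco would be kept"
fi
```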

comment:25 Changed 4 years ago by acc

In 14980:

comment:26 Changed 4 years ago by acc

  • Owner set to acc
  • Resolution set to fixed
  • Status changed from new to closed

In 14981:
