Opened 3 weeks ago

Last modified 35 hours ago

#2418 new Bug

model does not stop properly in stpctl

Reported by: smasson
Owned by: systeam
Priority: low
Milestone:
Component: MULTIPLE
Version: trunk
Severity: minor
Keywords:
Cc:

Description

Context

When stpctl detects an error, whether or not the model stops properly depends on the namelist choices made in sn_cfctl.

Analysis

For example, with sn_cfctl%l_glochk = .true. and all other sn_cfctl% flags set to .false., the model does not stop because of a deadlock. The variable lsomeoce is another potential source of deadlock.
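
The kind of deadlock described above can be illustrated outside NEMO. The sketch below is an analogy in Python, not the stpctl code. It assumes that the hang comes from a collective operation that only some processes enter (for instance because a logical flag differs from one process to another), which this ticket does not state explicitly. The names run_ranks and enters_collective are hypothetical, and a threading.Barrier with a timeout stands in for the MPI collective so that the hang becomes observable instead of lasting forever.

```python
import threading


def run_ranks(enters_collective, timeout=0.5):
    """Simulate one collective call across len(enters_collective) ranks.

    enters_collective[r] tells whether rank r reaches the collective
    (hypothetical stand-in for a per-process logical flag).
    Returns True if no rank got stuck, False if the collective hung.
    """
    nranks = len(enters_collective)
    barrier = threading.Barrier(nranks)  # stand-in for an MPI collective
    finished = [False] * nranks

    def rank_body(rank):
        if not enters_collective[rank]:
            # This rank skips the collective and carries on.
            finished[rank] = True
            return
        try:
            # In real MPI this would wait forever; the timeout only
            # exists so that the sketch can report the hang.
            barrier.wait(timeout=timeout)
            finished[rank] = True
        except threading.BrokenBarrierError:
            finished[rank] = False

    threads = [threading.Thread(target=rank_body, args=(r,)) for r in range(nranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return all(finished)


if __name__ == "__main__":
    print("all ranks enter:", run_ranks([True, True, True, True]))
    print("one rank skips :", run_ranks([True, False, True, True]))
```

When every rank enters the collective, the call completes. As soon as one rank skips it, the others wait for a partner that never arrives, which is why any test guarding a collective has to give the same answer on all processes.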

Fix

Extensive debugging and cleaning of stpctl is needed…

Commit History (7)

Changeset  Author   Time                       ChangeLog
12718      smasson  2020-04-08T17:21:05+02:00  r12581_ticket2418: bugfix for C1D and STATION_ASF, see #2418
12685      smasson  2020-04-06T11:52:15+02:00  r12581_ticket2418: end cleaning, see #2418
12684      smasson  2020-04-05T18:47:37+02:00  r12581_ticket2418: additional cleaning, see #2418
12655      smasson  2020-04-03T11:35:09+02:00  r12581_ticket2418: merge with trunk@12654, see #2418
12623      smasson  2020-03-28T08:38:26+01:00  r12581_ticket2418: merge with trunk@12622, see #2418
12593      smasson  2020-03-24T16:52:17+01:00  r12581_ticket2418, first commit see #2418
12582      smasson  2020-03-21T11:58:26+01:00  r12581_ticket2418: create branch from trunk@12581, see #2418

Change History (9)

comment:1 Changed 3 weeks ago by smasson

In 12582:

r12581_ticket2418: create branch from trunk@12581, see #2418

comment:2 Changed 2 weeks ago by smasson

In 12593:

r12581_ticket2418, first commit see #2418

comment:3 Changed 2 weeks ago by smasson

This is quite a large commit for a bugfix, so I used a branch to share it before merging it back into the trunk.

One of the complexities of this ticket comes from the duplication of the "same" routines in different directories. These directories are not always properly maintained and synchronized, especially when the corresponding configurations/test cases are not tested by sette…

These are the main points of this commit:

  • fix the reported bug
  • report the number of the process on which the error is found
  • add a "CALL SLEEP (60)" in the urgent and imperative stop to make sure that all processes have time to write their error messages and their abort file
  • some minor bugfixes: argument order in one call to dia_wri_state
  • some minor optimisations: use of llmsk, no search for min/max when it is not needed, etc…
  • general cleaning of stpctl (with more comments to follow which process is doing what)
  • synchronization of nemogcm, step and stpctl in OCE, SAS and C1D. The OFF version seems to be OK.
  • add error tests in SAS/stpctl
  • partial rewrite of stpctl to minimize the differences between stpctl in OCE and SAS
  • suppress sn_cfctl%l_glochk, which had no real use (or I missed it)
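
As an illustration of the second point in the list above, finding the process on which an error occurred amounts to a "maximum with location" reduction. The Python sketch below shows the idea on a plain list of per-process maxima. It is not the code of the branch: the function name global_max_with_rank and the sample values are hypothetical. In an MPI code the same result is typically obtained with a reduction using MPI_MAXLOC, but whether the branch does it that way is an assumption.

```python
def global_max_with_rank(local_maxima):
    """Return (value, rank) for the largest per-process maximum.

    local_maxima[r] is the maximum found on process r (hypothetical input).
    Ties are resolved in favour of the lowest rank.
    """
    if not local_maxima:
        raise ValueError("need at least one process")
    best_rank = max(
        range(len(local_maxima)),
        key=lambda r: (local_maxima[r], -r),  # larger value first, then lower rank
    )
    return local_maxima[best_rank], best_rank


if __name__ == "__main__":
    # Made-up per-process maxima, purely for illustration.
    value, rank = global_max_with_rank([1.2, 30.5, 7.0])
    print(f"largest value {value} found on process {rank}")
```

Carrying the rank along with the value means that only the offending process needs to report the location of the problem, instead of every process searching for it.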

These are the points missing from this commit:

  • C1D and STATION_ASF are not in the sette tests and must therefore be tested (and included in the sette tests, I guess). One must check that these configurations are still working AND that they stop properly when an error has to be detected. This last test must be done by forcing an error in step, just before calling stp_ctl
  • I would like to suppress sn_cfctl%l_allon and sn_cfctl%l_config, which I do not find very useful
  • the test on ice temperature has been moved from -100 to -101 because errors were detected in ICE_AGRIF (could not get sette reproducibility). This error in ICE_AGRIF should be solved independently of this ticket
  • we should try to limit the number of subroutines in STATION_ASF/MY_SRC to make it sustainable…

This branch passes all sette tests and gives the same results as trunk@12563:

-bash-4.2$ ./sette_rpt.sh

Current code is : NEMO/branches/2020/r12581_ticket2418 @ r12582  ( last change @ r12582 )

SETTE validation report generated for :

       NEMO/branches/2020/r12581_ticket2418 @ r12582+ (last changed revision)

       on X64_IRENE arch file


!!---------------1st pass------------------!!

   !----restart----!
WGYRE_PISCES_ST              run.stat    restartability  passed :  12582+
WGYRE_PISCES_ST              tracer.stat restartability  passed :  12582+
WORCA2_ICE_PISCES_ST         run.stat    restartability  passed :  12582+
WORCA2_ICE_PISCES_ST         tracer.stat restartability  passed :  12582+
WORCA2_OFF_PISCES_ST         tracer.stat restartability  passed :  12582+
WAMM12_ST                    run.stat    restartability  passed :  12582+
WORCA2_SAS_ICE_ST            run.stat    restartability  passed :  12582+
WAGRIF_DEMO_ST               run.stat    restartability  passed :  12582+
WSPITZ12_ST                  run.stat    restartability  passed :  12582+
WISOMIP_ST                   run.stat    restartability  passed :  12582+
WOVERFLOW_ST                 run.stat    restartability  passed :  12582+
WLOCK_EXCHANGE_ST            run.stat    restartability  passed :  12582+
WVORTEX_ST                   run.stat    restartability  passed :  12582+
WICE_AGRIF_ST                run.stat    restartability  passed :  12582+

   !----repro----!
WGYRE_PISCES_ST              run.stat    reproducibility passed :  12582+
WGYRE_PISCES_ST              tracer.stat reproducibility passed :  12582+
WORCA2_ICE_PISCES_ST         run.stat    reproducibility passed :  12582+
WORCA2_ICE_PISCES_ST         tracer.stat reproducibility passed :  12582+
WORCA2_OFF_PISCES_ST         tracer.stat reproducibility passed :  12582+
WAMM12_ST                    run.stat    reproducibility passed :  12582+
WORCA2_SAS_ICE_ST            run.stat    reproducibility passed :  12582+
WORCA2_ICE_OBS_ST            run.stat    reproducibility passed :  12582+
WAGRIF_DEMO_ST               run.stat    reproducibility passed :  12582+
WSPITZ12_ST                  run.stat    reproducibility passed :  12582+
WISOMIP_ST                   run.stat    reproducibility passed :  12582+
WVORTEX_ST                   run.stat    reproducibility passed :  12582+
WICE_AGRIF_ST                run.stat    reproducibility passed :  12582+

   !----agrif check----!
ORCA2 AGRIF vs ORCA2 NOAGRIF run.stat    unchanged  -    passed :  12582+ 12582+

   !----result comparison check----!

check result differences between :
VALID directory : /ccc/scratch/cont005/ra0542/massons/r12581_ticket2418/NEMO_VALIDATION at rev 12582+
and
REFERENCE directory : /ccc/scratch/cont005/ra0542/massons/trunk/NEMO_VALIDATION at rev 12563

WGYRE_PISCES_ST       run.stat    files are identical
WGYRE_PISCES_ST       tracer.stat files are identical
WORCA2_ICE_PISCES_ST  run.stat    files are identical
WORCA2_ICE_PISCES_ST  tracer.stat files are identical
WORCA2_OFF_PISCES_ST  tracer.stat files are identical
WAMM12_ST             run.stat    files are identical
WISOMIP_ST            run.stat    files are identical
WORCA2_SAS_ICE_ST     run.stat    files are identical
WAGRIF_DEMO_ST        run.stat    files are identical
WSPITZ12_ST           run.stat    files are identical
WISOMIP_ST            run.stat    files are identical
WVORTEX_ST            run.stat    files are identical
WICE_AGRIF_ST         run.stat    files are identical

comment:4 Changed 13 days ago by smasson

In 12623:

r12581_ticket2418: merge with trunk@12622, see #2418

comment:5 Changed 7 days ago by smasson

In 12655:

r12581_ticket2418: merge with trunk@12654, see #2418

comment:6 Changed 4 days ago by smasson

In 12684:

r12581_ticket2418: additional cleaning, see #2418

comment:7 Changed 4 days ago by smasson

This branch passes all sette tests with GCC and gives the same results as trunk@12650:

Current code is : NEMO/branches/2020/r12581_ticket2418 @ r12683  ( last change @ r12655 )

SETTE validation report generated for :

       NEMO/branches/2020/r12581_ticket2418 @ r12655+ (last changed revision)

       on X64_IRENE_GCC arch file


!!---------------1st pass------------------!!

   !----restart----!
WGYRE_PISCES_ST              run.stat    restartability  passed :  12655+
WGYRE_PISCES_ST              tracer.stat restartability  passed :  12655+
WORCA2_ICE_PISCES_ST         run.stat    restartability  passed :  12655+
WORCA2_ICE_PISCES_ST         tracer.stat restartability  passed :  12655+
WORCA2_OFF_PISCES_ST         tracer.stat restartability  passed :  12655+
WAMM12_ST                    run.stat    restartability  passed :  12655+
WORCA2_SAS_ICE_ST            run.stat    restartability  passed :  12655+
WAGRIF_DEMO_ST               run.stat    restartability  passed :  12655+
WSPITZ12_ST                  run.stat    restartability  passed :  12655+
WISOMIP_ST                   run.stat    restartability  passed :  12655+
WOVERFLOW_ST                 run.stat    restartability  passed :  12655+
WLOCK_EXCHANGE_ST            run.stat    restartability  passed :  12655+
WVORTEX_ST                   run.stat    restartability  passed :  12655+
WICE_AGRIF_ST                run.stat    restartability  passed :  12655+

   !----repro----!
WGYRE_PISCES_ST              run.stat    reproducibility passed :  12655+
WGYRE_PISCES_ST              tracer.stat reproducibility passed :  12655+
WORCA2_ICE_PISCES_ST         run.stat    reproducibility passed :  12655+
WORCA2_ICE_PISCES_ST         tracer.stat reproducibility passed :  12655+
WORCA2_OFF_PISCES_ST         tracer.stat reproducibility passed :  12655+
WAMM12_ST                    run.stat    reproducibility passed :  12655+
WORCA2_SAS_ICE_ST            run.stat    reproducibility passed :  12655+
WORCA2_ICE_OBS_ST            run.stat    reproducibility passed :  12655+
WAGRIF_DEMO_ST               run.stat    reproducibility passed :  12655+
WSPITZ12_ST                  run.stat    reproducibility passed :  12655+
WISOMIP_ST                   run.stat    reproducibility passed :  12655+
WVORTEX_ST                   run.stat    reproducibility passed :  12655+
WICE_AGRIF_ST                run.stat    reproducibility passed :  12655+

   !----agrif check----!
ORCA2 AGRIF vs ORCA2 NOAGRIF run.stat    unchanged  -    passed :  12655+ 12655+

   !----result comparison check----!

check result differences between :
VALID directory : /ccc/work/cont005/ra0542/massons/NEMO_ALL_VALIDATIONS/trunk/NEMO_VALIDATION at rev 12655+
and
REFERENCE directory : /ccc/work/cont005/ra0542/massons/NEMO_ALL_VALIDATIONS/trunk/NEMO_VALIDATION at rev 12650

WGYRE_PISCES_ST       run.stat    files are identical
WGYRE_PISCES_ST       tracer.stat files are identical
WORCA2_ICE_PISCES_ST  run.stat    files are identical
WORCA2_ICE_PISCES_ST  tracer.stat files are identical
WORCA2_OFF_PISCES_ST  tracer.stat files are identical
WAMM12_ST             run.stat    files are identical
WISOMIP_ST            run.stat    files are identical
WORCA2_SAS_ICE_ST     run.stat    files are identical
WAGRIF_DEMO_ST        run.stat    files are identical
WSPITZ12_ST           run.stat    files are identical
WISOMIP_ST            run.stat    files are identical
WVORTEX_ST            run.stat    files are identical
WICE_AGRIF_ST         run.stat    files are identical

comment:8 Changed 4 days ago by smasson

In 12685:

r12581_ticket2418: end cleaning, see #2418

comment:9 Changed 35 hours ago by smasson

In 12718:

r12581_ticket2418: bugfix for C1D and STATION_ASF, see #2418
