Opened 6 months ago

Closed 6 months ago

#2420 closed Bug (fixed)

nstop>0 on a child grid is not working properly

Reported by: smasson
Owned by: systeam
Priority: low
Milestone:
Component: AGRIF
Version: trunk
Severity: minor
Keywords:
Cc:

Description

Context

With AGRIF, if an error occurs in a child grid and this error does not occur at the last sub-timestep of the child grid, the model may not stop properly.

Analysis

AGRIF child grids are called with

CALL Agrif_Integrate_ChildGrids( stp )  

This routine will, for example, make 3 calls to stp if the temporal refinement of the child grid is 3.
The problem is that it does not take into account the value of nstop (> 0 if an error occurred).
So, if the child grid in our example detects an error at sub-timestep 2, Agrif_Integrate_ChildGrids will still make the third call to stp before "going back" to the parent grid.
Calling stp with nstop > 0 may cause several problems on top of the original error (and therefore make the original problem harder to understand):

1) any kind of floating-point exception (with no proper error message)
2) with XIOS, iom_context_finalize( cxios_context ) has already been called at the end of the 2nd sub-timestep because an error was detected:

      IF( kstp == nitend .OR. indic < 0 ) THEN 
                      CALL iom_context_finalize(      cxios_context          ) ! needed for XIOS+AGRIF
                      IF(lrxios) CALL iom_context_finalize(      crxios_context          )
         IF( ln_crs ) CALL iom_context_finalize( trim(cxios_context)//"_crs" ) ! 
      ENDIF

Using this cxios_context in the 3rd sub-timestep then triggers XIOS errors with strange messages such as

'/.../xios-2.5/src/type/type_ref_impl.hpp', line 245 -> Not enough data in buffer to unqueue the data.

Fix

1) To me, the simplest fix (which works) is to add a test on nstop at the beginning of step:

#if defined key_agrif
      IF( nstop > 0 ) return   ! avoid to go further if an error was detected during previous time step 
      kstp = nit000 + Agrif_Nb_Step()
      Kbb_a = Nbb; Kmm_a = Nnn; Krhs_a = Nrhs   ! agrif_oce module copies of time level indices

With this test, all the child grids will "end" their sub-timesteps without doing any further work, so we can exit the step loop of the top-parent grid and follow the usual procedure for nstop > 0.

2) I think that, if an error was detected, we should not call Agrif_update_all( ), in order to limit additional errors.
So add a test on nstop in the following lines:

IF( Agrif_NbStepint() == 0 .AND. nstop == 0 ) THEN
   CALL Agrif_update_all( )                  ! Update all components
ENDIF

3) When using AGRIF, we should also add a message in the ocean.output file saying that, if no clear error message is visible, the user should also look at the children's *_ocean.output files, as the error may come from one of the child grids.
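
The first two points can be illustrated with a small standalone toy program. This is only a sketch of the control flow, not NEMO or AGRIF code: the names toy_agrif, toy_stp, toy_integrate_child, ncalls and nupdates are hypothetical, and the update test is placed after the sub-timestep loop for simplicity, whereas in the ticket it sits inside stp next to Agrif_NbStepint().

```fortran
! Toy model only: none of these routines are the real NEMO/AGRIF code.
MODULE toy_agrif
   IMPLICIT NONE
   INTEGER :: nstop    = 0   ! error counter, > 0 once an error was detected
   INTEGER :: ncalls   = 0   ! number of child sub-timesteps actually computed
   INTEGER :: nupdates = 0   ! number of child-to-parent updates performed
CONTAINS
   SUBROUTINE toy_stp( ksub, kfail )
      INTEGER, INTENT(in) :: ksub    ! sub-timestep index on the child grid
      INTEGER, INTENT(in) :: kfail   ! hypothetical sub-timestep at which an error occurs
      IF( nstop > 0 ) RETURN         ! point 1: skip the work if an error was already detected
      ncalls = ncalls + 1
      IF( ksub == kfail ) nstop = nstop + 1   ! simulate an error being detected
   END SUBROUTINE toy_stp

   SUBROUTINE toy_integrate_child( krefine, kfail )
      INTEGER, INTENT(in) :: krefine   ! temporal refinement factor of the child grid
      INTEGER, INTENT(in) :: kfail     ! passed through to toy_stp
      INTEGER :: ji
      DO ji = 1, krefine               ! the loop itself does not look at nstop
         CALL toy_stp( ji, kfail )
      END DO
      IF( nstop == 0 ) nupdates = nupdates + 1   ! point 2: update only after a clean pass
   END SUBROUTINE toy_integrate_child
END MODULE toy_agrif

PROGRAM demo
   USE toy_agrif
   IMPLICIT NONE
   CALL toy_integrate_child( 3, 2 )    ! refinement 3, error at sub-timestep 2
   PRINT *, 'sub-timesteps computed:', ncalls
   PRINT *, 'updates performed     :', nupdates
   PRINT *, 'nstop                 :', nstop
END PROGRAM demo
```

With a refinement of 3 and an error at sub-timestep 2, the third call to toy_stp returns immediately, so only two sub-timesteps are computed, no update is performed, and nstop stays at 1 for the caller to handle. Removing the RETURN line reproduces the situation described in the Analysis section, where the third sub-timestep runs on top of the original error.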

Commit History (1)

Changeset  Author   Time                       ChangeLog
12650      smasson  2020-04-03T09:27:30+02:00  trunk: nstop>0 on a child grid is ok, see #2420

Change History (2)

comment:1 Changed 6 months ago by smasson

In 12650:

trunk: nstop>0 on a child grid is ok, see #2420

comment:2 Changed 6 months ago by smasson

  • Resolution set to fixed
  • Status changed from new to closed

After agreement with Rachid, fixed in [12650]

Two comments:

  • the third point will be done in #2418

This commit passes all SETTE tests and gives the same results as [12615]

-bash-4.2$ ./sette_rpt.sh 12642+

Current code is : NEMO/trunk @ r12649  ( last change @ r12649 )

SETTE validation report generated for :

       NEMO/trunk @ r12642+ (last changed revision)

       on X64_IRENE arch file


!!---------------1st pass------------------!!

   !----restart----!
WGYRE_PISCES_ST              run.stat    restartability  passed :  12642+
WGYRE_PISCES_ST              tracer.stat restartability  passed :  12642+
WORCA2_ICE_PISCES_ST         run.stat    restartability  passed :  12642+
WORCA2_ICE_PISCES_ST         tracer.stat restartability  passed :  12642+
WORCA2_OFF_PISCES_ST         tracer.stat restartability  passed :  12642+
WAMM12_ST                    run.stat    restartability  passed :  12642+
WORCA2_SAS_ICE_ST            run.stat    restartability  passed :  12642+
WAGRIF_DEMO_ST               run.stat    restartability  passed :  12642+
WSPITZ12_ST                  run.stat    restartability  passed :  12642+
WISOMIP_ST                   run.stat    restartability  passed :  12642+
WOVERFLOW_ST                 run.stat    restartability  passed :  12642+
WLOCK_EXCHANGE_ST            run.stat    restartability  passed :  12642+
WVORTEX_ST                   run.stat    restartability  passed :  12642+
WICE_AGRIF_ST                run.stat    restartability  passed :  12642+

   !----repro----!
WGYRE_PISCES_ST              run.stat    reproducibility passed :  12642+
WGYRE_PISCES_ST              tracer.stat reproducibility passed :  12642+
WORCA2_ICE_PISCES_ST         run.stat    reproducibility passed :  12642+
WORCA2_ICE_PISCES_ST         tracer.stat reproducibility passed :  12642+
WORCA2_OFF_PISCES_ST         tracer.stat reproducibility passed :  12642+
WAMM12_ST                    run.stat    reproducibility passed :  12642+
WORCA2_SAS_ICE_ST            run.stat    reproducibility passed :  12642+
WORCA2_ICE_OBS_ST            run.stat    reproducibility passed :  12642+
WAGRIF_DEMO_ST               run.stat    reproducibility passed :  12642+
WSPITZ12_ST                  run.stat    reproducibility passed :  12642+
WISOMIP_ST                   run.stat    reproducibility passed :  12642+
WVORTEX_ST                   run.stat    reproducibility passed :  12642+
WICE_AGRIF_ST                run.stat    reproducibility passed :  12642+

   !----agrif check----!
ORCA2 AGRIF vs ORCA2 NOAGRIF run.stat    unchanged  -    passed :  12642+ 12642+

   !----result comparison check----!

check result differences between :
VALID directory : /ccc/scratch/cont005/ra0542/massons/trunk/NEMO_VALIDATION at rev 12642+
and
REFERENCE directory : /ccc/work/cont005/ra0542/massons/NEMO_ALL_VALIDATIONS/trunk/NEMO_VALIDATION at rev 12615

WGYRE_PISCES_ST       run.stat    files are identical
WGYRE_PISCES_ST       tracer.stat files are identical
WORCA2_ICE_PISCES_ST  run.stat    files are identical
WORCA2_ICE_PISCES_ST  tracer.stat files are identical
WORCA2_OFF_PISCES_ST  tracer.stat files are identical
WAMM12_ST             run.stat    files are identical
WISOMIP_ST            run.stat    files are identical
WORCA2_SAS_ICE_ST     run.stat    files are identical
WAGRIF_DEMO_ST        run.stat    files are identical
WSPITZ12_ST           run.stat    files are identical
WISOMIP_ST            run.stat    files are identical
WVORTEX_ST            run.stat    files are identical
WICE_AGRIF_ST         run.stat    files are identical