Opened 5 years ago
Closed 5 years ago
#2420 closed Bug (fixed)
nstop>0 on a child grid is not working properly
Reported by: | smasson | Owned by: | systeam |
---|---|---|---|
Priority: | low | Milestone: | |
Component: | AGRIF | Version: | trunk |
Severity: | minor | Keywords: | |
Cc: |
Description
Context
With AGRIF, if an error occurs in a child grid at any sub-timestep other than the last one, the model may not stop properly.
Analysis
AGRIF child grids are called with
    CALL Agrif_Integrate_ChildGrids( stp )
For example, this routine calls stp three times if the temporal refinement factor of the child grid is 3.
The problem is that this routine does not take into account the value of nstop (> 0 if an error occurred).
So, if the child grid in our example detects an error at sub-timestep 2, Agrif_Integrate_ChildGrids will still make the third call to stp before "going back" to the parent grid.
Calling stp with nstop > 0 may cause several problems on top of the original error (and therefore make the original problem harder to understand):
1) any kind of floating point exception (with no proper error message)
2) with XIOS, iom_context_finalize( cxios_context ) has already been called at the end of the 2nd sub-timestep, because an error was detected:
    IF( kstp == nitend .OR. indic < 0 ) THEN
       CALL iom_context_finalize( cxios_context ) ! needed for XIOS+AGRIF
       IF(lrxios) CALL iom_context_finalize( crxios_context )
       IF( ln_crs ) CALL iom_context_finalize( trim(cxios_context)//"_crs" ) !
    ENDIF
Using this cxios_context again in the 3rd sub-timestep triggers XIOS errors with strange messages such as
'/.../xios-2.5/src/type/type_ref_impl.hpp', line 245 -> Not enough data in buffer to unqueue the data.
Fix
1) To me, the simplest fix (which works) is to add a test on nstop at the beginning of step:
    #if defined key_agrif
       IF( nstop > 0 ) return ! avoid to go further if an error was detected during previous time step
       kstp = nit000 + Agrif_Nb_Step()
       Kbb_a = Nbb; Kmm_a = Nnn; Krhs_a = Nrhs ! agrif_oce module copies of time level indices
With this test, all the child grids will "end" their remaining sub-timesteps without doing any work. We can then exit the step loop of the top-level parent grid and follow the usual procedure for nstop > 0.
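The effect of this test can be sketched with a toy Fortran module. Everything in it is hypothetical (toy_stp, run_child_substeps, nwork and the injected failure are illustration names, not NEMO or AGRIF routines); only the idea of returning early when nstop > 0 is taken from the fix above. The driver loop stands in for Agrif_Integrate_ChildGrids and, like it, always issues every sub-step call.

```fortran
! Toy model only: none of these names are NEMO or AGRIF routines.
module toy_substep
   implicit none
   private
   public :: run_child_substeps

   integer :: nstop = 0   ! error counter shared by all sub-steps (stand-in for NEMO's nstop)
   integer :: nwork = 0   ! number of sub-steps whose body was actually executed

contains

   subroutine toy_stp(ksub, kfail, ld_guard)
      integer, intent(in) :: ksub       ! current sub-step index
      integer, intent(in) :: kfail      ! sub-step at which an error is injected
      logical, intent(in) :: ld_guard   ! .true. = return early when nstop > 0
      if (ld_guard .and. nstop > 0) return   ! skip the body after an earlier error
      nwork = nwork + 1                      ! the "real work" of the sub-step
      if (ksub == kfail) nstop = nstop + 1   ! the injected error
   end subroutine toy_stp

   subroutine run_child_substeps(krefine, kfail, ld_guard, kwork, kstop)
      integer, intent(in)  :: krefine   ! temporal refinement factor of the child grid
      integer, intent(in)  :: kfail     ! sub-step at which the error is injected
      logical, intent(in)  :: ld_guard  ! enable or disable the early return
      integer, intent(out) :: kwork     ! sub-steps that did work
      integer, intent(out) :: kstop     ! final value of the error counter
      integer :: ji
      nstop = 0 ; nwork = 0
      do ji = 1, krefine                ! the driver always issues every call
         call toy_stp(ji, kfail, ld_guard)
      end do
      kwork = nwork ; kstop = nstop
   end subroutine run_child_substeps

end module toy_substep
```

With a refinement factor of 3 and an error injected at sub-step 2, calling run_child_substeps(3, 2, .false., kwork, kstop) returns kwork = 3 (the third sub-step still runs), whereas ld_guard = .true. returns kwork = 2. In both cases kstop is 1, so the caller can still see that an error occurred.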
2) I think that, if an error was detected, we should not call Agrif_update_all( ), in order to limit additional errors.
So add a test on nstop in the following lines:
    IF( Agrif_NbStepint() == 0 .AND. nstop == 0 ) THEN
       CALL Agrif_update_all( ) ! Update all components
    ENDIF
3) When using AGRIF, we should also add a message in the ocean.output file saying that, if no clear error message is visible, the user should also look at the children's *_ocean.output files, as the error may come from one of the child grids.
Commit History (1)
Changeset | Author | Time | ChangeLog |
---|---|---|---|
12650 | smasson | 2020-04-03T09:27:30+02:00 | trunk: nstop>0 on a child grid is ok, see #2420 |
Change History (2)
comment:1 Changed 5 years ago by smasson
comment:2 Changed 5 years ago by smasson
- Resolution set to fixed
- Status changed from new to closed
After agreement with Rachid, fixed in [12650]
Two comments:
- nstop is shared among parent and child grids even though there is no agrif_do_not_treat, because nstop is listed in https://forge.ipsl.jussieu.fr/nemo/browser/vendors/AGRIF/dev/agrif_oce.in (Thank you Rachid!)
- the third point will be done in #2418
This commit passes all SETTE tests and gives the same results as [12615]
    -bash-4.2$ ./sette_rpt.sh 12642+
    Current code is : NEMO/trunk @ r12649 ( last change @ r12649 )
    SETTE validation report generated for :
    NEMO/trunk @ r12642+ (last changed revision)
    on X64_IRENE arch file
    !!---------------1st pass------------------!!
    !----restart----!
    WGYRE_PISCES_ST run.stat restartability passed : 12642+
    WGYRE_PISCES_ST tracer.stat restartability passed : 12642+
    WORCA2_ICE_PISCES_ST run.stat restartability passed : 12642+
    WORCA2_ICE_PISCES_ST tracer.stat restartability passed : 12642+
    WORCA2_OFF_PISCES_ST tracer.stat restartability passed : 12642+
    WAMM12_ST run.stat restartability passed : 12642+
    WORCA2_SAS_ICE_ST run.stat restartability passed : 12642+
    WAGRIF_DEMO_ST run.stat restartability passed : 12642+
    WSPITZ12_ST run.stat restartability passed : 12642+
    WISOMIP_ST run.stat restartability passed : 12642+
    WOVERFLOW_ST run.stat restartability passed : 12642+
    WLOCK_EXCHANGE_ST run.stat restartability passed : 12642+
    WVORTEX_ST run.stat restartability passed : 12642+
    WICE_AGRIF_ST run.stat restartability passed : 12642+
    !----repro----!
    WGYRE_PISCES_ST run.stat reproducibility passed : 12642+
    WGYRE_PISCES_ST tracer.stat reproducibility passed : 12642+
    WORCA2_ICE_PISCES_ST run.stat reproducibility passed : 12642+
    WORCA2_ICE_PISCES_ST tracer.stat reproducibility passed : 12642+
    WORCA2_OFF_PISCES_ST tracer.stat reproducibility passed : 12642+
    WAMM12_ST run.stat reproducibility passed : 12642+
    WORCA2_SAS_ICE_ST run.stat reproducibility passed : 12642+
    WORCA2_ICE_OBS_ST run.stat reproducibility passed : 12642+
    WAGRIF_DEMO_ST run.stat reproducibility passed : 12642+
    WSPITZ12_ST run.stat reproducibility passed : 12642+
    WISOMIP_ST run.stat reproducibility passed : 12642+
    WVORTEX_ST run.stat reproducibility passed : 12642+
    WICE_AGRIF_ST run.stat reproducibility passed : 12642+
    !----agrif check----!
    ORCA2 AGRIF vs ORCA2 NOAGRIF run.stat unchanged - passed : 12642+ 12642+
    !----result comparison check----!
    check result differences between :
    VALID directory : /ccc/scratch/cont005/ra0542/massons/trunk/NEMO_VALIDATION at rev 12642+
    and
    REFERENCE directory : /ccc/work/cont005/ra0542/massons/NEMO_ALL_VALIDATIONS/trunk/NEMO_VALIDATION at rev 12615
    WGYRE_PISCES_ST run.stat files are identical
    WGYRE_PISCES_ST tracer.stat files are identical
    WORCA2_ICE_PISCES_ST run.stat files are identical
    WORCA2_ICE_PISCES_ST tracer.stat files are identical
    WORCA2_OFF_PISCES_ST tracer.stat files are identical
    WAMM12_ST run.stat files are identical
    WISOMIP_ST run.stat files are identical
    WORCA2_SAS_ICE_ST run.stat files are identical
    WAGRIF_DEMO_ST run.stat files are identical
    WSPITZ12_ST run.stat files are identical
    WISOMIP_ST run.stat files are identical
    WVORTEX_ST run.stat files are identical
    WICE_AGRIF_ST run.stat files are identical
In 12650: