Status of work on dev_merge_2017 branch
Contents:
- 26/03/2018
- 16 February 2018 : METO: eORCA1 and eORCA025 occasional failures with NaNs (SOLVED in r9415)
- 12 February 2018 : ORCA2_LIM3_PISCES with icebergs is not reproducible
- 12 February 2018 : ORCA2_LIM3_PISCES reproducibility problem
- 02 February 2018: crash ORCA2_LIM3_PISCES (SOLVED)
- 01 February 2018: reproducibility problems come from CNRS branch
- 30 January 2018: NOC: eORCA1 testing: Pb with Weddell Sea Polynya
- 12 January 2018: Pb in ORCA2_LIM3_OBS (SOLVED)
- 12 January 2018: Pb ORCA2OFFPIS (SOLVED)
- 12 January 2018: Pb lib_mpp ? (SOLVED in r9425)
- 11 January 2018: pb with dynspg_ts, time splitting (SOLVED)
- 11 January 2018 : AGRIF not restartable nor reproducible for now with sea-ice
- SETTE test results
The 2017/dev_merge_2017 branch has been created by merging all the 2017 developments during the Merge Party in Exeter. This branch is expected to be merged back into the trunk, as the preliminary version of the future NEMO release, as soon as possible in 2018. This page allows developers to share information and results on the ongoing testing work on this branch.
For each new subject, please follow the format below (== for a level-2 title) so that it is listed at the top of the page, and add SOLVED to the title once the problem is fixed.
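For example, a new entry in the wiki source could be introduced with a level-2 heading of the form == &lt;date&gt; : &lt;short description of the problem&gt; (SOLVED) == (placeholders to be replaced; the SOLVED tag, and a revision number such as r9415 if relevant, being added once the fix is committed).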
26/03/2018
Clément: ORCA2 8*8*60 crashes
ORCA2LIMPIS: no icb, fsbc=5, intel/openmpi -O0: repro OK over 1000 timesteps
ORCA2LIMPIS: no icb, fsbc=1, intel/openmpi -O0: repro OK, but crash at 850 timesteps
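Note: "fsbc" above presumably refers to the surface boundary condition computation frequency nn_fsbc (assumed here to be the usual namelist parameter in &namsbc); a minimal sketch of the setting used in the second test:
{{{
&namsbc
   nn_fsbc = 1   ! frequency of surface boundary condition computation, in time steps
                 ! (the two tests above were run with nn_fsbc = 5 and nn_fsbc = 1)
/
}}}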
16 February 2018 : METO: eORCA1 and eORCA025 occasional failures with NaNs (SOLVED in r9415)
I'm getting occasional failures with a segmentation fault in icb_ground in eORCA1 and eORCA025 built from the latest dev_merge_2017 branch (after several months of integration). I think icb_ground is acting as an error trap for NaN values produced in ice_dyn_rhg_evp. I will try to trace it back further next week. (Dave Storkey).
Update 21 Feb: This looks like some kind of instability which doesn't get picked up by the checks in stp_ctl. There is very thick ice (~100m) and a negative top-level thickness (e3t) at a single point, which I think results in a divide-by-zero in sbc_isf_div. It looks different from the instability in the 300-timestep ORCA2LIM3PIS_ST run in SETTE, which doesn't show any abnormally thick ice.
Update 21 March: the ice thickness is not capped from below by rn_himin before the thermodynamics is run. For very small ice thicknesses, the ice diffusion blows up.
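A minimal sketch of the kind of pre-thermodynamics capping implied above (illustrative only, not the actual NEMO source: the variable names h_i, a_i and jpl are assumed; rn_himin is the minimum ice thickness namelist parameter):
{{{
! Sketch: enforce the minimum ice thickness rn_himin before the thermodynamics /
! diffusion is called, so that vanishingly thin ice cannot make the diffusion blow up.
DO jl = 1, jpl                         ! loop over ice categories (jl, jpl assumed)
   WHERE( a_i(:,:,jl) > 0._wp )        ! only where ice is present
      h_i(:,:,jl) = MAX( h_i(:,:,jl), rn_himin )
   END WHERE
END DO
}}}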
12 February 2018 : ORCA2_LIM3_PISCES with icebergs is not reproducible
12 February 2018 : ORCA2_LIM3_PISCES reproducibility problem
This continues the point from 1 February and summarizes the work done and the mail exchanges between Clement Bricaud, Andrew Coward, Pierre Mathiot, Tomas Lovato and others.
There is currently a problem with the MPI reproducibility of ORCA2_LIM3_PISCES without icebergs at the head of the dev_merge_2017 branch. All the following tests are made on ORCA2_LIM3_PISCES without icebergs:
- head of dev_merge_2017 (add rev number): repro. test fails after xxx?? timesteps
- dev_merge_2017 rev 9018: repro. test OK after 1000 timesteps (this revision is the creation of dev_merge_2017 from the trunk, before the merge of the developments)
- dev_MERCATOR_2017 and dev_METO_2017: repro. test OK after 1000 timesteps
- dev_CNRS_2017: repro. test fails after 316 timesteps
The above results indicate that the reproducibility failure comes from the dev_CNRS_2017 branch. This branch must now be investigated to see which revision causes the failure and to find the error (now ongoing at CNRS).
02 February 2018: crash ORCA2_LIM3_PISCES (SOLVED)
A bug was found in dev_METO_MERCATOR_2017; ORCA2 now runs 1500 timesteps: https://forge.ipsl.jussieu.fr/nemo/ticket/2006#ticket
change: https://forge.ipsl.jussieu.fr/nemo/changeset?reponame=&new=9337%40branches%2F2017%2Fdev_merge_2017%2FNEMOGCM%2FNEMO&old=9328%40branches%2F2017%2Fdev_merge_2017%2FNEMOGCM%2FNEMO (commit by Jérôme at r9337)
but ORCA2 is still not reproducible
01 February 2018: reproducibility problems come from CNRS branch
- dev_merge_2017 (rev 9271): ORCA2LIM3PIS not reproducible and crashes at 179 timesteps (so restartability NOK because this test runs 150 timesteps)
- dev_METO_MERCATOR_2017: ORCA2LIM3PIS reproducible but crashes at 134 timesteps (so restartability NOK because this test runs 150 timesteps)
- dev_METO_2017: ORCA2LIM3PIS reproducible and runs 1500 timesteps
- dev_MERCATOR_2017 (rev 8976): ORCA2LIM3PIS reproducible but crashes at 132 timesteps (so restartability NOK because this test runs 150 timesteps)
- dev_MERCATOR_2017 (rev 8973): ORCA2LIM3PIS reproducible but crashes at 132 timesteps (so restartability NOK because this test runs 150 timesteps)
- dev_MERCATOR_2017 (rev 8922): ORCA2LIM3PIS reproducible and runs 1500 timesteps
30 January 2018: NOC: eORCA1 testing: Pb with Weddell Sea Polynya
George has been running an eORCA1 configuration with the dev_merge_2017 branch. The configuration runs successfully but fails after 6 years (from 1978) with issues associated with a Weddell Sea polynya. Initial impression is also that the mixed layer depths (using OSMOSIS) are shallower than in runs using the pre-merge development branch.
12 January 2018: Pb in ORCA2_LIM3_OBS (SOLVED)
SETTE, as it is, is successful with ORCA2_LIM3_OBS on the MetO machine BUT the ocean output contains an E R R O R, and only one time step is present in run.stat. The error is in dia_obs: the sea-ice model is not detected by dia_obs. I (PM) added a test in sette_rpt (r9223) to catch these cases (it looks for E R R O R in ocean.output). Not perfect (we should compare the number of lines in run.stat to nitend as well) but good enough for now I think. Solved at revision 9354.
12 January 2018: Pb ORCA2OFFPIS (SOLVED)
SETTE ORCA2OFFPIS failed to run on the MetO machine: it stops during the initialisation phase. I (PM) added a test on the presence of the configuration directory in the NEMO_VALIDATION repository to catch it (r9221).
12 January 2018: Pb lib_mpp ? (SOLVED in r9425)
Dave Storkey tested dev_merge_2017@9209. The eORCA1 job currently blows up on the first timestep with this lib_mpp issue; the eORCA025 job runs OK for a month. He thinks it might be an F-pivot/T-pivot issue.
UPDATE 15 Feb: The failure is a segmentation fault in a deallocate statement in mpp_nfd_3d_ptr in a piece of code protected by l_north_nogather=.true. If I switch ln_nnogather to .false. (default value) then it runs OK (which probably explains why George can run ORCA1).
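For reference, a sketch of the workaround mentioned above (the namelist group is assumed to be &nammpp; check the reference namelist of the revision in use):
{{{
&nammpp
   ln_nnogather = .false.   ! back to the default value: avoids the segmentation fault in the
                            ! deallocate of mpp_nfd_3d_ptr on the north-fold "no gather"
                            ! (l_north_nogather) code path
/
}}}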
11 January 2018: pb with dynspg_ts, time splitting (SOLVED)
Clément Rousset thinks there might have been a problem during the merge of developments with dynspg_ts: it looks like the drag is computed twice, so that the energy is wrong and too large. This may be the reason for the current crash of the ORCA2 configuration after about 200 timesteps (so it is not caught by the SETTE tests, which stop after 150 timesteps). Could Jérôme have a look at this?
Update 09/03/2018: the fix was already done; it was not the cause of the ORCA2 crash and non-reproducibility.
11 January 2018 : AGRIF not restartable nor reproducible for now with sea-ice
Clément Rousset and Simona are working on it, building a test case for that. It is an important issue to solve first.
Please add new subjects ABOVE this line
SETTE test results
To be completed:
Date | Revision | Who did it? | SETTE results | Comments |
Dec 2017 | end of merge party | | all ok?? | |
01/02/2018 | 9274 | Mercator (Clément B.) | GYRE OK; ORCA2LIM3PIS restart OK, repro NOK (in -O0); ORCA2_OFF_PISCES, AMM12, SAS, ISOMIP, ORCA2_LIM3_OBS OK; ORCA2AGR_ST restart OK, repro NOK; ORCA2AGUL_NOZOOM_ST/AGRIFNOZ/run.stat differs from ORCA2AGUL_NOAGR_ST/AGRIFNO/run.stat | ORCA2LIM3PIS_ST crashes at 179 timesteps |
22/02/2018 | 9354 | Met Office | For ORCA2_LIM3_PISCES need to move from the -O2 compilation option to -O0 | |
01/10/2017 | | INGV | all OK | pb to create the rebuild_nemo executable with NETCDF4 (OK with NETCDF3) |
01/10/2017 | | CMCC | all OK | |
01/30/2018 | 9286 | INGV | WORCA2LIM3PIS_ST reproducibility FAILED; WORCA2AGR_ST reproducibility FAILED; AGRIFNOZ vs AGRIFNO FAILED; WORCA2_LIM3_OBS_ST restartability NOT TESTED; tests for the other CONFIGS are OK | |