Opened 4 years ago

Closed 4 years ago

#1639 closed Task (fixed)

Reactivate Trusting on trunk prior MP2015

Reported by: nicolasmartin
Owned by: nemo
Priority: normal
Milestone: 2015 WP
Component: env
Version: trunk
Severity:
Keywords: trusting
Cc:
Review:
MP ready?:
Progress:

Description

All the trusting lights have turned red since the merge of the simplification branch dev_r5721_CNRS9_NOC3_LDF.
Now that the trunk is frozen for the Merge Party (revision 5936), it would be useful to fix this, or at least to identify where the potential issues are.

Commit History (0)

(No commits)

Change History (3)

comment:1 Changed 4 years ago by nicolasmartin

Regarding the 2 versions of the code (3.6 & trunk) and, for now, the 3 configurations (ORCA2, ORCA1 & AMM12) followed by the tool, I have summarized the main events for each combination:

  • 3.6 (commit 5519)
  • trunk: the status has been on hold since the merge of the simplification branch; the change of tracer advection scheme from TVD to FCT seems to be the only modification
    • ORCA2: the current benchmark dates from the AGRIF merge with its minor corrections 5798 & 5803.
    • ORCA1 (SHACONEMO 45): the configuration has been down since the merge, and I am not able to get it working myself
    • AMM12: no changes since the 3.6 release except the last merge

Cdo diffn on ORCA2 results:

O2LP_trust_00000150_restart.nc            : no record differ

O2LP_trust_00000150_restart_ice.nc        : 49 of 53 records differ
Parameter names : frld fsbbq hicif hsnif qstoif sist stress1 stress12 stress2 sxa sxc0 sxc1 sxc2 sxice sxsn sxst sxxa sxxc0 sxxc1 sxxc2 sxxice sxxsn sxxst sxya sxyc0 sxyc1 sxyc2 sxyice sxysn sxyst sya syc0 syc1 syc2 syice sysn syst syya syyc0 syyc1 syyc2 syyice syysn syyst tbif1 tbif2 tbif3 u v

O2LP_trust_00000150_restart_trc.nc (Ada)  : 1495 of 1549 records differ,  1267 of 1549 records differ more than 0.001
O2LP_trust_00000150_restart_trc.nc (Curie): 1495 of 1549 records differ,  941 of 1549 records differ more than 0.001
Parameter names : PH sbc_Alkalini sbc_BFe sbc_CaCO3 sbc_DCHL sbc_DFe sbc_DIC sbc_DOC sbc_DSi sbc_Fer sbc_GOC sbc_GSi sbc_NCHL sbc_NFe sbc_NH4 sbc_NO3 sbc_O2 sbc_PHY sbc_PHY2 sbc_PO4 sbc_POC sbc_SFe sbc_Si sbc_ZOO sbc_ZOO2 Silicamax TRBAlkalini TRBBFe TRBCaCO3 TRBDCHL TRBDFe TRBDIC TRBDOC TRBDSi TRBFer TRBGOC TRBGSi TRBNCHL TRBNFe TRBNH4 TRBNO3 TRBO2 TRBPHY TRBPHY2 TRBPO4 TRBPOC TRBSFe TRBSi TRBZOO TRBZOO2 TRNAlkalini TRNBFe TRNCaCO3 TRNDCHL TRNDFe TRNDIC TRNDOC TRNDSi TRNFer TRNGOC TRNGSi TRNNCHL TRNNFe TRNNH4 TRNNO3 TRNO2 TRNPHY TRNPHY2 TRNPO4 TRNPOC TRNSFe TRNSi TRNZOO TRNZOO2

Cdo diffn on AMM12 results:

cdo diffn (Abort): Input streams have different number of variables per timestep!
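
The summary lines above follow a regular layout, so they can be tallied automatically. The following is a minimal sketch, assuming the line format seen in this ticket (it is not taken from the cdo documentation); `parse_summary` and the returned field names are hypothetical and are not part of the trusting tool.

```python
import re

# Pattern for the summary lines quoted in this ticket (assumed layout):
#   <file> [(<host>)] : <n> of <m> records differ[, <k> of <m> records differ more than <tol>]
#   <file> [(<host>)] : no record differ
SUMMARY = re.compile(
    r"^(?P<file>\S+)(?:\s+\((?P<host>\w+)\))?\s*:\s*"
    r"(?:no record differ"
    r"|(?P<ndiff>\d+) of (?P<total>\d+) records differ"
    r"(?:,\s*(?P<nbig>\d+) of \d+ records differ more than (?P<tol>[\d.eE+-]+))?"
    r")\s*$"
)


def parse_summary(line):
    """Return a dict describing one summary line, or None if it does not match."""
    m = SUMMARY.match(line.strip())
    if m is None:
        return None
    return {
        "file": m.group("file"),
        "host": m.group("host"),  # e.g. "Ada" or "Curie", None when absent
        "differ": int(m.group("ndiff") or 0),  # 0 for "no record differ"
        "total": int(m.group("total")) if m.group("total") else None,
        "above_tol": int(m.group("nbig")) if m.group("nbig") else None,
    }
```

Lines that do not fit the pattern, such as the cdo abort message for AMM12, return `None` and can be reported separately.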

comment:2 Changed 4 years ago by nicolasmartin

Following the point discussed during the last NEMO ST VC, here is an update on the status of the trusting on 3.6 after the merge of the MP temporary branch at commit 6205 on 4th of January (each result is compared to the last benchmark):

  • ORCA2

New results from commit 6113 "Langmuir cell parameterization should not be used below sea ice" by Clément on 18th December

ocean.output                            : abs(U) max:   0.2870626203516  | abs(U) max:   0.2870626203508
                                        :    SSS min:   4.52066224011752 |    SSS min:   4.52048581417686
                                        : ----TRACER STAT----
solver.stat (1st diff)                  : it:       1 iter: 310 r: 0.2746742957E-09 b: 0.3984952235E+03 | it:       1 iter: 310 r: 0.2746814246E-09 b: 0.3984949664E+03
tracer.stat (1st diff)                  : 3  0.5506961168E-02 | 3  0.5506961170E-02
O2LP_trust_00000150_restart.nc          : 609 of 645 records differ,  559 of 645 records differ more than 0.001
O2LP_trust_00000150_restart_ice.nc      : 49 of 53 records differ
O2LP_trust_00000150_restart_trc.nc      : 1495 of 1549 records differ,  930 of 1549 records differ more than 0.001

New results from MP merge

ocean.output                            : abs(U) max:   0.2870626203516  | abs(U) max:   0.2870626203508
                                        :    SSS min:   4.52065843917204 |    SSS min:   4.52048581417686
                                        : ----TRACER STAT----
solver.stat (1st diff)                  : it:       1 iter: 310 r: 0.2746742957E-09 b: 0.3984952235E+03 | it:       1 iter: 310 r: 0.2746814246E-09 b: 0.3984949664E+03
tracer.stat (1st diff)                  : 2  0.5506960123E-02 | 2  0.5506960124E-02
O2LP_trust_00000150_restart.nc          : 610 of 645 records differ,  560 of 645 records differ more than 0.001
O2LP_trust_00000150_restart_ice.nc      : 49 of 53 records differ
O2LP_trust_00000150_restart_trc.nc      : 1495 of 1549 records differ,  959 of 1549 records differ more than 0.001

Regarding the forcing archive ORCA2_LIM_nemo_v3.6.tar, 2 files have been added ('sic_01.nc' & 'vel_01.nc')

  • ORCA1

New results from commit 6113

ocean.output                            : abs(U) max:   0.3235192112219 | abs(U) max:   0.2391563786014
                                        : ----TRACER STAT----
solver.stat (1st diff)                  : it :       1 ssh2: 0.6878043461E+03 Umax: 0.3235192112E+00 S | it :       1 ssh2: 0.6878043461E+03 Umax: 0.2391563786E+00 S
tracer.stat (1st diff)                  : 3  0.5504508868E-02 | 3  0.5504508867E-02
O1L3P_trust_00000240_restart.nc         : 1629 of 1902 records differ,  1404 of 1902 records differ more than 0.001
O1L3P_trust_00000240_restart_ice.nc     : 257 of 261 records differ
O1L3P_trust_00000240_restart_trc.nc     : 2887 of 3705 records differ,  2011 of 3705 records differ more than 0.001

New results from MP merge, different behavior compared to ORCA2 (no change on restart.nc & restart_trc.nc)

ocean.output                            : abs(U) max:   0.3235192112219 | abs(U) max:   0.2391563786014
                                        : ----TRACER STAT----
solver.stat (1st diff)                  : it :       1 ssh2: 0.6878043461E+03 Umax: 0.3235192112E+00 S | it :       1 ssh2: 0.6878043461E+03 Umax: 0.2391563786E+00 S
tracer.stat (1st diff)                  : 3  0.5504508868E-02 | 3  0.5504508867E-02
O1L3P_trust_00000240_restart.nc         : no record differ
O1L3P_trust_00000240_restart_ice.nc     : 257 of 261 records differ
O1L3P_trust_00000240_restart_trc.nc     : no record differ

  • AMM12

New results from commit 6036 "Fix for ticket #1617" by Tim on 11th December

AMM12_trust_00000576_restart_oce_out.nc : 103 of 1089 records differ

New results from MP merge (no difference with commit 6036)

AMM12_trust_00000576_restart_oce_out.nc : 103 of 1089 records differ
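
To put the ocean.output differences reported in this comment in perspective, the value pairs can be reduced to relative differences. This is a rough sketch only: `rel_diff` is a hypothetical helper, the symmetric normalisation is my own choice, and the numbers are copied from the listings above without assuming which column is the benchmark.

```python
def rel_diff(a, b):
    """Symmetric relative difference |a - b| / max(|a|, |b|); 0.0 if both are zero."""
    scale = max(abs(a), abs(b))
    return abs(a - b) / scale if scale else 0.0


# Value pairs copied from the ocean.output lines quoted in this comment.
pairs = {
    "ORCA2 abs(U) max": (0.2870626203516, 0.2870626203508),
    "ORCA2 SSS min (commit 6113)": (4.52066224011752, 4.52048581417686),
    "ORCA1 abs(U) max": (0.3235192112219, 0.2391563786014),
}

for name, (left, right) in pairs.items():
    print(f"{name:30s}: {rel_diff(left, right):.3e}")
```

Under these assumptions the ORCA2 velocity difference sits near round-off level, whereas the ORCA1 velocity maximum differs by roughly a quarter of its value, which is consistent with treating the two cases differently when updating the benchmarks.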


Well, if no one has anything to say about the previous results, I will update the benchmark for each configuration after the upcoming Developers Committee.

comment:3 Changed 4 years ago by nicolasmartin

  • Resolution set to fixed
  • Status changed from new to closed

With difficulties and delays, I have reinstated the trusting on the trunk on the 2 French HPC systems (Ada & Curie), for now with ORCA2LIM2, ORCA1LIM3 and AMM12.
For 3.6 stable, the trusting has been running continuously for the same configurations since the start-up of each monitoring.
There is still an open topic on the result changes related to the computing optimizations of a new release of the Intel environment on Ada (#1710).

In the short term, we plan to upgrade from XIOS 1 to XIOS 2 on the trunk when the conditions are met (keeping XIOS 1 with 3.6) and to add the remaining reference configurations little by little.
In the long term, I have a development action to improve its diagnostics and to turn it into a more mature tool like SETTE (reproducibility & restartability tests).

Last but not least, I urge the developers to keep an eye on the trusting results, especially just after their commits. Of course, this would be unnecessary if all the tests had been conducted beforehand.
