Opened 4 months ago

Last modified 9 days ago

#2605 new Task

KNL-01_Sibylle_RK3_stage1

Reported by: techene Owned by: techene
Priority: high Milestone: IMMERSE 2021
Component: MULTIPLE Version: trunk
Severity: minor Keywords: time-stepping, RK3
Cc: Branch review:
MP ready?: Task progress:

Description

Workplan action

Wikipage: wiki:2021WP/KNL-01_Sibylle_RK3_stage1

Commit History (14)

ChangesetAuthorTimeChangeLog
14808jchanut2021-05-07T14:00:45+02:00

#2605: reproducibility issue with RK3: hdiv

14800jchanut2021-05-06T17:42:46+02:00

#2605: Main changes for a straightforward use of AGRIF with RK3

14799jchanut2021-05-06T15:57:06+02:00

#2605: Correct Vortex setup by reverting some changes introduced @r14223

14784techene2021-05-04T21:08:42+02:00

#2605 : enable diahsb and fix rnf forcing accordingly

14782acc2021-05-04T17:15:21+02:00

Branch dev_r14318_RK3_stage1: minor changes to fix compilation errors with VORTEX test case. stp2d.F90 and stprk3_stg.F90 only. #2605

14604techene2021-03-09T14:12:13+01:00

#2605 : allow time filtering of barotropic variables

14587techene2021-03-04T18:51:35+01:00

#2605 : cosmetic only

14549techene2021-02-26T11:03:18+01:00

#2605 small bug correction in ln_linssh=T

14548techene2021-02-25T18:16:27+01:00

#2605 small bug correction

14547techene2021-02-25T18:07:15+01:00

#2605 RK3 time-stepping on OCE only (run on GYRE_PISCES without key_top)

14428techene2021-02-10T19:12:36+01:00

#2605 loop optimisation from gm

14418techene2021-02-08T21:55:49+01:00

#2605 correct the truncature change of ww induced by diawri when ln_zad_Aimp=T along a time step

14381techene2021-02-03T13:36:25+01:00

#2605 : add advection velocity as input in advective flux form routines

14353techene2021-01-27T15:57:05+01:00

#2605 branch for RK3 time-stepping

Change History (20)

comment:1 Changed 4 months ago by techene

In 14353:

#2605 branch for RK3 time-stepping

comment:2 Changed 3 months ago by techene

In 14381:

#2605 : add advection velocity as input in advective flux form routines

comment:3 Changed 3 months ago by techene

In 14418:

#2605 correct the truncature change of ww induced by diawri when ln_zad_Aimp=T along a time step

comment:4 Changed 3 months ago by techene

In 14428:

#2605 loop optimisation from gm

comment:5 Changed 3 months ago by techene

In 14547:

#2605 RK3 time-stepping on OCE only (run on GYRE_PISCES without key_top)

comment:6 Changed 3 months ago by techene

In 14548:

#2605 small bug correction

comment:7 Changed 3 months ago by techene

In 14549:

#2605 small bug correction in ln_linssh=T

comment:8 Changed 3 months ago by techene

In 14587:

#2605 : cosmetic only

comment:9 Changed 2 months ago by techene

In 14604:

#2605 : allow time filtering of barotropic variables

comment:10 Changed 2 weeks ago by acc

In 14782:

Branch dev_r14318_RK3_stage1: minor changes to fix compilation errors with VORTEX test case. stp2d.F90 and stprk3_stg.F90 only. #2605

comment:11 Changed 2 weeks ago by techene

In 14784:

#2605 : enable diahsb and fix rnf forcing accordingly

comment:12 Changed 12 days ago by jchanut

In 14799:

#2605: Correct Vortex setup by reverting some changes introduced @r14223

comment:13 Changed 12 days ago by jchanut

In 14800:

#2605: Main changes for a straightforward use of AGRIF with RK3

comment:14 Changed 11 days ago by acc

SETTE update: After disappearing down a few rabbit-holes, I've finally traced the reproducibility problems to inconsistent ww values in the halo at RK3 stage3. The halo value is used in dyn_zad when averaging to uw-, vw- points; and this leads to the differences. My, simple-minded solution is to add an lbc_lnk:

  • stprk3_stg.F90

     
    165165      ! 
    166166!===>>>>>> Modify dyn_adv_... dyn_keg routines so that Krhs to zero useless 
    167167      !                                         ! advection (VF or FF)  ==> RHS 
     168      CALL lbc_lnk_multi( 'stp_RK3_stg', ww(:,:,:), 'T', 1. ) 
    168169      CALL dyn_adv( kstp, Kbb, Kmm      , uu, vv, Krhs, zaU, zaV, ww ) 
    169170      !                                         ! Coriolis / vorticity  ==> RHS 
    170171      CALL dyn_vor( kstp,      Kmm      , uu, vv, Krhs ) 

but there may be other solutions (and there may be more to do if ln_zad_Aimp is active). This fixes reproducibility for AMM12 and VORTEX (which are the only two, currently active tests). Interestingly, it also fixes restartability for LOCK_EXCHANGE; but not the others:

Using DEUG settings
   !----restart----!
WAMM12_ST                    run.stat    restartability  FAILED :  14784+  (results are different after  7      time steps)
WOVERFLOW_ST                 run.stat    restartability  FAILED :  14784+  (results are different after  8      time steps)
WLOCK_EXCHANGE_ST            run.stat    restartability  passed :  14784+
WVORTEX_ST                   run.stat    restartability  FAILED :  14784+  (results are different after  7      time steps)

   !----repro----!
WAMM12_ST                    run.stat    reproducibility passed :  14784+
WVORTEX_ST                   run.stat    reproducibility passed :  14784+

I'll move on to looking at restartability and won't submit a change until alternative solutions have been ruled out.

comment:15 Changed 11 days ago by jchanut

In 14808:

#2605: reproducibility issue with RK3: hdiv

comment:16 Changed 11 days ago by acc

SETTE update: Confirmation that changeset:14808 more elegantly fixes the reproducibility error described in comment 15:

Using DEBUG settings
   !----restart----!
WAMM12_ST                    run.stat    restartability  FAILED :  14808  (results are different after  7      time steps)
WOVERFLOW_ST                 run.stat    restartability  FAILED :  14808  (results are different after  8      time steps)
WLOCK_EXCHANGE_ST            run.stat    restartability  passed :  14808
WVORTEX_ST                   run.stat    restartability  FAILED :  14808  (results are different after  7      time steps)

   !----repro----!
WAMM12_ST                    run.stat    reproducibility passed :  14808
WVORTEX_ST                   run.stat    reproducibility passed :  14808

comment:17 Changed 11 days ago by acc

SETTE update: No answer to the restartability issue yet but I'm puzzled by the dom_qco initialisation. AMM12_ST/SHORT/ocean.output, for example, reports:

 rst_read_ssh : ssh initialization
 ~~~~~~~~~~~~

       Kbb sea surface height read in the restart file
           read sshb (rec:      1) in ./AMM12_LONG_00000006_restart_0000.nc ok

 dom_qco_init : Variable volume activated
 ~~~~~~~~~~~~

but dom_qco_init needs sshb and sshn to set all the r3 fields. I'm puzzled because attempts to fix this by including sshn in the restart don't seem to make any difference.

comment:18 Changed 10 days ago by acc

SETTE update Sibylle confirms that only sshb/r3t(:,:,Kbb) is required for RK3; the initialisation of r3t(:,:,Kmm) in dom_qco_init is to support MLF and is redundant/harmless for RK3. Further tracking of restart issues suggests there are compounded causes:

  • First up is a small difference in uu_b and vv_b between continuous and restarted runs at the start of the restart step. All inputs to the calculation match between the restarted job and the continuous case at the same stage. Possibly this is just a result of the difference in the order of numerical operations? Adding uu_b and vv_b to the restart and disabling their recalculation in istate.F90, removes this difference. The restart error is slightly reduced but not by much.
  • The next discovered difference is in the ts trends after tra_sbc at stage 2. tra_sbc does very little at stage 2. As a test, disabling the updates that use sbc_tsc and rnf_tsc removes any difference. Deciding what is the correct solution is trickier. Note, though that there may be some mixup with the correct before flux for the restart since values appear to be written/overwritten at each stage:
grep sbc_sc_b ocean.output
           read sbc_sc_b (rec:      1) in ./AMM12_LONG_00000006_restart_0000.nc ok
           read sbc_sc_b (rec:      1) in ./AMM12_LONG_00000006_restart_0000.nc ok
           read sbc_sc_b (rec:      1) in ./AMM12_LONG_00000006_restart_0000.nc ok
           iom_nf90_rp0123d, file: ./AMM12_SHORT_00000012_restart_0000.nc, var: sbc_sc_b defined ok
           iom_nf90_rp0123d, file: ./AMM12_SHORT_00000012_restart_0000.nc, var: sbc_sc_b written ok
           iom_nf90_rp0123d, file: ./AMM12_SHORT_00000012_restart_0000.nc, var: sbc_sc_b written ok
           iom_nf90_rp0123d, file: ./AMM12_SHORT_00000012_restart_0000.nc, var: sbc_sc_b written ok
  • Unfortunately, restartability differences persist even after temporarily. by-passing tra_sbc generated differences.

Investigations continue…

comment:19 Changed 10 days ago by acc

SETTE update Ha, turns out sshn IS required. The final difference (detectable in tracer trends after tra_ldf) turns out to be due to slope differences traced back to ldf_slp and eos_rab. This is because buried in these routines is the use of gdept(:,:,:,:,Kmm). This dependency could be removed for eos_rab and bn2 by using Nbb as the last argument instead of Nnn. However, the use of Kmm is more deeply ingrained in ldf_slp (possibly a trunk-based bug?) and the consequences of changing this are less obvious. I leave this to the real experts. Meanwhile, here are the uncommitted changes that can restore restartability:

  1. Additions to restart (need to revisit the uu_b, vv_b case to see if the small differences were significant now the larger errors have been found)
    • DOM/istate.F90

       
      148148      ! Do it whatever the free surface method, these arrays being eventually used 
      149149      ! 
      150150      uu_b(:,:,Kmm) = 0._wp   ;   vv_b(:,:,Kmm) = 0._wp 
      151       uu_b(:,:,Kbb) = 0._wp   ;   vv_b(:,:,Kbb) = 0._wp 
       151      !!uu_b(:,:,Kbb) = 0._wp   ;   vv_b(:,:,Kbb) = 0._wp 
      152152      ! 
      153153!!gm  the use of umsak & vmask is not necessary below as uu(:,:,:,Kmm), vv(:,:,:,Kmm), uu(:,:,:,Kbb), vv(:,:,:,Kbb) are always masked 
      154154      DO_3D( 1, 1, 1, 1, 1, jpkm1 ) 
       
      155155         uu_b(ji,jj,Kmm) = uu_b(ji,jj,Kmm) + e3u(ji,jj,jk,Kmm) * uu(ji,jj,jk,Kmm) * umask(ji,jj,jk) 
      156156         vv_b(ji,jj,Kmm) = vv_b(ji,jj,Kmm) + e3v(ji,jj,jk,Kmm) * vv(ji,jj,jk,Kmm) * vmask(ji,jj,jk) 
      157157         ! 
      158          uu_b(ji,jj,Kbb) = uu_b(ji,jj,Kbb) + e3u(ji,jj,jk,Kbb) * uu(ji,jj,jk,Kbb) * umask(ji,jj,jk) 
      159          vv_b(ji,jj,Kbb) = vv_b(ji,jj,Kbb) + e3v(ji,jj,jk,Kbb) * vv(ji,jj,jk,Kbb) * vmask(ji,jj,jk) 
       158         !!uu_b(ji,jj,Kbb) = uu_b(ji,jj,Kbb) + e3u(ji,jj,jk,Kbb) * uu(ji,jj,jk,Kbb) * umask(ji,jj,jk) 
       159         !!vv_b(ji,jj,Kbb) = vv_b(ji,jj,Kbb) + e3v(ji,jj,jk,Kbb) * vv(ji,jj,jk,Kbb) * vmask(ji,jj,jk) 
      160160      END_3D 
      161161      ! 
      162162      uu_b(:,:,Kmm) = uu_b(:,:,Kmm) * r1_hu(:,:,Kmm) 
      163163      vv_b(:,:,Kmm) = vv_b(:,:,Kmm) * r1_hv(:,:,Kmm) 
      164164      ! 
      165       uu_b(:,:,Kbb) = uu_b(:,:,Kbb) * r1_hu(:,:,Kbb) 
      166       vv_b(:,:,Kbb) = vv_b(:,:,Kbb) * r1_hv(:,:,Kbb) 
       165      !!uu_b(:,:,Kbb) = uu_b(:,:,Kbb) * r1_hu(:,:,Kbb) 
       166      !!vv_b(:,:,Kbb) = vv_b(:,:,Kbb) * r1_hv(:,:,Kbb) 
      167167      ! 
      168168   END SUBROUTINE istate_init 
    • IOM/restart.F90

       
      162162      ! 
      163163      IF( .NOT.ln_diurnal_only ) THEN 
      164164         CALL iom_rstput( kt, nitrst, numrow, 'sshb'   , ssh(:,:        ,Kbb) )     ! before fields 
       165         CALL iom_rstput( kt, nitrst, numrow, 'sshn'   , ssh(:,:        ,Kmm) )     ! before fields 
      165166         CALL iom_rstput( kt, nitrst, numrow, 'ub'     , uu(:,:,:       ,Kbb) ) 
      166167         CALL iom_rstput( kt, nitrst, numrow, 'vb'     , vv(:,:,:       ,Kbb) ) 
       168         CALL iom_rstput( kt, nitrst, numrow, 'uu_b'   , uu_b(:,:       ,Kbb) )     ! before fields 
       169         CALL iom_rstput( kt, nitrst, numrow, 'vv_b'   , vv_b(:,:       ,Kbb) )     ! before fields 
      167170         IF( .NOT.lk_SWE ) THEN 
      168171            CALL iom_rstput( kt, nitrst, numrow, 'tb'  , ts(:,:,:,jp_tem,Kbb) ) 
      169172            CALL iom_rstput( kt, nitrst, numrow, 'sb'  , ts(:,:,:,jp_sal,Kbb) ) 
       
      287290      IF(lwp) WRITE(numout,*) '           Kbb u, v and T-S fields read in the restart file' 
      288291      CALL iom_get( numror, jpdom_auto, 'ub'   , uu(:,:,:       ,Kbb), cd_type = 'U', psgn = -1._wp ) 
      289292      CALL iom_get( numror, jpdom_auto, 'vb'   , vv(:,:,:       ,Kbb), cd_type = 'V', psgn = -1._wp ) 
       293      CALL iom_get( numror, jpdom_auto, 'uu_b'   , uu_b(:,:     ,Kbb), cd_type = 'U', psgn = -1._wp ) 
       294      CALL iom_get( numror, jpdom_auto, 'vv_b'   , vv_b(:,:     ,Kbb), cd_type = 'V', psgn = -1._wp ) 
       295      CALL iom_get( numror, jpdom_auto, 'vb'   , vv(:,:,:       ,Kbb), cd_type = 'V', psgn = -1._wp ) 
      290296      IF( .NOT.lk_SWE ) THEN 
      291297         CALL iom_get( numror, jpdom_auto, 'tb', ts(:,:,:,jp_tem,Kbb) ) 
      292298         CALL iom_get( numror, jpdom_auto, 'sb', ts(:,:,:,jp_sal,Kbb) ) 
       
      377383         CALL iom_get( numror, jpdom_auto, 'sshb'   , ssh(:,:,Kbb) ) 
      378384         ! 
      379385         !                                     !*  RK3: Set ssh at Kmm for AGRIF 
      380          ssh(:,:,Kmm) = 0._wp 
       386         !ssh(:,:,Kmm) = 0._wp 
       387         CALL iom_get( numror, jpdom_auto, 'sshn'   , ssh(:,:,Kmm) ) 
      381388#else 
      382389         !                                     !*  MLF: Read ssh at Kmm 
      383390         IF(lwp) WRITE(numout,*) 
  2. Temporary by-pass of tra_sbc induced differences (NOT A PROPER SOLUTION)
    • TRA/trasbc.F90

       
      161161      ! 
      162162      DO jn = 1, jpts               !==  update tracer trend  ==! 
      163163         DO_2D( 0, 0, 0, 0 ) 
      164             pts(ji,jj,1,jn,Krhs) = pts(ji,jj,1,jn,Krhs) + zfact * ( sbc_tsc_b(ji,jj,jn) + sbc_tsc(ji,jj,jn) )    & 
      165                &                                                / e3t(ji,jj,1,Kmm) 
       164!!          pts(ji,jj,1,jn,Krhs) = pts(ji,jj,1,jn,Krhs) + zfact * ( sbc_tsc_b(ji,jj,jn) + sbc_tsc(ji,jj,jn) )    & 
       165!!             &                                                / e3t(ji,jj,1,Kmm) 
      166166         END_2D 
      167167      END DO 
      168168      ! 
       
      185185#if defined key_RK3 
      186186               zdep = 1._wp / h_rnf(ji,jj) 
      187187               DO jk = 1, nk_rnf(ji,jj) 
      188                                      pts(ji,jj,jk,jp_tem,Krhs) = pts(ji,jj,jk,jp_tem,Krhs)  + rnf_tsc(ji,jj,jp_tem) * zdep 
      189                   IF( ln_rnf_sal )   pts(ji,jj,jk,jp_sal,Krhs) = pts(ji,jj,jk,jp_sal,Krhs)  + rnf_tsc(ji,jj,jp_sal) * zdep 
       188!!                                   pts(ji,jj,jk,jp_tem,Krhs) = pts(ji,jj,jk,jp_tem,Krhs)  + rnf_tsc(ji,jj,jp_tem) * zdep 
       189!!                IF( ln_rnf_sal )   pts(ji,jj,jk,jp_sal,Krhs) = pts(ji,jj,jk,jp_sal,Krhs)  + rnf_tsc(ji,jj,jp_sal) * zdep 
      190190               END DO 
      191191 
      192192#else 

With these changes I can achieve:

./sette.sh -Q -t "AMM12 OVERFLOW VORTEX LOCK_EXCHANGE"
   !----restart----!
WAMM12_ST                    run.stat    restartability  passed :  14808+
WOVERFLOW_ST                 run.stat    restartability  passed :  14808+
WLOCK_EXCHANGE_ST            run.stat    restartability  passed :  14808+
WVORTEX_ST                   run.stat    restartability  FAILED :  14808+
   !----repro----!
WAMM12_ST                    run.stat    reproducibility passed :  14808+
WVORTEX_ST                   run.stat    reproducibility passed :  14808+

Just VORTEX holding out now :)

comment:20 Changed 9 days ago by acc

SETTE update: towards a better solution for tra_sbc.

Firstly, a quick check has confirmed that uu_b and vv_b do need to be passed via the restart to ensure restartability.

Secondly, a closer look at tra_sbc suggests several issues:

  1. There is no issue with the rnf terms only the sbc_tsc terms
  2. stprk3_stg does not pass the kstg optional argument so the RK3 logic is not activated
  3. Correcting this is not enough because zfact is then inconsistent between continuous and restarted runs. In fact it is undefined for stage 2. Also the before value of sbc_tsc is written to the restart at the wrong point.

These changes (as an alternative to step 2 in comment 20) recover restartability with an active tra_sbc; but this may not be what is intended for zfact:

  • stprk3_stg.F90

     
    212212 
    213213!===>>>>>> stg1&2:  Verify the necessity of these trends (we may need it as there are in the RHS of dynspg_ts ?) 
    214214!!gm ====>>>>   needed for heat and salt fluxes associated with mass/volume flux 
    215                         CALL tra_sbc( kstp,      Kmm, ts, Krhs )   ! surface boundary condition 
     215                        CALL tra_sbc( kstp,      Kmm, ts, Krhs, kstg )   ! surface boundary condition 
    216216 
    217217      IF( ln_isf )      CALL tra_isf( kstp,      Kmm, ts, Krhs )   ! ice shelf heat flux 
    218218      IF( ln_traqsr )   CALL tra_qsr( kstp,      Kmm, ts, Krhs )   ! penetrative solar radiation qsr 
  • TRA/trasbc.F90

     
    121121      !        EMP, SFX and QNS effects 
    122122      !---------------------------------------- 
    123123      !                             !==  Set before sbc tracer content fields  ==! 
     124      zfact = 0.5_wp 
    124125      IF( kt == nit000 .AND. istg_1 == 1 ) THEN             !* 1st time-step 
    125126         IF( ln_rstart .AND. .NOT.l_1st_euler ) THEN      ! Restart: read in restart file 
    126             zfact = 0.5_wp 
    127127            IF( ntile == 0 .OR. ntile == 1 )  THEN                       ! Do only on the first tile 
    128128               IF(lwp) WRITE(numout,*) '          nit000-1 sbc tracer content field read in the restart file' 
    129129               sbc_tsc(:,:,:) = 0._wp 
     
    142142         DO_2D( isi, iei, isj, iej ) 
    143143            sbc_tsc_b(ji,jj,:) = sbc_tsc(ji,jj,:) 
    144144         END_2D 
     145         IF( ntile == 0 .OR. ntile == nijtile )  THEN                ! Do only on the last tile 
     146            IF( lrst_oce ) THEN           !==  write sbc_tsc in the ocean restart file  ==! 
     147               CALL iom_rstput( kt, nitrst, numrow, 'sbc_hc_b', sbc_tsc(:,:,jp_tem) ) 
     148               CALL iom_rstput( kt, nitrst, numrow, 'sbc_sc_b', sbc_tsc(:,:,jp_sal) ) 
     149            ENDIF 
     150         ENDIF 
    145151      ENDIF 
    146152      !                             !==  Now sbc tracer content fields  ==! 
    147153      DO_2D( isi, iei, isj, iej ) 
     
    166172         END_2D 
    167173      END DO 
    168174      ! 
    169       IF( ntile == 0 .OR. ntile == nijtile )  THEN                ! Do only on the last tile 
    170          IF( lrst_oce ) THEN           !==  write sbc_tsc in the ocean restart file  ==! 
    171             CALL iom_rstput( kt, nitrst, numrow, 'sbc_hc_b', sbc_tsc(:,:,jp_tem) ) 
    172             CALL iom_rstput( kt, nitrst, numrow, 'sbc_sc_b', sbc_tsc(:,:,jp_sal) ) 
    173          ENDIF 
    174       ENDIF 
    175175      ! 
    176176      !---------------------------------------- 
    177177      !        River Runoff effects 
Note: See TracTickets for help on using tickets.