Opened 21 months ago

Closed 10 months ago

Last modified 10 months ago

#2238 closed Bug (fixed)

icb reproducibility issue

Reported by: mathiot Owned by: mathiot
Priority: low Milestone:
Component: ICB Version: release-4.0
Severity: minor Keywords: ICB, reproductibility,
Cc:

Description

BE CAREFUL !!! Due to dynamic behaviour of this ticket creation page, it is highly recommend to set first all other fields before writing the ticket description below.
If you have lost your draft after an unwanted reload, you can click on the link 'Restore Form' in the contextual menu upper right to recover it.
Remove these lines after reading.

Context

When I seed iceberg everywhere in the southern ocean, ORCA2 (in a sette test run like) is no more reproducible.

Analysis

The main issue is that icb can be on the halo during the acceleration evaluation. During this steps, some ocean value are interpolated at the icb location. In each icb_bilin_h routine, there is this bits of code:

      ! find position in this processor. Prevent near edge problems (see #1389)
      !
      IF    ( ii < mig( 1 ) ) THEN   ;   ii = 1
      ELSEIF( ii > mig(jpi) ) THEN   ;   ii = jpi
      ELSE                           ;   ii = mi1(ii)
      ENDIF
      IF    ( ij < mjg( 1 ) ) THEN   ;   ij = 1
      ELSEIF( ij > mjg(jpj) ) THEN   ;   ij = jpj
      ELSE                           ;   ij  = mj1(ij)
      ENDIF
      !
      IF( ii == jpi )   ii = ii-1
      IF( ij == jpj )   ij = ij-1
      !
      icb_utl_bilin_h = ( pfld(ii,ij  ) * (1.-zi) + pfld(ii+1,ij  ) * zi ) * (1.-zj)   &
         &            + ( pfld(ii,ij+1) * (1.-zi) + pfld(ii+1,ij+1) * zi ) *     zj

This bits of code, basically change the location where the interpolation has to be done if the icb is on the halo and we want to interpolate variable on U/V grid.

The code as it is do not use the fact that pfld has extra halo defined in icb_copy.

Fix

There is 3 fixes possible:

  • solution 1: correct the condition above to use the extra halo (mi1(0) not defined, so need to had an extra condition):
      ! find position in this processor. Prevent near edge problems (see #1389)
      !
      IF    ( ii == mig(1)-1 ) THEN   ;   ii = 0
      ELSEIF( ii  < mig(1)-1 ) THEN   ;   ii = 0
      ELSEIF( ii  > mig(jpi) ) THEN   ;   ii = jpi
      ELSE                            ;   ii = mi1(ii)
      ENDIF
      IF    ( ij == mjg(1)-1 ) THEN   ;   ij = 0
      ELSEIF( ij  < mjg(1)-1 ) THEN   ;   ij = 0
      ELSEIF( ij  > mjg(jpj) ) THEN   ;   ij = jpj
      ELSE                            ;   ij = mj1(ij)
      ENDIF
      !
      !
      icb_utl_bilin_h = ( pfld(ii,ij  ) * (1.-zi) + pfld(ii+1,ij  ) * zi ) * (1.-zj)   &
         &            + ( pfld(ii,ij+1) * (1.-zi) + pfld(ii+1,ij+1) * zi ) *     zj
  • solution 2: ensure that each time we call icb_accel, there is no more icb over the halo. It need to split the RK4 time stepping of the icb in 4 parts and exchange icb across processor between each step.

pseudo code for one of the block:

LOOP over all the icb
   LOAD intermediate velocity/accel need for this step
   CALL icb_accel
   EVALUATE new position
   CALL icb_ground
   SAVE all intermediate position
END LOOP over all icb
CALL icb_lbc_mpp  (in addition of the usual variable, need to exchange all the intermediate variables needed for RK4)

The advantages of this are that icb_copy, icb_bilin_h can be removed from the code and no more call of lbc_lnk_icb for u/v/sst/ssh/ice var/…
The inconvenients of this are that icbdyn is slightly more complicated to read and we call more often icb_lbc_mpp

  • solution 3: using one extra halo in NEMO in general: It will allow to keep icbdyn as it is and get rid of the same bits of code mentioned in solution 2

I will build branch for solution 1 and 2. The minimum to do is solution 1.

Remaining issue: much better, but still not perfectly reproducible. With a GYRE test (initialisation with 1 iceberg in each cell), jobs 4x5 and 5x4 start to separate after 1000 steps instead of 100 with solution 1 and it is the same for solution 2.

It affect trunk and version 4.0

Commit History (14)

ChangesetAuthorTimeChangeLog
12291mathiot2019-12-30T17:43:47+01:00

fix icb reproducibility into the trunk (ticket #2238)

12275mathiot2019-12-18T16:42:19+01:00

fix icb reproducibility (ticket #2238)

12274mathiot2019-12-18T16:42:04+01:00

fix icb reproducibility (ticket #2238)

10726mathiot2019-02-27T16:06:35+01:00

changes related to bug fixes described in tickets #2228, #2229, #2238 and #1595 in NEMO 4.0

10705mathiot2019-02-20T13:22:46+01:00

delete branch fix_ticket2238_solution1 (ticket #2238)

10702mathiot2019-02-20T10:44:07+01:00

merge branch fix_ticket2238_solution1 into the trunk (ticket #2238)

10701mathiot2019-02-19T20:15:53+01:00

update branch to head of the trunk (ticket #2238)

10700mathiot2019-02-19T17:37:56+01:00

allocate/define/initialise hi_e only if key_si3 is define (ticket #2238)

10697mathiot2019-02-19T14:07:55+01:00

add if define si3 in icbini (ticket #2238)

10695mathiot2019-02-18T18:36:23+01:00

change PRINT in CALL ctl_stop (ticket #2238 solution 1)

10693mathiot2019-02-15T18:58:27+01:00

add line to move across the north fold cut line intermediate positions/velocities/accelerations (ticket #2238 solution 2)

10680mathiot2019-02-14T14:16:33+01:00

cosmetic change (ticket #2238)

10679mathiot2019-02-14T14:11:43+01:00

branch for solution 2 of ticket #2238

10673mathiot2019-02-13T14:27:25+01:00

creation of branch for ticket #2238 suggestion 1

Change History (22)

comment:1 Changed 21 months ago by mathiot

In 10673:

creation of branch for ticket #2238 suggestion 1

comment:2 Changed 21 months ago by mathiot

In 10679:

branch for solution 2 of ticket #2238

comment:3 Changed 21 months ago by mathiot

In 10680:

cosmetic change (ticket #2238)

comment:4 Changed 21 months ago by mathiot

When icb are seeded every where in the southern ocean instead of in small patch.

In the trunk
ORCA2_ICE_PISCES is not reproducible after 25 time steps

Suggested solution 1 (NEMO/branches/2019/fix_ticket2238_solution1):
ORCA2_ICE_PISCES is not reproducible after 667 time steps

Suggested solution 2 (NEMO/branches/2019/fix_ticket2238_solution2):
ORCA2_ICE_PISCES is not reproducible after 614 time steps

Solutions 1 and 2 are different after 225 time steps

Suggested solutions are different form the trunk after 25 time steps (as expected because tests with the trunk are different after 25 time steps).

I have no idea why it is still not reproducible after more than 600 time steps.

comment:5 Changed 21 months ago by mathiot

In 10693:

add line to move across the north fold cut line intermediate positions/velocities/accelerations (ticket #2238 solution 2)

comment:6 Changed 21 months ago by mathiot

Based on Andrew's suggestion, I did some extra tests. instead of seeding ICB every where in the souther ocean, I did it in the Arctic. Results are :

Solution 1: reproducible (still the same results after 990 time steps)
Solution 2: not reproducible after time step 856

Solution 1 and 2 are different after 514 time steps

After discussion with Andrew, we agree that :

  • solution 1 looks safer at this stage and is much easier to understand.(bug fixes)
  • solution 2 has an elegance to it but we are uncomfortable assessing such a big change without time to properly think through the consequences. (restructuration)
  • keep branch for solution 2 alive. Could be useful when work on adding extra haloes will be undertaken.

comment:7 Changed 21 months ago by acc

Review of changes on fix_ticket2238_solution1 branch:

First changes in icbutl.F90 remove some apparent redundancy in initialising ssh values in the extra halo to zero. Concerned that this apparent redundancy might have been there to account for land suppression cases where there is no exchange with an all-land neighbour. In such cases ssh may remain uninitialised even after abc_lnk_icb

Rest of the changes to make full use of the extra halo and set explicit precision for constants are ok.

Not so sure about the 'Should not see this' print* messages. Agree the user should not see these but if something goes wrong and they do occur the user is still unlikely to see them buried in the output and for very large configurations they could generate large volumes. Better to make a formal stop just in case.

comment:8 Changed 21 months ago by mathiot

Reply:
For 1: All ssh_e value are initialized to 0.

   #if defined key_si3
         hicth(:,:) = 0._wp ;  hicth(1:jpi,1:jpj) = hm_i (:,:) 
         ui_e(:,:) = 0._wp ;   ui_e(1:jpi, 1:jpj) = u_ice(:,:)
         vi_e(:,:) = 0._wp ;   vi_e(1:jpi, 1:jpj) = v_ice(:,:)
         !     
         ! compute ssh slope using ssh_lead if embedded
         zssh_lead_m(:,:) = ice_var_sshdyn(ssh_m, snwice_mass, snwice_mass_b)
         ssh_e(:,:) = 0._wp ;  ssh_e(1:jpi, 1:jpj) = zssh_lead_m(:,:) * tmask(:,:,1)
         !
         CALL lbc_lnk_icb( 'icbutl', hicth, 'T', +1._wp, 1, 1 )
         CALL lbc_lnk_icb( 'icbutl', ui_e , 'U', -1._wp, 1, 1 )
         CALL lbc_lnk_icb( 'icbutl', vi_e , 'V', -1._wp, 1, 1 )
   #else
         ssh_e(:,:) = 0._wp ;  ssh_e(1:jpi, 1:jpj) = ssh_m(:,:) * tmask(:,:,1)
   #endif
         CALL lbc_lnk_icb( 'icbutl', ssh_e, 'T', +1._wp, 1, 1 )

So un-initialized value seems not possible. And if there is land processor on the side, I will argue that ssh should be set to 0 as all the other variables as SST, ice thickness, ice fraction because it is the land value. Then the interpolation scheme should be clever enough to deal with land value (use it or not depending if it is a valid data or not). This is the purpose of the fix ticket #2229.

For 2: no comments required

For 3: I agree, I will add an error counter ierr and then a proper stop.

       ! find position in this processor. Prevent near edge problems (see #1389)
      !
      ierr = 0
      IF    ( ii < mig( 1 ) ) THEN   ;   ii = 1       ; ierr = ierr + 1
      ELSEIF( ii > mig(jpi) ) THEN   ;   ii = jpi     ; ierr = ierr + 1
      ELSE                           ;   ii = mi1(ii)
      ENDIF
      IF    ( ij < mjg( 1 ) ) THEN   ;   ij = 1       ; ierr = ierr + 1
      ELSEIF( ij > mjg(jpj) ) THEN   ;   ij = jpj     ; ierr = ierr + 1
      ELSE                           ;   ij  = mj1(ij)
      ENDIF
      !
      IF( ii == jpi ) THEN ; ii = ii-1 ; ierr = ierr + 1 ; END IF
      IF( ij == jpj ) THEN ; ij = ij-1 ; ierr = ierr + 1 ; END IF
      !
      IF ( ierr > 0 ) CALL ctl_stop('STOP','icb_utl_bilin_e: an icebergs coordinates is out of valid range (out of bound error)')

Using a simple CALL ctl_stop('icb_utl_bilin_e: an icebergs coordinates is out of valid range (out of bound error)') could lead the model to hang at the mppsync step because not all the processors will go through the MPI_BARRIER.

comment:9 Changed 21 months ago by mathiot

In 10695:

change PRINT in CALL ctl_stop (ticket #2238 solution 1)

comment:10 Changed 21 months ago by mathiot

With Andrew's comments on initialisation, I realised that the extended array for the icb need only to be initialized once for all to 0. in icbini.F90 and not at each time step in icbutl.F90. So, I will move the initialisation into icbini.F90 and rename hicth to hi_e to be consistent with other extended variable names.

comment:11 Changed 21 months ago by mathiot

In 10697:

add if define si3 in icbini (ticket #2238)

comment:12 Changed 21 months ago by acc

Looks ok but there appears to be an inconsistency in the treatment of ui_e, vi_e and hi_e. It was always there but is more apparent now that hi_e has been renamed. Looks as if hi_e should only be allocated and initialised if key_si3 or key_cice is defined just like ui_e and vi_e. Not critical since it is unlikely that icebergs will be used without a sea-ice model; but may as well tidy up.

comment:13 Changed 21 months ago by mathiot

In 10700:

allocate/define/initialise hi_e only if key_si3 is define (ticket #2238)

comment:14 Changed 21 months ago by mathiot

In 10701:

update branch to head of the trunk (ticket #2238)

comment:15 Changed 20 months ago by mathiot

  • Resolution set to fixed
  • Status changed from new to closed

In 10702:

merge branch fix_ticket2238_solution1 into the trunk (ticket #2238)

comment:16 Changed 20 months ago by mathiot

In 10705:

delete branch fix_ticket2238_solution1 (ticket #2238)

comment:17 Changed 20 months ago by mathiot

In 10726:

changes related to bug fixes described in tickets #2228, #2229, #2238 and #1595 in NEMO 4.0

comment:18 Changed 10 months ago by mathiot

  • Resolution fixed deleted
  • Status changed from closed to reopened

As mentioned in comments 4 (https://forge.ipsl.jussieu.fr/nemo/ticket/2238#comment:4), if the iceberg model is stress a bit (test box covering the whole southern ocean for example instead of only part of it), the results are not reproducible.

The issue comes from the fact that the order of operations to compute the melt/hflx on the NEMO grid change when iceberg moving from one processor to another. The reason for this is simply because when an iceberg move from one processor to another in one of the REPRO simulation, its order in the icb list is not the same, so the order in which the iceberg are treated in the cumulative sum to computed the total melt on the NEMO grid is different.

Suggested fix: use the function DDPDD to compute the cumulative sum.

  • icbthm.F90

     
    1919   USE lib_mpp        ! NEMO MPI routines, ctl_stop in particular 
    2020   USE phycst         ! NEMO physical constants 
    2121   USE sbc_oce 
     22   USE lib_fortran, ONLY : DDPDD 
    2223 
    2324   USE icb_oce        ! define iceberg arrays 
    2425   USE icbutl         ! iceberg utility routines 
     
    5455      REAL(wp) ::   zxi, zyj, zff, z1_rday, z1_e1e2, zdt, z1_dt, z1_dt_e1e2 
    5556      TYPE(iceberg), POINTER ::   this, next 
    5657      TYPE(point)  , POINTER ::   pt 
     58      ! 
     59      COMPLEX(wp), DIMENSION(jpi,jpj) :: cicb_melt, cicb_hflx 
    5760      !!---------------------------------------------------------------------- 
    5861      ! 
     62!! initialiaze cicb_melt and cicb_heat 
     63 
    5964      z1_rday = 1._wp / rday 
    6065      z1_12   = 1._wp / 12._wp 
    6166      zdt     = berg_dt 
     
    7075      ! calving_hflx re-used here as temporary workspace for the heat flux associated with melting 
    7176      berg_grid%calving_hflx(:,:)  = 0._wp 
    7277      ! 
     78      cicb_melt = CMPLX( 0.e0, 0.e0, wp )  
     79      cicb_hflx = CMPLX( 0.e0, 0.e0, wp )  
     80      ! 
    7381      this => first_berg 
    7482      DO WHILE( ASSOCIATED(this) ) 
    7583         ! 
     
    165173            z1_e1e2    = r1_e1e2t(ii,ij) * this%mass_scaling 
    166174            z1_dt_e1e2 = z1_dt * z1_e1e2 
    167175            zmelt    = ( zdM - ( zdMbitsE - zdMbitsM ) ) * z1_dt   ! kg/s 
    168             berg_grid%floating_melt(ii,ij) = berg_grid%floating_melt(ii,ij) + zmelt    * z1_e1e2    ! kg/m2/s 
     176            CALL DDPDD( CMPLX( zmelt * z1_e1e2, 0.e0, wp ), cicb_melt(ii,ij) ) 
    169177            !! NB. The src_calving_hflx field is currently hardwired to zero in icb_stp, which means that the 
    170178            !!     heat density of the icebergs is zero and the heat content flux to the ocean from iceberg 
    171179            !!     melting is always zero. Leaving the term in the code until such a time as this is fixed. DS. 
    172180            zheat_hcflux = zmelt * pt%heat_density       ! heat content flux : kg/s x J/kg = J/s 
    173181            zheat_latent = - zmelt * rLfus               ! latent heat flux:  kg/s x J/kg = J/s 
    174             berg_grid%calving_hflx (ii,ij) = berg_grid%calving_hflx (ii,ij) + ( zheat_hcflux + zheat_latent ) * z1_e1e2    ! W/m2 
     182            CALL DDPDD( CMPLX( ( zheat_hcflux + zheat_latent ) * z1_e1e2, 0.e0, wp ), cicb_hflx(ii,ij) ) 
    175183            CALL icb_dia_melt( ii, ij, zMnew, zheat_hcflux, zheat_latent, this%mass_scaling,       & 
    176184               &                       zdM, zdMbitsE, zdMbitsM, zdMb, zdMe,   & 
    177185               &                       zdMv, z1_dt_e1e2 ) 
     
    213221         this=>next 
    214222         ! 
    215223      END DO 
    216  
     224      ! 
     225      berg_grid%floating_melt = REAL(cicb_melt,wp)    ! kg/m2/s 
     226      berg_grid%calving_hflx  = REAL(cicb_hflx,wp) 
     227      ! 
    217228      ! now use melt and associated heat flux in ocean (or not) 
    218229      ! 
    219230      IF(.NOT. ln_passive_mode ) THEN 


Above fix works for ORCA2_ICE_PISCES with iceberg over the whole Southern Ocean

comment:19 Changed 10 months ago by mathiot

In 12274:

fix icb reproducibility (ticket #2238)

comment:20 Changed 10 months ago by mathiot

In 12275:

fix icb reproducibility (ticket #2238)

comment:21 Changed 10 months ago by mathiot

  • Resolution set to fixed
  • Status changed from reopened to closed

Fix included in 2019 merge branches in commit 12274 and 12275.

comment:22 Changed 10 months ago by mathiot

In 12291:

fix icb reproducibility into the trunk (ticket #2238)

Note: See TracTickets for help on using tickets.