#2238 closed Bug (fixed)
icb reproducibility issue
Reported by: | mathiot | Owned by: | mathiot |
---|---|---|---|
Priority: | low | Milestone: | |
Component: | ICB | Version: | v4.0 |
Severity: | minor | Keywords: | ICB reproductibility v4.0 |
Cc: |
Description
BE CAREFUL !!! Due to dynamic behaviour of this ticket creation page, it is highly recommend to set first all other fields before writing the ticket description below.
If you have lost your draft after an unwanted reload, you can click on the link 'Restore Form' in the contextual menu upper right to recover it.
Remove these lines after reading.
Context
When I seed iceberg everywhere in the southern ocean, ORCA2 (in a sette test run like) is no more reproducible.
Analysis
The main issue is that icb can be on the halo during the acceleration evaluation. During this steps, some ocean value are interpolated at the icb location. In each icb_bilin_h routine, there is this bits of code:
! find position in this processor. Prevent near edge problems (see #1389) ! IF ( ii < mig( 1 ) ) THEN ; ii = 1 ELSEIF( ii > mig(jpi) ) THEN ; ii = jpi ELSE ; ii = mi1(ii) ENDIF IF ( ij < mjg( 1 ) ) THEN ; ij = 1 ELSEIF( ij > mjg(jpj) ) THEN ; ij = jpj ELSE ; ij = mj1(ij) ENDIF ! IF( ii == jpi ) ii = ii-1 IF( ij == jpj ) ij = ij-1 ! icb_utl_bilin_h = ( pfld(ii,ij ) * (1.-zi) + pfld(ii+1,ij ) * zi ) * (1.-zj) & & + ( pfld(ii,ij+1) * (1.-zi) + pfld(ii+1,ij+1) * zi ) * zj
This bits of code, basically change the location where the interpolation has to be done if the icb is on the halo and we want to interpolate variable on U/V grid.
The code as it is do not use the fact that pfld has extra halo defined in icb_copy.
Fix
There is 3 fixes possible:
- solution 1: correct the condition above to use the extra halo (mi1(0) not defined, so need to had an extra condition):
! find position in this processor. Prevent near edge problems (see #1389) ! IF ( ii == mig(1)-1 ) THEN ; ii = 0 ELSEIF( ii < mig(1)-1 ) THEN ; ii = 0 ELSEIF( ii > mig(jpi) ) THEN ; ii = jpi ELSE ; ii = mi1(ii) ENDIF IF ( ij == mjg(1)-1 ) THEN ; ij = 0 ELSEIF( ij < mjg(1)-1 ) THEN ; ij = 0 ELSEIF( ij > mjg(jpj) ) THEN ; ij = jpj ELSE ; ij = mj1(ij) ENDIF ! ! icb_utl_bilin_h = ( pfld(ii,ij ) * (1.-zi) + pfld(ii+1,ij ) * zi ) * (1.-zj) & & + ( pfld(ii,ij+1) * (1.-zi) + pfld(ii+1,ij+1) * zi ) * zj
- solution 2: ensure that each time we call icb_accel, there is no more icb over the halo. It need to split the RK4 time stepping of the icb in 4 parts and exchange icb across processor between each step.
pseudo code for one of the block:
LOOP over all the icb LOAD intermediate velocity/accel need for this step CALL icb_accel EVALUATE new position CALL icb_ground SAVE all intermediate position END LOOP over all icb CALL icb_lbc_mpp (in addition of the usual variable, need to exchange all the intermediate variables needed for RK4)
The advantages of this are that icb_copy, icb_bilin_h can be removed from the code and no more call of lbc_lnk_icb for u/v/sst/ssh/ice var/...
The inconvenients of this are that icbdyn is slightly more complicated to read and we call more often icb_lbc_mpp
- solution 3: using one extra halo in NEMO in general: It will allow to keep icbdyn as it is and get rid of the same bits of code mentioned in solution 2
I will build branch for solution 1 and 2. The minimum to do is solution 1.
Remaining issue: much better, but still not perfectly reproducible. With a GYRE test (initialisation with 1 iceberg in each cell), jobs 4x5 and 5x4 start to separate after 1000 steps instead of 100 with solution 1 and it is the same for solution 2.
It affect trunk and version 4.0
Commit History (14)
Changeset | Author | Time | ChangeLog |
---|---|---|---|
12291 | mathiot | 2019-12-30T17:43:47+01:00 | fix icb reproducibility into the trunk (ticket #2238) |
12275 | mathiot | 2019-12-18T16:42:19+01:00 | fix icb reproducibility (ticket #2238) |
12274 | mathiot | 2019-12-18T16:42:04+01:00 | fix icb reproducibility (ticket #2238) |
10726 | mathiot | 2019-02-27T16:06:35+01:00 | changes related to bug fixes described in tickets #2228, #2229, #2238 and #1595 in NEMO 4.0 |
10705 | mathiot | 2019-02-20T13:22:46+01:00 | delete branch fix_ticket2238_solution1 (ticket #2238) |
10702 | mathiot | 2019-02-20T10:44:07+01:00 | merge branch fix_ticket2238_solution1 into the trunk (ticket #2238) |
10701 | mathiot | 2019-02-19T20:15:53+01:00 | update branch to head of the trunk (ticket #2238) |
10700 | mathiot | 2019-02-19T17:37:56+01:00 | allocate/define/initialise hi_e only if key_si3 is define (ticket #2238) |
10697 | mathiot | 2019-02-19T14:07:55+01:00 | add if define si3 in icbini (ticket #2238) |
10695 | mathiot | 2019-02-18T18:36:23+01:00 | change PRINT in CALL ctl_stop (ticket #2238 solution 1) |
10693 | mathiot | 2019-02-15T18:58:27+01:00 | add line to move across the north fold cut line intermediate positions/velocities/accelerations (ticket #2238 solution 2) |
10680 | mathiot | 2019-02-14T14:16:33+01:00 | cosmetic change (ticket #2238) |
10679 | mathiot | 2019-02-14T14:11:43+01:00 | branch for solution 2 of ticket #2238 |
10673 | mathiot | 2019-02-13T14:27:25+01:00 | creation of branch for ticket #2238 suggestion 1 |
Change History (23)
comment:1 Changed 6 years ago by mathiot
comment:2 Changed 6 years ago by mathiot
In 10679:
comment:3 Changed 6 years ago by mathiot
In 10680:
comment:4 Changed 6 years ago by mathiot
When icb are seeded every where in the southern ocean instead of in small patch.
In the trunk
ORCA2_ICE_PISCES is not reproducible after 25 time steps
Suggested solution 1 (NEMO/branches/2019/fix_ticket2238_solution1):
ORCA2_ICE_PISCES is not reproducible after 667 time steps
Suggested solution 2 (NEMO/branches/2019/fix_ticket2238_solution2):
ORCA2_ICE_PISCES is not reproducible after 614 time steps
Solutions 1 and 2 are different after 225 time steps
Suggested solutions are different form the trunk after 25 time steps (as expected because tests with the trunk are different after 25 time steps).
I have no idea why it is still not reproducible after more than 600 time steps.
comment:5 Changed 6 years ago by mathiot
In 10693:
comment:6 Changed 6 years ago by mathiot
Based on Andrew's suggestion, I did some extra tests. instead of seeding ICB every where in the souther ocean, I did it in the Arctic. Results are :
Solution 1: reproducible (still the same results after 990 time steps)
Solution 2: not reproducible after time step 856
Solution 1 and 2 are different after 514 time steps
After discussion with Andrew, we agree that :
- solution 1 looks safer at this stage and is much easier to understand.(bug fixes)
- solution 2 has an elegance to it but we are uncomfortable assessing such a big change without time to properly think through the consequences. (restructuration)
- keep branch for solution 2 alive. Could be useful when work on adding extra haloes will be undertaken.
comment:7 Changed 6 years ago by acc
Review of changes on fix_ticket2238_solution1 branch:
First changes in icbutl.F90 remove some apparent redundancy in initialising ssh values in the extra halo to zero. Concerned that this apparent redundancy might have been there to account for land suppression cases where there is no exchange with an all-land neighbour. In such cases ssh may remain uninitialised even after abc_lnk_icb
Rest of the changes to make full use of the extra halo and set explicit precision for constants are ok.
Not so sure about the 'Should not see this' print* messages. Agree the user should not see these but if something goes wrong and they do occur the user is still unlikely to see them buried in the output and for very large configurations they could generate large volumes. Better to make a formal stop just in case.
comment:8 Changed 6 years ago by mathiot
Reply:
For 1: All ssh_e value are initialized to 0.
#if defined key_si3 hicth(:,:) = 0._wp ; hicth(1:jpi,1:jpj) = hm_i (:,:) ui_e(:,:) = 0._wp ; ui_e(1:jpi, 1:jpj) = u_ice(:,:) vi_e(:,:) = 0._wp ; vi_e(1:jpi, 1:jpj) = v_ice(:,:) ! ! compute ssh slope using ssh_lead if embedded zssh_lead_m(:,:) = ice_var_sshdyn(ssh_m, snwice_mass, snwice_mass_b) ssh_e(:,:) = 0._wp ; ssh_e(1:jpi, 1:jpj) = zssh_lead_m(:,:) * tmask(:,:,1) ! CALL lbc_lnk_icb( 'icbutl', hicth, 'T', +1._wp, 1, 1 ) CALL lbc_lnk_icb( 'icbutl', ui_e , 'U', -1._wp, 1, 1 ) CALL lbc_lnk_icb( 'icbutl', vi_e , 'V', -1._wp, 1, 1 ) #else ssh_e(:,:) = 0._wp ; ssh_e(1:jpi, 1:jpj) = ssh_m(:,:) * tmask(:,:,1) #endif CALL lbc_lnk_icb( 'icbutl', ssh_e, 'T', +1._wp, 1, 1 )
So un-initialized value seems not possible. And if there is land processor on the side, I will argue that ssh should be set to 0 as all the other variables as SST, ice thickness, ice fraction because it is the land value. Then the interpolation scheme should be clever enough to deal with land value (use it or not depending if it is a valid data or not). This is the purpose of the fix ticket #2229.
For 2: no comments required
For 3: I agree, I will add an error counter ierr and then a proper stop.
! find position in this processor. Prevent near edge problems (see #1389) ! ierr = 0 IF ( ii < mig( 1 ) ) THEN ; ii = 1 ; ierr = ierr + 1 ELSEIF( ii > mig(jpi) ) THEN ; ii = jpi ; ierr = ierr + 1 ELSE ; ii = mi1(ii) ENDIF IF ( ij < mjg( 1 ) ) THEN ; ij = 1 ; ierr = ierr + 1 ELSEIF( ij > mjg(jpj) ) THEN ; ij = jpj ; ierr = ierr + 1 ELSE ; ij = mj1(ij) ENDIF ! IF( ii == jpi ) THEN ; ii = ii-1 ; ierr = ierr + 1 ; END IF IF( ij == jpj ) THEN ; ij = ij-1 ; ierr = ierr + 1 ; END IF ! IF ( ierr > 0 ) CALL ctl_stop('STOP','icb_utl_bilin_e: an icebergs coordinates is out of valid range (out of bound error)')
Using a simple CALL ctl_stop('icb_utl_bilin_e: an icebergs coordinates is out of valid range (out of bound error)') could lead the model to hang at the mppsync step because not all the processors will go through the MPI_BARRIER.
comment:9 Changed 6 years ago by mathiot
In 10695:
comment:10 Changed 6 years ago by mathiot
With Andrew's comments on initialisation, I realised that the extended array for the icb need only to be initialized once for all to 0. in icbini.F90 and not at each time step in icbutl.F90. So, I will move the initialisation into icbini.F90 and rename hicth to hi_e to be consistent with other extended variable names.
comment:11 Changed 6 years ago by mathiot
In 10697:
comment:12 Changed 6 years ago by acc
Looks ok but there appears to be an inconsistency in the treatment of ui_e, vi_e and hi_e. It was always there but is more apparent now that hi_e has been renamed. Looks as if hi_e should only be allocated and initialised if key_si3 or key_cice is defined just like ui_e and vi_e. Not critical since it is unlikely that icebergs will be used without a sea-ice model; but may as well tidy up.
comment:13 Changed 6 years ago by mathiot
In 10700:
comment:14 Changed 6 years ago by mathiot
In 10701:
comment:15 Changed 6 years ago by mathiot
- Resolution set to fixed
- Status changed from new to closed
In 10702:
comment:16 Changed 6 years ago by mathiot
In 10705:
comment:17 Changed 6 years ago by mathiot
In 10726:
comment:18 Changed 5 years ago by mathiot
- Resolution fixed deleted
- Status changed from closed to reopened
As mentioned in comments 4 (https://forge.ipsl.jussieu.fr/nemo/ticket/2238#comment:4), if the iceberg model is stress a bit (test box covering the whole southern ocean for example instead of only part of it), the results are not reproducible.
The issue comes from the fact that the order of operations to compute the melt/hflx on the NEMO grid change when iceberg moving from one processor to another. The reason for this is simply because when an iceberg move from one processor to another in one of the REPRO simulation, its order in the icb list is not the same, so the order in which the iceberg are treated in the cumulative sum to computed the total melt on the NEMO grid is different.
Suggested fix: use the function DDPDD to compute the cumulative sum.
-
icbthm.F90
19 19 USE lib_mpp ! NEMO MPI routines, ctl_stop in particular 20 20 USE phycst ! NEMO physical constants 21 21 USE sbc_oce 22 USE lib_fortran, ONLY : DDPDD 22 23 23 24 USE icb_oce ! define iceberg arrays 24 25 USE icbutl ! iceberg utility routines … … 54 55 REAL(wp) :: zxi, zyj, zff, z1_rday, z1_e1e2, zdt, z1_dt, z1_dt_e1e2 55 56 TYPE(iceberg), POINTER :: this, next 56 57 TYPE(point) , POINTER :: pt 58 ! 59 COMPLEX(wp), DIMENSION(jpi,jpj) :: cicb_melt, cicb_hflx 57 60 !!---------------------------------------------------------------------- 58 61 ! 62 !! initialiaze cicb_melt and cicb_heat 63 59 64 z1_rday = 1._wp / rday 60 65 z1_12 = 1._wp / 12._wp 61 66 zdt = berg_dt … … 70 75 ! calving_hflx re-used here as temporary workspace for the heat flux associated with melting 71 76 berg_grid%calving_hflx(:,:) = 0._wp 72 77 ! 78 cicb_melt = CMPLX( 0.e0, 0.e0, wp ) 79 cicb_hflx = CMPLX( 0.e0, 0.e0, wp ) 80 ! 73 81 this => first_berg 74 82 DO WHILE( ASSOCIATED(this) ) 75 83 ! … … 165 173 z1_e1e2 = r1_e1e2t(ii,ij) * this%mass_scaling 166 174 z1_dt_e1e2 = z1_dt * z1_e1e2 167 175 zmelt = ( zdM - ( zdMbitsE - zdMbitsM ) ) * z1_dt ! kg/s 168 berg_grid%floating_melt(ii,ij) = berg_grid%floating_melt(ii,ij) + zmelt * z1_e1e2 ! kg/m2/s176 CALL DDPDD( CMPLX( zmelt * z1_e1e2, 0.e0, wp ), cicb_melt(ii,ij) ) 169 177 !! NB. The src_calving_hflx field is currently hardwired to zero in icb_stp, which means that the 170 178 !! heat density of the icebergs is zero and the heat content flux to the ocean from iceberg 171 179 !! melting is always zero. Leaving the term in the code until such a time as this is fixed. DS. 172 180 zheat_hcflux = zmelt * pt%heat_density ! heat content flux : kg/s x J/kg = J/s 173 181 zheat_latent = - zmelt * rLfus ! latent heat flux: kg/s x J/kg = J/s 174 berg_grid%calving_hflx (ii,ij) = berg_grid%calving_hflx (ii,ij) + ( zheat_hcflux + zheat_latent ) * z1_e1e2 ! W/m2182 CALL DDPDD( CMPLX( ( zheat_hcflux + zheat_latent ) * z1_e1e2, 0.e0, wp ), cicb_hflx(ii,ij) ) 175 183 CALL icb_dia_melt( ii, ij, zMnew, zheat_hcflux, zheat_latent, this%mass_scaling, & 176 184 & zdM, zdMbitsE, zdMbitsM, zdMb, zdMe, & 177 185 & zdMv, z1_dt_e1e2 ) … … 213 221 this=>next 214 222 ! 215 223 END DO 216 224 ! 225 berg_grid%floating_melt = REAL(cicb_melt,wp) ! kg/m2/s 226 berg_grid%calving_hflx = REAL(cicb_hflx,wp) 227 ! 217 228 ! now use melt and associated heat flux in ocean (or not) 218 229 ! 219 230 IF(.NOT. ln_passive_mode ) THEN
Above fix works for ORCA2_ICE_PISCES with iceberg over the whole Southern Ocean
comment:19 Changed 5 years ago by mathiot
In 12274:
comment:20 Changed 5 years ago by mathiot
In 12275:
comment:21 Changed 5 years ago by mathiot
- Resolution set to fixed
- Status changed from reopened to closed
Fix included in 2019 merge branches in commit 12274 and 12275.
comment:22 Changed 5 years ago by mathiot
In 12291:
comment:23 Changed 3 years ago by nemo
- Keywords v4.0 added; removed
In 10673: