Version 16 (modified by acc, 4 months ago) (diff)

Name and subject of the action

Last edition: 05/29/20 19:31:52 by acc

The PI is responsible to closely follow the progress of the action, and especially to contact NEMO project manager if the delay on preview (or review) are longer than the 2 weeks expected.

  1. Summary
  2. Option 2 revisited
  3. Preview
  4. Tests
  5. Review

Summary

Action ENHANCE-10_acc_fix_traqsr
PI(S) acc
Digest Reduce use of 3D allocatable arrays in RGB light penetration schemes
Dependencies If any
Branch source:/NEMO/branches/{YEAR}/dev_r{REV}_{ACTION_NAME}
Previewer(s) Names
Reviewer(s) Names
Ticket #2469

Description

The current implementation of RGB light penetration in traqsr (either varying or constant chlorophyll) uses 6, domain-sized 3D, temporary arrays which can be reduced to a few 2D arrays. The impact of the current implementation is most evident at lower processor counts where the impact of the extra 3D arrays can cause cache-misses and memory band-width issues. In an extreme case traqsr can switch from consuming 2% of run-time to 68% (comparing ORCA025 running on 200 cores vs 48 cores).

A simple redesign of the algorithm should remove this behaviour.

Implementation

The code in question is contained within a CASE construct which deals with both varying surface Chlorophyll or constant values; in both cases the following temporary arrays are allocated:

      CASE( np_RGB , np_RGBc )         !==  R-G-B fluxes  ==!
         !
         ALLOCATE( zekb(jpi,jpj)     , zekg(jpi,jpj)     , zekr  (jpi,jpj)     , &
            &      ze0 (jpi,jpj,jpk) , ze1 (jpi,jpj,jpk) , ze2   (jpi,jpj,jpk) , &
            &      ze3 (jpi,jpj,jpk) , zea (jpi,jpj,jpk) , zchl3d(jpi,jpj,jpk)   )

In the case of varying Chlorophyll; surface values are read and then used in a 3D loop to set subsurface value. For constant Chlorophyll a single value is used throughout:

PART 1
-------
         IF( nqsr == np_RGBc ) THEN          !*  Variable Chlorophyll
            CALL fld_read( kt, 1, sf_chl )         ! Read Chl data and provides it at the current time step
            DO jk = 1, nksr + 1
               DO jj = 2, jpjm1                       ! Separation in R-G-B depending of the surface Chl
                  DO ji = 2, jpim1
                     zchl    = MIN( 10. , MAX( 0.03, sf_chl(1)%fnow(ji,jj,1) ) )
                     zCtot   = 40.6  * zchl**0.459
                     zze     = 568.2 * zCtot**(-0.746)
                     IF( zze > 102. ) zze = 200.0 * zCtot**(-0.293)
                     zpsi    = gdepw(ji,jj,jk,Kmm) / zze
                     !
                     zlogc   = LOG( zchl )
                     zlogc2  = zlogc * zlogc
                     zlogc3  = zlogc * zlogc * zlogc
                     zCb     = 0.768 + 0.087 * zlogc - 0.179 * zlogc2 - 0.025 * zlogc3
                     zCmax   = 0.299 - 0.289 * zlogc + 0.579 * zlogc2
                     zpsimax = 0.6   - 0.640 * zlogc + 0.021 * zlogc2 + 0.115 * zlogc3
                     zdelpsi = 0.710 + 0.159 * zlogc + 0.021 * zlogc2
                     zCze    = 1.12  * (zchl)**0.803
                     !
                     zchl3d(ji,jj,jk) = zCze * ( zCb + zCmax * EXP( -( (zpsi - zpsimax) / zdelpsi )**2 ) )
                  END DO
                  !
               END DO
            END DO
         ELSE                                !* constant chrlorophyll
           DO jk = 1, nksr + 1
              zchl3d(:,:,jk) = 0.05
            ENDDO
         ENDIF

These values are then used in a second 3D loop where Chlorophyll values are converted to attenuation coefficients (for R G and B) via indirect indexing to a prepared look-up table: {{{!f PART 2


zcoef = ( 1. - rn_abs ) / 3._wp !* surface equi-partition in R-G-B DO_2D_00_00

ze0(ji,jj,1) = rn_abs * qsr(ji,jj) ze1(ji,jj,1) = zcoef * qsr(ji,jj) ze2(ji,jj,1) = zcoef * qsr(ji,jj) ze3(ji,jj,1) = zcoef * qsr(ji,jj) zea(ji,jj,1) = qsr(ji,jj)

END_2D ! DO jk = 2, nksr+1 !* interior equi-partition in R-G-B depending of vertical profile of Chl

DO_2D_00_00

zchl = MIN( 10. , MAX( 0.03, zchl3d(ji,jj,jk) ) ) irgb = NINT( 41 + 20.*LOG10(zchl) + 1.e-15 ) zekb(ji,jj) = rkrgb(1,irgb) zekg(ji,jj) = rkrgb(2,irgb) zekr(ji,jj) = rkrgb(3,irgb)

END_2D

DO_2D_00_00

zc0 = ze0(ji,jj,jk-1) * EXP( - e3t(ji,jj,jk-1,Kmm) * xsi0r ) zc1 = ze1(ji,jj,jk-1) * EXP( - e3t(ji,jj,jk-1,Kmm) * zekb(ji,jj) ) zc2 = ze2(ji,jj,jk-1) * EXP( - e3t(ji,jj,jk-1,Kmm) * zekg(ji,jj) ) zc3 = ze3(ji,jj,jk-1) * EXP( - e3t(ji,jj,jk-1,Kmm) * zekr(ji,jj) ) ze0(ji,jj,jk) = zc0 ze1(ji,jj,jk) = zc1 ze2(ji,jj,jk) = zc2 ze3(ji,jj,jk) = zc3 zea(ji,jj,jk) = ( zc0 + zc1 + zc2 + zc3 ) * wmask(ji,jj,jk)

END_2D

END DO ! DO_3D_00_00( 1, nksr )

qsr_hc(ji,jj,jk) = r1_rho0_rcp * ( zea(ji,jj,jk) - zea(ji,jj,jk+1) )

END_3D ! DEALLOCATE( zekb , zekg , zekr , ze0 , ze1 , ze2 , ze3 , zea , zchl3d ) !

}}} Part 2 has obvious inefficiencies because most of the temporary, full-depth arrays are not necessary since only two vertical levels are required at any one time. In fact even the zea array is unnecessary since the zchl3d array could be repurposed once its value has been used. Recalling the chlorophyll value and converting this to a index within the second loop also seems wasteful because this conversion could have been done before the chlorophyll value was stored in part 1 and the index stored instead.

Part 1 has even greater inefficiencies though since it repeats some computationally expensive calculations in a 3D loop even though much of it is 2D in nature. In fact the complexity of the calculations can be reduced further by operating in log-space.

After trying a few options and iterating at the preview stage (see earlier investigations below), the final solution choose was:

{{{!f

CASE( np_RGB , np_RGBc ) !== R-G-B fluxes ==!

! ALLOCATE( ze0 (jpi,jpj) , ze1 (jpi,jpj) , &

& ze2 (jpi,jpj) , ze3 (jpi,jpj) , & & ztmp3d(jpi,jpj,nksr + 1) )

! IF( nqsr == np_RGBc ) THEN !* Variable Chlorophyll

CALL fld_read( kt, 1, sf_chl ) ! Read Chl data and provides it at the current time step ! ! Separation in R-G-B depending on the surface Chl ! perform and store as many of the 2D calculations as possible ! before the 3D loop (use the temporary 2D arrays to replace the ! most expensive calculations) ! ze0(:,:) = LOG ( MIN( 10. , MAX( 0.03, sf_chl(1)%fnow(:,:,1) ) ) ) ! ze0 = log(zchl) ze1(:,:) = 0.113328685307 + 0.803 * ze0(:,:) ! ze1 : log(zCze) = log (1.12 * zchl0.803) ze2(:,:) = 3.703768066608 + 0.459 * ze0(:,:) ! ze2 : log(zCtot) = log(40.6 * zchl0.459) ze3(:,:) = 6.34247346942 - 0.746 * ze2(:,:) ! ze3 : log(zze) = log(568.2 * zCtot(-0.746)) WHERE( ze3 > 4.62497281328 ) ze3 = 5.298317366548 - 0.293 * ze2 ! IF( log(zze) > log(102.) ) log(zze) = ! ! log(200.0 * zCtot(-0.293)) ze1(:,:) = EXP(ze1(:,:)) ! ze1 = zCze ze2(:,:) = 1._wp / ( 0.710 + ze0(:,:) * ( 0.159 + ze0(:,:) * 0.021 ) ) ! ze2 = 1/zdelpsi ze3(:,:) = 1._wp / EXP(ze3(:,:)) ! ze3 = 1/zze ! DO_3D_00_00 ( 1, nksr + 1 )

! zchl = ALOG( ze0(ji,jj) ) zlogc = ze0(ji,jj) ! zCb = 0.768 + zlogc * ( 0.087 - zlogc * ( 0.179 + zlogc * 0.025 ) ) zCmax = 0.299 - zlogc * ( 0.289 - zlogc * 0.579 ) zpsimax = 0.6 - zlogc * ( 0.640 - zlogc * ( 0.021 + zlogc * 0.115 ) ) ! zdelpsi = 0.710 + zlogc * ( 0.159 + zlogc * 0.021 ) ! zCze = ze1(ji,jj) zrdpsi = ze2(ji,jj) ! 1/zdelpsi zpsi = ze3(ji,jj) * gdepw(ji,jj,jk,Kmm) ! gdepw/zze ! ! NB. make sure zchl value is such that: zchl = MIN( 10. , MAX( 0.03, zchl ) ) zchl = MIN( 10. , MAX( 0.03, zCze * ( zCb + zCmax * EXP( -( (zpsi - zpsimax) * zrdpsi )2 ) ) ) ) ! Convert chlorophyll value to attenuation coefficient look-up table index ztmp3d(ji,jj,jk) = 41 + 20.*LOG10(zchl) + 1.e-15

END_3D

ELSE !* constant chlorophyll

zchl = 0.05 ! NB. make sure constant value is such that: zchl = MIN( 10. , MAX( 0.03, zchl ) ) ! Convert chlorophyll value to attenuation coefficient look-up table index zlui = 41 + 20.*LOG10(zchl) + 1.e-15 DO jk = 1, nksr + 1

ztmp3d(:,:,jk) = zlui

END DO

ENDIF ! zcoef = ( 1. - rn_abs ) / 3._wp !* surface equi-partition in R-G-B DO_2D_00_00

ze0(ji,jj) = rn_abs * qsr(ji,jj) ze1(ji,jj) = zcoef * qsr(ji,jj) ze2(ji,jj) = zcoef * qsr(ji,jj) ze3(ji,jj) = zcoef * qsr(ji,jj) ! store the surface SW radiation; re-use the surface ztmp3d array ! since the surface attenuation coefficient is not used ztmp3d(ji,jj,1) = qsr(ji,jj)

END_2D ! !* interior equi-partition in R-G-B depending on vertical profile of Chl DO_3D_00_00 ( 2, nksr + 1 )

ze3t = e3t(ji,jj,jk-1,Kmm) irgb = NINT( ztmp3d(ji,jj,jk) ) zc0 = ze0(ji,jj) * EXP( - ze3t * xsi0r ) zc1 = ze1(ji,jj) * EXP( - ze3t * rkrgb(1,irgb) ) zc2 = ze2(ji,jj) * EXP( - ze3t * rkrgb(2,irgb) ) zc3 = ze3(ji,jj) * EXP( - ze3t * rkrgb(3,irgb) ) ze0(ji,jj) = zc0 ze1(ji,jj) = zc1 ze2(ji,jj) = zc2 ze3(ji,jj) = zc3 ztmp3d(ji,jj,jk) = ( zc0 + zc1 + zc2 + zc3 ) * wmask(ji,jj,jk)

END_3D ! DO_3D_00_00( 1, nksr )

qsr_hc(ji,jj,jk) = r1_rho0_rcp * ( ztmp3d(ji,jj,jk) - ztmp3d(ji,jj,jk+1) )

END_3D ! DEALLOCATE( ze0 , ze1 , ze2 , ze3 , ztmp3d )

}}}

where the difference from the trunk is:

  • traqsr.F90

     
    109109      REAL(wp) ::   zchl, zcoef, z1_2        ! local scalars 
    110110      REAL(wp) ::   zc0 , zc1 , zc2 , zc3    !    -         - 
    111111      REAL(wp) ::   zzc0, zzc1, zzc2, zzc3   !    -         - 
    112       REAL(wp) ::   zz0 , zz1                !    -         - 
    113       REAL(wp) ::   zCb, zCmax, zze, zpsi, zpsimax, zdelpsi, zCtot, zCze 
    114       REAL(wp) ::   zlogc, zlogc2, zlogc3 
    115       REAL(wp), ALLOCATABLE, DIMENSION(:,:)   :: zekb, zekg, zekr 
    116       REAL(wp), ALLOCATABLE, DIMENSION(:,:,:) :: ze0, ze1, ze2, ze3, zea, ztrdt 
    117       REAL(wp), ALLOCATABLE, DIMENSION(:,:,:) :: zetot, zchl3d 
     112      REAL(wp) ::   zz0 , zz1 , ze3t, zlui   !    -         - 
     113      REAL(wp) ::   zCb, zCmax, zpsi, zpsimax, zrdpsi, zCze 
     114      REAL(wp) ::   zlogc, zlogze, zlogCtot, zlogCze 
     115      REAL(wp), ALLOCATABLE, DIMENSION(:,:)   :: ze0, ze1, ze2, ze3 
     116      REAL(wp), ALLOCATABLE, DIMENSION(:,:,:) :: ztrdt, zetot, ztmp3d 
    118117      !!---------------------------------------------------------------------- 
    119118      ! 
    120119      IF( ln_timing )   CALL timing_start('tra_qsr') 
     
    159158         ! 
    160159      CASE( np_RGB , np_RGBc )         !==  R-G-B fluxes  ==! 
    161160         ! 
    162          ALLOCATE( zekb(jpi,jpj)     , zekg(jpi,jpj)     , zekr  (jpi,jpj)     , & 
    163             &      ze0 (jpi,jpj,jpk) , ze1 (jpi,jpj,jpk) , ze2   (jpi,jpj,jpk) , & 
    164             &      ze3 (jpi,jpj,jpk) , zea (jpi,jpj,jpk) , zchl3d(jpi,jpj,jpk)   ) 
     161         ALLOCATE( ze0 (jpi,jpj)           , ze1 (jpi,jpj) ,  & 
     162            &      ze2 (jpi,jpj)           , ze3 (jpi,jpj) ,  & 
     163            &      ztmp3d(jpi,jpj,nksr + 1)                     ) 
    165164         ! 
    166165         IF( nqsr == np_RGBc ) THEN          !*  Variable Chlorophyll 
    167166            CALL fld_read( kt, 1, sf_chl )         ! Read Chl data and provides it at the current time step 
     167            ! 
     168            ! Separation in R-G-B depending on the surface Chl 
     169            ! perform and store as many of the 2D calculations as possible 
     170            ! before the 3D loop (use the temporary 2D arrays to replace the 
     171            ! most expensive calculations) 
     172            ! 
     173            ze0(:,:) = LOG ( MIN( 10. , MAX( 0.03, sf_chl(1)%fnow(:,:,1) ) ) )     ! ze0 = log(zchl) 
     174            ze1(:,:) = 0.113328685307 + 0.803 * ze0(:,:)                           ! ze1 : log(zCze)  = log (1.12  * zchl**0.803) 
     175            ze2(:,:) = 3.703768066608 + 0.459 * ze0(:,:)                           ! ze2 : log(zCtot) = log(40.6  * zchl**0.459) 
     176            ze3(:,:) = 6.34247346942  - 0.746 * ze2(:,:)                           ! ze3 : log(zze)   = log(568.2 * zCtot**(-0.746)) 
     177            WHERE( ze3 > 4.62497281328 ) ze3 = 5.298317366548 - 0.293 * ze2        !       IF( log(zze) > log(102.) ) log(zze) = 
     178            !                                                                      !                    log(200.0 * zCtot**(-0.293)) 
     179            ze1(:,:) = EXP(ze1(:,:))                                               ! ze1 = zCze 
     180            ze2(:,:) = 1._wp / ( 0.710 + ze0(:,:) * ( 0.159 + ze0(:,:) * 0.021 ) ) ! ze2 = 1/zdelpsi 
     181            ze3(:,:) = 1._wp / EXP(ze3(:,:))                                       ! ze3 = 1/zze 
     182            ! 
     183            DO_3D_00_00 ( 1, nksr + 1 ) 
     184               ! zchl    = ALOG( ze0(ji,jj) ) 
     185               zlogc = ze0(ji,jj) 
     186               ! 
     187               zCb       = 0.768 + zlogc * ( 0.087 - zlogc * ( 0.179 + zlogc * 0.025 ) ) 
     188               zCmax     = 0.299 - zlogc * ( 0.289 - zlogc * 0.579 ) 
     189               zpsimax   = 0.6   - zlogc * ( 0.640 - zlogc * ( 0.021 + zlogc * 0.115 ) ) 
     190               ! zdelpsi = 0.710 + zlogc * ( 0.159 + zlogc * 0.021 ) 
     191               ! 
     192               zCze   = ze1(ji,jj) 
     193               zrdpsi = ze2(ji,jj)                                                 ! 1/zdelpsi 
     194               zpsi   = ze3(ji,jj) * gdepw(ji,jj,jk,Kmm)                           ! gdepw/zze 
     195               ! 
     196               ! NB. make sure zchl value is such that: zchl = MIN( 10. , MAX( 0.03, zchl ) ) 
     197               zchl = MIN( 10. , MAX( 0.03, zCze * ( zCb + zCmax * EXP( -( (zpsi - zpsimax) * zrdpsi )**2 ) ) ) ) 
     198               ! Convert chlorophyll value to attenuation coefficient look-up table index 
     199               ztmp3d(ji,jj,jk) = 41 + 20.*LOG10(zchl) + 1.e-15 
     200            END_3D 
     201         ELSE                                !* constant chlorophyll 
     202            zchl = 0.05 
     203            ! NB. make sure constant value is such that: 
     204            zchl = MIN( 10. , MAX( 0.03, zchl ) ) 
     205            ! Convert chlorophyll value to attenuation coefficient look-up table index 
     206            zlui = 41 + 20.*LOG10(zchl) + 1.e-15 
    168207            DO jk = 1, nksr + 1 
    169                DO jj = 2, jpjm1                       ! Separation in R-G-B depending of the surface Chl 
    170                   DO ji = 2, jpim1 
    171                      zchl    = MIN( 10. , MAX( 0.03, sf_chl(1)%fnow(ji,jj,1) ) ) 
    172                      zCtot   = 40.6  * zchl**0.459 
    173                      zze     = 568.2 * zCtot**(-0.746) 
    174                      IF( zze > 102. ) zze = 200.0 * zCtot**(-0.293) 
    175                      zpsi    = gdepw(ji,jj,jk,Kmm) / zze 
    176                      ! 
    177                      zlogc   = LOG( zchl ) 
    178                      zlogc2  = zlogc * zlogc 
    179                      zlogc3  = zlogc * zlogc * zlogc 
    180                      zCb     = 0.768 + 0.087 * zlogc - 0.179 * zlogc2 - 0.025 * zlogc3 
    181                      zCmax   = 0.299 - 0.289 * zlogc + 0.579 * zlogc2 
    182                      zpsimax = 0.6   - 0.640 * zlogc + 0.021 * zlogc2 + 0.115 * zlogc3 
    183                      zdelpsi = 0.710 + 0.159 * zlogc + 0.021 * zlogc2 
    184                      zCze    = 1.12  * (zchl)**0.803 
    185                      ! 
    186                      zchl3d(ji,jj,jk) = zCze * ( zCb + zCmax * EXP( -( (zpsi - zpsimax) / zdelpsi )**2 ) ) 
    187                   END DO 
    188                   ! 
    189                END DO 
     208               ztmp3d(:,:,jk) = zlui 
    190209            END DO 
    191          ELSE                                !* constant chrlorophyll 
    192            DO jk = 1, nksr + 1 
    193               zchl3d(:,:,jk) = 0.05 
    194             ENDDO 
    195210         ENDIF 
    196211         ! 
    197212         zcoef  = ( 1. - rn_abs ) / 3._wp    !* surface equi-partition in R-G-B 
    198213         DO_2D_00_00 
    199             ze0(ji,jj,1) = rn_abs * qsr(ji,jj) 
    200             ze1(ji,jj,1) = zcoef  * qsr(ji,jj) 
    201             ze2(ji,jj,1) = zcoef  * qsr(ji,jj) 
    202             ze3(ji,jj,1) = zcoef  * qsr(ji,jj) 
    203             zea(ji,jj,1) =          qsr(ji,jj) 
     214            ze0(ji,jj) = rn_abs * qsr(ji,jj) 
     215            ze1(ji,jj) = zcoef  * qsr(ji,jj) 
     216            ze2(ji,jj) = zcoef  * qsr(ji,jj) 
     217            ze3(ji,jj) = zcoef  * qsr(ji,jj) 
     218            ! store the surface SW radiation; re-use the surface ztmp3d array 
     219            ! since the surface attenuation coefficient is not used 
     220            ztmp3d(ji,jj,1) =       qsr(ji,jj) 
    204221         END_2D 
    205222         ! 
    206          DO jk = 2, nksr+1                   !* interior equi-partition in R-G-B depending of vertical profile of Chl 
    207             DO_2D_00_00 
    208                zchl = MIN( 10. , MAX( 0.03, zchl3d(ji,jj,jk) ) ) 
    209                irgb = NINT( 41 + 20.*LOG10(zchl) + 1.e-15 ) 
    210                zekb(ji,jj) = rkrgb(1,irgb) 
    211                zekg(ji,jj) = rkrgb(2,irgb) 
    212                zekr(ji,jj) = rkrgb(3,irgb) 
    213             END_2D 
    214  
    215             DO_2D_00_00 
    216                zc0 = ze0(ji,jj,jk-1) * EXP( - e3t(ji,jj,jk-1,Kmm) * xsi0r       ) 
    217                zc1 = ze1(ji,jj,jk-1) * EXP( - e3t(ji,jj,jk-1,Kmm) * zekb(ji,jj) ) 
    218                zc2 = ze2(ji,jj,jk-1) * EXP( - e3t(ji,jj,jk-1,Kmm) * zekg(ji,jj) ) 
    219                zc3 = ze3(ji,jj,jk-1) * EXP( - e3t(ji,jj,jk-1,Kmm) * zekr(ji,jj) ) 
    220                ze0(ji,jj,jk) = zc0 
    221                ze1(ji,jj,jk) = zc1 
    222                ze2(ji,jj,jk) = zc2 
    223                ze3(ji,jj,jk) = zc3 
    224                zea(ji,jj,jk) = ( zc0 + zc1 + zc2 + zc3 ) * wmask(ji,jj,jk) 
    225             END_2D 
    226          END DO 
     223         !* interior equi-partition in R-G-B depending on vertical profile of Chl 
     224         DO_3D_00_00 ( 2, nksr + 1 ) 
     225            ze3t = e3t(ji,jj,jk-1,Kmm) 
     226            irgb = NINT( ztmp3d(ji,jj,jk) ) 
     227            zc0 = ze0(ji,jj) * EXP( - ze3t * xsi0r ) 
     228            zc1 = ze1(ji,jj) * EXP( - ze3t * rkrgb(1,irgb) ) 
     229            zc2 = ze2(ji,jj) * EXP( - ze3t * rkrgb(2,irgb) ) 
     230            zc3 = ze3(ji,jj) * EXP( - ze3t * rkrgb(3,irgb) ) 
     231            ze0(ji,jj) = zc0 
     232            ze1(ji,jj) = zc1 
     233            ze2(ji,jj) = zc2 
     234            ze3(ji,jj) = zc3 
     235            ztmp3d(ji,jj,jk) = ( zc0 + zc1 + zc2 + zc3 ) * wmask(ji,jj,jk) 
     236         END_3D 
    227237         ! 
    228238         DO_3D_00_00( 1, nksr ) 
    229             qsr_hc(ji,jj,jk) = r1_rho0_rcp * ( zea(ji,jj,jk) - zea(ji,jj,jk+1) ) 
     239            qsr_hc(ji,jj,jk) = r1_rho0_rcp * ( ztmp3d(ji,jj,jk) - ztmp3d(ji,jj,jk+1) ) 
    230240         END_3D 
    231241         ! 
    232          DEALLOCATE( zekb , zekg , zekr , ze0 , ze1 , ze2 , ze3 , zea , zchl3d ) 
     242         DEALLOCATE( ze0 , ze1 , ze2 , ze3 , ztmp3d ) 
    233243         ! 
    234244      CASE( np_2BD  )            !==  2-bands fluxes  ==! 
    235245         ! 

Earlier Investigations

Option 1: Minmum memory usage

By rearranging the loop order and placing the vertical loop innermost then the code can be greatly simplified to an equivalent using minimal temporary storage. A couple of other changes seem sensible too, namely:

  • rename the zchl3d array to ztmp3d (since it is now used for two purposes)
  • only allocate ztmp3d to nksr+1; values below this are not used and nksr + 1 is likely << jpk
  • calculate and store the attenuation coefficient look-up table index as soon as the sub-surface chlorophyll value is known. This keeps all LOG operations in one loop and, in the case of constant chlorophyll, removes the LOG from the loop altogether.
      CASE( np_RGB , np_RGBc )         !==  R-G-B fluxes  ==!
         !
         ALLOCATE( ztmp3d(jpi,jpj,nksr + 1)   )
         !
         ! code to set ztmp3d(:,:,1:nskr+1)

         ! including the following changes after
         IF( nqsr == np_RGBc ) THEN          !*  Variable Chlorophyll
            .
            DO_3D_00_00( 1, nksr + 1 )
            .
            . no change until after
               zCze    = 1.12  * (zchl)**0.803
               !
               ! NB. make sure zchl value is such that: zchl = MIN( 10. , MAX( 0.03, zchl ) )
               zchl = MIN( 10. , MAX( 0.03, zCze * ( zCb + zCmax * EXP( -( (zpsi - zpsimax) / zdelpsi )**2 ) ) ) )
               ! Convert chlorophyll value to attenuation coefficient look-up table index
               ztmp3d(ji,jj,jk) = 41 + 20.*LOG10(zchl) + 1.e-15
            END_3D
         ELSE                                !* constant chrlorophyll
           zchl = 0.05
           ! NB. make sure constant value is such that:
           zchl = MIN( 10. , MAX( 0.03, zchl ) )
           ! Convert chlorophyll value to attenuation coefficient look-up table index
           zlui = 41 + 20.*LOG10(zchl) + 1.e-15
           DO jk = 1, nksr + 1
              ztmp3d(:,:,jk) = zlui
           END DO
         ENDIF
         ! 
         !
         zcoef  = ( 1. - rn_abs ) / 3._wp    !* surface equi-partition in R-G-B
         ! store the surface SW radiation; re-purpose the surface ztmp3d array
         ! since the surface attenuation coefficient is not used
         ztmp3d(:,:,1) = qsr(:,:)
         !
         !* interior equi-partition in R-G-B depending of vertical profile of Chl
         DO_2D_00_00
            zc0 = rn_abs * qsr(ji,jj)
            zc1 = zcoef  * qsr(ji,jj)
            zc2 = zc1
            zc3 = zc1
            zc4 = e3t(ji,jj,1,Kmm)
            DO jk = 2, nksr+1
               irgb = NINT( ztmp3d(ji,jj,jk) )
               zc0 = zc0 * EXP( - zc4 * xsi0r       )
               zc1 = zc1 * EXP( - zc4 * rkrgb(1,irgb) )
               zc2 = zc2 * EXP( - zc4 * rkrgb(2,irgb) )
               zc3 = zc3 * EXP( - zc4 * rkrgb(3,irgb) )
               zc4 = e3t(ji,jj,jk,Kmm)
               ! store the SW radiation penetrating to this location
               ! re-purpose the ztmp3d array since the attenuation coefficient
               ! at this point will not be needed again
               ztmp3d(ji,jj,jk) = ( zc0 + zc1 + zc2 + zc3 ) * wmask(ji,jj,jk)
            END DO
         END_2D
         !
         DO_3D_00_00( 1, nksr )
            qsr_hc(ji,jj,jk) = r1_rho0_rcp * ( ztmp3d(ji,jj,jk) - ztmp3d(ji,jj,jk+1) )
         END_3D
         !
         DEALLOCATE( ztmp3d )

This is code and memory efficient but may perform poorly due to non-contiguous access to the array elements (see performance section below).

Option 2: Low memory use (retain loop order).

A compromise solution, which reduces memory use and should perform better is to remove all unnecessary full-depth arrays but maintain loop order by keeping a few 2D arrays. The same additional changes listed above are also made.

       CASE( np_RGB , np_RGBc )         !==  R-G-B fluxes  ==!
         !
         ALLOCATE( ze0 (jpi,jpj)           , ze1 (jpi,jpj) ,   &
            &      ze2 (jpi,jpj)           , ze3 (jpi,jpj) ,   &
            &      ztmp3d(jpi,jpj,nksr + 1)                     )
         !
         ! code to set ztmp3d(:,:,1:nskr+1)
         ! including the following changes after
         IF( nqsr == np_RGBc ) THEN          !*  Variable Chlorophyll
            .
            DO_3D_00_00( 1, nksr + 1 )
            .
            . no change until after
               zCze    = 1.12  * (zchl)**0.803
               !
               ! NB. make sure zchl value is such that: zchl = MIN( 10. , MAX( 0.03, zchl ) )
               zchl = MIN( 10. , MAX( 0.03, zCze * ( zCb + zCmax * EXP( -( (zpsi - zpsimax) / zdelpsi )**2 ) ) ) )
               ! Convert chlorophyll value to attenuation coefficient look-up table index
               ztmp3d(ji,jj,jk) = 41 + 20.*LOG10(zchl) + 1.e-15
            END_3D
         ELSE                                !* constant chlorophyll
            zchl = 0.05
            ! NB. make sure constant value is such that:
            zchl = MIN( 10. , MAX( 0.03, zchl ) )
            ! Convert chlorophyll value to attenuation coefficient look-up table index
            zlui = 41 + 20.*LOG10(zchl) + 1.e-15
            DO jk = 1, nksr + 1
               ztmp3d(:,:,jk) = zlui
            END DO
         ENDIF
         !
         zcoef  = ( 1. - rn_abs ) / 3._wp    !* surface equi-partition in R-G-B
         DO_2D_00_00
            ze0(ji,jj) = rn_abs * qsr(ji,jj)
            ze1(ji,jj) = zcoef  * qsr(ji,jj)
            ze2(ji,jj) = zcoef  * qsr(ji,jj)
            ze3(ji,jj) = zcoef  * qsr(ji,jj)
            ! store the surface SW radiation; re-use the surface ztmp3d array
            ! since the surface attenuation coefficient is not used
            ztmp3d(ji,jj,1) =       qsr(ji,jj)
         END_2D
         !
          !* interior equi-partition in R-G-B depending of vertical profile of Chl
         DO_3D_00_00( 2, nksr+1 )
            irgb = NINT( ztmp3d(ji,jj,jk) )
            ze3t = e3t(ji,jj,jk-1,Kmm)
            ze0(ji,jj) = ze0(ji,jj) * EXP( - ze3t * xsi0r )
            ze1(ji,jj) = ze1(ji,jj) * EXP( - ze3t * rkrgb(1,irgb) )
            ze2(ji,jj) = ze2(ji,jj) * EXP( - ze3t * rkrgb(2,irgb) )
            ze3(ji,jj) = ze3(ji,jj) * EXP( - ze3t * rkrgb(3,irgb) )
            ! store the SW radiation penetrating to this location
            ! re-use the ztmp3d array since the attenuation coefficient
            ! at this point will not be needed again
            ztmp3d(ji,jj,jk) = ( ze0(ji,jj) + ze1(ji,jj) + ze2(ji,jj) + ze3(ji,jj) ) * wmask(ji,jj,jk)
         END_3D
         !
         DO_3D_00_00( 1, nksr )
            qsr_hc(ji,jj,jk) = r1_rho0_rcp * ( ztmp3d(ji,jj,jk) - ztmp3d(ji,jj,jk+1) )
         END_3D
         !
         DEALLOCATE( ze0 , ze1 , ze2 , ze3 , ztmp3d )

Performance and evaluation

Both these options produce identical results to the original code (based on an ORCA2_ICE_PISCES test using SETTE (which includes variable surface chlorophyll inputs). ln_timing was activated and the CPU time (averaged across all processors) spent in tra_qsr used as a simple measure of performance. Unfortunately, variations in runtime between successive tests (even with the same code) on the NOC cluster were almost as great as any difference arising from algorithmic differences. Each test was repeated 6 times with the following results:

code option CPU seconds spent in tra_qsr Average
original code 0.34 0.34 0.35 0.35 0.34 0.34 0.3433
minimum memory option 0.36 0.36 0.37 0.36 0.36 0.37 0.3633
low memory option 0.35 0.35 0.35 0.36 0.36 0.35 0.3533

from which the tentative conclusion is that the minimum memory option does perform consistently worst but the low memory option appears to be a suitable replacement to the original code. More stringent tests are require to confirm this.

These initial tests were performed using the standard 32 processor SETTE test for ORCA2_ICE_PISCES. To search for a better distinction between the options further tests were made by varying the number of processors. Tests with 2, 8, 32 and 60 processors were performed (3 for each option at each core count). The following table shows the percentage of CPU time spent in tra_qsr and the rank of the tra_qsr routine in the CPU time-sorted list of routines (a higher rank means tra_qsr is taking proportionally less of the overall CPU time). In each case the average of the 3 samples is given.:

% CPU spent in tra_qsr
#CPUs original min-mem low-mem
2 1.76 1.82 1.83
8 1.38 1.48 1.46
32 0.48 0.49 0.5
60 0.24 0.26 0.26


Rank in sorted list of routines by CPU usage
#CPUs original min-mem low-mem
2 14 12.67 12
8 16.33 15.67 15
32 22.33 21.33 23.33
60 26 25 25

Unfortunately the message is still mixed and, although the new codes reduce the size and number of temporary arrays; they have a slight detrimental effect on performance and do not alter the tendency for tra_qsr to consume higher percentages of the CPU time as domain size increases.

The low-memory option does, at least, lend itself to tiling.

Option 2 revisited

Following discussions with the previewer, it was decided that low-memory option should be the best approach but the slight deterioration in performance over the original code may be down to the over-zealous replacement of temporary scalars within the second 3D loop. On reflection there are also opportunities to reduce the number of floating point operations and load and store instructions within the first 3D loop. Importantly, there are also opportunities to avoid some expensive operations by performing some calculations in log space. Here are the mathematically equivalent alternatives:

  • traqsr.F90

    old new  
    111111      REAL(wp) ::   zzc0, zzc1, zzc2, zzc3   !    -         - 
    112112      REAL(wp) ::   zz0 , zz1 , ze3t, zlui   !    -         - 
    113113      REAL(wp) ::   zCb, zCmax, zze, zpsi, zpsimax, zdelpsi, zCtot, zCze 
    114       REAL(wp) ::   zlogc 
     114      REAL(wp) ::   zlogc, zlogze, zlogCtot, zlogCze 
    115115      REAL(wp), ALLOCATABLE, DIMENSION(:,:)   :: ze0, ze1, ze2, ze3 
    116116      REAL(wp), ALLOCATABLE, DIMENSION(:,:,:) :: ztrdt, zetot, ztmp3d 
    117117      !!---------------------------------------------------------------------- 
     
    165165         IF( nqsr == np_RGBc ) THEN          !*  Variable Chlorophyll 
    166166            CALL fld_read( kt, 1, sf_chl )         ! Read Chl data and provides it at the current time step 
    167167            ! Separation in R-G-B depending of the surface Chl 
     168            ze0(:,:) = MIN( 10. , MAX( 0.03, sf_chl(1)%fnow(:,:,1) ) ) 
    168169            DO_3D_00_00 ( 1, nksr + 1 ) 
    169                zchl    = MIN( 10. , MAX( 0.03, sf_chl(1)%fnow(ji,jj,1) ) ) 
    170                zCze    = 1.12  * zchl**0.803 
    171                zCtot   = 40.6  * zchl**0.459 
    172                zlogc   = LOG( zchl ) 
     170               ! zchl    = ze0(ji,jj) 
     171               zlogc   = LOG( ze0(ji,jj) ) 
     172               zlogCze = 0.113328685307 + 0.803 * zlogc   ! log(zCze  = 1.12  * zchl**0.803) 
     173               zlogCtot= 3.703768066608 + 0.459 * zlogc   ! log(zCtot = 40.6  * zchl**0.459) 
    173174               ! 
    174175               zCb     = 0.768 + zlogc * ( 0.087 - zlogc * ( 0.179 + zlogc * 0.025 ) ) 
    175176               zCmax   = 0.299 - zlogc * ( 0.289 - zlogc * 0.579 ) 
    176177               zpsimax = 0.6   - zlogc * ( 0.640 - zlogc * ( 0.021 + zlogc * 0.115 ) ) 
    177178               zdelpsi = 0.710 + zlogc * ( 0.159 + zlogc * 0.021 ) 
    178179               ! 
    179                zze     = 568.2 * zCtot**(-0.746) 
    180                IF( zze > 102. ) zze = 200.0 * zCtot**(-0.293) 
    181                zpsi    = gdepw(ji,jj,jk,Kmm) / zze 
     180               zlogze  = 6.34247346942 - 0.746 * zlogCtot ! log(zze = 568.2 * zCtot**(-0.746)) 
     181               IF( zlogze > 4.62497281328 ) zlogze = 5.298317366548 - 0.293 * zlogCtot 
     182                                                          ! log(IF( zze > 102. ) zze = 200.0 * zCtot**(-0.293)) 
     183               zze  = EXP( zlogze ) 
     184               zpsi = gdepw(ji,jj,jk,Kmm) / zze 
     185               zCze = EXP( zlogCze ) 
    182186               ! 
    183187               ! NB. make sure zchl value is such that: zchl = MIN( 10. , MAX( 0.03, zchl ) ) 
    184188               zchl = MIN( 10. , MAX( 0.03, zCze * ( zCb + zCmax * EXP( -( (zpsi - zpsimax) / zdelpsi )**2 ) ) ) ) 
     
    207211            ztmp3d(ji,jj,1) =       qsr(ji,jj) 
    208212         END_2D 
    209213         ! 
    210          !* interior equi-partition in R-G-B depending of vertical profile of Chl 
     214         !* interior equi-partition in R-G-B depending on vertical profile of Chl 
    211215         DO_3D_00_00 ( 2, nksr + 1 ) 
    212216            ze3t = e3t(ji,jj,jk-1,Kmm) 
    213217            irgb = NINT( ztmp3d(ji,jj,jk) ) 

Despite significant variation between identical runs on the NOC cluster there is evidence that this second version of the low-memory option improves on the original code and is certainly no worse whilst using less storage. Here are the tables with the new results added. Graphs are shown below the code differences.

% CPU spent in tra_qsr
#CPUs original min-mem low-mem low-men v2
2 1.76 1.82 1.83 1.19
8 1.38 1.48 1.46 0.56
32 0.48 0.49 0.5 0.3
60 0.24 0.26 0.26 0.08


Rank in sorted list of routines by CPU usage
#CPUs original min-mem low-mem low-men v2
2 14 12.67 12 18.33
8 16.33 15.67 15 21
32 22.33 21.33 23.33 27
60 26 25 25 30

Here is the final set of differences between this improved low-memory solution and the original traqsr.F90:

  • traqsr.F90

    old new  
    109109      REAL(wp) ::   zchl, zcoef, z1_2        ! local scalars 
    110110      REAL(wp) ::   zc0 , zc1 , zc2 , zc3    !    -         - 
    111111      REAL(wp) ::   zzc0, zzc1, zzc2, zzc3   !    -         - 
    112       REAL(wp) ::   zz0 , zz1                !    -         - 
     112      REAL(wp) ::   zz0 , zz1 , ze3t, zlui   !    -         - 
    113113      REAL(wp) ::   zCb, zCmax, zze, zpsi, zpsimax, zdelpsi, zCtot, zCze 
    114       REAL(wp) ::   zlogc, zlogc2, zlogc3 
    115       REAL(wp), ALLOCATABLE, DIMENSION(:,:)   :: zekb, zekg, zekr 
    116       REAL(wp), ALLOCATABLE, DIMENSION(:,:,:) :: ze0, ze1, ze2, ze3, zea, ztrdt 
    117       REAL(wp), ALLOCATABLE, DIMENSION(:,:,:) :: zetot, zchl3d 
     114      REAL(wp) ::   zlogc, zlogze, zlogCtot, zlogCze 
     115      REAL(wp), ALLOCATABLE, DIMENSION(:,:)   :: ze0, ze1, ze2, ze3 
     116      REAL(wp), ALLOCATABLE, DIMENSION(:,:,:) :: ztrdt, zetot, ztmp3d 
    118117      !!---------------------------------------------------------------------- 
    119118      ! 
    120119      IF( ln_timing )   CALL timing_start('tra_qsr') 
     
    159158         ! 
    160159      CASE( np_RGB , np_RGBc )         !==  R-G-B fluxes  ==! 
    161160         ! 
    162          ALLOCATE( zekb(jpi,jpj)     , zekg(jpi,jpj)     , zekr  (jpi,jpj)     , & 
    163             &      ze0 (jpi,jpj,jpk) , ze1 (jpi,jpj,jpk) , ze2   (jpi,jpj,jpk) , & 
    164             &      ze3 (jpi,jpj,jpk) , zea (jpi,jpj,jpk) , zchl3d(jpi,jpj,jpk)   ) 
     161         ALLOCATE( ze0 (jpi,jpj)           , ze1 (jpi,jpj) ,  & 
     162            &      ze2 (jpi,jpj)           , ze3 (jpi,jpj) ,  & 
     163            &      ztmp3d(jpi,jpj,nksr + 1)                     ) 
    165164         ! 
    166165         IF( nqsr == np_RGBc ) THEN          !*  Variable Chlorophyll 
    167166            CALL fld_read( kt, 1, sf_chl )         ! Read Chl data and provides it at the current time step 
     167            ! Separation in R-G-B depending of the surface Chl 
     168            ze0(:,:) = MIN( 10. , MAX( 0.03, sf_chl(1)%fnow(:,:,1) ) ) 
     169            DO_3D_00_00 ( 1, nksr + 1 ) 
     170               ! zchl    = ze0(ji,jj) 
     171               zlogc   = LOG( ze0(ji,jj) ) 
     172               zlogCze = 0.113328685307 + 0.803 * zlogc   ! log(zCze  = 1.12  * zchl**0.803) 
     173               zlogCtot= 3.703768066608 + 0.459 * zlogc   ! log(zCtot = 40.6  * zchl**0.459) 
     174               ! 
     175               zCb     = 0.768 + zlogc * ( 0.087 - zlogc * ( 0.179 + zlogc * 0.025 ) ) 
     176               zCmax   = 0.299 - zlogc * ( 0.289 - zlogc * 0.579 ) 
     177               zpsimax = 0.6   - zlogc * ( 0.640 - zlogc * ( 0.021 + zlogc * 0.115 ) ) 
     178               zdelpsi = 0.710 + zlogc * ( 0.159 + zlogc * 0.021 ) 
     179               ! 
     180               zlogze  = 6.34247346942 - 0.746 * zlogCtot ! log(zze = 568.2 * zCtot**(-0.746)) 
     181               IF( zlogze > 4.62497281328 ) zlogze = 5.298317366548 - 0.293 * zlogCtot 
     182                                                          ! log(IF( zze > 102. ) zze = 200.0 * zCtot**(-0.293)) 
     183               zze  = EXP( zlogze ) 
     184               zpsi = gdepw(ji,jj,jk,Kmm) / zze 
     185               zCze = EXP( zlogCze ) 
     186               ! 
     187               ! NB. make sure zchl value is such that: zchl = MIN( 10. , MAX( 0.03, zchl ) ) 
     188               zchl = MIN( 10. , MAX( 0.03, zCze * ( zCb + zCmax * EXP( -( (zpsi - zpsimax) / zdelpsi )**2 ) ) ) ) 
     189               ! Convert chlorophyll value to attenuation coefficient look-up table index 
     190               ztmp3d(ji,jj,jk) = 41 + 20.*LOG10(zchl) + 1.e-15 
     191            END_3D 
     192         ELSE                                !* constant chlorophyll 
     193            zchl = 0.05 
     194            ! NB. make sure constant value is such that: 
     195            zchl = MIN( 10. , MAX( 0.03, zchl ) ) 
     196            ! Convert chlorophyll value to attenuation coefficient look-up table index 
     197            zlui = 41 + 20.*LOG10(zchl) + 1.e-15 
    168198            DO jk = 1, nksr + 1 
    169                DO jj = 2, jpjm1                       ! Separation in R-G-B depending of the surface Chl 
    170                   DO ji = 2, jpim1 
    171                      zchl    = MIN( 10. , MAX( 0.03, sf_chl(1)%fnow(ji,jj,1) ) ) 
    172                      zCtot   = 40.6  * zchl**0.459 
    173                      zze     = 568.2 * zCtot**(-0.746) 
    174                      IF( zze > 102. ) zze = 200.0 * zCtot**(-0.293) 
    175                      zpsi    = gdepw(ji,jj,jk,Kmm) / zze 
    176                      ! 
    177                      zlogc   = LOG( zchl ) 
    178                      zlogc2  = zlogc * zlogc 
    179                      zlogc3  = zlogc * zlogc * zlogc 
    180                      zCb     = 0.768 + 0.087 * zlogc - 0.179 * zlogc2 - 0.025 * zlogc3 
    181                      zCmax   = 0.299 - 0.289 * zlogc + 0.579 * zlogc2 
    182                      zpsimax = 0.6   - 0.640 * zlogc + 0.021 * zlogc2 + 0.115 * zlogc3 
    183                      zdelpsi = 0.710 + 0.159 * zlogc + 0.021 * zlogc2 
    184                      zCze    = 1.12  * (zchl)**0.803 
    185                      ! 
    186                      zchl3d(ji,jj,jk) = zCze * ( zCb + zCmax * EXP( -( (zpsi - zpsimax) / zdelpsi )**2 ) ) 
    187                   END DO 
    188                   ! 
    189                END DO 
     199               ztmp3d(:,:,jk) = zlui 
    190200            END DO 
    191          ELSE                                !* constant chrlorophyll 
    192            DO jk = 1, nksr + 1 
    193               zchl3d(:,:,jk) = 0.05 
    194             ENDDO 
    195201         ENDIF 
    196202         ! 
    197203         zcoef  = ( 1. - rn_abs ) / 3._wp    !* surface equi-partition in R-G-B 
    198204         DO_2D_00_00 
    199             ze0(ji,jj,1) = rn_abs * qsr(ji,jj) 
    200             ze1(ji,jj,1) = zcoef  * qsr(ji,jj) 
    201             ze2(ji,jj,1) = zcoef  * qsr(ji,jj) 
    202             ze3(ji,jj,1) = zcoef  * qsr(ji,jj) 
    203             zea(ji,jj,1) =          qsr(ji,jj) 
     205            ze0(ji,jj) = rn_abs * qsr(ji,jj) 
     206            ze1(ji,jj) = zcoef  * qsr(ji,jj) 
     207            ze2(ji,jj) = zcoef  * qsr(ji,jj) 
     208            ze3(ji,jj) = zcoef  * qsr(ji,jj) 
     209            ! store the surface SW radiation; re-use the surface ztmp3d array 
     210            ! since the surface attenuation coefficient is not used 
     211            ztmp3d(ji,jj,1) =       qsr(ji,jj) 
    204212         END_2D 
    205213         ! 
    206          DO jk = 2, nksr+1                   !* interior equi-partition in R-G-B depending of vertical profile of Chl 
    207             DO_2D_00_00 
    208                zchl = MIN( 10. , MAX( 0.03, zchl3d(ji,jj,jk) ) ) 
    209                irgb = NINT( 41 + 20.*LOG10(zchl) + 1.e-15 ) 
    210                zekb(ji,jj) = rkrgb(1,irgb) 
    211                zekg(ji,jj) = rkrgb(2,irgb) 
    212                zekr(ji,jj) = rkrgb(3,irgb) 
    213             END_2D 
    214  
    215             DO_2D_00_00 
    216                zc0 = ze0(ji,jj,jk-1) * EXP( - e3t(ji,jj,jk-1,Kmm) * xsi0r       ) 
    217                zc1 = ze1(ji,jj,jk-1) * EXP( - e3t(ji,jj,jk-1,Kmm) * zekb(ji,jj) ) 
    218                zc2 = ze2(ji,jj,jk-1) * EXP( - e3t(ji,jj,jk-1,Kmm) * zekg(ji,jj) ) 
    219                zc3 = ze3(ji,jj,jk-1) * EXP( - e3t(ji,jj,jk-1,Kmm) * zekr(ji,jj) ) 
    220                ze0(ji,jj,jk) = zc0 
    221                ze1(ji,jj,jk) = zc1 
    222                ze2(ji,jj,jk) = zc2 
    223                ze3(ji,jj,jk) = zc3 
    224                zea(ji,jj,jk) = ( zc0 + zc1 + zc2 + zc3 ) * wmask(ji,jj,jk) 
    225             END_2D 
    226          END DO 
     214         !* interior equi-partition in R-G-B depending on vertical profile of Chl 
     215         DO_3D_00_00 ( 2, nksr + 1 ) 
     216            ze3t = e3t(ji,jj,jk-1,Kmm) 
     217            irgb = NINT( ztmp3d(ji,jj,jk) ) 
     218            zc0 = ze0(ji,jj) * EXP( - ze3t * xsi0r ) 
     219            zc1 = ze1(ji,jj) * EXP( - ze3t * rkrgb(1,irgb) ) 
     220            zc2 = ze2(ji,jj) * EXP( - ze3t * rkrgb(2,irgb) ) 
     221            zc3 = ze3(ji,jj) * EXP( - ze3t * rkrgb(3,irgb) ) 
     222            ze0(ji,jj) = zc0 
     223            ze1(ji,jj) = zc1 
     224            ze2(ji,jj) = zc2 
     225            ze3(ji,jj) = zc3 
     226            ztmp3d(ji,jj,jk) = ( zc0 + zc1 + zc2 + zc3 ) * wmask(ji,jj,jk) 
     227         END_3D 
    227228         ! 
    228229         DO_3D_00_00( 1, nksr ) 
    229             qsr_hc(ji,jj,jk) = r1_rho0_rcp * ( zea(ji,jj,jk) - zea(ji,jj,jk+1) ) 
     230            qsr_hc(ji,jj,jk) = r1_rho0_rcp * ( ztmp3d(ji,jj,jk) - ztmp3d(ji,jj,jk+1) ) 
    230231         END_3D 
    231232         ! 
    232          DEALLOCATE( zekb , zekg , zekr , ze0 , ze1 , ze2 , ze3 , zea , zchl3d ) 
     233         DEALLOCATE( ze0 , ze1 , ze2 , ze3 , ztmp3d ) 
    233234         ! 
    234235      CASE( np_2BD  )            !==  2-bands fluxes  ==! 
    235236         ! 

Documentation updates

Using previous parts, define the main changes to be done in the NEMO literature (manuals, guide, web pages, …).

Preview

Since the preview step must be completed before the PI starts the coding, the previewer(s) answers are expected to be completed within the two weeks after the PI has sent the request to the previewer(s).
Then an iterative process should take place between PI and previewer(s) in order to find a consensus

Possible bottlenecks:

  • the methodology
  • the flowchart and list of routines to be changed
  • the new list of variables wrt coding rules
  • the summary of updates in literature

Once an agreement has been reached, preview is ended and the PI can start the development into his branch.

Tests

Once the development is done, the PI should complete the tests section below and after ask the reviewers to start their review.

This part should contain the detailed results of SETTE tests (restartability and reproducibility for each of the reference configuration) and detailed results of restartability and reproducibility when the option is activated on specified configurations used for this test

Regular checks:

  • Can this change be shown to produce expected impact (option activated)?
  • Can this change be shown to have a null impact (option not activated)?
  • Results of the required bit comparability tests been run: are there no differences when activating the development?
  • If some differences appear, is reason for the change valid/understood?
  • If some differences appear, is the impact as expected on model configurations?
  • Is this change expected to preserve all diagnostics?
  • If no, is reason for the change valid/understood?
  • Are there significant changes in run time/memory?

Review

A successful review is needed to schedule the merge of this development into the future NEMO release during next Merge Party (usually in November).

Assessments:

  • Is the proposed methodology now implemented?
  • Are the code changes in agreement with the flowchart defined at preview step?
  • Are the code changes in agreement with list of routines and variables as proposed at preview step?
    If, not, are the discrepancies acceptable?
  • Is the in-line documentation accurate and sufficient?
  • Do the code changes comply with NEMO coding standards?
  • Is the development documented with sufficient details for others to understand the impact of the change?
  • Is the project literature (manual, guide, web, …) now updated or completed following the proposed summary in preview section?

Finding:

Is the review fully successful? If not, please indicate what is still missing


Once review is successful, the development must be scheduled for merge during next Merge Party Meeting.

Attachments (13)

Download all attachments as: .zip