= Name and subject of the action Last edition: '''[[Wikinfo(changed_ts)]]''' by '''[[Wikinfo(changed_by)]]''' The PI is responsible to closely follow the progress of the action, and especially to contact NEMO project manager if the delay on preview (or review) are longer than the 2 weeks expected. [[PageOutline(2, , inline)]] == Summary ||=Action ||ENHANCE-10_acc_fix_traqsr || ||=PI(S) || acc || ||=Digest || Reduce use of 3D allocatable arrays in RGB light penetration schemes || ||=Dependencies || If any || ||=Branch || source:/NEMO/branches/{YEAR}/dev_r{REV}_{ACTION_NAME} || ||=Previewer(s) || Names || ||=Reviewer(s) || Names || ||=Ticket || #2469 || === Description The current implementation of RGB light penetration in traqsr (either varying or constant chlorophyll) uses 6, domain-sized 3D, temporary arrays which can be reduced to a few 2D arrays. The impact of the current implementation is most evident at lower processor counts where the impact of the extra 3D arrays can cause cache-misses and memory band-width issues. In an extreme case traqsr can switch from consuming 2% of run-time to 68% (comparing ORCA025 running on 200 cores vs 48 cores). A simple redesign of the algorithm should remove this behaviour. === Implementation The current code is structured thus: {{{#!f CASE( np_RGB , np_RGBc ) !== R-G-B fluxes ==! ! ALLOCATE( zekb(jpi,jpj) , zekg(jpi,jpj) , zekr (jpi,jpj) , & & ze0 (jpi,jpj,jpk) , ze1 (jpi,jpj,jpk) , ze2 (jpi,jpj,jpk) , & & ze3 (jpi,jpj,jpk) , zea (jpi,jpj,jpk) , zchl3d(jpi,jpj,jpk) ) ! ! code to set zchl3d(:,:,1:nskr+1) ! (the chlorophyll value projected from the surface values) ! zcoef = ( 1. - rn_abs ) / 3._wp !* surface equi-partition in R-G-B DO_2D_00_00 ze0(ji,jj,1) = rn_abs * qsr(ji,jj) ze1(ji,jj,1) = zcoef * qsr(ji,jj) ze2(ji,jj,1) = zcoef * qsr(ji,jj) ze3(ji,jj,1) = zcoef * qsr(ji,jj) zea(ji,jj,1) = qsr(ji,jj) END_2D ! DO jk = 2, nksr+1 !* interior equi-partition in R-G-B depending of vertical profile of Chl DO_2D_00_00 zchl = MIN( 10. , MAX( 0.03, zchl3d(ji,jj,jk) ) ) irgb = NINT( 41 + 20.*LOG10(zchl) + 1.e-15 ) zekb(ji,jj) = rkrgb(1,irgb) zekg(ji,jj) = rkrgb(2,irgb) zekr(ji,jj) = rkrgb(3,irgb) END_2D DO_2D_00_00 zc0 = ze0(ji,jj,jk-1) * EXP( - e3t(ji,jj,jk-1,Kmm) * xsi0r ) zc1 = ze1(ji,jj,jk-1) * EXP( - e3t(ji,jj,jk-1,Kmm) * zekb(ji,jj) ) zc2 = ze2(ji,jj,jk-1) * EXP( - e3t(ji,jj,jk-1,Kmm) * zekg(ji,jj) ) zc3 = ze3(ji,jj,jk-1) * EXP( - e3t(ji,jj,jk-1,Kmm) * zekr(ji,jj) ) ze0(ji,jj,jk) = zc0 ze1(ji,jj,jk) = zc1 ze2(ji,jj,jk) = zc2 ze3(ji,jj,jk) = zc3 zea(ji,jj,jk) = ( zc0 + zc1 + zc2 + zc3 ) * wmask(ji,jj,jk) END_2D END DO ! DO_3D_00_00( 1, nksr ) qsr_hc(ji,jj,jk) = r1_rho0_rcp * ( zea(ji,jj,jk) - zea(ji,jj,jk+1) ) END_3D ! DEALLOCATE( zekb , zekg , zekr , ze0 , ze1 , ze2 , ze3 , zea , zchl3d ) ! }}} Where most of the temporary, full-depth arrays are not necessary because only two vertical levels are required at any one time. In fact even the zea array is unnecessary since the zchl3d array could be repurposed once its value has been used. === Option 1: Minmum memory usage By rearranging the loop order and placing the vertical loop innermost then the code can be greatly simplified to an equivalent using minimal temporary storage. A couple of other changes seem sensible too, namely: * rename the zchl3d array to ztmp3d (since it is now used for two purposes) * only allocate ztmp3d to nksr+1; values below this are not used and nksr + 1 is likely << jpk * calculate and store the attenuation coefficient look-up table index as soon as the sub-surface chlorophyll value is known. This keeps all LOG operations in one loop and, in the case of constant chlorophyll, removes the LOG from the loop altogether. {{{#!f CASE( np_RGB , np_RGBc ) !== R-G-B fluxes ==! ! ALLOCATE( ztmp3d(jpi,jpj,nksr + 1) ) ! ! code to set ztmp3d(:,:,1:nskr+1) ! including the following changes after IF( nqsr == np_RGBc ) THEN !* Variable Chlorophyll . DO_3D_00_00( 1, nksr + 1 ) . . no change until after zCze = 1.12 * (zchl)**0.803 ! ! NB. make sure zchl value is such that: zchl = MIN( 10. , MAX( 0.03, zchl ) ) zchl = MIN( 10. , MAX( 0.03, zCze * ( zCb + zCmax * EXP( -( (zpsi - zpsimax) / zdelpsi )**2 ) ) ) ) ! Convert chlorophyll value to attenuation coefficient look-up table index ztmp3d(ji,jj,jk) = 41 + 20.*LOG10(zchl) + 1.e-15 END_3D ELSE !* constant chrlorophyll zchl = 0.05 ! NB. make sure constant value is such that: zchl = MIN( 10. , MAX( 0.03, zchl ) ) ! Convert chlorophyll value to attenuation coefficient look-up table index zlui = 41 + 20.*LOG10(zchl) + 1.e-15 DO jk = 1, nksr + 1 ztmp3d(:,:,jk) = zlui END DO ENDIF ! ! zcoef = ( 1. - rn_abs ) / 3._wp !* surface equi-partition in R-G-B ! store the surface SW radiation; re-purpose the surface ztmp3d array ! since the surface attenuation coefficient is not used ztmp3d(:,:,1) = qsr(:,:) ! !* interior equi-partition in R-G-B depending of vertical profile of Chl DO_2D_00_00 zc0 = rn_abs * qsr(ji,jj) zc1 = zcoef * qsr(ji,jj) zc2 = zc1 zc3 = zc1 zc4 = e3t(ji,jj,1,Kmm) DO jk = 2, nksr+1 irgb = NINT( ztmp3d(ji,jj,jk) ) zc0 = zc0 * EXP( - zc4 * xsi0r ) zc1 = zc1 * EXP( - zc4 * rkrgb(1,irgb) ) zc2 = zc2 * EXP( - zc4 * rkrgb(2,irgb) ) zc3 = zc3 * EXP( - zc4 * rkrgb(3,irgb) ) zc4 = e3t(ji,jj,jk,Kmm) ! store the SW radiation penetrating to this location ! re-purpose the ztmp3d array since the attenuation coefficient ! at this point will not be needed again ztmp3d(ji,jj,jk) = ( zc0 + zc1 + zc2 + zc3 ) * wmask(ji,jj,jk) END DO END_2D ! DO_3D_00_00( 1, nksr ) qsr_hc(ji,jj,jk) = r1_rho0_rcp * ( ztmp3d(ji,jj,jk) - ztmp3d(ji,jj,jk+1) ) END_3D ! DEALLOCATE( ztmp3d ) }}} This is code and memory efficient but may perform poorly due to non-contiguous access to the array elements (see performance section below). === Option 2: Low memory use (retain loop order). A compromise solution, which reduces memory use and should perform better is to remove all unnecessary full-depth arrays but maintain loop order by keeping a few 2D arrays. The same additional changes listed above are also made. {{{#!f CASE( np_RGB , np_RGBc ) !== R-G-B fluxes ==! ! ALLOCATE( ze0 (jpi,jpj) , ze1 (jpi,jpj) , & & ze2 (jpi,jpj) , ze3 (jpi,jpj) , & & ztmp3d(jpi,jpj,nksr + 1) ) ! ! code to set ztmp3d(:,:,1:nskr+1) ! including the following changes after IF( nqsr == np_RGBc ) THEN !* Variable Chlorophyll . DO_3D_00_00( 1, nksr + 1 ) . . no change until after zCze = 1.12 * (zchl)**0.803 ! ! NB. make sure zchl value is such that: zchl = MIN( 10. , MAX( 0.03, zchl ) ) zchl = MIN( 10. , MAX( 0.03, zCze * ( zCb + zCmax * EXP( -( (zpsi - zpsimax) / zdelpsi )**2 ) ) ) ) ! Convert chlorophyll value to attenuation coefficient look-up table index ztmp3d(ji,jj,jk) = 41 + 20.*LOG10(zchl) + 1.e-15 END_3D ELSE !* constant chlorophyll zchl = 0.05 ! NB. make sure constant value is such that: zchl = MIN( 10. , MAX( 0.03, zchl ) ) ! Convert chlorophyll value to attenuation coefficient look-up table index zlui = 41 + 20.*LOG10(zchl) + 1.e-15 DO jk = 1, nksr + 1 ztmp3d(:,:,jk) = zlui END DO ENDIF ! zcoef = ( 1. - rn_abs ) / 3._wp !* surface equi-partition in R-G-B DO_2D_00_00 ze0(ji,jj) = rn_abs * qsr(ji,jj) ze1(ji,jj) = zcoef * qsr(ji,jj) ze2(ji,jj) = zcoef * qsr(ji,jj) ze3(ji,jj) = zcoef * qsr(ji,jj) ! store the surface SW radiation; re-use the surface ztmp3d array ! since the surface attenuation coefficient is not used ztmp3d(ji,jj,1) = qsr(ji,jj) END_2D ! !* interior equi-partition in R-G-B depending of vertical profile of Chl DO_3D_00_00( 2, nksr+1 ) irgb = NINT( ztmp3d(ji,jj,jk) ) ze3t = e3t(ji,jj,jk-1,Kmm) ze0(ji,jj) = ze0(ji,jj) * EXP( - ze3t * xsi0r ) ze1(ji,jj) = ze1(ji,jj) * EXP( - ze3t * rkrgb(1,irgb) ) ze2(ji,jj) = ze2(ji,jj) * EXP( - ze3t * rkrgb(2,irgb) ) ze3(ji,jj) = ze3(ji,jj) * EXP( - ze3t * rkrgb(3,irgb) ) ! store the SW radiation penetrating to this location ! re-use the ztmp3d array since the attenuation coefficient ! at this point will not be needed again ztmp3d(ji,jj,jk) = ( ze0(ji,jj) + ze1(ji,jj) + ze2(ji,jj) + ze3(ji,jj) ) * wmask(ji,jj,jk) END_3D ! DO_3D_00_00( 1, nksr ) qsr_hc(ji,jj,jk) = r1_rho0_rcp * ( ztmp3d(ji,jj,jk) - ztmp3d(ji,jj,jk+1) ) END_3D ! DEALLOCATE( ze0 , ze1 , ze2 , ze3 , ztmp3d ) }}} === Performance and evaluation Both these options produce identical results to the original code (based on an ORCA2_ICE_PISCES test using SETTE (which includes variable surface chlorophyll inputs). ln_timing was activated and the CPU time (averaged across all processors) spent in tra_qsr used as a simple measure of performance. Unfortunately, variations in runtime between successive tests (even with the same code) on the NOC cluster were almost as great as any difference arising from algorithmic differences. Each test was repeated 6 times with the following results: || code option |||||||||||| CPU seconds spent in tra_qsr || Average || || original code || 0.34 || 0.34 || 0.35 || 0.35 || 0.34 || 0.34 || '''0.3433''' || || minimum memory option || 0.36 || 0.36 || 0.37 || 0.36 || 0.36 || 0.37 || '''0.3633 ''' || || low memory option || 0.35 || 0.35 || 0.35 || 0.36 || 0.36 || 0.35 || ''' 0.3533 ''' || from which the tentative conclusion is that the minimum memory option does perform consistently worst but the low memory option appears to be a suitable replacement to the original code. More stringent tests are require to confirm this. These initial tests were performed using the standard 32 processor SETTE test for ORCA2_ICE_PISCES. To search for a better distinction between the options further tests were made by varying the number of processors. Tests with 2, 8, 32 and 60 processors were performed (3 for each option at each core count). The following table shows the percentage of CPU time spent in tra_qsr and the rank of the tra_qsr routine in the CPU time-sorted list of routines (a higher rank means tra_qsr is taking proportionally less of the overall CPU time). In each case the average of the 3 samples is given.: |||||||| '''% CPU spent in tra_qsr''' || || #CPUs || original || min-mem || low-mem || || 2 || 1.76 || 1.82 || 1.83 || || 8 || 1.38 || 1.48 || 1.46 || || 32 || 0.48 || 0.49 || 0.5 || || 60 || 0.24 || 0.26 || 0.26 || \\ |||||||| '''Rank in sorted list of routines by CPU usage ''' || || #CPUs || original || min-mem || low-mem || || 2 || 14 || 12.67 || 12 || || 8 || 16.33 || 15.67 || 15 || || 32 || 22.33 || 21.33 || 23.33 || || 60 || 26 || 25 || 25 || Unfortunately the message is still mixed and, although the new codes reduce the size and number of temporary arrays; they have a slight detrimental effect on performance and do not alter the tendency for tra_qsr to consume higher percentages of the CPU time as domain size increases. The low-memory option does, at least, lend itself to tiling. == Option 2 revisited == Following discussions with the previewer, it was decided that low-memory option should be the best approach but the slight deterioration in performance over the original code may be down to the over-zealous replacement of temporary scalars within the second 3D loop. On reflection there are also opportunities to reduce the number of floating point operations and load and store instructions within the first 3D loop. Importantly, there are also opportunities to avoid some expensive operations by performing some calculations in log space. Here are the mathematically equivalent alternatives: {{{#!diff --- LOMEM3/traqsr.F90 2020-05-15 15:44:11.652736539 +0100 +++ traqsr.F90 2020-05-19 11:46:43.209540652 +0100 @@ -111,7 +111,7 @@ REAL(wp) :: zzc0, zzc1, zzc2, zzc3 ! - - REAL(wp) :: zz0 , zz1 , ze3t, zlui ! - - REAL(wp) :: zCb, zCmax, zze, zpsi, zpsimax, zdelpsi, zCtot, zCze - REAL(wp) :: zlogc + REAL(wp) :: zlogc, zlogze, zlogCtot, zlogCze REAL(wp), ALLOCATABLE, DIMENSION(:,:) :: ze0, ze1, ze2, ze3 REAL(wp), ALLOCATABLE, DIMENSION(:,:,:) :: ztrdt, zetot, ztmp3d !!---------------------------------------------------------------------- @@ -165,20 +165,24 @@ IF( nqsr == np_RGBc ) THEN !* Variable Chlorophyll CALL fld_read( kt, 1, sf_chl ) ! Read Chl data and provides it at the current time step ! Separation in R-G-B depending of the surface Chl + ze0(:,:) = MIN( 10. , MAX( 0.03, sf_chl(1)%fnow(:,:,1) ) ) DO_3D_00_00 ( 1, nksr + 1 ) - zchl = MIN( 10. , MAX( 0.03, sf_chl(1)%fnow(ji,jj,1) ) ) - zCze = 1.12 * zchl**0.803 - zCtot = 40.6 * zchl**0.459 - zlogc = LOG( zchl ) + ! zchl = ze0(ji,jj) + zlogc = LOG( ze0(ji,jj) ) + zlogCze = 0.113328685307 + 0.803 * zlogc ! log(zCze = 1.12 * zchl**0.803) + zlogCtot= 3.703768066608 + 0.459 * zlogc ! log(zCtot = 40.6 * zchl**0.459) ! zCb = 0.768 + zlogc * ( 0.087 - zlogc * ( 0.179 + zlogc * 0.025 ) ) zCmax = 0.299 - zlogc * ( 0.289 - zlogc * 0.579 ) zpsimax = 0.6 - zlogc * ( 0.640 - zlogc * ( 0.021 + zlogc * 0.115 ) ) zdelpsi = 0.710 + zlogc * ( 0.159 + zlogc * 0.021 ) ! - zze = 568.2 * zCtot**(-0.746) - IF( zze > 102. ) zze = 200.0 * zCtot**(-0.293) - zpsi = gdepw(ji,jj,jk,Kmm) / zze + zlogze = 6.34247346942 - 0.746 * zlogCtot ! log(zze = 568.2 * zCtot**(-0.746)) + IF( zlogze > 4.62497281328 ) zlogze = 5.298317366548 - 0.293 * zlogCtot + ! log(IF( zze > 102. ) zze = 200.0 * zCtot**(-0.293)) + zze = EXP( zlogze ) + zpsi = gdepw(ji,jj,jk,Kmm) / zze + zCze = EXP( zlogCze ) ! ! NB. make sure zchl value is such that: zchl = MIN( 10. , MAX( 0.03, zchl ) ) zchl = MIN( 10. , MAX( 0.03, zCze * ( zCb + zCmax * EXP( -( (zpsi - zpsimax) / zdelpsi )**2 ) ) ) ) @@ -207,7 +211,7 @@ ztmp3d(ji,jj,1) = qsr(ji,jj) END_2D ! - !* interior equi-partition in R-G-B depending of vertical profile of Chl + !* interior equi-partition in R-G-B depending on vertical profile of Chl DO_3D_00_00 ( 2, nksr + 1 ) ze3t = e3t(ji,jj,jk-1,Kmm) irgb = NINT( ztmp3d(ji,jj,jk) ) }}} Despite significant variation between identical runs on the NOC cluster there is evidence that this second version of the low-memory option improves on the original code and is certainly no worse whilst using less storage. Here are the tables with the new results added. Graphs are shown below the code differences. |||||||||| '''% CPU spent in tra_qsr''' || || #CPUs || original || min-mem || low-mem || low-men v2 || || 2 || 1.76 || 1.82 || 1.83 || 1.19 || || 8 || 1.38 || 1.48 || 1.46 || 0.56 || || 32 || 0.48 || 0.49 || 0.5 || 0.3 || || 60 || 0.24 || 0.26 || 0.26 || 0.08 || \\ |||||||||| '''Rank in sorted list of routines by CPU usage ''' || || #CPUs || original || min-mem || low-mem || low-men v2 || || 2 || 14 || 12.67 || 12 || 18.33 || || 8 || 16.33 || 15.67 || 15 || 21|| || 32 || 22.33 || 21.33 || 23.33 || 27 || || 60 || 26 || 25 || 25 || 30 || Here is the final set of differences between this improved low-memory solution and the original traqsr.F90: {{{#!diff --- ORG/traqsr.F90 2020-05-13 11:37:57.094258396 +0100 +++ traqsr.F90 2020-05-19 11:46:43.209540652 +0100 @@ -109,12 +109,11 @@ REAL(wp) :: zchl, zcoef, z1_2 ! local scalars REAL(wp) :: zc0 , zc1 , zc2 , zc3 ! - - REAL(wp) :: zzc0, zzc1, zzc2, zzc3 ! - - - REAL(wp) :: zz0 , zz1 ! - - + REAL(wp) :: zz0 , zz1 , ze3t, zlui ! - - REAL(wp) :: zCb, zCmax, zze, zpsi, zpsimax, zdelpsi, zCtot, zCze - REAL(wp) :: zlogc, zlogc2, zlogc3 - REAL(wp), ALLOCATABLE, DIMENSION(:,:) :: zekb, zekg, zekr - REAL(wp), ALLOCATABLE, DIMENSION(:,:,:) :: ze0, ze1, ze2, ze3, zea, ztrdt - REAL(wp), ALLOCATABLE, DIMENSION(:,:,:) :: zetot, zchl3d + REAL(wp) :: zlogc, zlogze, zlogCtot, zlogCze + REAL(wp), ALLOCATABLE, DIMENSION(:,:) :: ze0, ze1, ze2, ze3 + REAL(wp), ALLOCATABLE, DIMENSION(:,:,:) :: ztrdt, zetot, ztmp3d !!---------------------------------------------------------------------- ! IF( ln_timing ) CALL timing_start('tra_qsr') @@ -159,77 +158,79 @@ ! CASE( np_RGB , np_RGBc ) !== R-G-B fluxes ==! ! - ALLOCATE( zekb(jpi,jpj) , zekg(jpi,jpj) , zekr (jpi,jpj) , & - & ze0 (jpi,jpj,jpk) , ze1 (jpi,jpj,jpk) , ze2 (jpi,jpj,jpk) , & - & ze3 (jpi,jpj,jpk) , zea (jpi,jpj,jpk) , zchl3d(jpi,jpj,jpk) ) + ALLOCATE( ze0 (jpi,jpj) , ze1 (jpi,jpj) , & + & ze2 (jpi,jpj) , ze3 (jpi,jpj) , & + & ztmp3d(jpi,jpj,nksr + 1) ) ! IF( nqsr == np_RGBc ) THEN !* Variable Chlorophyll CALL fld_read( kt, 1, sf_chl ) ! Read Chl data and provides it at the current time step + ! Separation in R-G-B depending of the surface Chl + ze0(:,:) = MIN( 10. , MAX( 0.03, sf_chl(1)%fnow(:,:,1) ) ) + DO_3D_00_00 ( 1, nksr + 1 ) + ! zchl = ze0(ji,jj) + zlogc = LOG( ze0(ji,jj) ) + zlogCze = 0.113328685307 + 0.803 * zlogc ! log(zCze = 1.12 * zchl**0.803) + zlogCtot= 3.703768066608 + 0.459 * zlogc ! log(zCtot = 40.6 * zchl**0.459) + ! + zCb = 0.768 + zlogc * ( 0.087 - zlogc * ( 0.179 + zlogc * 0.025 ) ) + zCmax = 0.299 - zlogc * ( 0.289 - zlogc * 0.579 ) + zpsimax = 0.6 - zlogc * ( 0.640 - zlogc * ( 0.021 + zlogc * 0.115 ) ) + zdelpsi = 0.710 + zlogc * ( 0.159 + zlogc * 0.021 ) + ! + zlogze = 6.34247346942 - 0.746 * zlogCtot ! log(zze = 568.2 * zCtot**(-0.746)) + IF( zlogze > 4.62497281328 ) zlogze = 5.298317366548 - 0.293 * zlogCtot + ! log(IF( zze > 102. ) zze = 200.0 * zCtot**(-0.293)) + zze = EXP( zlogze ) + zpsi = gdepw(ji,jj,jk,Kmm) / zze + zCze = EXP( zlogCze ) + ! + ! NB. make sure zchl value is such that: zchl = MIN( 10. , MAX( 0.03, zchl ) ) + zchl = MIN( 10. , MAX( 0.03, zCze * ( zCb + zCmax * EXP( -( (zpsi - zpsimax) / zdelpsi )**2 ) ) ) ) + ! Convert chlorophyll value to attenuation coefficient look-up table index + ztmp3d(ji,jj,jk) = 41 + 20.*LOG10(zchl) + 1.e-15 + END_3D + ELSE !* constant chlorophyll + zchl = 0.05 + ! NB. make sure constant value is such that: + zchl = MIN( 10. , MAX( 0.03, zchl ) ) + ! Convert chlorophyll value to attenuation coefficient look-up table index + zlui = 41 + 20.*LOG10(zchl) + 1.e-15 DO jk = 1, nksr + 1 - DO jj = 2, jpjm1 ! Separation in R-G-B depending of the surface Chl - DO ji = 2, jpim1 - zchl = MIN( 10. , MAX( 0.03, sf_chl(1)%fnow(ji,jj,1) ) ) - zCtot = 40.6 * zchl**0.459 - zze = 568.2 * zCtot**(-0.746) - IF( zze > 102. ) zze = 200.0 * zCtot**(-0.293) - zpsi = gdepw(ji,jj,jk,Kmm) / zze - ! - zlogc = LOG( zchl ) - zlogc2 = zlogc * zlogc - zlogc3 = zlogc * zlogc * zlogc - zCb = 0.768 + 0.087 * zlogc - 0.179 * zlogc2 - 0.025 * zlogc3 - zCmax = 0.299 - 0.289 * zlogc + 0.579 * zlogc2 - zpsimax = 0.6 - 0.640 * zlogc + 0.021 * zlogc2 + 0.115 * zlogc3 - zdelpsi = 0.710 + 0.159 * zlogc + 0.021 * zlogc2 - zCze = 1.12 * (zchl)**0.803 - ! - zchl3d(ji,jj,jk) = zCze * ( zCb + zCmax * EXP( -( (zpsi - zpsimax) / zdelpsi )**2 ) ) - END DO - ! - END DO + ztmp3d(:,:,jk) = zlui END DO - ELSE !* constant chrlorophyll - DO jk = 1, nksr + 1 - zchl3d(:,:,jk) = 0.05 - ENDDO ENDIF ! zcoef = ( 1. - rn_abs ) / 3._wp !* surface equi-partition in R-G-B DO_2D_00_00 - ze0(ji,jj,1) = rn_abs * qsr(ji,jj) - ze1(ji,jj,1) = zcoef * qsr(ji,jj) - ze2(ji,jj,1) = zcoef * qsr(ji,jj) - ze3(ji,jj,1) = zcoef * qsr(ji,jj) - zea(ji,jj,1) = qsr(ji,jj) + ze0(ji,jj) = rn_abs * qsr(ji,jj) + ze1(ji,jj) = zcoef * qsr(ji,jj) + ze2(ji,jj) = zcoef * qsr(ji,jj) + ze3(ji,jj) = zcoef * qsr(ji,jj) + ! store the surface SW radiation; re-use the surface ztmp3d array + ! since the surface attenuation coefficient is not used + ztmp3d(ji,jj,1) = qsr(ji,jj) END_2D ! - DO jk = 2, nksr+1 !* interior equi-partition in R-G-B depending of vertical profile of Chl - DO_2D_00_00 - zchl = MIN( 10. , MAX( 0.03, zchl3d(ji,jj,jk) ) ) - irgb = NINT( 41 + 20.*LOG10(zchl) + 1.e-15 ) - zekb(ji,jj) = rkrgb(1,irgb) - zekg(ji,jj) = rkrgb(2,irgb) - zekr(ji,jj) = rkrgb(3,irgb) - END_2D - - DO_2D_00_00 - zc0 = ze0(ji,jj,jk-1) * EXP( - e3t(ji,jj,jk-1,Kmm) * xsi0r ) - zc1 = ze1(ji,jj,jk-1) * EXP( - e3t(ji,jj,jk-1,Kmm) * zekb(ji,jj) ) - zc2 = ze2(ji,jj,jk-1) * EXP( - e3t(ji,jj,jk-1,Kmm) * zekg(ji,jj) ) - zc3 = ze3(ji,jj,jk-1) * EXP( - e3t(ji,jj,jk-1,Kmm) * zekr(ji,jj) ) - ze0(ji,jj,jk) = zc0 - ze1(ji,jj,jk) = zc1 - ze2(ji,jj,jk) = zc2 - ze3(ji,jj,jk) = zc3 - zea(ji,jj,jk) = ( zc0 + zc1 + zc2 + zc3 ) * wmask(ji,jj,jk) - END_2D - END DO + !* interior equi-partition in R-G-B depending on vertical profile of Chl + DO_3D_00_00 ( 2, nksr + 1 ) + ze3t = e3t(ji,jj,jk-1,Kmm) + irgb = NINT( ztmp3d(ji,jj,jk) ) + zc0 = ze0(ji,jj) * EXP( - ze3t * xsi0r ) + zc1 = ze1(ji,jj) * EXP( - ze3t * rkrgb(1,irgb) ) + zc2 = ze2(ji,jj) * EXP( - ze3t * rkrgb(2,irgb) ) + zc3 = ze3(ji,jj) * EXP( - ze3t * rkrgb(3,irgb) ) + ze0(ji,jj) = zc0 + ze1(ji,jj) = zc1 + ze2(ji,jj) = zc2 + ze3(ji,jj) = zc3 + ztmp3d(ji,jj,jk) = ( zc0 + zc1 + zc2 + zc3 ) * wmask(ji,jj,jk) + END_3D ! DO_3D_00_00( 1, nksr ) - qsr_hc(ji,jj,jk) = r1_rho0_rcp * ( zea(ji,jj,jk) - zea(ji,jj,jk+1) ) + qsr_hc(ji,jj,jk) = r1_rho0_rcp * ( ztmp3d(ji,jj,jk) - ztmp3d(ji,jj,jk+1) ) END_3D ! - DEALLOCATE( zekb , zekg , zekr , ze0 , ze1 , ze2 , ze3 , zea , zchl3d ) + DEALLOCATE( ze0 , ze1 , ze2 , ze3 , ztmp3d ) ! CASE( np_2BD ) !== 2-bands fluxes ==! ! }}} [[Image(percent_cpu_qsr.3.png)]] [[Image(rankqsr.3.png)]] ''...'' === Documentation updates {{{#!box width=55em help Using previous parts, define the main changes to be done in the NEMO literature (manuals, guide, web pages, …). }}} ''...'' == Preview {{{#!box width=50em info [[Include(wiki:Developers/DevProcess#preview_)]] }}} ''...'' == Tests {{{#!box width=50em info [[Include(wiki:Developers/DevProcess#tests)]] }}} ''...'' == Review {{{#!box width=50em info [[Include(wiki:Developers/DevProcess#review)]] }}} ''...''