Version 20 (modified by acc, 4 years ago) (diff) |
---|
Name and subject of the action
Last edition: Wikinfo(changed_ts)? by Wikinfo(changed_by)?
The PI is responsible to closely follow the progress of the action, and especially to contact NEMO project manager if the delay on preview (or review) are longer than the 2 weeks expected.
Summary
Action | ENHANCE-10_acc_fix_traqsr |
---|---|
PI(S) | acc |
Digest | Reduce use of 3D allocatable arrays in RGB light penetration schemes |
Dependencies | If any |
Branch | source:/NEMO/branches/2020/dev_r12953_ENHANCE-10_acc_fix_traqsr |
Previewer(s) | Sebastien |
Reviewer(s) | Names |
Ticket | #2469 |
Description
The current implementation of RGB light penetration in traqsr (either varying or constant chlorophyll) uses 6, domain-sized 3D, temporary arrays which can be reduced to a few 2D arrays. The impact of the current implementation is most evident at lower processor counts where the impact of the extra 3D arrays can cause cache-misses and memory band-width issues. In an extreme case traqsr can switch from consuming 2% of run-time to 68% (comparing ORCA025 running on 200 cores vs 48 cores).
A simple redesign of the algorithm should remove this behaviour.
Implementation
The code in question is contained within a CASE construct which deals with both varying surface Chlorophyll or constant values; in both cases the following temporary arrays are allocated:
CASE( np_RGB , np_RGBc ) !== R-G-B fluxes ==! ! ALLOCATE( zekb(jpi,jpj) , zekg(jpi,jpj) , zekr (jpi,jpj) , & & ze0 (jpi,jpj,jpk) , ze1 (jpi,jpj,jpk) , ze2 (jpi,jpj,jpk) , & & ze3 (jpi,jpj,jpk) , zea (jpi,jpj,jpk) , zchl3d(jpi,jpj,jpk) )
In the case of varying Chlorophyll; surface values are read and then used in a 3D loop to set subsurface value. For constant Chlorophyll a single value is used throughout:
PART 1 ------- IF( nqsr == np_RGBc ) THEN !* Variable Chlorophyll CALL fld_read( kt, 1, sf_chl ) ! Read Chl data and provides it at the current time step DO jk = 1, nksr + 1 DO jj = 2, jpjm1 ! Separation in R-G-B depending of the surface Chl DO ji = 2, jpim1 zchl = MIN( 10. , MAX( 0.03, sf_chl(1)%fnow(ji,jj,1) ) ) zCtot = 40.6 * zchl**0.459 zze = 568.2 * zCtot**(-0.746) IF( zze > 102. ) zze = 200.0 * zCtot**(-0.293) zpsi = gdepw(ji,jj,jk,Kmm) / zze ! zlogc = LOG( zchl ) zlogc2 = zlogc * zlogc zlogc3 = zlogc * zlogc * zlogc zCb = 0.768 + 0.087 * zlogc - 0.179 * zlogc2 - 0.025 * zlogc3 zCmax = 0.299 - 0.289 * zlogc + 0.579 * zlogc2 zpsimax = 0.6 - 0.640 * zlogc + 0.021 * zlogc2 + 0.115 * zlogc3 zdelpsi = 0.710 + 0.159 * zlogc + 0.021 * zlogc2 zCze = 1.12 * (zchl)**0.803 ! zchl3d(ji,jj,jk) = zCze * ( zCb + zCmax * EXP( -( (zpsi - zpsimax) / zdelpsi )**2 ) ) END DO ! END DO END DO ELSE !* constant chrlorophyll DO jk = 1, nksr + 1 zchl3d(:,:,jk) = 0.05 ENDDO ENDIF
These values are then used in a second 3D loop where Chlorophyll values are converted to attenuation coefficients (for R G and B) via indirect indexing to a prepared look-up table:
PART 2 ------- zcoef = ( 1. - rn_abs ) / 3._wp !* surface equi-partition in R-G-B DO_2D_00_00 ze0(ji,jj,1) = rn_abs * qsr(ji,jj) ze1(ji,jj,1) = zcoef * qsr(ji,jj) ze2(ji,jj,1) = zcoef * qsr(ji,jj) ze3(ji,jj,1) = zcoef * qsr(ji,jj) zea(ji,jj,1) = qsr(ji,jj) END_2D ! DO jk = 2, nksr+1 !* interior equi-partition in R-G-B depending of vertical profile of Chl DO_2D_00_00 zchl = MIN( 10. , MAX( 0.03, zchl3d(ji,jj,jk) ) ) irgb = NINT( 41 + 20.*LOG10(zchl) + 1.e-15 ) zekb(ji,jj) = rkrgb(1,irgb) zekg(ji,jj) = rkrgb(2,irgb) zekr(ji,jj) = rkrgb(3,irgb) END_2D DO_2D_00_00 zc0 = ze0(ji,jj,jk-1) * EXP( - e3t(ji,jj,jk-1,Kmm) * xsi0r ) zc1 = ze1(ji,jj,jk-1) * EXP( - e3t(ji,jj,jk-1,Kmm) * zekb(ji,jj) ) zc2 = ze2(ji,jj,jk-1) * EXP( - e3t(ji,jj,jk-1,Kmm) * zekg(ji,jj) ) zc3 = ze3(ji,jj,jk-1) * EXP( - e3t(ji,jj,jk-1,Kmm) * zekr(ji,jj) ) ze0(ji,jj,jk) = zc0 ze1(ji,jj,jk) = zc1 ze2(ji,jj,jk) = zc2 ze3(ji,jj,jk) = zc3 zea(ji,jj,jk) = ( zc0 + zc1 + zc2 + zc3 ) * wmask(ji,jj,jk) END_2D END DO ! DO_3D_00_00( 1, nksr ) qsr_hc(ji,jj,jk) = r1_rho0_rcp * ( zea(ji,jj,jk) - zea(ji,jj,jk+1) ) END_3D ! DEALLOCATE( zekb , zekg , zekr , ze0 , ze1 , ze2 , ze3 , zea , zchl3d ) !
Part 2 has obvious inefficiencies because most of the temporary, full-depth arrays are not necessary since only two vertical levels are required at any one time. In fact even the zea array is unnecessary since the zchl3d array could be repurposed once its value has been used. Recalling the chlorophyll value and converting this to a index within the second loop also seems wasteful because this conversion could have been done before the chlorophyll value was stored in part 1 and the index stored instead.
Part 1 has even greater inefficiencies though since it repeats some computationally expensive calculations in a 3D loop even though much of it is 2D in nature. In fact the complexity of the calculations can be reduced further by operating in log-space.
After trying a few options and iterating at the preview stage (see earlier investigations below), the final solution chosen is given below. Its performance, compared with the original and a minimum memory option (see earlier investigations) is shown here:
where the information is taken from the timing.output report of a 64 timestep, ORCA2_ICE_PISCES test using the SETTE test harness. Rank refers to the position of the tra_qsr routine in the list of routines sorted by CPU time (most expensive first). A higher rank, therefore, indicates improved performance relative to the rest of the code. On top of this there is also reduction in memory use of at least 5*jpi*jpj*jpk.
The final code:
CASE( np_RGB , np_RGBc ) !== R-G-B fluxes ==! ! ALLOCATE( ze0 (jpi,jpj) , ze1 (jpi,jpj) , & & ze2 (jpi,jpj) , ze3 (jpi,jpj) , & & ztmp3d(jpi,jpj,nksr + 1) ) ! IF( nqsr == np_RGBc ) THEN !* Variable Chlorophyll CALL fld_read( kt, 1, sf_chl ) ! Read Chl data and provides it at the current time step ! ! Separation in R-G-B depending on the surface Chl ! perform and store as many of the 2D calculations as possible ! before the 3D loop (use the temporary 2D arrays to replace the ! most expensive calculations) ! ze0(:,:) = LOG ( MIN( 10. , MAX( 0.03, sf_chl(1)%fnow(:,:,1) ) ) ) ! ze0 = log(zchl) ze1(:,:) = 0.113328685307 + 0.803 * ze0(:,:) ! ze1 : log(zCze) = log (1.12 * zchl**0.803) ze2(:,:) = 3.703768066608 + 0.459 * ze0(:,:) ! ze2 : log(zCtot) = log(40.6 * zchl**0.459) ze3(:,:) = 6.34247346942 - 0.746 * ze2(:,:) ! ze3 : log(zze) = log(568.2 * zCtot**(-0.746)) WHERE( ze3 > 4.62497281328 ) ze3 = 5.298317366548 - 0.293 * ze2 ! IF( log(zze) > log(102.) ) log(zze) = ! ! log(200.0 * zCtot**(-0.293)) ze1(:,:) = EXP(ze1(:,:)) ! ze1 = zCze ze2(:,:) = 1._wp / ( 0.710 + ze0(:,:) * ( 0.159 + ze0(:,:) * 0.021 ) ) ! ze2 = 1/zdelpsi ze3(:,:) = 1._wp / EXP(ze3(:,:)) ! ze3 = 1/zze ! DO_3D_00_00 ( 1, nksr + 1 ) ! zchl = ALOG( ze0(ji,jj) ) zlogc = ze0(ji,jj) ! zCb = 0.768 + zlogc * ( 0.087 - zlogc * ( 0.179 + zlogc * 0.025 ) ) zCmax = 0.299 - zlogc * ( 0.289 - zlogc * 0.579 ) zpsimax = 0.6 - zlogc * ( 0.640 - zlogc * ( 0.021 + zlogc * 0.115 ) ) ! zdelpsi = 0.710 + zlogc * ( 0.159 + zlogc * 0.021 ) ! zCze = ze1(ji,jj) zrdpsi = ze2(ji,jj) ! 1/zdelpsi zpsi = ze3(ji,jj) * gdepw(ji,jj,jk,Kmm) ! gdepw/zze ! ! NB. make sure zchl value is such that: zchl = MIN( 10. , MAX( 0.03, zchl ) ) zchl = MIN( 10. , MAX( 0.03, zCze * ( zCb + zCmax * EXP( -( (zpsi - zpsimax) * zrdpsi )**2 ) ) ) ) ! Convert chlorophyll value to attenuation coefficient look-up table index ztmp3d(ji,jj,jk) = 41 + 20.*LOG10(zchl) + 1.e-15 END_3D ELSE !* constant chlorophyll zchl = 0.05 ! NB. make sure constant value is such that: zchl = MIN( 10. , MAX( 0.03, zchl ) ) ! Convert chlorophyll value to attenuation coefficient look-up table index zlui = 41 + 20.*LOG10(zchl) + 1.e-15 DO jk = 1, nksr + 1 ztmp3d(:,:,jk) = zlui END DO ENDIF ! zcoef = ( 1. - rn_abs ) / 3._wp !* surface equi-partition in R-G-B DO_2D_00_00 ze0(ji,jj) = rn_abs * qsr(ji,jj) ze1(ji,jj) = zcoef * qsr(ji,jj) ze2(ji,jj) = zcoef * qsr(ji,jj) ze3(ji,jj) = zcoef * qsr(ji,jj) ! store the surface SW radiation; re-use the surface ztmp3d array ! since the surface attenuation coefficient is not used ztmp3d(ji,jj,1) = qsr(ji,jj) END_2D ! !* interior equi-partition in R-G-B depending on vertical profile of Chl DO_3D_00_00 ( 2, nksr + 1 ) ze3t = e3t(ji,jj,jk-1,Kmm) irgb = NINT( ztmp3d(ji,jj,jk) ) zc0 = ze0(ji,jj) * EXP( - ze3t * xsi0r ) zc1 = ze1(ji,jj) * EXP( - ze3t * rkrgb(1,irgb) ) zc2 = ze2(ji,jj) * EXP( - ze3t * rkrgb(2,irgb) ) zc3 = ze3(ji,jj) * EXP( - ze3t * rkrgb(3,irgb) ) ze0(ji,jj) = zc0 ze1(ji,jj) = zc1 ze2(ji,jj) = zc2 ze3(ji,jj) = zc3 ztmp3d(ji,jj,jk) = ( zc0 + zc1 + zc2 + zc3 ) * wmask(ji,jj,jk) END_3D ! DO_3D_00_00( 1, nksr ) qsr_hc(ji,jj,jk) = r1_rho0_rcp * ( ztmp3d(ji,jj,jk) - ztmp3d(ji,jj,jk+1) ) END_3D ! DEALLOCATE( ze0 , ze1 , ze2 , ze3 , ztmp3d )
where the difference from the trunk is:
-
traqsr.F90
109 109 REAL(wp) :: zchl, zcoef, z1_2 ! local scalars 110 110 REAL(wp) :: zc0 , zc1 , zc2 , zc3 ! - - 111 111 REAL(wp) :: zzc0, zzc1, zzc2, zzc3 ! - - 112 REAL(wp) :: zz0 , zz1 ! - - 113 REAL(wp) :: zCb, zCmax, zze, zpsi, zpsimax, zdelpsi, zCtot, zCze 114 REAL(wp) :: zlogc, zlogc2, zlogc3 115 REAL(wp), ALLOCATABLE, DIMENSION(:,:) :: zekb, zekg, zekr 116 REAL(wp), ALLOCATABLE, DIMENSION(:,:,:) :: ze0, ze1, ze2, ze3, zea, ztrdt 117 REAL(wp), ALLOCATABLE, DIMENSION(:,:,:) :: zetot, zchl3d 112 REAL(wp) :: zz0 , zz1 , ze3t, zlui ! - - 113 REAL(wp) :: zCb, zCmax, zpsi, zpsimax, zrdpsi, zCze 114 REAL(wp) :: zlogc, zlogze, zlogCtot, zlogCze 115 REAL(wp), ALLOCATABLE, DIMENSION(:,:) :: ze0, ze1, ze2, ze3 116 REAL(wp), ALLOCATABLE, DIMENSION(:,:,:) :: ztrdt, zetot, ztmp3d 118 117 !!---------------------------------------------------------------------- 119 118 ! 120 119 IF( ln_timing ) CALL timing_start('tra_qsr') … … 159 158 ! 160 159 CASE( np_RGB , np_RGBc ) !== R-G-B fluxes ==! 161 160 ! 162 ALLOCATE( ze kb(jpi,jpj) , zekg(jpi,jpj) , zekr (jpi,jpj) ,&163 & ze 0 (jpi,jpj,jpk) , ze1 (jpi,jpj,jpk) , ze2 (jpi,jpj,jpk) ,&164 & z e3 (jpi,jpj,jpk) , zea (jpi,jpj,jpk) , zchl3d(jpi,jpj,jpk))161 ALLOCATE( ze0 (jpi,jpj) , ze1 (jpi,jpj) , & 162 & ze2 (jpi,jpj) , ze3 (jpi,jpj) , & 163 & ztmp3d(jpi,jpj,nksr + 1) ) 165 164 ! 166 165 IF( nqsr == np_RGBc ) THEN !* Variable Chlorophyll 167 166 CALL fld_read( kt, 1, sf_chl ) ! Read Chl data and provides it at the current time step 167 ! 168 ! Separation in R-G-B depending on the surface Chl 169 ! perform and store as many of the 2D calculations as possible 170 ! before the 3D loop (use the temporary 2D arrays to replace the 171 ! most expensive calculations) 172 ! 173 ze0(:,:) = LOG ( MIN( 10. , MAX( 0.03, sf_chl(1)%fnow(:,:,1) ) ) ) ! ze0 = log(zchl) 174 ze1(:,:) = 0.113328685307 + 0.803 * ze0(:,:) ! ze1 : log(zCze) = log (1.12 * zchl**0.803) 175 ze2(:,:) = 3.703768066608 + 0.459 * ze0(:,:) ! ze2 : log(zCtot) = log(40.6 * zchl**0.459) 176 ze3(:,:) = 6.34247346942 - 0.746 * ze2(:,:) ! ze3 : log(zze) = log(568.2 * zCtot**(-0.746)) 177 WHERE( ze3 > 4.62497281328 ) ze3 = 5.298317366548 - 0.293 * ze2 ! IF( log(zze) > log(102.) ) log(zze) = 178 ! ! log(200.0 * zCtot**(-0.293)) 179 ze1(:,:) = EXP(ze1(:,:)) ! ze1 = zCze 180 ze2(:,:) = 1._wp / ( 0.710 + ze0(:,:) * ( 0.159 + ze0(:,:) * 0.021 ) ) ! ze2 = 1/zdelpsi 181 ze3(:,:) = 1._wp / EXP(ze3(:,:)) ! ze3 = 1/zze 182 ! 183 DO_3D_00_00 ( 1, nksr + 1 ) 184 ! zchl = ALOG( ze0(ji,jj) ) 185 zlogc = ze0(ji,jj) 186 ! 187 zCb = 0.768 + zlogc * ( 0.087 - zlogc * ( 0.179 + zlogc * 0.025 ) ) 188 zCmax = 0.299 - zlogc * ( 0.289 - zlogc * 0.579 ) 189 zpsimax = 0.6 - zlogc * ( 0.640 - zlogc * ( 0.021 + zlogc * 0.115 ) ) 190 ! zdelpsi = 0.710 + zlogc * ( 0.159 + zlogc * 0.021 ) 191 ! 192 zCze = ze1(ji,jj) 193 zrdpsi = ze2(ji,jj) ! 1/zdelpsi 194 zpsi = ze3(ji,jj) * gdepw(ji,jj,jk,Kmm) ! gdepw/zze 195 ! 196 ! NB. make sure zchl value is such that: zchl = MIN( 10. , MAX( 0.03, zchl ) ) 197 zchl = MIN( 10. , MAX( 0.03, zCze * ( zCb + zCmax * EXP( -( (zpsi - zpsimax) * zrdpsi )**2 ) ) ) ) 198 ! Convert chlorophyll value to attenuation coefficient look-up table index 199 ztmp3d(ji,jj,jk) = 41 + 20.*LOG10(zchl) + 1.e-15 200 END_3D 201 ELSE !* constant chlorophyll 202 zchl = 0.05 203 ! NB. make sure constant value is such that: 204 zchl = MIN( 10. , MAX( 0.03, zchl ) ) 205 ! Convert chlorophyll value to attenuation coefficient look-up table index 206 zlui = 41 + 20.*LOG10(zchl) + 1.e-15 168 207 DO jk = 1, nksr + 1 169 DO jj = 2, jpjm1 ! Separation in R-G-B depending of the surface Chl 170 DO ji = 2, jpim1 171 zchl = MIN( 10. , MAX( 0.03, sf_chl(1)%fnow(ji,jj,1) ) ) 172 zCtot = 40.6 * zchl**0.459 173 zze = 568.2 * zCtot**(-0.746) 174 IF( zze > 102. ) zze = 200.0 * zCtot**(-0.293) 175 zpsi = gdepw(ji,jj,jk,Kmm) / zze 176 ! 177 zlogc = LOG( zchl ) 178 zlogc2 = zlogc * zlogc 179 zlogc3 = zlogc * zlogc * zlogc 180 zCb = 0.768 + 0.087 * zlogc - 0.179 * zlogc2 - 0.025 * zlogc3 181 zCmax = 0.299 - 0.289 * zlogc + 0.579 * zlogc2 182 zpsimax = 0.6 - 0.640 * zlogc + 0.021 * zlogc2 + 0.115 * zlogc3 183 zdelpsi = 0.710 + 0.159 * zlogc + 0.021 * zlogc2 184 zCze = 1.12 * (zchl)**0.803 185 ! 186 zchl3d(ji,jj,jk) = zCze * ( zCb + zCmax * EXP( -( (zpsi - zpsimax) / zdelpsi )**2 ) ) 187 END DO 188 ! 189 END DO 208 ztmp3d(:,:,jk) = zlui 190 209 END DO 191 ELSE !* constant chrlorophyll192 DO jk = 1, nksr + 1193 zchl3d(:,:,jk) = 0.05194 ENDDO195 210 ENDIF 196 211 ! 197 212 zcoef = ( 1. - rn_abs ) / 3._wp !* surface equi-partition in R-G-B 198 213 DO_2D_00_00 199 ze0(ji,jj,1) = rn_abs * qsr(ji,jj) 200 ze1(ji,jj,1) = zcoef * qsr(ji,jj) 201 ze2(ji,jj,1) = zcoef * qsr(ji,jj) 202 ze3(ji,jj,1) = zcoef * qsr(ji,jj) 203 zea(ji,jj,1) = qsr(ji,jj) 214 ze0(ji,jj) = rn_abs * qsr(ji,jj) 215 ze1(ji,jj) = zcoef * qsr(ji,jj) 216 ze2(ji,jj) = zcoef * qsr(ji,jj) 217 ze3(ji,jj) = zcoef * qsr(ji,jj) 218 ! store the surface SW radiation; re-use the surface ztmp3d array 219 ! since the surface attenuation coefficient is not used 220 ztmp3d(ji,jj,1) = qsr(ji,jj) 204 221 END_2D 205 222 ! 206 DO jk = 2, nksr+1 !* interior equi-partition in R-G-B depending of vertical profile of Chl 207 DO_2D_00_00 208 zchl = MIN( 10. , MAX( 0.03, zchl3d(ji,jj,jk) ) ) 209 irgb = NINT( 41 + 20.*LOG10(zchl) + 1.e-15 ) 210 zekb(ji,jj) = rkrgb(1,irgb) 211 zekg(ji,jj) = rkrgb(2,irgb) 212 zekr(ji,jj) = rkrgb(3,irgb) 213 END_2D 214 215 DO_2D_00_00 216 zc0 = ze0(ji,jj,jk-1) * EXP( - e3t(ji,jj,jk-1,Kmm) * xsi0r ) 217 zc1 = ze1(ji,jj,jk-1) * EXP( - e3t(ji,jj,jk-1,Kmm) * zekb(ji,jj) ) 218 zc2 = ze2(ji,jj,jk-1) * EXP( - e3t(ji,jj,jk-1,Kmm) * zekg(ji,jj) ) 219 zc3 = ze3(ji,jj,jk-1) * EXP( - e3t(ji,jj,jk-1,Kmm) * zekr(ji,jj) ) 220 ze0(ji,jj,jk) = zc0 221 ze1(ji,jj,jk) = zc1 222 ze2(ji,jj,jk) = zc2 223 ze3(ji,jj,jk) = zc3 224 zea(ji,jj,jk) = ( zc0 + zc1 + zc2 + zc3 ) * wmask(ji,jj,jk) 225 END_2D 226 END DO 223 !* interior equi-partition in R-G-B depending on vertical profile of Chl 224 DO_3D_00_00 ( 2, nksr + 1 ) 225 ze3t = e3t(ji,jj,jk-1,Kmm) 226 irgb = NINT( ztmp3d(ji,jj,jk) ) 227 zc0 = ze0(ji,jj) * EXP( - ze3t * xsi0r ) 228 zc1 = ze1(ji,jj) * EXP( - ze3t * rkrgb(1,irgb) ) 229 zc2 = ze2(ji,jj) * EXP( - ze3t * rkrgb(2,irgb) ) 230 zc3 = ze3(ji,jj) * EXP( - ze3t * rkrgb(3,irgb) ) 231 ze0(ji,jj) = zc0 232 ze1(ji,jj) = zc1 233 ze2(ji,jj) = zc2 234 ze3(ji,jj) = zc3 235 ztmp3d(ji,jj,jk) = ( zc0 + zc1 + zc2 + zc3 ) * wmask(ji,jj,jk) 236 END_3D 227 237 ! 228 238 DO_3D_00_00( 1, nksr ) 229 qsr_hc(ji,jj,jk) = r1_rho0_rcp * ( z ea(ji,jj,jk) - zea(ji,jj,jk+1) )239 qsr_hc(ji,jj,jk) = r1_rho0_rcp * ( ztmp3d(ji,jj,jk) - ztmp3d(ji,jj,jk+1) ) 230 240 END_3D 231 241 ! 232 DEALLOCATE( ze kb , zekg , zekr , ze0 , ze1 , ze2 , ze3 , zea , zchl3d )242 DEALLOCATE( ze0 , ze1 , ze2 , ze3 , ztmp3d ) 233 243 ! 234 244 CASE( np_2BD ) !== 2-bands fluxes ==! 235 245 !
Earlier Investigations
Option 1: Minmum memory usage
By rearranging the loop order and placing the vertical loop innermost then the code can be greatly simplified to an equivalent using minimal temporary storage. A couple of other changes seem sensible too, namely:
- rename the zchl3d array to ztmp3d (since it is now used for two purposes)
- only allocate ztmp3d to nksr+1; values below this are not used and nksr + 1 is likely << jpk
- calculate and store the attenuation coefficient look-up table index as soon as the sub-surface chlorophyll value is known. This keeps all LOG operations in one loop and, in the case of constant chlorophyll, removes the LOG from the loop altogether.
CASE( np_RGB , np_RGBc ) !== R-G-B fluxes ==! ! ALLOCATE( ztmp3d(jpi,jpj,nksr + 1) ) ! ! code to set ztmp3d(:,:,1:nskr+1) ! including the following changes after IF( nqsr == np_RGBc ) THEN !* Variable Chlorophyll . DO_3D_00_00( 1, nksr + 1 ) . . no change until after zCze = 1.12 * (zchl)**0.803 ! ! NB. make sure zchl value is such that: zchl = MIN( 10. , MAX( 0.03, zchl ) ) zchl = MIN( 10. , MAX( 0.03, zCze * ( zCb + zCmax * EXP( -( (zpsi - zpsimax) / zdelpsi )**2 ) ) ) ) ! Convert chlorophyll value to attenuation coefficient look-up table index ztmp3d(ji,jj,jk) = 41 + 20.*LOG10(zchl) + 1.e-15 END_3D ELSE !* constant chrlorophyll zchl = 0.05 ! NB. make sure constant value is such that: zchl = MIN( 10. , MAX( 0.03, zchl ) ) ! Convert chlorophyll value to attenuation coefficient look-up table index zlui = 41 + 20.*LOG10(zchl) + 1.e-15 DO jk = 1, nksr + 1 ztmp3d(:,:,jk) = zlui END DO ENDIF ! ! zcoef = ( 1. - rn_abs ) / 3._wp !* surface equi-partition in R-G-B ! store the surface SW radiation; re-purpose the surface ztmp3d array ! since the surface attenuation coefficient is not used ztmp3d(:,:,1) = qsr(:,:) ! !* interior equi-partition in R-G-B depending of vertical profile of Chl DO_2D_00_00 zc0 = rn_abs * qsr(ji,jj) zc1 = zcoef * qsr(ji,jj) zc2 = zc1 zc3 = zc1 zc4 = e3t(ji,jj,1,Kmm) DO jk = 2, nksr+1 irgb = NINT( ztmp3d(ji,jj,jk) ) zc0 = zc0 * EXP( - zc4 * xsi0r ) zc1 = zc1 * EXP( - zc4 * rkrgb(1,irgb) ) zc2 = zc2 * EXP( - zc4 * rkrgb(2,irgb) ) zc3 = zc3 * EXP( - zc4 * rkrgb(3,irgb) ) zc4 = e3t(ji,jj,jk,Kmm) ! store the SW radiation penetrating to this location ! re-purpose the ztmp3d array since the attenuation coefficient ! at this point will not be needed again ztmp3d(ji,jj,jk) = ( zc0 + zc1 + zc2 + zc3 ) * wmask(ji,jj,jk) END DO END_2D ! DO_3D_00_00( 1, nksr ) qsr_hc(ji,jj,jk) = r1_rho0_rcp * ( ztmp3d(ji,jj,jk) - ztmp3d(ji,jj,jk+1) ) END_3D ! DEALLOCATE( ztmp3d )
This is code and memory efficient but may perform poorly due to non-contiguous access to the array elements (see performance section below).
Option 2: Low memory use (retain loop order).
A compromise solution, which reduces memory use and should perform better is to remove all unnecessary full-depth arrays but maintain loop order by keeping a few 2D arrays. The same additional changes listed above are also made.
CASE( np_RGB , np_RGBc ) !== R-G-B fluxes ==! ! ALLOCATE( ze0 (jpi,jpj) , ze1 (jpi,jpj) , & & ze2 (jpi,jpj) , ze3 (jpi,jpj) , & & ztmp3d(jpi,jpj,nksr + 1) ) ! ! code to set ztmp3d(:,:,1:nskr+1) ! including the following changes after IF( nqsr == np_RGBc ) THEN !* Variable Chlorophyll . DO_3D_00_00( 1, nksr + 1 ) . . no change until after zCze = 1.12 * (zchl)**0.803 ! ! NB. make sure zchl value is such that: zchl = MIN( 10. , MAX( 0.03, zchl ) ) zchl = MIN( 10. , MAX( 0.03, zCze * ( zCb + zCmax * EXP( -( (zpsi - zpsimax) / zdelpsi )**2 ) ) ) ) ! Convert chlorophyll value to attenuation coefficient look-up table index ztmp3d(ji,jj,jk) = 41 + 20.*LOG10(zchl) + 1.e-15 END_3D ELSE !* constant chlorophyll zchl = 0.05 ! NB. make sure constant value is such that: zchl = MIN( 10. , MAX( 0.03, zchl ) ) ! Convert chlorophyll value to attenuation coefficient look-up table index zlui = 41 + 20.*LOG10(zchl) + 1.e-15 DO jk = 1, nksr + 1 ztmp3d(:,:,jk) = zlui END DO ENDIF ! zcoef = ( 1. - rn_abs ) / 3._wp !* surface equi-partition in R-G-B DO_2D_00_00 ze0(ji,jj) = rn_abs * qsr(ji,jj) ze1(ji,jj) = zcoef * qsr(ji,jj) ze2(ji,jj) = zcoef * qsr(ji,jj) ze3(ji,jj) = zcoef * qsr(ji,jj) ! store the surface SW radiation; re-use the surface ztmp3d array ! since the surface attenuation coefficient is not used ztmp3d(ji,jj,1) = qsr(ji,jj) END_2D ! !* interior equi-partition in R-G-B depending of vertical profile of Chl DO_3D_00_00( 2, nksr+1 ) irgb = NINT( ztmp3d(ji,jj,jk) ) ze3t = e3t(ji,jj,jk-1,Kmm) ze0(ji,jj) = ze0(ji,jj) * EXP( - ze3t * xsi0r ) ze1(ji,jj) = ze1(ji,jj) * EXP( - ze3t * rkrgb(1,irgb) ) ze2(ji,jj) = ze2(ji,jj) * EXP( - ze3t * rkrgb(2,irgb) ) ze3(ji,jj) = ze3(ji,jj) * EXP( - ze3t * rkrgb(3,irgb) ) ! store the SW radiation penetrating to this location ! re-use the ztmp3d array since the attenuation coefficient ! at this point will not be needed again ztmp3d(ji,jj,jk) = ( ze0(ji,jj) + ze1(ji,jj) + ze2(ji,jj) + ze3(ji,jj) ) * wmask(ji,jj,jk) END_3D ! DO_3D_00_00( 1, nksr ) qsr_hc(ji,jj,jk) = r1_rho0_rcp * ( ztmp3d(ji,jj,jk) - ztmp3d(ji,jj,jk+1) ) END_3D ! DEALLOCATE( ze0 , ze1 , ze2 , ze3 , ztmp3d )
Performance and evaluation
Both these options produce identical results to the original code (based on an ORCA2_ICE_PISCES test using SETTE (which includes variable surface chlorophyll inputs). ln_timing was activated and the CPU time (averaged across all processors) spent in tra_qsr used as a simple measure of performance. Unfortunately, variations in runtime between successive tests (even with the same code) on the NOC cluster were almost as great as any difference arising from algorithmic differences. Each test was repeated 6 times with the following results:
code option | CPU seconds spent in tra_qsr | Average | |||||
original code | 0.34 | 0.34 | 0.35 | 0.35 | 0.34 | 0.34 | 0.3433 |
minimum memory option | 0.36 | 0.36 | 0.37 | 0.36 | 0.36 | 0.37 | 0.3633 |
low memory option | 0.35 | 0.35 | 0.35 | 0.36 | 0.36 | 0.35 | 0.3533 |
from which the tentative conclusion is that the minimum memory option does perform consistently worst but the low memory option appears to be a suitable replacement to the original code. More stringent tests are require to confirm this.
These initial tests were performed using the standard 32 processor SETTE test for ORCA2_ICE_PISCES. To search for a better distinction between the options further tests were made by varying the number of processors. Tests with 2, 8, 32 and 60 processors were performed (3 for each option at each core count). The following table shows the percentage of CPU time spent in tra_qsr and the rank of the tra_qsr routine in the CPU time-sorted list of routines (a higher rank means tra_qsr is taking proportionally less of the overall CPU time). In each case the average of the 3 samples is given.:
% CPU spent in tra_qsr | |||
#CPUs | original | min-mem | low-mem |
2 | 1.76 | 1.82 | 1.83 |
8 | 1.38 | 1.48 | 1.46 |
32 | 0.48 | 0.49 | 0.5 |
60 | 0.24 | 0.26 | 0.26 |
Rank in sorted list of routines by CPU usage | |||
#CPUs | original | min-mem | low-mem |
2 | 14 | 12.67 | 12 |
8 | 16.33 | 15.67 | 15 |
32 | 22.33 | 21.33 | 23.33 |
60 | 26 | 25 | 25 |
Unfortunately the message is still mixed and, although the new codes reduce the size and number of temporary arrays; they have a slight detrimental effect on performance and do not alter the tendency for tra_qsr to consume higher percentages of the CPU time as domain size increases.
The low-memory option does, at least, lend itself to tiling.
Option 2 revisited
Following discussions with the previewer, it was decided that low-memory option should be the best approach but the slight deterioration in performance over the original code may be down to the over-zealous replacement of temporary scalars within the second 3D loop. On reflection there are also opportunities to reduce the number of floating point operations and load and store instructions within the first 3D loop. Importantly, there are also opportunities to avoid some expensive operations by performing some calculations in log space. Here are the mathematically equivalent alternatives:
-
traqsr.F90
old new 111 111 REAL(wp) :: zzc0, zzc1, zzc2, zzc3 ! - - 112 112 REAL(wp) :: zz0 , zz1 , ze3t, zlui ! - - 113 113 REAL(wp) :: zCb, zCmax, zze, zpsi, zpsimax, zdelpsi, zCtot, zCze 114 REAL(wp) :: zlogc 114 REAL(wp) :: zlogc, zlogze, zlogCtot, zlogCze 115 115 REAL(wp), ALLOCATABLE, DIMENSION(:,:) :: ze0, ze1, ze2, ze3 116 116 REAL(wp), ALLOCATABLE, DIMENSION(:,:,:) :: ztrdt, zetot, ztmp3d 117 117 !!---------------------------------------------------------------------- … … 165 165 IF( nqsr == np_RGBc ) THEN !* Variable Chlorophyll 166 166 CALL fld_read( kt, 1, sf_chl ) ! Read Chl data and provides it at the current time step 167 167 ! Separation in R-G-B depending of the surface Chl 168 ze0(:,:) = MIN( 10. , MAX( 0.03, sf_chl(1)%fnow(:,:,1) ) ) 168 169 DO_3D_00_00 ( 1, nksr + 1 ) 169 zchl = MIN( 10. , MAX( 0.03, sf_chl(1)%fnow(ji,jj,1) ))170 z Cze = 1.12 * zchl**0.803171 z Ctot = 40.6 * zchl**0.459172 zlog c = LOG( zchl)170 ! zchl = ze0(ji,jj) 171 zlogc = LOG( ze0(ji,jj) ) 172 zlogCze = 0.113328685307 + 0.803 * zlogc ! log(zCze = 1.12 * zchl**0.803) 173 zlogCtot= 3.703768066608 + 0.459 * zlogc ! log(zCtot = 40.6 * zchl**0.459) 173 174 ! 174 175 zCb = 0.768 + zlogc * ( 0.087 - zlogc * ( 0.179 + zlogc * 0.025 ) ) 175 176 zCmax = 0.299 - zlogc * ( 0.289 - zlogc * 0.579 ) 176 177 zpsimax = 0.6 - zlogc * ( 0.640 - zlogc * ( 0.021 + zlogc * 0.115 ) ) 177 178 zdelpsi = 0.710 + zlogc * ( 0.159 + zlogc * 0.021 ) 178 179 ! 179 zze = 568.2 * zCtot**(-0.746) 180 IF( zze > 102. ) zze = 200.0 * zCtot**(-0.293) 181 zpsi = gdepw(ji,jj,jk,Kmm) / zze 180 zlogze = 6.34247346942 - 0.746 * zlogCtot ! log(zze = 568.2 * zCtot**(-0.746)) 181 IF( zlogze > 4.62497281328 ) zlogze = 5.298317366548 - 0.293 * zlogCtot 182 ! log(IF( zze > 102. ) zze = 200.0 * zCtot**(-0.293)) 183 zze = EXP( zlogze ) 184 zpsi = gdepw(ji,jj,jk,Kmm) / zze 185 zCze = EXP( zlogCze ) 182 186 ! 183 187 ! NB. make sure zchl value is such that: zchl = MIN( 10. , MAX( 0.03, zchl ) ) 184 188 zchl = MIN( 10. , MAX( 0.03, zCze * ( zCb + zCmax * EXP( -( (zpsi - zpsimax) / zdelpsi )**2 ) ) ) ) … … 207 211 ztmp3d(ji,jj,1) = qsr(ji,jj) 208 212 END_2D 209 213 ! 210 !* interior equi-partition in R-G-B depending o fvertical profile of Chl214 !* interior equi-partition in R-G-B depending on vertical profile of Chl 211 215 DO_3D_00_00 ( 2, nksr + 1 ) 212 216 ze3t = e3t(ji,jj,jk-1,Kmm) 213 217 irgb = NINT( ztmp3d(ji,jj,jk) )
Despite significant variation between identical runs on the NOC cluster there is evidence that this second version of the low-memory option improves on the original code and is certainly no worse whilst using less storage. Here are the tables with the new results added. Graphs are shown below the code differences.
% CPU spent in tra_qsr | ||||
#CPUs | original | min-mem | low-mem | low-men v2 |
2 | 1.76 | 1.82 | 1.83 | 1.19 |
8 | 1.38 | 1.48 | 1.46 | 0.56 |
32 | 0.48 | 0.49 | 0.5 | 0.3 |
60 | 0.24 | 0.26 | 0.26 | 0.08 |
Rank in sorted list of routines by CPU usage | ||||
#CPUs | original | min-mem | low-mem | low-men v2 |
2 | 14 | 12.67 | 12 | 18.33 |
8 | 16.33 | 15.67 | 15 | 21 |
32 | 22.33 | 21.33 | 23.33 | 27 |
60 | 26 | 25 | 25 | 30 |
Here is the final set of differences between this improved low-memory solution and the original traqsr.F90:
-
traqsr.F90
old new 109 109 REAL(wp) :: zchl, zcoef, z1_2 ! local scalars 110 110 REAL(wp) :: zc0 , zc1 , zc2 , zc3 ! - - 111 111 REAL(wp) :: zzc0, zzc1, zzc2, zzc3 ! - - 112 REAL(wp) :: zz0 , zz1 112 REAL(wp) :: zz0 , zz1 , ze3t, zlui ! - - 113 113 REAL(wp) :: zCb, zCmax, zze, zpsi, zpsimax, zdelpsi, zCtot, zCze 114 REAL(wp) :: zlogc, zlogc2, zlogc3 115 REAL(wp), ALLOCATABLE, DIMENSION(:,:) :: zekb, zekg, zekr 116 REAL(wp), ALLOCATABLE, DIMENSION(:,:,:) :: ze0, ze1, ze2, ze3, zea, ztrdt 117 REAL(wp), ALLOCATABLE, DIMENSION(:,:,:) :: zetot, zchl3d 114 REAL(wp) :: zlogc, zlogze, zlogCtot, zlogCze 115 REAL(wp), ALLOCATABLE, DIMENSION(:,:) :: ze0, ze1, ze2, ze3 116 REAL(wp), ALLOCATABLE, DIMENSION(:,:,:) :: ztrdt, zetot, ztmp3d 118 117 !!---------------------------------------------------------------------- 119 118 ! 120 119 IF( ln_timing ) CALL timing_start('tra_qsr') … … 159 158 ! 160 159 CASE( np_RGB , np_RGBc ) !== R-G-B fluxes ==! 161 160 ! 162 ALLOCATE( ze kb(jpi,jpj) , zekg(jpi,jpj) , zekr (jpi,jpj) ,&163 & ze 0 (jpi,jpj,jpk) , ze1 (jpi,jpj,jpk) , ze2 (jpi,jpj,jpk) ,&164 & z e3 (jpi,jpj,jpk) , zea (jpi,jpj,jpk) , zchl3d(jpi,jpj,jpk))161 ALLOCATE( ze0 (jpi,jpj) , ze1 (jpi,jpj) , & 162 & ze2 (jpi,jpj) , ze3 (jpi,jpj) , & 163 & ztmp3d(jpi,jpj,nksr + 1) ) 165 164 ! 166 165 IF( nqsr == np_RGBc ) THEN !* Variable Chlorophyll 167 166 CALL fld_read( kt, 1, sf_chl ) ! Read Chl data and provides it at the current time step 167 ! Separation in R-G-B depending of the surface Chl 168 ze0(:,:) = MIN( 10. , MAX( 0.03, sf_chl(1)%fnow(:,:,1) ) ) 169 DO_3D_00_00 ( 1, nksr + 1 ) 170 ! zchl = ze0(ji,jj) 171 zlogc = LOG( ze0(ji,jj) ) 172 zlogCze = 0.113328685307 + 0.803 * zlogc ! log(zCze = 1.12 * zchl**0.803) 173 zlogCtot= 3.703768066608 + 0.459 * zlogc ! log(zCtot = 40.6 * zchl**0.459) 174 ! 175 zCb = 0.768 + zlogc * ( 0.087 - zlogc * ( 0.179 + zlogc * 0.025 ) ) 176 zCmax = 0.299 - zlogc * ( 0.289 - zlogc * 0.579 ) 177 zpsimax = 0.6 - zlogc * ( 0.640 - zlogc * ( 0.021 + zlogc * 0.115 ) ) 178 zdelpsi = 0.710 + zlogc * ( 0.159 + zlogc * 0.021 ) 179 ! 180 zlogze = 6.34247346942 - 0.746 * zlogCtot ! log(zze = 568.2 * zCtot**(-0.746)) 181 IF( zlogze > 4.62497281328 ) zlogze = 5.298317366548 - 0.293 * zlogCtot 182 ! log(IF( zze > 102. ) zze = 200.0 * zCtot**(-0.293)) 183 zze = EXP( zlogze ) 184 zpsi = gdepw(ji,jj,jk,Kmm) / zze 185 zCze = EXP( zlogCze ) 186 ! 187 ! NB. make sure zchl value is such that: zchl = MIN( 10. , MAX( 0.03, zchl ) ) 188 zchl = MIN( 10. , MAX( 0.03, zCze * ( zCb + zCmax * EXP( -( (zpsi - zpsimax) / zdelpsi )**2 ) ) ) ) 189 ! Convert chlorophyll value to attenuation coefficient look-up table index 190 ztmp3d(ji,jj,jk) = 41 + 20.*LOG10(zchl) + 1.e-15 191 END_3D 192 ELSE !* constant chlorophyll 193 zchl = 0.05 194 ! NB. make sure constant value is such that: 195 zchl = MIN( 10. , MAX( 0.03, zchl ) ) 196 ! Convert chlorophyll value to attenuation coefficient look-up table index 197 zlui = 41 + 20.*LOG10(zchl) + 1.e-15 168 198 DO jk = 1, nksr + 1 169 DO jj = 2, jpjm1 ! Separation in R-G-B depending of the surface Chl 170 DO ji = 2, jpim1 171 zchl = MIN( 10. , MAX( 0.03, sf_chl(1)%fnow(ji,jj,1) ) ) 172 zCtot = 40.6 * zchl**0.459 173 zze = 568.2 * zCtot**(-0.746) 174 IF( zze > 102. ) zze = 200.0 * zCtot**(-0.293) 175 zpsi = gdepw(ji,jj,jk,Kmm) / zze 176 ! 177 zlogc = LOG( zchl ) 178 zlogc2 = zlogc * zlogc 179 zlogc3 = zlogc * zlogc * zlogc 180 zCb = 0.768 + 0.087 * zlogc - 0.179 * zlogc2 - 0.025 * zlogc3 181 zCmax = 0.299 - 0.289 * zlogc + 0.579 * zlogc2 182 zpsimax = 0.6 - 0.640 * zlogc + 0.021 * zlogc2 + 0.115 * zlogc3 183 zdelpsi = 0.710 + 0.159 * zlogc + 0.021 * zlogc2 184 zCze = 1.12 * (zchl)**0.803 185 ! 186 zchl3d(ji,jj,jk) = zCze * ( zCb + zCmax * EXP( -( (zpsi - zpsimax) / zdelpsi )**2 ) ) 187 END DO 188 ! 189 END DO 199 ztmp3d(:,:,jk) = zlui 190 200 END DO 191 ELSE !* constant chrlorophyll192 DO jk = 1, nksr + 1193 zchl3d(:,:,jk) = 0.05194 ENDDO195 201 ENDIF 196 202 ! 197 203 zcoef = ( 1. - rn_abs ) / 3._wp !* surface equi-partition in R-G-B 198 204 DO_2D_00_00 199 ze0(ji,jj,1) = rn_abs * qsr(ji,jj) 200 ze1(ji,jj,1) = zcoef * qsr(ji,jj) 201 ze2(ji,jj,1) = zcoef * qsr(ji,jj) 202 ze3(ji,jj,1) = zcoef * qsr(ji,jj) 203 zea(ji,jj,1) = qsr(ji,jj) 205 ze0(ji,jj) = rn_abs * qsr(ji,jj) 206 ze1(ji,jj) = zcoef * qsr(ji,jj) 207 ze2(ji,jj) = zcoef * qsr(ji,jj) 208 ze3(ji,jj) = zcoef * qsr(ji,jj) 209 ! store the surface SW radiation; re-use the surface ztmp3d array 210 ! since the surface attenuation coefficient is not used 211 ztmp3d(ji,jj,1) = qsr(ji,jj) 204 212 END_2D 205 213 ! 206 DO jk = 2, nksr+1 !* interior equi-partition in R-G-B depending of vertical profile of Chl 207 DO_2D_00_00 208 zchl = MIN( 10. , MAX( 0.03, zchl3d(ji,jj,jk) ) ) 209 irgb = NINT( 41 + 20.*LOG10(zchl) + 1.e-15 ) 210 zekb(ji,jj) = rkrgb(1,irgb) 211 zekg(ji,jj) = rkrgb(2,irgb) 212 zekr(ji,jj) = rkrgb(3,irgb) 213 END_2D 214 215 DO_2D_00_00 216 zc0 = ze0(ji,jj,jk-1) * EXP( - e3t(ji,jj,jk-1,Kmm) * xsi0r ) 217 zc1 = ze1(ji,jj,jk-1) * EXP( - e3t(ji,jj,jk-1,Kmm) * zekb(ji,jj) ) 218 zc2 = ze2(ji,jj,jk-1) * EXP( - e3t(ji,jj,jk-1,Kmm) * zekg(ji,jj) ) 219 zc3 = ze3(ji,jj,jk-1) * EXP( - e3t(ji,jj,jk-1,Kmm) * zekr(ji,jj) ) 220 ze0(ji,jj,jk) = zc0 221 ze1(ji,jj,jk) = zc1 222 ze2(ji,jj,jk) = zc2 223 ze3(ji,jj,jk) = zc3 224 zea(ji,jj,jk) = ( zc0 + zc1 + zc2 + zc3 ) * wmask(ji,jj,jk) 225 END_2D 226 END DO 214 !* interior equi-partition in R-G-B depending on vertical profile of Chl 215 DO_3D_00_00 ( 2, nksr + 1 ) 216 ze3t = e3t(ji,jj,jk-1,Kmm) 217 irgb = NINT( ztmp3d(ji,jj,jk) ) 218 zc0 = ze0(ji,jj) * EXP( - ze3t * xsi0r ) 219 zc1 = ze1(ji,jj) * EXP( - ze3t * rkrgb(1,irgb) ) 220 zc2 = ze2(ji,jj) * EXP( - ze3t * rkrgb(2,irgb) ) 221 zc3 = ze3(ji,jj) * EXP( - ze3t * rkrgb(3,irgb) ) 222 ze0(ji,jj) = zc0 223 ze1(ji,jj) = zc1 224 ze2(ji,jj) = zc2 225 ze3(ji,jj) = zc3 226 ztmp3d(ji,jj,jk) = ( zc0 + zc1 + zc2 + zc3 ) * wmask(ji,jj,jk) 227 END_3D 227 228 ! 228 229 DO_3D_00_00( 1, nksr ) 229 qsr_hc(ji,jj,jk) = r1_rho0_rcp * ( z ea(ji,jj,jk) - zea(ji,jj,jk+1) )230 qsr_hc(ji,jj,jk) = r1_rho0_rcp * ( ztmp3d(ji,jj,jk) - ztmp3d(ji,jj,jk+1) ) 230 231 END_3D 231 232 ! 232 DEALLOCATE( ze kb , zekg , zekr , ze0 , ze1 , ze2 , ze3 , zea , zchl3d )233 DEALLOCATE( ze0 , ze1 , ze2 , ze3 , ztmp3d ) 233 234 ! 234 235 CASE( np_2BD ) !== 2-bands fluxes ==! 235 236 !
Documentation updates
...
Preview
...
Tests
...
Review
...
Attachments (13)
- traqsr_lomem.F90 (21.4 KB) - added by acc 4 years ago.
- traqsr_trunk.F90 (21.3 KB) - added by acc 4 years ago.
- traqsr_minmem.F90 (21.0 KB) - added by acc 4 years ago.
- percent_cpu_qsr.png (91.6 KB) - added by acc 4 years ago.
- rankqsr.png (83.8 KB) - added by acc 4 years ago.
- rankqsr.2.png (83.8 KB) - added by acc 4 years ago.
- percent_cpu_qsr.2.png (91.6 KB) - added by acc 4 years ago.
- rankqsr.3.png (77.8 KB) - added by acc 4 years ago.
- percent_cpu_qsr.3.png (91.3 KB) - added by acc 4 years ago.
- traqsr_lomem_v2.F90 (21.5 KB) - added by acc 4 years ago.
- rankqsr.4.png (75.5 KB) - added by acc 4 years ago.
- percent_cpu_qsr.4.png (88.2 KB) - added by acc 4 years ago.
- eORCA025_traqsr.png (98.0 KB) - added by acc 4 years ago.
Download all attachments as: .zip