Changes between Version 11 and Version 12 of 2020WP/ENHANCE-10_acc_fix_traqsr


Timestamp:
2020-05-19T11:51:35+02:00
Author:
acc
  • 2020WP/ENHANCE-10_acc_fix_traqsr

    v11 v12  
    253253== Option 2 revisited == 
    254254 
    255 Following discussions with the previewer, it was decided that the low-memory option is the best approach, but that its slight deterioration in performance relative to the original code may be due to the over-zealous replacement of temporary scalars within the second 3D loop. On reflection, there are also opportunities to reduce the number of floating-point operations and load and store instructions within the first 3D loop.  
    256  
    257 Although significant variation between identical runs on the NOC cluster means the evidence is not conclusive, this second version of the low-memory option does appear to improve on the original code, and is certainly no worse, whilst using less storage. Here are the tables with the new results added. Graphs are shown below the code differences.  
     255Following discussions with the previewer, it was decided that the low-memory option is the best approach, but that its slight deterioration in performance relative to the original code may be due to the over-zealous replacement of temporary scalars within the second 3D loop. On reflection, there are also opportunities to reduce the number of floating-point operations and load and store instructions within the first 3D loop. Importantly, some expensive operations can also be avoided by performing some calculations in log space. Here are the mathematically equivalent alternatives (in the diff below, the lines marked - are the log-space forms and the lines marked + are the power-law forms they replace): 
     256 
     257{{{#!diff 
     258--- traqsr.F90  2020-05-19 10:28:06.858457146 +0100 
     259+++ LOMEM3/traqsr.F90   2020-05-15 15:44:11.652736539 +0100 
     260@@ -111,7 +111,6 @@ 
     261       REAL(wp) ::   zzc0, zzc1, zzc2, zzc3   !    -         - 
     262       REAL(wp) ::   zz0 , zz1 , ze3t, zlui   !    -         - 
     263       REAL(wp) ::   zCb, zCmax, zze, zpsi, zpsimax, zdelpsi, zCtot, zCze 
     264-      REAL(wp) ::   zlogze, zlogCtot, zlogCze 
     265       REAL(wp) ::   zlogc 
     266       REAL(wp), ALLOCATABLE, DIMENSION(:,:)   :: ze0, ze1, ze2, ze3 
     267       REAL(wp), ALLOCATABLE, DIMENSION(:,:,:) :: ztrdt, zetot, ztmp3d 
     268@@ -168,21 +167,18 @@ 
     269             ! Separation in R-G-B depending of the surface Chl 
     270             DO_3D_00_00 ( 1, nksr + 1 ) 
     271                zchl    = MIN( 10. , MAX( 0.03, sf_chl(1)%fnow(ji,jj,1) ) ) 
     272+               zCze    = 1.12  * zchl**0.803 
     273+               zCtot   = 40.6  * zchl**0.459 
     274                zlogc   = LOG( zchl ) 
     275-               zlogCze = 0.113328685307 + 0.803 * zlogc   ! log(zCze  = 1.12  * zchl**0.803) 
     276-               zlogCtot= 3.703768066608 + 0.459 * zlogc   ! log(zCtot = 40.6  * zchl**0.459) 
     277                ! 
     278                zCb     = 0.768 + zlogc * ( 0.087 - zlogc * ( 0.179 + zlogc * 0.025 ) ) 
     279                zCmax   = 0.299 - zlogc * ( 0.289 - zlogc * 0.579 ) 
     280                zpsimax = 0.6   - zlogc * ( 0.640 - zlogc * ( 0.021 + zlogc * 0.115 ) ) 
     281                zdelpsi = 0.710 + zlogc * ( 0.159 + zlogc * 0.021 ) 
     282                ! 
     283-               zlogze  = 6.34247346942 - 0.746 * zlogCtot ! log(zze = 568.2 * zCtot**(-0.746)) 
     284-               IF( zlogze > 4.62497281328 ) zlogze = 5.298317366548 - 0.293 * zlogCtot 
     285-                                                          ! log(IF( zze > 102. ) zze = 200.0 * zCtot**(-0.293)) 
     286-               zze  = EXP( zlogze ) 
     287-               zpsi = gdepw(ji,jj,jk,Kmm) / zze 
     288-               zCze = EXP( zlogCze ) 
     289+               zze     = 568.2 * zCtot**(-0.746) 
     290+               IF( zze > 102. ) zze = 200.0 * zCtot**(-0.293) 
     291+               zpsi    = gdepw(ji,jj,jk,Kmm) / zze 
     292                ! 
     293                ! NB. make sure zchl value is such that: zchl = MIN( 10. , MAX( 0.03, zchl ) ) 
     294                zchl = MIN( 10. , MAX( 0.03, zCze * ( zCb + zCmax * EXP( -( (zpsi - zpsimax) / zdelpsi )**2 ) ) ) ) 
     295}}} 
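
As a quick sanity check on the equivalence shown in the diff above, the following Python sketch evaluates both forms of the first-loop arithmetic over the clamped chlorophyll range 0.03 to 10 and reports the largest relative difference. It is illustrative only and not part of NEMO: the function names `power_form` and `log_form`, the geometric sampling and the 1e-8 tolerance are assumptions made for this example, while the coefficients are copied from the diff.

```python
import math

# Log-space constants copied from the diff above.
LOG_1P12 = 0.113328685307    # log(1.12)
LOG_40P6 = 3.703768066608    # log(40.6)
LOG_568P2 = 6.34247346942    # log(568.2)
LOG_102 = 4.62497281328      # log(102.)
LOG_200 = 5.298317366548     # log(200.)


def power_form(chl):
    """Direct power-law form; returns (zze, zCze)."""
    zCze = 1.12 * chl ** 0.803
    zCtot = 40.6 * chl ** 0.459
    zze = 568.2 * zCtot ** (-0.746)
    if zze > 102.0:
        zze = 200.0 * zCtot ** (-0.293)
    return zze, zCze


def log_form(chl):
    """Log-space form; returns (zze, zCze)."""
    zlogc = math.log(chl)
    zlogCze = LOG_1P12 + 0.803 * zlogc
    zlogCtot = LOG_40P6 + 0.459 * zlogc
    zlogze = LOG_568P2 - 0.746 * zlogCtot
    if zlogze > LOG_102:
        zlogze = LOG_200 - 0.293 * zlogCtot
    return math.exp(zlogze), math.exp(zlogCze)


def sample_chl(n=200):
    """Geometrically spaced chlorophyll values from 0.03 to 10 (hypothetical sampling)."""
    return [0.03 * (10.0 / 0.03) ** (i / (n - 1)) for i in range(n)]


def max_relative_difference(n=200):
    """Largest relative difference between the two forms over the sample."""
    worst = 0.0
    for chl in sample_chl(n):
        for direct, logged in zip(power_form(chl), log_form(chl)):
            worst = max(worst, abs(direct - logged) / abs(direct))
    return worst


if __name__ == "__main__":
    diff = max_relative_difference()
    print(f"max relative difference: {diff:.3e}")
    assert diff < 1e-8  # assumed tolerance, set by the precision of the constants
```

The log-space form reuses the existing `LOG( zchl )` and adds two `EXP` calls in place of the three or four real-exponent powers of the direct form, which is presumably where the saving comes from. The sketch checks numerical agreement only, not speed.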
     296 
     297Despite significant variation between identical runs on the NOC cluster, there is evidence that this second version of the low-memory option improves on the original code, and it is certainly no worse, whilst using less storage. Here are the tables with the new results added. Graphs are shown below the code differences.  
    258298 
    259299||||||||||  '''% CPU spent in tra_qsr''' || 
    260300|| #CPUs || original || min-mem || low-mem || low-mem v2 || 
    261 || 2 || 1.76 || 1.82 || 1.83 || 1.68 || 
    262 || 8 || 1.38 || 1.48 || 1.46 || 1.14 || 
    263 || 32 || 0.48 || 0.49 || 0.5 || 0.44 || 
    264 || 60 || 0.24 || 0.26 || 0.26 || 0.13 || 
     301|| 2 || 1.76 || 1.82 || 1.83 || 1.19 || 
     302|| 8 || 1.38 || 1.48 || 1.46 || 0.56 || 
     303|| 32 || 0.48 || 0.49 || 0.5 || 0.3 || 
     304|| 60 || 0.24 || 0.26 || 0.26 || 0.08 || 
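
To put the new column in context, the short Python sketch below computes the relative change in the tra_qsr share of CPU time between the original code and low-mem v2, using the figures from the table above. The helper name `relative_change` is an assumption made for this example. Note that a routine's share of the profile is not the same as wall-clock time, and that the run-to-run variation mentioned earlier still applies.

```python
# Percentage of CPU time spent in tra_qsr, copied from the table above:
# number of CPUs -> (original, low-mem v2).
CPU_SHARE = {
    2: (1.76, 1.19),
    8: (1.38, 0.56),
    32: (0.48, 0.30),
    60: (0.24, 0.08),
}


def relative_change(original, new):
    """Signed percentage change of `new` relative to `original`."""
    return 100.0 * (new - original) / original


if __name__ == "__main__":
    for ncpu, (original, v2) in CPU_SHARE.items():
        print(f"{ncpu:>3} CPUs: {relative_change(original, v2):+.1f}%")
```

On these figures the share falls by roughly a third to two thirds, depending on the CPU count.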
    265305 
    266306 
     
    268308||||||||||  '''Rank in sorted list of routines by CPU usage''' || 
    269309|| #CPUs || original || min-mem || low-mem || low-mem v2 || 
    270 || 2 || 14 || 12.67 || 12 || 14 || 
    271 || 8 || 16.33 || 15.67 || 15 || 17.33 || 
    272 || 32 || 22.33 || 21.33 || 23.33 || 23 || 
    273 || 60 || 26 || 25 || 25 || 26 || 
     310|| 2 || 14 || 12.67 || 12 || 18.33 || 
     311|| 8 || 16.33 || 15.67 || 15 || 21 || 
     312|| 32 || 22.33 || 21.33 || 23.33 || 27 || 
     313|| 60 || 26 || 25 || 25 || 30 || 
    274314 
    275315Here is the final set of differences between this improved low-memory solution and the original traqsr.F90: 
     
    277317{{{#!diff 
    278318--- ORG/traqsr.F90      2020-05-13 11:37:57.094258396 +0100 
    279 +++ traqsr.F90  2020-05-15 14:48:00.138206859 +0100 
    280 @@ -109,12 +109,11 @@ 
     319+++ traqsr.F90  2020-05-19 10:28:06.858457146 +0100 
     320@@ -109,12 +109,12 @@ 
    281321       REAL(wp) ::   zchl, zcoef, z1_2        ! local scalars 
    282322       REAL(wp) ::   zc0 , zc1 , zc2 , zc3    !    -         - 
     
    289329-      REAL(wp), ALLOCATABLE, DIMENSION(:,:,:) :: ze0, ze1, ze2, ze3, zea, ztrdt 
    290330-      REAL(wp), ALLOCATABLE, DIMENSION(:,:,:) :: zetot, zchl3d 
     331+      REAL(wp) ::   zlogze, zlogCtot, zlogCze 
    291332+      REAL(wp) ::   zlogc 
    292333+      REAL(wp), ALLOCATABLE, DIMENSION(:,:)   :: ze0, ze1, ze2, ze3 
     
    295336       ! 
    296337       IF( ln_timing )   CALL timing_start('tra_qsr') 
    297 @@ -159,77 +158,75 @@ 
     338@@ -159,77 +159,78 @@ 
    298339          ! 
    299340       CASE( np_RGB , np_RGBc )         !==  R-G-B fluxes  ==! 
     
    311352+            DO_3D_00_00 ( 1, nksr + 1 ) 
    312353+               zchl    = MIN( 10. , MAX( 0.03, sf_chl(1)%fnow(ji,jj,1) ) ) 
    313 +               zCze    = 1.12  * zchl**0.803 
    314 +               zCtot   = 40.6  * zchl**0.459 
    315354+               zlogc   = LOG( zchl ) 
     355+               zlogCze = 0.113328685307 + 0.803 * zlogc   ! log(zCze  = 1.12  * zchl**0.803) 
     356+               zlogCtot= 3.703768066608 + 0.459 * zlogc   ! log(zCtot = 40.6  * zchl**0.459) 
    316357+               ! 
    317358+               zCb     = 0.768 + zlogc * ( 0.087 - zlogc * ( 0.179 + zlogc * 0.025 ) ) 
     
    320361+               zdelpsi = 0.710 + zlogc * ( 0.159 + zlogc * 0.021 ) 
    321362+               ! 
    322 +               zze     = 568.2 * zCtot**(-0.746) 
    323 +               IF( zze > 102. ) zze = 200.0 * zCtot**(-0.293) 
    324 +               zpsi    = gdepw(ji,jj,jk,Kmm) / zze 
     363+               zlogze  = 6.34247346942 - 0.746 * zlogCtot ! log(zze = 568.2 * zCtot**(-0.746)) 
     364+               IF( zlogze > 4.62497281328 ) zlogze = 5.298317366548 - 0.293 * zlogCtot 
     365+                                                          ! log(IF( zze > 102. ) zze = 200.0 * zCtot**(-0.293)) 
     366+               zze  = EXP( zlogze ) 
     367+               zpsi = gdepw(ji,jj,jk,Kmm) / zze 
     368+               zCze = EXP( zlogCze ) 
    325369+               ! 
    326370+               ! NB. make sure zchl value is such that: zchl = MIN( 10. , MAX( 0.03, zchl ) ) 
     
    426470          ! 
    427471       CASE( np_2BD  )            !==  2-bands fluxes  ==! 
    428           ! 
    429472}}} 
    430473[[Image(percent_cpu_qsr.2.png)]]