2020WP/ENHANCE-10_acc_fix_traqsr (diff) – NEMO

Changes between Version 9 and Version 10 of 2020WP/ENHANCE-10_acc_fix_traqsr


Timestamp:
2020-05-15T17:38:55+02:00
Author:
acc
Comment:

--

  • 2020WP/ENHANCE-10_acc_fix_traqsr

    v9 v10  
    253253== Option 2 revisited == 
    254254 
    255 Following discussions with the previewer, it was decided that the low-memory option should be the best approach, but the slight deterioration in performance over the original code may be down to the over-zealous replacement of temporary scalars within the second 3D loop. On reflection, there are also opportunities to reduce the number of floating-point operations and load and store instructions within the first 3D loop. Here is the final set of differences between this improved low-memory solution and the original traqsr.F90: 
     255Following discussions with the previewer, it was decided that the low-memory option should be the best approach, but the slight deterioration in performance over the original code may be down to the over-zealous replacement of temporary scalars within the second 3D loop. On reflection, there are also opportunities to reduce the number of floating-point operations and load and store instructions within the first 3D loop.  
     256 
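The trade-off between removing temporary scalars and keeping them to save floating-point operations and loads can be illustrated with a minimal, hypothetical Fortran sketch. It is not taken from traqsr.F90: the program name, the variables (`q0`, `r_att`, `depth`, `zc`) and all values are assumptions chosen only for illustration.

```fortran
program scalar_temporary_sketch
   ! Hypothetical illustration; none of these names or values come from traqsr.F90.
   implicit none
   integer, parameter :: wp = kind(1.0d0)
   integer, parameter :: nk = 5                 ! assumed number of levels
   real(wp) :: depth(nk)
   real(wp) :: flux_a(nk), heat_a(nk)           ! results without a temporary
   real(wp) :: flux_b(nk), heat_b(nk)           ! results with a temporary
   real(wp) :: q0, r_att, zc
   integer  :: jk

   q0    = 100.0_wp                             ! assumed surface flux
   r_att = 1.0_wp / 23.0_wp                     ! assumed inverse attenuation depth
   depth = [ ( 5.0_wp * real(jk, wp), jk = 1, nk ) ]

   ! Variant A: no temporary scalar, so the exponential is written out
   ! (and, unless the compiler spots it, evaluated) twice per level.
   do jk = 1, nk
      flux_a(jk) =         q0 * exp( -depth(jk) * r_att )
      heat_a(jk) = r_att * q0 * exp( -depth(jk) * r_att )
   end do

   ! Variant B: one temporary scalar holds the shared term, giving one
   ! exponential per level at the cost of a single extra scalar.
   do jk = 1, nk
      zc         = q0 * exp( -depth(jk) * r_att )
      flux_b(jk) = zc
      heat_b(jk) = r_att * zc
   end do

   ! Both variants should agree to within rounding.
   if ( any( abs( flux_a - flux_b ) > 1.0e-12_wp * abs( flux_a ) ) ) error stop 'flux mismatch'
   if ( any( abs( heat_a - heat_b ) > 1.0e-12_wp * abs( heat_a ) ) ) error stop 'heat mismatch'
   print '(a)', 'variants agree to within rounding'
end program scalar_temporary_sketch
```

Whether variant B is actually faster depends on the compiler and on how many such terms are shared, so a sketch like this only shows the shape of the change, not its measured effect.
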
     257Although significant variation between identical runs on the NOC cluster means the evidence is not conclusive, this second version of the low-memory option does appear to improve on the original code and is certainly no worse, whilst using less storage. Here are the tables with the new results added. Graphs are shown below the code differences.  
     258 
     259||||||||||  '''% CPU spent in tra_qsr''' || 
     260|| #CPUs || original || min-mem || low-mem || low-mem v2 || 
     261|| 2 || 1.76 || 1.82 || 1.83 || 1.68 || 
     262|| 8 || 1.38 || 1.48 || 1.46 || 1.14 || 
     263|| 32 || 0.48 || 0.49 || 0.5 || 0.44 || 
     264|| 60 || 0.24 || 0.26 || 0.26 || 0.13 || 
     265 
     266 
     267\\ 
     268||||||||||  '''Rank in sorted list of routines by CPU usage ''' || 
     269|| #CPUs || original || min-mem || low-mem || low-mem v2 || 
     270|| 2 || 14 || 12.67 || 12 || 14 || 
     271|| 8 || 16.33 || 15.67 || 15 || 17.33 || 
     272|| 32 || 22.33 || 21.33 || 23.33 || 23 || 
     273|| 60 || 26 || 25 || 25 || 26 || 
     274 
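As a rough aid to reading the first table, the relative change in the share of CPU time between the original code and the last column (the second low-memory version) can be computed directly from the figures quoted there. The short Fortran sketch below does only that arithmetic; the program and variable names are hypothetical, and given the run-to-run variation noted above the output is indicative only.

```fortran
program relative_cpu_share
   ! Arithmetic on the "% CPU spent in tra_qsr" table; names are illustrative.
   implicit none
   integer, parameter :: wp = kind(1.0d0)
   integer, parameter :: ncase = 4
   integer,  parameter :: ncpu(ncase) = [ 2, 8, 32, 60 ]
   real(wp), parameter :: orig(ncase) = [ 1.76_wp, 1.38_wp, 0.48_wp, 0.24_wp ]  ! original
   real(wp), parameter :: v2(ncase)   = [ 1.68_wp, 1.14_wp, 0.44_wp, 0.13_wp ]  ! last column
   real(wp) :: rel(ncase)
   integer  :: i

   ! Relative change in percent; negative means a smaller share than the original.
   rel = 100.0_wp * ( v2 - orig ) / orig

   do i = 1, ncase
      write(*,'(a,i3,a,f7.1,a)') 'CPUs = ', ncpu(i), '   change = ', rel(i), ' %'
   end do
end program relative_cpu_share
```

On the tabulated values this gives reductions of roughly 5%, 17%, 8% and 46% for 2, 8, 32 and 60 CPUs respectively.
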
     275Here is the final set of differences between this improved low-memory solution and the original traqsr.F90: 
    256276 
    257277{{{#!diff 
     
    291311+            DO_3D_00_00 ( 1, nksr + 1 ) 
    292312+               zchl    = MIN( 10. , MAX( 0.03, sf_chl(1)%fnow(ji,jj,1) ) ) 
    293 +               zCze    = 1.12  * (zchl)**0.803 
     313+               zCze    = 1.12  * zchl**0.803 
    294314+               zCtot   = 40.6  * zchl**0.459 
    295315+               zlogc   = LOG( zchl )