Changes between Version 9 and Version 10 of 2020WP/ENHANCE-10_acc_fix_traqsr
Timestamp: 2020-05-15T17:38:55+02:00
2020WP/ENHANCE-10_acc_fix_traqsr
--- v9
+++ v10
 == Option 2 revisited ==
 
-Following discussions with the previewer, it was decided that the low-memory option should be the best approach, but that the slight deterioration in performance relative to the original code may be down to the over-zealous replacement of temporary scalars within the second 3D loop. On reflection, there are also opportunities to reduce the number of floating-point operations and load and store instructions within the first 3D loop. Here is the final set of differences between this improved low-memory solution and the original traqsr.F90:
+Following discussions with the previewer, it was decided that the low-memory option should be the best approach, but that the slight deterioration in performance relative to the original code may be down to the over-zealous replacement of temporary scalars within the second 3D loop. On reflection, there are also opportunities to reduce the number of floating-point operations and load and store instructions within the first 3D loop.
+
+Although significant variation between identical runs on the NOC cluster means the evidence is not conclusive, this second version of the low-memory option does appear to improve on the original code, and is certainly no worse, whilst using less storage. Here are the tables with the new results added. Graphs are shown below the code differences.
+
+|||||||||| '''% CPU spent in tra_qsr''' ||
+|| #CPUs || original || min-mem || low-mem || low-mem v2 ||
+|| 2 || 1.76 || 1.82 || 1.83 || 1.68 ||
+|| 8 || 1.38 || 1.48 || 1.46 || 1.14 ||
+|| 32 || 0.48 || 0.49 || 0.5 || 0.44 ||
+|| 60 || 0.24 || 0.26 || 0.26 || 0.13 ||
+
+
+\\
+|||||||||| '''Rank in sorted list of routines by CPU usage''' ||
+|| #CPUs || original || min-mem || low-mem || low-mem v2 ||
+|| 2 || 14 || 12.67 || 12 || 14 ||
+|| 8 || 16.33 || 15.67 || 15 || 17.33 ||
+|| 32 || 22.33 || 21.33 || 23.33 || 23 ||
+|| 60 || 26 || 25 || 25 || 26 ||
+
+Here is the final set of differences between this improved low-memory solution and the original traqsr.F90:
 
 {{{#!diff
 …
 + DO_3D_00_00 ( 1, nksr + 1 )
 + zchl = MIN( 10. , MAX( 0.03, sf_chl(1)%fnow(ji,jj,1) ) )
-+ zCze = 1.12 * (zchl)**0.803
++ zCze = 1.12 * zchl**0.803
 + zCtot = 40.6 * zchl**0.459
 + zlogc = LOG( zchl )