Changes between Version 4 and Version 5 of 2020WP/ENHANCE-10_acc_fix_traqsr
Timestamp: 2020-05-14T16:24:51+02:00
2020WP/ENHANCE-10_acc_fix_traqsr
--- v4
+++ v5
  The current code is structured thus:

- {{{
+ {{{#!f
  CASE( np_RGB , np_RGBc ) !== R-G-B fluxes ==!
  !
  …
  * rename the zchl3d array to ztmp3d (since it is now used for two purposes)
  * only allocate ztmp3d to nksr+1; values below this are not used and nksr + 1 is likely << jpk
- * calculate and store the attenuation coefficient look-up table index as soon as the sub-surface chlorophyll value is known. This keeps all LOG operations in one loop.
-
- {{{
+ * calculate and store the attenuation coefficient look-up table index as soon as the sub-surface chlorophyll value is known. This keeps all LOG operations in one loop and, in the case of constant chlorophyll, removes the LOG from the loop altogether.
+
+ {{{#!f
  CASE( np_RGB , np_RGBc ) !== R-G-B fluxes ==!
  !
  …

  === Option 2: Low memory use (retain loop order).
- A compromise solution, which reduces memory use and should perform better, is to remove all unnecessary full-depth arrays but maintain loop order by keeping a few 2D arrays.
- {{{
+ A compromise solution, which reduces memory use and should perform better, is to remove all unnecessary full-depth arrays but maintain loop order by keeping a few 2D arrays. The same additional changes listed above are also made.
+ {{{#!f
  CASE( np_RGB , np_RGBc ) !== R-G-B fluxes ==!
  !
  …
  Both of these options produce identical results to the original code (based on an ORCA2_ICE_PISCES test using SETTE, which includes variable surface chlorophyll inputs). ln_timing was activated and the CPU time (averaged across all processors) spent in tra_qsr was used as a simple measure of performance. Unfortunately, variations in runtime between successive tests (even with the same code) on the NOC cluster were almost as great as any difference arising from the algorithmic changes.
  Each test was repeated 6 times with the following results:

- || code option ||
- || original code || 0.34 || 0.34 || 0.35 || 0.35 || 0.34 || 0.34 ||
- || minimum memory option || 0.36 || 0.36 || 0.37 || 0.36 || 0.36 || 0.37 ||
- || low memory option || 0.35 || 0.35 || 0.35 || 0.36 || 0.36 || 0.35 ||
+ || code option |||||||||||| CPU seconds spent in tra_qsr || Average ||
+ || original code || 0.34 || 0.34 || 0.35 || 0.35 || 0.34 || 0.34 || '''0.3433''' ||
+ || minimum memory option || 0.36 || 0.36 || 0.37 || 0.36 || 0.36 || 0.37 || '''0.3633''' ||
+ || low memory option || 0.35 || 0.35 || 0.35 || 0.36 || 0.36 || 0.35 || '''0.3533''' ||

  from which the tentative conclusion is that the minimum memory option does perform consistently worst but the low memory option appears to be a suitable replacement for the original code. More stringent tests are required to confirm this.

+ These initial tests were performed using the standard 32 processor SETTE test for ORCA2_ICE_PISCES. To search for a better distinction between the options, further tests were made by varying the number of processors. Tests with 2, 8, 32 and 60 processors were performed (3 for each option at each core count). The following table shows the percentage of CPU time spent in tra_qsr and the rank of the tra_qsr routine in the CPU time-sorted list of routines (a numerically higher rank means tra_qsr is taking proportionally less of the overall CPU time).
+ In each case the average of the 3 samples is given:
+
+ |||||||| '''% CPU spent in tra_qsr''' ||
+ || #CPUs || original || min-mem || low-mem ||
+ || 2 || 1.76 || 1.82 || 1.83 ||
+ || 8 || 1.38 || 1.48 || 1.46 ||
+ || 32 || 0.48 || 0.49 || 0.5 ||
+ || 60 || 0.24 || 0.26 || 0.26 ||
+
+
+ \\
+ |||||||| '''Rank in sorted list of routines by CPU usage''' ||
+ || #CPUs || original || min-mem || low-mem ||
+ || 2 || 14 || 12.67 || 12 ||
+ || 8 || 16.33 || 15.67 || 15 ||
+ || 32 || 22.33 || 21.33 || 23.33 ||
+ || 60 || 26 || 25 || 25 ||
+
+ Unfortunately the message is still mixed.
+
+ [[Image()]]
  ''...''
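
The averages quoted in the six-repeat timing table can be rechecked, and set against the run-to-run spread, with a few lines of Python. This is only an illustrative sketch: the numbers are copied from that table, while the names `timings` and `summarise` and the use of the sample standard deviation as the measure of spread are assumptions made here, not part of the original analysis.

```python
from statistics import mean, stdev

# CPU seconds spent in tra_qsr, six repeats per code option
# (values copied from the timing table; the dictionary layout is hypothetical).
timings = {
    "original code": [0.34, 0.34, 0.35, 0.35, 0.34, 0.34],
    "minimum memory option": [0.36, 0.36, 0.37, 0.36, 0.36, 0.37],
    "low memory option": [0.35, 0.35, 0.35, 0.36, 0.36, 0.35],
}


def summarise(samples):
    """Return (mean, sample standard deviation) for one option's timings."""
    return mean(samples), stdev(samples)


summary = {name: summarise(values) for name, values in timings.items()}

baseline_mean, _ = summary["original code"]
for name, (avg, spread) in summary.items():
    # The difference from the original code is what has to be judged
    # against the spread between repeated runs of the same code.
    delta = avg - baseline_mean
    print(f"{name:22s} mean={avg:.4f} s  sd={spread:.4f} s  delta={delta:+.4f} s")
```

Comparing each `delta` with the corresponding `sd` gives a rough feel for whether a difference between options stands out from the noise; with only six samples per option it remains indicative rather than conclusive.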