New URL for NEMO forge! http://forge.nemo-ocean.eu

Since March 2022, with the release of NEMO 4.2, code development has moved to a self-hosted GitLab.
This forge is now archived and remains online for historical reference.

2020WP/ENHANCE-10_acc_fix_traqsr (diff) – NEMO


Changes between Version 4 and Version 5 of 2020WP/ENHANCE-10_acc_fix_traqsr

Timestamp:: 2020-05-14T16:24:51+02:00
Author:: acc
Comment:: --


2020WP/ENHANCE-10_acc_fix_traqsr

-                      v4
+                      v5
 The current code is structured thus:
-{{{
+{{{#!f
       CASE( np_RGB , np_RGBc )         !==  R-G-B fluxes  ==!
+         !
 …
  * rename the zchl3d array to ztmp3d (since it is now used for two purposes)
  * only allocate ztmp3d down to level nksr+1; values below this level are not used, and nksr+1 is likely << jpk
- * calculate and store the attenuation coefficient look-up table index as soon as the sub-surface chlorophyll value is known. This keeps all LOG operations in one loop.
-{{{
+ * calculate and store the attenuation coefficient look-up table index as soon as the sub-surface chlorophyll value is known. This keeps all LOG operations in one loop and, in the case of constant chlorophyll, removes the LOG from the loop altogether.
+{{{#!f
       CASE( np_RGB , np_RGBc )         !==  R-G-B fluxes  ==!
+         !
 …
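A minimal sketch of the look-up-table change described in the bullets above, written as a stand-alone module rather than as a patch to the routine (whose relevant lines are elided here). The names `qsr_lut_sketch`, `rgb_index` and `chl_to_index`, the clamping range [0.05, 10] and the index formula NINT(41 + 20·LOG10(chl)) are illustrative assumptions, not values taken from the NEMO source. It shows a work array that first holds chlorophyll and is then overwritten in place with the table index, so every LOG10 is taken in a single loop, and a constant-chlorophyll branch that takes one LOG10 outside any loop.

```fortran
module qsr_lut_sketch
   ! Hypothetical illustration, not NEMO source code: a work array first
   ! holds chlorophyll and is then overwritten in place with the look-up
   ! table index, so all LOG10 calls are made in a single loop.
   implicit none
   integer, parameter :: wp = selected_real_kind(12, 307)
   ! Assumed clamping range for chlorophyll (mg/m3); illustrative only
   real(wp), parameter :: chl_min = 0.05_wp, chl_max = 10.0_wp
contains
   elemental integer function rgb_index(chl)
      ! Assumed table layout: 20 entries per decade, index 61 at chl = 10
      real(wp), intent(in) :: chl
      rgb_index = nint(41.0_wp + 20.0_wp * log10(min(chl_max, max(chl_min, chl))))
   end function rgb_index

   subroutine chl_to_index(ztmp3d, lconst, chl_const)
      ! ztmp3d is assumed to be dimensioned (:,:,1:nksr+1) only,
      ! since levels below nksr+1 are never used
      real(wp), intent(inout) :: ztmp3d(:,:,:)
      logical,  intent(in)    :: lconst     ! .true. for constant chlorophyll
      real(wp), intent(in)    :: chl_const  ! only used when lconst is .true.
      integer :: ji, jj, jk
      if (lconst) then
         ! Constant chlorophyll: one LOG10 in total, no loop needed
         ztmp3d(:,:,:) = real(rgb_index(chl_const), wp)
      else
         ! Variable chlorophyll: the index replaces chl as soon as chl is known
         do jk = 1, size(ztmp3d, 3)
            do jj = 1, size(ztmp3d, 2)
               do ji = 1, size(ztmp3d, 1)
                  ztmp3d(ji, jj, jk) = real(rgb_index(ztmp3d(ji, jj, jk)), wp)
               end do
            end do
         end do
      end if
   end subroutine chl_to_index
end module qsr_lut_sketch
```

Under these assumed constants, `rgb_index(1.0_wp)` returns 41 and any value at or above 10 returns 61.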
 === Option 2: Low memory use (retain loop order).
-A compromise solution, which reduces memory use and should perform better, is to remove all unnecessary full-depth arrays but maintain loop order by keeping a few 2D arrays.
-{{{
+A compromise solution, which reduces memory use and should perform better, is to remove all unnecessary full-depth arrays but maintain loop order by keeping a few 2D arrays. The same additional changes listed above are also made.
+{{{#!f
        CASE( np_RGB , np_RGBc )         !==  R-G-B fluxes  ==!
+         !
 …
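The idea behind Option 2 can be illustrated with the hedged sketch below. It is not the Option 2 code (which is elided above): it assumes simple exponential attenuation in each band with coefficients held constant with depth, and the names `qsr_lowmem_sketch`, `absorbed_flux`, `fband` and `zek` are hypothetical. The level loop `jk` stays outermost, as in the original code, but the irradiance in each band is carried downward in one 2D slice per band, so no full-depth irradiance array is needed; the flux absorbed in a layer is the difference between the irradiance at its top and at its bottom.

```fortran
module qsr_lowmem_sketch
   ! Hypothetical illustration of the Option 2 idea, not NEMO source code:
   ! the level loop (jk) stays outermost, but the downward irradiance of
   ! each band is carried in a 2D slice instead of a full-depth array.
   implicit none
   integer, parameter :: wp = selected_real_kind(12, 307)
contains
   subroutine absorbed_flux(qsr, fband, zek, e3t, qabs)
      real(wp), intent(in)  :: qsr(:,:)      ! surface shortwave flux (W/m2)
      real(wp), intent(in)  :: fband(:)      ! assumed fraction of qsr in each band
      real(wp), intent(in)  :: zek(:,:,:)    ! attenuation (1/m) per (ji,jj,band), depth-independent here
      real(wp), intent(in)  :: e3t(:,:,:)    ! layer thickness (m) per (ji,jj,jk)
      real(wp), intent(out) :: qabs(:,:,:)   ! flux absorbed in each layer (W/m2) per (ji,jj,jk)
      real(wp) :: ztop(size(qsr,1), size(qsr,2), size(fband))  ! irradiance at top of level jk
      real(wp) :: zbot(size(qsr,1), size(qsr,2))               ! irradiance at bottom of level jk
      integer :: jb, jk

      do jb = 1, size(fband)
         ztop(:,:,jb) = fband(jb) * qsr(:,:)
      end do
      do jk = 1, size(e3t, 3)                ! same outer level loop as the original
         qabs(:,:,jk) = 0.0_wp
         do jb = 1, size(fband)
            zbot(:,:) = ztop(:,:,jb) * exp(-e3t(:,:,jk) * zek(:,:,jb))
            qabs(:,:,jk) = qabs(:,:,jk) + (ztop(:,:,jb) - zbot(:,:))
            ztop(:,:,jb) = zbot(:,:)         ! bottom of level jk becomes top of jk+1
         end do
      end do
   end subroutine absorbed_flux
end module qsr_lowmem_sketch
```

With one band, a surface flux of 100 W/m2, zek = 0.1 1/m and two 10 m layers, the layers absorb 100(1 - e^-1) ≈ 63.2 and 100(e^-1 - e^-2) ≈ 23.3 W/m2, which sum to 100(1 - e^-2).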
 Both these options produce identical results to the original code (based on an ORCA2_ICE_PISCES test using SETTE, which includes variable surface chlorophyll inputs). ln_timing was activated and the CPU time (averaged across all processors) spent in tra_qsr was used as a simple measure of performance. Unfortunately, variations in runtime between successive tests (even with the same code) on the NOC cluster were almost as large as any difference arising from the algorithmic changes. Each test was repeated 6 times with the following results:
-|| code option    ||
-|| original code  ||  0.34      ||  0.34        ||  0.35        ||  0.35        ||  0.34        ||  0.34  ||
-||  minimum memory option ||  0.36      ||  0.36        ||  0.37        ||  0.36        ||  0.36        ||  0.37  ||
-|| low memory option ||  0.35   ||  0.35        ||  0.35        ||  0.36        ||  0.36        ||  0.35  ||
+|| code option    |||||||||||| CPU seconds spent in tra_qsr || Average ||
+|| original code  ||  0.34      ||  0.34        ||  0.35        ||  0.35        ||  0.34        ||  0.34  || '''0.3433''' ||
+|| minimum memory option ||  0.36      ||  0.36        ||  0.37        ||  0.36        ||  0.36        ||  0.37  || '''0.3633''' ||
+|| low memory option ||  0.35   ||  0.35        ||  0.35        ||  0.36        ||  0.36        ||  0.35  || '''0.3533''' ||
 from which the tentative conclusion is that the minimum memory option consistently performs worst, but the low memory option appears to be a suitable replacement for the original code. More stringent tests are required to confirm this.
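The averages in the table, and the size of each slowdown relative to the original code, can be reproduced with the short sketch below (the module and function names are illustrative). Using the six timings per option as listed, the minimum memory option is about 5.8% slower than the original code on average and the low memory option about 2.9% slower; the latter difference is 0.01 s, the same as the run-to-run spread of the original code itself.

```fortran
module qsr_timing_sketch
   ! Reproduces the per-option averages of the tra_qsr CPU times and
   ! expresses each as a percentage change relative to the original code.
   implicit none
   integer, parameter :: wp = selected_real_kind(12, 307)
   real(wp), parameter :: t_orig(6)   = [0.34_wp, 0.34_wp, 0.35_wp, 0.35_wp, 0.34_wp, 0.34_wp]
   real(wp), parameter :: t_minmem(6) = [0.36_wp, 0.36_wp, 0.37_wp, 0.36_wp, 0.36_wp, 0.37_wp]
   real(wp), parameter :: t_lowmem(6) = [0.35_wp, 0.35_wp, 0.35_wp, 0.36_wp, 0.36_wp, 0.35_wp]
contains
   pure real(wp) function mean(t)
      ! Arithmetic mean of one row of timings (s)
      real(wp), intent(in) :: t(:)
      mean = sum(t) / size(t)
   end function mean

   pure real(wp) function pct_change(t, tref)
      ! Percentage change of mean(t) with respect to mean(tref)
      real(wp), intent(in) :: t(:), tref(:)
      pct_change = 100.0_wp * (mean(t) - mean(tref)) / mean(tref)
   end function pct_change
end module qsr_timing_sketch
```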
+These initial tests were performed using the standard 32-processor SETTE test for ORCA2_ICE_PISCES. To search for a better distinction between the options, further tests were made with different numbers of processors: 2, 8, 32 and 60 (3 runs for each option at each core count). The following tables show the percentage of CPU time spent in tra_qsr and the rank of the tra_qsr routine in the CPU-time-sorted list of routines (a higher rank means tra_qsr is taking proportionally less of the overall CPU time). In each case the average of the 3 samples is given:
+||||||||  '''% CPU spent in tra_qsr''' ||
+|| #CPUs || original || min-mem || low-mem ||
+|| 2 || 1.76 || 1.82 || 1.83 ||
+|| 8 || 1.38 || 1.48 || 1.46 ||
+|| 32 || 0.48 || 0.49 || 0.50 ||
+|| 60 || 0.24 || 0.26 || 0.26 ||
+\\
+||||||||  '''Rank in sorted list of routines by CPU usage ''' ||
+|| #CPUs || original || min-mem || low-mem ||
+|| 2 || 14 || 12.67 || 12 ||
+|| 8 || 16.33 || 15.67 || 15 ||
+|| 32 || 22.33 || 21.33 || 23.33 ||
+|| 60 || 26 || 25 || 25 ||
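+Unfortunately, the message is still mixed: the original code takes the smallest share of CPU time at every processor count, the min-mem and low-mem options differ from each other by at most 0.02 percentage points, and at 32 processors the low-mem option ranks above the original code despite a slightly higher percentage.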
 ''...''