
Version 2 (modified by frrh, 7 years ago)


Setting up and testing optimisations for #1821

Branch Creation

Here's how I created the branch:

  • Stripped out svn keywords. (I'm surprised we need to do this at all, since we're creating a branch of a branch that has already been stripped. Only a subset of files is affected, too, mostly AGRIF related.)
  • Merged in the optimisations from our MEDUSA-based branch and added further changes in the same style, which should benefit GO6 more generally. Revisions r7581:r7602 show the actual code changes made.

Testing

Initial testing in a copy of the GO6 standard job.

  • Took a copy of the standard suite u-ah494/trunk@29996 to u-aj380Optim to act as the control run.
  • Created a working copy of this to act as the test run. u-aj380Optim
  • Replaced branches/UKMO/dev_r5518_GO6_package@7573 in working copy with branches/UKMO/dev_r5518_optim_GO6_alloc@7602
  • Ran both jobs for a 10 day NRUN with NEMO timings activated. Repeated both runs to try to iron out any random variations in run time on the XC40.
  • Checking solver.stat at 10 days, the two runs bit compare. (I've not done a rigorous comparison of restart files.)
  • The NEMO timer output suggests the optimised run may be ~1-2% faster overall in both total elapsed and total CPU time, though this difference lies well within the run-to-run variability, which can be 10% or more.
  • Here's some output:
  • Control run:
    Elapsed Time (s)  CPU Time (s)
           593483.386   584478.452
     
    Averaged timing on all processors :
    -----------------------------------
    Section             Elap. Time(s)  Elap. Time(%)  CPU Time(s)  CPU Time(%)  CPU/Elap Max elap(%)  Min elap(%)  Freq
    sbc_ice_cice        0.1751989E+03   14.17           174.41      14.32        1.00         14.50     13.76     640.00
    tra_adv_tvd         0.7552396E+02    6.11            75.33       6.19        1.00          7.42      5.27     640.00
    tra_nxt             0.6302550E+02    5.10            62.89       5.17        1.00          7.65      2.14     640.00
    tra_ldf_iso         0.5789208E+02    4.68            57.77       4.74        1.00          5.08      4.06     640.00
    sol_pcg             0.5625327E+02    4.55            56.08       4.61        1.00          4.55      4.55     640.00
    dia_wri             0.4800178E+02    3.88            47.84       3.93        1.00          9.16      0.04     640.00
    tra_bbc             0.4565193E+02    3.69            45.39       3.73        0.99         11.38      0.09     640.00
    nonosc              0.4531063E+02    3.66            45.23       3.71        1.00          4.11      3.44    1280.00
    zps_hde             0.3959416E+02    3.20            39.35       3.23        0.99          7.05      0.08    1281.00
    ldf_slp             0.3823219E+02    3.09            38.15       3.13        1.00          3.25      2.98     640.00
    
    
  • Test Run:
    Total timing (sum) :
     --------------------
    Elapsed Time (s)  CPU Time (s)
           583015.673   573647.291
     
    Averaged timing on all processors :
    -----------------------------------
    Section             Elap. Time(s)  Elap. Time(%)  CPU Time(s)  CPU Time(%)  CPU/Elap Max elap(%)  Min elap(%)  Freq
    sbc_ice_cice        0.1728438E+03   14.23           172.01      14.39        1.00         14.56     13.80     640.00
    tra_adv_tvd         0.7375908E+02    6.07            73.52       6.15        1.00          7.29      5.22     640.00
    tra_nxt             0.6276407E+02    5.17            62.63       5.24        1.00          7.68      2.33     640.00
    tra_ldf_iso         0.5655850E+02    4.66            56.48       4.73        1.00          4.86      3.43     640.00
    sol_pcg             0.4990999E+02    4.11            49.77       4.16        1.00          4.11      4.11     640.00
    dia_wri             0.4792860E+02    3.95            47.80       4.00        1.00          9.32      0.05     640.00
    tra_bbc             0.4585938E+02    3.78            45.57       3.81        0.99         11.72      0.10     640.00
    zps_hde             0.3905950E+02    3.22            38.80       3.25        0.99          7.08      0.09    1281.00
    ldf_slp             0.3817614E+02    3.14            38.10       3.19        1.00          3.30      3.03     640.00
    nonosc              0.3657865E+02    3.01            36.47       3.05        1.00          3.48      2.77    1280.00
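
As a check on the ~1-2% figure, the relative change in the summed totals can be worked out directly from the Elapsed Time and CPU Time totals in the two outputs above. This is only a minimal sketch: the helper name `percent_reduction` is ours and is not part of NEMO or its timing output.

```python
def percent_reduction(control: float, test: float) -> float:
    """Return how much smaller `test` is than `control`, in percent."""
    return 100.0 * (control - test) / control


# Totals copied from the control and test outputs above (seconds).
control_elapsed, control_cpu = 593483.386, 584478.452
test_elapsed, test_cpu = 583015.673, 573647.291

elapsed_gain = percent_reduction(control_elapsed, test_elapsed)
cpu_gain = percent_reduction(control_cpu, test_cpu)
print(f"elapsed: {elapsed_gain:.2f}%  cpu: {cpu_gain:.2f}%")
```

Both reductions come out a little under 2%, which is small compared with the 10% or more run-to-run variability noted above.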
    
    
    

We should take direct comparisons with a pinch of salt since timings vary from run to run, but we do seem to see a consistent reduction in the elapsed time of sol_pcg and possibly tra_ldf_iso.
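
A per-section comparison like this can be scripted rather than read off by eye. The sketch below assumes the timer rows have the layout shown above (a section name followed by eight numeric columns) and uses four rows copied from the two outputs. The function names `parse_sections` and `percent_change` are illustrative, not part of any NEMO tooling.

```python
def parse_sections(timing_text: str) -> dict[str, float]:
    """Map section name -> averaged elapsed time (s) from timer table rows."""
    sections = {}
    for line in timing_text.splitlines():
        fields = line.split()
        # Data rows are a section name followed by 8 numeric columns.
        if len(fields) != 9:
            continue
        try:
            sections[fields[0]] = float(fields[1])
        except ValueError:
            continue  # not a data row (e.g. a header or separator)
    return sections


def percent_change(control: dict[str, float], test: dict[str, float]) -> dict[str, float]:
    """Percent change in elapsed time for sections present in both runs."""
    return {
        name: 100.0 * (test[name] - control[name]) / control[name]
        for name in control
        if name in test
    }


# Rows copied from the control and test outputs above.
CONTROL = """\
tra_ldf_iso         0.5789208E+02    4.68            57.77       4.74        1.00          5.08      4.06     640.00
sol_pcg             0.5625327E+02    4.55            56.08       4.61        1.00          4.55      4.55     640.00
tra_bbc             0.4565193E+02    3.69            45.39       3.73        0.99         11.38      0.09     640.00
nonosc              0.4531063E+02    3.66            45.23       3.71        1.00          4.11      3.44    1280.00
"""
TEST = """\
tra_ldf_iso         0.5655850E+02    4.66            56.48       4.73        1.00          4.86      3.43     640.00
sol_pcg             0.4990999E+02    4.11            49.77       4.16        1.00          4.11      4.11     640.00
tra_bbc             0.4585938E+02    3.78            45.57       3.81        0.99         11.72      0.10     640.00
nonosc              0.3657865E+02    3.01            36.47       3.05        1.00          3.48      2.77    1280.00
"""

changes = percent_change(parse_sections(CONTROL), parse_sections(TEST))
for name, pct in sorted(changes.items(), key=lambda item: item[1]):
    print(f"{name:<12s} {pct:+6.1f}%")
```

For these rows, sol_pcg drops by roughly 11% and tra_ldf_iso by roughly 2%. A single pair of runs cannot separate such changes from noise, so the same script would need to be applied to the repeated runs as well.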

There's certainly no suggestion of adverse effects in any of these tests.
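
The solver.stat bit-comparison check from the testing steps could be scripted along the following lines. This is a sketch using Python's standard `filecmp` module. The function name `bit_identical` is ours, and the file locations passed to it would depend on the suite's output directories, which this page does not give.

```python
import filecmp


def bit_identical(control_path: str, test_path: str) -> bool:
    """Return True if the two files contain exactly the same bytes."""
    # shallow=False forces a content comparison instead of trusting
    # file size and modification time alone.
    return filecmp.cmp(control_path, test_path, shallow=False)
```

A call such as `bit_identical("control/solver.stat", "test/solver.stat")` (hypothetical paths) returns `True` only when the files match byte for byte. The same function could be pointed at restart files for the more rigorous comparison that has not yet been done.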

So as far as GO6 is concerned, the optimised branch would seem viable.