Setting up and testing optimisations for #1821

== Branch Creation ==

'''Here's how I created the branch:'''

 * Created branch [log:branches/UKMO/dev_r5518_optim_GO6_alloc branches/UKMO/dev_r5518_optim_GO6_alloc] from the head of the package branch (r7573 at time of writing).
 * Stripped out svn keywords. (I'm surprised we need to do this at all, since we're creating a branch of a branch which has already been stripped. Only a subset of files is affected by this too, mostly AGRIF related.)
 * Merged in the optimisations from our MEDUSA based branch and added further changes in the same style which should be of benefit to GO6 more generally. Revisions r7581:r7602 show the actual code changes made.

== Testing ==

'''Initial testing in a copy of the GO6 standard job.'''

 * Took a copy of standard suite u-ah494/trunk@29996 to u-aj380Optim to act as the control run.
 * Created a working copy of this to act as the test run: u-aj380Optim.
 * Replaced branches/UKMO/dev_r5518_GO6_package@7573 in the working copy with branches/UKMO/dev_r5518_optim_GO6_alloc@7602.
 * Ran both jobs for a 10 day NRUN with NEMO timings activated. Repeated both runs to try to iron out any random variations in run time on the XC40.
 * Checking solver.stat at 10 days, we have bit comparison. (I've not done a rigorous comparison of restart files.)
 * Comparing NEMO timer output, the optimised run appears to be ~1-2% faster overall in both total elapsed and total CPU time, though these differences lie well within the run-to-run noise, which can be 10% or more.
 * Here's some output:
 * Control run:
{{{
 Elapsed Time (s)  CPU Time (s)
       593483.386    584478.452

 Averaged timing on all processors :
 -----------------------------------
 Section       Elap. Time(s)  Elap. Time(%)  CPU Time(s)  CPU Time(%)  CPU/Elap  Max elap(%)  Min elap(%)     Freq
 sbc_ice_cice  0.1751989E+03          14.17       174.41        14.32      1.00        14.50        13.76   640.00
 tra_adv_tvd   0.7552396E+02           6.11        75.33         6.19      1.00         7.42         5.27   640.00
 tra_nxt       0.6302550E+02           5.10        62.89         5.17      1.00         7.65         2.14   640.00
 tra_ldf_iso   0.5789208E+02           4.68        57.77         4.74      1.00         5.08         4.06   640.00
 sol_pcg       0.5625327E+02           4.55        56.08         4.61      1.00         4.55         4.55   640.00
 dia_wri       0.4800178E+02           3.88        47.84         3.93      1.00         9.16         0.04   640.00
 tra_bbc       0.4565193E+02           3.69        45.39         3.73      0.99        11.38         0.09   640.00
 nonosc        0.4531063E+02           3.66        45.23         3.71      1.00         4.11         3.44  1280.00
 zps_hde       0.3959416E+02           3.20        39.35         3.23      0.99         7.05         0.08  1281.00
 ldf_slp       0.3823219E+02           3.09        38.15         3.13      1.00         3.25         2.98   640.00
}}}
 * Test run:
{{{
 Total timing (sum) :
 --------------------
 Elapsed Time (s)  CPU Time (s)
       583015.673    573647.291

 Averaged timing on all processors :
 -----------------------------------
 Section       Elap. Time(s)  Elap. Time(%)  CPU Time(s)  CPU Time(%)  CPU/Elap  Max elap(%)  Min elap(%)     Freq
 sbc_ice_cice  0.1728438E+03          14.23       172.01        14.39      1.00        14.56        13.80   640.00
 tra_adv_tvd   0.7375908E+02           6.07        73.52         6.15      1.00         7.29         5.22   640.00
 tra_nxt       0.6276407E+02           5.17        62.63         5.24      1.00         7.68         2.33   640.00
 tra_ldf_iso   0.5655850E+02           4.66        56.48         4.73      1.00         4.86         3.43   640.00
 sol_pcg       0.4990999E+02           4.11        49.77         4.16      1.00         4.11         4.11   640.00
 dia_wri       0.4792860E+02           3.95        47.80         4.00      1.00         9.32         0.05   640.00
 tra_bbc       0.4585938E+02           3.78        45.57         3.81      0.99        11.72         0.10   640.00
 zps_hde       0.3905950E+02           3.22        38.80         3.25      0.99         7.08         0.09  1281.00
 ldf_slp       0.3817614E+02           3.14        38.10         3.19      1.00         3.30         3.03   640.00
 nonosc        0.3657865E+02           3.01        36.47         3.05      1.00         3.48         2.77  1280.00
}}}
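
The control and test figures quoted above can be compared with a few lines of Python. This is only a sketch: the numbers are copied by hand from the two timing listings, and the helper names (`percent_change`, `compare`) are made up for illustration; they are not part of NEMO or the suite.

```python
# Totals (s) copied from the control and test timing output above.
CONTROL_TOTAL = {"elapsed": 593483.386, "cpu": 584478.452}
TEST_TOTAL = {"elapsed": 583015.673, "cpu": 573647.291}

# Averaged elapsed time per section (s), top ten sections from each run.
CONTROL_SECTIONS = {
    "sbc_ice_cice": 175.1989, "tra_adv_tvd": 75.52396, "tra_nxt": 63.02550,
    "tra_ldf_iso": 57.89208, "sol_pcg": 56.25327, "dia_wri": 48.00178,
    "tra_bbc": 45.65193, "nonosc": 45.31063, "zps_hde": 39.59416,
    "ldf_slp": 38.23219,
}
TEST_SECTIONS = {
    "sbc_ice_cice": 172.8438, "tra_adv_tvd": 73.75908, "tra_nxt": 62.76407,
    "tra_ldf_iso": 56.55850, "sol_pcg": 49.90999, "dia_wri": 47.92860,
    "tra_bbc": 45.85938, "zps_hde": 39.05950, "ldf_slp": 38.17614,
    "nonosc": 36.57865,
}


def percent_change(control, test):
    """Change from control to test as a percentage of control (negative = faster)."""
    return 100.0 * (test - control) / control


def compare(control, test):
    """Percent change for every key present in both dictionaries."""
    return {k: percent_change(control[k], test[k]) for k in control if k in test}


if __name__ == "__main__":
    for name, pct in compare(CONTROL_TOTAL, TEST_TOTAL).items():
        print(f"total {name:8s} {pct:+7.2f}%")
    # Sort sections so the largest reductions are listed first.
    sections = compare(CONTROL_SECTIONS, TEST_SECTIONS)
    for name, pct in sorted(sections.items(), key=lambda kv: kv[1]):
        print(f"{name:14s} {pct:+7.2f}%")
```

On these figures the totals differ by a little under 2%, and the largest per-section reductions are in nonosc (about 19%) and sol_pcg (about 11%), while tra_bbc is marginally slower. With only one pair of runs shown and run-to-run variability of 10% or more, none of this separates a real gain from noise; it just makes the arithmetic explicit.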