| 1 | |
| 2 | |
| 3 | Setting up and testing optimisations for #1821 |
| 4 | |
| 5 | == Branch Creation == |
| 6 | |
| 7 | '''Here's how I created the branch:''' |
| 8 | |
| 9 | * Created branch [log:branches/UKMO/dev_r5518_optim_GO6_alloc branches/UKMO/dev_r5518_optim_GO6_alloc] from head of package branch (r7573 at time of writing). |
| 10 | |
| 11 | * Stripped out svn keywords. (I'm surprised we need to do this at all, since we're creating a branch of a branch which has already been stripped. Only a subset of files is affected, mostly AGRIF related.) |
| 12 | |
| 13 | * Merged in the optimisations from our MEDUSA-based branch and added further changes in the same style which should benefit GO6 more generally. Revisions r7581:r7602 show the actual code changes made. |
| 14 | |
| 15 | == Testing == |
| 16 | |
| 17 | '''Initial testing in a copy of the GO6 standard job.''' |
| 18 | |
| 19 | * Took a copy of the standard suite u-ah494/trunk@29996 to u-aj380Optim to act as the control run. |
| 20 | |
| 21 | * Created a working copy of this to act as the test run: u-aj380Optim |
| 22 | |
| 23 | * Replaced branches/UKMO/dev_r5518_GO6_package@7573 in working copy with branches/UKMO/dev_r5518_optim_GO6_alloc@7602 |
| 24 | |
| 25 | * Ran both jobs for a 10-day NRUN with NEMO timings activated. Repeated both runs to try to iron out any random variations in run time on the XC40. |
| 26 | |
| 27 | * Checking solver.stat at 10 days, the control and test runs bit-compare. (I've not done a rigorous comparison of restart files.) |
| 28 | |
| 29 | * Comparing NEMO timer output, the optimised run appears to be ~1-2% faster overall in both total elapsed and total CPU time, though these differences |
| 30 | lie well within the run-to-run variability in run time, which can be 10% or more. |
| 31 | |
| 32 | * Here's some output: |
| 33 | |
| 34 | * Control run: |
| 35 | {{{ |
| 36 | Elapsed Time (s) CPU Time (s) |
| 37 | 593483.386 584478.452 |
| 38 | |
| 39 | Averaged timing on all processors : |
| 40 | ----------------------------------- |
| 41 | Section       Elap. Time(s)  Elap. Time(%)  CPU Time(s)  CPU Time(%)  CPU/Elap  Max elap(%)  Min elap(%)     Freq |
| 42 | sbc_ice_cice  0.1751989E+03          14.17       174.41        14.32      1.00        14.50        13.76   640.00 |
| 43 | tra_adv_tvd   0.7552396E+02           6.11        75.33         6.19      1.00         7.42         5.27   640.00 |
| 44 | tra_nxt       0.6302550E+02           5.10        62.89         5.17      1.00         7.65         2.14   640.00 |
| 45 | tra_ldf_iso   0.5789208E+02           4.68        57.77         4.74      1.00         5.08         4.06   640.00 |
| 46 | sol_pcg       0.5625327E+02           4.55        56.08         4.61      1.00         4.55         4.55   640.00 |
| 47 | dia_wri       0.4800178E+02           3.88        47.84         3.93      1.00         9.16         0.04   640.00 |
| 48 | tra_bbc       0.4565193E+02           3.69        45.39         3.73      0.99        11.38         0.09   640.00 |
| 49 | nonosc        0.4531063E+02           3.66        45.23         3.71      1.00         4.11         3.44  1280.00 |
| 50 | zps_hde       0.3959416E+02           3.20        39.35         3.23      0.99         7.05         0.08  1281.00 |
| 51 | ldf_slp       0.3823219E+02           3.09        38.15         3.13      1.00         3.25         2.98   640.00 |
| 52 | |
| 53 | }}} |
| 54 | |
| 55 | * Test Run: |
| 56 | {{{ |
| 57 | Total timing (sum) : |
| 58 | -------------------- |
| 59 | Elapsed Time (s) CPU Time (s) |
| 60 | 583015.673 573647.291 |
| 61 | |
| 62 | Averaged timing on all processors : |
| 63 | ----------------------------------- |
| 64 | Section       Elap. Time(s)  Elap. Time(%)  CPU Time(s)  CPU Time(%)  CPU/Elap  Max elap(%)  Min elap(%)     Freq |
| 65 | sbc_ice_cice  0.1728438E+03          14.23       172.01        14.39      1.00        14.56        13.80   640.00 |
| 66 | tra_adv_tvd   0.7375908E+02           6.07        73.52         6.15      1.00         7.29         5.22   640.00 |
| 67 | tra_nxt       0.6276407E+02           5.17        62.63         5.24      1.00         7.68         2.33   640.00 |
| 68 | tra_ldf_iso   0.5655850E+02           4.66        56.48         4.73      1.00         4.86         3.43   640.00 |
| 69 | sol_pcg       0.4990999E+02           4.11        49.77         4.16      1.00         4.11         4.11   640.00 |
| 70 | dia_wri       0.4792860E+02           3.95        47.80         4.00      1.00         9.32         0.05   640.00 |
| 71 | tra_bbc       0.4585938E+02           3.78        45.57         3.81      0.99        11.72         0.10   640.00 |
| 72 | zps_hde       0.3905950E+02           3.22        38.80         3.25      0.99         7.08         0.09  1281.00 |
| 73 | ldf_slp       0.3817614E+02           3.14        38.10         3.19      1.00         3.30         3.03   640.00 |
| 74 | nonosc        0.3657865E+02           3.01        36.47         3.05      1.00         3.48         2.77  1280.00 |
| 75 | |
| 76 | |
| 77 | }}} |
| 78 | |
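'''Working out the percentage differences.''' The "~1-2% faster" figure quoted above can be reproduced from the timer output with a few lines of Python. This is only a minimal sketch: the numbers are copied by hand from the control and test output above, only a handful of sections are included, and the names `percent_saving` and `compare` are made up for illustration (they are not part of NEMO or any suite tooling).

```python
# Totals copied from the "Elapsed Time (s) / CPU Time (s)" lines of the timer output.
CONTROL_TOTALS = {"elapsed": 593483.386, "cpu": 584478.452}
TEST_TOTALS = {"elapsed": 583015.673, "cpu": 573647.291}

# Averaged elapsed time (s) for a few sections, copied from the two tables.
CONTROL_SECTIONS = {
    "sbc_ice_cice": 0.1751989E+03,
    "tra_adv_tvd": 0.7552396E+02,
    "sol_pcg": 0.5625327E+02,
    "nonosc": 0.4531063E+02,
}
TEST_SECTIONS = {
    "sbc_ice_cice": 0.1728438E+03,
    "tra_adv_tvd": 0.7375908E+02,
    "sol_pcg": 0.4990999E+02,
    "nonosc": 0.3657865E+02,
}


def percent_saving(control, test):
    """Percentage reduction of `test` relative to `control` (positive = faster)."""
    return 100.0 * (control - test) / control


def compare(control, test):
    """Percentage saving for every key present in both dictionaries."""
    return {
        name: percent_saving(control[name], test[name])
        for name in control
        if name in test
    }


if __name__ == "__main__":
    for name, saving in compare(CONTROL_TOTALS, TEST_TOTALS).items():
        print(f"total {name:<12s} {saving:6.2f}% saving")
    for name, saving in compare(CONTROL_SECTIONS, TEST_SECTIONS).items():
        print(f"{name:<18s} {saving:6.2f}% saving")
```

With the figures above this gives a saving of roughly 1.8% on both total elapsed and total CPU time. The largest per-section changes among those included are in sol_pcg and nonosc (roughly 11% and 19% respectively). The same caveat applies as for the totals: a single pair of runs cannot separate a real gain from run-to-run noise, so these percentages are indicative only.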