| 118 | The first thing to check is that both runs give the same results. A tkdiff of solver.stat indicates the two runs are bit comparable at 10 days. |
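The same check can be scripted rather than done by eye in tkdiff. This is a minimal sketch, assuming each run writes its solver.stat into its own directory and that a byte-for-byte match is an acceptable test for bit comparability. The helper name `bit_comparable` and the directory arguments are hypothetical.

```python
import filecmp
from pathlib import Path


def bit_comparable(control_dir, test_dir, name="solver.stat"):
    """Return True if `name` is byte-for-byte identical in both run directories."""
    control = Path(control_dir) / name
    test = Path(test_dir) / name
    # shallow=False compares file contents rather than trusting size/mtime metadata
    return filecmp.cmp(control, test, shallow=False)
```

Calling `bit_comparable("control_run", "test_run")` with the real run directories substituted returns `True` only when the two files match exactly.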
| 119 | |
| 120 | Control run: |
| 121 | {{{ Total timing (sum) : |
| 122 | -------------------- |
| 123 | Elapsed Time (s) CPU Time (s) |
| 124 | 84009.892 72252.383 |
| 125 | |
| 126 | Averaged timing on all processors : |
| 127 | ----------------------------------- |
| 128 | Section Elap. Time(s) Elap. Time(%) CPU Time(s) CPU Time(%) CPU/Elap Max elap(%) Min elap(%) Freq |
| 129 | istate_init 0.2690187E+02 18.44 16.12 12.85 0.60 18.55 18.11 1.00 |
| 130 | tra_adv_muscl 0.1861153E+02 12.76 18.55 14.79 1.00 12.87 11.91 320.00 |
| 131 | trc_stp 0.8253362E+01 5.66 6.04 4.82 0.73 6.83 4.07 320.00 |
| 132 | sbc_ice_cice 0.8216911E+01 5.63 8.07 6.44 0.98 6.94 5.39 320.00 |
| 133 | tra_ldf_iso 0.7819895E+01 5.36 7.81 6.23 1.00 5.42 3.63 640.00 |
| 134 | trc_sms 0.7681783E+01 5.27 7.65 6.10 1.00 9.13 1.96 320.00 |
| 135 | sbc 0.6609725E+01 4.53 4.82 3.84 0.73 8.93 3.17 320.00 |
| 136 | sol_pcg 0.5979100E+01 4.10 5.95 4.74 0.99 4.10 3.22 320.00 |
| 137 | trc_sbc 0.5684169E+01 3.90 5.65 4.50 0.99 7.42 0.41 320.00 |
| 138 | cice_sbc_init 0.3166722E+01 2.17 2.82 2.25 0.89 2.19 2.16 1.00 |
| 139 | trc_nxt 0.2791707E+01 1.91 2.78 2.21 1.00 2.11 0.97 320.00 |
| 140 | cice_sbc_out 0.2142690E+01 1.47 2.13 1.70 0.99 1.71 0.16 320.00 |
| 141 | dia_wri 0.1469471E+01 1.01 1.46 1.17 1.00 2.10 0.14 320.00 |
| 142 | ldf_slp 0.1370465E+01 0.94 1.36 1.09 0.99 0.97 0.92 320.00 |
| 143 | |
| 144 | }}} |
| 145 | |
| 146 | Test run: |
| 147 | |
| 148 | {{{ |
| 149 | Total timing (sum) : |
| 150 | -------------------- |
| 151 | Elapsed Time (s) CPU Time (s) |
| 152 | 67852.649 59359.526 |
| 153 | |
| 154 | Averaged timing on all processors : |
| 155 | ----------------------------------- |
| 156 | Section Elap. Time(s) Elap. Time(%) CPU Time(s) CPU Time(%) CPU/Elap Max elap(%) Min elap(%) Freq |
| 157 | tra_adv_muscl 0.1672964E+02 14.20 16.67 16.18 1.00 14.32 13.08 320.00 |
| 158 | sbc_ice_cice 0.8397770E+01 7.13 8.22 7.98 0.98 7.97 6.82 320.00 |
| 159 | trc_sms 0.7469511E+01 6.34 7.43 7.21 0.99 10.11 2.43 320.00 |
| 160 | trc_stp 0.6297795E+01 5.35 5.82 5.65 0.92 8.72 3.81 320.00 |
| 161 | trc_sbc 0.5423700E+01 4.60 5.37 5.21 0.99 9.02 0.28 320.00 |
| 162 | sbc 0.5226975E+01 4.44 3.39 3.29 0.65 9.18 3.56 320.00 |
| 163 | sol_pcg 0.5086075E+01 4.32 5.06 4.91 0.99 4.32 3.23 320.00 |
| 164 | tra_ldf_iso 0.4719996E+01 4.01 4.71 4.57 1.00 4.09 1.83 640.00 |
| 165 | istate_init 0.3775559E+01 3.21 1.30 1.26 0.34 3.35 2.82 1.00 |
| 166 | trc_nxt 0.2831918E+01 2.40 2.82 2.73 0.99 2.59 1.40 320.00 |
| 167 | cice_sbc_init 0.2793284E+01 2.37 2.50 2.42 0.89 2.39 2.36 1.00 |
| 168 | cice_sbc_out 0.2308216E+01 1.96 2.29 2.23 0.99 2.26 1.10 320.00 |
| 169 | dom_vvl_rst 0.1516852E+01 1.29 0.35 0.34 0.23 1.45 0.83 3.00 |
| 170 | |
| 171 | }}} |
| 172 | |
| 173 | So the overall elapsed time for our test run appears to drop by about 19%. That's probably far enough outside the normal XC40 variability to suggest a positive impact. |
| 174 | A detailed look at the top routines above suggests wild variance in start-up time (istate_init is the slowest routine in the control run but costs only about 14% as much in |
| 175 | the test run - that clearly is not a result of our optimisations!). |
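The headline figures can be recomputed directly from the tables. This is a minimal sketch with the values copied by hand from the "Total timing (sum)" lines and the istate_init rows above; the variable and function names are illustrative only.

```python
# Total elapsed time (s), summed over all processors, from "Total timing (sum)"
control_total_elapsed = 84009.892
test_total_elapsed = 67852.649

# istate_init elapsed time (s), averaged over processors
control_istate_init = 26.90187  # 0.2690187E+02
test_istate_init = 3.775559     # 0.3775559E+01


def percent_drop(before, after):
    """Percentage reduction going from `before` to `after` (positive means faster)."""
    return 100.0 * (before - after) / before


total_drop = percent_drop(control_total_elapsed, test_total_elapsed)
istate_ratio = 100.0 * test_istate_init / control_istate_init

print(f"Total elapsed time drop: {total_drop:.1f}%")
print(f"istate_init in test run relative to control: {istate_ratio:.1f}%")
```

Because istate_init runs once per job (Freq 1.00), its change says more about start-up variability than about the optimisations under test.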
| 176 | |
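| 177 | tra_adv_muscl shows about a 10% speed-up in both elapsed and CPU time (18.61 s to 16.73 s elapsed, 18.55 s to 16.67 s CPU). |
| 178 | sol_pcg shows about a 15% speed-up in both elapsed and CPU time (5.98 s to 5.09 s elapsed, 5.95 s to 5.06 s CPU). |
| 179 | tra_ldf_iso shows about a 40% speed-up in both elapsed and CPU time (7.82 s to 4.72 s elapsed, 7.81 s to 4.71 s CPU). |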
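As a cross-check on the per-routine figures, the change can be recomputed from the Elap. Time(s) and CPU Time(s) columns of the two tables. This is a minimal sketch with values copied by hand for the three routines discussed; the helper name `speedups` is hypothetical.

```python
# (elapsed s, CPU s) per routine, averaged over processors, copied from the tables above
control = {
    "tra_adv_muscl": (18.61153, 18.55),
    "sol_pcg": (5.979100, 5.95),
    "tra_ldf_iso": (7.819895, 7.81),
}
test = {
    "tra_adv_muscl": (16.72964, 16.67),
    "sol_pcg": (5.086075, 5.06),
    "tra_ldf_iso": (4.719996, 4.71),
}


def speedups(control, test):
    """Return {routine: (elapsed % reduction, CPU % reduction)}; positive means faster."""
    result = {}
    for name, (c_elapsed, c_cpu) in control.items():
        t_elapsed, t_cpu = test[name]
        result[name] = (
            100.0 * (c_elapsed - t_elapsed) / c_elapsed,
            100.0 * (c_cpu - t_cpu) / c_cpu,
        )
    return result


for name, (elapsed_pct, cpu_pct) in speedups(control, test).items():
    print(f"{name:15s} elapsed {elapsed_pct:5.1f}%  CPU {cpu_pct:5.1f}%")
```

The comparison uses the seconds columns rather than the percentage columns, since each percentage is a share of that run's own total and so shifts whenever any other routine changes.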
| 180 | |
| 181 | So, as ever, it's all pretty hazy, but there's no evidence here to suggest things are any worse, so it should be safe to go ahead with these changes. |
| 182 | |
| 183 | |
| 184 | |