Changes between Version 15 and Version 16 of 2021WP/HPC-02_Daley_Tiling
- Timestamp: 2021-05-10T14:48:02+02:00
2021WP/HPC-02_Daley_Tiling
--- v15
+++ v16
@@ v15 line 14 / v16 line 14 @@
  ||=PI(S) || Daley Calvert ||
  ||=Digest || Implement 2D tiling in `DYN` and `ZDF` code ||
- ||=Dependencies || Cleanup of `lbc_lnk` calls (wiki:2021WP/HPC-03_Mele_Comm_Cleanup) ||
+ ||=Dependencies || Cleanup of `lbc_lnk` calls (wiki:2021WP/HPC-03_Mele_Comm_Cleanup) incl. extra halo science-neutral changes ([https://forge.ipsl.jussieu.fr/nemo/browser/NEMO/branches/2021/ticket2607_r14608_halo1_halo2_compatibility ticket2607_r14608_halo1_halo2_compatibility branch]) ||
  ||=Branch || source:/NEMO/branches/2021/dev_r14273_HPC-02_Daley_Tiling ||
  ||=Previewer(s) || Italo Epicoco ||
@@ v15 line 22 / v16 line 22 @@
  === Description

- Further implement tiling for code appearing in `stp` and `stpmlf`
+ Implement tiling for code appearing in `stp` and `stpmlf` (DYN, ZDF modules), clean up existing tiling code.

  === Implementation
@@ v15 line 28 / v16 line 28 @@
  ==== Branch

- [http://forge.ipsl.jussieu.fr/nemo/browser/NEMO/branches/2021/dev_r14273_HPC-02_Daley_Tiling?rev=14797 dev_r14273_HPC-02_Daley_Tiling@14797]
- * [http://forge.ipsl.jussieu.fr/nemo/changeset?sfp_email=&sfph_mail=&reponame=&new=14797%40NEMO%2Fbranches%2F2021%2Fdev_r14273_HPC-02_Daley_Tiling/src/OCE&old=14509%40NEMO%2Ftrunk/src/OCE Difference vs trunk]
-
- [http://forge.ipsl.jussieu.fr/nemo/browser/NEMO/branches/2021/dev_r14393_HPC-03_Mele_Comm_Cleanup?rev=14776 dev_r14393_HPC-03_Mele_Comm_Cleanup@14776] has been merged into this branch.
- * [http://forge.ipsl.jussieu.fr/nemo/changeset?sfp_email=&sfph_mail=&reponame=&new=14797%40NEMO%2Fbranches%2F2021%2Fdev_r14273_HPC-02_Daley_Tiling/src/OCE&old=14776%40NEMO%2Fbranches%2F2021%2Fdev_r14393_HPC-03_Mele_Comm_Cleanup%2Fsrc%2FOCE Difference vs dev_r14393_HPC-03_Mele_Comm_Cleanup]
+ [http://forge.ipsl.jussieu.fr/nemo/browser/NEMO/branches/2021/dev_r14273_HPC-02_Daley_Tiling?rev=14819 dev_r14273_HPC-02_Daley_Tiling@14819]
+
+ [https://forge.ipsl.jussieu.fr/nemo/browser/NEMO/branches/2021/ticket2607_r14608_halo1_halo2_compatibility ticket2607_r14608_halo1_halo2_compatibility] has been merged into the trunk at r14820.
+ * [http://forge.ipsl.jussieu.fr/nemo/changeset?sfp_email=&sfph_mail=&reponame=&new=14819%40NEMO%2Fbranches%2F2021%2Fdev_r14273_HPC-02_Daley_Tiling/src/OCE&old=14820%40NEMO%2Ftrunk/src/OCE Difference vs trunk@14820]
+
+ [http://forge.ipsl.jussieu.fr/nemo/browser/NEMO/branches/2021/dev_r14393_HPC-03_Mele_Comm_Cleanup?rev=14776 dev_r14393_HPC-03_Mele_Comm_Cleanup@14776] has been merged into the branch.
+ * [http://forge.ipsl.jussieu.fr/nemo/changeset?sfp_email=&sfph_mail=&reponame=&new=14819%40NEMO%2Fbranches%2F2021%2Fdev_r14273_HPC-02_Daley_Tiling/src/OCE&old=14776%40NEMO%2Fbranches%2F2021%2Fdev_r14393_HPC-03_Mele_Comm_Cleanup%2Fsrc%2FOCE Difference vs dev_r14393_HPC-03_Mele_Comm_Cleanup]

  ==== Changes to tiling framework
@@ v15 line 162 / v16 line 164 @@

  * All routines in the DYN 'block' of code except `ssh_nxt`, `wzv`, `wAimp` and `dyn_spg`
- * All schemes in `zdf_phy` except `zdf_osm`
+ * All routines in `zdf_phy` except `zdf_osm`

  __Improved TRA coverage
@@ v15 line 175 / v16 line 177 @@
  __Untiled code

- The tiling has been implemented in the standard (`step.F90`) and QCO (`stpmlf.F90`) code.
-
- Code relating to the loop fusion (`key_loop_fusion`) and RK3 scheme (`key_RK3`) has not been tiled.
+ The tiling has been implemented in the standard (`step.F90`) and QCO (`stpmlf.F90`) code, as well as code relating to the loop fusion (`key_loop_fusion`).
+
+ Code relating to the RK3 scheme (`key_RK3`) has not been tiled.

  ==== Other changes of note
@@ v15 line 238 / v16 line 240 @@
  * When using `nn_etau = 2`, `zdf_tke_init` calls `zdf_mxl` to initialise `nmln`, which depends on `rn2b`. However, `rn2b` is not yet initialised at this point, so the `nn_etau = 2` option is not restartable and tiling changes the results. Furthermore, the diagnostics calculated by `zdf_mxl` are not correct for the first timestep
  * To address these issues, the calculation of `hmld` was moved into a new routine `zdf_mxl_turb`. `zdf_mxl` is now called before `zdf_sh2` in `zdfphy.F90`, while `zdf_mxl_turb` is called where `zdf_mxl` was previously called. Additionally, `zdf_mxl` is no longer called by `zdf_tke_init`
- * **This bug fix changes the results with respect to the trunk** when using `nn_etau = 2`
+ * **This bug fix changes the results with respect to the trunk** when using `ln_zdftke = .true.` with `nn_etau = 2`

  ==== Outstanding issues
@@ v15 line 244 / v16 line 246 @@
  * The new DO loop macros result in line lengths that easily exceed 132 characters
    * This can be overcome by using the appropriate compiler options, but is not permitted by the NEMO coding standard
-
- * In [http://forge.ipsl.jussieu.fr/nemo/browser/NEMO/branches/2021/dev_r14273_HPC-02_Daley_Tiling/src/OCE/DYN/dynhpg.F90?rev=14797#L120 DYN/dynhpg.F90] there is an `lbc_lnk` call that I think can be removed, but I'm not sure whether Francesca still requires it to resolve her issue with `hpg_djc`
-   * If it is still required, a workaround will be needed to disable the tiling when using `hpg_djc`

  ==== List of new variables (excluding local) and functions
@@ v15 line 289 / v16 line 288 @@
  === SETTE

- * **NOTE**: The following test results apply to r14797 of the development branch, which is in phase with r14509 of the trunk.
-
- SETTE has been tested with the following, with the QCO (`NOT_USING_QCO`) and icebergs (`USING_ICEBERGS`) options turned on and off:
+ SETTE ([http://forge.ipsl.jussieu.fr/nemo/browser/utils/CI/sette?rev=14561 r14561]) has been run with the QCO (`NOT_USING_QCO`) and icebergs (`USING_ICEBERGS`) options turned on and off.
+
+ These are run for [http://forge.ipsl.jussieu.fr/nemo/browser/NEMO/branches/2021/dev_r14273_HPC-02_Daley_Tiling?rev=14819 dev_r14273_HPC-02_Daley_Tiling@14819] with the following:

  * `nn_hls = 1` (`USING_EXTRA_HALO="no"`)
  * `nn_hls = 2` (`USING_EXTRA_HALO="yes"`)
- * `nn_hls = 2` (`USING_EXTRA_HALO="yes"`) and `ln_tile = .true.`
-
- The Intel compiler (ifort 18.0.5 20180823) is used with XIOS ([http://forge.ipsl.jussieu.fr/ioserver/browser/XIOS/dev/dev_oa?rev=2084 r2084] of the tiling development branch) in detached mode.
+ * `nn_hls = 2` (`USING_EXTRA_HALO="yes"`) and `ln_tile = .true.` (using default 10i x 10j tile sizes)
+
+ and are compared with results from the [http://forge.ipsl.jussieu.fr/nemo/browser/NEMO/trunk?rev=14820 trunk@14820] with `nn_hls = 1`.
+
+ The Intel compiler (ifort 18.0.5 20180823) is used with XIOS ([http://forge.ipsl.jussieu.fr/ioserver/browser/XIOS/trunk?rev=2131 r2131 of the trunk]) in detached mode.

  All tests (including SWG) pass, but it should be noted that the `USING_EXTRA_HALO` option is only used by ORCA2_ICE_PISCES.
@@ v15 line 311 / v16 line 312 @@
  * Can this change be shown to have a null impact (option not activated)? __YES__
  * Results of the required bit comparability tests been run: are there no differences when activating the development? __YES (SETTE), NO (other tests)__
-   * If some differences appear, is reason for the change valid/understood? __YES (see [http://forge.ipsl.jussieu.fr/nemo/wiki/2021WP/HPC-02_Daley_Tiling#Knownfailures known failures])__
+   * If some differences appear, is reason for the change valid/understood? __YES (see [http://forge.ipsl.jussieu.fr/nemo/wiki/2021WP/HPC-02_Daley_Tiling#Testfailures test failures])__
    * If some differences appear, is the impact as expected on model configurations? __YES__
  * Is this change expected to preserve all diagnostics? __NO__
-   * If no, is reason for the change valid/understood? __YES (see [http://forge.ipsl.jussieu.fr/nemo/wiki/2021WP/HPC-02_Daley_Tiling#Knownfailures known failures])__
+   * If no, is reason for the change valid/understood? __YES (see [http://forge.ipsl.jussieu.fr/nemo/wiki/2021WP/HPC-02_Daley_Tiling#Testfailures test failures])__
  * Are there significant changes in run time/memory? __NO__
-   * Difference in ORCA2_ICE_PISCES/REPRO* execution times, with respect to the trunk using `nn_hls = 1`
-     * QCO, `nn_hls = 1`: <5%
-     * QCO, `nn_hls = 2`: +5-8%
-     * QCO, `nn_hls = 2`, 5x5 tiling: +11-13%
-     * non-QCO, `nn_hls = 1`: <5%
-     * non-QCO, `nn_hls = 2`: +6-7%
-     * non-QCO, `nn_hls = 2`, 5x5 tiling: +7-9%
-     * The increase due to tiling is likely because the 5x5 tile size is not optimal (`nn_ltile_i < Ni_0` causes performance loss)
-     * Icebergs are turned off
+   * ORCA2_ICE_PISCES/REPRO* changes with respect to the trunk using `nn_hls = 1` (icebergs turned off):
+     * Execution time
+       * QCO, `nn_hls = 1`: < 5%
+       * QCO, `nn_hls = 2`: + 5-7%
+       * QCO, `nn_hls = 2` and `ln_tile = .true.`: + 6-7%
+       * non-QCO, `nn_hls = 1`: < 5%
+       * non-QCO, `nn_hls = 2`: + 4-6%
+       * non-QCO, `nn_hls = 2` and `ln_tile = .true.`: + 8-11%
+     * Memory
+       * QCO, `nn_hls = 1`: < 0.1Gb
+       * QCO, `nn_hls = 2`: + 0.4Gb
+       * QCO, `nn_hls = 2` and `ln_tile = .true.`: + 0.7-0.8Gb
+       * non-QCO, `nn_hls = 1`: < 0.1Gb
+       * non-QCO, `nn_hls = 2`: + 0.4-0.5Gb
+       * non-QCO, `nn_hls = 2` and `ln_tile = .true.`: + 0.8-0.9Gb
+     * The increase in execution time due to tiling is likely because the 10x10 tile size is not optimal (`nn_ltile_i < Ni_0` causes performance loss)
+     * The trunk with `nn_hls = 1` uses about 11Gb of memory
+     * The increase in memory due to `nn_hls = 2` is probably mostly due to an increase in the size of the domain
+     * The increase in memory due to tiling may be because a number of additional arrays are declared when `ln_tile = .true.`

  === Development testing
@@ v15 line 330 / v16 line 341 @@
  A configuration based on ORCA2_ICE_PISCES (without `key_si3` or `key_top`) was used to test code modified by the tiling development.
  To facilitate cleaner testing, `ln_trabbc`, `ln_trabbl`, `ln_icebergs`, `ln_rnf`, `ln_ssr`, `ln_tradmp`, `ln_ldfeiv`, `ln_traldf_msc`, `ln_mle`, `ln_zdfddm` and `ln_zdfiwm` were all set to `.false.`.
- `ln_qsr_2bd` was used instead of `ln_qsr_rgb`, `ln_dynvor_ene` was used instead of `ln_dynvor_een`, `nn_havtb`/`nn_etau`/`nn_ice`/`nn_fwb` were set to 0, and `nn_fsbc` was set to 1.
-
- All tests were run with the standard VVL code, the QCO code (`key_qco`) and the new non-VVL code (`key_linssh`).
-
- Simulations of the tiling branch were run for 100 days with 1-day diagnostic output, for all scientific options relevant to the affected code.
- Each simulation was run with:
+ `ln_qsr_2bd` was used instead of `ln_qsr_rgb`, `nn_havtb`/`nn_etau`/`nn_ice`/`nn_fwb` were set to 0, and `nn_fsbc` was set to 1.
+
+ All tests were run with the standard VVL code, the QCO code (`key_qco`) and the new linear free surface code (`key_linssh`).
+
+ The Intel compiler (ifort 18.0.5 20180823) was used with XIOS ([http://forge.ipsl.jussieu.fr/ioserver/browser/XIOS/trunk?rev=2131 r2131 of the trunk]) in detached mode.
+ A `jpni = 4`, `jpnj = 9` decomposition was used with 6 XIOS processors.
+
+ Simulations using [http://forge.ipsl.jussieu.fr/nemo/browser/NEMO/branches/2021/dev_r14273_HPC-02_Daley_Tiling?rev=14819 dev_r14273_HPC-02_Daley_Tiling@14819] and the [http://forge.ipsl.jussieu.fr/nemo/browser/NEMO/trunk?rev=14820 trunk@14820] were run for 100 days with 1-day diagnostic output, for all scientific options relevant to the affected code.
+ Each simulation of the tiling branch was run with:

  1. `nn_hls = 1`
@@ v15 line 342 / v16 line 356 @@
  4. `nn_hls = 2` and `ln_tile = .true.`, using 50x50 tiles (equivalent to one tile over the full domain)

- `run.stat` and diagnostic output were compared with simulations of the trunk using configuration 1 (`nn_hls = 1`), and with equivalent 100-day simulations of the tiling branch that were run in two 50-day submissions (i.e. testing for restartability).
-
- The Intel compiler (ifort 18.0.5 20180823) was used with XIOS ([http://forge.ipsl.jussieu.fr/ioserver/browser/XIOS/dev/dev_oa?rev=2084 r2084] of the tiling development branch) in detached mode.
- A `jpni = 4`, `jpnj = 9` decomposition was used with 6 XIOS processors.
-
- * **NOTE**: this testing is not exhaustive and covers only the scientific options required to test the code directly affected by the tiling, although some limited additional testing was included. For example, the `dynspg_ts` scheme is used in all tests as this is the standard option for ORCA2_ICE_PISCES, but one additional test was included for the `dynspg_exp` scheme.
+ `run.stat` and diagnostic output were compared with simulations of the trunk using `nn_hls = 1`, and with equivalent simulations of the tiling branch that were run in two 50-day submissions (i.e. testing for restartability).
+
+ * **NOTE**: this testing is not exhaustive and covers only the scientific options required to test the code directly affected by the tiling, although some limited additional testing was included. For example, the `dynspg_ts` scheme is used in all tests as this is the standard setting for ORCA2_ICE_PISCES, but one additional test was included for the `dynspg_exp` scheme.

  ==== Test failures
@@ v15 line 354 / v16 line 365 @@

  * Results differ when using `nn_hls = 2` and standard (non-QCO) code
-   * `ln_traadv_fct = .true.` with `nn_fct_h = 4` and `nn_fct_v = 2`
-   * `ln_traldf_lap = .true.` with `ln_traldf_triad = .true.`, `ln_botmix_triad = .true.`, and `ln_traldf_msc = .true.`
-   * `ln_zdftke = .true.` with `ln_drg_off = .true.`
+   * **__TO BE COMPLETED__**
    * **NOTE**: As the QCO code will replace the standard code, I don't think these issues are worth investigating. They are also very hard to track down, as they seem to disappear when unrelated scientific options (e.g. vertical mixing coefficients) are changed.

  __Expected failures
-
- This list does not include tests that fail due to refactoring to preserve results for different `nn_hls` ([http://forge.ipsl.jussieu.fr/nemo/browser/NEMO/branches/2021/ticket2607_r14608_halo1_halo2_compatibility ticket2607_r14608_halo1_halo2_compatibility] development).

  * Results differ when using tiling
@@ v15 line 371 / v16 line 378 @@

  * Results differ with respect to the trunk
-   * `ln_trabbl = .true.` diagnostics `uoce_bbl`/`voce_bbl`
-     * The removal of the `lbc_lnk` for `utr_bbl`/`vtr_bbl` changes results, because the sign should have been reversed (i.e. the results were incorrect in the trunk)
    * `ln_zdftke = .true.` with `nn_etau = 2`
      * This is because of a bug fix (the results were incorrect in the trunk)
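
The relationship between tile size and tile count referred to in the testing notes (50x50 tiles acting as a single tile, 10x10 tiles splitting the domain) can be sketched as below. This is only an illustration, assuming tiles simply partition the interior of an MPI subdomain with shorter tiles at the edges; it is not taken from the NEMO source. `Nj_0` and `nn_ltile_j` are assumed j-direction counterparts of the `Ni_0` and `nn_ltile_i` named above, and the 45 x 17 interior size is hypothetical.

```python
import math


def tile_counts(ni_0, nj_0, nn_ltile_i, nn_ltile_j):
    """Return (tiles in i, tiles in j) for an ni_0 x nj_0 subdomain interior.

    Assumption: a requested tile length larger than the interior is
    clipped to the interior, so an oversized tile covers the whole domain.
    """
    len_i = min(nn_ltile_i, ni_0)
    len_j = min(nn_ltile_j, nj_0)
    # Edge tiles may be shorter than the rest, hence the ceiling division.
    return math.ceil(ni_0 / len_i), math.ceil(nj_0 / len_j)


# Hypothetical 45 x 17 interior (not the size used in the tests above).
print(tile_counts(45, 17, 50, 50))  # (1, 1): one tile over the full domain
print(tile_counts(45, 17, 10, 10))  # (5, 2): nn_ltile_i < Ni_0 splits the i direction
```

Under these assumptions, any tile length at least as large as the interior yields a single tile, while a tile length shorter than `Ni_0` splits the i direction, which is the case the notes above associate with a performance loss.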