2021WP/HPC-02_Daley_Tiling (diff) – NEMO

Changes between Version 15 and Version 16 of 2021WP/HPC-02_Daley_Tiling


Timestamp:
2021-05-10T14:48:02+02:00 (3 years ago)
Author:
hadcv
  • 2021WP/HPC-02_Daley_Tiling

    v15 v16  
    1414||=PI(S)        || Daley Calvert || 
    1515||=Digest       || Implement 2D tiling in `DYN` and `ZDF` code || 
    16 ||=Dependencies || Cleanup of `lbc_lnk` calls (wiki:2021WP/HPC-03_Mele_Comm_Cleanup) || 
     16||=Dependencies || Cleanup of `lbc_lnk` calls (wiki:2021WP/HPC-03_Mele_Comm_Cleanup) incl. extra halo science-neutral changes ([https://forge.ipsl.jussieu.fr/nemo/browser/NEMO/branches/2021/ticket2607_r14608_halo1_halo2_compatibility ticket2607_r14608_halo1_halo2_compatibility branch]) || 
    1717||=Branch       || source:/NEMO/branches/2021/dev_r14273_HPC-02_Daley_Tiling || 
    1818||=Previewer(s) || Italo Epicoco || 
     
    2222=== Description 
    2323 
    24 Further implement tiling for code appearing in `stp` and `stpmlf` 
     24Implement tiling for code appearing in `stp` and `stpmlf` (DYN, ZDF modules), and clean up existing tiling code. 
    2525 
    2626=== Implementation 
     
    2828==== Branch 
    2929 
    30 [http://forge.ipsl.jussieu.fr/nemo/browser/NEMO/branches/2021/dev_r14273_HPC-02_Daley_Tiling?rev=14797 dev_r14273_HPC-02_Daley_Tiling@14797] 
    31  * [http://forge.ipsl.jussieu.fr/nemo/changeset?sfp_email=&sfph_mail=&reponame=&new=14797%40NEMO%2Fbranches%2F2021%2Fdev_r14273_HPC-02_Daley_Tiling/src/OCE&old=14509%40NEMO%2Ftrunk/src/OCE Difference vs trunk] 
    32  
    33 [http://forge.ipsl.jussieu.fr/nemo/browser/NEMO/branches/2021/dev_r14393_HPC-03_Mele_Comm_Cleanup?rev=14776 dev_r14393_HPC-03_Mele_Comm_Cleanup@14776] has been merged into this branch. 
    34  * [http://forge.ipsl.jussieu.fr/nemo/changeset?sfp_email=&sfph_mail=&reponame=&new=14797%40NEMO%2Fbranches%2F2021%2Fdev_r14273_HPC-02_Daley_Tiling/src/OCE&old=14776%40NEMO%2Fbranches%2F2021%2Fdev_r14393_HPC-03_Mele_Comm_Cleanup%2Fsrc%2FOCE Difference vs dev_r14393_HPC-03_Mele_Comm_Cleanup] 
     30[http://forge.ipsl.jussieu.fr/nemo/browser/NEMO/branches/2021/dev_r14273_HPC-02_Daley_Tiling?rev=14819 dev_r14273_HPC-02_Daley_Tiling@14819] 
     31 
     32[https://forge.ipsl.jussieu.fr/nemo/browser/NEMO/branches/2021/ticket2607_r14608_halo1_halo2_compatibility ticket2607_r14608_halo1_halo2_compatibility] has been merged into the trunk at r14820. 
     33 * [http://forge.ipsl.jussieu.fr/nemo/changeset?sfp_email=&sfph_mail=&reponame=&new=14819%40NEMO%2Fbranches%2F2021%2Fdev_r14273_HPC-02_Daley_Tiling/src/OCE&old=14820%40NEMO%2Ftrunk/src/OCE Difference vs trunk@14820] 
     34 
     35[http://forge.ipsl.jussieu.fr/nemo/browser/NEMO/branches/2021/dev_r14393_HPC-03_Mele_Comm_Cleanup?rev=14776 dev_r14393_HPC-03_Mele_Comm_Cleanup@14776] has been merged into the branch. 
     36 * [http://forge.ipsl.jussieu.fr/nemo/changeset?sfp_email=&sfph_mail=&reponame=&new=14819%40NEMO%2Fbranches%2F2021%2Fdev_r14273_HPC-02_Daley_Tiling/src/OCE&old=14776%40NEMO%2Fbranches%2F2021%2Fdev_r14393_HPC-03_Mele_Comm_Cleanup%2Fsrc%2FOCE Difference vs dev_r14393_HPC-03_Mele_Comm_Cleanup] 
    3537 
    3638==== Changes to tiling framework 
     
    162164 
    163165 * All routines in the DYN 'block' of code except `ssh_nxt`, `wzv`, `wAimp` and `dyn_spg` 
    164  * All schemes in `zdf_phy` except `zdf_osm` 
     166 * All routines in `zdf_phy` except `zdf_osm` 
    165167 
    166168__Improved TRA coverage 
     
    175177__Untiled code 
    176178 
    177 The tiling has been implemented in the standard (`step.F90`) and QCO (`stpmlf.F90`) code. 
    178  
    179 Code relating to the loop fusion (`key_loop_fusion`) and RK3 scheme (`key_RK3`) has not been tiled. 
     179The tiling has been implemented in the standard (`step.F90`) and QCO (`stpmlf.F90`) code, as well as code relating to the loop fusion (`key_loop_fusion`). 
     180 
     181Code relating to the RK3 scheme (`key_RK3`) has not been tiled. 
    180182 
    181183==== Other changes of note 
     
    238240  * When using `nn_etau = 2`, `zdf_tke_init` calls `zdf_mxl` to initialise `nmln`, which depends on `rn2b`. However, `rn2b` is not yet initialised at this point, so the `nn_etau = 2` option is not restartable and tiling changes the results. Furthermore, the diagnostics calculated by `zdf_mxl` are not correct for the first timestep 
    239241  * To address these issues, the calculation of `hmld` was moved into a new routine `zdf_mxl_turb`. `zdf_mxl` is now called before `zdf_sh2` in `zdfphy.F90`, while `zdf_mxl_turb` is called where `zdf_mxl` was previously called. Additionally, `zdf_mxl` is no longer called by `zdf_tke_init` 
    240   * **This bug fix changes the results with respect to the trunk** when using `nn_etau = 2` 
     242  * **This bug fix changes the results with respect to the trunk** when using `ln_zdftke = .true.` with `nn_etau = 2` 
    241243 
    242244==== Outstanding issues 
     
    244246 * The new DO loop macros result in line lengths that easily exceed 132 characters 
    245247  * This can be overcome by using the appropriate compiler options, but is not permitted by the NEMO coding standard 
    246  
    247  * In [http://forge.ipsl.jussieu.fr/nemo/browser/NEMO/branches/2021/dev_r14273_HPC-02_Daley_Tiling/src/OCE/DYN/dynhpg.F90?rev=14797#L120 DYN/dynhpg.F90] there is an `lbc_lnk` call that I think can be removed, but I'm not sure whether Francesca still requires it to resolve her issue with `hpg_djc` 
    248   * If it is still required, a workaround will be needed to disable the tiling when using `hpg_djc` 
    249248 
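The line-length issue above can be checked mechanically. The following is a minimal sketch, assuming the check is run on source text after macro expansion (i.e. on preprocessed output); the function name `overlong_lines` is hypothetical and is not part of NEMO or SETTE.

```python
def overlong_lines(text, limit=132):
    """Return (line_number, length) for every line longer than `limit` characters."""
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]


if __name__ == "__main__":
    # Hypothetical sample: one line at the limit, one line over it.
    sample = "x" * 132 + "\n" + "y" * 133
    for number, length in overlong_lines(sample):
        print(f"line {number}: {length} characters")
```

The limit is a parameter so the same check can be reused if a different maximum is agreed.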
    250249==== List of new variables (excluding local) and functions 
     
    289288=== SETTE 
    290289 
    291 * **NOTE**: The following test results apply to r14797 of the development branch, which is in phase with r14509 of the trunk. 
    292  
    293 SETTE has been tested with the following, with the QCO (`NOT_USING_QCO`) and icebergs (`USING_ICEBERGS`) options turned on and off: 
     290SETTE ([http://forge.ipsl.jussieu.fr/nemo/browser/utils/CI/sette?rev=14561 r14561]) has been run with the QCO (`NOT_USING_QCO`) and icebergs (`USING_ICEBERGS`) options turned on and off. 
     291 
     292These tests are run for [http://forge.ipsl.jussieu.fr/nemo/browser/NEMO/branches/2021/dev_r14273_HPC-02_Daley_Tiling?rev=14819 dev_r14273_HPC-02_Daley_Tiling@14819] with the following settings: 
    294293 
    295294 * `nn_hls = 1` (`USING_EXTRA_HALO="no"`) 
    296295 * `nn_hls = 2` (`USING_EXTRA_HALO="yes"`) 
    297  * `nn_hls = 2` (`USING_EXTRA_HALO="yes"`) and `ln_tile = .true.` 
    298  
    299 The Intel compiler (ifort 18.0.5 20180823) is used with XIOS ([http://forge.ipsl.jussieu.fr/ioserver/browser/XIOS/dev/dev_oa?rev=2084 r2084] of the tiling development branch) in detached mode. 
     296 * `nn_hls = 2` (`USING_EXTRA_HALO="yes"`) and `ln_tile = .true.` (using default 10i x 10j tile sizes) 
     297 
     298and are compared with results from the [http://forge.ipsl.jussieu.fr/nemo/browser/NEMO/trunk?rev=14820 trunk@14820] with `nn_hls = 1`. 
     299 
     300The Intel compiler (ifort 18.0.5 20180823) is used with XIOS ([http://forge.ipsl.jussieu.fr/ioserver/browser/XIOS/trunk?rev=2131 r2131 of the trunk]) in detached mode. 
    300301 
    301302All tests (including SWG) pass, but it should be noted that the `USING_EXTRA_HALO` option is only used by ORCA2_ICE_PISCES.  
     
    311312 * Can this change be shown to have a null impact (option not activated)? __YES__ 
    312313 * Have the required bit comparability tests been run: are there no differences when activating the development? __YES (SETTE), NO (other tests)__ 
    313   * If some differences appear, is the reason for the change valid/understood? __YES (see [http://forge.ipsl.jussieu.fr/nemo/wiki/2021WP/HPC-02_Daley_Tiling#Knownfailures known failures])__ 
     314  * If some differences appear, is the reason for the change valid/understood? __YES (see [http://forge.ipsl.jussieu.fr/nemo/wiki/2021WP/HPC-02_Daley_Tiling#Testfailures test failures])__ 
    314315  * If some differences appear, is the impact as expected on model configurations? __YES__ 
    315316 * Is this change expected to preserve all diagnostics? __NO__ 
    316   * If no, is the reason for the change valid/understood? __YES (see [http://forge.ipsl.jussieu.fr/nemo/wiki/2021WP/HPC-02_Daley_Tiling#Knownfailures known failures])__ 
     317  * If no, is the reason for the change valid/understood? __YES (see [http://forge.ipsl.jussieu.fr/nemo/wiki/2021WP/HPC-02_Daley_Tiling#Testfailures test failures])__ 
    317318 * Are there significant changes in run time/memory? __NO__ 
    318   * Difference in ORCA2_ICE_PISCES/REPRO* execution times, with respect to the trunk using `nn_hls = 1` 
    319    * QCO, `nn_hls = 1`: <5% 
    320    * QCO, `nn_hls = 2`: +5-8% 
    321    * QCO, `nn_hls = 2`, 5x5 tiling: +11-13% 
    322    * non-QCO, `nn_hls = 1`: <5% 
    323    * non-QCO, `nn_hls = 2`: +6-7% 
    324    * non-QCO, `nn_hls = 2`, 5x5 tiling: +7-9% 
    325   * The increase due to tiling is likely because the 5x5 tile size is not optimal (`nn_ltile_i < Ni_0` causes performance loss) 
    326   * Icebergs are turned off 
     319  * ORCA2_ICE_PISCES/REPRO* changes with respect to the trunk using `nn_hls = 1` (icebergs turned off): 
     320   * Execution time 
     321    * QCO, `nn_hls = 1`: < 5% 
     322    * QCO, `nn_hls = 2`: + 5-7% 
     323    * QCO, `nn_hls = 2` and `ln_tile = .true.`: + 6-7% 
     324    * non-QCO, `nn_hls = 1`: < 5% 
     325    * non-QCO, `nn_hls = 2`: + 4-6% 
     326    * non-QCO, `nn_hls = 2` and `ln_tile = .true.`: + 8-11% 
     327   * Memory 
     328    * QCO, `nn_hls = 1`: < 0.1 GB 
     329    * QCO, `nn_hls = 2`: + 0.4 GB 
     330    * QCO, `nn_hls = 2` and `ln_tile = .true.`: + 0.7-0.8 GB 
     331    * non-QCO, `nn_hls = 1`: < 0.1 GB 
     332    * non-QCO, `nn_hls = 2`: + 0.4-0.5 GB 
     333    * non-QCO, `nn_hls = 2` and `ln_tile = .true.`: + 0.8-0.9 GB 
     334  * The increase in execution time due to tiling is likely because the 10x10 tile size is not optimal (`nn_ltile_i < Ni_0` causes performance loss) 
     335  * The trunk with `nn_hls = 1` uses about 11 GB of memory 
     336  * The increase in memory due to `nn_hls = 2` is probably due mainly to the increase in the size of the domain 
     337  * The increase in memory due to tiling may be because a number of additional arrays are declared when `ln_tile = .true.` 
    327338 
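To put the memory figures above on the same footing as the execution-time percentages, the absolute increases can be expressed relative to the trunk baseline of about 11 GB. This is only a worked-arithmetic sketch: the baseline is approximate, so the resulting percentages are indicative, and the names `TRUNK_MEMORY_GB` and `relative_increase` are hypothetical.

```python
TRUNK_MEMORY_GB = 11.0  # "about 11" in the list above; approximate baseline


def relative_increase(delta_gb, baseline_gb=TRUNK_MEMORY_GB):
    """Return delta_gb as a percentage of baseline_gb."""
    return 100.0 * delta_gb / baseline_gb


# (low, high) memory increases for the nn_hls = 2 cases quoted in the list above
increases = {
    "QCO, nn_hls = 2": (0.4, 0.4),
    "QCO, nn_hls = 2, ln_tile = .true.": (0.7, 0.8),
    "non-QCO, nn_hls = 2": (0.4, 0.5),
    "non-QCO, nn_hls = 2, ln_tile = .true.": (0.8, 0.9),
}

if __name__ == "__main__":
    for label, (low, high) in increases.items():
        print(f"{label}: +{relative_increase(low):.1f}% to +{relative_increase(high):.1f}%")
```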
    328339=== Development testing 
     
    330341A configuration based on ORCA2_ICE_PISCES (without `key_si3` or `key_top`) was used to test code modified by the tiling development.  
    331342To facilitate cleaner testing, `ln_trabbc`, `ln_trabbl`, `ln_icebergs`, `ln_rnf`, `ln_ssr`, `ln_tradmp`, `ln_ldfeiv`, `ln_traldf_msc`, `ln_mle`, `ln_zdfddm` and `ln_zdfiwm` were all set to `.false.`.  
    332 `ln_qsr_2bd` was used instead of `ln_qsr_rgb`, `ln_dynvor_ene` was used instead of `ln_dynvor_een`, `nn_havtb`/`nn_etau`/`nn_ice`/`nn_fwb` were set to 0, and `nn_fsbc` was set to 1. 
    333  
    334 All tests were run with the standard VVL code, the QCO code (`key_qco`) and the new non-VVL code (`key_linssh`). 
    335  
    336 Simulations of the tiling branch were run for 100 days with 1-day diagnostic output, for all scientific options relevant to the affected code. 
    337 Each simulation was run with:  
     343`ln_qsr_2bd` was used instead of `ln_qsr_rgb`, `nn_havtb`/`nn_etau`/`nn_ice`/`nn_fwb` were set to 0, and `nn_fsbc` was set to 1. 
     344 
     345All tests were run with the standard VVL code, the QCO code (`key_qco`) and the new linear free surface code (`key_linssh`). 
     346 
     347The Intel compiler (ifort 18.0.5 20180823) was used with XIOS ([http://forge.ipsl.jussieu.fr/ioserver/browser/XIOS/trunk?rev=2131 r2131 of the trunk]) in detached mode. 
     348A `jpni = 4`, `jpnj = 9` decomposition was used with 6 XIOS processors.  
     349 
     350Simulations using [http://forge.ipsl.jussieu.fr/nemo/browser/NEMO/branches/2021/dev_r14273_HPC-02_Daley_Tiling?rev=14819 dev_r14273_HPC-02_Daley_Tiling@14819] and the [http://forge.ipsl.jussieu.fr/nemo/browser/NEMO/trunk?rev=14820 trunk@14820] were run for 100 days with 1-day diagnostic output, for all scientific options relevant to the affected code. 
     351Each simulation of the tiling branch was run with:  
    338352 
    339353 1. `nn_hls = 1` 
     
    342356 4. `nn_hls = 2` and `ln_tile = .true.`, using 50x50 tiles (equivalent to one tile over the full domain) 
    343357 
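As a rough illustration of why a 50x50 tile size can amount to a single tile over the domain, the sketch below counts tiles by ceiling division of the interior size by the tile size. The function name `tile_count`, the example interior size (45 x 17 points) and the ceiling-division rule itself are assumptions made for illustration; they are not taken from the NEMO source or from the test configuration.

```python
import math


def tile_count(ni_0, nj_0, nn_ltile_i, nn_ltile_j):
    """Number of tiles needed to cover an ni_0 x nj_0 interior (assumed ceiling division)."""
    tiles_i = math.ceil(ni_0 / nn_ltile_i)
    tiles_j = math.ceil(nj_0 / nn_ltile_j)
    return tiles_i * tiles_j


if __name__ == "__main__":
    # Hypothetical interior size of one MPI subdomain; not a value from the text.
    ni_0, nj_0 = 45, 17
    for size in (5, 10, 50):
        print(f"{size}x{size} tiles -> {tile_count(ni_0, nj_0, size, size)} tile(s)")
```

Under these assumptions, any tile size at least as large as the interior in both directions gives exactly one tile, which is the situation described for the 50x50 case.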
    344 `run.stat` and diagnostic output were compared with simulations of the trunk using configuration 1 (`nn_hls = 1`), and with equivalent 100-day simulations of the tiling branch that were run in two 50-day submissions (i.e. testing for restartability). 
    345  
    346 The Intel compiler (ifort 18.0.5 20180823) was used with XIOS ([http://forge.ipsl.jussieu.fr/ioserver/browser/XIOS/dev/dev_oa?rev=2084 r2084] of the tiling development branch) in detached mode. 
    347 A `jpni = 4`, `jpnj = 9` decomposition was used with 6 XIOS processors.  
    348  
    349 * **NOTE**: this testing is not exhaustive and covers only the scientific options required to test the code directly affected by the tiling, although some limited additional testing was included. For example, the `dynspg_ts` scheme is used in all tests as this is the standard option for ORCA2_ICE_PISCES, but one additional test was included for the `dynspg_exp` scheme. 
     358`run.stat` and diagnostic output were compared with simulations of the trunk using `nn_hls = 1`, and with equivalent simulations of the tiling branch that were run in two 50-day submissions (i.e. testing for restartability). 
     359 
     360* **NOTE**: this testing is not exhaustive and covers only the scientific options required to test the code directly affected by the tiling, although some limited additional testing was included. For example, the `dynspg_ts` scheme is used in all tests as this is the standard setting for ORCA2_ICE_PISCES, but one additional test was included for the `dynspg_exp` scheme. 
    350361 
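The `run.stat` comparison described above can be sketched as a plain line-by-line text comparison. This is a minimal sketch, assuming each file holds one record per line and that identical results produce identical text; the function names `first_difference` and `compare_files` and any file paths passed to them are hypothetical, and this is not the comparison script used by SETTE.

```python
from itertools import zip_longest
from pathlib import Path


def first_difference(lines_a, lines_b):
    """Return the 1-based index of the first differing line, or None if identical."""
    for index, (a, b) in enumerate(zip_longest(lines_a, lines_b), start=1):
        if a != b:
            return index
    return None


def compare_files(path_a, path_b):
    """Compare two text files line by line; return the first differing line number or None."""
    lines_a = Path(path_a).read_text().splitlines()
    lines_b = Path(path_b).read_text().splitlines()
    return first_difference(lines_a, lines_b)
```

A return value of `None` means the two files match exactly; a missing line in either file counts as a difference at that position.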
    351362==== Test failures 
     
    354365 
    355366 * Results differ when using `nn_hls = 2` and standard (non-QCO) code 
    356   * `ln_traadv_fct = .true.` with `nn_fct_h = 4` and `nn_fct_v = 2` 
    357   * `ln_traldf_lap = .true.` with `ln_traldf_triad = .true.`, `ln_botmix_triad = .true.`, and `ln_traldf_msc = .true.` 
    358   * `ln_zdftke = .true.` with `ln_drg_off = .true.` 
     367  * **__TO BE COMPLETED__** 
    359368  * **NOTE**: As the QCO code will replace the standard code, I don't think these issues are worth investigating. They are also very hard to track down, as they seem to disappear when unrelated scientific options (e.g. vertical mixing coefficients) are changed. 
    360369 
    361370__Expected failures 
    362  
    363 This list does not include tests that fail due to refactoring to preserve results for different `nn_hls` ([http://forge.ipsl.jussieu.fr/nemo/browser/NEMO/branches/2021/ticket2607_r14608_halo1_halo2_compatibility ticket2607_r14608_halo1_halo2_compatibility] development). 
    364371 
    365372 * Results differ when using tiling 
     
    371378 
    372379 * Results differ with respect to the trunk 
    373   * `ln_trabbl = .true.` diagnostics `uoce_bbl`/`voce_bbl` 
    374    * The removal of the `lbc_lnk` for `utr_bbl`/`vtr_bbl` changes results, because the sign should have been reversed (i.e. the results were incorrect in the trunk) 
    375380  * `ln_zdftke = .true.` with `nn_etau = 2` 
    376381   * This is because of a bug fix (the results were incorrect in the trunk)