= Name and subject of the action Last edition: '''[[Wikinfo(changed_ts)]]''' by '''[[Wikinfo(changed_by)]]''' The PI is responsible to closely follow the progress of the action, and especially to contact NEMO project manager if the delay on preview (or review) are longer than the 2 weeks expected. [[PageOutline(2, , inline)]] == Summary ||=Action || Implement 2D tiling (continuation of [wiki:2020WP/HPC-02_Daley_Tiling 2020 work]) || ||=PI(S) || Daley Calvert || ||=Digest || Implement 2D tiling in `DYN` and `ZDF` code || ||=Dependencies || Cleanup of `lbc_lnk` calls (wiki:2021WP/HPC-03_Mele_Comm_Cleanup) || ||=Branch || source:/NEMO/branches/2021/dev_r14273_HPC-02_Daley_Tiling || ||=Previewer(s) || Italo Epico || ||=Reviewer(s) || Italo Epico || ||=Ticket || #2600 || === Description Further implement tiling for code appearing in `stp` and `stpmlf` === Implementation ==== Branch [http://forge.ipsl.jussieu.fr/nemo/browser/NEMO/branches/2021/dev_r14273_HPC-02_Daley_Tiling?rev=14787 dev_r14273_HPC-02_Daley_Tiling@14787] * [http://forge.ipsl.jussieu.fr/nemo/changeset?sfp_email=&sfph_mail=&reponame=&new=14787%40NEMO%2Fbranches%2F2021%2Fdev_r14273_HPC-02_Daley_Tiling/src/OCE&old=14509%40NEMO%2Ftrunk/src/OCE Difference vs trunk] * [http://forge.ipsl.jussieu.fr/nemo/changeset?sfp_email=&sfph_mail=&reponame=&new=14787%40NEMO%2Fbranches%2F2021%2Fdev_r14273_HPC-02_Daley_Tiling/src/OCE&old=14776%40NEMO%2Fbranches%2F2021%2Fdev_r14393_HPC-03_Mele_Comm_Cleanup%2Fsrc%2FOCE Difference vs dev_r14393_HPC-03_Mele_Comm_Cleanup] ==== Changes to tiling framework __New DO loop macros Each tile is effectively a subdomain with the same structure as the full processor domain, i.e. it has an internal part (with `i` indices `ntsi:ntei` and `j` indices `ntsj:ntej`) and a halo. The example below demonstrates that operations performed on the halo points of one tile will affect the internal part of adjacent tiles. [[Image()]] This is quite a common issue. In fact, this will occur whenever a DO loop does work on halo points for a full-sized array (i.e. it is the size of the full domain, rather than just the tile) that is persistent in memory (i.e. it is declared at the module level or as allocatable with the `SAVE` attribute). Therefore, the [http://forge.ipsl.jussieu.fr/nemo/changeset/14780/NEMO/branches/2021/dev_r14273_HPC-02_Daley_Tiling/src/OCE/TRA/traqsr.F90?old=14215&old_path=NEMO%2Ftrunk%2Fsrc%2FOCE%2FTRA%2Ftraqsr.F90 existing workaround must be replaced] by something better, in order to avoid many code changes. A new set of [http://forge.ipsl.jussieu.fr/nemo/changeset/14787/NEMO/branches/2021/dev_r14273_HPC-02_Daley_Tiling/src/OCE/do_loop_substitute.h90?old=14215&old_path=NEMO%2Ftrunk%2Fsrc%2FOCE%2Fdo_loop_substitute.h90 DO loop macros] have been added as one solution for this problem. The aim is to avoid repeating operations on particular points by adjusting the DO loop bounds, which has the advantage of reducing unneccessary calculations when using the tiling. This is achieved by keeping track of which tiles have been completed (`l_tilefin` module variable in `domtile.F90`) and using the stencil of the DO loop to work out which points have therefore been processed: {{{#!fortran #define DO_2D_OVR(L, R, B, T) DO_2D(L-(L+R)*nthl, R-(R+L)*nthr, B-(B+T)*nthb, T-(T+B)*ntht) }}} Here, `nthl`/`nthr`/`nthb`/`ntht` are equal to 1 if work on the adjacent left/right/bottom/top tiles is finished, otherwise they are equal to 0. If there is no adjacent tile (i.e. the tile is at the edge of the domain), then the corresponding integer is again equal to 0. As a general rule (although it is not always necessary), this new DO loop macro must be used whenever: * The bounds of the DO loop include the halo (i.e. are greater than 0) * The DO loop contains an assignment to an array that is persistent in memory (e.g. the state variables `ts`, `uu`, `vv` etc) As such, they have been implemented widely in this development and have replaced previous workarounds in: * TRA/trabbl.F90 * TRA/tranpc.F90 * TRA/traqsr.F90 * TRA/trasbc.F90 __Restructuring of core `domtile.F90` routines Several [http://forge.ipsl.jussieu.fr/nemo/changeset/14787/NEMO/branches/2021/dev_r14273_HPC-02_Daley_Tiling/src/OCE/DOM/domtile.F90?old=14090&old_path=NEMO%2Ftrunk%2Fsrc%2FOCE%2FDOM%2Fdomtile.F90 new routines] have been added to `domtile.F90` in order to implement the new DO loop macros, to make the tiling implementation tidier and easier to use, and to add warnings/errors. * `dom_tile`- sets the currently active tile * `dom_tile_init`- contains initialisation steps moved out of `dom_tile` * `dom_tile_start`- declare the start of a tiled code region * `dom_tile_end`- declare the end of a tiled code region The following variables have been added (see [https://forge.ipsl.jussieu.fr/nemo/wiki/2020WP/HPC-02_Daley_Tiling#Listofnewvariablesandfunctionsexcludinglocal here] for a list of the pre-existing tiling variables): * `INTEGER, PUBLIC :: nthl, nthr, nthb, ntht`- modifiers on bounds in the new DO loop macros (see above) * `LOGICAL, PUBLIC :: l_istiled`- whether tiling is **currently** active * This replaces instances of `ntile /= 0` and `ntile == 0` throughout the code * `LOGICAL, ALLOCATABLE, DIMENSION(:) :: l_tilefin`- whether a specific tile has been processed (size `nijtile`) Below is an example of how the tiling routines are now used, as well as the action taken by each routine: {{{#!fortran IF( ln_tile ) CALL dom_tile_start ! Set l_istiled = .true. DO jtile = 1, nijtile IF( ln_tile ) CALL dom_tile( ntsi, ntsj, ntei, ntej, ktile = jtile ) ! Set ntile = ktile ! Set ntsi/ntei/ntsj/ntej for the given tile (ktile) ! Set nthl/nthr/nthb/ntht for the given tile (ktile) ! Set l_tilefin = .true. for the previous tile ! Tiled code END DO IF( ln_tile ) CALL dom_tile_stop ! Set ntile = 0 ! Set l_istiled = .false. ! Set ntsi/ntei/ntsj/ntej for the full domain (equal to Nis0/Nie0/Njs0/Nje0) ! Set nthl/nthr/nthb/ntht for the full domain (equal to 0) ! Set l_tilefin(:) = .false. }}} In addition, the new tiling routines now include a "pause/resume" functionality which is activated by setting `ldhold = .true.`. This [http://forge.ipsl.jussieu.fr/nemo/changeset/14780/NEMO/branches/2021/dev_r14273_HPC-02_Daley_Tiling/src/OCE/DOM/dtatsd.F90?old=14189&old_path=NEMO%2Ftrunk%2Fsrc%2FOCE%2FDOM%2Fdtatsd.F90 replaces the existing workaround] with a tidier implementation. Below is an example of how this is used, in the context of the above example: {{{#!fortran IF( ln_tile ) CALL dom_tile_start DO jtile = 1, nijtile IF( ln_tile ) CALL dom_tile( ntsi, ntsj, ntei, ntej, ktile = jtile ) ! Tiled code IF( ln_tile ) CALL dom_tile_stop( ldhold=.TRUE. ) ! Set l_istiled = .false. ! Set ntsi/ntei/ntsj/ntej for the full domain (equal to Nis0/Nie0/Njs0/Nje0) ! Set nthl/nthr/nthb/ntht for the full domain (equal to 0) ! Untiled code IF( ln_tile ) CALL dom_tile_start( ldhold=.TRUE. ) ! Set l_istiled = .true. ! Set ntsi/ntei/ntsj/ntej for the currently active tile (ntile) ! Set nthl/nthr/nthb/ntht for the currently active tile (ntile) END DO IF( ln_tile ) CALL dom_tile_stop }}} __XIOS tiling support Support for tiled data has been added to the XIOS trunk at [http://forge.ipsl.jussieu.fr/ioserver/changeset/2131 r2131]. These changes to the Fortran interface have been implemented in [http://forge.ipsl.jussieu.fr/nemo/changeset/14607/NEMO/branches/2021/dev_r14273_HPC-02_Daley_Tiling/src/OCE/IOM/iom.F90?old=14239&old_path=NEMO%2Ftrunk%2Fsrc%2FOCE%2FIOM%2Fiom.F90 iom.F90]: * New XIOS interface routine arguments added to `iom_set_domain_attr` and `set_grid` routines * `ntile` argument added to `xios_send_field` in 2D/3D/4D routines used by `iom_put` interface * Data is only passed to XIOS if: * The size of the tile * The size of the full domain AND tiling is not active * The size of the full domain AND the current tile is the final tile * This is necessary because not all iom_put calls will pass tile-sized data when tiling is active (e.g. global arrays, which are full sized and must be calculated gradually by each tile) Additionally, it was also necessary to expand the `is_tile` interface in `DOM/domutl.F90` to include sp and dp versions. The NEMO workarounds required to use `iom_put` with tiling (e.g. in [http://forge.ipsl.jussieu.fr/nemo/changeset/14607/NEMO/branches/2021/dev_r14273_HPC-02_Daley_Tiling/src/OCE/TRA/trabbc.F90?old=14072&old_path=NEMO%2Ftrunk%2Fsrc%2FOCE%2FTRA%2Ftrabbc.F90 trabbc.F90]) have been removed. **This development therefore requires the use of the [http://forge.ipsl.jussieu.fr/ioserver/browser/XIOS/trunk?rev=2131 XIOS trunk at r2131]** ==== Tiling code coverage __DYN and ZDF coverage Tiling coverage in `stp` (step.F90) and `stpmlf` (`stpmlf.F90`) has been expanded to include DYN and ZDF code. * All routines in the DYN 'block' of code except `ssh_nxt`, `wzv`, `wAimp` and `dyn_spg` * All schemes in `zdf_phy` except `zdf_osm` __Expanded TRA coverage Tiling can now be used with most of the TRA code, in particular the bilaplacian lateral diffusion operator (`ln_traldf_blp = .true.`) and all advection schemes except the FCT scheme. Following the [wiki:2021WP/HPC-03_Mele_Comm_Cleanup extended haloes development], most of the `lbc_lnk` calls affecting the tiled code have been removed in the `nn_hls = 2` case. As a result, the tiling workarounds required to bypass `lbc_lnk` calls (e.g. in [http://forge.ipsl.jussieu.fr/nemo/changeset/14780/NEMO/branches/2021/dev_r14273_HPC-02_Daley_Tiling/src/OCE/TRA/traldf.F90?old=14189&old_path=NEMO%2Ftrunk%2Fsrc%2FOCE%2FTRA%2Ftraldf.F90 `traldf.F90`] have mostly been removed. **This development therefore requires the use of `nn_hls = 2` when tiling is enabled with `ln_tile = .true.`** __Untiled code The tiling implementation is currently focussed on the standard (`step.F90`) and QCO (`stpmlf.F90`) code. Code relating to the loop fusion (`key_loop_fusion`) and RK3 scheme (`key_RK3`) has not been tiled. ==== Other changes of note __Removed workarounds * `TRA/traadv.F90`- updated to reflect that the FCT scheme is now the only scheme that cannot be used with tiling * `TRA/traadv.F90`- changed the input array dimension declarations for several routines called from `tra_adv` * This was to remove arguments causing "copy-in" to occur, e.g. `CALL tra_mle_trp( kt, nit000, zuu(A2D(nn_hls),:), ...` * The "copy-in" for `dia_ptr` is still present __Removal of tiling * `ASM/asminc.F90`- removed tiling for code activated by `ln_asmdin =.true.` * This code seems to only be called during initialisation, which is not within the tiled code region __Refactoring * `DYN/dynhpg.F90`- parts of `hpg_djc` have been refactored to give consistent results for different `nn_hls` * Harmonic averages have been refactored to use the machine epsilon coefficient (`zep`) in a similar way to other code in NEMO * **This changes the results with respect to the trunk** {{{#!fortran ! Old cffu = 2._wp * zdrhox(ji-1,jj,jk) * zdrhox(ji,jj,jk) IF( cffu > zep ) THEN zdrho_i(ji,jj,jk) = cffu / ( zdrhox(ji-1,jj,jk) + zdrhox(ji,jj,jk) ) ELSE zdrho_i(ji,jj,jk ) = 0._wp ENDIF ! New cffu = MAX( 2._wp * zdrhox(ji-1,jj,jk) * zdrhox(ji,jj,jk), 0._wp ) z1_cff = zdrhox(ji-1,jj,jk) + zdrhox(ji,jj,jk) zdrho_i(ji,jj,jk) = cffu / SIGN( MAX( ABS(z1_cff), zep ), z1_cff ) }}} * `DYN/dynhpg.F90`- `hpg_prj` has been refactored to implement the tiling and to improve readability * `DYN/dynldf_iso.F90`- several working arrays have been declared as automatic local arrays instead of allocatable module arrays * This was seen as a tidier implementation, and there was no need for these arrays to persist in memory * `DYN/wet_dry.F90`- added a `ctl_warn` when `ln_wd_il = .true.` * This controls several code blocks throughout `dyn_hpg.F90` * Tiling has not been tested with this option, but it is apparently due to be deprecated in favor of the less intrusive `ln_wd_dl` option, so there are no plans to test it * `ZDF/zdfphy.F90`- a read-only copy of `avm_k` (`avm_k_n`) is saved and used by `zdf_sh2` when using tiling * The closure schemes (`zdf_tke` etc) will update `avm_k`, which is used for the calculation of `zsh2` by `zdf_sh2` with a stencil that includes halo points. When using tiling, the calculation of `zsh2` will therefore include `avm_k` values that have been updated (and are therefore valid for the next timestep) by adjacent tiles, changing the results * To preserve results when using tiling, a read-only copy of the "now" `avm_k` is saved for use in the calculation of `zsh2` * `ZDF/zdfevd.F90`- `zavt_evd`/`zavm_evd` have been declared as allocatable arrays instead of automatic arrays * `p_avt`/`p_avm` are updated on halo points when using `nn_hls > 1`. When using tiling, `p_avt`/`p_avm` will therefore already have been partially updated by adjacent tiles, since the halo of a tile corresponds to internal points on adjacent tiles. `zavt_evd`/`zavm_evd` then evaluate to zero on these points, changing the results * To preserve results when using tiling, the diagnostic data is not sent for each tile, but is instead stored for the full domain and sent once after all tiles are complete __Bug fixes * `ZDF/zdfmxl.F90`- the calculation of `hmld` has been moved into a new routine `zdf_mxl_turb` * This was done to address a bug in `zdftke.F90` that caused the results to change when using tiling and `nn_etau = 2` * When using `nn_etau = 2`, `zdf_tke_init` calls `zdf_mxl` to initialise `nmln`, which depends on `rn2b`. However, `rn2b` has not yet been initialised at this point, so the `nn_etau = 2` option was not restartable and tiling changed the results. Furthermore, the diagnostics calculated by `zdf_mxl` were not correct for the first timestep * To address these issues, the calculation of `hmld` was moved into a new routine `zdf_mxl_turb`. `zdf_mxl` is now called before `zdf_sh2` in `zdfphy.F90`, while `zdf_mxl_turb` is called where `zdf_mxl` was previously called. Additionally, `zdf_mxl` is no longer called by `zdf_tke_init` * **This bug fix changes the results with respect to the trunk** when using `nn_etau = 2` ==== Outstanding issues * The new DO loop macros result in line lengths that easily exceed 132 characters * This is easily overcome with the appropriate compiler options, but is not permitted by the NEMO coding standard * In [http://forge.ipsl.jussieu.fr/nemo/browser/NEMO/branches/2021/dev_r14273_HPC-02_Daley_Tiling/src/OCE/DYN/dynhpg.F90?rev=14787#L120 `DYN/dynhpg.F90`] there is an `lbc_lnk` call that I think can be removed, but I'm not sure whether Francesca still requires it to resolve her issue with `hpg_djc` * If it is still required, a workaround will be needed to disable the tiling when using `hpg_djc` * Tiling changes some results * `DIA/diaptr.F90` diagnostics slightly change when using tiling in a few tests * I suspect this is due to the additional integration step over tiles * The diagnostics change only very slightly, and only for a single point in the Indian Ocean basin where there are few zonal points. I therefore don't think it is a major issue, but I note it here for future investigation * `TRA/trabbl.F90` results change when using tiling and `nn_bbl_adv > 0` * This is because the order in which the tendency components are added can be changed by tiling * This is a known issue from the 2020 development === Documentation updates {{{#!box width=55em help Using previous parts, define the main changes to be done in the NEMO literature (manuals, guide, web pages, …). }}} ''...'' == Preview {{{#!box width=50em info [[Include(wiki:Developers/DevProcess#preview_)]] }}} ''...'' == Tests {{{#!box width=50em info [[Include(wiki:Developers/DevProcess#tests)]] }}} ''...'' == Review {{{#!box width=50em info [[Include(wiki:Developers/DevProcess#review)]] }}} ''...''