= Name and subject of the action

Last edition: '''[[Wikinfo(changed_ts)]]''' by '''[[Wikinfo(changed_by)]]'''

The PI is responsible for closely following the progress of the action, and especially for contacting the NEMO project manager if the preview (or review) is delayed beyond the expected 2 weeks.

[[PageOutline(2, , inline)]]

== Summary

||=Action || Implement 2D tiling (with the LFRA version of NEMO) ||
||=PI(S) || Daley Calvert, Andrew Coward ||
||=Digest || Implement 2D tiling to reduce traffic between main memory and L3 cache ||
||=Dependencies || DO loop macros ([wiki:2020WP/KERNEL-02_Coward_DoLoopMacros_part1]), extended haloes (Italo Epicoco, Seb Masson and Francesca Mele), extension of XIOS to accept 2D tiles of data (Yann Meurdesoif & Seb Masson) ||
||=Branch || source:/NEMO/branches/{YEAR}/dev_r{REV}_{ACTION_NAME} ||
||=Previewer(s) || Gurvan Madec ||
||=Reviewer(s) || Gurvan Madec ||
||=Ticket || #2365 ||

=== Description

Implement loop tiling over horizontal dimensions (i and j).

=== Implementation

A trial of horizontal tiling has been implemented in `tra_ldf_iso`; the main code changes are described below. It has been tested using GYRE on 1 CPU, without XIOS. 10-day simulations using different tile decompositions (including no tiling) have been compared bit-for-bit against the trunk.

__Summary of method__

The full processor domain (dimensions `jpi` x `jpj`) is split into one or more subdomains (tiles). This is implemented by:

'''1. Modifying the DO loop macros in `do_loop_substitute.h90` to use the tile bounds'''

The tile domain is defined by a new set of domain indices (`ntsi`, `ntei`, `ntsj`, `ntej`). These are arrays of length `nijtile` + 1 (the number of tiles plus one) and represent the internal part of the domain.
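
As a rough illustration of the tile-bound arrays described above, the following standalone sketch fills `ntsi`, `ntei`, `ntsj` and `ntej` for a regular decomposition, with index 0 holding the internal part of the full domain. It is not the `dom_tile` routine from the branch: the program name, the helper counters `iitile` and `ijtile`, the domain size, the halo width and the rounding-up rule for the number of tiles are all assumptions made for illustration.

{{{
#!fortran
PROGRAM tile_bounds_sketch
   ! Hypothetical standalone illustration; NOT the dom_tile routine from the branch.
   IMPLICIT NONE
   INTEGER, PARAMETER :: jpi = 32, jpj = 22   ! full processor domain size (illustrative values)
   INTEGER, PARAMETER :: nn_hls = 1           ! halo width (illustrative value)
   INTEGER, PARAMETER :: nn_ltile_i = 10      ! length of tiles in i
   INTEGER, PARAMETER :: nn_ltile_j = 10      ! length of tiles in j
   INTEGER :: iitile, ijtile                  ! tiles per direction (hypothetical helpers)
   INTEGER :: nijtile, jt
   INTEGER, ALLOCATABLE :: ntsi(:), ntei(:), ntsj(:), ntej(:)

   ! Assumed rule: round up, so the last tile in each direction may be shorter
   iitile  = ( jpi - 2*nn_hls + nn_ltile_i - 1 ) / nn_ltile_i
   ijtile  = ( jpj - 2*nn_hls + nn_ltile_j - 1 ) / nn_ltile_j
   nijtile = iitile * ijtile

   ! Arrays have nijtile + 1 elements; index 0 is the full internal domain
   ALLOCATE( ntsi(0:nijtile), ntei(0:nijtile), ntsj(0:nijtile), ntej(0:nijtile) )
   ntsi(0) = 1 + nn_hls   ;   ntei(0) = jpi - nn_hls
   ntsj(0) = 1 + nn_hls   ;   ntej(0) = jpj - nn_hls

   ! Tiles are numbered along i first, then j; end indices are clipped to the domain
   DO jt = 1, nijtile
      ntsi(jt) = ntsi(0) + nn_ltile_i * MOD( jt - 1, iitile )
      ntsj(jt) = ntsj(0) + nn_ltile_j * ( ( jt - 1 ) / iitile )
      ntei(jt) = MIN( ntsi(jt) + nn_ltile_i - 1, ntei(0) )
      ntej(jt) = MIN( ntsj(jt) + nn_ltile_j - 1, ntej(0) )
   END DO

   ! Print the bounds of the full domain (tile 0) and of each tile
   DO jt = 0, nijtile
      WRITE(*,'(A,I3,4(A,I3))') 'tile', jt, ' i:', ntsi(jt), ' -', ntei(jt),   &
         &                                  ' j:', ntsj(jt), ' -', ntej(jt)
   END DO
END PROGRAM tile_bounds_sketch
}}}

With these illustrative values the internal domain is 30 x 20 points, so the sketch produces 3 x 2 = 6 tiles. Clipping the end indices with `MIN` is one simple way to handle a domain that is not an exact multiple of the tile length.
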
The tile number (`ntile`) is used to obtain the indices for the current tile:

{{{
#!diff
- #define __kIs_ 2
+ #define __kIs_ ntsi(ntile)
}}}

A new subroutine `dom_tile` (in `domain.F90`) sets the values of these indices during initialisation. The zero index is used to store the indices for the full domain:

{{{
#!fortran
ntsi(0) = 1 + nn_hls
ntsj(0) = 1 + nn_hls
ntei(0) = jpi - nn_hls
ntej(0) = jpj - nn_hls
}}}

'''2. Declaring SUBROUTINE-level arrays using the tile bounds'''

A new substitution macro in `do_loop_substitute.h90`:

{{{
#define A2D __kIsm1_:__kIep1_,__kJsm1_:__kJep1_
}}}

is used such that:

{{{
#!diff
- ALLOCATE(jpi,jpj) DIMENSION(jpi,jpj)
+ ALLOCATE(A2D) DIMENSION(A2D)
}}}

Operations between local working arrays (which have the dimensions of the tile) and global/input arrays (which have the dimensions of either the tile or the full domain) therefore require no further changes, unless they use `:` subscripts as described below.

'''3. Replacing `:` subscripts with a DO loop macro where appropriate'''

This is only necessary when step 2 would introduce an array shape inconsistency:

{{{
#!diff
- REAL(wp), DIMENSION(jpi,jpj,jpk) :: a3d
- REAL(wp), DIMENSION(jpi,jpj) :: z2d
- z2d(:,:) = a3d(:,:,1)
+ REAL(wp), DIMENSION(jpi,jpj,jpk) :: a3d
+ REAL(wp), DIMENSION(A2D) :: z2d
+ DO_2D_11_11
+    z2d(ji,jj) = a3d(ji,jj,1)
+ END_2D
}}}

'''4. Looping over tiles at the timestepping level'''

The current tile number (`ntile`) is set inside the tiling loop in `stp` and reset to 0 after the loop exits (it is also 0 after initialisation, before the loop).

{{{
#!fortran
! Loop over tile domains
DO jtile = 1, nijtile
   IF( ln_tile ) ntile = jtile
   CALL tra_ldf( kstp, Nbb, Nnn, ts, Nrhs )   ! lateral mixing
END DO
IF( ln_tile ) ntile = 0   ! Revert to tile over full domain
}}}

DO loops within the tiling loop therefore work on the current tile (`ntile /= 0`), while those outside the loop work on the full domain (`ntile == 0`).

'''5.
A new namelist (`namtile`)'''

{{{
!-----------------------------------------------------------------------
&namtile        !   parameters of the tiling
!-----------------------------------------------------------------------
   ln_tile    = .false.     !  Use tiling (T) or not (F)
   nn_ltile_i = 10          !  Length of tiles in i
   nn_ltile_j = 10          !  Length of tiles in j
/
}}}

The number of tiles is calculated from the tile lengths, `nn_ltile_i` and `nn_ltile_j`, with respect to the full domain.

__Branch__

 * [http://fcm3/projects/NEMO.xm/changeset?reponame=&new=12945%40NEMO%2Fbranches%2FUKMO%2Fdev_r12745_HPC-02_Daley_Tiling_trial_public&old=12740%40NEMO%2Ftrunk Implementation in trunk]
 * [http://fcm3/projects/NEMO.xm/changeset?reponame=&new=12942%40NEMO%2Fbranches%2FUKMO%2Fdev_r12866_HPC-02_Daley_Tiling_trial_extra_halo&old=12866%40NEMO%2Fbranches%2F2020%2Fdev_r12558_HPC-08_epico_Extra_Halo Implementation in extended haloes branch]

__New subroutines__

 * `OCE/DOM/domain/dom_tile` - Calculate tiling variables (domain indices, number of tiles)

__Modified modules__

 * `cfgs/SHARED/namelist_ref` - Add `namtile` namelist
 * `OCE/DOM/dom_oce` - Declare namelist variables
 * `OCE/DOM/domain` - Read `namtile` namelist (`dom_nam`), calculate tiling variables (`dom_tile`)
 * `OCE/IOM/prtctl` - Add IF statement to prevent execution of `prt_ctl` by each tile
 * `OCE/TRA/traldf` - Add IF statements to prevent execution of `trd_tra` by each tile
 * `OCE/TRA/traldf_iso` - Add IF statements (as above), modify local arrays for tiling
 * `OCE/do_loop_substitute` - Modify DO loop macros to use domain indices, add `A2D` macro
 * `OCE/par_oce` - Declare tiling variables
 * `OCE/step` - Add tiling loop
 * `OCE/timing` - Add IF statements to prevent execution of `timing_start` and `timing_stop` by each tile

__New variables (excluding local)__

 * Global variables
   * `ntsi`, `ntsj` - start index of tile
   * `ntei`, `ntej` - end index of tile
   * `ntile` - current tile number
   * `nijtile` - number of tiles
 * Namelist
   * `ln_tile` - logical control on use of
     tiling
   * `nn_ltile_i`, `nn_ltile_j` - tile length
 * Pre-processor macros
   * `A2D` - substitution for ALLOCATE or DIMENSION arguments

__Notes__

'''Untiled code'''

Parts of the code that should only be executed by one tile (e.g. `numout` write statements) as well as code that has not yet been tiled (e.g. timing routines) have been enclosed in IF statements. This code has been marked with `! TODO: TO BE TILED`. I will add some notes on this code in the near future.

'''Extended haloes'''

The tiling trial has also been implemented in the [http://fcm3/projects/NEMO.xm/changeset?reponame=&new=12942%40NEMO%2Fbranches%2FUKMO%2Fdev_r12866_HPC-02_Daley_Tiling_trial_extra_halo&old=12866%40NEMO%2Fbranches%2F2020%2Fdev_r12558_HPC-08_epico_Extra_Halo extended haloes branch]. There are few differences between this and the trunk implementation.

=== Documentation updates

{{{#!box width=55em help
Using previous parts, define the main changes to be done in the NEMO literature (manuals, guide, web pages, …).
}}}

''...''

== Preview

{{{#!box width=50em info
[[Include(wiki:Developers/DevProcess#preview_)]]
}}}

''...''

== Tests

{{{#!box width=50em info
[[Include(wiki:Developers/DevProcess#tests)]]
}}}

''...''

== Review

{{{#!box width=50em info
[[Include(wiki:Developers/DevProcess#review)]]
}}}

''...''