New URL for NEMO forge!

Since March 2022, with the NEMO 4.2 release, code development has moved to a self-hosted GitLab.
This forge is now archived and remains online for historical reference.
2020WP/HPC-02_Daley_Tiling – NEMO

Version 6 (modified by hadcv, 4 years ago)


Name and subject of the action


The PI is responsible for closely following the progress of the action, and especially for contacting the NEMO project manager if the delay on preview (or review) is longer than the expected 2 weeks.

  1. Summary
  2. Preview
  3. Tests
  4. Review


Action: Implement 2D tiling (with the LFRA version of NEMO)
PI(s): Daley Calvert, Andrew Coward
Digest: Implement 2D tiling to reduce traffic between main memory and L3 cache
Dependencies: DO loop macros (2020WP/KERNEL-02_Coward_DoLoopMacros_part1), extended haloes (Italo Epicoco, Seb Masson and Francesca Mele), extension of XIOS to accept 2D tiles of data (Yann Meurdesoif & Seb Masson)
Branch: source:/NEMO/branches/{YEAR}/dev_r{REV}_{ACTION_NAME}
Previewer(s): Gurvan Madec
Reviewer(s): Gurvan Madec
Ticket: #2365


Implement loop tiling over horizontal dimensions (i and j).


A trial of horizontal tiling has been implemented in tra_ldf_iso; the main code changes are described below.

This has been tested using GYRE with 1 CPU, without XIOS. Ten-day simulations using different tile decompositions (including no tiling) have been bit-compared against the trunk.

Summary of method

The full processor domain (dimensions jpi x jpj) is split into one or more subdomains (tiles). This is implemented by:

1. Modifying the DO loop macros in do_loop_substitute.h90 to use the tile bounds

The tile domain is defined by a new set of domain indices (ntsi, ntei, ntsj, ntej). These are arrays of length nijtile + 1 (the number of tiles plus one, indexed from 0) holding the start and end indices of the internal part of each tile. The current tile number (ntile) selects the indices for the tile being processed:

- #define __kIs_     2
+ #define __kIs_     ntsi(ntile)

A new subroutine dom_tile (in domain.F90) sets the values of these indices during initialisation. The zero index is used to store the indices for the full domain:

ntsi(0) = 1 + nn_hls
ntsj(0) = 1 + nn_hls
ntei(0) = jpi - nn_hls
ntej(0) = jpj - nn_hls
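A minimal sketch of how dom_tile could fill these arrays is given below. It assumes that the number of tiles in each direction is the internal domain size divided by the tile length, rounded up; that tiles are numbered with i varying fastest; and that the last tile in each direction is truncated at the domain edge. The module tile_sketch and the routine dom_tile_sketch (including its argument list) are hypothetical and are not the code in the branch.

```fortran
! Hypothetical sketch of the index calculation performed by dom_tile.
! Assumed (not taken from the branch): names, argument list, ceiling
! division for the tile counts, and i-fastest numbering of tiles.
MODULE tile_sketch
   IMPLICIT NONE
   PRIVATE
   PUBLIC :: dom_tile_sketch
   INTEGER, PUBLIC :: nijtile = 0                    ! number of tiles
   INTEGER, PUBLIC :: ntile   = 0                    ! current tile (0 = full domain)
   INTEGER, ALLOCATABLE, PUBLIC :: ntsi(:), ntei(:), ntsj(:), ntej(:)
CONTAINS
   SUBROUTINE dom_tile_sketch( jpi, jpj, nn_hls, nn_ltile_i, nn_ltile_j )
      INTEGER, INTENT(in) :: jpi, jpj, nn_hls, nn_ltile_i, nn_ltile_j
      INTEGER :: ini, inj, iitile, ijtile, jt, iti, itj

      ini = jpi - 2 * nn_hls                      ! internal points in i
      inj = jpj - 2 * nn_hls                      ! internal points in j
      iitile = ( ini + nn_ltile_i - 1 ) / nn_ltile_i   ! tiles in i (rounded up)
      ijtile = ( inj + nn_ltile_j - 1 ) / nn_ltile_j   ! tiles in j (rounded up)
      nijtile = iitile * ijtile

      IF( ALLOCATED(ntsi) ) DEALLOCATE( ntsi, ntei, ntsj, ntej )
      ALLOCATE( ntsi(0:nijtile), ntei(0:nijtile), ntsj(0:nijtile), ntej(0:nijtile) )

      ! Index 0 holds the internal part of the full domain
      ntsi(0) = 1 + nn_hls
      ntsj(0) = 1 + nn_hls
      ntei(0) = jpi - nn_hls
      ntej(0) = jpj - nn_hls

      DO jt = 1, nijtile
         iti = MOD( jt - 1, iitile )              ! 0-based tile position in i
         itj = ( jt - 1 ) / iitile                ! 0-based tile position in j
         ntsi(jt) = ntsi(0) + iti * nn_ltile_i
         ntsj(jt) = ntsj(0) + itj * nn_ltile_j
         ! Truncate the last tile in each direction at the domain edge
         ntei(jt) = MIN( ntsi(jt) + nn_ltile_i - 1, ntei(0) )
         ntej(jt) = MIN( ntsj(jt) + nn_ltile_j - 1, ntej(0) )
      END DO
   END SUBROUTINE dom_tile_sketch
END MODULE tile_sketch
```

For example, with jpi = 32, jpj = 22, nn_hls = 1 and 10 x 10 tiles, the 30 x 20 internal domain gives nijtile = 6, and tile 3 spans ntsi(3) = 22 to ntei(3) = 31.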

2. Declaring SUBROUTINE-level arrays using the tile bounds

A new substitution macro in do_loop_substitute.h90:

#define A2D        __kIsm1_:__kIep1_,__kJsm1_:__kJep1_

is used such that:

- ALLOCATE(jpi,jpj)   DIMENSION(jpi,jpj)
+ ALLOCATE(A2D)       DIMENSION(A2D)

and therefore operations between local working arrays (which have the dimensions of the tile) and global/input arrays (which have the dimensions of either the tile or full domain) require no further changes, unless using : subscripts as described below.
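To illustrate why working arrays declared with A2D can be combined with full-domain arrays without touching the loop body, the sketch below declares an automatic array with the tile bounds plus one halo point. Here the bounds are passed as arguments; in the branch they come from ntsi(ntile) etc. through the macros. Because the local array and the full-domain array are indexed with the same global (ji,jj), only the declared bounds differ. The module a2d_sketch and the function tile_sum are hypothetical.

```fortran
! Hypothetical sketch of a working array declared with A2D-style bounds
MODULE a2d_sketch
   IMPLICIT NONE
   PRIVATE
   PUBLIC :: tile_sum
CONTAINS
   ! Returns the sum of 2*pfull over the internal points of one tile,
   ! computed via a local array that only covers the tile (+1 halo point)
   REAL FUNCTION tile_sum( pfull, kis, kie, kjs, kje )
      REAL,    INTENT(in) :: pfull(:,:)                  ! full domain, indices 1:jpi, 1:jpj
      INTEGER, INTENT(in) :: kis, kie, kjs, kje          ! internal bounds of the tile
      REAL    :: zwork(kis-1:kie+1, kjs-1:kje+1)         ! what DIMENSION(A2D) would give
      INTEGER :: ji, jj

      DO jj = kjs - 1, kje + 1                           ! DO_2D_11_11 equivalent
         DO ji = kis - 1, kie + 1
            zwork(ji,jj) = 2.0 * pfull(ji,jj)            ! same global (ji,jj) on both sides
         END DO
      END DO
      tile_sum = SUM( zwork(kis:kie, kjs:kje) )
   END FUNCTION tile_sum
END MODULE a2d_sketch
```

Called with pfull(i,j) = i on a 5 x 5 domain and tile bounds (2:3, 2:4), the function returns 30.0, since the loop reads pfull at the same global indices it writes into zwork.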

3. Replacing : subscripts with a DO loop macro where appropriate

This is only necessary when step 2 would introduce an array shape inconsistency:

- REAL(wp), DIMENSION(jpi,jpj,jpk) :: a3d
- REAL(wp), DIMENSION(jpi,jpj)     :: z2d
- z2d(:,:) = a3d(:,:,1)
+ REAL(wp), DIMENSION(jpi,jpj,jpk) :: a3d
+ REAL(wp), DIMENSION(A2D)         :: z2d
+ DO_2D_11_11
+    z2d(ji,jj) = a3d(ji,jj,1)
+ END_2D

4. Looping over tiles at the timestepping level

The current tile number (ntile) is set to 0 after initialisation, before the tiling loop in stp. Within the loop it is set to the tile being processed, and it is reset to 0 after exiting the loop.

! Loop over tile domains
DO jtile = 1, nijtile
   IF( ln_tile ) ntile = jtile
   CALL tra_ldf( kstp, Nbb, Nnn, ts, Nrhs )  ! lateral mixing
END DO
IF( ln_tile ) ntile = 0                      ! Revert to tile over full domain

DO loops within the tiling loop therefore work on the current tile (ntile /= 0), while those outside the loop work on the full domain (ntile == 0).
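The sketch below mimics this loop with a toy kernel that counts how many times each point is updated, to show that the tiles together cover the internal domain exactly once and that ntile is back to 0 afterwards. The fixed 2 x 2 decomposition, the halo width of 1, running the loop once with ntile = 0 when ln_tile is false, and all module and routine names are assumptions made for illustration.

```fortran
! Hypothetical sketch of the tiling loop in stp with a toy kernel
MODULE tiling_loop_sketch
   IMPLICIT NONE
   PRIVATE
   PUBLIC :: run_tiled
   INTEGER, PUBLIC :: ntile = 0                    ! current tile (0 = full domain)
   INTEGER, ALLOCATABLE, PUBLIC :: nvisit(:,:)     ! times each point is updated
   INTEGER, ALLOCATABLE :: ntsi(:), ntei(:), ntsj(:), ntej(:)
CONTAINS
   SUBROUTINE kernel_sketch()
      ! Stands in for a tiled routine such as tra_ldf: its loop bounds
      ! are taken from ntile, as the modified DO loop macros would do
      INTEGER :: ji, jj
      DO jj = ntsj(ntile), ntej(ntile)
         DO ji = ntsi(ntile), ntei(ntile)
            nvisit(ji,jj) = nvisit(ji,jj) + 1
         END DO
      END DO
   END SUBROUTINE kernel_sketch

   SUBROUTINE run_tiled( jpi, jpj, ln_tile )
      INTEGER, INTENT(in) :: jpi, jpj
      LOGICAL, INTENT(in) :: ln_tile
      INTEGER :: nijtile, jtile, imid, jmid

      ! Assumed decomposition: halo width 1, internal domain split 2 x 2
      ! when tiling is on, a single pass over the full domain when off
      nijtile = MERGE( 4, 1, ln_tile )
      IF( ALLOCATED(nvisit) ) DEALLOCATE( nvisit, ntsi, ntei, ntsj, ntej )
      ALLOCATE( nvisit(jpi,jpj), ntsi(0:4), ntei(0:4), ntsj(0:4), ntej(0:4) )
      nvisit(:,:) = 0
      ntsi(0) = 2   ;   ntei(0) = jpi - 1
      ntsj(0) = 2   ;   ntej(0) = jpj - 1
      imid = ( ntsi(0) + ntei(0) ) / 2
      jmid = ( ntsj(0) + ntej(0) ) / 2
      ntsi(1:4) = [ ntsi(0), imid + 1, ntsi(0), imid + 1 ]
      ntei(1:4) = [ imid,    ntei(0),  imid,    ntei(0)  ]
      ntsj(1:4) = [ ntsj(0), ntsj(0),  jmid + 1, jmid + 1 ]
      ntej(1:4) = [ jmid,    jmid,     ntej(0),  ntej(0)  ]

      ntile = 0                                    ! value after initialisation
      DO jtile = 1, nijtile                        ! loop over tile domains
         IF( ln_tile ) ntile = jtile
         CALL kernel_sketch()
      END DO
      IF( ln_tile ) ntile = 0                      ! revert to the full domain
   END SUBROUTINE run_tiled
END MODULE tiling_loop_sketch
```

With jpi = 10 and jpj = 8, every internal point (2:9, 2:7) is updated exactly once whether ln_tile is true or false, while halo points are never touched.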

5. A new namelist (namtile)

   &namtile        !   parameters of the tiling
      ln_tile = .false.     !  Use tiling (T) or not (F)
      nn_ltile_i = 10       !  Length of tiles in i
      nn_ltile_j = 10       !  Length of tiles in j
   /

The number of tiles is calculated from the tile lengths, nn_ltile_i and nn_ltile_j, with respect to the full domain.
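As an illustration of reading namtile, the sketch below resets the defaults from namelist_ref and then reads the group from a character string, similar in spirit to how NEMO reads namelists it has already loaded into memory. The module namtile_sketch and the function read_namtile are hypothetical; in the branch the namelist is read in dom_nam.

```fortran
! Hypothetical sketch: read &namtile from in-memory namelist text
MODULE namtile_sketch
   IMPLICIT NONE
   PRIVATE
   PUBLIC :: read_namtile
   LOGICAL, PUBLIC :: ln_tile    = .FALSE.   ! use tiling (T) or not (F)
   INTEGER, PUBLIC :: nn_ltile_i = 10        ! length of tiles in i
   INTEGER, PUBLIC :: nn_ltile_j = 10        ! length of tiles in j
   NAMELIST /namtile/ ln_tile, nn_ltile_i, nn_ltile_j
CONTAINS
   ! Returns the IOSTAT of the read (0 = success); entries absent from
   ! cdnml keep the default values set just before the read
   INTEGER FUNCTION read_namtile( cdnml ) RESULT( ios )
      CHARACTER(len=*), INTENT(in) :: cdnml
      ln_tile = .FALSE.   ;   nn_ltile_i = 10   ;   nn_ltile_j = 10
      READ( cdnml, NML=namtile, IOSTAT=ios )
   END FUNCTION read_namtile
END MODULE namtile_sketch
```

Reading "&namtile ln_tile = .true., nn_ltile_i = 5 /" turns tiling on with 5-point tiles in i while nn_ltile_j keeps its default of 10; an unknown entry name gives a non-zero IOSTAT.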


Implementation in trunk

Implementation in extended haloes branch

New subroutines

  • OCE/DOM/domain/dom_tile - Calculate tiling variables (domain indices, number of tiles)

Modified modules

  • cfgs/SHARED/namelist_ref - Add namtile namelist
  • OCE/DOM/dom_oce - Declare namelist variables
  • OCE/DOM/domain - Read namtile namelist (dom_nam), calculate tiling variables (dom_tile)
  • OCE/IOM/prtctl - Add IF statement to prevent execution of prt_ctl by each tile
  • OCE/TRA/traldf - Add IF statements to prevent execution of trd_tra by each tile
  • OCE/TRA/traldf_iso - Add IF statements (as above), modify local arrays for tiling
  • OCE/do_loop_substitute - Modify DO loop macros to use domain indices, add A2D macro
  • OCE/par_oce - Declare tiling variables
  • OCE/step - Add tiling loop
  • OCE/timing - Add IF statements to prevent execution of timing_start and timing_stop by each tile

New variables (excluding local)

  • Global variables
    • ntsi, ntsj - start index of tile
    • ntei, ntej - end index of tile
    • ntile - current tile number
    • nijtile - number of tiles
  • Namelist
    • ln_tile - logical control on use of tiling
    • nn_ltile_i, nn_ltile_j - tile length
  • Pre-processor macros
    • A2D - substitution for ALLOCATE or DIMENSION arguments


Untiled code

Parts of the code that should only be executed by one tile (e.g. numout write statements) as well as code that has not yet been tiled (e.g. timing routines) have been enclosed in IF statements. This code has been marked with ! TODO: TO BE TILED.

I will add some notes on this code in the near future.
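One way such a guard could look is sketched below, assuming the condition "tiling is off, or this is the first tile"; the exact condition used in the branch is not stated on this page. The module untiled_guard_sketch and the function count_writes are hypothetical helpers that only count how often the guarded statement would run during one pass of the tiling loop.

```fortran
! Hypothetical sketch of guarding untiled code inside the tiling loop
MODULE untiled_guard_sketch
   IMPLICIT NONE
   PRIVATE
   PUBLIC :: count_writes
CONTAINS
   ! Number of times a guarded statement (e.g. a numout write) would be
   ! executed during one pass over nijtile tiles
   INTEGER FUNCTION count_writes( ln_tile, nijtile )
      LOGICAL, INTENT(in) :: ln_tile
      INTEGER, INTENT(in) :: nijtile
      INTEGER :: jtile, ntile

      count_writes = 0
      ntile = 0
      DO jtile = 1, nijtile
         IF( ln_tile ) ntile = jtile
         IF( ntile == 0 .OR. ntile == 1 ) THEN    ! assumed guard; TODO: TO BE TILED
            count_writes = count_writes + 1       ! e.g. WRITE(numout,*) ...
         END IF
      END DO
   END FUNCTION count_writes
END MODULE untiled_guard_sketch
```

With this guard the statement runs once per time step whether the domain is split into one tile or many.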

Extended haloes

The tiling trial has also been implemented in the extended haloes branch. There are few differences between this and the trunk implementation.

Documentation updates

