2020WP/HPC-02_Daley_Tiling (diff) – NEMO

Changes between Version 8 and Version 9 of 2020WP/HPC-02_Daley_Tiling


Timestamp:
2020-09-24T20:06:30+02:00
Author:
hadcv
  • 2020WP/HPC-02_Daley_Tiling

    v8 v9  
    2626=== Implementation 
    2727 
    28 The current approach to tiling is described below. The issues encountered to date are described in [https://forge.ipsl.jussieu.fr/nemo/attachment/wiki/2020WP/HPC-02_Daley_Tiling/Tiling_code_issues.pdf this document]. 
    29  
    30 Several modules have been tiled as of 18/06/20: `tra_ldf`, `tra_zdf`, `tra_adv` and `dia_ptr`. 
    31  
    32 The tiling implementation has been tested using GYRE with 1 CPU. The tests comprise 10-day simulations using different tile decompositions (including no tiling) and different science options particular to the tiled modules. A test passes if the tiling does not change results at the bit level (`run.stat`) or in the diagnostics. 
     28As of 24/09/20, most of the code called by the "active tracers" part of the step subroutine (between `trc_stp` and `tra_atf`) has been tiled. Solutions and workarounds for the issues encountered to date are described in [https://forge.ipsl.jussieu.fr/nemo/attachment/wiki/2020WP/HPC-02_Daley_Tiling/Tiling_code_issues.pdf this document]. 
     29 
     30The tiling implementation has been tested using GYRE in benchmark mode with mono-processor and MPI configurations. The tests comprise 10-day simulations using different tile decompositions (including no tiling) and different science options particular to the tiled modules. A test passes if the tiling does not change results at the bit level (`run.stat`) or in the diagnostics. 
    3331 
    3432__Summary of method__ 
     
    4341  {{{ 
    4442  #!diff 
    45   - #define __kIs_     2 
    46   + #define __kIs_     ntsi 
     43  - #define DO_2D(B, T, L, R) DO jj = Njs0-(B), Nje0+(T)   ;   DO ji = Nis0-(L), Nie0+(R) 
     44  + #define DO_2D(B, T, L, R) DO jj = ntsj-(B), ntej+(T)   ;   DO ji = ntsi-(L), ntei+(R) 
    4745  }}} 
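
To illustrate the modified macro, the following minimal sketch writes out by hand what `DO_2D(1, 1, 1, 1)` would expand to once the loop bounds come from the active tile. The program name, domain size and tile bounds are hypothetical values chosen for illustration and are not taken from NEMO.

```fortran
PROGRAM do_2d_sketch
   ! Hand-expanded form of the modified DO_2D macro (illustrative only).
   ! All sizes and bounds below are hypothetical, not NEMO defaults.
   IMPLICIT NONE
   INTEGER, PARAMETER :: jpi = 10, jpj = 8      ! full-domain size (assumed)
   INTEGER :: ntsi, ntei, ntsj, ntej            ! bounds of the active tile
   INTEGER :: ji, jj, icount
   REAL    :: a2d(jpi,jpj)

   a2d(:,:) = 0.0
   ntsi = 2 ; ntei = 5                          ! tile covers i = 2..5
   ntsj = 2 ; ntej = 4                          ! tile covers j = 2..4

   ! DO_2D(1, 1, 1, 1) expands to the two loop headers below;
   ! END_2D closes both loops.
   icount = 0
   DO jj = ntsj-(1), ntej+(1)
      DO ji = ntsi-(1), ntei+(1)
         a2d(ji,jj) = 1.0
         icount = icount + 1
      END DO
   END DO

   ! The loop visits the tile plus one extra point on each side
   PRINT *, 'points visited:', icount
END PROGRAM do_2d_sketch
```

With these values the loop runs over i = 1..6 and j = 1..5, so only that part of the full-domain array is modified. Changing the four tile bounds moves the window without touching the loop body.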
    4846 
     
    5351  {{{ 
    5452  #!fortran 
    55   ntsi_a(0) = 1 + nn_hls 
    56   ntsj_a(0) = 1 + nn_hls 
    57   ntei_a(0) = jpi - nn_hls 
    58   ntej_a(0) = jpj - nn_hls 
    59   }}} 
    60  
    61   `dom_tile` is called whenever the active tile needs to be set or if tiling needs to be suppressed: 
     53  ntsi_a(0) = Nis0 
     54  ntsj_a(0) = Njs0 
     55  ntei_a(0) = Nie0 
     56  ntej_a(0) = Nje0 
     57  }}} 
     58 
     59  `dom_tile` is called whenever the active tile needs to be set or if tiling needs to be disabled: 
    6260 
    6361  {{{ 
     
    6967'''2. Declaring SUBROUTINE-level arrays using the tile bounds''' 
    7068 
    71   A new substitution macro in `do_loop_substitute.h90`: 
    72  
    73   {{{ 
    74   #define A2D        __kIsm1_:__kIep1_,__kJsm1_:__kJep1_ 
    75   }}} 
    76  
    77   is used such that: 
     69  A new set of substitution macros in `do_loop_substitute.h90`: 
     70 
     71  {{{ 
     72  #define ST_1Di(H) ntsi-(H):ntei+(H) 
     73  #define ST_1Dj(H) ntsj-(H):ntej+(H) 
     74  #define ST_2D(H) ST_1Di(H),ST_1Dj(H) 
     75  }}} 
     76 
     77  replaces references to the full domain in explicit shape and allocatable array declarations: 
    7878 
    7979  {{{ 
    8080  #!diff 
    81   - ALLOCATE(jpi,jpj)   DIMENSION(jpi,jpj) 
    82   + ALLOCATE(A2D)       DIMENSION(A2D) 
    83   }}} 
    84  
    85   and therefore operations between local working arrays (which have the dimensions of the tile) and global/input arrays (which have the dimensions of either the tile or full domain) require no further changes, unless using `:` subscripts as described below. 
    86  
    87 '''3. Replacing `:` subscripts with a DO loop macro where appropriate''' 
    88  
    89   This is only necessary when step 2 would introduce conformance issues: 
    90  
    91   {{{ 
    92   #!diff 
    93   - REAL(wp), DIMENSION(jpi,jpj,jpk) :: a3d 
    94   - REAL(wp), DIMENSION(jpi,jpj)     :: z2d 
    95   - z2d(:,:) = a3d(:,:,1) 
    96   + REAL(wp), DIMENSION(jpi,jpj,jpk) :: a3d 
    97   + REAL(wp), DIMENSION(A2D)         :: z2d 
    98   + DO_2D_11_11 
    99   +    z2d(ji,jj) = a3d(ji,jj,1) 
    100   + END_2D 
    101   }}} 
    102  
    103 '''4. Looping over tiles at the timestepping level''' 
     81  - ALLOCATE(jpi,jpj      ) DIMENSION(jpi,jpj      ) 
     82  + ALLOCATE(ST_2D(nn_hls)) DIMENSION(ST_2D(nn_hls)) 
     83  }}} 
     84 
     85  These arrays then have the same dimensions as the tile if tiling is used; otherwise they have the dimensions of the full domain, as before. Furthermore, the tile-sized arrays are declared with lower and upper bounds corresponding to the position of the tile in the full domain. Horizontal indices, for example in DO loops, will therefore apply to both tile- and full-sized arrays: 
     86 
     87  {{{ 
     88  #!fortran 
     89  ! ntsi = 3, ntsj = 7, ntei = 5, ntej = 9 
     90  REAL(wp), DIMENSION(ntsi:ntei,ntsj:ntej) :: z2d 
     91  REAL(wp), DIMENSION(jpi,jpj) :: a2d 
     92 
     93  DO_2D(0,0,0,0) 
     94    z2d(ji,jj) = a2d(ji,jj) 
     95  END_2D 
     96  }}} 
     97 
     98  This substitution is made for local working arrays where possible to minimise memory consumption when using tiling.  
     99  No further changes are generally required, except in specific cases described in [https://forge.ipsl.jussieu.fr/nemo/attachment/wiki/2020WP/HPC-02_Daley_Tiling/Tiling_code_issues.pdf this document] and other common cases described in steps 5 & 6 below. 
     100   
     101'''3. Looping over tiles at the timestepping level''' 
    104102 
    105103  A loop over tiles has been added to `stp`. The domain indices for the current tile (`ntile /= 0`) are set at the start of each iteration. After exiting the loop (and before, during initialisation) the tiling is suppressed (`ntile == 0`): 
     
    110108  DO jtile = 1, nijtile 
    111109     IF( ln_tile ) CALL dom_tile( ntsi, ntsj, ntei, ntej, ktile=jtile ) 
     110 
    112111     CALL tra_ldf( kstp, Nbb, Nnn, ts, Nrhs )  ! lateral mixing 
    113112  END DO 
     113 
    114114  IF( ln_tile ) CALL dom_tile( ntsi, ntsj, ntei, ntej, ktile=0 )        ! Revert to full domain 
    115115  }}} 
    116116 
    117   DO loops within the tiling loop therefore work on the current tile, while those outside the loop work on the full domain. 
    118  
    119 '''5. A new namelist (`namtile`)''' 
     117  DO loops within the tiling loop therefore work on the current tile, while those outside the tiling loop work on the full domain. 
     118 
     119'''4. A new namelist (`namtile`)''' 
    120120 
    121121{{{ 
     
    131131  The number of tiles is calculated from the tile lengths, `nn_ltile_i` and `nn_ltile_j`, with respect to the full domain. 
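
As a rough illustration of how tile bounds could follow from the tile lengths, the sketch below computes the number of tiles and the start and end indices of each one. The formulas are an assumption about one plausible decomposition, not a copy of `dom_tile`; the interior bounds and tile lengths are hypothetical example values.

```fortran
PROGRAM tile_bounds_sketch
   ! One plausible way to derive tile bounds from the tile lengths.
   ! Domain bounds and tile lengths are hypothetical example values.
   IMPLICIT NONE
   INTEGER, PARAMETER :: Nis0 = 2, Nie0 = 11    ! interior i-bounds (assumed)
   INTEGER, PARAMETER :: Njs0 = 2, Nje0 = 7     ! interior j-bounds (assumed)
   INTEGER, PARAMETER :: nn_ltile_i = 4, nn_ltile_j = 3
   INTEGER :: iitile, ijtile, nijtile, jt
   INTEGER, ALLOCATABLE :: ntsi_a(:), ntei_a(:), ntsj_a(:), ntej_a(:)

   ! Number of tiles in each direction, rounding up for a partial last tile
   iitile  = ( Nie0 - Nis0 + nn_ltile_i ) / nn_ltile_i
   ijtile  = ( Nje0 - Njs0 + nn_ltile_j ) / nn_ltile_j
   nijtile = iitile * ijtile

   ALLOCATE( ntsi_a(0:nijtile), ntei_a(0:nijtile) )
   ALLOCATE( ntsj_a(0:nijtile), ntej_a(0:nijtile) )

   ! Index 0 holds the full (untiled) domain
   ntsi_a(0) = Nis0 ; ntei_a(0) = Nie0
   ntsj_a(0) = Njs0 ; ntej_a(0) = Nje0

   DO jt = 1, nijtile
      ! Tiles are numbered along i first, then along j
      ntsi_a(jt) = Nis0 + nn_ltile_i * MOD( jt-1, iitile )
      ntsj_a(jt) = Njs0 + nn_ltile_j * ( (jt-1) / iitile )
      ! Clip the last tile in each direction to the domain edge
      ntei_a(jt) = MIN( ntsi_a(jt) + nn_ltile_i - 1, Nie0 )
      ntej_a(jt) = MIN( ntsj_a(jt) + nn_ltile_j - 1, Nje0 )
      PRINT '(A,I2,4(A,I3))', 'tile ', jt, ': i =', ntsi_a(jt), ' to', ntei_a(jt), &
         &                    ', j =', ntsj_a(jt), ' to', ntej_a(jt)
   END DO
END PROGRAM tile_bounds_sketch
```

Here the interior is 10 x 6 points, so a 4 x 3 tile length gives 3 x 2 = 6 tiles, with the third tile in each row clipped to two points in i.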
    132132 
     133'''5. Replacing `:` subscripts with a DO loop macro where appropriate''' 
     134 
     135  This is only necessary when step 2 would introduce conformance issues: 
     136 
     137  {{{ 
     138  #!diff 
     139  - REAL(wp), DIMENSION(jpi,jpj,jpk)   :: a3d 
     140  - REAL(wp), DIMENSION(jpi,jpj)       :: z2d 
     141  - z2d(:,:) = a3d(:,:,1) 
     142  + REAL(wp), DIMENSION(jpi,jpj,jpk)   :: a3d 
     143  + REAL(wp), DIMENSION(ST_2D(nn_hls)) :: z2d 
     144  + DO_2D(1,1,1,1) 
     145  +    z2d(ji,jj) = a3d(ji,jj,1) 
     146  + END_2D 
     147  }}} 
     148 
     149'''6. Suppressing code that should not be called more than once per timestep''' 
     150 
     151  Examples include ocean.output write statements and initialisation steps outside of an "_ini" routine. 
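
A minimal sketch of such a guard is given below. The condition `ntile == 0 .OR. ntile == 1` is an assumption made for illustration (run the block when tiling is off or on the first tile only); the tile count and the printed message are hypothetical.

```fortran
PROGRAM once_per_step_sketch
   ! Guarding once-per-timestep work inside a loop over tiles.
   ! The guard condition is an assumption for illustration.
   IMPLICIT NONE
   INTEGER, PARAMETER :: nijtile = 4            ! number of tiles (hypothetical)
   INTEGER :: ntile, jtile, nwrites, ncompute

   nwrites = 0 ; ncompute = 0
   DO jtile = 1, nijtile
      ntile = jtile                             ! stands in for the dom_tile call
      ! Executed once per timestep: only without tiling or on the first tile
      IF( ntile == 0 .OR. ntile == 1 ) THEN
         PRINT *, 'tra_ldf : lateral mixing'    ! e.g. an ocean.output message
         nwrites = nwrites + 1
      END IF
      ncompute = ncompute + 1                   ! tile-local work, done for every tile
   END DO
   ntile = 0                                    ! revert to the full domain

   PRINT *, 'messages written:', nwrites, ' tiles processed:', ncompute
END PROGRAM once_per_step_sketch
```

The message is written once while the tile-local work runs four times, which is the behaviour the guard is meant to preserve.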
     152 
    133153__Branch__ 
    134154 
    135 ''These branches contain a trial implementation of tiling in `tra_ldf_iso`; there is not yet a formal branch for the development.'' 
    136  
    137 [http://fcm3/projects/NEMO.xm/changeset?reponame=&new=12979%40NEMO%2Fbranches%2FUKMO%2Fdev_r12745_HPC-02_Daley_Tiling_trial_public&old=12740%40NEMO%2Ftrunk Implementation in trunk] 
    138  
    139 [http://fcm3/projects/NEMO.xm/changeset?reponame=&new=12979%40NEMO%2Fbranches%2FUKMO%2Fdev_r12866_HPC-02_Daley_Tiling_trial_extra_halo&old=12866%40NEMO%2Fbranches%2F2020%2Fdev_r12558_HPC-08_epico_Extra_Halo Implementation in extended haloes branch] 
     155[http://forge.ipsl.jussieu.fr/nemo/browser/NEMO/branches/2020/dev_r13383_HPC-02_Daley_Tiling] 
    140156 
    141157__New subroutines__ 
     
    148164* `OCE/DOM/dom_oce`- Declare tiling namelist and other tiling variables 
    149165* `OCE/DOM/domain`- Read `namtile` namelist (`dom_nam`), calculate tiling variables and do control print (`dom_tile`) 
    150 * `OCE/IOM/prtctl`- Add IF statement to prevent execution of `prt_ctl` by each tile 
    151 * `OCE/TRA/traldf`- Add IF statements to prevent execution of `trd_tra` by each tile 
    152 * `OCE/TRA/traldf_iso`- Add IF statements (as above), modify local arrays for tiling 
    153 * `OCE/do_loop_substitute`- Modify DO loop macros to use domain indices, add `A2D` macro 
     166* `OCE/DOM/domutl`- `is_tile` functions 
     167* `OCE/do_loop_substitute`- Modify DO loop macro to use domain indices, add CPP macros 
    154168* `OCE/par_oce`- Declare tiling variables 
    155169* `OCE/step`- Add tiling loop 
    156170* `OCE/step_oce`- Add USE statement for `dom_tile` in `step` 
    157 * `OCE/timing`- Add IF statements to prevent execution of `timing_start` and `timing_stop` by each tile 
     171* Various others 
    158172 
    159173__New variables (excluding local)__ 
     
    166180  * `ntile`- current tile number 
    167181  * `nijtile`- number of tiles 
    168 * Namelist 
     182* Namelist (`namtile`) 
    169183  * `ln_tile`- logical control on use of tiling 
    170184  * `nn_ltile_i`, `nn_ltile_j`- tile length 
    171185* Pre-processor macros 
    172   * `A2D`- substitution for ALLOCATE or DIMENSION arguments 
    173  
    174 __Notes__ 
    175  
    176 '''Issues with the tiling implementation''' 
    177  
    178 See the attached [https://forge.ipsl.jussieu.fr/nemo/attachment/wiki/2020WP/HPC-02_Daley_Tiling/Tiling_code_issues.pdf document]. 
    179  
    180 '''Extended haloes''' 
    181  
    182 The tiling trial has also been implemented in the [http://fcm3/projects/NEMO.xm/changeset?reponame=&new=12979%40NEMO%2Fbranches%2FUKMO%2Fdev_r12866_HPC-02_Daley_Tiling_trial_extra_halo&old=12866%40NEMO%2Fbranches%2F2020%2Fdev_r12558_HPC-08_epico_Extra_Halo extended haloes branch]. 
    183 There are few differences between this and the trunk implementation. 
     186  * `ST_*D`- substitutions for ALLOCATE or DIMENSION arguments 
     187  * `ST_*DT`- substitutions for ALLOCATE or DIMENSION arguments when the shape of the array is unknown 
     188* Functions 
     189  * `is_tile`- Returns 0 if the array has the dimensions of the full domain, else 1 
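
The behaviour described for `is_tile` can be sketched as a shape comparison against the full domain. The module name, the function name `is_tile_2d`, the single 2D variant and the domain size below are hypothetical; the actual functions in `OCE/DOM/domutl` may differ.

```fortran
MODULE is_tile_sketch
   ! Illustrative shape test; names and sizes are hypothetical.
   IMPLICIT NONE
   INTEGER, PARAMETER :: jpi = 10, jpj = 8      ! full-domain size (assumed)
CONTAINS
   INTEGER FUNCTION is_tile_2d( pt )
      ! Returns 0 if pt has the dimensions of the full domain, else 1
      REAL, INTENT(in) :: pt(:,:)
      IF( SIZE(pt,1) == jpi .AND. SIZE(pt,2) == jpj ) THEN
         is_tile_2d = 0
      ELSE
         is_tile_2d = 1
      END IF
   END FUNCTION is_tile_2d
END MODULE is_tile_sketch

PROGRAM test_is_tile
   USE is_tile_sketch
   IMPLICIT NONE
   REAL :: zfull(jpi,jpj)                       ! full-domain array
   REAL :: ztile(3:5,7:8)                       ! tile-sized array with global bounds

   zfull(:,:) = 0.0 ; ztile(:,:) = 0.0
   PRINT *, 'full-domain array:', is_tile_2d( zfull )
   PRINT *, 'tile-sized array :', is_tile_2d( ztile )
END PROGRAM test_is_tile
```

A result of this kind lets a routine that receives an array of unknown shape choose between full-domain and tile bounds when indexing it.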
    184190 
    185191=== Documentation updates