2020WP/HPC-02_Daley_Tiling (diff) – NEMO

Changes between Version 6 and Version 7 of 2020WP/HPC-02_Daley_Tiling


Timestamp:
2020-06-18T15:50:18+02:00 (4 years ago)
Author:
hadcv

  • 2020WP/HPC-02_Daley_Tiling

    v6 v7  
    2626=== Implementation 
    2727 
    28 A trial of horizontal tiles has been implemented in `tra_ldf_iso` and the main code changes are described.  
    29  
    30 This has been tested using GYRE with 1 CPU, without XIOS. 10 day simulations using different tile decompositions (including no tiling) have been bit compared against the trunk. 
     28The current approach to tiling is described below. More detailed notes, including a number of issues that prevent the full tiling of code, are described in [??? this document]. 
     29 
     30Several modules have been tiled as of 18/06/20: `tra_ldf`, `tra_zdf`, `tra_adv` and `dia_ptr`. 
     31 
     32The tiling implementation has been tested using GYRE with 1 CPU. The tests comprise 10 day simulations using different tile decompositions (including no tiling) and different science options particular to the tiled modules. A test passes if the tiling does not change results at the bit level (`run.stat`) or in the diagnostics. 
    3133 
    3234__Summary of method__ 
    3335 
    34 The full processor domain (dimensions `jpi` x `jpj`) is split into one or more subdomains (tiles).  
     36The full processor domain (dimensions `jpi` x `jpj`) is split into one or more tiles/subdomains.  
    3537This is implemented by:  
    3638 
    3739'''1. Modifying the DO loop macros in `do_loop_substitute.h90` to use the tile bounds''' 
    3840 
    39   The tile domain is defined by a new set of domain indices (`ntsi`, `ntei`, `ntsj`, `ntej`), which are arrays with lengths equal to the number of tiles (`nijtile`) plus one and represent the internal part of the domain. The tile number (`ntile`) is used to obtain the indices for the current tile: 
     41  The tile domain is defined by a new set of domain indices (`ntsi`, `ntei`, `ntsj`, `ntej`), which represent the internal part of the domain: 
    4042 
    4143  {{{ 
    4244  #!diff 
    4345  - #define __kIs_     2 
    44   + #define __kIs_     ntsi(ntile) 
    45   }}} 
    46  
    47   A new subroutine `dom_tile` (in `domain.F90`) sets the values of these indices during initialisation. 
    48   The zero index is used to store the indices for the full domain: 
     46  + #define __kIs_     ntsi 
     47  }}} 
     48 
     49  A new subroutine `dom_tile` (in `domain.F90`) sets the values of these indices.  
     50 
     51  During initialisation, this subroutine calculates and stores the indices in global arrays (`ntsi_a`, `ntei_a`, `ntsj_a`, `ntej_a`) with lengths equal to the number of tiles (`nijtile`) plus one. The zero index is used to store the indices for the full domain: 
    4952 
    5053  {{{ 
    5154  #!fortran 
    52   ntsi(0) = 1 + nn_hls 
    53   ntsj(0) = 1 + nn_hls 
    54   ntei(0) = jpi - nn_hls 
    55   ntej(0) = jpj - nn_hls 
     55  ntsi_a(0) = 1 + nn_hls 
     56  ntsj_a(0) = 1 + nn_hls 
     57  ntei_a(0) = jpi - nn_hls 
     58  ntej_a(0) = jpj - nn_hls 
     59  }}} 
     60 
     61  `dom_tile` is called whenever the active tile needs to be set or if tiling needs to be suppressed: 
     62 
     63  {{{ 
     64  #!fortran 
     65  CALL dom_tile( ntsi, ntsj, ntei, ntej, ktile=3 ) ! Work on tile 3 
     66  CALL dom_tile( ntsi, ntsj, ntei, ntej, ktile=0 ) ! Work on the full domain 
    5667  }}} 
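
  A minimal sketch of how the per-tile indices might be computed and then selected, for illustration only. The tile sizes `ni_tile` and `nj_tile`, the routine `tile_init`, the dummy argument names `ktsi` to `ktej` and the domain sizes are hypothetical and not taken from the branch. The split into a separate initialisation routine and a setter is also an assumption, since the text describes a single `dom_tile` routine. Only the `_a` array names, the zero-index convention and the call signature follow the description above.

  {{{
  #!fortran
  MODULE tile_sketch
     IMPLICIT NONE
     INTEGER, PARAMETER :: jpi = 12, jpj = 10, nn_hls = 1   ! illustrative domain sizes only
     INTEGER, PARAMETER :: ni_tile = 5, nj_tile = 4          ! hypothetical tile sizes
     INTEGER :: nijtile                                      ! number of tiles
     INTEGER, ALLOCATABLE :: ntsi_a(:), ntsj_a(:), ntei_a(:), ntej_a(:)
  CONTAINS
     SUBROUTINE tile_init
        ! Hypothetical initialisation step: fill the index arrays for every tile
        INTEGER :: iitile, ijtile, ji, jj, jt
        iitile  = ( jpi - 2*nn_hls + ni_tile - 1 ) / ni_tile   ! tiles along i (ceiling division)
        ijtile  = ( jpj - 2*nn_hls + nj_tile - 1 ) / nj_tile   ! tiles along j
        nijtile = iitile * ijtile
        ALLOCATE( ntsi_a(0:nijtile), ntsj_a(0:nijtile), ntei_a(0:nijtile), ntej_a(0:nijtile) )
        ! Zero index: internal part of the full domain
        ntsi_a(0) = 1 + nn_hls
        ntsj_a(0) = 1 + nn_hls
        ntei_a(0) = jpi - nn_hls
        ntej_a(0) = jpj - nn_hls
        jt = 0
        DO jj = 1, ijtile
           DO ji = 1, iitile
              jt = jt + 1
              ntsi_a(jt) = ntsi_a(0) + ( ji - 1 ) * ni_tile
              ntsj_a(jt) = ntsj_a(0) + ( jj - 1 ) * nj_tile
              ntei_a(jt) = MIN( ntsi_a(jt) + ni_tile - 1, ntei_a(0) )   ! clip the last tile
              ntej_a(jt) = MIN( ntsj_a(jt) + nj_tile - 1, ntej_a(0) )
           END DO
        END DO
     END SUBROUTINE tile_init

     SUBROUTINE dom_tile( ktsi, ktsj, ktei, ktej, ktile )
        ! Copy the stored indices of tile ktile (0 = full domain) into the active indices
        INTEGER, INTENT(out) :: ktsi, ktsj, ktei, ktej
        INTEGER, INTENT(in ) :: ktile
        ktsi = ntsi_a(ktile)
        ktsj = ntsj_a(ktile)
        ktei = ntei_a(ktile)
        ktej = ntej_a(ktile)
     END SUBROUTINE dom_tile
  END MODULE tile_sketch

  PROGRAM tile_demo
     USE tile_sketch
     IMPLICIT NONE
     INTEGER :: ntsi, ntsj, ntei, ntej, jtile
     CALL tile_init
     DO jtile = 0, nijtile
        CALL dom_tile( ntsi, ntsj, ntei, ntej, ktile=jtile )
        PRINT '(a,i2,a,4i4)', ' tile ', jtile, ' : ', ntsi, ntei, ntsj, ntej
     END DO
  END PROGRAM tile_demo
  }}}

  Storing the full-domain indices at index 0 means that suppressing the tiling is the same operation as selecting a tile, so the DO loop bounds never need a separate code path.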
    5768 
     
    7687'''3. Replacing `:` subscripts with a DO loop macro where appropriate''' 
    7788 
    78   This is only necessary when step 2 would introduce an array shape inconsistency: 
     89  This is only necessary when step 2 would introduce conformance issues: 
    7990 
    8091  {{{ 
     
    92103'''4. Looping over tiles at the timestepping level''' 
    93104 
    94   The current tile number (`ntile`) is set within this loop in `stp`, then set to 0 after exiting the loop (and after initialisation, before the loop).  
     105  A loop over tiles has been added to `stp`. The domain indices for the current tile (`ntile /= 0`) are set at the start of each iteration. After exiting the loop (and before, during initialisation) the tiling is suppressed (`ntile == 0`): 
    95106 
    96107  {{{ 
     
    98109  ! Loop over tile domains 
    99110  DO jtile = 1, nijtile 
    100      IF( ln_tile ) ntile = jtile 
     111     IF( ln_tile ) CALL dom_tile( ntsi, ntsj, ntei, ntej, ktile=jtile ) 
    101112     CALL tra_ldf( kstp, Nbb, Nnn, ts, Nrhs )  ! lateral mixing 
    102113  END DO 
    103   IF( ln_tile ) ntile = 0                      ! Revert to tile over full domain 
    104   }}} 
    105  
    106   DO loops within the tiling loop therefore work on the current tile (`ntile /= 0`), while those outside the loop work on the full domain (`ntile == 0`). 
     114  IF( ln_tile ) CALL dom_tile( ntsi, ntsj, ntei, ntej, ktile=0 )        ! Revert to full domain 
     115  }}} 
     116 
     117  DO loops within the tiling loop therefore work on the current tile, while those outside the loop work on the full domain. 
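
  The property that the tiles together cover the internal domain exactly once can be checked with a small standalone program. This is a sketch under stated assumptions, not code from the branch: the tile sizes `ni_tile` and `nj_tile`, the helper `set_tile` (standing in for `dom_tile`) and the counter array `ivisit` are hypothetical.

  {{{
  #!fortran
  PROGRAM tile_loop_check
     IMPLICIT NONE
     INTEGER, PARAMETER :: jpi = 12, jpj = 10, nn_hls = 1   ! illustrative domain sizes only
     INTEGER, PARAMETER :: ni_tile = 5, nj_tile = 4          ! hypothetical tile sizes
     INTEGER :: ntsi, ntsj, ntei, ntej                       ! active domain indices
     INTEGER :: iitile, ijtile, nijtile, jtile, ji, jj
     INTEGER :: ivisit(jpi,jpj)                              ! visits per grid point
     LOGICAL :: ll_ok

     iitile  = ( jpi - 2*nn_hls + ni_tile - 1 ) / ni_tile   ! tiles along i (ceiling division)
     ijtile  = ( jpj - 2*nn_hls + nj_tile - 1 ) / nj_tile   ! tiles along j
     nijtile = iitile * ijtile

     ivisit(:,:) = 0
     DO jtile = 1, nijtile                 ! loop over tile domains, as in stp
        CALL set_tile( jtile )
        DO jj = ntsj, ntej                 ! stands in for a DO loop macro using the tile bounds
           DO ji = ntsi, ntei
              ivisit(ji,jj) = ivisit(ji,jj) + 1
           END DO
        END DO
     END DO
     CALL set_tile( 0 )                    ! revert to full domain

     ! Every internal point must be visited once, and no halo point at all
     ll_ok = ALL( ivisit(ntsi:ntei,ntsj:ntej) == 1 ) .AND.   &
        &    SUM( ivisit ) == ( ntei - ntsi + 1 ) * ( ntej - ntsj + 1 )
     IF( ll_ok ) THEN
        PRINT *, 'each internal point visited exactly once'
     ELSE
        PRINT *, 'tiles overlap or leave gaps'
     END IF

  CONTAINS

     SUBROUTINE set_tile( ktile )
        ! Hypothetical stand-in for dom_tile: set the active indices for tile ktile
        INTEGER, INTENT(in) :: ktile
        INTEGER :: ii, ij
        IF( ktile == 0 ) THEN              ! full domain
           ntsi = 1 + nn_hls
           ntsj = 1 + nn_hls
           ntei = jpi - nn_hls
           ntej = jpj - nn_hls
        ELSE                               ! tile ktile, numbered along i first
           ii   = MOD( ktile - 1, iitile )
           ij   = ( ktile - 1 ) / iitile
           ntsi = 1 + nn_hls + ii * ni_tile
           ntsj = 1 + nn_hls + ij * nj_tile
           ntei = MIN( ntsi + ni_tile - 1, jpi - nn_hls )
           ntej = MIN( ntsj + nj_tile - 1, jpj - nn_hls )
        END IF
     END SUBROUTINE set_tile
  END PROGRAM tile_loop_check
  }}}

  A check of this kind only confirms the decomposition itself. It does not replace the bit comparison of `run.stat` described under Implementation, which also exercises the science code inside each tiled routine.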
    107118 
    108119'''5. A new namelist (`namtile`)''' 
     
    122133__Branch__ 
    123134 
    124 [http://fcm3/projects/NEMO.xm/changeset?reponame=&new=12945%40NEMO%2Fbranches%2FUKMO%2Fdev_r12745_HPC-02_Daley_Tiling_trial_public&old=12740%40NEMO%2Ftrunk Implementation in trunk] 
    125  
    126 [http://fcm3/projects/NEMO.xm/changeset?reponame=&new=12942%40NEMO%2Fbranches%2FUKMO%2Fdev_r12866_HPC-02_Daley_Tiling_trial_extra_halo&old=12866%40NEMO%2Fbranches%2F2020%2Fdev_r12558_HPC-08_epico_Extra_Halo Implementation in extended haloes branch] 
     135''These branches contain a trial implementation of tiling in `tra_ldf_iso`; there is not yet a formal branch for the development.'' 
     136 
     137[http://fcm3/projects/NEMO.xm/changeset?reponame=&new=12979%40NEMO%2Fbranches%2FUKMO%2Fdev_r12745_HPC-02_Daley_Tiling_trial_public&old=12740%40NEMO%2Ftrunk Implementation in trunk] 
     138 
     139[http://fcm3/projects/NEMO.xm/changeset?reponame=&new=12979%40NEMO%2Fbranches%2FUKMO%2Fdev_r12866_HPC-02_Daley_Tiling_trial_extra_halo&old=12866%40NEMO%2Fbranches%2F2020%2Fdev_r12558_HPC-08_epico_Extra_Halo Implementation in extended haloes branch] 
    127140 
    128141__New subroutines__ 
    129142 
    130 * `OCE/DOM/domain/dom_tile`- Calculate tiling variables (domain indices, number of tiles) 
     143* `OCE/DOM/domain/dom_tile`- Calculate/set tiling variables (domain indices, number of tiles) 
    131144 
    132145__Modified modules__ 
    133146 
    134147* `cfgs/SHARED/namelist_ref`- Add `namtile` namelist 
    135 * `OCE/DOM/dom_oce`- Declare namelist variables 
    136 * `OCE/DOM/domain`- Read `namtile` namelist (`dom_nam`), calculate tiling variables (`dom_tile`) 
     148* `OCE/DOM/dom_oce`- Declare tiling namelist and other tiling variables 
     149* `OCE/DOM/domain`- Read `namtile` namelist (`dom_nam`), calculate tiling variables and do control print (`dom_tile`) 
    137150* `OCE/IOM/prtctl`- Add IF statement to prevent execution of `prt_ctl` by each tile 
    138151* `OCE/TRA/traldf`- Add IF statements to prevent execution of `trd_tra` by each tile 
     
    141154* `OCE/par_oce`- Declare tiling variables 
    142155* `OCE/step`- Add tiling loop 
     156* `OCE/step_oce`- Add USE statement for `dom_tile` in `step` 
    143157* `OCE/timing`- Add IF statements to prevent execution of `timing_start` and `timing_stop` by each tile 
    144158 
     
    148162  * `ntsi`, `ntsj`- start index of tile 
    149163  * `ntei`, `ntej`- end index of tile 
     164  * `ntsi_a`, `ntsj_a`- start indices of each tile 
     165  * `ntei_a`, `ntej_a`- end indices of each tile 
    150166  * `ntile`- current tile number 
    151167  * `nijtile`- number of tiles 
     
    158174__Notes__ 
    159175 
    160 '''Untiled code''' 
    161  
    162 Parts of the code that should only be executed by one tile (e.g. `numout` write statements) as well as code that has not yet been tiled (e.g. timing routines) have been enclosed in IF statements. 
    163 This code has been marked with `! TODO: TO BE TILED`. 
    164  
    165 I will add some notes on this code in the near future. 
     176'''Untiled code and other issues''' 
     177 
     178See the attached [??? document]. 
    166179 
    167180'''Extended haloes''' 
    168181 
    169 The tiling trial has also been implemented in the [http://fcm3/projects/NEMO.xm/changeset?reponame=&new=12942%40NEMO%2Fbranches%2FUKMO%2Fdev_r12866_HPC-02_Daley_Tiling_trial_extra_halo&old=12866%40NEMO%2Fbranches%2F2020%2Fdev_r12558_HPC-08_epico_Extra_Halo extended haloes branch]. 
     182The tiling trial has also been implemented in the [http://fcm3/projects/NEMO.xm/changeset?reponame=&new=12979%40NEMO%2Fbranches%2FUKMO%2Fdev_r12866_HPC-02_Daley_Tiling_trial_extra_halo&old=12866%40NEMO%2Fbranches%2F2020%2Fdev_r12558_HPC-08_epico_Extra_Halo extended haloes branch]. 
    170183There are few differences between this and the trunk implementation. 
    171184