2020WP/HPC-02_Daley_Tiling (diff) – NEMO

Changes between Version 5 and Version 6 of 2020WP/HPC-02_Daley_Tiling


Timestamp:
2020-05-19T13:46:47+02:00 (4 years ago)
Author:
hadcv
Comment:

--

  • 2020WP/HPC-02_Daley_Tiling

    v5 v6  
    2323 
    2424Implement loop tiling over horizontal dimensions (i and j). 
     25 
    2526=== Implementation 
    2627 
    27 A trial implementation of `tra_ldf_iso` is described in [https://forge.ipsl.jussieu.fr/nemo/attachment/wiki/2020WP/HPC-02_Daley_Tiling/tra_ldf_iso%20trial.pdf this document]. It will be revised as described [https://forge.ipsl.jussieu.fr/nemo/attachment/wiki/2020WP/HPC-02_Daley_Tiling/Tiling_call_notes_220420.pdf here]. 
    28  
    29 The main code changes in the preferred approach (using public variables) are described below. 
     28A trial of horizontal tiling has been implemented in `tra_ldf_iso`; the main code changes are described below. 
     29 
     30This has been tested using GYRE on 1 CPU, without XIOS. 10-day simulations using different tile decompositions (including no tiling) have been compared bit-for-bit against the trunk. 
    3031 
    3132__Summary of method__ 
    3233 
    33 The full processor domain (1:jpi, 1:jpj) is split into one or more subdomains (tiles). 
    34  
    35 To work on a tile, the DO loop macros in `do_loop_substitute` are modified to use a new set of domain indices. A new subroutine `DOM/domain/dom_tile` sets the values of these indices and is also used to initialise the tile to the full domain in `DOM/domain/dom_init`. 
    36  
    37 A loop over tiles is implemented at the timestepping level in `OCE/step/stp`. The domain indices for the tile subdomain are set within this loop by `dom_tile`, then 'unset' (set back to the full domain) after exiting the loop. All DO loops within the tiling loop therefore work on the current tile, instead of the full processor domain. 
    38  
    39 The number of tiles is determined by the tile lengths, `nn_tile_i` and `nn_tile_j` defined in a new namelist `namtile`, with respect to the full domain. 
    40  
    41 __Branch__ 
    42  
    43 [http://forge.ipsl.jussieu.fr/nemo/browser/NEMO/branches/UKMO/dev_r12745_HPC-02_Daley_Tiling_trial_public dev_r12745_HPC-02_Daley_Tiling_trial_public] 
    44  
    45 __New subroutines__ 
    46  
    47 * `dom_tile` - Set domain indices 
    48  
    49 __Modified modules__ 
    50  
    51 ''NOTE: the number of affected modules is expected to be much larger in the final implementation'' 
    52  
    53 * `cfgs/SHARED/namelist_ref` - Add namelist `namtile` 
    54 * `OCE/DOM/dom_oce` - Declare namelist variables 
    55 * `OCE/DOM/domain` - Read `namtile` namelist and calculate tiling decomposition, add `dom_tile`, initialise domain indices 
    56 * `OCE/TRA/traldf` - Changes to account for domain indices 
    57 * `OCE/TRA/traldf_iso` - Changes to account for domain indices 
    58 * `OCE/do_loop_substitute` - Implement domain indices 
    59 * `OCE/par_oce` - Declare domain indices and tiling decomposition parameters 
    60 * `OCE/step` - Add tiling loop and set domain indices using `dom_tile` 
    61 * `OCE/step_oce` - Import `dom_tile` 
    62  
    63 __Variables__ 
    64  
    65 * Global variables 
    66   * `ntsi`, `ntsj`- start index of tile 
    67   * `ntei`, `ntej`- end index of tile 
    68   * `ntsim1`, `ntsjm1`- start index of tile, minus 1 
    69   * `nteip1`, `ntejp1`- end index of tile, plus 1 
    70   * `ntile`- tile number 
    71 * Parameters 
    72   * `jpnitile`, `jpnjtile`, `jpnijtile`- number of tiles 
    73 * Loop indices 
    74   * `jtile`- loop over tiles 
    75 * Namelist 
    76   * `ln_tile`- Logical control on use of tiling 
    77   * `nn_tile_i`, `nn_tile_j`- tile length 
    78 * Pre-processor macros 
    79   * `IND_2D`- substitution for ALLOCATE or DIMENSION arguments 
    80 * Working variables 
    81   * `iitile`, `ijtile`- tile number 
    82 * Dummy arguments 
    83   * `kntile` (`ntile`) 
    84  
    85 __Namelist__ 
     34The full processor domain (dimensions `jpi` x `jpj`) is split into one or more subdomains (tiles).  
     35This is implemented by:  
     36 
     37'''1. Modifying the DO loop macros in `do_loop_substitute.h90` to use the tile bounds''' 
     38 
     39  The tile domain is defined by a new set of domain indices (`ntsi`, `ntei`, `ntsj`, `ntej`). Each is an array of length `nijtile` + 1 (the number of tiles plus one) and bounds the internal part of the domain. The tile number (`ntile`) is used to obtain the indices for the current tile: 
     40 
     41  {{{ 
     42  #!diff 
     43  - #define __kIs_     2 
     44  + #define __kIs_     ntsi(ntile) 
     45  }}} 
     46 
     47  A new subroutine `dom_tile` (in `domain.F90`) sets the values of these indices during initialisation. 
     48  The zero index is used to store the indices for the full domain: 
     49 
     50  {{{ 
     51  #!fortran 
     52  ntsi(0) = 1 + nn_hls 
     53  ntsj(0) = 1 + nn_hls 
     54  ntei(0) = jpi - nn_hls 
     55  ntej(0) = jpj - nn_hls 
     56  }}} 
     57 
     58'''2. Declaring SUBROUTINE-level arrays using the tile bounds''' 
     59 
     60  A new substitution macro in `do_loop_substitute.h90`: 
     61 
     62  {{{ 
     63  #define A2D        __kIsm1_:__kIep1_,__kJsm1_:__kJep1_ 
     64  }}} 
     65 
     66  is used such that: 
     67 
     68  {{{ 
     69  #!diff 
     70  - ALLOCATE(jpi,jpj)   DIMENSION(jpi,jpj) 
     71  + ALLOCATE(A2D)       DIMENSION(A2D) 
     72  }}} 
     73 
     74  and therefore operations between local working arrays (which have the dimensions of the tile) and global/input arrays (which have the dimensions of either the tile or full domain) require no further changes, unless using `:` subscripts as described below. 
     75 
     76'''3. Replacing `:` subscripts with a DO loop macro where appropriate''' 
     77 
     78  This is only necessary when step 2 would introduce an array shape inconsistency: 
     79 
     80  {{{ 
     81  #!diff 
     82  - REAL(wp), DIMENSION(jpi,jpj,jpk) :: a3d 
     83  - REAL(wp), DIMENSION(jpi,jpj)     :: z2d 
     84  - z2d(:,:) = a3d(:,:,1) 
     85  + REAL(wp), DIMENSION(jpi,jpj,jpk) :: a3d 
     86  + REAL(wp), DIMENSION(A2D)         :: z2d 
     87  + DO_2D_11_11 
     88  +    z2d(ji,jj) = a3d(ji,jj,1) 
     89  + END_2D 
     90  }}} 
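
The shape inconsistency that step 3 avoids can be reproduced in a standalone program. This is a minimal sketch, not NEMO code: the tile bounds `is`, `ie`, `js`, `je` are hypothetical constants standing in for `ntsi(ntile)` etc., the explicit bounds on `z2d` stand in for the `A2D` macro, and the nested loops stand in for `DO_2D_11_11` (assumed here to run from the tile start minus 1 to the tile end plus 1).

```fortran
program shape_sketch
   implicit none
   ! Illustrative sizes only (hypothetical, not taken from any NEMO configuration)
   integer, parameter :: jpi = 10, jpj = 8, jpk = 3
   integer, parameter :: is = 3, ie = 6, js = 2, je = 5   ! hypothetical tile bounds
   real    :: a3d(jpi,jpj,jpk)               ! full-domain array
   real    :: z2d(is-1:ie+1, js-1:je+1)      ! tile-sized array, as DIMENSION(A2D) would give
   integer :: ji, jj

   call random_number( a3d )

   ! z2d(:,:) = a3d(:,:,1) would not conform here: the shapes are (6,6) and (10,8).
   ! Looping over explicit indices copies only the points that belong to the tile.
   do jj = js-1, je+1
      do ji = is-1, ie+1
         z2d(ji,jj) = a3d(ji,jj,1)
      end do
   end do

   print *, 'shape(z2d)        =', shape( z2d )
   print *, 'shape(a3d(:,:,1)) =', shape( a3d(:,:,1) )
   print *, 'values match      =', all( z2d == a3d(is-1:ie+1, js-1:je+1, 1) )
end program shape_sketch
```

Because `z2d` keeps the same index values as the full-domain array, `z2d(ji,jj)` and `a3d(ji,jj,1)` refer to the same grid point inside the loop.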
     91 
     92'''4. Looping over tiles at the timestepping level''' 
     93 
     94  Inside the tiling loop in `stp`, `ntile` is set to the current tile number. Everywhere else it is 0: it is set to 0 during initialisation (before the loop) and reset to 0 after the loop exits. 
     95 
     96  {{{ 
     97  #!fortran 
     98  ! Loop over tile domains 
     99  DO jtile = 1, nijtile 
     100     IF( ln_tile ) ntile = jtile 
     101     CALL tra_ldf( kstp, Nbb, Nnn, ts, Nrhs )  ! lateral mixing 
     102  END DO 
     103  IF( ln_tile ) ntile = 0                      ! Revert to tile over full domain 
     104  }}} 
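
The effect of the tiling loop can be checked with a small standalone program. This is a sketch under stated assumptions rather than the branch code: `add_one` is a hypothetical stand-in for a tiled routine such as `tra_ldf`, the domain is split into two tiles along j by hand, and the index arrays are filled directly instead of by `dom_tile`.

```fortran
module tile_demo
   implicit none
   integer, parameter :: jpi = 12, jpj = 8, nn_hls = 1   ! illustrative sizes
   integer, parameter :: nijtile = 2                     ! two tiles, split in j
   integer :: ntsi(0:nijtile), ntei(0:nijtile), ntsj(0:nijtile), ntej(0:nijtile)
   integer :: ntile = 0                                  ! 0 selects the full domain
contains
   subroutine add_one( pfld )
      ! Hypothetical tiled routine: loops only over the tile selected by ntile
      real, intent(inout) :: pfld(jpi,jpj)
      integer :: ji, jj
      do jj = ntsj(ntile), ntej(ntile)
         do ji = ntsi(ntile), ntei(ntile)
            pfld(ji,jj) = pfld(ji,jj) + 1.0
         end do
      end do
   end subroutine add_one
end module tile_demo

program tiling_loop_sketch
   use tile_demo
   implicit none
   real    :: ztiled(jpi,jpj), zfull(jpi,jpj)
   integer :: jtile

   ! Index 0: full internal domain. Indices 1 and 2: lower and upper halves in j.
   ntsi(:) = 1 + nn_hls ;  ntei(:) = jpi - nn_hls
   ntsj(0) = 1 + nn_hls ;  ntej(0) = jpj - nn_hls
   ntsj(1) = 2 ;  ntej(1) = 4
   ntsj(2) = 5 ;  ntej(2) = 7

   ztiled(:,:) = 0.0 ;  zfull(:,:) = 0.0

   do jtile = 1, nijtile          ! work on one tile at a time
      ntile = jtile
      call add_one( ztiled )
   end do
   ntile = 0                      ! revert to the full domain
   call add_one( zfull )

   if ( all( ztiled == zfull ) ) then
      print *, 'tiled and untiled results match'
   else
      print *, 'tiled and untiled results differ'
   end if
end program tiling_loop_sketch
```

The called routine takes no tile argument. It picks up the current tile through the module variable `ntile`, which is why resetting `ntile` to 0 after the loop matters.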
     105 
     106  DO loops within the tiling loop therefore work on the current tile (`ntile /= 0`), while those outside the loop work on the full domain (`ntile == 0`). 
     107 
     108'''5. A new namelist (`namtile`)''' 
    86109 
    87110{{{ 
     
    90113   !----------------------------------------------------------------------- 
    91114      ln_tile = .false.     !  Use tiling (T) or not (F) 
    92       nn_tile_i = 10        !  Length of tiles in i 
    93       nn_tile_j = 10        !  Length of tiles in j 
     115      nn_ltile_i = 10       !  Length of tiles in i 
     116      nn_ltile_j = 10       !  Length of tiles in j 
    94117   / 
    95118}}} 
     119 
     120  The number of tiles is calculated from the tile lengths, `nn_ltile_i` and `nn_ltile_j`, with respect to the full domain. 
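
One way this calculation could look is sketched below. It is an illustration, not the contents of `dom_tile`: the helper names `nitile` and `njtile`, the ordering of tiles (i varying fastest), the truncation of the last tile at the domain edge, and the domain sizes are all assumptions.

```fortran
program tile_decomposition_sketch
   implicit none
   ! Illustrative values only; in NEMO these come from the configuration and namtile
   integer, parameter :: jpi = 32, jpj = 22, nn_hls = 1
   integer, parameter :: nn_ltile_i = 10, nn_ltile_j = 10
   integer :: nitile, njtile, nijtile     ! nitile and njtile are hypothetical helpers
   integer :: jti, jtj, jtile
   integer, allocatable :: ntsi(:), ntei(:), ntsj(:), ntej(:)

   ! Tiles per direction: internal length divided by tile length, rounded up
   nitile  = ( jpi - 2*nn_hls + nn_ltile_i - 1 ) / nn_ltile_i
   njtile  = ( jpj - 2*nn_hls + nn_ltile_j - 1 ) / nn_ltile_j
   nijtile = nitile * njtile

   allocate( ntsi(0:nijtile), ntei(0:nijtile), ntsj(0:nijtile), ntej(0:nijtile) )

   ! Index 0 stores the full (internal) domain
   ntsi(0) = 1 + nn_hls ;  ntei(0) = jpi - nn_hls
   ntsj(0) = 1 + nn_hls ;  ntej(0) = jpj - nn_hls

   ! Indices 1..nijtile store the individual tiles (assumed order: i fastest)
   do jtj = 1, njtile
      do jti = 1, nitile
         jtile = ( jtj - 1 ) * nitile + jti
         ntsi(jtile) = ntsi(0) + ( jti - 1 ) * nn_ltile_i
         ntsj(jtile) = ntsj(0) + ( jtj - 1 ) * nn_ltile_j
         ntei(jtile) = min( ntsi(jtile) + nn_ltile_i - 1, ntei(0) )   ! clip last tile
         ntej(jtile) = min( ntsj(jtile) + nn_ltile_j - 1, ntej(0) )
      end do
   end do

   print '(a,i0)', 'number of tiles: ', nijtile
   do jtile = 0, nijtile
      print '(a,i3,4(a,i3))', 'tile', jtile, '  i:', ntsi(jtile), ' to', ntei(jtile), &
         &                    '  j:', ntsj(jtile), ' to', ntej(jtile)
   end do
end program tile_decomposition_sketch
```

With these illustrative sizes the internal domain is 30 x 20 points, so tile lengths of 10 give 3 x 2 = 6 tiles.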
     121 
     122__Branch__ 
     123 
     124[http://fcm3/projects/NEMO.xm/changeset?reponame=&new=12945%40NEMO%2Fbranches%2FUKMO%2Fdev_r12745_HPC-02_Daley_Tiling_trial_public&old=12740%40NEMO%2Ftrunk Implementation in trunk] 
     125 
     126[http://fcm3/projects/NEMO.xm/changeset?reponame=&new=12942%40NEMO%2Fbranches%2FUKMO%2Fdev_r12866_HPC-02_Daley_Tiling_trial_extra_halo&old=12866%40NEMO%2Fbranches%2F2020%2Fdev_r12558_HPC-08_epico_Extra_Halo Implementation in extended haloes branch] 
     127 
     128__New subroutines__ 
     129 
     130* `OCE/DOM/domain/dom_tile`- Calculate tiling variables (domain indices, number of tiles) 
     131 
     132__Modified modules__ 
     133 
     134* `cfgs/SHARED/namelist_ref`- Add `namtile` namelist 
     135* `OCE/DOM/dom_oce`- Declare namelist variables 
     136* `OCE/DOM/domain`- Read `namtile` namelist (`dom_nam`), calculate tiling variables (`dom_tile`) 
     137* `OCE/IOM/prtctl`- Add IF statement to prevent execution of `prt_ctl` by each tile 
     138* `OCE/TRA/traldf`- Add IF statements to prevent execution of `trd_tra` by each tile 
     139* `OCE/TRA/traldf_iso`- Add IF statements (as above), modify local arrays for tiling 
     140* `OCE/do_loop_substitute`- Modify DO loop macros to use domain indices, add `A2D` macro 
     141* `OCE/par_oce`- Declare tiling variables 
     142* `OCE/step`- Add tiling loop 
     143* `OCE/timing`- Add IF statements to prevent execution of `timing_start` and `timing_stop` by each tile 
     144 
     145__New variables (excluding local)__ 
     146 
     147* Global variables 
     148  * `ntsi`, `ntsj`- start index of tile 
     149  * `ntei`, `ntej`- end index of tile 
     150  * `ntile`- current tile number 
     151  * `nijtile`- number of tiles 
     152* Namelist 
     153  * `ln_tile`- logical control on use of tiling 
     154  * `nn_ltile_i`, `nn_ltile_j`- tile length 
     155* Pre-processor macros 
     156  * `A2D`- substitution for ALLOCATE or DIMENSION arguments 
     157 
     158__Notes__ 
     159 
     160'''Untiled code''' 
     161 
     162Parts of the code that should only be executed by one tile (e.g. `numout` write statements) as well as code that has not yet been tiled (e.g. timing routines) have been enclosed in IF statements. 
     163This code has been marked with `! TODO: TO BE TILED`. 
     164 
     165I will add some notes on this code in the near future. 
     166 
     167'''Extended haloes''' 
     168 
     169The tiling trial has also been implemented in the [http://fcm3/projects/NEMO.xm/changeset?reponame=&new=12942%40NEMO%2Fbranches%2FUKMO%2Fdev_r12866_HPC-02_Daley_Tiling_trial_extra_halo&old=12866%40NEMO%2Fbranches%2F2020%2Fdev_r12558_HPC-08_epico_Extra_Halo extended haloes branch]. 
     170There are few differences between this and the trunk implementation. 
     171 
    96172=== Documentation updates 
    97173