Changes between Version 6 and Version 7 of 2020WP/HPC-02_Daley_Tiling
Timestamp: 2020-06-18T15:50:18+02:00
2020WP/HPC-02_Daley_Tiling
  === Implementation

- A trial of horizontal tiles has been implemented in `tra_ldf_iso` and the main code changes are described.
-
- This has been tested using GYRE with 1 CPU, without XIOS. 10 day simulations using different tile decompositions (including no tiling) have been bit compared against the trunk.
+ The current approach to tiling is described below. More detailed notes, including a number of issues that prevent the full tiling of code are described in [??? this document].
+
+ Several modules have been tiled as of 18/06/20: `tra_ldf`, `tra_zdf`, `tra_adv` and `dia_ptr`.
+
+ The tiling implementation has been tested using GYRE with 1 CPU. The tests comprise 10 day simulations using different tile decompositions (including no tiling) and different science options particular to the tiled modules. A test passes if the tiling does not change results at the bit level (`run.stat`) or in the diagnostics.

  __Summary of method__

- The full processor domain (dimensions `jpi` x `jpj`) is split into one or more subdomains (tiles).
+ The full processor domain (dimensions `jpi` x `jpj`) is split into one or more tiles/subdomains.
  This is implemented by:

  '''1. Modifying the DO loop macros in `do_loop_substitute.h90` to use the tile bounds'''

- The tile domain is defined by a new set of domain indices (`ntsi`, `ntei`, `ntsj`, `ntej`), which are arrays with lengths equal to the number of tiles (`nijtile`) plus one and represent the internal part of the domain. The tile number (`ntile`) is used to obtain the indices for the current tile:
+ The tile domain is defined by a new set of domain indices (`ntsi`, `ntei`, `ntsj`, `ntej`), which represent the internal part of the domain:

  {{{
  #!diff
  - #define __kIs_ 2
- + #define __kIs_ ntsi(ntile)
- }}}
-
- A new subroutine `dom_tile` (in `domain.F90`) sets the values of these indices during initialisation.
- The zero index is used to store the indices for the full domain:
+ + #define __kIs_ ntsi
+ }}}
+
+ A new subroutine `dom_tile` (in `domain.F90`) sets the values of these indices.
+
+ During initialisation, this subroutine calculates and stores the indices in global arrays (`ntsi_a`, `ntei_a`, `ntsj_a`, `ntej_a`) with lengths equal to the number of tiles (`nijtile`) plus one. The zero index is used to store the indices for the full domain:

  {{{
  #!fortran
- ntsi(0) = 1 + nn_hls
- ntsj(0) = 1 + nn_hls
- ntei(0) = jpi - nn_hls
- ntej(0) = jpj - nn_hls
+ ntsi_a(0) = 1 + nn_hls
+ ntsj_a(0) = 1 + nn_hls
+ ntei_a(0) = jpi - nn_hls
+ ntej_a(0) = jpj - nn_hls
+ }}}
+
+ `dom_tile` is called whenever the active tile needs to be set or if tiling needs to be suppressed:
+
+ {{{
+ #!fortran
+ CALL dom_tile( ntsi, ntsj, ntei, ntej, ktile=3 ) ! Work on tile 3
+ CALL dom_tile( ntsi, ntsj, ntei, ntej, ktile=0 ) ! Work on the full domain
  }}}

…

  '''3. Replacing `:` subscripts with a DO loop macro where appropriate'''

- This is only necessary when step 2 would introduce an array shape inconsistency:
+ This is only necessary when step 2 would introduce conformance issues:

  {{{

…

  '''4. Looping over tiles at the timestepping level'''

- The current tile number (`ntile`) is set within this loop in `stp`, then set to 0 after exiting the loop (and after initialisation, before the loop).
+ A loop over tiles has been added to `stp`. The domain indices for the current tile (`ntile /= 0`) are set at the start of each iteration. After exiting the loop (and before, during initialisation) the tiling is suppressed (`ntile == 0`):

  {{{

…

  ! Loop over tile domains
  DO jtile = 1, nijtile
- IF( ln_tile ) ntile = jtile
+ IF( ln_tile ) CALL dom_tile( ntsi, ntsj, ntei, ntej, ktile=jtile )
  CALL tra_ldf( kstp, Nbb, Nnn, ts, Nrhs ) ! lateral mixing
  END DO
- IF( ln_tile ) ntile = 0 ! Revert to tile over full domain
- }}}
-
- DO loops within the tiling loop therefore work on the current tile (`ntile /= 0`), while those outside the loop work on the full domain (`ntile == 0`).
+ IF( ln_tile ) CALL dom_tile( ntsi, ntsj, ntei, ntej, ktile=0 ) ! Revert to full domain
+ }}}
+
+ DO loops within the tiling loop therefore work on the current tile, while those outside the loop work on the full domain.

  '''5. A new namelist (`namtile`)'''

…

  __Branch__

- [http://fcm3/projects/NEMO.xm/changeset?reponame=&new=12945%40NEMO%2Fbranches%2FUKMO%2Fdev_r12745_HPC-02_Daley_Tiling_trial_public&old=12740%40NEMO%2Ftrunk Implementation in trunk]
-
- [http://fcm3/projects/NEMO.xm/changeset?reponame=&new=12942%40NEMO%2Fbranches%2FUKMO%2Fdev_r12866_HPC-02_Daley_Tiling_trial_extra_halo&old=12866%40NEMO%2Fbranches%2F2020%2Fdev_r12558_HPC-08_epico_Extra_Halo Implementation in extended haloes branch]
+ ''These branches contain a trial implementation of tiling in `tra_ldf_iso`; there is not yet a formal branch for the development.''
+
+ [http://fcm3/projects/NEMO.xm/changeset?reponame=&new=12979%40NEMO%2Fbranches%2FUKMO%2Fdev_r12745_HPC-02_Daley_Tiling_trial_public&old=12740%40NEMO%2Ftrunk Implementation in trunk]
+
+ [http://fcm3/projects/NEMO.xm/changeset?reponame=&new=12979%40NEMO%2Fbranches%2FUKMO%2Fdev_r12866_HPC-02_Daley_Tiling_trial_extra_halo&old=12866%40NEMO%2Fbranches%2F2020%2Fdev_r12558_HPC-08_epico_Extra_Halo Implementation in extended haloes branch]

  __New subroutines__

- * `OCE/DOM/domain/dom_tile`- Calculate tiling variables (domain indices, number of tiles)
+ * `OCE/DOM/domain/dom_tile`- Calculate/set tiling variables (domain indices, number of tiles)

  __Modified modules__

  * `cfgs/SHARED/namelist_ref`- Add `namtile` namelist
- * `OCE/DOM/dom_oce`- Declare namelist variables
- * `OCE/DOM/domain`- Read `namtile` namelist (`dom_nam`), calculate tiling variables (`dom_tile`)
+ * `OCE/DOM/dom_oce`- Declare tiling namelist and other tiling variables
+ * `OCE/DOM/domain`- Read `namtile` namelist (`dom_nam`), calculate tiling variables and do control print (`dom_tile`)
  * `OCE/IOM/prtctl`- Add IF statement to prevent execution of `prt_ctl` by each tile
  * `OCE/TRA/traldf`- Add IF statements to prevent execution of `trd_tra` by each tile

…

  * `OCE/par_oce`- Declare tiling variables
  * `OCE/step`- Add tiling loop
+ * `OCE/step_oce`- Add USE statement for `dom_tile` in `step`
  * `OCE/timing`- Add IF statements to prevent execution of `timing_start` and `timing_stop` by each tile

…

  * `ntsi`, `ntsj`- start index of tile
  * `ntei`, `ntej`- end index of tile
+ * `ntsi_a`, `ntsj_a`- start indices of each tile
+ * `ntei_a`, `ntej_a`- end indices of each tile
  * `ntile`- current tile number
  * `nijtile`- number of tiles

…

  __Notes__

- '''Untiled code'''
-
- Parts of the code that should only be executed by one tile (e.g. `numout` write statements) as well as code that has not yet been tiled (e.g. timing routines) have been enclosed in IF statements.
- This code has been marked with `! TODO: TO BE TILED`.
-
- I will add some notes on this code in the near future.
+ '''Untiled code and other issues'''
+
+ See the attached [??? document].

  '''Extended haloes'''

- The tiling trial has also been implemented in the [http://fcm3/projects/NEMO.xm/changeset?reponame=&new=12942%40NEMO%2Fbranches%2FUKMO%2Fdev_r12866_HPC-02_Daley_Tiling_trial_extra_halo&old=12866%40NEMO%2Fbranches%2F2020%2Fdev_r12558_HPC-08_epico_Extra_Halo extended haloes branch].
+ The tiling trial has also been implemented in the [http://fcm3/projects/NEMO.xm/changeset?reponame=&new=12979%40NEMO%2Fbranches%2FUKMO%2Fdev_r12866_HPC-02_Daley_Tiling_trial_extra_halo&old=12866%40NEMO%2Fbranches%2F2020%2Fdev_r12558_HPC-08_epico_Extra_Halo extended haloes branch].
  There are few differences between this and the trunk implementation.
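As an illustration of the v7 scheme in steps 1 and 4 (per-tile bounds stored in `ntsi_a`, `ntsj_a`, `ntei_a`, `ntej_a`, with the active bounds copied into `ntsi`, `ntsj`, `ntei`, `ntej` for each tile), the following standalone program is a minimal sketch and not the NEMO code. The routine names `tile_init` and `tile_set`, the tile sizes `nti` and `ntj`, the row-by-row tile ordering and the small domain dimensions are all assumptions made for the example; in the page above both roles belong to `dom_tile`, and the real decomposition is controlled by `namtile`, which is not shown in this diff.

```fortran
MODULE tile_sketch
   IMPLICIT NONE
   ! Illustrative values only; in NEMO these come from the domain setup
   INTEGER, PARAMETER :: jpi = 12, jpj = 10, nn_hls = 1
   ! Hypothetical tile sizes (assumption, not taken from the page)
   INTEGER, PARAMETER :: nti = 5, ntj = 4
   INTEGER :: nijtile            ! number of tiles
   INTEGER :: ntile = 0          ! current tile number (0 = full domain)
   INTEGER, ALLOCATABLE, DIMENSION(:) :: ntsi_a, ntsj_a, ntei_a, ntej_a
CONTAINS
   SUBROUTINE tile_init
      ! Hypothetical helper: compute and store the bounds of every tile
      INTEGER :: iitile, ijtile, ji, jj, jt
      ! Number of tiles in each direction, rounding up
      iitile = ( jpi - 2*nn_hls + nti - 1 ) / nti
      ijtile = ( jpj - 2*nn_hls + ntj - 1 ) / ntj
      nijtile = iitile * ijtile
      ALLOCATE( ntsi_a(0:nijtile), ntsj_a(0:nijtile), ntei_a(0:nijtile), ntej_a(0:nijtile) )
      ! Index 0 holds the full internal domain, as in the page above
      ntsi_a(0) = 1 + nn_hls
      ntsj_a(0) = 1 + nn_hls
      ntei_a(0) = jpi - nn_hls
      ntej_a(0) = jpj - nn_hls
      ! Tiles are numbered row by row (an assumption of this sketch)
      jt = 0
      DO jj = 1, ijtile
         DO ji = 1, iitile
            jt = jt + 1
            ntsi_a(jt) = ntsi_a(0) + (ji-1)*nti
            ntsj_a(jt) = ntsj_a(0) + (jj-1)*ntj
            ntei_a(jt) = MIN( ntsi_a(jt) + nti - 1, ntei_a(0) )
            ntej_a(jt) = MIN( ntsj_a(jt) + ntj - 1, ntej_a(0) )
         END DO
      END DO
   END SUBROUTINE tile_init

   SUBROUTINE tile_set( ktsi, ktsj, ktei, ktej, ktile )
      ! Hypothetical helper: make tile ktile active (0 suppresses tiling)
      INTEGER, INTENT(out) :: ktsi, ktsj, ktei, ktej
      INTEGER, INTENT(in)  :: ktile
      ntile = ktile
      ktsi = ntsi_a(ktile)
      ktsj = ntsj_a(ktile)
      ktei = ntei_a(ktile)
      ktej = ntej_a(ktile)
   END SUBROUTINE tile_set
END MODULE tile_sketch

PROGRAM tile_demo
   USE tile_sketch
   IMPLICIT NONE
   INTEGER :: ntsi, ntsj, ntei, ntej      ! active bounds used by the loops
   INTEGER :: jtile, ji, jj
   INTEGER :: icount(jpi,jpj)             ! number of visits to each point

   CALL tile_init
   icount(:,:) = 0
   ! Loop over tile domains, as in stp
   DO jtile = 1, nijtile
      CALL tile_set( ntsi, ntsj, ntei, ntej, ktile=jtile )
      DO jj = ntsj, ntej
         DO ji = ntsi, ntei
            icount(ji,jj) = icount(ji,jj) + 1
         END DO
      END DO
   END DO
   CALL tile_set( ntsi, ntsj, ntei, ntej, ktile=0 )   ! Revert to full domain

   ! Every internal point must be visited exactly once and no halo point at all
   IF( ANY( icount(ntsi:ntei,ntsj:ntej) /= 1 ) ) ERROR STOP 'tiles do not cover the internal domain exactly once'
   IF( SUM( icount ) /= (ntei-ntsi+1)*(ntej-ntsj+1) ) ERROR STOP 'halo points were modified'
   PRINT *, 'tiles:', nijtile, ' internal points:', SUM( icount )
END PROGRAM tile_demo
```

The two checks at the end mirror, in a very reduced form, the idea behind the bit-comparison tests: looping over all tiles should touch exactly the same points as a single pass over the full domain.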