33 | | The full processor domain (1:jpi, 1:jpj) is split into one or more subdomains (tiles). |
34 | | |
35 | | To work on a tile, the DO loop macros in `do_loop_substitute` are modified to use a new set of domain indices. A new subroutine `DOM/domain/dom_tile` sets the values of these indices and is also used to initialise the tile to the full domain in `DOM/domain/dom_init`. |
36 | | |
37 | | A loop over tiles is implemented at the timestepping level in `OCE/step/stp`. The domain indices for the tile subdomain are set within this loop by `dom_tile`, then 'unset' (set back to the full domain) after exiting the loop. All DO loops within the tiling loop therefore work on the current tile, instead of the full processor domain. |
38 | | |
39 | | The number of tiles is determined by the tile lengths, `nn_tile_i` and `nn_tile_j`, defined in a new namelist `namtile`, with respect to the full domain. |
40 | | |
41 | | __Branch__ |
42 | | |
43 | | [http://forge.ipsl.jussieu.fr/nemo/browser/NEMO/branches/UKMO/dev_r12745_HPC-02_Daley_Tiling_trial_public dev_r12745_HPC-02_Daley_Tiling_trial_public] |
44 | | |
45 | | __New subroutines__ |
46 | | |
47 | | * `dom_tile` - Set domain indices |
48 | | |
49 | | __Modified modules__ |
50 | | |
51 | | ''NOTE: the number of affected modules is expected to be much larger in the final implementation'' |
52 | | |
53 | | * `cfgs/SHARED/namelist_ref` - Add namelist `namtile` |
54 | | * `OCE/DOM/dom_oce` - Declare namelist variables |
55 | | * `OCE/DOM/domain` - Read `namtile` namelist and calculate tiling decomposition, add `dom_tile`, initialise domain indices |
56 | | * `OCE/TRA/traldf` - Changes to account for domain indices |
57 | | * `OCE/TRA/traldf_iso` - Changes to account for domain indices |
58 | | * `OCE/do_loop_substitute` - Implement domain indices |
59 | | * `OCE/par_oce` - Declare domain indices and tiling decomposition parameters |
60 | | * `OCE/step` - Add tiling loop and set domain indices using `dom_tile` |
61 | | * `OCE/step_oce` - Import `dom_tile` |
62 | | |
63 | | __Variables__ |
64 | | |
65 | | * Global variables |
66 | | * `ntsi`, `ntsj`- start index of tile |
67 | | * `ntei`, `ntej`- end index of tile |
68 | | * `ntsim1`, `ntsjm1`- start index of tile, minus 1 |
69 | | * `nteip1`, `ntejp1`- end index of tile, plus 1 |
70 | | * `ntile`- tile number |
71 | | * Parameters |
72 | | * `jpnitile`, `jpnjtile`, `jpnijtile`- number of tiles |
73 | | * Loop indices |
74 | | * `jtile`- loop over tiles |
75 | | * Namelist |
76 | | * `ln_tile`- Logical control on use of tiling |
77 | | * `nn_tile_i`, `nn_tile_j`- tile length |
78 | | * Pre-processor macros |
79 | | * `IND_2D`- substitution for ALLOCATE or DIMENSION arguments |
80 | | * Working variables |
81 | | * `iitile`, `ijtile`- tile number |
82 | | * Dummy arguments |
83 | | * `kntile` (`ntile`) |
84 | | |
85 | | __Namelist__ |
| 34 | The full processor domain (dimensions `jpi` x `jpj`) is split into one or more subdomains (tiles). |
| 35 | This is implemented by: |
| 36 | |
| 37 | '''1. Modifying the DO loop macros in `do_loop_substitute.h90` to use the tile bounds''' |
| 38 | |
| 39 | The tile domain is defined by a new set of domain indices (`ntsi`, `ntei`, `ntsj`, `ntej`). These are arrays indexed from 0 to `nijtile` (the number of tiles), i.e. with `nijtile + 1` elements, and they give the bounds of the internal (non-halo) part of each tile. The tile number (`ntile`) selects the indices for the current tile: |
| 40 | |
| 41 | {{{ |
| 42 | #!diff |
| 43 | - #define __kIs_ 2 |
| 44 | + #define __kIs_ ntsi(ntile) |
| 45 | }}} |
| 46 | |
| 47 | A new subroutine `dom_tile` (in `domain.F90`) sets the values of these indices during initialisation. |
| 48 | The zero index is used to store the indices for the full domain: |
| 49 | |
| 50 | {{{ |
| 51 | #!fortran |
| 52 | ntsi(0) = 1 + nn_hls |
| 53 | ntsj(0) = 1 + nn_hls |
| 54 | ntei(0) = jpi - nn_hls |
| 55 | ntej(0) = jpj - nn_hls |
| 56 | }}} |
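The index calculation for the individual tiles can be sketched as below. This is a minimal, hypothetical module (`tile_sketch`, `tile_indices`), not the branch's `dom_tile`. It assumes that the number of tiles in each direction is the internal domain size divided by the tile length, rounded up, that tiles are numbered with the i direction varying fastest, and that the last tile in each direction is truncated at the edge of the internal domain. Index 0 is filled as above.

{{{
#!fortran
MODULE tile_sketch
   ! Hypothetical sketch of a dom_tile-like calculation (not the branch code)
   IMPLICIT NONE
   PRIVATE
   PUBLIC :: tile_indices

CONTAINS

   SUBROUTINE tile_indices( jpi, jpj, nn_hls, nn_ltile_i, nn_ltile_j, &
      &                     nijtile, ntsi, ntsj, ntei, ntej )
      INTEGER, INTENT(in)  :: jpi, jpj                 ! full processor domain size
      INTEGER, INTENT(in)  :: nn_hls                   ! halo width
      INTEGER, INTENT(in)  :: nn_ltile_i, nn_ltile_j   ! tile lengths (grid points)
      INTEGER, INTENT(out) :: nijtile                  ! number of tiles
      INTEGER, ALLOCATABLE, DIMENSION(:), INTENT(out) :: ntsi, ntsj, ntei, ntej
      INTEGER :: ini, inj, iitile, ijtile, jt

      ! Size of the internal (non-halo) part of the domain
      ini = jpi - 2 * nn_hls
      inj = jpj - 2 * nn_hls

      ! Assumed: tiles per direction rounded up, so a partial tile covers any remainder
      iitile  = ( ini + nn_ltile_i - 1 ) / nn_ltile_i
      ijtile  = ( inj + nn_ltile_j - 1 ) / nn_ltile_j
      nijtile = iitile * ijtile

      ALLOCATE( ntsi(0:nijtile), ntsj(0:nijtile), ntei(0:nijtile), ntej(0:nijtile) )

      ! Index 0 holds the full (internal) domain
      ntsi(0) = 1 + nn_hls
      ntsj(0) = 1 + nn_hls
      ntei(0) = jpi - nn_hls
      ntej(0) = jpj - nn_hls

      ! Tiles 1..nijtile, i direction varying fastest; edge tiles are truncated
      DO jt = 1, nijtile
         ntsi(jt) = ntsi(0) + nn_ltile_i * MOD( jt - 1, iitile )
         ntsj(jt) = ntsj(0) + nn_ltile_j * ( ( jt - 1 ) / iitile )
         ntei(jt) = MIN( ntsi(jt) + nn_ltile_i - 1, ntei(0) )
         ntej(jt) = MIN( ntsj(jt) + nn_ltile_j - 1, ntej(0) )
      END DO
   END SUBROUTINE tile_indices

END MODULE tile_sketch
}}}

For example, with `jpi = 12`, `jpj = 9`, `nn_hls = 1` and tile lengths of 4 and 3, the 10 x 7 internal domain is split into 3 x 3 = 9 tiles; tiles 3, 6 and 9 are only 2 points wide in i, and tiles 7 to 9 only 1 point wide in j.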
| 57 | |
| 58 | '''2. Declaring SUBROUTINE-level arrays using the tile bounds''' |
| 59 | |
| 60 | A new substitution macro in `do_loop_substitute.h90`: |
| 61 | |
| 62 | {{{ |
| 63 | #define A2D __kIsm1_:__kIep1_,__kJsm1_:__kJep1_ |
| 64 | }}} |
| 65 | |
| 66 | is used such that: |
| 67 | |
| 68 | {{{ |
| 69 | #!diff |
| 70 | - ALLOCATE(jpi,jpj) DIMENSION(jpi,jpj) |
| 71 | + ALLOCATE(A2D) DIMENSION(A2D) |
| 72 | }}} |
| 73 | |
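| 74 | Because these local working arrays keep the index origin of the full domain (only their extent is reduced, to the tile plus one point on each side), operations between them and global/input arrays (which have the dimensions of either the tile or the full domain) need no further changes when indexed with `ji` and `jj`. The exception is whole-array operations using `:` subscripts, as described below. |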
| 75 | |
| 76 | '''3. Replacing `:` subscripts with a DO loop macro where appropriate''' |
| 77 | |
| 78 | This is only necessary when step 2 would introduce an array shape inconsistency: |
| 79 | |
| 80 | {{{ |
| 81 | #!diff |
| 82 | - REAL(wp), DIMENSION(jpi,jpj,jpk) :: a3d |
| 83 | - REAL(wp), DIMENSION(jpi,jpj) :: z2d |
| 84 | - z2d(:,:) = a3d(:,:,1) |
| 85 | + REAL(wp), DIMENSION(jpi,jpj,jpk) :: a3d |
| 86 | + REAL(wp), DIMENSION(A2D) :: z2d |
| 87 | + DO_2D_11_11 |
| 88 | + z2d(ji,jj) = a3d(ji,jj,1) |
| 89 | + END_2D |
| 90 | }}} |
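Why the loop is needed can be seen in a small standalone sketch. It uses hypothetical names (`a2d_sketch`, `surface_copy`), assumes `wp` is double precision, and passes the tile bounds explicitly rather than through the `A2D` and `DO_2D_11_11` macros (assuming the latter loops over the same extended range). Because the local array is declared with the tile bounds extended by one point on each side, `z2d(ji,jj)` and `a3d(ji,jj,1)` refer to the same grid point; the whole-array assignment `z2d(:,:) = a3d(:,:,1)` would instead be non-conforming whenever the tile is smaller than the full domain.

{{{
#!fortran
MODULE a2d_sketch
   ! Hypothetical sketch: A2D-style array filled with an explicit tile loop
   IMPLICIT NONE
   PRIVATE
   INTEGER, PARAMETER, PUBLIC :: wp = SELECTED_REAL_KIND( 12, 307 )   ! assumed working precision
   PUBLIC :: surface_copy

CONTAINS

   SUBROUTINE surface_copy( a3d, kis, kie, kjs, kje, z2d )
      REAL(wp), DIMENSION(:,:,:), INTENT(in) :: a3d   ! full-domain array (jpi,jpj,jpk)
      INTEGER, INTENT(in) :: kis, kie, kjs, kje       ! internal bounds of the tile
      ! Equivalent of DIMENSION(A2D): tile bounds extended by one point on each side
      REAL(wp), DIMENSION(kis-1:kie+1,kjs-1:kje+1), INTENT(out) :: z2d
      INTEGER :: ji, jj

      ! Assumed equivalent of DO_2D_11_11 ... END_2D: global indices address both arrays
      DO jj = kjs - 1, kje + 1
         DO ji = kis - 1, kie + 1
            z2d(ji,jj) = a3d(ji,jj,1)
         END DO
      END DO
   END SUBROUTINE surface_copy

END MODULE a2d_sketch
}}}

For a tile spanning i = 6..9 and j = 2..4, the caller would hold `z2d` with bounds `5:10, 1:5`, and `z2d(5,1)` then holds the value of `a3d(5,1,1)`.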
| 91 | |
| 92 | '''4. Looping over tiles at the timestepping level''' |
| 93 | |
| 94 | The current tile number (`ntile`) is set to 0 after initialisation (before the loop), set to the loop index within the tiling loop in `stp` when `ln_tile` is true, and reset to 0 after exiting the loop: |
| 95 | |
| 96 | {{{ |
| 97 | #!fortran |
| 98 | ! Loop over tile domains |
| 99 | DO jtile = 1, nijtile |
| 100 | IF( ln_tile ) ntile = jtile |
| 101 | CALL tra_ldf( kstp, Nbb, Nnn, ts, Nrhs ) ! lateral mixing |
| 102 | END DO |
| 103 | IF( ln_tile ) ntile = 0 ! Revert to tile over full domain |
| 104 | }}} |
| 105 | |
| 106 | DO loops within the tiling loop therefore work on the current tile (`ntile /= 0`), while those outside the loop work on the full domain (`ntile == 0`). |
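The consequence for DO loops can be checked with a self-contained sketch. It is hypothetical: the tile indices are hard-coded for a 6 x 5 domain with one halo point (internal points i = 2..5, j = 2..4) split into 2 x 2 tiles, and `count_points` stands in for a tiled routine such as `tra_ldf`, with its loop bounds taken from the current tile as the modified DO loop macros do. When tiling is off, the sketch assumes a single pass over the full domain.

{{{
#!fortran
MODULE tiling_loop_sketch
   ! Hypothetical sketch of the tiling loop in stp (not the branch code)
   IMPLICIT NONE
   PRIVATE
   PUBLIC :: run_tiled_step

   INTEGER, PUBLIC :: ntile = 0   ! current tile number (0 = full domain)
   ! Hard-coded indices: 6 x 5 domain, one halo point, 2 x 2 tiles; index 0 = full domain
   INTEGER, PARAMETER, PUBLIC :: nijtile = 4
   INTEGER, PARAMETER, PUBLIC :: ntsi(0:nijtile) = [ 2, 2, 4, 2, 4 ]
   INTEGER, PARAMETER, PUBLIC :: ntei(0:nijtile) = [ 5, 3, 5, 3, 5 ]
   INTEGER, PARAMETER, PUBLIC :: ntsj(0:nijtile) = [ 2, 2, 2, 4, 4 ]
   INTEGER, PARAMETER, PUBLIC :: ntej(0:nijtile) = [ 4, 3, 3, 4, 4 ]

CONTAINS

   SUBROUTINE count_points( kcount )
      ! Stand-in for a tiled routine: loop bounds follow the current tile,
      ! as the modified DO loop macros do
      INTEGER, DIMENSION(:,:), INTENT(inout) :: kcount
      INTEGER :: ji, jj
      DO jj = ntsj(ntile), ntej(ntile)
         DO ji = ntsi(ntile), ntei(ntile)
            kcount(ji,jj) = kcount(ji,jj) + 1
         END DO
      END DO
   END SUBROUTINE count_points

   SUBROUTINE run_tiled_step( ln_tile, kcount )
      LOGICAL, INTENT(in) :: ln_tile
      INTEGER, DIMENSION(:,:), INTENT(inout) :: kcount
      INTEGER :: jtile, intile

      intile = 1                         ! assumption: one pass over the full domain when tiling is off
      IF( ln_tile ) intile = nijtile
      DO jtile = 1, intile
         IF( ln_tile ) ntile = jtile
         CALL count_points( kcount )
      END DO
      IF( ln_tile ) ntile = 0            ! revert to the full domain
   END SUBROUTINE run_tiled_step

END MODULE tiling_loop_sketch
}}}

After one step, every internal point has been visited exactly once, no halo point has been touched, and `ntile` is back to 0, whether or not tiling is switched on.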
| 107 | |
| 108 | '''5. A new namelist (`namtile`)''' |
| 119 | |
| 120 | The number of tiles is calculated from the tile lengths, `nn_ltile_i` and `nn_ltile_j`, with respect to the full domain. |
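Reading `namtile` and deriving the number of tiles could look like the following minimal sketch. The module and routine names (`namtile_sketch`, `read_namtile`, `count_tiles`) and the default values are hypothetical, the namelist is read from a character buffer rather than from the namelist files, and partial tiles are assumed to be rounded up.

{{{
#!fortran
MODULE namtile_sketch
   ! Hypothetical sketch: read namtile from a character buffer and count the tiles
   IMPLICIT NONE
   PRIVATE
   PUBLIC :: read_namtile, count_tiles

   LOGICAL, PUBLIC :: ln_tile    = .FALSE.   ! use tiling (assumed default)
   INTEGER, PUBLIC :: nn_ltile_i = 99999     ! tile length in i, grid points (assumed default)
   INTEGER, PUBLIC :: nn_ltile_j = 99999     ! tile length in j, grid points (assumed default)

   NAMELIST /namtile/ ln_tile, nn_ltile_i, nn_ltile_j

CONTAINS

   SUBROUTINE read_namtile( cdnml )
      CHARACTER(len=*), INTENT(in) :: cdnml   ! namelist text, e.g. a single "&namtile ... /" record
      INTEGER :: ios
      READ( cdnml, nml=namtile, iostat=ios )
      IF( ios /= 0 ) ERROR STOP 'read_namtile: error reading namtile'
   END SUBROUTINE read_namtile

   INTEGER FUNCTION count_tiles( kni, knj ) RESULT( knijtile )
      INTEGER, INTENT(in) :: kni, knj   ! size of the internal domain in i and j
      IF( ln_tile ) THEN
         ! Round up in each direction so that a partial tile covers any remainder
         knijtile = ( ( kni + nn_ltile_i - 1 ) / nn_ltile_i ) * ( ( knj + nn_ltile_j - 1 ) / nn_ltile_j )
      ELSE
         knijtile = 1   ! a single "tile" spanning the full domain
      END IF
   END FUNCTION count_tiles

END MODULE namtile_sketch
}}}

For example, reading `&namtile ln_tile = .true., nn_ltile_i = 4, nn_ltile_j = 3 /` and calling `count_tiles( 10, 7 )` gives 9 tiles (3 in each direction).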
| 121 | |
| 122 | __Branch__ |
| 123 | |
| 124 | [http://fcm3/projects/NEMO.xm/changeset?reponame=&new=12945%40NEMO%2Fbranches%2FUKMO%2Fdev_r12745_HPC-02_Daley_Tiling_trial_public&old=12740%40NEMO%2Ftrunk Implementation in trunk] |
| 125 | |
| 126 | [http://fcm3/projects/NEMO.xm/changeset?reponame=&new=12942%40NEMO%2Fbranches%2FUKMO%2Fdev_r12866_HPC-02_Daley_Tiling_trial_extra_halo&old=12866%40NEMO%2Fbranches%2F2020%2Fdev_r12558_HPC-08_epico_Extra_Halo Implementation in extended haloes branch] |
| 127 | |
| 128 | __New subroutines__ |
| 129 | |
| 130 | * `OCE/DOM/domain/dom_tile`- Calculate tiling variables (domain indices, number of tiles) |
| 131 | |
| 132 | __Modified modules__ |
| 133 | |
| 134 | * `cfgs/SHARED/namelist_ref`- Add `namtile` namelist |
| 135 | * `OCE/DOM/dom_oce`- Declare namelist variables |
| 136 | * `OCE/DOM/domain`- Read `namtile` namelist (`dom_nam`), calculate tiling variables (`dom_tile`) |
| 137 | * `OCE/IOM/prtctl`- Add IF statement to prevent execution of `prt_ctl` by each tile |
| 138 | * `OCE/TRA/traldf`- Add IF statements to prevent execution of `trd_tra` by each tile |
| 139 | * `OCE/TRA/traldf_iso`- Add IF statements (as above), modify local arrays for tiling |
| 140 | * `OCE/do_loop_substitute`- Modify DO loop macros to use domain indices, add `A2D` macro |
| 141 | * `OCE/par_oce`- Declare tiling variables |
| 142 | * `OCE/step`- Add tiling loop |
| 143 | * `OCE/timing`- Add IF statements to prevent execution of `timing_start` and `timing_stop` by each tile |
| 144 | |
| 145 | __New variables (excluding local)__ |
| 146 | |
| 147 | * Global variables |
| 148 | * `ntsi`, `ntsj`- start index of tile |
| 149 | * `ntei`, `ntej`- end index of tile |
| 150 | * `ntile`- current tile number |
| 151 | * `nijtile`- number of tiles |
| 152 | * Namelist |
| 153 | * `ln_tile`- logical control on use of tiling |
| 154 | * `nn_ltile_i`, `nn_ltile_j`- tile length |
| 155 | * Pre-processor macros |
| 156 | * `A2D`- substitution for ALLOCATE or DIMENSION arguments |
| 157 | |
| 158 | __Notes__ |
| 159 | |
| 160 | '''Untiled code''' |
| 161 | |
| 162 | Parts of the code that should only be executed by one tile (e.g. `numout` write statements) as well as code that has not yet been tiled (e.g. timing routines) have been enclosed in IF statements. |
| 163 | This code has been marked with `! TODO: TO BE TILED`. |
| 164 | |
| 165 | I will add some notes on this code in the near future. |
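The exact condition used in these IF statements is not given here. A minimal sketch, assuming the guarded code should run outside the tiling loop (`ntile == 0`) or on the first tile only, wraps the test in a hypothetical function `l_once_per_step`:

{{{
#!fortran
MODULE tile_guard_sketch
   ! Hypothetical sketch of the IF guard around untiled code (condition assumed)
   IMPLICIT NONE
   PRIVATE
   PUBLIC :: l_once_per_step

   INTEGER, PUBLIC :: ntile = 0   ! current tile number (0 = full domain)

CONTAINS

   LOGICAL FUNCTION l_once_per_step()
      ! True outside the tiling loop (ntile == 0) or on the first tile,
      ! so guarded code runs once rather than once per tile
      l_once_per_step = ( ntile == 0 .OR. ntile == 1 )
   END FUNCTION l_once_per_step

END MODULE tile_guard_sketch
}}}

A `numout` write inside the tiling loop would then be written as `IF( l_once_per_step() ) WRITE(numout,*) ...`, so it is executed on the first tile only.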
| 166 | |
| 167 | '''Extended haloes''' |
| 168 | |
| 169 | The tiling trial has also been implemented in the [http://fcm3/projects/NEMO.xm/changeset?reponame=&new=12942%40NEMO%2Fbranches%2FUKMO%2Fdev_r12866_HPC-02_Daley_Tiling_trial_extra_halo&old=12866%40NEMO%2Fbranches%2F2020%2Fdev_r12558_HPC-08_epico_Extra_Halo extended haloes branch]. |
| 170 | There are few differences between this and the trunk implementation. |
| 171 | |