28 | | The current approach to tiling is described below. The issues encountered to date are described in [https://forge.ipsl.jussieu.fr/nemo/attachment/wiki/2020WP/HPC-02_Daley_Tiling/Tiling_code_issues.pdf this document]. |
29 | | |
30 | | Several modules have been tiled as of 18/06/20: `tra_ldf`, `tra_zdf`, `tra_adv` and `dia_ptr`. |
31 | | |
32 | | The tiling implementation has been tested using GYRE with 1 CPU. The tests comprise 10-day simulations using different tile decompositions (including no tiling) and different science options specific to the tiled modules. A test passes if the tiling does not change results at the bit level (`run.stat`) or in the diagnostics. |
| 28 | As of 24/09/20, most of the code called by the "active tracers" part of the step subroutine (between `trc_stp` and `tra_atf`) has been tiled. Solutions and workarounds for the issues encountered to date are described in [https://forge.ipsl.jussieu.fr/nemo/attachment/wiki/2020WP/HPC-02_Daley_Tiling/Tiling_code_issues.pdf this document]. |
| 29 | |
| 30 | The tiling implementation has been tested using GYRE in benchmark mode with mono-processor and MPI configurations. The tests comprise 10-day simulations using different tile decompositions (including no tiling) and different science options specific to the tiled modules. A test passes if the tiling does not change results at the bit level (`run.stat`) or in the diagnostics. |
81 | | - ALLOCATE(jpi,jpj) DIMENSION(jpi,jpj) |
82 | | + ALLOCATE(A2D) DIMENSION(A2D) |
83 | | }}} |
84 | | |
85 | | and therefore operations between local working arrays (which have the dimensions of the tile) and global/input arrays (which have the dimensions of either the tile or full domain) require no further changes, unless using `:` subscripts as described below. |
86 | | |
87 | | '''3. Replacing `:` subscripts with a DO loop macro where appropriate''' |
88 | | |
89 | | This is only necessary when step 2 would introduce conformance issues: |
90 | | |
91 | | {{{ |
92 | | #!diff |
93 | | - REAL(wp), DIMENSION(jpi,jpj,jpk) :: a3d |
94 | | - REAL(wp), DIMENSION(jpi,jpj) :: z2d |
95 | | - z2d(:,:) = a3d(:,:,1)
96 | | + REAL(wp), DIMENSION(jpi,jpj,jpk) :: a3d |
97 | | + REAL(wp), DIMENSION(A2D) :: z2d |
98 | | + DO_2D_11_11 |
99 | | + z2d(ji,jj) = a3d(ji,jj,1) |
100 | | + END_2D |
101 | | }}} |
102 | | |
103 | | '''4. Looping over tiles at the timestepping level''' |
| 81 | - ALLOCATE(jpi,jpj ) DIMENSION(jpi,jpj ) |
| 82 | + ALLOCATE(ST_2D(nn_hls)) DIMENSION(ST_2D(nn_hls)) |
| 83 | }}} |
| 84 | |
| 85 | These arrays then have the same dimensions as the tile if tiling is used, otherwise they will have the same dimensions as the full domain as before. Furthermore, the tile-sized arrays are declared with lower and upper bounds corresponding to the position of the tile in the full domain. Horizontal indices, for example in DO loops, will therefore apply to both tile- and full-sized arrays: |
| 86 | |
| 87 | {{{ |
| 88 | #!fortran |
| 89 | ! ntsi = 3, ntsj = 7, ntei = 5, ntej = 9 |
| 90 | REAL(wp), DIMENSION(ntsi:ntei,ntsj:ntej) :: z2d |
| 91 | REAL(wp), DIMENSION(jpi,jpj) :: a2d |
| 92 | |
| 93 | DO_2D(1,1,1,1) |
| 94 | z2d(ji,jj) = a2d(ji,jj) |
| 95 | END_2D |
| 96 | }}} |
| 97 | |
| 98 | This substitution is made for local working arrays where possible to minimise memory consumption when using tiling. |
| 99 | No further changes are generally required, except in specific cases described in [https://forge.ipsl.jussieu.fr/nemo/attachment/wiki/2020WP/HPC-02_Daley_Tiling/Tiling_code_issues.pdf this document] and other common cases described in steps 5 & 6 below. |
| 100 | |
| 101 | '''3. Looping over tiles at the timestepping level''' |
| 133 | '''5. Replacing `:` subscripts with a DO loop macro where appropriate''' |
| 134 | |
| 135 | This is only necessary when step 2 would introduce conformance issues: |
| 136 | |
| 137 | {{{ |
| 138 | #!diff |
| 139 | - REAL(wp), DIMENSION(jpi,jpj,jpk) :: a3d |
| 140 | - REAL(wp), DIMENSION(jpi,jpj) :: z2d |
| 141 | - z2d(:,:) = a3d(:,:,1)
| 142 | + REAL(wp), DIMENSION(jpi,jpj,jpk) :: a3d |
| 143 | + REAL(wp), DIMENSION(ST_2D(nn_hls)) :: z2d |
| 144 | + DO_2D(1,1,1,1) |
| 145 | + z2d(ji,jj) = a3d(ji,jj,1) |
| 146 | + END_2D |
| 147 | }}} |
| 148 | |
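The conformance issue in step 5 can be reproduced outside NEMO. The following is a minimal stand-alone sketch, not NEMO code: the domain size, the tile bounds and the program name are assumptions chosen for illustration, and plain DO loops over the tile bounds stand in for the `DO_2D` macro (halo handling is ignored).

{{{
#!fortran
PROGRAM tile_conformance_sketch
   ! Hypothetical illustration only; sizes and bounds are assumed values.
   IMPLICIT NONE
   INTEGER, PARAMETER :: wp = KIND(1.0d0)
   INTEGER, PARAMETER :: jpi = 10, jpj = 12, jpk = 3   ! assumed full-domain size
   INTEGER, PARAMETER :: ntsi = 3, ntei = 5            ! assumed tile bounds in i
   INTEGER, PARAMETER :: ntsj = 7, ntej = 9            ! assumed tile bounds in j
   REAL(wp), DIMENSION(jpi,jpj,jpk)         :: a3d     ! full-domain array
   REAL(wp), DIMENSION(ntsi:ntei,ntsj:ntej) :: z2d     ! tile-sized working array
   INTEGER :: ji, jj, jk

   ! Fill the full-domain array with values that encode the global indices
   DO jk = 1, jpk
      DO jj = 1, jpj
         DO ji = 1, jpi
            a3d(ji,jj,jk) = REAL(ji + 100*jj + 10000*jk, wp)
         END DO
      END DO
   END DO

   ! z2d(:,:) = a3d(:,:,1) would not conform here: z2d is 3x3, a3d(:,:,1) is 10x12.
   ! An explicit loop with global indices works for both array sizes.
   DO jj = ntsj, ntej
      DO ji = ntsi, ntei
         z2d(ji,jj) = a3d(ji,jj,1)
      END DO
   END DO

   PRINT *, 'shape of z2d        :', SHAPE(z2d)
   PRINT *, 'tile matches domain :', ALL( z2d == a3d(ntsi:ntei,ntsj:ntej,1) )
END PROGRAM tile_conformance_sketch
}}}

Because `z2d` is declared with the tile's lower and upper bounds, the same `(ji,jj)` pair addresses the same grid point in both arrays, so the loop body needs no index offsets.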
| 149 | '''6. Suppressing code that should not be called more than once per timestep''' |
| 150 | |
| 151 | Examples include ocean.output write statements and initialisation steps outside of an "_ini" routine. |
| 152 | |
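One possible way to suppress such code is to guard it with a test on the current tile number, so that it runs only when tiling is inactive or on the first tile. The sketch below is a stand-alone illustration under stated assumptions: the names `ntile` and `nijtile`, and the convention that a tile number of 0 means "tiling not active", are assumed here and are not confirmed by this page.

{{{
#!fortran
PROGRAM once_per_step_sketch
   ! Hypothetical illustration only; names and conventions are assumptions.
   IMPLICIT NONE
   INTEGER, PARAMETER :: nijtile = 4   ! assumed number of tiles
   INTEGER :: ntile                    ! assumed: 0 = tiling inactive, else current tile
   INTEGER :: kt                       ! timestep counter
   INTEGER :: ncalls                   ! how many times the guarded section ran

   ncalls = 0
   DO kt = 1, 2                        ! two timesteps
      DO ntile = 1, nijtile            ! loop over tiles at the timestepping level
         CALL tiled_routine( kt, ntile, ncalls )
      END DO
   END DO
   PRINT *, 'guarded section executed', ncalls, 'times'

CONTAINS

   SUBROUTINE tiled_routine( kt, ktile, kcalls )
      INTEGER, INTENT(in   ) :: kt, ktile
      INTEGER, INTENT(inout) :: kcalls
      ! Run once per timestep: either no tiling, or only on the first tile
      IF( ktile == 0 .OR. ktile == 1 ) THEN
         WRITE(*,*) 'step', kt, ': once-per-timestep section'
         kcalls = kcalls + 1
      ENDIF
      ! ... work that must be repeated for every tile would go here ...
   END SUBROUTINE tiled_routine

END PROGRAM once_per_step_sketch
}}}

With two timesteps and four tiles the routine is called eight times, but the guarded section runs only twice, once per timestep.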
172 | | * `A2D` - substitution for ALLOCATE or DIMENSION arguments
173 | | |
174 | | __Notes__ |
175 | | |
176 | | '''Issues with the tiling implementation''' |
177 | | |
178 | | See the attached [https://forge.ipsl.jussieu.fr/nemo/attachment/wiki/2020WP/HPC-02_Daley_Tiling/Tiling_code_issues.pdf document]. |
179 | | |
180 | | '''Extended haloes''' |
181 | | |
182 | | The tiling trial has also been implemented in the [http://fcm3/projects/NEMO.xm/changeset?reponame=&new=12979%40NEMO%2Fbranches%2FUKMO%2Fdev_r12866_HPC-02_Daley_Tiling_trial_extra_halo&old=12866%40NEMO%2Fbranches%2F2020%2Fdev_r12558_HPC-08_epico_Extra_Halo extended haloes branch]. |
183 | | There are few differences between this and the trunk implementation. |
| 186 | * `ST_*D` - substitutions for ALLOCATE or DIMENSION arguments
| 187 | * `ST_*DT` - substitutions for ALLOCATE or DIMENSION arguments when the shape of the array is unknown
| 188 | * Functions
| 189 | * `is_tile` - Returns 0 if the array has the dimensions of the full domain, else 1
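
The behaviour described for `is_tile` can be sketched as follows. This is a hypothetical stand-alone version for 2D arrays, written only from the description above; the function name `is_tile_2d`, the module name, the shape comparison used and the domain size are all assumptions, and the actual NEMO function may differ.

{{{
#!fortran
MODULE is_tile_sketch
   ! Hypothetical illustration only; not the NEMO implementation.
   IMPLICIT NONE
   INTEGER, PARAMETER :: wp = KIND(1.0d0)
   INTEGER, PARAMETER :: jpi = 10, jpj = 12   ! assumed full-domain size
CONTAINS
   INTEGER FUNCTION is_tile_2d( pt )
      REAL(wp), DIMENSION(:,:), INTENT(in) :: pt
      ! 0 if the array has the dimensions of the full domain, else 1
      IF( SIZE(pt,1) == jpi .AND. SIZE(pt,2) == jpj ) THEN
         is_tile_2d = 0
      ELSE
         is_tile_2d = 1
      ENDIF
   END FUNCTION is_tile_2d
END MODULE is_tile_sketch

PROGRAM is_tile_demo
   USE is_tile_sketch
   IMPLICIT NONE
   REAL(wp) :: zfull(jpi,jpj)   ! full-domain array
   REAL(wp) :: ztile(3:5,7:9)   ! tile-sized array with assumed tile bounds
   zfull(:,:) = 0._wp
   ztile(:,:) = 0._wp
   PRINT *, 'full-domain array:', is_tile_2d(zfull)
   PRINT *, 'tile-sized array :', is_tile_2d(ztile)
END PROGRAM is_tile_demo
}}}

An integer result of this kind is convenient when the shape of a dummy argument is not known in advance, which is the case the `ST_*DT` substitutions are described as handling.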