New URL for NEMO forge!   http://forge.nemo-ocean.eu

Since March 2022 along with NEMO 4.2 release, the code development moved to a self-hosted GitLab.
This present forge is now archived and remained online for history.
2020WP/HPC-02_Daley_Tiling – NEMO
wiki:2020WP/HPC-02_Daley_Tiling

Name and subject of the action

Last edition: Wikinfo(changed_ts)? by Wikinfo(changed_by)?

The PI is responsible to closely follow the progress of the action, and especially to contact NEMO project manager if the delay on preview (or review) are longer than the 2 weeks expected.

  1. Summary
  2. Preview
  3. Tests
  4. Review

Summary

Action Implement 2D tiling (with the LFRA version of NEMO)
PI(S) Daley Calvert, Andrew Coward
Digest Implement 2D tiling to reduce traffic between main memory and L3 cache
Dependencies DO loop macros (2020WP/KERNEL-02_Coward_DoLoopMacros_part1), extended haloes (Italo Epicoco, Seb Masson and Francesca Mele), extension of XIOS to accept 2D tiles of data (Yann Meurdesoif & Seb Masson)
Branch source:/NEMO/branches/{YEAR}/dev_r{REV}_{ACTION_NAME}
Previewer(s) Gurvan Madec
Reviewer(s) Gurvan Madec
Ticket #2365

Description

Implement tiling over horizontal dimensions (i and j).

Branch

dev_r13383_HPC-02_Daley_Tiling

Summary of the tiling method

The processor domain (dimensions jpi x jpj) is split into one or more tiles/subdomains, which are iterated over asynchronously within the timestepping loop.

These tile domains are defined by a new set of indices representing the internal part of the domain (ntsi/ntei/ntsj/ntej). These indices replace those of the processor domain (Nis0/Nie0/Njs0/Nje0) in DO loops, array shape declarations, and other appropriate places.

These changes are implemented via new and existing CPP macros, allowing the tiling to be implemented with relatively few changes at lower levels. A number of temporary workarounds are required to preserve results, but will be removed before the 2020 merge party or as part of the 2021 work plan.

Details of the implementation

Namelist

In dom_nam (domain.F90) a new namelist (namtile) is read and control prints written to ocean.output:

!-----------------------------------------------------------------------
&namtile        !   parameters of the tiling
!-----------------------------------------------------------------------
   ln_tile = .false.     !  Use tiling (T) or not (F)
   nn_ltile_i = 10       !  Length of tiles in i
   nn_ltile_j = 10       !  Length of tiles in j
/

These variables are declared in dom_oce.F90.

Setting the tile domain

A new subroutine dom_tile (domain.F90) sets the values of the tile indices (ntsi/ntei/ntsj/ntej) and the active tile number (ntile).

During initialisation, this subroutine calculates the number of tiles (nijtile) and the tile indices, which are stored in public arrays (ntsi_a/ntei_a/ntsj_a/ntej_a) with lengths equal to nijtile + 1. When dom_tile is otherwise called, these arrays are used to set the tiling indices for the current tile (e.g. ntsi = ntsi_a(ntile)).

ntile = 0 indicates that tiling is disabled, i.e. the full domain is to be used.

dom_tile is called whenever the active tile needs to be set, if tiling needs to be disabled and for initialisation (in dom_init):

CALL dom_tile( ntsi, ntsj, ntei, ntej, ktile=3 ) ! Work on tile 3
CALL dom_tile( ntsi, ntsj, ntei, ntej, ktile=0 ) ! Work on the full domain
CALL dom_tile( ntsi, ntsj, ntei, ntej )          ! Initialisation (implies ktile=0)

Variables ntsi/ntei/ntsj/ntej/nijtile are declared in par_oce.F90. Variables ntsi_a/ntei_a/ntsj_a/ntej_a are declared in dom_oce.F90

Changes to CPP macros

In do_loop_substitute.h90, the DO loop macros are modified to instead use the tiling indices:

- #define DO_2D(B, T, L, R) DO jj = Njs0-(B), Nje0+(T)   ;   DO ji = Nis0-(L), Nie0+(R)
+ #define DO_2D(B, T, L, R) DO jj = ntsj-(B), ntej+(T)   ;   DO ji = ntsi-(L), ntei+(R)

A number of new macros have been added that replace jpi/jpj in DIMENSION and ALLOCATE statements (see “Local working array declarations” section below):

#define A1Di(H) ntsi-H:ntei+H                # H is equivalent to B/T/L/R in DO loop macros
#define A1Dj(H) ntsj-H:ntej+H
#define A2D(H) A1Di(H),A1Dj(H)

#define A1Di_T(T) (ntsi-nn_hls-1)*T+1:       # T is 1 (= ntsi:) or 0 (= 1:)
#define A1Dj_T(T) (ntsj-nn_hls-1)*T+1:
#define A2D_T(T) A1Di_T(T),A1Dj_T(T)

#define JPK  :                               
#define JPTS  :
#define KJPT  :

The purpose of the A1Di/A1Dj/A2D macros is to allow local working arrays to be declared with the size of the tile (or the full domain, if tiling is not used), minimising memory use. Furthermore, the tile-sized arrays will be declared with lower and upper bounds corresponding to the position of the tile in the full domain. Horizontal indices, for example in DO loops, will therefore apply to both tile- and full-sized arrays:

! ntsi = 3, ntsj = 7, ntei = 5, ntej = 9
REAL(wp), DIMENSION(ntsi:ntei,ntsj:ntej) :: z2d
REAL(wp), DIMENSION(jpi,jpj) :: a2d

DO_2D(1,1,1,1)
  z2d(ji,jj) = a2d(ji,jj)
END_2D

The A1Di_T/A1Dj_T/A2D_T macros are assumed-shape versions of the A1Di/A1Dj/A2D macros, used in dummy array argument declarations where the shape of the actual array argument is inconsistent between calls to the subroutine (see “Local working array declarations” section below).

The JPK/JPTS/KJPT macros are used where explicit-shape declarations have been replaced by assumed-shape declarations. Their only purpose is to preserve readability.

Changes at the timestepping level

The looping over tiles occurs in the stp subroutine. The domain indices for the current tile (ntile /= 0) are set at the start of each iteration. After exiting the loop (and before, during initialisation) the tiling is disabled (ntile == 0):

    ! Loop over tile domains
    DO jtile = 1, nijtile
       IF( ln_tile ) CALL dom_tile( ntsi, ntsj, ntei, ntej, ktile=jtile )
  
       ! Tiled region of code
    END DO

    IF( ln_tile ) CALL dom_tile( ntsi, ntsj, ntei, ntej, ktile=0 )        ! Revert to full domain

The tiled code currently encompasses the active tracers (TRA) region of stp. The loop over tiles must currently be broken into two separate loops to preserve results. This is due to the temporary workaround implemented in tra_adv, which disables tiling for certain options and therefore can change the order in which the tracer trends are updated.

General changes at the module and subroutine levels

DO loop bounds

Each tile has an internal area and overlapping halo, but unlike the MPP domain the halo points are not set by lbc_lnk. The internal part of a tile may therefore be partly overwritten by the halo of an adjacent tile, which will change results. In these cases, DO loops must work on the internal part of the tile only.

This is generally only an issue for persistent variables (i.e. declared at the module level, or with SAVE), e.g. when zeroing an array:

- DO_3D_11_11(1, jpk)
-    akz(ji,jj,jk) = 0._wp
- END_3D
+ DO_3D_00_00(1, jpk)
+    akz(ji,jj,jk) = 0._wp
+ END_3D

or where an array appears on both sides of the assignment:

- DO_2D( 0, 1, 0, 0 )
+ DO_2D( 0, 0, 0, 0 )
     pts(ji,jj,1,jn,Krhs) = pts(ji,jj,1,jn,Krhs) + zfact * ( sbc_tsc_b(ji,jj,jn) + sbc_tsc(ji,jj,jn) ) / e3t(ji,jj,1,Kmm)

In some cases (e.g. tra_bbl_adv in trabbl.F90) DO loops must also work on the outer halo of a processor domain, which requires a slightly different approach:

- DO_2D( 1, 0, 1, 0 )
+ IF( ntsi == Nis0 ) THEN ; isi = 1 ; ELSE ; isi = 0 ; ENDIF    ! Avoid double-counting when using tiling
+ IF( ntsj == Njs0 ) THEN ; isj = 1 ; ELSE ; isj = 0 ; ENDIF
+ DO_2D( isi, 0, isj, 0 )

Local working array declarations

The new CPP macros in do_loop_substitute.h90 replace references to the full domain in explicit shape declarations for local working arrays:

- ALLOCATE(jpi,jpj      ) DIMENSION(jpi,jpj      )
+ ALLOCATE(A2D(nn_hls)  ) DIMENSION(A2D(nn_hls)  )

This allows the arrays to be declared with the size of the tile (or the full domain, if tiling is not used), minimising memory use.

This approach does not work for subroutines that are called with actual array arguments of varying shape; an assumed-shape declaration must be used instead. It is necessary to use the form specifying lower bounds (i.e. DIMENSION(start_i:, start_j:)), as this information is required to correctly index the array when tiling is used. However, these bounds must be passed as additional arguments to the subroutine.

To avoid widespread changes, the subroutine is replaced by a wrapper subroutine that calculates the bounds and passes them to the original subroutine. For example:

- SUBROUTINE eos_insitu( pts, prd, pdep )
-    REAL(wp), DIMENSION(jpi,jpj,jpk,jpts), INTENT(in   ) ::   pts
-    REAL(wp), DIMENSION(jpi,jpj,jpk     ), INTENT(  out) ::   prd
-    REAL(wp), DIMENSION(jpi,jpj,jpk     ), INTENT(in   ) ::   pdep
+ SUBROUTINE eos_insitu( pts, prd, pdep ) 
+    REAL(wp), DIMENSION(:,:,:,:), INTENT(in   ) ::   pts
+    REAL(wp), DIMENSION(:,:,:)  , INTENT(  out) ::   prd
+    REAL(wp), DIMENSION(:,:,:)  , INTENT(in   ) ::   pdep
+    
+    CALL eos_insitu_t( pts, is_tile(pts), prd, is_tile(prd), pdep, is_tile(pdep) )
+ END SUBROUTINE eos_insitu

+ SUBROUTINE eos_insitu_t( pts, ktts, prd, ktrd, pdep, ktdep )
+    INTEGER, INTENT(in   ) ::   ktts, ktrd, ktdep
+    REAL(wp), DIMENSION(A2D_T(ktts) ,JPK,JPTS), INTENT(in   ) ::   pts
+    REAL(wp), DIMENSION(A2D_T(ktrd) ,JPK     ), INTENT(  out) ::   prd
+    REAL(wp), DIMENSION(A2D_T(ktdep),JPK     ), INTENT(in   ) ::   pdep

Here, is_tile is an interface of functions returning 1 or 0 depending on the size of the array, e.g.:

FUNCTION is_tile_2d( pt )
   REAL(wp), DIMENSION(:,:), INTENT(in) ::   pt
   INTEGER :: is_tile_2d
   IF( ln_tile .AND. SIZE(pt, 1) < jpi ) THEN
      is_tile_2d = 1
   ELSE
      is_tile_2d = 0
   ENDIF
END FUNCTION is_tile_2d

and A2D_T is a version of the A2D CPP macro that returns 1:,1: if is_tile(array) = 0 (array is the size of the full domain) or ntsi:,ntsj: if is_tile(array) = 1 (array is the size of the tile). The JPK/JPTS macros each return : and are used to preserve readability.

These wrappers are added in the following locations:

  • IOM/prtctl.F90 (prt_ctl)
  • TRA/eosbn2.F90 (various subroutines)
  • TRA/traldf_iso.F90 (tra_ldf_iso)
  • TRA/traldf_lap_blp.F90 (tra_ldf_lap)
  • TRA/traldf_triad.F90 (tra_ldf_triad)
  • TRA/zpshde.F90 (zps_hde, zps_hde_isf)

: array subscripts

The above array declaration changes may introduce conformance issues when : subscripts are used for indexing, or if no indexing is used at all. These are resolved by instead using an equivalent DO loop:

- REAL(wp), DIMENSION(jpi,jpj,jpk)   :: a3d
- REAL(wp), DIMENSION(jpi,jpj)       :: z2d
- z2d(:,:) = a3d(:,:,1).
+ REAL(wp), DIMENSION(jpi,jpj,jpk)   :: a3d
+ REAL(wp), DIMENSION(A2D(nn_hls)) :: z2d
+ DO_2D(1,1,1,1)
+    z2d(ji,jj) = a3d(ji,jj,1)
+ END_2D

Code called once per timestep

Some code should only be called once per timestep (e.g. ocean.output write statements, initialisation steps, reading data from files) but will be called by each tile. IF statements are used to suppress these calls and generally take the form:

IF( ntile == 0 .OR. ntile == 1 )  THEN             ! Do only on the first tile
   ! ...
ENDIF

IF( ntile == 0 .OR. ntile == nijtile )  THEN       ! Do only on the last tile
   ! ...
ENDIF

Sometimes dom_tile must also be called to temporarily disable the tiling:

IF( ntile == 0 .OR. ntile == 1 )  THEN   ! Do only for the full domain
   itile = ntile
   IF( ln_tile ) CALL dom_tile( ntsi, ntsj, ntei, ntej, ktile = 0 )     ! Use full domain
      CALL fld_read( kt, 1, sf_tsd )     !==   read T & S data at kt time step   ==!
   IF( ln_tile ) CALL dom_tile( ntsi, ntsj, ntei, ntej, ktile = itile )  ! Revert to tile
ENDIF

Special cases

diaptr module

At present, between dia_ptr and dia_ptr_hst the following operations are generally performed:

  1. Calculate the zonal integral
  2. Call mpp_sum
  3. Perform additional arithmetic or copy result onto 2D/3D arrays
  4. Call iom_put

The tiling is not easily implemented here due to steps 2 and 4. The diaptr module has therefore been largely restructured to accommodate the tiling, with the added bonus of significantly reducing the number of communications.

  • dia_ptr has been split into dia_ptr_zint (steps 1-2) and dia_ptr_iom (steps 3-4)
    • dia_ptr_zint is called for every tile, dia_ptr_iom is called after dia_ptr_zint has been run for all tiles
  • The zonal integrals that were calculated in dia_ptr are now calculated by dia_ptr_zint and are stored in two new module arrays, pvtr_int and pzon_int, which are used by dia_ptr_iom
  • Zonal integrals are calculated by calling ptr_sj (calculates the integral for the tile) and a new subroutine ptr_sum (accumulates the ptr_sj integrals and calls mpp_sum on the last tile only)
    • The number of communications is reduced from 90 to 13 (data for all basins are exchanged at once)
    • Code relating to mpp_sum has been removed from ptr_sj
    • Pointers and their target arrays have been removed from ptr_sj
  • dia_ptr_hst has been condensed by moving the loop over basins out of the IF statements

dia_ar5_hst subroutine

This code contains iom_put and lbc_lnk calls, which cannot be tiled. The code has therefore been rearranged to separate the transport calculations from the iom_put calls.

The transport calculations are performed for each tile and saved in two new module variables, hstr_adv and hstr_ldf. The iom_put calls are called only on the last tile, and the lbc_lnk calls have been removed as per #2367 (clean-up of communications).

prt_ctl subroutine

This code prints the sums of the input arrays over the processor domain, minus the sums calculated by the previous prt_ctl call. It is possible to implement tiling for this subroutine by aggregating the result over tiles, similar to the approach taken with dia_ptr. However, this was difficult to implement cleanly and bit-level differences in the global sum could not be avoided when using tiling.

I have taken a simpler approach where the sum is output for each tile, rather than the processor domain, if tiling is used. The prt_ctl utility can therefore be used to diagnose differences on a tile-by-tile basis per processor. However, the prt_ctl output cannot be compared between a simulation with tiling and one without.

timing module

The timing utility already works with the tiling; the only change is to ensure that the call iteration counter is incremented once for all tiles.

Trends diagnostics

Tiling has not yet been implemented in these diagnostics, meaning that tiling has to be disabled for the various trd_tra calls throughout the TRA modules.

Rather than add IF statements around each of these calls, I simply set ln_tile = .false. in trd_init if the trends diagnostics are used.

Temporary workarounds

This code has been marked with a ! TEMP: [tiling] comment.

iom_put calls

XIOS does not currently support tiling, so the data must be complete (i.e. all tiles must have finished) at the time of an iom_put call. The general workaround for this is to call iom_put only on the last tile. Additional workarounds are required for some local working arrays, which are not preserved between subsequent calls to the subroutine.

Some code was rearranged in order to calculate the diagnostic and call iom_put at the same time on the last tile (traldf_triad.F90). In other cases this was not possible, so the working arrays were declared with SAVE so that they could be processed by each tile (traadv.F90, tramle.F90). In all cases, it is necessary to declare the working arrays with the size of the full domain (DIMENSION(jpi,jpj)) instead of the tile (DIMENSION(A2D(nn_hls))).

XIOS support for tiling is expected to be fully implemented sometime in December-January. At this point all of the iom_put workarounds can be removed.

A preliminary development branch has been provided for testing by Olga Abramkina. It seems likely that the workarounds would need to stay in place for the merge, as I presume that we would not want to merge based on a development version of XIOS. However, the workarounds could certainly be removed post-merge before the 4.2 release.

lbc_lnk calls

Similar to iom_put calls, lbc_lnk can only be called once all tiles have finished and the data is complete. The general workaround for this is the same as for iom_put; call lbc_lnk only on the last tile.

This is done in a few cases (tranpc.F90, traqsr.F90), but often it was cleaner to simply disable the tiling due to the frequency of lbc_lnk calls. This has been implemented in tra_adv for all schemes except 2nd order centred advection (ln_traadv_cen = .true. with nn_cen_h = 2), in tra_ldf for all bi-laplacian schemes, and for calls to zps_hde. It was also necessary to split the tiling loop in step.F90 so that the first loop ended before tra_adv, in order to preserve results.

Most of the lbc_lnk calls are removed in the nn_hls = 2 case by #2367 (clean-up of communications), which will be merged with the tiling branch to form the basis of this year’s merge party. #2367 also removes several lbc_lnk calls that were used to set the halo points on arrays being passed to iom_put; these have already been removed from the tiling branch.

Tiling will only be used with nn_hls = 2, so most of the remaining workarounds for the lbc_lnk calls should be removed. However there are a number of changes in #2367 that prevent tiling, such as the addition of lbc_lnk calls in tra_adv and zps_hde, as well as expanding the bounds of some DO loops so that they work on the halo (see the “DO loop bounds” section above). Removal of the workarounds therefore depends on whether these issues can be resolved before the merge.

List of new variables and functions (excluding local)

  • Global variables (par_oce.F90, dom_oce.F90)
    • ntsi, ntsj- start index of tile
    • ntei, ntej- end index of tile
    • ntsi_a, ntsj_a- start indices of each tile
    • ntei_a, ntej_a- end indices of each tile
    • ntile- current tile number
    • nijtile- number of tiles
  • Module variables
    • hstr_adv, hstr_ldf (diaar5.F90)- saved transports
    • pvtr_int, pzon_int (diaptr.F90)- zonal integrals
    • jp_msk, jp_vtr (diaptr.F90)- indices for pvtr_int & pzon_int
    • nnpcc (tranpc.F90)- replaces local variable inpcc
  • Namelist namtile (dom_oce.F90)
    • ln_tile- logical control on use of tiling
    • nn_ltile_i, nn_ltile_j- tile length
  • Pre-processor macros (do_loop_substitute.h90)
    • A1Di/A1Dj/A2D- substitutions for ALLOCATE or DIMENSION arguments
    • A1Di_T/A1Dj_T/A2D_T- substitutions for ALLOCATE or DIMENSION arguments when the shape of the array is unknown
    • JPK/JPTS/KJPT- placeholders for : to preserve readability
  • Functions and subroutines
    • dom_tile (domain.F90)- Calculate/set tiling variables
    • is_tile (domutl.F90)- returns 0 if the array has the dimensions of the full domain, else 1
    • ptr_sum (diaptr.F90)- sum ptr_sj zonal integrals over tiles and processors to get total
  • The following subroutines have all been renamed to <SUBROUTINE>_t, where <SUBROUTINE> is now a wrapper function for <SUBROUTINE>_t:
    • eos_insitu, eos_insitu_pot, eos_insitu_2d, rab_3d, rab_2d, bn2, eos_fzp_2d (eosbn2.F90)
    • tra_ldf_iso (traldf_iso.F90)
    • tra_ldf_lap (traldf_lap_blp.F90)
    • tra_ldf_triad (traldf_triad.F90)
    • prt_ctl (prtctl.F90)
    • zps_hde, zps_hde_isf (zpshde.F90)

Documentation updates

Error: Failed to load processor box
No macro or processor named 'box' found

...

Preview

Error: Failed to load processor box
No macro or processor named 'box' found

...

Tests

SETTE

SETTE passes all tests (including with tiling turned on) and compares with the trunk. The Intel compiler (ifort 18.0.5 20180823) is used with XIOS in detached mode. The Cray compiler was not used due to #2394 (new Cray compiler does not work with new way of reading namelists) and due to older Cray compilers raising various errors.

Regular checks

  • Can this change be shown to produce expected impact (option activated)? YES
  • Can this change be shown to have a null impact (option not activated)? YES
  • Results of the required bit comparability tests been run: are there no differences when activating the development? YES (SETTE), NO (other tests)
    • If some differences appear, is reason for the change valid/understood? YES (see known failures)
    • If some differences appear, is the impact as expected on model configurations? YES
  • Is this change expected to preserve all diagnostics? NO (see known failures)
    • If no, is reason for the change valid/understood? YES
  • Are there significant changes in run time/memory? NO

Detailed SETTE results

Tiling is turned on in this report.

Current code is : NEMO/branches/2020/dev_r13383_HPC-02_Daley_Tiling @ r13745  ( last change @ r13745 )

SETTE validation report generated for : 

       NEMO/branches/2020/dev_r13383_HPC-02_Daley_Tiling @ r13745+ (last changed revision)

       on XC40_METO_IFORT arch file


!!---------------1st pass------------------!!

   !----restart----!   
WGYRE_PISCES_ST              run.stat    restartability  passed :  13745+
WGYRE_PISCES_ST              tracer.stat restartability  passed :  13745+
WORCA2_ICE_PISCES_ST         run.stat    restartability  passed :  13745+
WORCA2_ICE_PISCES_ST         tracer.stat restartability  passed :  13745+
WORCA2_OFF_PISCES_ST         tracer.stat restartability  passed :  13745+
WAMM12_ST                    run.stat    restartability  passed :  13745+
WORCA2_SAS_ICE_ST            run.stat    restartability  passed :  13745+
WAGRIF_DEMO_ST               run.stat    restartability  passed :  13745+
WWED025_ST                   run.stat    restartability  passed :  13745+
WISOMIP+_ST                  run.stat    restartability  passed :  13745+
WOVERFLOW_ST                 run.stat    restartability  passed :  13745+
WLOCK_EXCHANGE_ST            run.stat    restartability  passed :  13745+
WVORTEX_ST                   run.stat    restartability  passed :  13745+
WICE_AGRIF_ST                run.stat    restartability  passed :  13745+

   !----repro----!   
WGYRE_PISCES_ST              run.stat    reproducibility passed :  13745+
WGYRE_PISCES_ST              tracer.stat reproducibility passed :  13745+
WORCA2_ICE_PISCES_ST         run.stat    reproducibility passed :  13745+
WORCA2_ICE_PISCES_ST         tracer.stat reproducibility passed :  13745+
WORCA2_OFF_PISCES_ST         tracer.stat reproducibility passed :  13745+
WAMM12_ST                    run.stat    reproducibility passed :  13745+
WORCA2_SAS_ICE_ST            run.stat    reproducibility passed :  13745+
WORCA2_ICE_OBS_ST            run.stat    reproducibility passed :  13745+
WAGRIF_DEMO_ST               run.stat    reproducibility passed :  13745+
WWED025_ST                   run.stat    reproducibility passed :  13745+
WISOMIP+_ST                  run.stat    reproducibility passed :  13745+
WVORTEX_ST                   run.stat    reproducibility passed :  13745+
WICE_AGRIF_ST                run.stat    reproducibility passed :  13745+

   !----agrif check----!   
ORCA2 AGRIF vs ORCA2 NOAGRIF run.stat    unchanged  -    passed :  13745+ 13745+

   !----result comparison check----!   

check result differences between :
VALID directory : /home/d00/hadcv/cylc-run/u-bs939/share/sette/output/NEMO_VALIDATION at rev 13745+
and
REFERENCE directory : /home/d00/hadcv/cylc-run/u-bs939/share/sette_ref/output/NEMO_VALIDATION at rev 13688

WGYRE_PISCES_ST       run.stat    files are identical  
WGYRE_PISCES_ST       tracer.stat files are identical  
WORCA2_ICE_PISCES_ST  run.stat    files are identical  
WORCA2_ICE_PISCES_ST  tracer.stat files are identical  
WORCA2_OFF_PISCES_ST  tracer.stat files are identical  
WAMM12_ST             run.stat    files are identical  
WORCA2_SAS_ICE_ST     run.stat    files are identical  
WAGRIF_DEMO_ST        run.stat    files are identical  
WWED025_ST            run.stat    files are identical  
WISOMIP+_ST           run.stat    files are identical  
WVORTEX_ST            run.stat    files are identical  
WICE_AGRIF_ST         run.stat    files are identical  
WOVERFLOW_ST          run.stat    files are identical  
WLOCK_EXCHANGE_ST     run.stat    files are identical

Development testing

A configuration based on ORCA2_ICE_PISCES (without key_si3 or key_top) was used to test code modified by the tiling development. To facilitate cleaner testing, ln_trabbc, ln_trabbl, ln_icebergs, ln_rnf, ln_ssr, ln_tradmp, ln_ldfeiv, ln_traldf_msc, ln_mle, ln_zdfddm and ln_zdfiwm were all set to false. ln_qsr_2bd was used instead of ln_qsr_rgb, nn_fsbc was set to 1, and nn_ice and nn_fwb were set to 0.

Simulations of the tiling branch were run for 10 days with 1-day diagnostic output, for all scientific options relevant to the affected code. Each simulation was repeated with tiling turned on, using square tile sizes of 5 and 50 (the latter being equivalent to one tile over the full domain). Additionally, all simulations including those with tiling were repeated with nn_hls = 2.

run.stat and diagnostic output were compared with equivalent simulations of the trunk and 10-day simulations of the tiling branch that were run in two 5-day submissions (i.e. testing for restartability).

Version 8.3.4 of the Cray compiler was used with XIOS 2.5 revision 1565. A jpni = 4, jpnj = 8 decomposition was used with 6 XIOS processors.

Known failures in development tests

  • ln_trabbl = .true. with nn_bbl_adv > 0 gives different results when using tiling

This is due to a change in the order of computation of pt_rhs in tra_bbl_adv when using the tiling. The loop over i is broken up by the tiles, which causes the up-slope and down-slope contributions to be added in a different order on the intersections between tiles.

  • Some trends diagnostics have slightly different values at one point on the northfold in the development branch

This may be because I removed an lbc_lnk from tra_zdf, similar to the effect of removing the lbc_lnk for utr_bbl/vtr_bbl in tra_bbl. It could be restored if necessary, since we do not use tiling with the trends diagnostics.

  • NEMO fails with a stp_ctl error when using ln_traldf_hor = .true. or ln_zdfosm = .true., in both the development branch and trunk

I assume this is related to my configuration, as I did not have this issue when testing in GYRE.

Untested code

  • Code (e.g. zps_hde_isf) that requires ln_isfcav = .true.
  • Code in diaptr.F90 for the uocetr_vsum_cumul diagnostic (ptr_ci_2d subroutine)

XIOS hangs when trying to output this diagnostic, which may be due to the XIOS library being too old

  • Code in tra_adv that requires ln_wave .AND. ln_sdw = .true.,
  • Code in tra_adv that requires ln_vvl_ztilde .OR. ln_vvl_layer = .true.
  • Code in tra_asm_inc that requires ln_asmdin = .true.

I did not have the required input file (assim_background_state_DI.nc) for ORCA2, although I was able to make an idealised one for testing in GYRE.

  • Code in tra_asm_inc that requires ln_temnofreeze = .true. or ln_seaiceinc = .true.

These logicals are hard coded as false.

Review

Reviewer: Italo Epicoco (CMCC) Date: 30/11/2020

  • Is the proposed methodology now implemented?

YES

  • Are the code changes in agreement with the flowchart defined at preview step?

YES

  • Are the code changes in agreement with list of routines and variables as proposed at preview step?

YES, only the files included in TRA module have been changed

  • Is the in-line documentation accurate and sufficient?

YES

  • Do the code changes comply with NEMO coding standards?

YES

  • Is the development documented with sufficient details for others to understand the impact of the change?

YES

  • Is the project literature (manual, guide, web, …) now updated or completed following the proposed summary in preview section?

NO. The documentation about tiling should refer to the usage of the ln_tile, nn_ltile_i and nn_ltile_j variabiles added in the namelist. Moreover some hints to the user should be given to drive an efficient choice of the values for nn_ltile_i and nn_ltile_j.

  • Is the review fully successful?

YES

Last modified 3 years ago Last modified on 2020-11-30T20:05:57+01:00

Attachments (5)

Download all attachments as: .zip