2020WP/HPC-07_mocavero_mpi3


Name and subject of the action


The PI is responsible for closely following the progress of the action, and especially for contacting the NEMO project manager if the delay on the preview (or review) exceeds the expected two weeks.

  1. Summary
  2. Preview
  3. Tests
  4. Review

Summary

Action: MPI3 collective neighbour communications instead of point-to-point communications
PI(s): Silvia Mocavero and Italo Epicoco
Digest: MPI-3 provides new neighbourhood collective operations that allow the halo exchange to be performed with a single MPI communication call.
Dependencies: If any
Branch: dev_r13296_HPC-07_mocavero_mpi3
Previewer(s): Mirek Andrejczuk
Reviewer(s): Mirek Andrejczuk
Ticket: #2496

Description

This is the continuation of the work started in 2019 (HPC-12_Mocavero_mpi3).

MPI-3 provides new neighbourhood collective operations (such as MPI_Neighbor_allgather and MPI_Neighbor_alltoall) that allow the halo exchange to be performed with a single MPI communication call.

These collective communications were integrated into the NEMO code and tested during 2019, in order to compare their performance with the traditional point-to-point halo exchange currently implemented in NEMO. The first version of the implementation uses a Cartesian topology, so it supports neither the 9-point stencil nor land domain exclusion, and the north fold is handled as usual. The new collective communications were tested on a representative kernel implementing the FCT advection scheme.
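For illustration, the sketch below (plain MPI Fortran, not code from the branch; the grid layout, buffer shapes and sizes are invented for the example) shows the pattern just described: a periodic 2D Cartesian topology created with MPI_Cart_create, and a 5-point halo exchange performed with a single MPI_Neighbor_alltoall call.

PROGRAM cart_halo_sketch
   USE mpi
   IMPLICIT NONE
   INTEGER, PARAMETER :: nh = 8              ! points per halo segment (example size)
   INTEGER :: ierr, nproc, comm_cart
   INTEGER :: dims(2)
   LOGICAL :: periods(2)
   REAL(8) :: sendbuf(nh,4), recvbuf(nh,4)   ! one segment per cardinal neighbour

   CALL MPI_Init( ierr )
   CALL MPI_Comm_size( MPI_COMM_WORLD, nproc, ierr )

   dims(:)    = 0                            ! let MPI factorise the process count
   periods(:) = .TRUE.                       ! periodic grid; the north fold is NOT handled here
   CALL MPI_Dims_create( nproc, 2, dims, ierr )
   CALL MPI_Cart_create( MPI_COMM_WORLD, 2, dims, periods, .TRUE., comm_cart, ierr )

   sendbuf(:,:) = 1.0d0                      ! stand-in for the four outgoing halo segments

   ! One collective call exchanges all four cardinal halos; on a Cartesian
   ! communicator the neighbour order is fixed by the MPI standard (for each
   ! dimension: first the negative, then the positive direction).
   CALL MPI_Neighbor_alltoall( sendbuf, nh, MPI_DOUBLE_PRECISION, &
      &                        recvbuf, nh, MPI_DOUBLE_PRECISION, comm_cart, ierr )

   CALL MPI_Finalize( ierr )
END PROGRAM cart_halo_sketch

Because the Cartesian communicator only knows the four cardinal neighbours, the diagonal (9-point) exchanges and land domain exclusion cannot be expressed at this stage, which is exactly the limitation addressed by the graph topology below.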

Preliminary tests on the GYRE_PISCES configuration (with nn_GYRE=200) show an improvement in the range of 18%-32%, depending on the number of allocated cores. The output accuracy is preserved.

During 2020 we intend to integrate a graph topology, so that the routines using a 9-point stencil, land domain exclusion and the north-fold exchanges are also supported through MPI3 neighbourhood collective communications.

Implementation

Step 1: alignment of the dev_r13296_HPC-07_mocavero_mpi3 branch with the new trunk (after the July merge party) (done)

Step 2: integration of a graph topology to support the halo exchange for both 5-point stencil computation (when exchanging with the north, south, east and west processes only is enough to preserve data dependencies) and 9-point stencil computation (when exchange with the diagonal processes is also needed). Land domain exclusion is also handled, thanks to the flexibility of the graph topology. A parameter in the lbc_lnk mpi3 routine call allows choosing between the 5-point and the 9-point exchange, as sketched below (done)
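A hedged sketch of the graph-topology construction in step 2 (plain MPI Fortran, not the actual branch code; the routine name and the neighbour ordering are invented, and the neighbour ranks are assumed to have been derived from the NEMO domain decomposition beforehand):

! Hypothetical helper: build a neighbourhood communicator for the halo
! exchange. ln_9pt selects the 9-point stencil (diagonals included);
! land-only neighbours are excluded simply by leaving them out of the lists.
SUBROUTINE build_halo_comm( comm_in, ln_9pt, neigh, comm_graph )
   USE mpi
   IMPLICIT NONE
   INTEGER, INTENT(in)  :: comm_in
   LOGICAL, INTENT(in)  :: ln_9pt       ! .TRUE. => 9-point stencil
   INTEGER, INTENT(in)  :: neigh(8)     ! W,E,S,N,SW,SE,NW,NE ranks (assumed order);
                                        ! MPI_PROC_NULL marks an excluded land domain
   INTEGER, INTENT(out) :: comm_graph
   INTEGER :: ierr, i, nmax, nneigh
   INTEGER :: ranks(8)

   nmax   = MERGE( 8, 4, ln_9pt )       ! 4 cardinal ranks, or all 8 with diagonals
   nneigh = 0
   DO i = 1, nmax                       ! drop excluded (land) domains from the graph
      IF( neigh(i) /= MPI_PROC_NULL ) THEN
         nneigh        = nneigh + 1
         ranks(nneigh) = neigh(i)
      ENDIF
   END DO

   ! Halo exchange is symmetric, so sources and destinations coincide
   CALL MPI_Dist_graph_create_adjacent( comm_in, nneigh, ranks, MPI_UNWEIGHTED,   &
      &     nneigh, ranks, MPI_UNWEIGHTED, MPI_INFO_NULL, .FALSE., comm_graph, ierr )
END SUBROUTINE build_halo_comm

An MPI_Neighbor_alltoall (or the vector variant MPI_Neighbor_alltoallv, if the segments differ in size) on comm_graph then moves exactly one segment per listed neighbour, so excluded land neighbours cost nothing.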

Step 3: add the lbc_lnk mpi3 call in the traadv_fct.F90 (5-point stencil) and icedyn_rhg_evp.F90 (9-point stencil) files to perform comparability tests. SETTE tests will be executed, also with land domain exclusion activated

Step 4: run performance tests to evaluate the gain for both the 5-point and the 9-point stencil, as sketched below (done)
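One generic way to set up such a measurement (a sketch only, not the benchmark actually used for GYRE_PISCES; routine name, buffer shapes and the repetition count are invented) is to time many repetitions of the exchange with MPI_Wtime and report the slowest rank, since that is what bounds the time step:

! Generic timing sketch: time nrep repetitions of the neighbourhood
! exchange on an existing graph communicator and report the slowest rank.
SUBROUTINE time_exchange( comm_graph, nneigh, nh, nrep )
   USE mpi
   IMPLICIT NONE
   INTEGER, INTENT(in) :: comm_graph    ! communicator with attached graph topology
   INTEGER, INTENT(in) :: nneigh, nh    ! number of neighbours, points per segment
   INTEGER, INTENT(in) :: nrep          ! repetitions, to average out noise
   INTEGER :: ierr, it, rank
   REAL(8) :: t0, tloc, tmax
   REAL(8) :: sendbuf(nh,nneigh), recvbuf(nh,nneigh)

   sendbuf(:,:) = 0.0d0
   t0 = MPI_Wtime()
   DO it = 1, nrep
      CALL MPI_Neighbor_alltoall( sendbuf, nh, MPI_DOUBLE_PRECISION, &
         &                        recvbuf, nh, MPI_DOUBLE_PRECISION, comm_graph, ierr )
   END DO
   tloc = MPI_Wtime() - t0

   ! The slowest rank bounds the time step, so report the maximum
   CALL MPI_Reduce( tloc, tmax, 1, MPI_DOUBLE_PRECISION, MPI_MAX, 0, comm_graph, ierr )
   CALL MPI_Comm_rank( comm_graph, rank, ierr )
   IF( rank == 0 ) PRINT *, 'max exchange time (s): ', tmax
END SUBROUTINE time_exchange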

Step 5: replacement of point-to-point communications with collective ones throughout the NEMO code. The choice between the 5-point and the 9-point exchange requires a data dependency analysis. The replacement will be performed in three steps:

step 5.1: all the lbc_lnk calls will be replaced with the 9-point mpi3 exchange (a key_mpi3 CPP key will be introduced to preserve the old point-to-point version, to be used on architectures where MPI3 is not supported or does not provide a performance gain; see the schematic fragment after this list) (done)
step 5.2: the 5-point stencil exchange will be introduced where the data dependencies are satisfied without the diagonal exchange (in 2021)
step 5.3: key_mpi3 will be removed (once the System Team confirms that the new implementation is more performant)
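The key_mpi3 guard mentioned in step 5.1 would look schematically as follows (a fragment from a hypothetical call site; lbc_lnk_nc and the surrounding variables are placeholders, not the actual interface in the branch, though the argument list follows the usual NEMO 4 lbc_lnk pattern):

#if defined key_mpi3
   ! MPI-3 neighbourhood collective halo exchange (9-point in step 5.1)
   CALL lbc_lnk_nc( 'traadv_fct', pt, 'T', 1.0_wp )   ! placeholder routine name
#else
   ! Traditional point-to-point exchange, kept for architectures where
   ! MPI-3 is unsupported or brings no performance gain
   CALL lbc_lnk( 'traadv_fct', pt, 'T', 1.0_wp )
#endif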

Documentation updates


...

Preview


...

Tests


...

Review


...