Version 3 (modified by francesca, 4 months ago)

Name and subject of the action

Last edited: 10/28/20 10:58:13 by andmirek

The PI is responsible for closely following the progress of the action, and in particular for contacting the NEMO project manager if the preview (or review) takes longer than the expected 2 weeks.

  1. Summary
  2. Preview
  3. Tests
  4. Review

Summary

Action: MPI3 collective neighbours communications instead of point to point communications
PI(S): Silvia Mocavero and Italo Epicoco
Digest: MPI-3 provides new neighbourhood collective operations that allow a halo exchange to be performed with a single MPI communication call.
Dependencies: If any
Branch: dev_r13296_HPC-07_mocavero_mpi3
Previewer(s): Mirek Andrejczuk
Reviewer(s): Mirek Andrejczuk
Ticket: #2496

Description

This is the continuation of the work started in 2019 (HPC-12_Mocavero_mpi3).

MPI-3 provides new neighbourhood collective operations (e.g. MPI_Neighbor_allgather and MPI_Neighbor_alltoall) that allow a halo exchange to be performed with a single MPI communication call.

These collective communications were integrated and tested in the NEMO code during 2019 in order to compare the code performance with the traditional point-to-point halo exchange currently implemented in NEMO. The first version of the implementation uses a cartesian topology, so it supports neither the 9-point stencil nor land domain exclusion, and the north fold is handled as usual. The new collective communications have been tested on a representative kernel implementing the FCT advection scheme.
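To illustrate the idea, the following is a minimal pure-Python model of what a single neighbourhood all-to-all call delivers on a periodic cartesian process grid with a 4-neighbour (5-point) stencil. It does not call MPI and is not taken from the NEMO branch: the function names, the row-major rank numbering and the slot order [west, east, south, north] are all assumptions made for this sketch.

```python
# Pure-Python model of a 4-neighbour halo exchange on a periodic cartesian
# process grid. No MPI is used: the function only mimics what one
# neighbourhood all-to-all call would deliver to every rank.
# All names and the rank numbering are hypothetical, chosen for illustration.

def cart_neighbours(rank, npx, npy):
    """Return [west, east, south, north] ranks on a periodic npx x npy grid.

    Ranks are numbered row by row: rank = j * npx + i (sketch assumption).
    """
    i, j = rank % npx, rank // npx
    west = j * npx + (i - 1) % npx
    east = j * npx + (i + 1) % npx
    south = ((j - 1) % npy) * npx + i
    north = ((j + 1) % npy) * npx + i
    return [west, east, south, north]


def neighbour_alltoall(send, npx, npy):
    """Model one collective exchange.

    send[r][k] is the block rank r sends to its k-th neighbour.
    Returns recv, where recv[r][k] is the block r receives from its
    k-th neighbour.
    """
    nranks = npx * npy
    recv = [[None] * 4 for _ in range(nranks)]
    opposite = [1, 0, 3, 2]  # west<->east, south<->north
    for r in range(nranks):
        for k, nb in enumerate(cart_neighbours(r, npx, npy)):
            # A block sent westwards arrives in the receiver's east slot, etc.
            recv[nb][opposite[k]] = send[r][k]
    return recv


# Example: 3 x 2 process grid, each block labelled (sender, slot).
npx, npy = 3, 2
send = [[(r, k) for k in range(4)] for r in range(npx * npy)]
recv = neighbour_alltoall(send, npx, npy)
print(recv[0])  # [(2, 1), (1, 0), (3, 3), (3, 2)]
```

In this model every rank fills all four halo slots from one call, whereas a point-to-point scheme would need a separate send and receive per direction. Diagonal neighbours are absent, which is why a cartesian topology alone cannot serve a 9-point stencil.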

Preliminary tests show an improvement in the range of 18% to 32% on the GYRE_PISCES configuration (with nn_GYRE=200), depending on the number of cores allocated. The output accuracy is preserved.

During 2020 we intend to integrate the graph topology so that the routines using a 9-point stencil, the land domain exclusion and the north fold exchanges are all supported through MPI-3 neighbourhood collective communications.

Implementation

Step 1: alignment of the dev_r11470_HPC_12_mpi3 branch with the new trunk

Step 2: integration of the graph topology to allow each process to exchange halos with diagonal processes (when a 9-point stencil is needed) or with non-neighbouring processes (when land domain exclusion is activated or the north fold has to be handled)

Step 3: replacement of point-to-point communications with collective ones within the NEMO code
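As an illustration of Step 2, the sketch below builds, for each process, the list of ranks it would exchange halos with once diagonal neighbours are included and land-only subdomains are excluded. Lists of this kind are what a graph topology constructor such as MPI_Dist_graph_create_adjacent takes as its source and destination arguments. The code is pure Python and hypothetical: the function name, the ocean mask layout and the rank numbering are assumptions of this sketch, and the north fold is deliberately not modelled.

```python
# Hypothetical sketch: neighbour lists for a graph topology.
# Not taken from the NEMO branch; names and numbering are illustrative.

def graph_neighbours(ocean, nine_point=True):
    """Map each ocean subdomain's rank to the ranks it exchanges halos with.

    ocean[j][i] is True when subdomain (i, j) contains ocean points.
    Land-only subdomains get no rank. East-west is periodic and
    north-south is closed (the north fold is not modelled here).
    Assumes at least 3 subdomains in the east-west direction, so that
    periodic wrapping never yields duplicate neighbours.
    """
    npy, npx = len(ocean), len(ocean[0])

    # Number only the ocean subdomains, row by row.
    rank_of = {}
    for j in range(npy):
        for i in range(npx):
            if ocean[j][i]:
                rank_of[(i, j)] = len(rank_of)

    # 4 side neighbours, plus 4 diagonal ones for a 9-point stencil.
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if nine_point:
        offsets += [(-1, -1), (1, -1), (-1, 1), (1, 1)]

    neighbours = {}
    for (i, j), rank in rank_of.items():
        found = []
        for di, dj in offsets:
            ni, nj = (i + di) % npx, j + dj
            # Skip positions outside the closed boundary and land subdomains.
            if 0 <= nj < npy and (ni, nj) in rank_of:
                found.append(rank_of[(ni, nj)])
        neighbours[rank] = found
    return neighbours


# Example: 3 x 2 decomposition with two land-only subdomains.
ocean = [[True, True, False],
         [True, False, True]]
print(graph_neighbours(ocean)[0])                    # [1, 2, 3]
print(graph_neighbours(ocean, nine_point=False)[0])  # [1, 2]
```

Unlike a cartesian topology, the number of neighbours now varies from one process to another, so each process has to describe its own neighbourhood explicitly.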

Documentation updates

Using the previous parts, define the main changes to be made to the NEMO literature (manuals, guide, web pages, …).

Preview

Since the preview step must be completed before the PI starts coding, the previewers' answers are expected within two weeks of the PI sending the request to the previewer(s).
An iterative process should then take place between the PI and the previewer(s) in order to reach a consensus.

Possible bottlenecks:

  • the methodology
  • the flowchart and list of routines to be changed
  • the new list of variables with respect to coding rules
  • the summary of updates in literature

Once an agreement has been reached, the preview is ended and the PI can start the development in their branch.

Tests

Once the development is done, the PI should complete the tests section below and then ask the reviewers to start their review.

This part should contain the detailed results of the SETTE tests (restartability and reproducibility for each of the reference configurations) and the detailed results of restartability and reproducibility when the option is activated on the specified configurations used for this test.

Regular checks:

  • Can this change be shown to produce expected impact (option activated)?
  • Can this change be shown to have a null impact (option not activated)?
  • Have the required bit comparability tests been run: are there no differences when activating the development?
  • If some differences appear, is the reason for the change valid/understood?
  • If some differences appear, is the impact as expected on model configurations?
  • Is this change expected to preserve all diagnostics?
  • If not, is the reason for the change valid/understood?
  • Are there significant changes in run time/memory?

Review

A successful review is needed to schedule the merge of this development into the future NEMO release during the next Merge Party (usually in November).

Assessments:

  • Is the proposed methodology now implemented?
  • Are the code changes in agreement with the flowchart defined at preview step?
  • Are the code changes in agreement with the list of routines and variables as proposed at preview step?
    If not, are the discrepancies acceptable?
  • Is the in-line documentation accurate and sufficient?
  • Do the code changes comply with NEMO coding standards?
  • Is the development documented with sufficient details for others to understand the impact of the change?
  • Is the project literature (manual, guide, web, …) now updated or completed following the proposed summary in preview section?

Finding:

Is the review fully successful? If not, please indicate what is still missing.


Once the review is successful, the development must be scheduled for merge during the next Merge Party Meeting.