2019WP/HPC-12_Mocavero_mpi3

Name and subject of the action


The PI is responsible for closely following the progress of the action, and especially for contacting the NEMO project manager if the delay on preview (or review) is longer than the expected two weeks.

  1. Summary
  2. Preview
  3. Tests
  4. Review

Summary

Analysis of the scalability improvement obtained by using the new MPI3 communications (e.g. neighbourhood collective communications) instead of point-to-point communications.

ticket: #2011

Preview

Mirek Andrejczuk

The plan outlined in ticket #2011 is OK. Most likely the changes to the code will be limited to LBC/lib_mpp.F90. I'm happy to test the changes in MO operational configurations. It may be worth considering an implementation in which changing the number of halo points is easy.

Tests

The change improves the communication time. Preliminary tests show an improvement in the range of 18%-32% on the GYRE_PISCES configuration (with nn_GYRE=200), depending on the number of allocated cores.

Results of the required bit comparability tests: No differences between outputs

This change preserves all diagnostics.

Review

Coding standard OK.

No need for changes in documentation.

The development uses the MPI3 neighbourhood collective communications functionality for halo exchange. It has been implemented only for the case without land suppression, and in a single routine (traadv_fct.F90). I think this is a much-needed development. Initial results are very encouraging. I see two major issues with the current implementation.

1) Is the MPI3 standard common enough now that all centres/users of NEMO will be able to compile and run the model? (It looks like it was introduced in 2012, but I'm not sure whether it is implemented in all recent MPI distributions.)

Many MPI distributions (Intel MPI, OpenMPI, MPICH) are compliant with the MPI3 standard. If any centre does not use one of these distributions, it should not be hard to upgrade its MPI library.
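If there is any doubt at a given centre, the standard level supported by the installed library can be queried directly; a minimal sketch (not NEMO code):

{{{
#!fortran
! Minimal sketch: query the MPI standard level supported by the library.
! Neighbourhood collectives need version >= 3.
PROGRAM check_mpi_version
   USE mpi
   IMPLICIT NONE
   INTEGER ::   iversion, isubversion, irank, ierr
   CALL mpi_init( ierr )
   CALL mpi_comm_rank( mpi_comm_world, irank, ierr )
   CALL mpi_get_version( iversion, isubversion, ierr )
   IF( irank == 0 )   WRITE(*,*) 'MPI standard supported: ', iversion, '.', isubversion
   CALL mpi_finalize( ierr )
END PROGRAM check_mpi_version
}}}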

2) Because the implementation is limited to a single routine, it will be difficult to test the impact on performance; moreover, in case of problems, debugging the code will be difficult, because a different message-exchange approach will be used for that one routine.

The call to the MPI3 neighbourhood collectives has been added to only one routine in order to test the improvement before extending the approach to all the routines that use a 5-point stencil. The impact on performance due to the MPI3 halo exchange can be evaluated by measuring the time spent in that routine's execution.
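For illustration, a minimal sketch (not the actual lib_mpp.F90 code) of a 1-point-halo, 5-point-stencil exchange through a single MPI3 neighbourhood collective, assuming a 2D Cartesian communicator created with MPI_Cart_create; array names and the buffer packing are illustrative:

{{{
#!fortran
! Minimal sketch, not the NEMO implementation. On a 2D Cartesian
! communicator the fixed neighbour order of the collective is W, E, S, N.
SUBROUTINE halo_exchange_nbc( pfield, jpi, jpj, icomm_cart )
   USE mpi
   IMPLICIT NONE
   INTEGER , INTENT(in   ) ::   jpi, jpj            ! local sizes, halo included
   INTEGER , INTENT(in   ) ::   icomm_cart          ! from MPI_Cart_create
   REAL(8) , INTENT(inout) ::   pfield(jpi,jpj)
   REAL(8) ::   zsnd(MAX(jpi,jpj),4), zrcv(MAX(jpi,jpj),4)   ! one block per neighbour
   INTEGER ::   icount, ierr
   !
   icount = MAX(jpi,jpj)                  ! fixed block size for every neighbour
   zsnd(:,:) = 0._8
   zsnd(1:jpj,1) = pfield(2    ,:)        ! inner column -> W neighbour
   zsnd(1:jpj,2) = pfield(jpi-1,:)        ! inner column -> E neighbour
   zsnd(1:jpi,3) = pfield(:,2    )        ! inner row    -> S neighbour
   zsnd(1:jpi,4) = pfield(:,jpj-1)        ! inner row    -> N neighbour
   !
   ! one collective call replaces the four point-to-point send/recv pairs
   CALL mpi_neighbor_alltoall( zsnd, icount, mpi_double_precision,   &
      &                        zrcv, icount, mpi_double_precision, icomm_cart, ierr )
   !
   pfield(1  ,:) = zrcv(1:jpj,1)          ! fill W halo column
   pfield(jpi,:) = zrcv(1:jpj,2)          ! fill E halo column
   pfield(:,1  ) = zrcv(1:jpi,3)          ! fill S halo row
   pfield(:,jpj) = zrcv(1:jpi,4)          ! fill N halo row
END SUBROUTINE halo_exchange_nbc
}}}

Note that with a simultaneous exchange like this the halo corners are not filled, which is sufficient for a 5-point stencil; the 9-point case is what the graph-topology work mentioned below would address.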

Those two issues should be raised with the whole system team for discussion, and potentially all centres should test whether the code compiles and runs without any issues, BEFORE the decision to merge this development into the trunk is made. It would be good to have a logical flag controlling the use of this option (for now in traadv_fct.F90). This is a change to a major component of NEMO, so not only SETTE tests and test cases but also, if possible, the operational configurations used in the centres should be tested.

A compiler key can easily be introduced to control the use of the new communications, depending on the performance achieved on different systems and configurations.

I personally would propose a different strategy, which requires more development before merging this change into the trunk, but avoids the potential problem outlined in point 1 and addresses point 2.

After implementing the exchange for land suppression, a key_mpi3 should be introduced. If it is defined, neighbourhood collective communications will be used for the halo exchange; if not, what we have now will be used. There would be no need to name the newly added routines differently: the exchange method will be set at the preprocessing stage (see the sketch below). My understanding is that the argument list of lbc_lnk will be exactly the same in both (new/old) cases. We would have two different approaches to the halo exchange depending on the key key_mpi3 (in the same way the XIOS1 and XIOS2 interfaces existed together in the past). By doing that, the number of changes in the source code will be limited to files in LBC only. And if we are not happy with the performance of MPI3, we can continue using the old approach until MPI3 (or a newer standard) performs sufficiently well. If we are happy with the performance, we can just remove the old method and key_mpi3 (after warning users about the change in advance).
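A minimal sketch of the selection this strategy implies (key_mpi3 as proposed above; the routine names lbc_lnk_nbc and lbc_lnk_p2p, and the exact lbc_lnk interface, are placeholders):

{{{
#!fortran
! Sketch of the preprocessing-stage selection. The caller-visible
! argument list of lbc_lnk is identical in both cases.
SUBROUTINE lbc_lnk( ptab, cd_type, psgn )
   REAL(8)         , INTENT(inout) ::   ptab(:,:)   ! 2D field with halo
   CHARACTER(len=1), INTENT(in   ) ::   cd_type     ! grid-point type ('T','U',...)
   REAL(8)         , INTENT(in   ) ::   psgn        ! sign at the north fold
#if defined key_mpi3
   CALL lbc_lnk_nbc( ptab, cd_type, psgn )   ! MPI3 neighbourhood collectives
#else
   CALL lbc_lnk_p2p( ptab, cd_type, psgn )   ! current point-to-point exchange
#endif
END SUBROUTINE lbc_lnk
}}}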

Even if merging the current, partial implementation of MPI3 support could be useful for testing the new communications, I agree that support for land suppression is an important factor, so we can postpone the merge to 2020, when the graph topology will be implemented to support land suppression and 9-point stencil exchanges.
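For context, a minimal sketch of how such a graph topology could be declared with MPI3 (the subroutine and argument names are illustrative):

{{{
#!fortran
! Minimal sketch: a distributed graph communicator whose neighbour lists
! skip eliminated land-only subdomains, so the same neighbourhood
! collectives can serve irregular domains and 9-point stencils.
SUBROUTINE make_graph_comm( knb_neigh, kneigh, kcomm_graph )
   USE mpi
   IMPLICIT NONE
   INTEGER, INTENT(in   ) ::   knb_neigh            ! number of ocean neighbours
   INTEGER, INTENT(in   ) ::   kneigh(knb_neigh)    ! their ranks (land ranks excluded)
   INTEGER, INTENT(  out) ::   kcomm_graph
   INTEGER ::   ierr
   !
   ! sources = destinations because the halo exchange is symmetric
   CALL mpi_dist_graph_create_adjacent( mpi_comm_world,                &
      &   knb_neigh, kneigh, mpi_unweighted,                           &
      &   knb_neigh, kneigh, mpi_unweighted,                           &
      &   mpi_info_null, .TRUE., kcomm_graph, ierr )
END SUBROUTINE make_graph_comm
}}}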

Last modified on 2019-11-29T22:57:59+01:00