Changes between Version 11 and Version 12 of 2019WP/HPC-12_Mocavero_mpi3

2019-11-29T22:38:49+01:00


  • 2019WP/HPC-12_Mocavero_mpi3

    v11 v12  
    49 49 1)      Is the MPI3 standard now common enough that all centres/users of NEMO will be able to compile and run the model? (It looks like it was introduced in 2012, but I'm not sure whether it is implemented in all recent MPI distributions.)
     51 ''Many MPI distributions (Intel-MPI, OpenMPI, MPICH) are compliant with the MPI3 standard. If any centre does not use one of these distributions, upgrading its MPI libraries would not be too hard.''
    51 53 2)      Because the implementation is limited to a single routine, it will be difficult to test the impact on performance. Moreover, if problems arise, debugging will be difficult because that one routine uses a different approach to exchanging messages.
     55 ''The call to MPI3 neighbourhood collectives has been added to only one routine in order to test the improvement before changing the whole code. The approach can easily be extended to all routines that use a 5-point stencil. The performance impact of the MPI3 halo exchange can be evaluated by measuring the time spent executing the routine.''
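A minimal sketch of the bookkeeping behind a 5-point stencil halo exchange, in Python for brevity. It lists the four neighbours of a subdomain on a 2D process grid and the per-neighbour counts and displacements that a neighbourhood collective would need. The row-major rank layout, the west/east/south/north ordering, and the names `cart_neighbours` and `halo_layout` are assumptions of this sketch, not taken from the NEMO code.

```python
def cart_neighbours(rank, npx, npy, periodic_x=False, periodic_y=False):
    """Return the neighbours of `rank` as [west, east, south, north].

    Assumption: ranks are laid out row by row, rank = iy * npx + ix.
    None marks a missing neighbour at a closed boundary.
    """
    ix, iy = rank % npx, rank // npx

    def shift(i, n, step, periodic):
        j = i + step
        if periodic:
            return j % n
        return j if 0 <= j < n else None

    neighbours = []
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        jx = shift(ix, npx, dx, periodic_x)
        jy = shift(iy, npy, dy, periodic_y)
        neighbours.append(None if jx is None or jy is None else jy * npx + jx)
    return neighbours


def halo_layout(ni, nj, halo=1):
    """Counts and displacements for one packed buffer, one slot per neighbour.

    West/east halos are columns of nj points, south/north halos are rows
    of ni points, each `halo` points wide.
    """
    counts = [nj * halo, nj * halo, ni * halo, ni * halo]
    displs = [sum(counts[:k]) for k in range(len(counts))]
    return counts, displs
```

For example, `cart_neighbours(4, 3, 3)` describes the central subdomain of a 3x3 grid, and `halo_layout(10, 20)` gives the buffer layout for a 10x20 local domain with a one-point halo.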
    56 58 These two issues should be raised with the whole system team for discussion, and potentially all centres should test whether the code compiles and runs without issues BEFORE the decision to merge this development into the trunk is made. It would be good to have a logical flag controlling the use of this option (for now in traadv_fct.F90). This is a change to a major component of NEMO, so not only the SETTE tests and test cases but also the operational configurations used in the centres should be tested if possible.
     60 ''A compiler key can easily be introduced to control the use of the new communications, depending on the performance achieved on different systems and configurations.''
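As a rough illustration of such a switch, here is a Python sketch of the runtime-flag variant; a compiler key would make the same choice at preprocessing time instead. The exchange is selected once, both variants share one argument list, and a small timer compares them. All names here (`exchange_point_to_point`, `exchange_mpi3`, `select_lbc_lnk`, `timed`) are hypothetical stand-ins, and the two exchange functions do no real communication.

```python
import time


def exchange_point_to_point(field):
    """Stand-in for the existing halo exchange (hypothetical name)."""
    return "point-to-point", field


def exchange_mpi3(field):
    """Stand-in for the neighbourhood-collective exchange (hypothetical name)."""
    return "mpi3", field


def select_lbc_lnk(use_mpi3):
    """Pick the exchange once, at setup; callers never see which one is used."""
    return exchange_mpi3 if use_mpi3 else exchange_point_to_point


def timed(exchange, field, repeats=100):
    """Return the mean wall-clock seconds per call of exchange(field)."""
    start = time.perf_counter()
    for _ in range(repeats):
        exchange(field)
    return (time.perf_counter() - start) / repeats
```

Calling `timed(select_lbc_lnk(True), field)` and `timed(select_lbc_lnk(False), field)` on the same field would give the kind of per-system comparison the reply refers to.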
    59 62 I would personally propose a different strategy, which requires more development before merging this change into the trunk but avoids the potential problem outlined in point 1 and addresses point 2.
    62 65 After the exchange for land suppression has been implemented, a key_mpi3 should be introduced. If it is defined, neighbourhood collective communications will be used for the halo exchange; if not, the current method will be used. There would be no need to name the newly added routines differently: the exchange method will be set at the preprocessing stage. My understanding is that the argument list of lbc_lnk will be exactly the same in both (new/old) cases. We would have two different approaches to halo exchange depending on the key key_mpi3 (in the same way that the XIOS1 and XIOS2 interfaces coexisted in the past). That way, the changes in the source code will be limited to files in LBC only. If we are not happy with the performance of MPI3, we can continue using the old approach until MPI3 (or a newer standard) performs sufficiently well. If we are happy with the performance, we can simply remove the old method and key_mpi3 (after warning users about the change in advance).
     67 ''Although merging the current, partial implementation of MPI3 support could be useful for testing the new communications, I agree that land support is an important factor, so we can postpone the merge to 2020, when the graph topology will be implemented to support land suppression and 9-point stencil exchanges.''
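To make the graph-topology idea concrete, the following Python sketch builds the neighbour lists for a 9-point stencil when land-only subdomains are suppressed. Only ocean subdomains receive a rank, and each keeps up to eight neighbours. The row-by-row renumbering, the closed (non-periodic) boundaries, and the names `ocean_ranks` and `graph_neighbours` are assumptions of this sketch; per-rank lists of this kind are the sort of input a distributed graph communicator is built from.

```python
def ocean_ranks(ocean):
    """Map (iy, ix) -> rank, numbering only ocean subdomains row by row."""
    ranks = {}
    for iy, row in enumerate(ocean):
        for ix, wet in enumerate(row):
            if wet:
                ranks[(iy, ix)] = len(ranks)
    return ranks


def graph_neighbours(ocean):
    """Return {rank: [neighbour ranks]} for a 9-point stencil.

    `ocean` is a 2D list of booleans, True where a subdomain holds ocean
    points. Land-only subdomains get no rank and appear in no list.
    """
    ranks = ocean_ranks(ocean)
    offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
               if (dy, dx) != (0, 0)]
    graph = {}
    for (iy, ix), rank in ranks.items():
        graph[rank] = [ranks[(iy + dy, ix + dx)] for dy, dx in offsets
                       if (iy + dy, ix + dx) in ranks]
    return graph
```

With `ocean = [[True, True, False], [True, False, True]]`, four ranks remain, and the diagonal link between the subdomains at (0, 1) and (1, 2) is kept, which a 5-point stencil would not include.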
    64 69 {{{#!box help