ticket/0679_mpp_sca (diff) – NEMO

Changes between Initial Version and Version 1 of ticket/0679_mpp_sca


Timestamp:
2010-06-10T12:45:14+02:00
Author:
acc
     1[[PageOutline]] 
     2Last edited [[Timestamp]] 
     3 
     4[[BR]] 
     5 
     6'''Author''' : acc  
     7 
     8'''ticket''' : #679 
     9 
     10'''Branch''' : [https://forge.ipsl.jussieu.fr/nemo/browser/branches/DEV_1879_mpp_sca     DEV_1879_mpp_sca ]  
     11---- 
     12 
     13=== Description === 
     14 
     15This branch introduces code to minimise the use of the mpi_allgather operation during the north-fold exchanges. PRACE investigators found significant  
     16performance gains with similar changes when using large numbers of processors. [[BR]] 
     17 
     18'''Method'''[[BR]] 
     19A new routine is introduced into opa.F90 (opa_northcomms) that uses the existing method to work out which 
     20other processors are directly involved in the north fold exchanges. It does this for T,U,V,F points and uses the 
     21masks so that the neighbours won't be included if the boundary is wholly land.  
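
As a rough illustration of the neighbour search described above, the following sketch works out which northern-row processors hold columns that fold onto a given processor, and drops those whose contributing columns are all land. It is not the opa_northcomms code: the function name `north_fold_neighbours`, the simple mirror `i -> n_glo - 1 - i`, and the single-row mask are assumptions made for illustration. The real index offsets depend on the grid-point type (T, U, V, F) and on the fold pivot, and halos are ignored here.

```python
def north_fold_neighbours(col_ranges, top_row_mask, me):
    """Return northern-row processors whose columns fold onto processor `me`.

    col_ranges   : list of (start, stop) global column ranges, one per processor
    top_row_mask : 1 for ocean, 0 for land, one entry per global column
    me           : index of the processor being considered
    """
    n_glo = len(top_row_mask)
    lo, hi = col_ranges[me]
    # Image of my columns under the simplified (assumed) fold i -> n_glo - 1 - i
    m_lo, m_hi = n_glo - hi, n_glo - lo
    active = []
    for p, (p_lo, p_hi) in enumerate(col_ranges):
        o_lo, o_hi = max(p_lo, m_lo), min(p_hi, m_hi)
        if o_lo >= o_hi:
            continue  # processor p holds none of the columns folded onto me
        if any(top_row_mask[o_lo:o_hi]):
            active.append(p)  # at least one ocean column, so p is "active"
    return active


# Hypothetical 20-column northern row split unevenly over 4 processors,
# with the last four columns entirely land.
ranges = [(0, 6), (6, 11), (11, 16), (16, 20)]
mask = [1] * 16 + [0] * 4
for rank in range(len(ranges)):
    print(rank, north_fold_neighbours(ranges, mask, rank))
```

In this toy layout processor 0 would otherwise exchange with processors 2 and 3, but processor 3 is skipped because the columns it would contribute are wholly land.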
     22 
     23Once those lists have been established, the mpp_lbc_north routines (in lib_mpp.F90) will employ them to only exchange with 
     24"active" neighbours. These exchanges populate the same ztab array that the mpi_allgather method uses, after which the 
     25lbc_nfd routine is called to carry out the fold operation. The difference is that instead of filling the whole ztab array (which requires 
     26every northern row processor to communicate with every other northern row processor), only those gridcells that will be folded onto an individual  
     27processor's domain are exchanged. The reduction in communication should lead to performance gains when using large numbers of 
     28processors. 
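
To give a feel for the reduction in communication, the sketch below counts, per northern-row processor and per grid row, how many columns must arrive from other processors under each approach. The function names, the mirror `i -> n_glo - 1 - i`, and the 1440-column, 32-processor layout are hypothetical choices for illustration only; halos, multiple rows and the land mask are ignored, so the numbers indicate scaling rather than measured cost.

```python
def gathered_columns(col_ranges):
    """Columns each processor receives from others when the whole row is gathered."""
    n_glo = col_ranges[-1][1]
    return [n_glo - (hi - lo) for lo, hi in col_ranges]


def folded_columns(col_ranges):
    """Columns each processor receives from others when only the columns
    that fold onto its own domain are exchanged."""
    n_glo = col_ranges[-1][1]
    counts = []
    for lo, hi in col_ranges:
        # Image of this processor's columns under the simplified (assumed) fold
        m_lo, m_hi = n_glo - hi, n_glo - lo
        # Columns of that image already held locally need no communication
        own = max(0, min(hi, m_hi) - max(lo, m_lo))
        counts.append((hi - lo) - own)
    return counts


# Hypothetical even split of a 1440-column row over 32 processors
width = 45
ranges = [(k * width, (k + 1) * width) for k in range(32)]
print(gathered_columns(ranges)[0], folded_columns(ranges)[0])
```

In this layout each processor receives 1395 columns per row when the full row is gathered, against 45 when only its folded image is exchanged, and the gap widens as the number of processors along the northern row grows.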
     29 
     30The current implementation has been successfully tested in standard ORCA2 and ORCA1 configurations. Test results are identical with and without the modifications.  
     31For these configurations, there is no degradation in performance. 
     32 
     33Still to do: 
     34 
     35 1. Work out how to deal with 'I' points 
     36 2. Check that the method works successfully when land-only regions have been discarded (i.e. jpnij /= jpni*jpnj) 
     37 3. Demonstrate and quantify the benefit with ORCA025 and ORCA12. 
     38 
     39 
     40 
     41 
     42 
     43---- 
     44=== Testing === 
     45Testing could consider (where appropriate) other configurations in addition to NVTK. 
     46 
     47||NVTK Tested||!'''NO!'''|| 
     48||Other model configurations||YES|| 
     49||Processor configurations tested||ORCA2: 2x2 and 8x4; ORCA1: 8x4 || 
     50||If adding new functionality please confirm that the [[BR]]New code doesn't change results when it is switched off [[BR]]and !''works!'' when switched on||YES|| 
     51 
     52=== Bit Comparability === 
     53||Does this change preserve answers in your tested standard configurations (to the last bit) ?||!'''YES/NO !'''|| 
     54||Does this change bit compare across various processor configurations. (1xM, Nx1 and MxN are recommended)||!'''YES/NO!'''|| 
     55||Is this change expected to preserve answers in all possible model configurations?||!'''YES/NO!'''|| 
     56||Is this change expected to preserve all diagnostics? [[BR]]!,,!''Preserving answers in model runs does not necessarily imply preserved diagnostics. !''||!'''YES/NO!'''|| 
     57 
     58If you answered !'''NO!''' to any of the above, please provide further details: 
     59 
     60 * Which routine(s) are causing the difference? 
     61 * Why are the changes not protected by a logical switch or new section-version? 
     62 * What is needed to achieve regression with the previous model release (e.g. a regression branch, hand-edits etc). If this is not possible, explain why not. 
     63 * What do you expect to see occur in the test harness jobs? 
     64 * Which diagnostics have you altered and why have they changed? Please add details here........ 
     65 
     66---- 
     67=== System Changes === 
     68||Does your change alter namelists?||NO|| 
     69||Does your change require a change in compiler options?||NO|| 
     70 
     71---- 
     72=== Resources === 
     73!''Please !''summarize!'' any changes in runtime or memory use caused by this change......!'' 
     74 
     75---- 
     76=== IPR issues === 
     77||Has the code been wholly (100%) produced by NEMO developers staff working exclusively on NEMO?||YES|| 
     78