| 1 | [[PageOutline]] |
| 2 | Last edited [[Timestamp]] |
| 3 | |
| 4 | [[BR]] |
| 5 | |
| 6 | '''Author''' : acc |
| 7 | |
| 8 | '''Ticket''' : #679 |
| 9 | |
| 10 | '''Branch''' : [https://forge.ipsl.jussieu.fr/nemo/browser/branches/DEV_1879_mpp_sca DEV_1879_mpp_sca ] |
| 11 | ---- |
| 12 | |
| 13 | === Description === |
| 14 | |
| 15 | This branch introduces code to minimise the use of the mpi_allgather operation during the north-fold exchanges. PRACE investigators found significant |
| 16 | performance gains with similar changes when using large numbers of processors. [[BR]] |
| 17 | |
| 18 | '''Method'''[[BR]] |
| 19 | A new routine, opa_northcomms, is introduced into opa.F90. It uses the existing method to work out which |
| 20 | other processors are directly involved in the north-fold exchanges. It does this for T, U, V and F points, and uses the |
| 21 | masks so that a neighbour is not included if its part of the boundary is wholly land. |
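The partner search described above can be illustrated with a minimal Python sketch. It is not the NEMO implementation. It assumes a single northern row split into contiguous column ranges, one per processor, and a T-point-pivot fold in which global column ji takes its value from column jpiglo - ji + 2 (T, V points) or jpiglo - ji + 1 (U, F points). The names partition, owner, fold_partners and OFFSET are hypothetical, and the 1-D mask_row stands in for the real 2-D masks.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical column offsets for a T-point-pivot fold: a point at global column ji
# takes its value from column jpiglo - ji + OFFSET[cd_type]. Assumed, not read from opa.F90.
OFFSET = {"T": 2, "V": 2, "U": 1, "F": 1}


def partition(jpiglo: int, jpni: int) -> List[Tuple[int, int]]:
    """Split global columns 1..jpiglo into jpni contiguous inclusive ranges,
    one per northern-row processor."""
    base, extra = divmod(jpiglo, jpni)
    ranges, start = [], 1
    for p in range(jpni):
        width = base + (1 if p < extra else 0)
        ranges.append((start, start + width - 1))
        start += width
    return ranges


def owner(i: int, ranges: Sequence[Tuple[int, int]]) -> int:
    """Index of the processor whose column range contains global column i."""
    for p, (lo, hi) in enumerate(ranges):
        if lo <= i <= hi:
            return p
    raise ValueError(f"column {i} is outside the grid")


def fold_partners(jpiglo: int, ranges: Sequence[Tuple[int, int]],
                  mask_row: Sequence[int], cd_type: str = "T") -> Dict[int, Set[int]]:
    """For each northern-row processor p, return the processors holding the source
    columns that fold onto p's columns. Source columns where mask_row is 0 (land)
    are skipped, so a neighbour whose contribution is wholly land is dropped.
    mask_row[i - 1] is the mask at global column i; p may appear in its own set."""
    off = OFFSET[cd_type]
    partners: Dict[int, Set[int]] = {p: set() for p in range(len(ranges))}
    for p, (lo, hi) in enumerate(ranges):
        for ji in range(lo, hi + 1):
            src = jpiglo - ji + off
            if 1 <= src <= jpiglo and mask_row[src - 1]:
                partners[p].add(owner(src, ranges))
    return partners


ranges = partition(12, 4)
print(fold_partners(12, ranges, [1] * 12))            # {0: {3}, 1: {2, 3}, 2: {1, 2}, 3: {0, 1}}
print(fold_partners(12, ranges, [1] * 9 + [0] * 3))   # processor 0 needs nobody
```

In the all-ocean example each processor needs data from at most two processors, whereas mpi_allgather has every processor receive from all jpni-1 others. Marking the last three columns as land removes processor 3 from processor 0's list entirely.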
| 22 | |
| 23 | Once those lists have been established, the mpp_lbc_north routines (in lib_mpp.F90) use them to exchange data only with |
| 24 | "active" neighbours. These exchanges populate the same ztab array that the mpi_allgather method uses, after which the |
| 25 | lbc_nfd routine is called to carry out the fold operation. The difference is that, instead of filling the whole ztab array (which requires |
| 26 | every northern-row processor to communicate with every other northern-row processor), only the gridcells that will be folded onto an individual |
| 27 | processor's domain are exchanged. This reduction in communication should improve performance when using large numbers of |
| 28 | processors. |
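The claim that exchanging only the folded gridcells gives the same fold while moving less data can be checked with a small Python sketch. It is a hedged illustration, not the lib_mpp.F90 code. It assumes a single row, contiguous column ranges per northern-row processor, and a T-point-pivot fold in which column ji takes its value from column jpiglo - ji + 2. ztab is modelled as a dict keyed by global column, and partition, source_column, fold and compare are hypothetical names.

```python
import math
from typing import Dict, List, Tuple


def partition(jpiglo: int, jpni: int) -> List[Tuple[int, int]]:
    """Split global columns 1..jpiglo into jpni contiguous inclusive ranges."""
    base, extra = divmod(jpiglo, jpni)
    ranges, start = [], 1
    for p in range(jpni):
        width = base + (1 if p < extra else 0)
        ranges.append((start, start + width - 1))
        start += width
    return ranges


def source_column(jpiglo: int, ji: int) -> int:
    """Column whose value folds onto column ji (T-point pivot, assumed)."""
    return jpiglo - ji + 2


def fold(ztab: Dict[int, float], jpiglo: int, lo: int, hi: int,
         psgn: float = 1.0) -> Dict[int, float]:
    """Folded values for this processor's own columns lo..hi, read from ztab."""
    return {ji: psgn * ztab[source_column(jpiglo, ji)] for ji in range(max(lo, 2), hi + 1)}


def compare(jpiglo: int, jpni: int) -> Tuple[int, int, bool]:
    """Return (remote columns received with allgather, remote columns received
    with targeted exchanges, whether every processor folds to identical values)."""
    row = {i: math.sin(i) for i in range(1, jpiglo + 1)}  # synthetic source row
    recv_all = recv_targeted = 0
    identical = True
    for lo, hi in partition(jpiglo, jpni):
        own = set(range(lo, hi + 1))
        # allgather: every other processor's columns arrive, needed or not
        recv_all += jpiglo - len(own)
        # targeted: only the remote columns that fold onto this domain arrive
        needed = {source_column(jpiglo, ji) for ji in range(max(lo, 2), hi + 1)}
        recv_targeted += len(needed - own)
        partial = {i: row[i] for i in own | needed}
        identical = identical and fold(row, jpiglo, lo, hi) == fold(partial, jpiglo, lo, hi)
    return recv_all, recv_targeted, identical


print(compare(12, 4))  # (36, 10, True)
```

For 4 processors on a 12-column row, the targeted exchange receives 10 remote columns instead of 36, and the folded values are identical. Real ztab arrays hold several rows, which scales both counts by the same factor.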
| 29 | |
| 30 | The current implementation has been successfully tested in standard ORCA2 and ORCA1 configurations. Test results are identical with and without the modifications. |
| 31 | For these configurations, there is no degradation in performance. |
| 32 | |
| 33 | Still to do: |
| 34 | |
| 35 | 1. Work out how to deal with 'I' points. |
| 36 | 2. Check that the method works correctly when land-only regions have been discarded (i.e. jpnij /= jpni*jpnj). |
| 37 | 3. Demonstrate and quantify the benefit with ORCA025 and ORCA12. |
| 38 | |
| 39 | |
| 40 | |
| 41 | |
| 42 | |
| 43 | ---- |
| 44 | === Testing === |
| 45 | Testing could consider (where appropriate) other configurations in addition to NVTK. |
| 46 | |
| 47 | ||NVTK Tested||'''NO'''|| |
| 48 | ||Other model configurations||YES|| |
| 49 | ||Processor configurations tested||ORCA2: 2x2 and 8x4; ORCA1: 8x4|| |
| 50 | ||If adding new functionality please confirm that the [[BR]]New code doesn't change results when it is switched off [[BR]]and !''works!'' when switched on||YES|| |
| 51 | |
| 52 | === Bit Comparability === |
| 53 | ||Does this change preserve answers in your tested standard configurations (to the last bit) ?||!'''YES/NO !'''|| |
| 54 | ||Does this change bit compare across various processor configurations. (1xM, Nx1 and MxN are recommended)||!'''YES/NO!'''|| |
| 55 | ||Is this change expected to preserve answers in all possible model configurations?||!'''YES/NO!'''|| |
| 56 | ||Is this change expected to preserve all diagnostics? [[BR]]!,,!''Preserving answers in model runs does not necessarily imply preserved diagnostics. !''||!'''YES/NO!'''|| |
| 57 | |
| 58 | If you answered !'''NO!''' to any of the above, please provide further details: |
| 59 | |
| 60 | * Which routine(s) are causing the difference? |
| 61 | * Why the changes are not protected by a logical switch or new section-version |
| 62 | * What is needed to achieve regression with the previous model release (e.g. a regression branch, hand-edits etc). If this is not possible, explain why not. |
| 63 | * What do you expect to see occur in the test harness jobs? |
| 64 | * Which diagnostics have you altered and why have they changed? Please add details here... |
| 65 | |
| 66 | ---- |
| 67 | === System Changes === |
| 68 | ||Does your change alter namelists?||NO|| |
| 69 | ||Does your change require a change in compiler options?||NO|| |
| 70 | |
| 71 | ---- |
| 72 | === Resources === |
| 73 | !''Please !''summarize!'' any changes in runtime or memory use caused by this change......!'' |
| 74 | |
| 75 | ---- |
| 76 | === IPR issues === |
| 77 | ||Has the code been wholly (100%) produced by NEMO development staff working exclusively on NEMO?||YES|| |
| 78 | |