Form 64 (in 2017WP/HPC-06_Silvia Mocavero_hybrid)

Saved Values

in subcontext 'abstract'

implementation: 'Step 1: the new ZDF package, rewritten by Gurvan, will be considered as the test case. By introducing a ZDF manager that calls all the vertical physics packages, and by moving the lateral boundary condition updates and restart writing to the end of the manager execution, the new package is made compliant with the coarse-grained approach. Step 2: a single OpenMP parallel region that includes all the ZDF packages (only one synchronization is needed); tiling of the MPI subdomain along the horizontal direction (one or more tiles per thread can be tested); natural decomposition in the horizontal direction (no data dependencies). Step 3: implementation of a module for the OpenMP horizontal decomposition; introduction of OpenMP directives within the ZDF manager; tests of restartability and reproducibility of the GYRE_PISCES configuration with different numbers of MPI tasks and OpenMP threads; comparison of tracer.stat and restart files with the original version (rev 8279). Step 4: definition of a test plan: (i) selection of one or more target machines; (ii) parallel efficiency evaluation by fixing the total number of nodes/cores and changing the number of MPI tasks and OpenMP threads; the GYRE_PISCES configuration will be tested at different resolutions, taking into account that the scalability limit increases with the resolution, while the communication overhead increases with the number of MPI tasks.' by mocavero, 2017-09-29T11:55:13+02:00
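
To make Steps 1 and 2 above more concrete, the following is a minimal, hypothetical sketch of the coarse-grained pattern being described: one OpenMP parallel region encloses all the vertical-physics calls, each thread works on one or more horizontal tiles of the local MPI subdomain, and the lateral boundary condition update and restart writing are deferred to the end of the manager. It is written in C with OpenMP for brevity (NEMO itself is Fortran), and every name (zdf_manager, zdf_kernel_a/b, lbc_update, write_restart, the tile layout) is an illustrative assumption, not actual NEMO code.

    #include <stdio.h>
    #include <omp.h>

    #define NI 64                    /* local subdomain size in i (illustrative) */
    #define NJ 64                    /* local subdomain size in j (illustrative) */

    static double field[NJ][NI];     /* stand-in for one 2D slice of model state */

    typedef struct { int j0, j1; } Tile;        /* tile bounds along j */

    /* Stand-ins for vertical-physics (ZDF-like) kernels: column-wise work only,
     * so horizontal tiles have no data dependencies between them.               */
    static void zdf_kernel_a(Tile t) {
        for (int j = t.j0; j < t.j1; j++)
            for (int i = 0; i < NI; i++) field[j][i] += 1.0;
    }
    static void zdf_kernel_b(Tile t) {
        for (int j = t.j0; j < t.j1; j++)
            for (int i = 0; i < NI; i++) field[j][i] *= 0.5;
    }

    /* Stand-ins for the work moved to the end of the manager (Step 1).          */
    static void lbc_update(void)        { /* MPI halo exchange would go here */ }
    static void write_restart(int step) { printf("restart written at step %d\n", step); }

    /* Coarse-grained ZDF manager: a single OpenMP parallel region for all kernels. */
    void zdf_manager(int tiles_per_thread, int step)
    {
        #pragma omp parallel
        {
            int ntiles = omp_get_num_threads() * tiles_per_thread;

            #pragma omp for schedule(static)     /* one or more tiles per thread */
            for (int it = 0; it < ntiles; ++it) {
                Tile t = { (NJ * it) / ntiles, (NJ * (it + 1)) / ntiles };
                zdf_kernel_a(t);                 /* kernels run back to back on  */
                zdf_kernel_b(t);                 /* the same tile; the implicit  */
            }                                    /* barrier is the only sync     */
        }
        lbc_update();                            /* boundary update after region */
        write_restart(step);                     /* restart writing at the end   */
    }

    int main(void) { zdf_manager(2, 1); return 0; }

With this structure the thread team is forked once per manager call rather than once per loop nest, and the only synchronization inside the region is the implicit barrier at the end of the tile loop, which matches the "only one synchronization is needed" note in Step 2.
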
manual: 'Using parts 1 and 2, define the summary of changes to be made in the NEMO reference manual (TeX files) and in the content of the web pages.' by mocavero, 2017-06-22T19:01:49+02:00
description: 'The goal of this action is to evaluate the scalability improvement due to the introduction of a second level of parallelism in NEMO, based on the MPI+X model (where X is OpenMP). The action started in 2016 with the implementation of a fine-grained version based on parallelization at loop level. The limited gain achieved with this approach led us to consider a new coarse-grained approach. This approach requires some code restructuring, but it is less invasive (in terms of the number of OpenMP directives to be introduced in the code) and performs better, since the OpenMP parallel region is extended. Step 1: selection of a representative test case at kernel level. Step 2: design of the coarse-grained approach. Step 3: implementation of the test case and restartability and reproducibility tests. Step 4: evaluation of the performance improvement compared with the MPI-only version on different architectures. Step 5: discussion with the ST about the impact on code restructuring.' by mocavero, 2017-06-22T19:01:49+02:00
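
For contrast with the coarse-grained sketch above, here is an equally hypothetical C/OpenMP illustration of the fine-grained, loop-level style tried in 2016: each loop nest carries its own directive, so the thread team is forked and joined (and a directive must be added) at every loop, which is what limits the achievable gain and makes the approach more invasive.

    #include <omp.h>

    #define NI 64
    #define NJ 64

    /* Fine-grained style: one parallel-for per loop nest (illustrative only). */
    void fine_grained_step(double a[NJ][NI], double b[NJ][NI])
    {
        #pragma omp parallel for collapse(2)     /* directive 1: fork/join here  */
        for (int j = 0; j < NJ; j++)
            for (int i = 0; i < NI; i++)
                a[j][i] += 1.0;

        #pragma omp parallel for collapse(2)     /* directive 2: fork/join again */
        for (int j = 0; j < NJ; j++)
            for (int i = 0; i < NI; i++)
                b[j][i] = 0.5 * a[j][i];
    }

    int main(void)
    {
        static double a[NJ][NI], b[NJ][NI];
        fine_grained_step(a, b);
        return 0;
    }

In the actual NEMO work the directives are Fortran OpenMP directives, but the trade-off is the same: the coarse-grained version aims to replace many such per-loop regions with one region enclosing all the ZDF packages, at the cost of the code restructuring described in the implementation plan.
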

Change History

Changed on 2017-09-29T11:55:13+02:00 by mocavero:

  • implementation changed from
    Step 1: the new ZDF package, rewritten by Gurvan, will be considered as the test case. By introducing a ZDF manager that calls all the vertical physics packages, and by moving the lateral boundary condition updates and restart writing to the end of the manager execution, the new package is made compliant with the coarse-grained approach. Step 2: a single OpenMP parallel region that includes all the ZDF packages (only one synchronization is needed); tiling of the MPI subdomain along the horizontal direction (one or more tiles per thread can be tested); natural decomposition in the horizontal direction (no data dependencies). Step 3: implementation of a module for the OpenMP horizontal decomposition; introduction of OpenMP directives within the ZDF manager; tests of restartability and reproducibility of the GYRE configuration with different numbers of MPI tasks and OpenMP threads; comparison of tracer.stat and restart files with the original version (rev 8279). Step 4: definition of a test plan: (i) selection of one or more target machines; (ii) parallel efficiency evaluation by fixing the total number of nodes/cores and changing the number of MPI tasks and OpenMP threads; the GYRE configuration will be tested at different resolutions, taking into account that the scalability limit increases with the resolution, while the communication overhead increases with the number of MPI tasks.
    to
    Step 1: the new ZDF package, rewritten by Gurvan, will be considered as the test case. By introducing a ZDF manager that calls all the vertical physics packages, and by moving the lateral boundary condition updates and restart writing to the end of the manager execution, the new package is made compliant with the coarse-grained approach. Step 2: a single OpenMP parallel region that includes all the ZDF packages (only one synchronization is needed); tiling of the MPI subdomain along the horizontal direction (one or more tiles per thread can be tested); natural decomposition in the horizontal direction (no data dependencies). Step 3: implementation of a module for the OpenMP horizontal decomposition; introduction of OpenMP directives within the ZDF manager; tests of restartability and reproducibility of the GYRE_PISCES configuration with different numbers of MPI tasks and OpenMP threads; comparison of tracer.stat and restart files with the original version (rev 8279). Step 4: definition of a test plan: (i) selection of one or more target machines; (ii) parallel efficiency evaluation by fixing the total number of nodes/cores and changing the number of MPI tasks and OpenMP threads; the GYRE_PISCES configuration will be tested at different resolutions, taking into account that the scalability limit increases with the resolution, while the communication overhead increases with the number of MPI tasks.

Changed on 2017-07-04T17:49:35+02:00 by mocavero:

  • implementation changed from
    Step 1: the new ZDF package, rewritten by Gurvan, will be considered as the test case. By introducing a ZDF manager that calls all the vertical physics packages, and by moving the lateral boundary condition updates and restart writing to the end of the manager execution, the new package is made compliant with the coarse-grained approach. Step 2: a single OpenMP parallel region that includes all the ZDF packages (no synchronization is needed); tiling of the MPI subdomain along the horizontal direction (one or more tiles per thread can be tested); natural decomposition in the horizontal direction (no data dependencies). Step 3: implementation of a module for the OpenMP horizontal decomposition; introduction of OpenMP directives within the ZDF manager; tests of restartability and reproducibility of the GYRE configuration with different numbers of MPI tasks and OpenMP threads; comparison of tracer.stat and restart files with the original version. Step 4: definition of a test plan: (i) selection of one or more target machines; (ii) parallel efficiency evaluation by fixing the total number of nodes/cores and changing the number of MPI tasks and OpenMP threads; the GYRE configuration will be tested at different resolutions, taking into account that the scalability limit increases with the resolution, while the communication overhead increases with the number of MPI tasks.
    to
    Step 1: the new ZDF package, rewritten by Gurvan, will be considered as the test case. By introducing a ZDF manager that calls all the vertical physics packages, and by moving the lateral boundary condition updates and restart writing to the end of the manager execution, the new package is made compliant with the coarse-grained approach. Step 2: a single OpenMP parallel region that includes all the ZDF packages (only one synchronization is needed); tiling of the MPI subdomain along the horizontal direction (one or more tiles per thread can be tested); natural decomposition in the horizontal direction (no data dependencies). Step 3: implementation of a module for the OpenMP horizontal decomposition; introduction of OpenMP directives within the ZDF manager; tests of restartability and reproducibility of the GYRE configuration with different numbers of MPI tasks and OpenMP threads; comparison of tracer.stat and restart files with the original version (rev 8279). Step 4: definition of a test plan: (i) selection of one or more target machines; (ii) parallel efficiency evaluation by fixing the total number of nodes/cores and changing the number of MPI tasks and OpenMP threads; the GYRE configuration will be tested at different resolutions, taking into account that the scalability limit increases with the resolution, while the communication overhead increases with the number of MPI tasks.

Changed on 2017-06-22T19:01:49+02:00 by mocavero: