
NEMO HPC WG: Mon 20 Mar 2017

Attending: Claire Levy, Tim Graham, Maff Glover, Andy Porter, Miguel Castrillo, Oriol Tintò, Martin Schreiber, Silvia Mocavero, Frederic Dupont

1. Actions from previous meetings

1.1 Tim to check the impact of various schemes on the GYRE configuration

No progress on this point.

1.2 Silvia to explore the memory leaks using the Allinea tool

Summary of the subgroup's activity: in December, Silvia ran tests on the 3.6_stable version with and without XIOS. The tests used XIOS1 and showed increasing memory allocation when the 3.6 version was executed with XIOS in attached mode. During a subgroup meeting in January, Tim suggested testing the code with XIOS2. That analysis was carried out after updating the code to the latest revision of 3.6_stable, and it shows that the execution is not affected by memory leaks. The analysis was then extended to the same revision of the code executed without XIOS and with XIOS1, and these tests confirmed the results obtained with XIOS2. A detailed analysis of the changes between the two revisions of 3.6_stable is needed to explain the difference in behaviour.

Action: Silvia to analyze the code modifications between the two revisions.

2. Investigations of single core performance (Martin, Silvia, Tim)

Martin provided an update on the activity: after the perf_regions tool was integrated into NEMO, Tim wrote a script to extract the measured metrics. Timing and cache performance information is now available for each routine. Although there is a mismatch between the cache performance and the measured bandwidth, some preliminary results could be presented at the meeting in Barcelona.

Action: Martin to provide some slides about the single-core activity, to be integrated into the most appropriate talk (to be decided in the coming days).

3. Updates at the NEMO merge party and to the NEMO trunk (Silvia)

The hybrid parallel version has not been integrated into the trunk because some ST developers have expressed concerns about the code complexity, especially given the limited performance gain delivered by the OpenMP approach. The current OpenMP parallelization is fine-grain. Alternative parallelization approaches (e.g. coarse-grain, tiling) and their impact on code performance and readability will be tested, with continuous feedback from the ST. Silvia has shared some information about the OpenMP parallelization with Andy to get feedback and suggestions from him, given his experience in this area. Dmitry is working on the Intel compiler workshare issue; resolving it should reduce the complexity of the code changes needed for array copy/initialization operations. Tim suggested creating a new branch with a single kernel (e.g. the advection scheme) and testing the different approaches on it. Silvia suggested considering two kinds of kernel (with and without halo exchange). An OpenMP parallel version of kernels developed using the PSyclone-lite approach will also be considered.

Action: Silvia to create the new development branch, starting from the new trunk, and to develop the fine-grain, coarse-grain and tiling versions of the target kernels. Andy to develop the PSyclone-lite version.
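
As a reading aid for the tiling option mentioned in this item, the following is a minimal Python sketch, not NEMO code. It assumes a hypothetical five-point averaging stencil (`stencil_point`) on a small 2D grid and compares a plain loop nest with the same sweep visited tile by tile. Function names, the stencil and the tile sizes are all illustrative assumptions. The sketch only shows that the tiled traversal produces the same result; it says nothing about performance.

```python
# Illustrative only: hypothetical stencil and names, not taken from NEMO.

def stencil_point(f, j, i):
    """Five-point average at interior point (j, i); reads neighbours only."""
    return 0.2 * (f[j][i] + f[j - 1][i] + f[j + 1][i] + f[j][i - 1] + f[j][i + 1])

def sweep_plain(f):
    """One loop nest over all interior points; the outermost row/column is left unchanged."""
    nj, ni = len(f), len(f[0])
    out = [row[:] for row in f]
    for j in range(1, nj - 1):
        for i in range(1, ni - 1):
            out[j][i] = stencil_point(f, j, i)
    return out

def sweep_tiled(f, tj, ti):
    """Same sweep, but the interior is visited in tiles of at most tj x ti points."""
    nj, ni = len(f), len(f[0])
    out = [row[:] for row in f]
    for j0 in range(1, nj - 1, tj):          # loop over tile origins
        for i0 in range(1, ni - 1, ti):
            for j in range(j0, min(j0 + tj, nj - 1)):   # loop inside one tile
                for i in range(i0, min(i0 + ti, ni - 1)):
                    out[j][i] = stencil_point(f, j, i)
    return out

# Small deterministic test field (8 rows x 10 columns).
field = [[float((3 * j + 5 * i) % 7) for i in range(10)] for j in range(8)]
same = sweep_tiled(field, 3, 4) == sweep_plain(field)
print("tiled sweep matches plain sweep:", same)
```

Because every point reads only from the input field and writes to a separate output, the visiting order does not affect the result. This is what allows a tile to be handed to a thread or sized to fit in cache.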

4. PSyclone/NEMO update (Andy)

Since the PSyclone approach could be somewhat invasive for the NEMO code, a new approach (based on the DSL concept) has been tested by STFC within the IS-ENES2 project and is available in a GitHub repository (DSL project). It relies on writing a separate kernel for each loop over the grid points in the advection kernel implemented by CMCC, so that OpenMP, OpenACC or cache tiling can be applied at kernel level to provide performance portability. Frederic suggested that the ST review the code changes to determine whether they are acceptable. The approach could be presented at the meeting in Barcelona to get feedback from the ST. Martin said it would be useful to have some information on the performance improvement of this approach (measured with the perf_regions tool), for example on cache performance. The discussion between the ST and the HPC-WG could continue during the ST videoconferences, by inviting HPC people to those meetings.

Action: Andy to provide some information on the approach and the performance improvement.
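
The idea in this item, one kernel per grid-point loop with the loop strategy chosen outside the kernel, can be sketched roughly as follows. This is a hedged Python illustration only. It is not the PSyclone API and not the code in the DSL repository. The kernel (`upwind_flux_kernel`), the two "schedules" and the sample values are hypothetical names and data introduced for the example.

```python
# Illustrative only: hypothetical kernel and schedules, not PSyclone or NEMO code.

def upwind_flux_kernel(u, t, i):
    """Point kernel: upwind flux through the face between cells i and i+1."""
    upstream = t[i] if u[i] >= 0.0 else t[i + 1]
    return u[i] * upstream

def schedule_serial(kernel, n, *fields):
    """Apply the kernel to points 0..n-1 in a single plain loop."""
    return [kernel(*fields, i) for i in range(n)]

def schedule_blocked(kernel, n, block, *fields):
    """Apply the same kernel block by block; a block could be given to a thread."""
    out = [0.0] * n
    for start in range(0, n, block):
        for i in range(start, min(start + block, n)):
            out[i] = kernel(*fields, i)
    return out

u = [1.0, -2.0, 0.5, -1.0]            # face velocities (made-up values)
t = [10.0, 20.0, 30.0, 40.0, 50.0]    # cell tracer values (made-up values)

flux_serial = schedule_serial(upwind_flux_kernel, len(u), u, t)
flux_blocked = schedule_blocked(upwind_flux_kernel, len(u), 3, u, t)
print(flux_serial)                     # [10.0, -60.0, 15.0, -50.0]
print(flux_serial == flux_blocked)     # True
```

The point kernel is written once and never changes. Only the function that drives the loop differs, which is where OpenMP, OpenACC or tiling would be introduced in a real implementation.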

5. Issues to be discussed in Barcelona (all)

Three talks are scheduled for the HPC session of the meeting in Barcelona: a first talk on the HPC-WG activities, focusing on the main results and on the readability/performance trade-off; a second talk on the BGC HPC issues; and a third talk on single-precision work, proposed by BSC. Claire suggested not spending much time presenting the work done, because the main goal of the meeting is to discuss the questions on which there is a consensus and the open questions. A list of the questions should be sent to the participants in advance to allow a fruitful discussion during the meeting. The talks will briefly summarize the work done and serve as an introduction to the discussion. Mike has prepared a draft of the questions to be sent to the participants.

Action: Tim to discuss the draft list of questions with Mike and to iterate with the other members.

6. Date for next meeting

Silvia proposed holding the next meeting in June. The attendees are available in June.

Action: Mike to send the Doodle poll.