
Version 1 (modified by mocavero, 7 years ago)

NEMO HPC subgroup: Tue 13 Jun 2017

Attending: Claire Levy (CNRS), Mike Bell (Met Office), Tim Graham (Met Office), Matthew Glover (Met Office), Miroslaw Andrejczuk (Met Office), Andy Porter (STFC), Cyril Mazauric (ATOS), Miguel Castrillo (BSC), Silvia Mocavero (CMCC)

1. Single-core performance: integration of NEMO with the perf_regions tool, cache blocking optimizations, and analysis with Paraver/Extrae (Tim, Silvia, Cyril, ...)

Preliminary results on two different systems (at MetO and CMCC) were presented. Some counters do not seem to be evaluated correctly (e.g. cycles with no instructions on the MetO system, branch mispredictions on the CMCC system) and have to be checked. A plot showing the correlations between the different performance counters would be very helpful. Moreover, a roofline model analysis would help to compare actual performance with peak performance on a target system. The analysis should be extended to other GYRE resolutions (e.g. to reach the strong scalability limit) and to realistic configurations (e.g. including sea-ice and biogeochemistry).
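As a reminder of what the roofline analysis mentioned above computes, here is a minimal sketch. The function names are illustrative and are not part of perf_regions or any other tool discussed at the meeting; the peak compute rate and peak memory bandwidth are assumed to be supplied by the user for the target system.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Roofline bound in GFLOP/s for a kernel.

    arithmetic_intensity: FLOPs per byte moved to/from memory.
    peak_gflops:          peak compute rate of the system (GFLOP/s).
    peak_bandwidth_gbs:   peak memory bandwidth of the system (GB/s).
    """
    if arithmetic_intensity <= 0:
        raise ValueError("arithmetic intensity must be positive")
    # The kernel is limited by whichever roof is lower:
    # the compute roof or the memory roof (intensity * bandwidth).
    return min(peak_gflops, arithmetic_intensity * peak_bandwidth_gbs)


def ridge_point(peak_gflops, peak_bandwidth_gbs):
    """Arithmetic intensity above which a kernel becomes compute bound."""
    return peak_gflops / peak_bandwidth_gbs


def fraction_of_roof(measured_gflops, arithmetic_intensity,
                     peak_gflops, peak_bandwidth_gbs):
    """Measured performance as a fraction of the roofline bound."""
    roof = attainable_gflops(arithmetic_intensity, peak_gflops,
                             peak_bandwidth_gbs)
    return measured_gflops / roof
```

A kernel whose arithmetic intensity lies below the ridge point is memory bound, which is the regime where cache blocking is expected to matter most.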

Actions: Silvia to upload the performance counter measurements to the Dropbox (done). Tim and Silvia to continue the investigation, taking into account the suggestions made during the meeting.

2. NEMO optimization from BULL

Preliminary results on NEMO single-core performance on KNL (Knights Landing), running one and multiple instances, were presented. After testing the three main memory modes on KNL, cache mode seems to be the best compromise between performance and ease of use. Scalability analysis on KNL and BDW (Broadwell) shows that node-to-node performance is quite similar, while core-to-core performance is worse on KNL due to its lower processor frequency. Parallel efficiency on KNL decreases quickly because of an inefficient implementation of the MPI library; testing the hybrid version could be helpful. A single-core performance comparison between KNL and Intel CPUs (from SNB up to SKX) was also provided. A performance gain (ratio) of 2 is achieved on SKX w.r.t. SNB (possibly due to vectorization). The branch without wrk_alloc should exploit vectorization better and is worth testing. The impact of running multiple instances on SKX is as high as on the other CPUs. An analysis with the perf_regions tool could be useful.
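For reference, the speedup and parallel efficiency figures discussed above can be derived from wall-clock timings as in the sketch below. The function names are illustrative, and the sketch assumes strong scaling (same problem size at every core count).

```python
def speedup(t_ref, t_n):
    """Speedup of a run taking t_n seconds relative to a reference run
    taking t_ref seconds."""
    return t_ref / t_n


def parallel_efficiency(t_ref, n_ref, t_n, n):
    """Strong-scaling parallel efficiency.

    t_ref, n_ref: wall-clock time and core (or node) count of the
                  reference run.
    t_n, n:       wall-clock time and core (or node) count of the
                  scaled run.
    A value of 1.0 means ideal scaling; lower values mean that part of
    the added resources is lost to communication or other overheads.
    """
    return (t_ref * n_ref) / (t_n * n)
```

Using nodes instead of cores as the unit gives the node-to-node comparison; using cores gives the core-to-core comparison.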

Actions: Cyril to test OpenMP fine-grained version (branches/2016/dev_r6519_HPC_4) on KNL; Cyril to test the branch without wrk_alloc (branches/2017/dev_r7881_no_wrk_alloc); Cyril to test single-core performance on SKX with the perf_regions tool

3. Hybrid parallelization status

The implementation of the coarse-grained approach on the new ZDF package (written by Gurvan) has been completed and will be committed as soon as possible so that it can be tested on different systems. The coarse-grained approach requires only a few directives and reduces the synchronization overhead, although some extra work is needed to handle I/O diagnostics operations within the OpenMP parallel region. On the other hand, the code needs to be rewritten to support the new parallelization approach. Performance data are needed before extending this work to the whole code.

Action: Silvia to commit the ZDF OpenMP version

4. Psyclone-like approach advancements

Andy has started to investigate a solution that reconciles two needs: leaving the NEMO coding rules unchanged, and giving Psyclone the information it needs to introduce optimizations automatically (OpenMP, OpenACC, cache blocking). The proposed solution is a parser that recognizes certain code structures (e.g. loops) and generates the high-level representation of the data structures needed to interface with Psyclone. It is important to investigate how the AGRIF and Psyclone preprocessing steps interact and to determine the most efficient order in which to apply them.
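To illustrate the kind of structure recognition described above, here is a toy sketch that finds DO loop nests in Fortran source text. It is not Psyclone's parser and not Andy's proposed implementation: it is a regex-based illustration that only handles the `DO var = ...` / `END DO` form, ignoring labelled loops, `DO WHILE`, continuation lines and comments.

```python
import re

# Matches "DO <var> =" at the start of a statement (case-insensitive).
DO_START = re.compile(r"^\s*DO\s+(\w+)\s*=", re.IGNORECASE)
# Matches "END DO" or "ENDDO".
DO_END = re.compile(r"^\s*END\s*DO\b", re.IGNORECASE)


def find_loop_nests(source):
    """Return one (start_line, loop_variables) tuple per outermost loop.

    loop_variables lists the loop variables of every DO statement found
    inside that nest, in order of appearance, lower-cased.
    """
    nests = []
    depth = 0
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = DO_START.match(line)
        if match:
            if depth == 0:
                # A DO at depth 0 opens a new outermost nest.
                nests.append((lineno, []))
            depth += 1
            nests[-1][1].append(match.group(1).lower())
        elif DO_END.match(line) and depth > 0:
            depth -= 1
    return nests
```

Applied to a triple loop over `jk`, `jj`, `ji`, the function reports a single nest with those three loop variables, which is the sort of information a tool would need before deciding where to add OpenMP directives or cache blocking.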

Actions: Andy to continue investigating the new Psyclone-like approach; Tim to provide Andy with an example of AGRIF pre-processed code

5. HPC chapter of the Development Strategy document

The draft of the chapter has been finalized by Tim and Silvia. Comments and suggestions from Claire, Mike and Andy have been integrated. The draft will be merged with the other chapters and sent to the Developers Committee by the end of this week. Further adjustments can be made until the beginning of July.

6. Next meeting call

The next meeting will be held in the last week of August or the first week of September.

Action: Silvia to send the Doodle poll for the next meeting.
