Version 1 (modified by mocavero, 3 years ago)

NEMO HPC subgroup: Mon 16 Oct 2017

Attending: Mike Bell (Met Office), Tim Graham (Met Office), Miroslaw Andrejczuk (Met Office), Martin Price (Met Office), Andy Porter (STFC), Miguel Castrillo (BSC), Mario Acosta (BSC), Claire Levy (CNRS), Sebastien Masson (CNRS), Dmitry Kuts (Intel), Martin Schreiber (Uniexe), Silvia Mocavero (CMCC)

1. Single-core performance

Tim and Silvia analysed, at routine level, the correlation between the difference in elapsed time and the difference in LLC (last-level cache) misses as the number of instances running on the same socket increases. The tests were performed on a 10x10 domain with 31 vertical levels. The analysis identifies the routines affected by memory access (tracer advection and lateral diffusion), where cache blocking could improve performance. Tests on different domain sizes would be useful to check whether the behaviour is confirmed.
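As an illustration of the kind of routine-level analysis described above, the following is a minimal sketch that computes a Pearson correlation between the change in elapsed time and the change in LLC misses over increasing instance counts. The routine names and all numbers are hypothetical placeholders, not measurements from these tests, and the actual analysis may have been done differently.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# HYPOTHETICAL data: for each routine, the increase in elapsed time (s)
# and in LLC misses relative to a single instance, for 2, 4, 8 and 16
# instances on the socket. Placeholder values only.
measurements = {
    "routine_a": ([0.02, 0.05, 0.11, 0.24], [1.0e5, 2.6e5, 5.9e5, 1.2e6]),
    "routine_b": ([0.01, 0.01, 0.02, 0.01], [3.0e4, 9.0e4, 1.5e5, 4.0e5]),
}

for name, (delta_time, delta_llc) in measurements.items():
    r = pearson(delta_time, delta_llc)
    print(f"{name}: correlation = {r:.2f}")
```

A routine whose coefficient is close to 1 would be a candidate for cache blocking; a low coefficient suggests the slowdown has another cause.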

Actions: Silvia and Tim to perform the same tests with other domain sizes; Silvia to test cache blocking on tracer advection and lateral diffusion routines
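For reference, cache blocking restructures a loop nest so that it works on one small block of the domain at a time, keeping that block's data in cache. The sketch below applies a generic 5-point stencil (a stand-in, not the NEMO tracer advection or lateral diffusion code) first in a single sweep and then block by block, and checks that both give the same result. The function names and block sizes are assumptions made for illustration; in Python the example shows only the loop structure, not the performance effect.

```python
def stencil(field, out, j0, j1, i0, i1):
    """Apply a 5-point stencil to the points in [j0, j1) x [i0, i1)."""
    for j in range(j0, j1):
        for i in range(i0, i1):
            out[j][i] = (field[j - 1][i] + field[j + 1][i]
                         + field[j][i - 1] + field[j][i + 1]
                         - 4.0 * field[j][i])


def sweep_plain(field, out):
    """Single sweep over all interior points."""
    nj, ni = len(field), len(field[0])
    stencil(field, out, 1, nj - 1, 1, ni - 1)


def sweep_blocked(field, out, block_j, block_i):
    """Same computation, visiting the interior one block at a time."""
    nj, ni = len(field), len(field[0])
    for jb in range(1, nj - 1, block_j):
        for ib in range(1, ni - 1, block_i):
            stencil(field, out,
                    jb, min(jb + block_j, nj - 1),
                    ib, min(ib + block_i, ni - 1))


# Small test field with exactly representable values.
nj, ni = 12, 10
field = [[float(j * j + i) for i in range(ni)] for j in range(nj)]
plain = [[0.0] * ni for _ in range(nj)]
blocked = [[0.0] * ni for _ in range(nj)]

sweep_plain(field, plain)
sweep_blocked(field, blocked, block_j=4, block_i=3)
assert plain == blocked  # blocking changes the visit order, not the result
```

In a compiled model the block sizes would be tuned so that the working set of one block fits in the target cache level.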

Tim observed an increase in the total execution time when using the perf_regions tool on the Met Office system, in particular on small domains.

Action: Martin S. and Tim to check why execution slows down when the perf_regions tool is used on the Met Office system (on small domains)

2. Hybrid parallelization status

A tiled implementation of the coarse-grained version has been developed and tested at CMCC. Preliminary results show an improvement over the pure MPI version when the socket is filled. Mondher Chekky tested the original coarse-grained version on the MERCATOR system and confirmed the results obtained by CMCC on the Athena system, namely that the hybrid version and the pure MPI version have the same parallel efficiency. Sebastien is in contact with Yann Meurdesoif, who is developing a library that implements hybrid parallelisation in XIOS. This library could be used in NEMO in place of the current MPI library to implement hybrid parallelisation without many changes to the original code. Details on the library are needed to assess the feasibility of this solution. A lot of work has been done to implement hybrid parallelisation in CROCO, including reorganising the code to make it compatible with the hybrid implementation. Information on the approach and a performance comparison with the pure MPI solution would be useful.

Actions: Silvia to perform computational performance tests on the tiled version of the coarse-grained parallelisation; Sebastien to provide info on the library used in XIOS; Mike to ask Florian (CROCO) to present the approach and results to the HPC WG
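The general idea behind a tiled, coarse-grained hybrid scheme can be sketched as follows: the subdomain owned by one MPI process is split into tiles, and each thread works on whole tiles. This is only an illustration of the concept, not the CMCC implementation; the function names, tile sizes and the use of a Python thread pool in place of OpenMP threads are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def make_tiles(nj, ni, tile_j, tile_i):
    """Split an nj x ni subdomain into tiles (j0, j1, i0, i1), end-exclusive."""
    return [(j0, min(j0 + tile_j, nj), i0, min(i0 + tile_i, ni))
            for j0 in range(0, nj, tile_j)
            for i0 in range(0, ni, tile_i)]


def tile_sum(field, tile):
    """Work done by one thread on one tile (here: a simple sum)."""
    j0, j1, i0, i1 = tile
    return sum(field[j][i] for j in range(j0, j1) for i in range(i0, i1))


def threaded_sum(field, tile_j, tile_i, n_threads):
    """Distribute the tiles over a pool of threads and combine the results."""
    tiles = make_tiles(len(field), len(field[0]), tile_j, tile_i)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partial = list(pool.map(lambda t: tile_sum(field, t), tiles))
    return sum(partial)


# Integer field, so the result does not depend on summation order.
field = [[j * 10 + i for i in range(10)] for j in range(10)]
total = threaded_sum(field, tile_j=4, tile_i=4, n_threads=3)
assert total == sum(map(sum, field))
```

Python threads do not give a real speed-up for this kind of loop; the sketch only shows how tiles map onto threads within one process.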

3. Numerical precision

Oriol is testing single precision on some variables to check both the accuracy of the results and the performance improvement. He uses the Reduced Precision Emulator to investigate which variables can use single precision without affecting the results. A 15% improvement in performance, without changing the results, has been achieved by changing the precision of some global variables in the ice model. The approach can be extended automatically to the whole model, taking care to maintain double precision in the linear algebra solvers.
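To illustrate what emulating reduced precision means, the sketch below rounds double-precision values to IEEE single precision and compares an accumulation carried out in both precisions. It does not use the Reduced Precision Emulator itself; the helper names and the test values are assumptions chosen for the example.

```python
import struct


def to_single(x):
    """Round a Python float (double precision) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(values, single=False):
    """Sum the values, optionally rounding the running total to single."""
    total = 0.0
    for v in values:
        total += v
        if single:
            total = to_single(total)
    return total


def relative_error(values):
    """Relative difference between single- and double-precision sums."""
    reference = accumulate(values, single=False)
    reduced = accumulate([to_single(v) for v in values], single=True)
    return abs(reduced - reference) / abs(reference)


# Hypothetical input: 1000 identical increments of 0.1.
err = relative_error([0.1] * 1000)
print(f"relative error of the single-precision sum: {err:.2e}")
```

A variable would be a candidate for single precision if the error introduced this way stays below the tolerance accepted for the model results.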

Action: Claire to send a revised version of the document she sent some time ago, taking into account the work Oriol is doing and the preliminary results that Mario has summarised

4. PSyClone-like approach

There is a collaboration with the Australian Bureau of Meteorology to apply the existing ocean API in PSyClone to their ocean model. Three person-years will be spent in a new project to continue the work on PSyClone, and some effort will go into continuing the work on NEMO. Experience and information on coarse-grained parallelisation and cache blocking could be useful for the PSyClone transformations to be supported in the future.

Actions: Andy to test the PSyClone approach on ZDF; Silvia to send the link to the branch

5. Next meeting call

The next meeting will be in the last week of November or the first week of December.

Action: Silvia to send the doodle poll for the next meeting.

6. AOB

Martin P. is testing NEMO on the new machine at the Met Office.