'''NEMO HPC subgroup: Tue 05 Dec 2017'''

Attending: Mike Bell (Met Office), Tim Graham (Met Office), Miroslaw Andrejczuk (Met Office), Martin Price (Met Office), Matthew Glover (Met Office), Andy Porter (STFC), Mario Acosta (BSC), Sebastien Masson (CNRS), Martin Schreiber (Uniexe), Silvia Mocavero (CMCC)

Apologies: Claire Levy (CNRS), Marie-Alice Foujols (CNRS)

== 1. CROCO and NEMO performance ==

Laurent was unable to join the meeting. The presentation may be postponed to the next meeting (possibly the enlarged HPC-WG meeting).

'''Action''': Mike to check Laurent's availability

== 2. Hybrid parallelization status ==

Silvia updated the group on the coarse-grained tiled parallelisation of the ZDF package. The new version improves on both the previous version and the pure MPI version at intranode level: the gains in parallel efficiency are 13% and 9% respectively when a node of the CMCC system is fully populated. The next steps are to test the approach on a new kernel (more representative from the communication point of view) and to compare the hybrid parallel approach with the cache blocking technique in terms of intranode improvement (an illustrative sketch of the tiling approach is appended at the end of these minutes). The library used in XIOS to introduce the OpenMP parallelisation is not yet ready to be used.

'''Action''': Silvia to continue to work on the proposed activities

== 3. Single-core performance ==

Tests of the correlation between the difference in execution time and the difference in LLC misses have been performed on the CMCC system with increasing domain sizes (from 10x10 up to 50x50). The trend is confirmed: the increase in execution time when more than one instance is executed within a socket is strictly related to the increase in LLC misses. The implementation of cache blocking in the FCT advection scheme has been started. A key point is the choice of the best block size, which depends on the memory hierarchy parameters and on the code being blocked (a sketch of the technique is appended at the end of these minutes). Cache blocking, like other code transformations, could be integrated into NEMO automatically by using a PSyclone-like parser. The vectorisation issue can be addressed in parallel with the memory access improvements.

'''Actions''': Silvia to continue the investigation; Tim to test different domain sizes on the MetO system

== 4. ESiWACE GA in December and ESiWACE2 preparation ==

The group broadly agrees with the content of the short presentation on NEMO for the ESiWACE GA. The talk should give an idea of the resolutions supported by the consortium: 1 km resolution is not supported by NEMO science today and CMIP global experiments do not require it, so the target resolutions are 1/12° currently, and 1/36° and 1/48° in the future. Only resolutions that required HPC actions should be mentioned in the presentation. Tim suggests quoting a gain of 20% (achieved on several GYRE configurations), rather than 60%, for the removal of wrk_alloc. Benchmarking activity on different HPC systems (following Mike's suggestion) will be added.

'''Action''': Silvia to revise the presentation according to these suggestions

There are three proposals in preparation (IS-ENES3, ESiWACE2 and LC-SPACE-03-EO-2018) which include activities on NEMO HPC aspects. It is important for the HPC-WG to coordinate its contribution to the three proposals. It would be useful to have a list of the HPC tasks, recording each institution's interest in and contribution to them.
'''Actions''': Mike to set up a Google doc listing the activities (from the HPC chapter of the NDS document) by next Monday; all to express their interest in the activities

== 5. Next meeting call ==

The next meeting will be in March, since the enlarged group meeting will take place in January.

'''Action''': Silvia to send the Doodle poll for the next meeting

== 6. AOB ==

Matthew performed some benchmark tests with NEMO-GYRE on a new ARM-based system. The code runs 1.25x faster than on a Broadwell socket, thanks to the higher memory bandwidth. A presentation of his work will be scheduled for the next meeting.
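== Appendix: illustrative code sketches ==

The sketch below illustrates the kind of coarse-grained tiled OpenMP parallelisation discussed under item 2: the horizontal domain is split into tiles and each thread processes whole tiles, calling a column-physics kernel once per tile. This is a minimal sketch, not the actual NEMO implementation; the array sizes, tile counts and the kernel zdf_kernel are hypothetical placeholders.

{{{
#!fortran
! Coarse-grained tiling sketch: whole tiles are the OpenMP work units,
! which keeps scheduling overhead low and improves data locality.
program tiled_zdf_sketch
   implicit none
   integer, parameter :: jpi = 120, jpj = 100, jpk = 31   ! subdomain size (illustrative)
   integer, parameter :: nti = 4, ntj = 4                 ! tiles per horizontal direction
   real(8), allocatable :: pt(:,:,:)
   integer :: it, jt, is, ie, js, je

   allocate( pt(jpi,jpj,jpk) )
   pt = 1.0d0

   ! Distribute whole tiles over threads; a ZDF-like kernel has purely
   ! vertical dependencies, so tiles are independent and thread-safe.
   !$omp parallel do collapse(2) private(is, ie, js, je) schedule(dynamic)
   do jt = 1, ntj
      do it = 1, nti
         is = (it - 1) * jpi / nti + 1
         ie =  it      * jpi / nti
         js = (jt - 1) * jpj / ntj + 1
         je =  jt      * jpj / ntj
         call zdf_kernel( pt, is, ie, js, je )
      end do
   end do
   !$omp end parallel do

contains

   ! Stand-in for a vertical-physics kernel working on one (i,j) tile
   subroutine zdf_kernel( pt, is, ie, js, je )
      real(8), intent(inout) :: pt(:,:,:)
      integer, intent(in)    :: is, ie, js, je
      integer :: ji, jj, jk
      do jk = 2, size( pt, 3 )
         do jj = js, je
            do ji = is, ie
               pt(ji,jj,jk) = 0.5d0 * ( pt(ji,jj,jk) + pt(ji,jj,jk-1) )
            end do
         end do
      end do
   end subroutine zdf_kernel

end program tiled_zdf_sketch
}}}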
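The second sketch illustrates cache blocking of the kind being applied to the FCT advection scheme (item 3). The domain is processed block by block, with a two-pass flux/update computation per block, so the fluxes are still cache-resident when the second pass reuses them; this is where the reduction in LLC misses comes from. The placeholder physics, array names and block sizes are assumptions for illustration only and, as noted above, the best block size has to be tuned to the memory hierarchy of the target machine.

{{{
#!fortran
! Cache-blocking sketch on a two-pass 2-D flux/update computation
program cache_block_sketch
   implicit none
   integer, parameter :: jpi = 1024, jpj = 1024
   integer, parameter :: nbi = 64,  nbj = 64      ! tunable block sizes
   real(8), allocatable :: zt(:,:), zta(:,:), zfu(:,:), zfv(:,:)
   integer :: ji, jj, jib, jjb, jie, jje

   allocate( zt(jpi,jpj), zta(jpi,jpj), zfu(jpi,jpj), zfv(jpi,jpj) )
   zt = 1.0d0 ;  zta = zt ;  zfu = 0.0d0 ;  zfv = 0.0d0

   do jjb = 1, jpj - 1, nbj
      do jib = 1, jpi - 1, nbi
         jie = min( jib + nbi - 1, jpi - 1 )
         jje = min( jjb + nbj - 1, jpj - 1 )
         ! pass 1: upstream-like fluxes on the block (placeholder physics)
         do jj = jjb, jje
            do ji = jib, jie
               zfu(ji,jj) = 0.5d0 * ( zt(ji,jj) + zt(ji+1,jj) )
               zfv(ji,jj) = 0.5d0 * ( zt(ji,jj) + zt(ji,jj+1) )
            end do
         end do
         ! pass 2: flux divergence; the block's fluxes are reused while
         ! still in cache (results go to zta, so blocks stay independent)
         do jj = max( jjb, 2 ), jje
            do ji = max( jib, 2 ), jie
               zta(ji,jj) = zt(ji,jj) - (  zfu(ji,jj) - zfu(ji-1,jj)   &
                  &                      + zfv(ji,jj) - zfv(ji,jj-1) )
            end do
         end do
      end do
   end do
end program cache_block_sketch
}}}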