Changes between Initial Version and Version 1 of Working Groups/HPC/Mins_sub_2017_10_16

2017-10-19T15:14:04+02:00 (3 years ago)


     '''NEMO HPC subgroup: Mon 16 Oct 2017'''
     Attending: Mike Bell (Met Office), Tim Graham (Met Office), Miroslaw Andrejczuk (Met Office), Martin Price (Met Office), Andy Porter (STFC), Miguel Castrillo (BSC), Mario Acosta (BSC), Claire Levy (CNRS), Sebastien Masson (CNRS), Dmitry Kuts (Intel), Martin Schreiber (Uniexe), Silvia Mocavero (CMCC)
     == 1. Single-core performance ==
     Tim and Silvia analysed, at routine level, the correlation between the difference in elapsed time and the number of LLC (last-level cache) misses as the number of instances running on the same socket increases. The tests used a 10x10 domain with 31 vertical levels. The analysis identifies the routines (tracer advection and lateral diffusion) that are affected by memory access and where cache blocking could improve performance. Tests on other domain sizes would help to establish whether this behaviour is confirmed.
     '''Actions''': Silvia and Tim to repeat the tests with other domain sizes; Silvia to test cache blocking on the tracer advection and lateral diffusion routines.
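The cache-blocking idea discussed in this item can be sketched as follows. This is a minimal illustration only, written in Python rather than in NEMO's Fortran; the five-point stencil, the function names and the tile sizes are assumptions chosen for the example and are not taken from the NEMO routines. It shows the loop restructuring (the same points are visited, tile by tile) and not the performance gain, which depends on the cache sizes of the target machine.

```python
def laplacian_plain(field):
    """Five-point Laplacian on interior points, sweeping the whole domain."""
    nj, ni = len(field), len(field[0])
    out = [[0.0] * ni for _ in range(nj)]
    for j in range(1, nj - 1):
        for i in range(1, ni - 1):
            out[j][i] = (field[j][i - 1] + field[j][i + 1]
                         + field[j - 1][i] + field[j + 1][i]
                         - 4.0 * field[j][i])
    return out


def laplacian_blocked(field, tile_j=4, tile_i=4):
    """Same stencil, but the interior is visited one (tile_j x tile_i) tile at a time.

    In a compiled code the tile sizes would be chosen so that the data
    touched by one tile stays in cache (hypothetical defaults here).
    """
    nj, ni = len(field), len(field[0])
    out = [[0.0] * ni for _ in range(nj)]
    for j0 in range(1, nj - 1, tile_j):          # loop over tiles in j
        j1 = min(j0 + tile_j, nj - 1)
        for i0 in range(1, ni - 1, tile_i):      # loop over tiles in i
            i1 = min(i0 + tile_i, ni - 1)
            for j in range(j0, j1):              # loop inside one tile
                for i in range(i0, i1):
                    out[j][i] = (field[j][i - 1] + field[j][i + 1]
                                 + field[j - 1][i] + field[j + 1][i]
                                 - 4.0 * field[j][i])
    return out


# A 10x10 horizontal slice with arbitrary test values.
field = [[float((3 * j + 7 * i) % 11) for i in range(10)] for j in range(10)]
```

Both functions perform the same arithmetic at each point, so the blocked version can be checked against the plain one for any tile size before any timing is attempted.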
     Tim saw the total execution time increase when using the perf_regions tool on the Met Office system, in particular on small domains.
     '''Action''': Martin S. and Tim to check why execution slows down when the perf_regions tool is used on the Met Office system (on small domains).
     == 2. Hybrid parallelization status ==
     A tiled implementation of the coarse-grained version has been developed and tested at CMCC. Preliminary results show an improvement over the pure MPI version when the socket is filled. Mondher Chekky tested the original coarse-grained version on the MERCATOR system and confirmed the results obtained by CMCC on the Athena system, namely that the hybrid version and the pure MPI version have the same parallel efficiency.
     Sebastien is in contact with Yann Meurdesoif, who is developing a library that implements hybrid parallelisation in XIOS. This library could be used in place of the current MPI library in NEMO to implement hybrid parallelisation without many changes to the original code. Details of the library are needed to assess the feasibility of this solution.
     A lot of work has been done to implement hybrid parallelisation in CROCO, including reorganising the code to make it compatible with the hybrid implementation. Information on the approach and a performance comparison with the pure MPI solution would be useful.
     '''Actions''': Silvia to run computational performance tests on the tiled version of the coarse-grained parallelisation; Sebastien to provide information on the library used in XIOS; Mike to ask Florian (CROCO) to present the approach and results to the HPC WG.
     == 3. Numerical precision ==
     Oriol is testing single precision on some variables to check both the accuracy of the results and the performance improvement. He uses the Reduced Precision Emulator to investigate which variables can use single precision without affecting the results. Changing the precision of some global variables in the ice model has given a 15% performance improvement without changing the results. The approach can be extended automatically to the whole model, taking care to maintain double precision in the linear algebra solvers.
     '''Action''': Claire to send a revised version of the document she sent some time ago, taking into account the work Oriol is doing and the preliminary results that Mario has summarised.
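The reason for keeping double precision in accumulation-heavy code such as solvers can be illustrated with a small sketch. It does not use the Reduced Precision Emulator; as an assumption made for this example, single precision is imitated by round-tripping values through a 4-byte float with Python's standard `struct` module, and the function names and the test data are hypothetical.

```python
import struct


def to_single(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(values, single=False):
    """Sum the values, optionally rounding every operand and partial sum to single precision."""
    total = 0.0
    for v in values:
        if single:
            v = to_single(v)
        total += v
        if single:
            total = to_single(total)
    return total


# Arbitrary test data: many small, equal contributions.
values = [0.1] * 100_000

double_sum = accumulate(values)
single_sum = accumulate(values, single=True)

# Relative difference between the single- and double-precision sums.
rel_err = abs(single_sum - double_sum) / double_sum
```

Comparing `rel_err` with a tolerance acceptable for the quantity in question is one simple way to decide, variable by variable, whether a reduction in precision changes the results.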
     == 4. Psyclone-like approach ==
     There is a collaboration with the Australian Bureau of Meteorology to apply the existing ocean API in PSyClone to their ocean model.
     3 person-years will be spent in a new project to continue the work on PSyClone. Some effort will be spent on continuing the work on NEMO.
     Experience with, and information on, coarse-grained parallelisation and cache blocking could be useful for the PSyClone transformations to be supported in the future.
     '''Actions''': Andy to test the PSyClone approach on ZDF; Silvia to send the link to the branch.
     == 5. Next meeting call ==
     The next meeting will be in the last week of November or the first week of December.
     '''Action''': Silvia to send the Doodle poll for the next meeting.
     == 6. AOB ==
     Martin P. is testing NEMO on the new machine at the Met Office.