'''NEMO HPC subgroup: Mon 27 Feb 2017'''

Attending: Claire Levy (CNRS), Mike Bell (Met Office), Tim Graham (Met Office), Miroslaw Andrejczuk (Met Office), Matthew Glover (Met Office), Andy Porter (STFC), Miguel Castrillo (BSC), Oriol Tinto (BSC), Martin Schreiber (Uniexe), Silvia Mocavero (CMCC)

== 1. Actions from previous meetings ==

== 1.1 NEMO WP2017 ==

Claire asked for a list of the actions we would like to carry out in 2017 but do not have the resources for (done). The Steering Committee gave feedback on the need to increase the manpower for the HPC work. The next version of the development strategy document should be written in a way that eases the submission of a new project proposal; on the other hand, HPC activities can be funded in the long term as part of the European infrastructure projects (e.g. IS-ENES, led by Sylvie Joussaume).

== 1.2 FLOPS over-counting on Intel architectures: all to test Andy's parser (at least on some NEMO kernels) before the next meeting ==

No progress on this point.

== 1.3 Integration of NEMO with the perf_regions tool ==

Martin has solved the accuracy problem that affected the original timing computation in NEMO when nested regions were measured. Moreover, performance counters are now also handled in nested regions. Tim has provided the outputs of a first analysis of GYRE with the perf_regions tool (compiled as a static library), and the analysis of these outputs is ongoing. The main problems to be addressed before the roofline analysis can be performed are the well-known FLOPS over-counting and the bandwidth measurement. A first analysis should be completed within two weeks. Tim has developed a Python script to extract the performance-counter data in a more readable form.

'''Action''': Tim to share the Python script by including it in the perf_regions repository. Involved people to provide the first analysis before the meeting in Barcelona.

== 1.4 Perf_regions documentation ==

Andy has integrated POSIX timers, which improve measurement accuracy for short runtimes.
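For reference, the roofline model targeted by the perf_regions analysis can be sketched as follows: attainable performance is bounded by peak compute on one side and by memory bandwidth times arithmetic intensity on the other. This is a minimal illustration; the function name and all numbers are hypothetical, not measured values for any NEMO kernel.

```python
# Hedged sketch of the roofline model: attainable GFLOP/s is the minimum
# of peak compute and (arithmetic intensity x memory bandwidth).
# All figures below are illustrative assumptions, not measurements.
def roofline(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Attainable GFLOP/s at a given arithmetic intensity (FLOP/byte)."""
    return min(peak_gflops, arithmetic_intensity * bandwidth_gbs)

# A low-intensity (memory-bound) stencil kernel vs. a compute-bound one,
# on a hypothetical node with 500 GFLOP/s peak and 100 GB/s bandwidth.
print(roofline(0.25, 500.0, 100.0))  # memory-bound: 25.0 GFLOP/s
print(roofline(10.0, 500.0, 100.0))  # compute-bound: 500.0 GFLOP/s
```

This is why accurate FLOP counts and bandwidth measurements matter: over-counted FLOPS inflate the arithmetic intensity and can misclassify a memory-bound kernel as compute-bound.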
== 1.5 Updates on memory leaks ticket ==

Silvia has analyzed the behavior of NEMO with XIOS2 and has sent a document with the analysis results to the group. The analysis was carried out after updating the code to the latest revision of the 3.6 stable branch and shows that the execution is not affected by memory leaks. The analysis was also extended to the same revision of the code executed without XIOS and with XIOS1; these tests confirmed the results obtained with XIOS2. A detailed analysis of the changes between the two revisions of 3.6 stable is needed to better understand the different behavior.

'''Action''': Silvia to analyze the code modifications between the two versions.

== 2. Presentation on Single Precision (Oriol) ==

Oriol presented the outcomes of an analysis of NEMO run with mixed precision. The performance improvement (~40% in SYPD on 256 cores) and the difference in the outputs (~1° in SST) are reported in the presentation. Miroslaw suggests testing whether the improvement persists with vectorization turned off. Martin comments that the improvement achieved by running IFS in reduced precision is due to the normalization of the coefficients to 1; the same results might not be achievable in NEMO. Claire highlights the importance of looking at this kind of work because of the potential gain. However, the precision requirements of the NEMO community can vary widely, since there is a variety of applications. Tim suggests providing different kinds of variables in NEMO so that different levels of precision can be set. Oriol would like to use an emulator from Oxford to study the behavior of NEMO at different precisions. Mike comments that extra precision could be important when increments are accumulated.

'''Action''': Oriol to share detailed numbers on the performance improvement; to test the improvement with vectorization turned off; to test different precisions using the Oxford emulator.
== 3. PSyclone and NEMO (Andy & Silvia) ==

CMCC and STFC are working together to test the PSyKAl approach on a NEMO kernel. A sequential version of the original code will be modified by CMCC to add vectorization and cache blocking; in the meantime, STFC is developing the new sequential PSyKAl version. A performance comparison will be carried out at the end of the development phase. This work makes it possible not only to compare the performance of the two versions but also to evaluate the complexity of a PSyKAl implementation of a NEMO kernel, and to report on this to the NEMO development community. The stand-alone code is available in a GitHub repository.

== 4. Hybrid parallelization status (Silvia) ==

The OpenMP implementation was discussed during the Merge Party in December and integrated into the trunk. However, some System Team experts are not fully convinced by the modifications, given the loss of code readability and the increase in complexity introduced by the OpenMP parallelization, together with the limited performance gain. Martin suggests sharing a document describing the code-complexity problem so that alternative solutions can be evaluated. Silvia asks for a more general discussion of how to address HPC issues without affecting the readability and flexibility of the code. Mike suggests discussing this issue at the next Enlarged Developer's Committee meeting. Claire suggests considering different OpenMP strategies (e.g. based on tiling), as in the Dynamico atmospheric model, since they seem more promising from a computational point of view; this could convince the System Team to integrate the new developments. Andy points out that Dynamico and NEMO differ in their data layouts, so the tiling approach may not be as efficient for NEMO.
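The tiling idea under discussion can be sketched as follows (an illustrative example, not NEMO code; the function, stencil, and block sizes are hypothetical): a 2D field is processed in small blocks so that each block stays resident in cache, while producing the same result as the untiled loop.

```python
import numpy as np

# Hedged sketch of cache blocking / tiling: apply a 5-point average
# stencil to the interior of a 2D field, one tile at a time. Each tile
# reads only from the original `field`, so tiles are independent and the
# result is identical to the untiled stencil.
def smooth_tiled(field, tj=32, ti=32):
    nj, ni = field.shape
    out = field.copy()
    for j0 in range(1, nj - 1, tj):
        for i0 in range(1, ni - 1, ti):
            j1 = min(j0 + tj, nj - 1)
            i1 = min(i0 + ti, ni - 1)
            # 5-point average over one tile of the interior
            out[j0:j1, i0:i1] = 0.2 * (
                field[j0:j1, i0:i1]
                + field[j0 - 1:j1 - 1, i0:i1] + field[j0 + 1:j1 + 1, i0:i1]
                + field[j0:j1, i0 - 1:i1 - 1] + field[j0:j1, i0 + 1:i1 + 1]
            )
    return out
```

In an OpenMP setting, each tile would be a natural unit of work for a thread; whether this pays off for NEMO's data layout is exactly the open question raised by Andy.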
'''Action''': Silvia to discuss the OpenMP approach used in NEMO with Andy and Martin; the need to combine model developments with HPC optimization strategies will be addressed during the Enlarged Developer's Committee meeting.

== 5. Next meeting call ==

The next meeting will be held in the second half of April.

'''Action''': Silvia to send a Doodle poll for the next meeting.