Changes between Version 1 and Version 2 of Working Groups/HPC/Mins_sub_2017_02_27


Timestamp:
2017-04-20T23:47:01+02:00
Author:
mocavero
Comment:

  • Working Groups/HPC/Mins_sub_2017_02_27

    v1 v2  
    1 '''NEMO HPC subgroup: Thurs 13 Oct 2016''' 
     1'''NEMO HPC subgroup: Mon 27 Feb 2017''' 
    22 
    3 Attending: Mike Bell (Met Office), Miguel Castrillo (BSC), Tim Graham (Met Office), Silvia Mocavero (CMCC), Andy Porter (STFC), Martin Schreiber (Uni Exeter 
     3Attending: Claire Levy (CNRS), Mike Bell (Met Office), Tim Graham (Met Office), Miroslaw Andrejczuk (Met Office), Matthew Glover (Met Office), Andy Porter (STFC), Miguel Castrillo (BSC), Oriol Tinto (BSC), Martin Schreiber (Uni Exeter), Silvia Mocavero (CMCC)
    44 
    55 
     6== 1.   Actions from previous meetings == 
     7  
     8== 1.1  NEMO WP2017 ==
    69 
    7 == 1.   Single-node performance == 
     10Claire asked for a list of the actions we would like to carry out in 2017 but lack the resources for (done).
     11 
     12The Steering Committee gave feedback on the need to increase the manpower for the HPC work. The next version of the development strategy document should be written so that it can easily serve as the basis for a new project submission; in addition, HPC activities can be funded in the long term as part of the European infrastructure projects (e.g. IS-ENES, led by Sylvie Joussaume).
     13 
     14 
     15== 1.2  FLOPS over-counting on Intel architectures: all to test Andy's parser (on at least some NEMO kernels) before the next meeting ==
     16 
     17No progress on this point. 
     18 
     19== 1.3  Integration NEMO/perf_regions tool == 
     20 
     21Martin has solved the accuracy problem that affected the original timing computation in NEMO when nested regions were measured. Moreover, performance counters are now also handled in nested regions. Tim has provided the outputs of a first analysis of GYRE with the perf_regions tool (compiled as a static library), and the analysis of these outputs is ongoing. The main problems to be addressed before the roofline analysis can be performed are the well-known FLOPS over-counting and the bandwidth measurement. A first analysis should be completed in two weeks.
     22Tim has developed a Python script to extract the performance counter data in a more readable form.
     23 
     24 
     25'''Action''': Tim to share the Python script by adding it to the perf_regions repository. The people involved to provide the first analysis before the meeting in Barcelona.
     26 
     27 
     28== 1.4  Perf_regions documentation == 
     29 
     30Andy has integrated POSIX timers, which improve the measurement accuracy for short runtimes.
     31 
     32== 1.5  Updates on memory leaks ticket == 
     33 
     34Silvia has analyzed the behavior of NEMO with XIOS2 and has sent the document with the analysis outputs to the group. The analysis was carried out after updating the code to the latest revision of the 3.6 stable version and shows that the execution is not affected by memory leaks. The analysis was also extended to the same revision of the code executed without XIOS and with XIOS1, and these tests confirmed the results obtained with XIOS2. A detailed analysis of the changes between the two revisions of 3.6 stable is needed to understand the different behavior.
     35 
     36 
     37'''Action''': Silvia to analyze the code modifications between the two versions. 
     38 
     39 
     40== 2.   Presentation on Single Precision (Oriol) == 
    841  
    9 == 1.1  FLOPS over-counting issue & systems (including KNL) access == 
     42Oriol presented the outcomes of an analysis of NEMO run in mixed precision. The performance improvement (~40% in SYPD on 256 cores) and the difference in the outputs (~1° in SST) are reported in the presentation.
    1043 
    11 To solve the over-counting issue, the idea (proposed by Andy and agreed by the subgroup) is to combine the FLOPS count for each loop, provided by a Fortran parser, with the measurements of the other metrics provided by the perf_regions tool.
     44Miroslaw suggests testing the improvement with vectorization turned off.
     45Martin comments that the improvement achieved by running IFS in reduced precision is due to the normalization of the coefficients to 1; the same results might not be achievable in NEMO.
     46Claire highlights the importance of looking at this kind of work because of the potential gain. However, the precision requirements of the NEMO community can differ widely, since there is a variety of applications.
     47Tim suggests providing different kinds of variables in NEMO so that different levels of precision can be set.
     48Oriol would like to use an emulator from Oxford to study the behavior of NEMO with different precisions.
     49Mike comments that extra precision could be important when increments are accumulated.
    1250 
    13 '''Action''': Andy to test the parser on the NEMO code. First results could be available next week. 
    1451 
    15 The group now has access to several Intel architectures, except for KNL. Silvia checked the possibility of accessing KNL nodes provided by Intel; access requires signing an NDA. Since the new Archer, to be delivered this month, will include some KNL processors, the group decided not to follow up on the Intel NDA offer.
    16  
    17 '''Action''': all to check the possibility to access other systems (e.g. ARM), also including KNL. Martin already provided info on the ARM system. 
    18  
    19 == 1.2  status of perf_regions integration in NEMO == 
    20  
    21 The activity is almost complete and the tool has been tested on the GYRE benchmark configuration. Some minor changes to the Python script are still outstanding, and the integration of the ARCH files for the combined compilation of NEMO and the tool is planned.
    22  
    23 '''Action''': Tim and Martin to finalize the Python script. All to add the ARCH files for the systems they have access to. Some initial assessments of the GYRE performance should be done before the Developers' committee meeting (Tim with some assistance from Martin and Andy). This will be quite challenging given that the Developers' meeting is in early December, but it's a good target to aim for.
    24  
    25 == 1.3  perf_regions document == 
    26  
    27 Martin sent the document to the subgroup. The document contains a brief description of the tool. 
    28  
    29 '''Action''': all to check the document to add missing information. 
    30  
    31 Martin proposed to start thinking about a publication on the perf_regions tool, considering two case studies (NEMO and an astrophysics code that will use the tool).
    32  
    33 '''Action''': Martin to send an outline of the paper to understand its scope, all to evaluate the interest to contribute. 
    34  
    35 == 2.   HPC chapter of the NEMO development strategy document - writing process == 
    36   
    37 A draft of the strategy document will be discussed during the next Merge Party (30 Nov - 2 Dec) and the Developer's Committee (7-8 Dec). The chapter should include the list of key points for the HPC development strategy for the next 10 years. Some attention has to be paid to efficiency, flexibility and scalability. The main goal is to run the code efficiently with a local domain of 10 by 10. The current limit is ~40 by 40. The first draft of the HPC chapter has been written by Mike and revised by Silvia and Miguel.
    38  
    39 '''Action''': Mike to integrate Miguel's revisions and to circulate the document within the sub-group. All to check and agree on the list of key points.    
     52'''Action''': Oriol to share detailed numbers on the performance improvement, to test the improvement with vectorization turned off, and to test different precisions using the Oxford emulator.
    4053  
    4154 
    42 == 3.   NEMO Workplan actions (2016-2017) == 
     55== 3.   PSyclone and NEMO (Andy & Silvia) == 
    4356 
    44 The definition of the 2017 NEMO workplan is under discussion. The HPC-WG could contribute to the next workplan with two new actions: (i) analysis of single-core performance by using the perf_regions tool (PI: HPC-subgroup) and (ii) replacement of wrk_alloc with allocate: test of performance improvement and extension strategy (PI: Tim and Silvia). 
     57CMCC and STFC are working to test the PSyKAl approach on a NEMO kernel. CMCC will modify a sequential version of the original code to add vectorization and cache blocking. In the meantime, STFC is developing the new sequential PSyKAl version. The performance of the two versions will be compared at the end of the development phase. This work makes it possible not only to compare the performance of the two versions but also to evaluate the complexity of the PSyKAl implementation of a NEMO kernel and to report on this to the NEMO development community. The stand-alone code is available in a GitHub repository.
     58   
     59== 4.   Hybrid parallelization status (Silvia)   == 
    4560  
     61The OpenMP implementation was discussed during the Merge Party in December and integrated into the trunk. However, some System Team experts are not convinced by the modifications, because of the loss of code readability, the increased complexity introduced by the OpenMP parallelization, and the limited performance gain.
    4662 
    47 '''Action''': Tim and Silvia to propose the new actions to the System Team during the next Merge Party.  
     63Martin suggests sharing a document describing the code complexity problem in order to evaluate alternative solutions.
     64Silvia asks for a more general discussion of how to address HPC issues without affecting the readability and flexibility of the code.
     65Mike suggests discussing this issue during the next Enlarged Developer’s Committee meeting.
     66Claire suggests considering different OpenMP strategies (e.g. based on tiling), as in the Dynamico atmospheric model, since they seem more promising from the computational point of view; this could convince the System Team to integrate the new developments.
     67Andy highlights that Dynamico and NEMO differ in their data layout, so the tiling approach might not be as efficient for NEMO.
    4868 
    49 == 9.   Date for next meeting   == 
     69 
     70'''Action''': Silvia to discuss the OpenMP approach used in NEMO with Andy and Martin; the need to combine model developments and HPC optimization strategies will be addressed during the Enlarged Developer’s Committee meeting.
     71 
     72== 5.   Next meeting call   == 
    5073  
    51 Next meeting will be in the second half of November. 
     74Next meeting will be in the second half of April. 
    5275 
    5376