Changes between Version 1 and Version 2 of Working Groups/HPC/Mins_sub_2016_07_15


Timestamp:
2017-04-20T22:36:44+02:00 (3 years ago)
Author:
mocavero
Comment:

  • Working Groups/HPC/Mins_sub_2016_07_15

- '''NEMO HPC subgroup: Fri 20 Mar 2017'''
+ '''NEMO HPC subgroup: Fri 15 Jul 2016'''

- Attending: Claire Levy, Tim Graham, Maff Glover, Andy Porter, Miguel Castrillo, Oriol Tintò, Martin Schreiber, Silvia Mocavero, Frederic Dupont
+ Attending: Tim Graham, Martin Schreiber, Miguel Castrillo, Oriol Tinto, Cyril Mazauric, Andrew Porter, Silvia Mocavero


+ == 1.   Summary of HPC-WG activities ==
+
+ BSC is working on two new activities to improve NEMO scalability: (i) reducing the communication frequency in the LIM time-splitting scheme by using redundant computation, and (ii) improving the load balance of the north-fold treatment.

- == 1.   Actions from previous meetings ==
+ == 2.   Next steps on single-node performance testing ==
+
+   2.1  Test the benchmark suite by running concurrent sequential instances (deadline: ASAP). Feedback on the test results by email (all).
+
+   2.2  Performance counters branch TODOs (deadline: September):
+
+        2.2.1  Make the output look similar to the current NEMO output (Silvia).
+
+        2.2.2  Replace the gettimeofday call with a more efficient version, taking portability issues into account (Andy).
+
+   2.3  Integrate the performance counters into the NEMO code (deadline: October). Tim to produce a script that automatically integrates the performance counters and timing profiling into the NEMO framework.
+
+ == 3.   Date for next meeting ==

- == 1.1  Tim to check the impact of various schemes on the GYRE configuration ==
-
- No progress on this point.
-
- == 1.2  Silvia to explore the memory leaks using the Allinea tool ==
-
- Summary of the subgroup's activity: in December, Silvia ran some tests on the 3.6_stable version with and without XIOS. The tests, performed with XIOS1, showed an increase in memory allocation when the 3.6 version was run with XIOS in attached mode. During a subgroup meeting in January, Tim suggested testing the code with XIOS2. This analysis was carried out after updating the code to the latest revision of 3.6_stable and showed that the execution is not affected by memory leaks. The analysis was then extended to the same revision run without XIOS and with XIOS1, and these tests confirmed the XIOS2 results. A detailed comparison of the two 3.6_stable revisions is needed to understand the different behavior.
-
- '''Action''': Silvia to analyze the code modifications between the two revisions.
-
- == 2.   Investigation of single-core performance (Martin, Silvia, Tim) ==
-
- Martin provides an update on the activity: after the perf_regions tool was integrated into NEMO, Tim wrote a script to extract the measured metrics. Timing and cache-performance information is now available for each routine. Although there is a mismatch between the cache performance and the measured bandwidth, some preliminary results could be presented at the meeting in Barcelona.
-
- '''Action''': Martin to provide some slides on the single-core activity, to be integrated into the most appropriate talk (to be decided in the next few days).
-
-
- == 3.   Updates from the NEMO merge party and to the NEMO trunk (Silvia) ==
-
- The hybrid parallel version has not been integrated into the trunk because some ST developers expressed concerns about the code complexity, also in view of the limited performance gain of the OpenMP approach. The current OpenMP parallelization is fine-grained. Alternative parallelization approaches (e.g. coarse-grained, tiling) and their impact on code performance and readability will be tested with continuous feedback from the ST.
- Silvia has shared some information about the OpenMP parallelization with Andy to get his feedback and suggestions, given his experience in this area.
- Dmitry is working on the Intel compiler workshare issue; resolving it should reduce the complexity of the code changes needed for array copy/initialization operations.
- Tim suggests creating a new branch with a single kernel (e.g. the advection scheme) and testing the different approaches on it. Silvia suggests considering two kinds of kernels (with and without halo exchange).
- An OpenMP version of the kernels developed with the PSyclone-lite approach will also be considered.
-
-
- '''Action''': Silvia to create the new development branch, starting from the new trunk, and to develop the fine-grained, coarse-grained and tiled versions of the target kernels. Andy to develop the PSyclone-lite version.
-
-
- == 4.   PSyclone/NEMO update (Andy) ==
-
- Since the PSyclone approach could be somewhat invasive for the NEMO code, STFC has tested a new approach based on the DSL concept within the IS-ENES2 project; it is available in a GitHub repository (DSL project). It develops a separate kernel for each loop over grid points in the advection kernel implemented by CMCC, so that OpenMP, OpenACC or cache tiling can be applied at kernel level to provide performance portability.
- Frederic suggests asking the ST to review the code changes to find out whether they are acceptable.
- The approach could be presented at the meeting in Barcelona to get feedback from the ST.
- Martin says it would be useful to have some information on the performance improvement of this approach (measured with the perf_regions tool), for example on cache performance.
- The discussion between the ST and the HPC-WG could continue during the ST videoconferences, with HPC people invited to these meetings.
-
-
- '''Action''': Andy to provide some information on the approach and its performance improvement.
-
-
- == 5.   Issues to be discussed in Barcelona (all) ==
-
- Three talks are scheduled for the HPC session of the meeting in Barcelona: a first talk on the HPC-WG activities, focusing on the main results and on the readability/performance trade-off; a second talk on BGC HPC issues; and a third talk on single-precision work, proposed by BSC.
- Claire suggests not spending much time presenting the work done, because the main goal of the meeting is to discuss both the questions on which there is consensus and the open questions. A list of questions should be sent to the participants beforehand so that the discussion at the meeting is fruitful.
- The talks will briefly summarize the work done and serve as an introduction to the discussion.
- Mike has prepared a draft list of questions to be sent to the participants.
-
-
- '''Action''': Tim to discuss the draft list of questions with Mike and to iterate with the other participants.
-
- == 6.   Date for next meeting ==
-
- Silvia proposed holding the next meeting in June; the attendees are available then.
-
-
- '''Action''': Mike to send the Doodle poll.
+ A Doodle poll will be opened (Silvia).