'''NEMO HPC subgroup: Weds 23 Nov 2016'''

Attending: Mike Bell (Met Office), Miguel Castrillo (BSC), Tim Graham (Met Office), Silvia Mocavero (CMCC), Andy Porter (STFC)

== 1. Actions from previous meetings ==

== 1.1 Access to KNL (@MetO) ==

The system will be available next year. Tim and Mike will have access to it, so some members of the HPC WG will be able to test the code on it.

== 1.2 Chapter on HPC of the NEMO Development Strategy ==

The draft of the chapter was written by Tim and revised by the subgroup. It was discussed during the HPC-WG meeting and is ready for the next Developers Committee. We will revise the document at the beginning of next year.

== 1.3 NEMO WP2017 ==

Actions HPC-1 (CMCC & MetO), HPC-2 (BSC and MERCATOR), HPC-3 (MetO) and HPC-4 (CMCC) of the current WP will be continued during 2017. A new shared action on single-core performance analysis and optimization will be proposed by the HPC subgroup. Two other actions will be proposed by MetO: the removal of wrk_alloc (to be performed as the last step of the Merge Party or just after it) and an investigation into using XIOS for reading input files (useful for operational runs).

'''Action''': Tim and Silvia to propose these actions during the next Merge Party.

== 1.4 FLOPS over-counting on Intel architectures: Andy's parser and Intel SDE64 ==

Andy has developed the first version of the parser and sent an email with installation and execution instructions. The tool provides an estimate of the FLOPS executed by each innermost loop of the parsed code (the whole NEMO code or just a single file). SDE64 can be used on Intel architectures to obtain a comparison with the output of the parser.

'''Action''': all to test both Andy's parser and SDE64 on the NEMO code (or a portion of it).

== 1.5 Integration NEMO/perf_regions tool ==

Tim has shared the perf_regions tool and the updated NEMO branch with Silvia and Miguel, who tested the code.
A problem with the perf_regions tool occurs when nested calls to performance counters are used. Silvia suggests avoiding nested calls to performance counters in NEMO (information on the NEMO call tree could be taken into account by the perf_regions tool). Tim and Mike suggest investigating whether the perf_regions tool can be made to handle nested timer calls.

'''Action''': Martin, Tim and Silvia to discuss which is the best and easiest solution.

== 1.6 Perf_regions documentation ==

Martin has sent the draft of the perf_regions documentation to the other contributors. Silvia has integrated the document.

'''Action''': Tim and Andy to iterate over the document. Andy to contact Martin to resolve the POSIX timers integration.

== 2. Single-core performance test plan ==

Silvia has sent Tim a draft of the document. Tim suggests using low-resolution benchmarks (40x40 and 10x10).

'''Action''': Tim to add his comments and to share the document with the other subgroup members.

== 3. Updates on memory leaks ticket ==

Silvia is investigating the memory leaks issue using the Allinea tool. A long run of the ORCA2-LIM3 configuration has been tested on both the 3.6 stable and trunk codes (without XIOS) using 63 processors. The amounts of allocated and deallocated memory appear to be the same. Tests with XIOS are in progress.

'''Action''': Silvia to send the outputs of the Allinea memory profiler to the subgroup.

== 9. Next meeting call ==

The next meeting will be in January (2nd or 4th week).

'''Action''': Silvia to send the Doodle poll for the next meeting.