
Changes between Version 18 and Version 19 of WorkingGroups/HPC


Timestamp: 2016-03-08T14:43:51+01:00
Author: mikebell
= '''NEMO HPC''' =
Working group leader (and responsible for wiki pages): Mike Bell

----
== Members of the Working group: ==
 * Jeremy Appleyard (NVIDIA)
 * Lucien Anton (CRAY)
 * Mike Bell (Met Office)
 * Tom Bradley (NVIDIA)
 * Miguel Castrillo (BSC)
 * Mondher Chekki (Mercator-Ocean)
 * Marie-Alice Foujols (CNRS)
 * Tim Graham (Met Office)
 * Matt Glover (Met Office)
 * Jason Holt (NOC)
 * Dmitry Kuts (Intel)
 * Claire Levy (CNRS)
 * Gurvan Madec (CNRS)
 * Sébastien Masson (CNRS)
 * Cyril Mazauric
 * Silvia Mocavero
 * Andrew Porter (STFC)
 * Stan Posey (NVIDIA)
 * Martin Schreiber (Univ of Exeter)
 * Kim Serradell (BSC)
 * Oriol Tinto (BSC)
 * Julien le Sommer (CNRS)

----
Old version of page

Working group leader (and responsible for wiki pages): Sébastien Masson.[[BR]]

----

A strong improvement of NEMO scalability is needed to take advantage of the new machines. This probably means a deep review/rewrite of the NEMO code at some point in the future (beyond 5 years from now?). At the same time, we already know that CMIP7 won't use an ocean model that has not been thoroughly tested and validated, and will stick to a NEMO model not far from the existing one. [[BR]] This means that we need to:

 1) keep improving the current structure of NEMO so that it works quite efficiently for almost 10 more years (until the end of CMIP7). [[BR]]
 2) start to work on a new structure that would be fully tested and validated at least for CMIP8, in about 10 years. [[BR]]

Based on this, we propose to divide the work according to 3 temporal windows: [[BR]]

'''0-3 years''': improvements with the existing code: [[BR]]

 1) remove solvers and global sums (to be done in 3.7) [[BR]]
 2) reduce the number of communications: do fewer and bigger communications (group communications, use a larger halo); main priority: communications in the time splitting and in the sea-ice rheology (see the sketch below) [[BR]]
 3) reduce the number of communications: remove useless communications (a lot of them are simply associated with output...) [[BR]]
 4) introduce asynchronous communications [[BR]]
 5) check code vectorization (SIMD instructions) [[BR]]
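
To make item 2 concrete, here is a minimal sketch (in C with MPI; NEMO itself is Fortran, and this is not its actual lbc_lnk machinery) of grouping communications: the halos of several fields are packed into one buffer so that a single exchange replaces one message per field. The field count, domain sizes and east-west-only exchange are illustrative assumptions.

{{{#!c
/* Minimal sketch, not NEMO code: pack the halos of several fields into one
 * buffer so a single MPI exchange replaces one exchange per field.
 * Field count, domain sizes and the east-west-only exchange are invented. */
#include <mpi.h>

#define NI 64              /* local domain size (illustrative) */
#define NJ 64
#define NFIELDS 3          /* fields grouped into one exchange */

void exchange_east_west(double fields[NFIELDS][NJ][NI],
                        int east, int west, MPI_Comm comm)
{
    double sendbuf[NFIELDS * NJ], recvbuf[NFIELDS * NJ];

    /* pack the last interior column of every field into one buffer */
    for (int f = 0; f < NFIELDS; f++)
        for (int j = 0; j < NJ; j++)
            sendbuf[f * NJ + j] = fields[f][j][NI - 2];

    /* one message instead of NFIELDS: the latency is paid once */
    MPI_Sendrecv(sendbuf, NFIELDS * NJ, MPI_DOUBLE, east, 0,
                 recvbuf, NFIELDS * NJ, MPI_DOUBLE, west, 0,
                 comm, MPI_STATUS_IGNORE);

    /* unpack into the west halo column of every field */
    for (int f = 0; f < NFIELDS; f++)
        for (int j = 0; j < NJ; j++)
            fields[f][j][0] = recvbuf[f * NJ + j];
}
}}}

Every message avoided this way saves one network latency per time step, which is why the time splitting and the sea-ice rheology, with their many small exchanges, are named as the main priority.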

'''0-5 years''': improvements through the introduction of OpenMP: [[BR]]
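
As a hedged illustration of what loop-level OpenMP could look like in practice (the field and the 5-point stencil are invented for the example, not NEMO kernels):

{{{#!c
/* Minimal sketch: loop-level OpenMP threading of a horizontal stencil.
 * The field and the stencil are illustrative, not NEMO code. */
#define NI 64
#define NJ 64

void smooth(const double in[NJ][NI], double out[NJ][NI])
{
    /* rows are independent, so threads can share the j loop without locks;
     * the inner i loop stays contiguous for SIMD vectorization */
    #pragma omp parallel for schedule(static)
    for (int j = 1; j < NJ - 1; j++)
        for (int i = 1; i < NI - 1; i++)
            out[j][i] = 0.25 * (in[j][i+1] + in[j][i-1]
                              + in[j+1][i] + in[j-1][i]);
}
}}}

In a hybrid MPI+OpenMP set-up, fewer MPI ranks per node would also mean fewer and bigger subdomains, hence fewer halo messages, which ties this item back to the communication reductions above.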
     
'''beyond 5 years''': [[BR]]

GungHo or not GungHo, that is the question...

== Agenda: ==
     
 * BGC: obviously on-line coarsening significantly reduces the cost of BGC models; further improvement can be achieved by considering the SMS terms of BGC as a big 1D vector and computing over only the required area (ocean points only, ocean and euphotic layer only, etc.; see the sketch below). Same idea for sea-ice physics...

 * Remark: the version of MOM currently under development (MOM6: switch to C-grid, use of a finite-volume approach, ...) is using FMS, a GungHo-type approach... and "There are '''dozens''' of scientists and engineers at GFDL focused on meeting the evolving needs of climate scientists pushing the envelope of computational tools for studying climate"
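
A minimal sketch of the "big 1D vector over ocean points only" idea from the BGC item above; the mask, the tracer layout and the decay term are invented placeholders, not an actual BGC scheme:

{{{#!c
/* Minimal sketch: gather wet (ocean) points into a 1D index vector so the
 * BGC source-minus-sink (SMS) terms are computed only where needed. */
#define NI 64
#define NJ 64

/* build the list of flattened indices where the land-sea mask is ocean */
int build_wet_index(const int mask[NJ][NI], int idx[NJ * NI])
{
    int n = 0;
    for (int j = 0; j < NJ; j++)
        for (int i = 0; i < NI; i++)
            if (mask[j][i] == 1)
                idx[n++] = j * NI + i;
    return n;                          /* number of wet points */
}

/* SMS-like update over the compressed vector: no work on land points, and
 * the 1D stream is friendly to the vectorizer */
void sms_update(double tracer[NJ * NI], const int idx[], int nwet, double dt)
{
    const double decay = 0.1;          /* placeholder for real BGC kinetics */
    for (int n = 0; n < nwet; n++)
        tracer[idx[n]] -= dt * decay * tracer[idx[n]];
}
}}}

Restricting the computation further (ocean and euphotic layer only) is the same pattern with a stricter mask.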

'''Sebastien''' -- (2014 November 17): some ideas I heard about asynchronous communications:
     
 * compute the inner domain during communication of the halo:

In the past, at CMCC we carried out some optimization activities on a regional configuration (covering the Mediterranean basin at 1/16°) of NEMO (v3.2). The performance analysis highlighted the SOR as one of the most computationally intensive kernels, so our optimizations also focused on it. One activity aimed at overlapping communication and computation by changing the algorithm in the following way: (i) halo computation, (ii) asynchronous communications and (iii) computation over the inner domain (overlapped with communication). The new algorithm was evaluated on the old MareNostrum system (decommissioned in 2013), in the context of an HPC-Europa application. It was first evaluated theoretically using the Dimemas tool (developed at BSC), which showed that the new algorithm performed better than the old one, but the experimental results did not confirm the expectations. However, we can plan to test the communication/computation overlap paradigm on new architectures. The idea could be to extract some kernels characterized by the "do loops" you talked about, change the communication algorithm and test it before deciding to extend the modification to the entire code.
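
A minimal sketch of the three-step pattern described above, assuming a 1D east-west decomposition and a Jacobi-like update standing in for the real SOR kernel (all names are illustrative):

{{{#!c
/* Minimal sketch, not NEMO code: (i) compute the edge columns the
 * neighbours need, (ii) start asynchronous exchanges, (iii) compute the
 * inner domain while the messages are in flight. */
#include <mpi.h>

#define NI 64
#define NJ 64

static double stencil(double u[NJ][NI], int j, int i)
{
    return 0.25 * (u[j][i+1] + u[j][i-1] + u[j+1][i] + u[j-1][i]);
}

void overlapped_iteration(double u[NJ][NI], double unew[NJ][NI],
                          int east, int west, MPI_Comm comm)
{
    static double send_e[NJ], send_w[NJ], recv_e[NJ], recv_w[NJ];
    MPI_Request req[4];

    /* (i) halo computation: the columns the neighbours will need next */
    for (int j = 1; j < NJ - 1; j++) {
        unew[j][1]      = stencil(u, j, 1);
        unew[j][NI - 2] = stencil(u, j, NI - 2);
        send_w[j] = unew[j][1];
        send_e[j] = unew[j][NI - 2];
    }

    /* (ii) asynchronous exchange of the freshly computed edge columns
     * (pass MPI_PROC_NULL for east/west on the domain boundary) */
    MPI_Irecv(recv_w, NJ, MPI_DOUBLE, west, 0, comm, &req[0]);
    MPI_Irecv(recv_e, NJ, MPI_DOUBLE, east, 1, comm, &req[1]);
    MPI_Isend(send_e, NJ, MPI_DOUBLE, east, 0, comm, &req[2]);
    MPI_Isend(send_w, NJ, MPI_DOUBLE, west, 1, comm, &req[3]);

    /* (iii) inner domain, overlapped with the messages in flight */
    for (int j = 1; j < NJ - 1; j++)
        for (int i = 2; i < NI - 2; i++)
            unew[j][i] = stencil(u, j, i);

    /* install the received halos, ready for the next iteration */
    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
    for (int j = 1; j < NJ - 1; j++) {
        unew[j][0]      = recv_w[j];
        unew[j][NI - 1] = recv_e[j];
    }
}
}}}

Whether the overlap pays off depends on the MPI library actually progressing the messages while the inner loop runs, which is consistent with the mixed results reported on MareNostrum above.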

 * larger halo: could be done only for some variables, for example those using neighbors of neighbors (see what was done on the SOR solver); a sketch follows.
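
A minimal sketch of the larger-halo idea for a single variable (the width H, the names and the 1D decomposition are illustrative assumptions): a depth-2 halo exchanged once lets a 5-point stencil be applied twice before the next exchange, halving the number of messages for that variable at the price of extra memory and some redundant computation in the halo ring.

{{{#!c
/* Minimal sketch: exchange a depth-2 halo once, then apply the stencil
 * twice before the next exchange. Width and decomposition are invented. */
#include <mpi.h>

#define NI 64
#define NJ 64
#define H  2                       /* halo width: 2 instead of the usual 1 */

void exchange_wide_halo(double u[NJ][NI], int east, int west, MPI_Comm comm)
{
    double send_e[H * NJ], send_w[H * NJ], recv_e[H * NJ], recv_w[H * NJ];

    /* pack the H easternmost / westernmost interior columns */
    for (int h = 0; h < H; h++)
        for (int j = 0; j < NJ; j++) {
            send_e[h * NJ + j] = u[j][NI - 2 * H + h];
            send_w[h * NJ + j] = u[j][H + h];
        }

    MPI_Sendrecv(send_e, H * NJ, MPI_DOUBLE, east, 0,
                 recv_w, H * NJ, MPI_DOUBLE, west, 0,
                 comm, MPI_STATUS_IGNORE);
    MPI_Sendrecv(send_w, H * NJ, MPI_DOUBLE, west, 1,
                 recv_e, H * NJ, MPI_DOUBLE, east, 1,
                 comm, MPI_STATUS_IGNORE);

    /* unpack into the H-wide halo columns on each side */
    for (int h = 0; h < H; h++)
        for (int j = 0; j < NJ; j++) {
            u[j][h]          = recv_w[h * NJ + j];
            u[j][NI - H + h] = recv_e[h * NJ + j];
        }
}
}}}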