2019WP/HPC-04_MCastrillo_HPDAonlineDiagGPU (diff) – NEMO

Changes between Version 8 and Version 9 of 2019WP/HPC-04_MCastrillo_HPDAonlineDiagGPU


Timestamp:
2020-01-07T18:35:08+01:00
Author:
mcastril
Comment:

--

  • 2019WP/HPC-04_MCastrillo_HPDAonlineDiagGPU

    v8 v9  
    4141=== Description 
    4242 
    43  [tf.textarea:description -id=piform -class=taform 'Describe the goal of development, and the methodology.\n\nAdd reference documents or publications if relevant.' 0 20] 
    44  
     43High-performance data analytics solutions for online diagnostics of the NEMO model will be explored as complementary components of the model's diagnostics software ecosystem. Online techniques that leverage fast (low-latency, real-time) data analytics approaches (e.g. on fat nodes) will be evaluated in real cluster environments. In particular, an interface between NEMO and the High Performance Data Analytics (HPDA) framework will be designed and implemented for online diagnostics.  
    4544 
    4645=== Implementation 
    4746 
    48  [tf.textarea:implementation -id=piform -class=taform 'Describe flow chart of the changes in the code.\n\nList the .F90 files and modules to be changed.\n\nDetailed list of new variables (including namelists) to be defined. Give for each the chosen name (following coding rules) and definition.' 0 20] 
     47The rationale of this activity is to improve NEMO's computational performance by executing the diagnostics computations on GPUs. As a first step, the portability of the NEMO diagnostic calculations to GPUs has been analyzed, exploring how to adapt these regions from the current MPI implementation to the CUDA paradigm. A toy model has been created for preliminary tests, which used the dia_hsb diagnostic. The diagnostic code itself runs 50x faster on the GPU than on a single CPU, but the data transfer to and from the GPU is the main bottleneck.  
    4948 
    5049
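
The trade-off reported in the Implementation text (a diagnostic kernel about 50x faster on the GPU, offset by host-to-device and device-to-host copies) can be illustrated with a back-of-the-envelope model. This is only a minimal sketch, not the project's benchmark: the function name `effective_speedup` and all timings below are hypothetical, and only the 50x kernel speedup is taken from the text.

```python
def effective_speedup(cpu_time_s, kernel_speedup, transfer_time_s):
    """Overall speedup of offloading a diagnostic to the GPU.

    cpu_time_s      -- time of the diagnostic on a single CPU (hypothetical)
    kernel_speedup  -- how much faster the kernel alone runs on the GPU
    transfer_time_s -- total time spent copying data to and from the GPU
    """
    gpu_total_s = cpu_time_s / kernel_speedup + transfer_time_s
    return cpu_time_s / gpu_total_s


if __name__ == "__main__":
    # Hypothetical timings, for illustration only (not measured values).
    cpu_time_s = 1.0
    for transfer_time_s in (0.0, 0.02, 0.1, 0.5):
        s = effective_speedup(cpu_time_s, 50.0, transfer_time_s)
        print(f"transfer = {transfer_time_s:.2f} s -> overall speedup {s:.1f}x")
```

Under these assumed numbers, a transfer cost equal to the GPU kernel time already halves the overall gain, which is consistent with the observation that data movement, rather than computation, dominates.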