
HPC-11_mcastril_HPDAonline DiagGPU

The PI is responsible for closely following the progress of the action, and in particular for contacting the NEMO project manager if the preview (or review) is delayed beyond the expected 2 weeks.

  1. Summary
  2. Preview
  3. Tests
  4. Review

Summary

Action: HPC-11_mcastril_HPDAonline DiagGPU
PI(S): Miguel Castrillo
Digest: High Performance GPU Diagnostics Online - Final merge. Test case for the dia_hsb diagnostic port to GPU.
Dependencies:
Branch: source:/NEMO/branches/2021/dev_r13747_HPC-11_mcastril_HPDAonline_DiagGPU/
Previewer(s): Italo Epicoco
Reviewer(s): Italo Epicoco
Ticket: #2662

Description

High performance data analytics solutions aimed at the online diagnostics of the NEMO model will be explored as complementary components in the model diagnostics software ecosystem. Online techniques leveraging fast (low latency and real-time) data analytics approaches (e.g. on fat nodes) will be evaluated in real cluster environments. In particular, an interface of NEMO to the High Performance Data Analytics (HPDA) framework will be designed and implemented for online diagnostics.

The rationale of this activity is to improve the NEMO computational performance by executing the computations for diagnostics on GPU.

Implementation

The portability of NEMO diagnostic calculations to GPUs was analyzed, exploring how to adapt these regions from the current MPI implementation to the CUDA paradigm. A toy model was created for preliminary tests, which were carried out with the dia_hsb diagnostic. The computation itself ran 50x faster than on a single CPU, but the data transfer to and from the GPU was the main bottleneck.

Afterwards, in the full model, the CPU/GPU communications were removed from the critical path, either by using an asynchronous method or by overlapping them with a computation phase. At the same time, the efficiency of the overall solution was improved by mitigating the impact of the offloaded data.
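
The overlap described above can be illustrated with a minimal CPU-only sketch. It is an analogy, not the NEMO implementation: a worker thread stands in for the GPU transfer and kernel, and the names advance_model, offloaded_diagnostic and run are hypothetical. The point is only the ordering: the diagnostic for step n is submitted and then collected after step n+1 has been computed, so the model loop never waits for it up front. Pure Python threads will not give a real speed-up here.

```python
from concurrent.futures import ThreadPoolExecutor


def advance_model(state):
    """Hypothetical stand-in for one model time step on the CPU."""
    return [x + 1.0 for x in state]


def offloaded_diagnostic(snapshot):
    """Hypothetical stand-in for the host-to-device copy plus the diagnostic kernel."""
    return sum(snapshot)


def run(n_steps, state):
    """Advance the model while the previous step's diagnostic runs in the background."""
    diagnostics = []
    pending = None  # future for the diagnostic of the previous step
    with ThreadPoolExecutor(max_workers=1) as pool:
        for _ in range(n_steps):
            # Compute the next step; the previous diagnostic may still be running.
            state = advance_model(state)
            if pending is not None:
                # Collect the previous result only after the new step is done.
                diagnostics.append(pending.result())
            # Hand a private copy of the fields to the worker and move on.
            pending = pool.submit(offloaded_diagnostic, list(state))
        if pending is not None:
            diagnostics.append(pending.result())
    return diagnostics


if __name__ == "__main__":
    print(run(3, [0.0, 0.0]))
```

Passing a copy of the state to the worker mirrors the need to keep the offloaded buffer untouched while the model overwrites its own arrays.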

Documentation updates

...

Preview

...

Tests

Test case functionality: Test the hardware and software requirements, and test whether the sample code returns a value within the desired range.

Test case setup:

  1. Check if one or more compatible GPUs are present
  2. Check if the GPU driver is compatible
  3. Run the sample and test the returned value
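
A possible shape for the first two setup checks is sketched below. It assumes an NVIDIA GPU stack where the nvidia-smi tool is available (the action targets CUDA, but the tool choice is an assumption). The minimum driver version MIN_DRIVER and all function names are hypothetical and are not taken from the NEMO branch.

```python
import shutil
import subprocess

MIN_DRIVER = "450.0"  # hypothetical minimum driver version, for illustration only


def version_tuple(text):
    """Turn a dotted version string such as '535.104.05' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_compatible(found, minimum=MIN_DRIVER):
    """True when the reported driver version is at least the required one."""
    return version_tuple(found) >= version_tuple(minimum)


def gpu_requirements_met(minimum=MIN_DRIVER):
    """Logical check: at least one GPU is visible and every driver is recent enough."""
    if shutil.which("nvidia-smi") is None:
        return False  # no NVIDIA tooling on this machine
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=30, check=True,
        )
    except (subprocess.SubprocessError, OSError):
        return False  # the tool failed or timed out: treat as "requirements not met"
    versions = [line for line in result.stdout.splitlines() if line.strip()]
    try:
        return bool(versions) and all(driver_is_compatible(v, minimum) for v in versions)
    except ValueError:
        return False  # unexpected version format


if __name__ == "__main__":
    print("GPU requirements met:", gpu_requirements_met())
```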

Test case verification value:

Return a logical value indicating whether the requirements are met. Return the relative error (float) with respect to the expected sample value. This value must be smaller than 0.1%.
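
The verification value can be sketched as follows. The 0.1% threshold comes from the text above; the function names and the sample numbers in the usage lines are hypothetical.

```python
TOLERANCE = 1.0e-3  # 0.1 %, as required by the test case


def relative_error(computed, expected):
    """Relative error of the sample result with respect to the expected value."""
    if expected == 0.0:
        raise ValueError("relative error is undefined for an expected value of zero")
    return abs(computed - expected) / abs(expected)


def sample_passes(computed, expected, tolerance=TOLERANCE):
    """True when the relative error is strictly below the tolerance."""
    return relative_error(computed, expected) < tolerance


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    print(relative_error(1000.5, 1000.0), sample_passes(1000.5, 1000.0))
    print(relative_error(1002.0, 1000.0), sample_passes(1002.0, 1000.0))
```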

Status of the test case as of now: Not developed yet.

Expected characteristics:

a) When the option is not activated, the code should pass the full set of tests and should not show any difference in results or computational performance.

b) When the option is enabled, a system test must be performed to check that the system fulfills the minimum hardware requirements. Some appreciable differences are then expected:

The implementation will reduce run time and slightly increase the memory footprint. The execution time of the ported diagnostic routines is expected to become negligible compared with the non-GPU case.

The diagnostic results will not be bit-for-bit identical to those of CPU-only runs or of runs on other hardware, but they will be reproducible on the same hardware and software stack.

The ported diagnostic results will depend on the floating-point implementation of the GPU hardware and software. There will be no change in other NEMO subroutines.
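
One common reason for such differences is that floating-point addition is not associative, so a sum accumulated in a different order (as a parallel reduction typically does) can round differently. The sketch below illustrates this on the CPU only; the function names and the input values are hypothetical and chosen to make the effect visible.

```python
def sequential_sum(values):
    """Left-to-right accumulation, as in a simple serial loop."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, n_chunks):
    """Sum each chunk separately, then combine the partial sums."""
    size = (len(values) + n_chunks - 1) // n_chunks
    partials = [sequential_sum(values[i:i + size]) for i in range(0, len(values), size)]
    return sequential_sum(partials)


values = [1.0e16, 1.0, -1.0e16, 1.0]

serial = sequential_sum(values)   # 1.0 in IEEE double precision
chunked = chunked_sum(values, 2)  # 0.0: each 1.0 is absorbed inside its chunk

if __name__ == "__main__":
    print(serial, chunked)
    # The same order on the same machine always gives the same bits.
    print(chunked_sum(values, 2) == chunked)
```

Neither result equals the exact sum of 2.0; the two orderings simply lose different low-order bits, while repeating the same ordering reproduces the same value.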

Review

...

Last modified on 2021-05-10T14:43:16+02:00