Version 9 (modified by mcastril, 9 months ago)

High Performance Diagnostics Online - 1st Phase

Last edition: 01/07/20 18:35:08 by mcastril

The PI is responsible for closely following the progress of the action, and in particular for contacting the NEMO project manager if the preview (or review) takes longer than the expected two weeks.

  1. Summary
  2. Abstract

Summary

Action: High Performance Diagnostics Online - 1st Phase
PI(S): Miguel Castrillo
Digest: Sketch of the implementation of high performance online diagnostics for NEMO on GPUs
Dependencies:
Expected for:
Ticket: #2160
Branch: NEMO/branches/$YEAR/dev_r{REV}_{ACTION_NAME}
Previewer(s):
Reviewer(s):
Link: /nemo/wiki/2019WP/HPC-04_MCastrillo_HPDAonlineDiagGPU

Abstract

This section should be completed before starting to develop the code, in order to find agreement with the previewer(s) on the method beforehand.

Description

High performance data analytics solutions for the online diagnostics of the NEMO model will be explored as complementary components in the model diagnostics software ecosystem. Online techniques leveraging fast (low-latency, real-time) data analytics approaches (e.g. on fat nodes) will be evaluated in real cluster environments. In particular, an interface between NEMO and the High Performance Data Analytics (HPDA) framework will be designed and implemented for online diagnostics.

Implementation

The rationale of this activity is to improve the computational performance of NEMO by executing the diagnostics computations on GPUs. As a first step, the portability of NEMO diagnostic calculations to GPUs has been analyzed, exploring how to adapt these regions from the current MPI implementation to the CUDA paradigm. A toy model has been created to perform preliminary tests, which were done using the dia_hsb diagnostic. The code itself runs 50x faster than on a single CPU, but the data transfer to and from the GPU is the main bottleneck.
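The trade-off between kernel speedup and transfer cost can be illustrated with a small back-of-the-envelope model. This is only a sketch under stated assumptions: the function names are hypothetical, the CPU time and transfer times are illustrative values rather than measurements from the toy model, and the model assumes that transfers are not overlapped with computation. Only the 50x kernel speedup comes from the tests described above.

```python
def offload_seconds(cpu_seconds, kernel_speedup, transfer_seconds):
    """Wall time of the offloaded diagnostic: GPU compute plus host<->device copies.

    Assumes copies and compute do not overlap (no asynchronous streams).
    """
    if cpu_seconds <= 0 or kernel_speedup <= 0 or transfer_seconds < 0:
        raise ValueError("times must be positive and transfer time non-negative")
    return cpu_seconds / kernel_speedup + transfer_seconds


def effective_speedup(cpu_seconds, kernel_speedup, transfer_seconds):
    """Speedup actually observed once transfer time is included."""
    return cpu_seconds / offload_seconds(cpu_seconds, kernel_speedup, transfer_seconds)


def break_even_transfer_seconds(cpu_seconds, kernel_speedup):
    """Transfer time above which offloading is slower than staying on the CPU."""
    return cpu_seconds * (1.0 - 1.0 / kernel_speedup)


if __name__ == "__main__":
    cpu_seconds = 1.0      # hypothetical CPU time of one diagnostic call
    kernel_speedup = 50.0  # kernel-only speedup reported for dia_hsb
    for transfer_seconds in (0.0, 0.1, 0.5, 1.0):  # hypothetical copy costs
        s = effective_speedup(cpu_seconds, kernel_speedup, transfer_seconds)
        print(f"transfer={transfer_seconds:.2f}s  effective speedup={s:.2f}x")
```

In this simplified model the 50x figure is only reached when the transfer cost is negligible. The effective speedup falls quickly as copies grow, which is why keeping fields resident on the GPU or overlapping copies with computation would be natural directions to evaluate.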

Reference manual and web pages updates

Updated on 09/20/2020 20:46:19 by anonymous

Once the PI has completed this section, they should email the previewer(s) asking them to preview the work within two weeks.

Preview

Since the preview step must be completed before the PI starts coding, the previewers' answers are expected within two weeks of the PI's request.
For each question, an iterative process should take place between PI and previewer(s) in order to reach a "YES" answer for each of the following questions.

Questions Answer Comment
Does the previewer agree with the proposed methodology?
Does the previewer agree with the proposed flowchart and list of routines to be changed?
Does the previewer agree with the proposed new list of variables, including agreement with coding rules?
Does the previewer agree with the proposed summary of updates in reference manual?
… … …

Once all answers are "YES", the PI can start the development in the development branch.

Tests

Once the development is done, the PI should complete this section below and ask the reviewers to start their review in the lower section.

Questions Answer Comment
Can this change be shown to produce the expected impact (if option activated)?
Can this change be shown to have a null impact (if option not activated)?
Detailed results of restartability and reproducibility when the option is activated. Please indicate the configuration used for this test
Detailed results of SETTE tests (restartability and reproducibility for each of the reference configurations)
Have the required bit comparability tests been run: are there no differences when activating the development?
If some differences appear, is the reason for the change valid/understood?
If some differences appear, does the ticket describe in detail the impact this change will have on model configurations?
Is this change expected to preserve all diagnostics?
If not, is the reason for the change valid/understood?
Are there significant changes in run time/memory?
… … …

Review

A successful review is needed to schedule the merge of this development into the future NEMO release during the next Merge Party (usually in November-December).

Code changes and documentation

Question Answer Comment
Is the proposed methodology now implemented?
Are the code changes in agreement with the flowchart defined at Preview step?
Are the code changes in agreement with list of routines and variables as proposed at Preview step?
If not, are the discrepancies acceptable?
Is the in-line documentation accurate and sufficient?
Do the code changes comply with NEMO coding standards?
Is the development ticket documented in sufficient detail for others to understand the impact of the change?
Are the reference manual tex files now updated following the proposed summary in preview section?
Is there a need for some documentation on the web pages (in addition to in-line and reference manual)?
If yes, please describe and ask the PI. A yes answer must include all documentation available.
… … …

Review Summary

Is the review fully successful?

Once the review is successful, the development must be scheduled for merge during the next Merge Party meeting.