NEMO HPC meeting on Mixed Precision

Last edited on 11/30/18 15:49:17 by mcastril

Participants

  • BSC: Miguel Castrillo, Oriol Tintó, Mario Acosta
  • CMCC: Italo Epicoco, Francesca Mele
  • CNRS: Claire Levy, Sebastien Masson, Eric Maisonnave
  • MOI: Clément Bricaud
  • STFC: Andrew Porter
  • UKMO: Mike Bell
  • ECMWF: Nils Wedi, Michael Lange, Peter Dueben
  • TUM: Martin Schreiber
  • ATOS: David Guibert

Agenda

  • Presentation of the work by the BSC and open questions to continue with the integration in NEMO. O. Tintó. ~30 min
  • Questions and further explanations about the methodology and the results. All. ~15 min
  • Discussion about the next steps. All. ~45 min

Minutes

Oriol gave a half-hour presentation on his work with mixed precision, on what would be needed to implement mixed precision in NEMO, and on the feedback needed from the NEMO community.

There were some technical issues (apparently a Hangouts connection limit) that prevented some participants (Martin) from attending the discussion.

There was a lot of discussion in the hour allocated for questions and interaction:

Methodology:

Sebastien asked about the grouping method, how the variables were counted, and how the tests are done when different groups are merged. Mike also asked how the variables were counted and whether some variables are counted twice, and was interested in the difficulties of parsing variables. Oriol described the dependencies between different variables (arguments in function/routine calls); there are many small issues that could take some time to solve manually.

Italo was concerned that the results could change depending on how the groups are partitioned. Oriol answered that there are two strategies to prevent unwanted results when combining two groups of variables: first, using lower thresholds for smaller groups, so that the error after combination does not exceed the desired level; and second, evaluating the sets using different initial conditions. Changing the grouping method could also be studied to prevent possible errors. Italo also showed interest in the type used by the emulator (real + integer). There was some confusion because this was not well explained during the presentation: the method presented is only a way to explore which variables can use less precision without compromising the outputs, and the reported savings in terms of memory are only estimates. The simulations performed with the emulator are not useful for evaluating the performance implications, and are in fact much slower than the usual double-precision simulations. Finally, Italo asked about the influence of different configurations or parameters on the results. The more configurations are tested, the greater the coverage and the confidence in the results, so more configurations should be tested.

Peter Dueben pointed out that the results may change with resolution, but a lot of the information from the ORCA2 tests will still be very useful for ORCA025. Oriol answered that while the errors were indeed larger when porting the ORCA2 results to ORCA025, the differences that appeared were still very small. Peter also asked about the number of timesteps run, and highlighted the importance of the acceptance test and that the accuracy test should depend on the use case. He also asked how long it would take to produce a real implementation: there are many ways to go from the current point to a final implementation, and the path chosen will determine how soon this can happen.

Accuracy tests:

Mike asked if the tools used would be made available to users. Oriol said that this could be considered if it is of interest to the community; however, at this point the tools may be difficult to use for anyone but the developer, so some things should be generalised and simplified first. Miguel said that in any case the accuracy test should be made available to users, since it can be useful in many situations beyond the mixed-precision work. Mike noted that the accuracy test depends a lot on the use case.

Claire expressed her concern about the accuracy test (ORCA2, 10 days, no ice), explaining that the community has to agree on accuracy tests that are convincing. In fact, this was the first point on the "Feedback needed" slide.

Sebastien recalled that some processes in the deep ocean circulation are very difficult to observe on short timescales. Nils then said that a version valid up to seasonal timescales could be released, and ECMWF could help develop it. Mike supported the idea and said this methodology could be used in combination with other approaches, such as ECMWF's. Peter Dueben noted, however, that the ECMWF approach is to put everything in single precision and then fix the errors.

Implementation:

Nils suggested that users should be able to select which mode of mixed precision to use when running the code. Oriol answered that it should be straightforward to run a mixed-precision implementation fully in double precision if needed. Nils added that it would be possible to let users select between different levels of precision.

Claire agreed with this perspective and with the idea of adopting mixed precision while taking care of the risks. She pointed out that most NEMO users are not HPC users, and that for the NEMO ST robustness is a must. Oriol answered that non-HPC users would also benefit from this development, since the reduction in memory is expected to have more impact when using fewer computational resources, and agreed that robustness is important.

There are many use cases and configurations in the community. Claire pointed out that the NEMO 4 stable release will come out soon, so the mixed-precision implementation should target the next version. She asked whether it would be possible to follow a different approach to reach a mixed-precision version of the code: putting everything in single precision and fixing the parts that cannot use it. Nils, however, said that this approach is difficult to follow. Oriol's opinion was that, now that the method has been developed, it would be much easier to continue with the current approach.

Mario and Oriol explained that this is a first step, in which only the variables that can safely be changed to single precision will be changed; afterwards, the places where it is currently not possible to reduce the precision can be modified algorithmically to allow it. Mike asked about having two codes, mixed and double precision, as long as there is an easy way to change from one to the other.

Nils recalled that it is possible to switch from double to single precision at compilation or run time. At ECMWF they change the working precision at compilation time. Some problems can arise, but duplicating many lines of code should be avoided.
