Version 10 (modified by rblod, 10 years ago)

Author : Rachid Benshila

ticket : #677

Branch : DEV_r1879_mpp_rep


This branch implements two methods for obtaining mpp reproducibility: one from ECMWF (key_mpp_rep1) and one from DFO (key_mpp_rep2). The aim is to keep only one of them, following the reviewer's advice; at this time (7th of June), both are present and I have made intensive use of cpp keys to delimit them clearly.

Both methods (certainly rep2, and rep1 as far as I understand) are based on the idea of self-compensated summation; see "Using Accurate Arithmetics to Improve Numerical Reproducibility and Stability in Parallel Applications", Yun He and Chris Ding, Journal of Supercomputing, Vol. 18, No. 3, pp. 259-277, doi 10.1023/A:1008153532043.

The starting point is Knuth's trick ('The Art of Computer Programming', Vol. 2, p. 203).

Let u and v be two single-precision (sp) numbers.

Compute u' = (u+v)-v and v'' = (u+v)-u'

Under very general conditions (concerning the reliability of rounding procedures) the following theorem holds: 

Double_prec_sum(u,v) = (u + v) + ( (u - u') + (v - v'') )

Here (u + v) is the most significant part of the result and ( (u - u') + (v - v'') ) is the least significant part, and '+' and '-' denote the usual single-precision addition and subtraction. In other words, we keep track of the rounding error of the addition and add it back.
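As a minimal sketch of the identity above (not the code in the branch, which is Fortran), the Python function below splits a sum into its most and least significant parts. The name two_sum is illustrative. Python floats are double precision rather than single precision; the sketch assumes IEEE round-to-nearest arithmetic, under which the same identity holds at either precision.

```python
from fractions import Fraction


def two_sum(u, v):
    """Return (s, e): s is the rounded sum u + v, e is its rounding error."""
    s = u + v                # most significant part
    u1 = s - v               # u'
    v2 = s - u1              # v''
    e = (u - u1) + (v - v2)  # least significant part
    return s, e


u, v = 1.0, 1e-16
s, e = two_sum(u, v)
print(s, e)  # 1.0 1e-16 : v is lost in s but recovered in e

# Check in exact rational arithmetic that nothing was lost overall.
assert Fraction(s) + Fraction(e) == Fraction(u) + Fraction(v)
```

The final assertion compares both sides as exact fractions, so it verifies that s + e reproduces u + v with no rounding involved.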

These methods are implemented in a new module, lib_fortran.F90, with a few additions in lib_mpp.F90. For the sake of simplicity, I implemented a glob_sum function which is either the standard sum (SUM + CALL mpp_sum) or one of the two methods; the switch is done in lib_fortran.
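To illustrate why such a switch matters, here is a hedged Python sketch (not the Fortran in lib_fortran.F90). It contrasts a plain running sum with a compensated one on the same values taken in two different orders, as could happen with two different domain decompositions. The helper names and the compensated flag are hypothetical; only the name glob_sum comes from the branch. Compensated summation reduces the order dependence of the result but does not remove it in every case.

```python
def two_sum(u, v):
    """Rounded sum of u and v, plus the rounding error of that sum."""
    s = u + v
    u1 = s - v
    v2 = s - u1
    return s, (u - u1) + (v - v2)


def naive_sum(values):
    """Plain left-to-right accumulation (explicit loop, not built-in sum)."""
    total = 0.0
    for x in values:
        total += x
    return total


def compensated_sum(values):
    """Accumulate the rounding errors separately and add them at the end."""
    total, err = 0.0, 0.0
    for x in values:
        total, e = two_sum(total, x)
        err += e
    return total + err


def glob_sum(values, compensated=False):
    """Hypothetical switch between the standard and compensated sums."""
    return compensated_sum(values) if compensated else naive_sum(values)


order_a = [1e16, 1.0, -1e16, 1.0]
order_b = [1.0, 1.0, 1e16, -1e16]  # same values, different order

print(glob_sum(order_a), glob_sum(order_b))              # 1.0 2.0
print(glob_sum(order_a, True), glob_sum(order_b, True))  # 2.0 2.0
```

The plain sum returns a different answer for each ordering, while the compensated sum returns the exact result 2.0 for both.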

I removed the nbit_cmp logical and added lk_mpp_rep instead (it is still needed in limdyn, for example). I had issues with both methods when using aggressive compilation options, and also with sea-ice (at least with ORCA2), so my ORCA2 tests were run without sea-ice (see ticket #678).

Note: I also used this branch to implement a SIGN function (in lib_fortran) which overrides the standard Fortran intrinsic (key_nosignedzero) in order to keep the F90 behaviour.
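The difference presumably concerns negative zero: in Fortran 90, SIGN(a, b) with b equal to zero returns |a|, whereas later standards let a processor that distinguishes -0.0 return -|a|. The Python sketch below shows the two behaviours side by side. It assumes this is the behaviour the key is meant to restore; the function names are hypothetical and this is not the code in lib_fortran.

```python
import math


def sign_f90(a, b):
    """|a| with the sign of b, treating -0.0 like +0.0 (assumed F90 behaviour)."""
    return abs(a) if b >= 0.0 else -abs(a)


def sign_signed_zero(a, b):
    """|a| with the sign bit of b, so -0.0 counts as negative."""
    return math.copysign(abs(a), b)


print(sign_f90(1.0, -0.0))          # 1.0
print(sign_signed_zero(1.0, -0.0))  # -1.0
print(sign_f90(1.0, -2.0))          # -1.0 (both agree away from zero)
```

The two functions differ only when b is a negative zero, which is enough to change results to the last bit between compilers.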

Performance: tested on IBM Power6 with ORCA025:

186    695.845 , 543.695    690.451 , 560.091    714.916 , 566.557
216    709.906 , 564.650    729.994 , 583.716    710.971 , 568.351

Each pair gives the average elapsed time (s), CPU time (s).


Testing could consider (where appropriate) other configurations in addition to NVTK.

NVTK tested: YES
Other model configurations: YES
Processor configurations tested: [ Enter processor configs tested here ]
If adding new functionality, please confirm that the new code doesn't change results when it is switched off and works when switched on:

(Answering UNSURE is likely to generate further questions from reviewers.)

'Please add further summary details here'

  • Processor configurations tested
  • etc.

Bit Comparability

Does this change preserve answers in your tested standard configurations (to the last bit)? YES
Does this change bit compare across various processor configurations (1xM, Nx1 and MxN are recommended)? YES
Is this change expected to preserve answers in all possible model configurations? YES
Is this change expected to preserve all diagnostics?
(Preserving answers in model runs does not necessarily imply preserved diagnostics.)

If you answered NO to any of the above, please provide further details:

  • Which routine(s) are causing the difference?
  • Why the changes are not protected by a logical switch or new section-version
  • What is needed to achieve regression with the previous model release (e.g. a regression branch, hand-edits etc). If this is not possible, explain why not.
  • What do you expect to see occur in the test harness jobs?
  • Which diagnostics have you altered and why have they changed?

Please add details here...

System Changes

Does your change alter namelists? YES
Does your change require a change in compiler options? NO

If any of these apply, please document the changes required here...

Please summarize any changes in runtime or memory use caused by this change...

IPR issues

Has the code been wholly (100%) produced by NEMO developers staff working exclusively on NEMO? YES/NO

If No:

  • Identify the collaboration agreement details
  • Ensure the code routine header is in accordance with the agreement (copyright, redistribution, etc.).

Add further details here if required...