Changes between Version 9 and Version 10 of ticket/0677_mpp_rep
- Timestamp: 2010-07-19T00:30:33+02:00
ticket/0677_mpp_rep
'''ticket''' : #677

v9:

'''Branch''' : [https://forge.ipsl.jussieu.fr/nemo/browser/branches/DEV_r1879_mpp_rep DEV_r1879_mpp_rep ]

v10:

'''Branch''' : [https://forge.ipsl.jussieu.fr/nemo/browser/branches/DEV_r1879_mpp_rep DEV_r1879_mpp_rep]

----

=== Description ===
Implementation of both methods to get mpp reproducibility, one from ECMWF (key_mpp_rep1) and the other from DFO (key_mpp_rep2). The goal is to choose one of them based on my reviewer's advice, but for now (7th of June) I have made intensive use of cpp keys to separate the two methods clearly.

v9:

Both are based on the idea of self-compensated summation; see the paper "Using Accurate Arithmetics to Improve Numerical Reproducibility and Stability in Parallel Applications", Yun He and Chris Ding, Journal of Supercomputing, Vol 18, Number 3, pages 259-277, doi 10.1023/A:1008153532043.

v10:

Both methods (at least rep2, and rep1 as far as I understand) are based on the idea of self-compensated summation; see the paper "Using Accurate Arithmetics to Improve Numerical Reproducibility and Stability in Parallel Applications", Yun He and Chris Ding, Journal of Supercomputing, Vol 18, Number 3, pages 259-277, doi 10.1023/A:1008153532043.

v9:

We have:

sum = a+b

error = b + (a-sum)

In the next addition, the error is first added back:

(sum,error) = SCS(a,b)

(sum1,error1) = SCS(sum,c+error)

v10:

We use Knuth's trick (''The Art of Computer Programming'', Vol 2, p. 203).

Let u and v be the two single-precision (sp) numbers. Compute u'=(u+v)-v, v'=(u+v)-u and v"=(u+v)-u'.

Under very general conditions (concerning the reliability of rounding procedures), the following theorem holds:

Double_prec_sum(u,v) = (u + v) + ((u-u') + (v-v"))

Here (u + v) is the most significant part of the result and ((u-u') + (v-v")) is the least significant part; '+' and '-' mean the usual single-precision addition and subtraction. So we keep track of the truncation error and add it back.
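As a rough illustration of the compensated addition described above, here is a minimal Python sketch. It is not the Fortran code in lib_fortran.F90, and not necessarily what key_mpp_rep1 or key_mpp_rep2 do. It assumes IEEE double precision with round-to-nearest, which is what Python floats use on common platforms. The helper names two_sum, naive_sum and compensated_sum are hypothetical. two_sum follows Knuth's formulation, in which v" is computed from u'; compensated_sum shows one common way of using it over a whole array: accumulate every truncation error separately and add the total back at the end.

```python
from fractions import Fraction


def two_sum(u: float, v: float):
    """Knuth's trick: return (s, e) with s = fl(u + v) and u + v == s + e exactly.

    Assumes IEEE binary floating point, round-to-nearest, no overflow.
    """
    s = u + v                 # most significant part (rounded sum)
    u1 = s - v                # u' : approximates u
    v2 = s - u1               # v" : approximates v
    e = (u - u1) + (v - v2)   # least significant part (truncation error)
    return s, e


def naive_sum(values):
    """Plain left-to-right summation, no error tracking."""
    total = 0.0
    for x in values:
        total += x
    return total


def compensated_sum(values):
    """Left-to-right summation that keeps track of each truncation error
    and adds the accumulated error back at the end."""
    s = 0.0
    err = 0.0
    for x in values:
        s, e = two_sum(s, x)
        err += e
    return s + err


if __name__ == "__main__":
    # One compensated addition is exact: check with rational arithmetic.
    s, e = two_sum(0.1, 0.2)
    assert Fraction(0.1) + Fraction(0.2) == Fraction(s) + Fraction(e)

    # Same numbers, two summation orders (as can happen when the domain
    # decomposition, and hence the order of the partial sums, changes).
    data = [1e16, 1.0, -1e16, 1.0]
    rev = data[::-1]
    print(naive_sum(data), naive_sum(rev))              # 1.0 0.0
    print(compensated_sum(data), compensated_sum(rev))  # 2.0 2.0
```

With this input, the plain left-to-right sum changes with the order of the additions (1.0 versus 0.0), while the compensated sum returns the exact value 2.0 for both orders. In general, this kind of compensation makes the result behave roughly as if it had been computed in twice the working precision, which greatly reduces, but does not by itself remove, the dependence on summation order.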

These methods have been implemented in a new module, lib_fortran.F90, with a few additions in lib_mpp.F90. For the sake of simplicity, I implemented a glob_sum function that is either the standard one (SUM + CALL mpp_sum) or one of the two methods; the switch is done in lib_fortran.

…

Performance: tested on IBM Power6 with ORCA025:

||\||STD elapsed||STD CPU||REP1 elapsed||REP1 CPU||REP2 elapsed||REP2 CPU||
||186||695.845||543.695||690.451||560.091||714.916||566.557||
||216||709.906||564.650||729.994||583.716||710.971||568.351||

Average elapsed time and CPU time, in seconds.

=== Testing ===