Version 15 (modified by techene, 4 years ago)
Star coordinate faster implementation
Change the way vertical scale factors are handled in NEMO in order to reduce run time in parallel for the z-star coordinate. This modification is activated through a cpp key : key_qco.
The PI is responsible for closely following the progress of the action, and especially for contacting the NEMO project manager if the preview (or review) is delayed beyond the expected 2 weeks.
Summary
Action | optimisation of the vertical scale factor e3 computation |
---|---|
PI(S) | Techene, Madec |
Digest | compute e3 on the fly from e3_0(:,:,:,Ktl) * ( 1 + ssh(:,:,Ktl) / h_0(:,:) * mask(:,:,:) ) instead of storing e3t/u/v/w/f... |
Dependencies | If any |
Branch | source:/NEMO/branches/2020/dev_r12377_KERNEL-06_techene_e3 |
Branch | source:/NEMO/branches/2020/dev_r13327_KERNEL-06_techene_e3_v2 |
Previewer(s) | Madec, Chanut, Masson |
Reviewer(s) | Madec |
Ticket | #2385 |
Description
The current NEMO version stores the scale factors e3[P] in memory. At P = {u-, v-, w-, f-, uw-, vw-} points they are obtained by interpolating the 4D e3t table, which means that seven 4D tables are held in memory. The idea is to compute the scale factors e3[P](ji,jj,jk,Ktl) on the fly from r3[P] = ssh[P] / h_0 and e3[P]_0 instead of storing them. This should improve run time in parallel. Indeed, processors have at least two memory levels : fast memory and slow RAM. In parallel runs the processing time is no longer limited by computation time but by memory access time, which is why we try to minimise memory traffic. The Asselin filter is handled by recomputing r3[P] directly from the filtered ssh. z-tilde is handled through e3[P]_0, which may vary with time in the z-tilde case.
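As a rough illustration of the on-the-fly formula, here is a minimal Python sketch for a single water column. It is not NEMO code: plain Python lists stand in for the Fortran arrays, and the function name `e3_on_the_fly` and all numerical values are hypothetical.

```python
# Illustrative sketch only: Python lists stand in for NEMO's Fortran arrays.
# The function name and every value below are hypothetical.

def e3_on_the_fly(e3_0, ssh, h_0, mask):
    """Return e3 = e3_0 * (1 + ssh / h_0 * mask) for one water column."""
    r3 = ssh / h_0  # one value per column: this is the only 2D field to store
    return [e0 * (1.0 + r3 * m) for e0, m in zip(e3_0, mask)]

# One column with three ocean levels and one masked (land) level.
e3t_0 = [10.0, 20.0, 30.0, 40.0]  # reference thicknesses (m)
tmask = [1.0, 1.0, 1.0, 0.0]      # 1 = ocean, 0 = land
h_0 = 60.0                        # reference depth of the ocean levels (m)
ssh = 0.6                         # sea surface height (m)

e3t = e3_on_the_fly(e3t_0, ssh, h_0, tmask)
```

Each ocean level is stretched by the same factor (1 + ssh / h_0), while the masked level keeps its reference thickness, so nothing beyond the 2D ratio needs to be kept in memory.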
Implementation
KERNEL-06's version 1 implementation : /NEMO/branches/2020/dev_r12377_KERNEL-06_techene_e3
starting point
In NEMO r12377, scale factors (e3*) at u-, v-, w-, uw-, vw- and f-points are interpolated from e3t at Kbb, Kmm and Kaa. The module in charge of scale factor management is src/OCE/DOM/domvvl.F90. Its interpolation routine, dom_vvl_interpol, is called :
- at initialisation or restart at u-v-w-uw-vw-f-points
nemogcm --> nemo_init --> dom_init --> dom_vvl_init --> dom_vvl_rst --> dom_vvl_zgr --> dom_vvl_interpol
- at each time step for "after" scale factor at u-v-points each time ssh[Naa] is computed
nemogcm --> stp --> dom_vvl_sf_nxt --> e3t[Kaa] --> dom_vvl_interpol
- at each time step for "now" scale factor at u-v-points e3t is directly filtered
nemogcm --> stp --> tra_atf --> e3t[Kmm] --> dyn_atf --> e3t[Kmm] --> dom_vvl_interpol
- at each time step after time swapping for "now" at f-point and "before" scale factor both at w-uw-vw-points
nemogcm --> stp --> dom_vvl_sf_update --> dom_vvl_interpol
developments
In version 1 of source://dev_r12377_KERNEL-06_techene_e3 we implement the changes progressively and validate them step by step against the GYRE_PISCES test case. We remove all use of the vvl routines. This replaces :
- the storage of (19) 3d tables : e3t/u/v/w/uw/vw at 3 time steps + e3f
- the interpolation of (13) 3d tables at each time step : e3u/v "after" (twice) + e3u/v "now" + e3f/w/uw/vw "now" + e3w/uw/vw "before"
- the interpolation of (11) 3d tables at initialisation or restart : e3u/v/f/w/uw/vw "now" + e3u/v/w/uw/vw "before"
by the storage of (10) 2d tables (r3t/u/v at 3 time steps and r3f) and a light on-the-fly computation. For backward compatibility we introduce a cpp key, key_qco, in order to isolate the new scale factor implementation from the former vvl version. qco stands for "quasi eulerian coordinate".
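To give an order of magnitude for the storage saved, the table counts above can be turned into array sizes. The following Python sketch is a back-of-the-envelope estimate only: the subdomain size (jpi, jpj, jpk) is hypothetical, and the reference tables e3._0 are left out because they are needed in both versions.

```python
# Back-of-the-envelope count of stored values, using the table counts quoted above.
# The subdomain dimensions are hypothetical, chosen only for illustration.

def n_values(n_tables, shape):
    """Number of array elements held by n_tables arrays of the given shape."""
    n = n_tables
    for dim in shape:
        n *= dim
    return n

jpi, jpj, jpk = 100, 100, 50                   # hypothetical local domain size

stored_vvl = n_values(19, (jpi, jpj, jpk))     # 19 time-varying 3d tables (vvl)
stored_qco = n_values(10, (jpi, jpj))          # 10 2d tables (qco)
ratio = stored_vvl / stored_qco

bytes_per_value = 8                            # assuming double precision reals
mb_vvl = stored_vvl * bytes_per_value / 1.0e6  # megabytes held by the vvl tables
mb_qco = stored_qco * bytes_per_value / 1.0e6  # megabytes held by the qco tables
```

With these assumed dimensions the time-varying scale factor storage shrinks by a factor of 19 * jpk / 10, i.e. 95 for 50 levels.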
- new variables added in dom_oce and domain
r3P with P = [t,u,v,f], 2d in space, equal to ssh/hP0
- new module in DOM : dom_qco
-- dom_qco_r3c ! compute ssh/ht0 at t-point and interpolate ssh/h.0 at u-v-f-points from ssh
-- dom_qco_zgr ! set ssh/ht0 at t-u-v-f-points for Kmm and at t-u-v-points for Kbb
--> dom_qco_r3c[Kmm] at t-u-v-f-points
--> dom_qco_r3c[Kbb] at t-u-v-points
- new substitute in DOM : domzgr_substitute
When key_qco is active, e3. is no longer a variable but an expression. Wherever e3. appears in a routine, including domzgr_substitute in the module replaces it with an expression of the form (e3._0 * ( 1 + r3. * mask. )).
# define e3t(i,j,k,t) (e3t_0(i,j,k)*(1._wp+r3t(i,j,t)*tmask(i,j,k)))
# define e3u(i,j,k,t) (e3u_0(i,j,k)*(1._wp+r3u(i,j,t)*umask(i,j,k)))
# define e3v(i,j,k,t) (e3v_0(i,j,k)*(1._wp+r3v(i,j,t)*vmask(i,j,k)))
# define e3f(i,j,k) (e3f_0(i,j,k)*(1._wp+r3f(i,j)*fmask(i,j,k)))
# define e3w(i,j,k,t) (e3w_0(i,j,k)*(1._wp+r3t(i,j,t)))
# define e3uw(i,j,k,t) (e3uw_0(i,j,k)*(1._wp+r3u(i,j,t)))
# define e3vw(i,j,k,t) (e3vw_0(i,j,k)*(1._wp+r3v(i,j,t)))
- finally, we extended the e3. substitution to h., r1_h. and gde.
# define ht(i,j) (ht_0(i,j)+ssh(i,j,Kmm))
# define hu(i,j,t) (hu_0(i,j)*(1._wp+r3u(i,j,t)))
# define hv(i,j,t) (hv_0(i,j)*(1._wp+r3v(i,j,t)))
# define r1_hu(i,j,t) (r1_hu_0(i,j)/(1._wp+r3u(i,j,t)))
# define r1_hv(i,j,t) (r1_hv_0(i,j)/(1._wp+r3v(i,j,t)))
# define gdept(i,j,k,t) (gdept_0(i,j,k)*(1._wp+r3t(i,j,t)))
# define gdepw(i,j,k,t) (gdepw_0(i,j,k)*(1._wp+r3t(i,j,t)))
# define gde3w(i,j,k) (gdept_0(i,j,k)*(1._wp+r3t(i,j,Kmm))-ssh(i,j,Kmm))
When key_qco is not activated, NEMO should be exactly the same as the trunk.
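The substituted expressions for e3., h., r1_h. and gde. are mutually consistent, which can be checked on a single column. The Python sketch below is illustrative only: it transcribes the macro expressions by hand for one fully wet column, takes r3u equal to r3t for simplicity (an assumption that holds only for a locally flat sea surface and bottom), and uses hypothetical values throughout.

```python
# Single-column consistency check of the substituted expressions.
# All values are hypothetical; every level is wet, so the masks equal 1.

e3t_0 = [10.0, 20.0, 30.0]      # reference thicknesses (m)
gdept_0 = [5.0, 20.0, 45.0]     # reference depths of the t-points (m)
ht_0 = sum(e3t_0)               # reference water column height (m)
ssh = -0.3                      # sea surface height (m)
r3t = ssh / ht_0                # ratio stored by the qco approach

# e3t(k) = e3t_0(k) * (1 + r3t * tmask), with tmask = 1 here
e3t = [e0 * (1.0 + r3t) for e0 in e3t_0]

# ht = ht_0 + ssh must equal the sum of the stretched thicknesses
ht_macro = ht_0 + ssh
ht_from_e3 = sum(e3t)

# hu and r1_hu, assuming r3u = r3t and hu_0 = ht_0 (flat case, assumption)
hu_0, r1_hu_0, r3u = ht_0, 1.0 / ht_0, r3t
hu = hu_0 * (1.0 + r3u)
r1_hu = r1_hu_0 / (1.0 + r3u)

# gdept = gdept_0 * (1 + r3t), and gde3w = gdept - ssh
gdept = [g0 * (1.0 + r3t) for g0 in gdept_0]
gde3w = [g - ssh for g in gdept]
```

Here ht_from_e3 agrees with ht_macro and r1_hu * hu equals 1 up to rounding, which is the property the substitution relies on.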
Important points :
- the e3. expression involves arrays of different ranks, so an array-section reference such as e3.(:,:,:) fails; it may be necessary to introduce temporary variables (the same holds for the water height expressions)
- an assignment "e3. =" is no longer possible, since e3. is now an expression and not a variable
- the e3t/u/v/f modifications did not introduce any difference in the results; the e3w modification does, because the vvl and qco approaches do not account for the bottom level in the same way
- in GYRE e3w_0 are not the half sum of e3u_0, so the way it is implemented in the reference version is not convenient
- the e3. substitution makes some lines longer than 136 characters, which may be a problem for some compilers (most have been checked but not all)
- ssh filtering has been moved earlier in the time step in order to provide the filtered r3P to the TOP Asselin filtering
- when key_qco is not active we pass SETTE, and this version (r13167) has been delivered for the mid-merge party ! Some silly memory allocation bugs were found, as well as a not-so-silly bug in the triggering of the implicit mode for the SPITZ12 configuration (Dt instead of 2Dt required).
KERNEL-06's version 2 implementation : /NEMO/branches/2020/dev_r13327_KERNEL-06_techene_e3_v2
NEMO intermediate version 1 implements scale factors computed from the sea surface interpolation (2d field) instead of from the e3t interpolation, but the whole structure of the code remains. Note that to validate "NEMO intermediate version 1" we change the code line by line and compare the results for the GYRE configuration with TOP de-activated. Differences in the results appear when the W-point scale factor is changed from an interpolation of the T-point scale factor to the sea surface scaling, since the bottom level is not considered in the same way. Indeed, for the GYRE configuration e3w_0 are not the half sum of e3u_0, so the way it is implemented in the reference version is not convenient... [Step 1]
NEMO intermediate version 2 implements scale factors computed from the sea surface interpolation (2d field) instead. The initialisation computes the ratios of sea surface height to h_0, called r3 coefficients (which are 2d). These r3 coefficients are updated after each sea surface modification (after time splitting and Asselin filtering) and interpolated at U-V-F-points using new routines similar to the domvvl ones. An extra substitute file replaces each e3 by its expression (e3P_0 * ( 1 + r3P * maskP )). Sea surface filtering is moved before the Asselin filtering of velocity (u,v) and tracers. This version 2 should give exactly the same results as version 1, and it does ! [Steps 2 & 3]
NEMO intermediate version 3 cleans the code by adding the substitution and removing the e3 computation throughout OCE. It also takes care of line lengths, which should be shorter than 136 characters; some are still missing... [Steps 4 & 5]
In order to take into account the new index/loop management, NEMO intermediate version 4 merges the results with trunk 12698; the resulting revision is 12724. Note that in this new trunk revision Jerome changed the way the Asselin filter is handled (traatf and dynatf), so intermediate version 4 needs to adapt accordingly. [Step 6]
NEMO intermediate version 5 implements a clean way to deal with key_qco and also removes gde* and h* from memory. It removes e3 from the whole code. To deal with TOP, one has to work with a pointer to the sea surface height and change where it is computed in step. [Step 7]
Run SETTE and deliver the version for the mid-merge party ! Some silly memory allocation bugs were found, as well as a not-so-silly bug in the implicit mode for the SPITZ12 configuration. [Steps 8, 9 & 11]
...
Documentation updates
...
Preview
Eventually, all the dom_vvl_interpol calls are removed; each time e3 is referenced we use a substitute to replace e3 by e3_0 (1 + ssh / h_0). For backward compatibility a cpp key manages the use of the new version vs. the old version. We will duplicate modules such as step and domvvl into stepLF and domQE (QE stands for Quasi Eulerian) and create a substitute module.
Integrated in mid merge trunk.
List the Fortran modules and subroutines to be created. substitute.F90
Step 1 : Check the error for e3t, e3w between the current way to compute e3 at T-, W-point and the proposed way to compute e3 at T-, W-point.
- prints added with no change in the results
Step 2 : First we change only the core routine in domvvl which should be changed into domQE.
- add new variables, duplicate step into steplf and domvvl into domQE
- change interpolation routines into scaling routines in domQE
Step 3 : Then we change the Asselin filtering routine, because water forcings are applied locally.
- change Asselin routines (maybe not required since e3 scale with vertical with JC modif)
Step 4 : Finally we remove the interpol routine in the whole code
- remove interpolating routine in all the code (AGRIF, OFF,...)
- use a SUBSTITUTE wherever e3 is referenced
- make some changes in step and domQE to have the whole thing consistent
...
Tests
... We want to track, and if possible explain, the differences observed at every step.
Reference set up : we produce a reference data set with the trunk at r12377 using the GYRE_PISCES configuration, from which the TOP cpp key has been removed. We run it for 120 time steps. The drag coefficient is zero. XIOS outputs an averaged field every 5 days.
Step 1 : We print the MAXVAL of the error between the two ways of computing the vertical scale factors at each time step. Note that we cancelled the forcing (in the r12377 revision this should not change anything, since water forcings such as run off and emp scale with the vertical).
error between proposed and former way to compute vertical scale factors at time kt = 1, 120, 85
variable | kt = 1 | kt = 120 | kt = 85 |
---|---|---|---|
e3t (1) | 0.0000000000000000 | 3999.6591076268369 | 4.54747350886E-013 |
e3t (2) | 5.68434188608E-014 | 5.11590769747E-013 | 4.54747350886E-013 |
e3w | 4.64477238892E-007 | 6.13657050507E-006 | 5.27333801869E-006 |
gde3w | 1.81898940354E-012 | 2.72848410531E-012 | 2.72848410531E-012 |
QUESTION : Why do we have such an error on the e3w scale factors ? It is not consistent with machine accuracy error. It seems to be related to the e3w_0 computation. How do we compute e3w_0 ? OK SOLVED ! THIS IS A KIND OF ERROR IN THE CODE !!! DUE TO THE FACT THAT E3W_0(jk) != 0.5 * ( E3T_0(jk) + E3T_0(jk-1) )...
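The origin of the e3w discrepancy can be reproduced on a toy column. The Python sketch below is not the NEMO diagnostic: it assumes that the reference e3w is obtained by adding the half sum of the adjacent e3t anomalies to e3w_0 (a simplified stand-in for the interpolation routine, ignoring masks and the bottom level), and all values, including the "inconsistent" e3w_0, are hypothetical.

```python
# Toy reproduction of the e3w mismatch between interpolation and ssh scaling.
# Assumption: the reference e3w is e3w_0 plus the half sum of the e3t anomalies.

def max_abs_diff(a, b):
    """Equivalent of MAXVAL(ABS(a - b)) for two lists of equal length."""
    return max(abs(x - y) for x, y in zip(a, b))

e3t_0 = [10.0, 20.0, 40.0]                 # reference t-point thicknesses (m)
r3t = 0.01                                 # ssh / ht_0
e3t = [e0 * (1.0 + r3t) for e0 in e3t_0]   # stretched t-point thicknesses

def e3w_interp(e3w_0):
    """Interior w-levels: e3w_0 plus the averaged anomaly of the two adjacent e3t."""
    return [e3w_0[k - 1] + 0.5 * ((e3t[k - 1] - e3t_0[k - 1]) + (e3t[k] - e3t_0[k]))
            for k in range(1, len(e3t_0))]

def e3w_scaled(e3w_0):
    """Proposed approach: e3w_0 * (1 + r3t)."""
    return [w0 * (1.0 + r3t) for w0 in e3w_0]

# Case 1: e3w_0 is exactly the half sum of the adjacent e3t_0.
half_sum = [0.5 * (e3t_0[k - 1] + e3t_0[k]) for k in range(1, len(e3t_0))]
err_consistent = max_abs_diff(e3w_interp(half_sum), e3w_scaled(half_sum))

# Case 2: e3w_0 differs from the half sum (hypothetical values).
other_e3w_0 = [14.0, 31.0]
err_inconsistent = max_abs_diff(e3w_interp(other_e3w_0), e3w_scaled(other_e3w_0))
```

In the first case the two approaches agree to rounding error; in the second the error is of the order of r3t times the mismatch in e3w_0, well above machine accuracy, which is consistent with the explanation given above.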
Step 2 : Change the code in domvvl, which becomes domqe. Duplicate step.F90 into steplf.F90 and call the domqe routines inside.
We observe small errors, but they are not at the truncation level, as would be expected against the current trunk version. This is due to the differences spotted above. WE CAN NO LONGER USE THE TRUNK PRODUCTION AS A REFERENCE...
Review
...