Changes between Version 20 and Version 21 of 2020WP/KERNEL-06_techene_better_e3_management


Timestamp:
2020-10-26T17:26:15+01:00 (6 months ago)
Author:
techene

  • 2020WP/KERNEL-06_techene_better_e3_management

    v20 v21  
    25 25 === Description
    26 26
    27 The current version of NEMO stores the scale factors e3[P] in memory; at each P-point they are computed by interpolating the e3t 4D table, with P = {u-, v-, w-, f-, uw-, vw-} points. This means 7 4D tables stored in memory. The idea is to compute the scale factors e3[P](ji,jj,jk,Ktl) on the fly from r3[P] = ssh[P] / h_0 and e3[P]_0 instead of storing them. This should help to improve run time in parallel runs. Indeed, processors have at least two memory levels: fast memory and slow RAM. In parallel runs the processing time is no longer limited by computation time but by memory access time. That is why we try to minimise memory buffering.
    28 Asselin filter management is done by recomputing r3[P] directly from the filtered ssh.
    29 z-tilde management is done through e3[P]_0, which may vary with time in the z-tilde case.
    30  
     27 In the z* vertical configuration, NEMO r12377 uses memory to store and update the vertical scale factors e3[P], where P = {t-, u-, v-, w-, f-, uw-, vw-} points, at the "before", "now" and "after" time steps. This implies storing 6 x 4D + 1 x 3D tables in memory, plus the memory accesses and CPU time needed to update the 3D scale factors.
     28 The code modification consists in computing the scale factor e3[P] on the fly, each time it is needed, with the formula e3[P](Kt) = e3[P]_0 * (1 + r3[P](Kt) * mask[P]), where r3[P](Kt) (= ssh[P](Kt) / h_0) is a 2D table computed at P = {u-,v-,f-} points whenever ssh is updated during a time step.
     29 This change is applied only when key_qco is activated.
     30 
     31 Because we reduce the number of tables accessed in memory, we have a better chance of staying in fast memory. Because we compute 2D interpolations instead of 3D ones, the algorithm complexity is lower and less CPU time is used. Together, these make the computation about 10% faster whatever the domain size (tested from 10x10 to 100x100 points per computation node) when communications are excluded.
     32 
     33 This branch also comes with improvements from KERNEL-07, such as the symmetric diffusion tensor (#2527) implemented in dynldf_lap_blp and controlled by the nn_dynldf_typ namelist parameter. It contains a dynvor correction for using ln_dynvor_msk and a proper fix for using ENS and ENE with partial steps, described in #2555. Finally, a new shallow water test case has been added.
    31 34
    32 35
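
The on-the-fly formula in the v21 description, e3[P](Kt) = e3[P]_0 * (1 + r3[P](Kt) * mask[P]), can be illustrated with a minimal Python sketch. This is not the NEMO Fortran implementation: the function names `r3_ratio` and `e3_on_the_fly`, the nested-list layout `[jk][jj][ji]`, the sample values and the guard for dry columns (h_0 = 0) are all assumptions made for illustration. It only shows that a 2D ratio plus the reference scale factors are enough to recover the time-varying thickness of any cell when it is needed.

```python
def r3_ratio(ssh, h0):
    """2D ratio r3 = ssh / h_0 (assumption: set to 0 where h_0 is 0, i.e. dry columns)."""
    return [[s / h if h > 0.0 else 0.0 for s, h in zip(srow, hrow)]
            for srow, hrow in zip(ssh, h0)]


def e3_on_the_fly(e3_0, r3, mask, ji, jj, jk):
    """Evaluate e3 = e3_0 * (1 + r3 * mask) for one cell, only when it is needed."""
    return e3_0[jk][jj][ji] * (1.0 + r3[jj][ji] * mask[jk][jj][ji])


# Hypothetical 1 x 2 horizontal domain with 3 levels, indexed [jk][jj][ji].
e3_0 = [[[10.0, 10.0]], [[20.0, 20.0]], [[30.0, 30.0]]]  # reference scale factors
mask = [[[1.0, 1.0]], [[1.0, 1.0]], [[1.0, 0.0]]]        # last level is land at ji = 1
h0 = [[60.0, 30.0]]    # reference water depth (sum of wet e3_0 in each column)
ssh = [[0.6, -0.3]]    # sea surface height at the current time level

# The only time-dependent table kept in memory is the 2D ratio r3.
r3 = r3_ratio(ssh, h0)

# Thicknesses of the first column, rebuilt on demand instead of read from a 4D table.
column = [e3_on_the_fly(e3_0, r3, mask, 0, 0, jk) for jk in range(3)]
print(column, sum(column))
```

In this sketch the thicknesses of a wet column sum to h_0 + ssh, which is the property the z* coordinate is meant to preserve, and a masked cell keeps its reference thickness because the mask cancels the r3 term.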