ticket/0829 (diff) – NEMO

Changes between Version 16 and Version 17 of ticket/0829


Timestamp:
2011-10-24T12:15:26+02:00 (13 years ago)
Author:
clevy
Comment:

--

  • ticket/0829

    v16 v17  
    1 [[PageOutline]] 
    2 Last edited [[Timestamp]] 
    3  
    4 [[BR]] 
    5  
    6 '''Author''' : rblod (Rachid Benshila)  
     1[[PageOutline]] Last edited [[Timestamp]] 
     2 
     3'''Author''' : rblod (Rachid Benshila) 
    74 
    85'''ticket''' : #829 
    96 
    10 '''Branch''' : [https://forge.ipsl.jussieu.fr/nemo/browser/branches/2011/dev_r2769_LOCEAN_dynamic_mem      dev_r2769_LOCEAN_dynamic_mem ]  
    11 ---- 
    12  
     7'''Branch''' : [https://forge.ipsl.jussieu.fr/nemo/browser/branches/2011/dev_r2769_LOCEAN_dynamic_mem dev_r2769_LOCEAN_dynamic_mem] 
     8 
     9---- 
    1310=== Description === 
    14  
    15 The computing aspects of the dynamic memory implementation are already described at http://forge.ipsl.jussieu.fr/nemo/wiki/2011WP/2011Stream2/DynamicMemory, and the possible consequences at https://forge.ipsl.jussieu.fr/nemo/wiki/2011Stream2/DynamicMemory_improvments . This branch deals with the first aspect, i.e. the practical implementation. [[BR]] 
    16 Dynamic memory implementation is clearly a step forward, and the current implementation from branch dev_r2586_dynamic_mem is quite clean, with careful checks of the availability of the work arrays. However: 
     11The computing aspects of the dynamic memory implementation are already described at http://forge.ipsl.jussieu.fr/nemo/wiki/2011WP/2011Stream2/DynamicMemory, and the possible consequences at https://forge.ipsl.jussieu.fr/nemo/wiki/2011Stream2/DynamicMemory_improvments . This branch deals with the first aspect, i.e. the practical implementation. [[BR]] Dynamic memory implementation is clearly a step forward, and the current implementation from branch dev_r2586_dynamic_mem is quite clean, with careful checks of the availability of the work arrays. However: 
     12 
    1713 * Assigning work arrays by hand is difficult given the number of options and combinations of options available in NEMO 
    1814 * In terms of memory, the number of work arrays has to be hard-coded to the maximum combination, i.e. we always use more memory than needed 
     15 
    1916The investigation of improvements proceeds in the following steps: 
    20  * Implementation of timing functionalities : this topic has been discussed within the NEMO group for years; since the dynamic memory developments impact all the routines, implementing timing at the same time makes sense  
     17 
     18 * Implementation of timing functionalities : this topic has been discussed within the NEMO group for years; since the dynamic memory developments impact all the routines, implementing timing at the same time makes sense 
    2119 * Small changes in the current implementation : the work arrays are in a list that is automatically incremented and decremented, and are no longer assigned by hand. This solves limitation 1 above. 
    2220 * More radical change : the working space is built dynamically. This solves limitation 2. 
    2321 
    2422==== 1- Timing ==== 
    25  
    2623This functionality doesn't aim to replace advanced software used for optimisation but: 
     24 
    2725 * to give a rough idea of performance (CPU and elapsed) 
    2826 * to use the same tools and format on all computers (Fortran intrinsic CPU_TIME and MPI_WTIME) 
    29 It is based on a linked list of records, so that new sections and sub-sections can be added dynamically.[[BR]] 
    30 Implementation: 
     27 
     28It is based on a linked list of records, so that new sections and sub-sections can be added dynamically.[[BR]] Implementation: 
     29 
    3130 * CALL timing_init in nemogcm_init 
    3231 * CALL timing_finalize at the end of nemogcm 
    3332 * at the end of step, IF( kt == nit000) CALL timing_reset (once the list of variables has been built) 
    3433 * in each routine to instrument : CALL timing_start('NAME')    CALL timing_stop('NAME') 
    35 Nested sub-sections are allowed; their time is subtracted from the parent section unless the section is timed as follows: CALL timing_start('NAME')    CALL timing_stop('NAME',section)  
     34 
     35Nested sub-sections are allowed; their time is subtracted from the parent section unless the section is timed as follows: CALL timing_start('NAME')    CALL timing_stop('NAME',section) 
    3636 
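As an illustration only, the linked-list idea behind the timing calls can be sketched as below. This is not the NEMO timing module: the module toy_timing, its routines toy_start, toy_stop, toy_report, toy_ncalls and the type section_t are hypothetical names, only CPU_TIME is used, and times are inclusive (the subtraction of nested sub-sections described above is left out).

```fortran
module toy_timing
   ! Hypothetical miniature of a section timer; NOT the NEMO timing module.
   implicit none
   private
   public :: toy_start, toy_stop, toy_report, toy_ncalls

   type :: section_t
      character(len=20) :: name    = ''
      real              :: t_start = 0.0   ! CPU time at the last toy_start
      real              :: t_total = 0.0   ! accumulated CPU time (children included)
      integer           :: ncalls  = 0
      type(section_t), pointer :: next => null()
   end type section_t

   type(section_t), pointer :: root => null()   ! head of the chain of sections

contains

   function find_or_add(name) result(s)
      ! Return the section called name, creating it on first use.
      character(len=*), intent(in) :: name
      type(section_t), pointer :: s
      s => root
      do while (associated(s))
         if (trim(s%name) == trim(name)) return
         s => s%next
      end do
      allocate(s)            ! new section: prepend it to the chain
      s%name = name
      s%next => root
      root   => s
   end function find_or_add

   subroutine toy_start(name)
      character(len=*), intent(in) :: name
      type(section_t), pointer :: s
      s => find_or_add(name)
      s%ncalls = s%ncalls + 1
      call cpu_time(s%t_start)
   end subroutine toy_start

   subroutine toy_stop(name)
      character(len=*), intent(in) :: name
      type(section_t), pointer :: s
      real :: t_now
      call cpu_time(t_now)
      s => find_or_add(name)
      s%t_total = s%t_total + (t_now - s%t_start)
   end subroutine toy_stop

   integer function toy_ncalls(name)
      character(len=*), intent(in) :: name
      type(section_t), pointer :: s
      toy_ncalls = 0
      s => root
      do while (associated(s))
         if (trim(s%name) == trim(name)) toy_ncalls = s%ncalls
         s => s%next
      end do
   end function toy_ncalls

   subroutine toy_report()
      ! One line per section: name, number of calls, CPU seconds.
      type(section_t), pointer :: s
      s => root
      do while (associated(s))
         write(*,'(a20,1x,i6,1x,f10.4)') s%name, s%ncalls, s%t_total
         s => s%next
      end do
   end subroutine toy_report

end module toy_timing

program demo_timing
   use toy_timing
   implicit none
   integer :: kt, i
   real    :: x
   x = 0.0
   do kt = 1, 3                      ! three fake time steps
      call toy_start('step')
      call toy_start('tra_adv')      ! nested section
      do i = 1, 100000
         x = x + sqrt(real(i))
      end do
      call toy_stop('tra_adv')
      call toy_stop('step')
   end do
   call toy_report()
   if (toy_ncalls('step') /= 3) error stop 'unexpected call count'
   print *, 'work done:', x > 0.0
end program demo_timing
```

Because sections are created on first use, a routine only needs the start/stop pair; nothing has to be registered in advance.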
    3737Sample of output: 
     38 
    3839{{{ 
    3940       CNRS - NERC - Met OFFICE - MERCATOR-ocean - CMCC - INGV 
     
    8081 
    8182}}} 
    82  
    8383Comparison with prof output : 
     84 
    8485{{{ 
    8586Name                 %Time     Seconds     Cumsecs  #Calls   msec/call 
     
    9192.__traadv_tvd_NMOD_n   4.6        8.91      139.55     400     22.27 
    9293}}} 
    93  
    94 It was done in https://forge.ipsl.jussieu.fr/nemo/changeset/2771[[BR]] 
    95 Note that timing is not required for changing the dynamic allocation; it is just an opportunity, since we would be editing all the routines anyway. 
     94It was done in https://forge.ipsl.jussieu.fr/nemo/changeset/2771[[BR]] Note that timing is not required for changing the dynamic allocation; it is just an opportunity, since we would be editing all the routines anyway. 
    9695 
    9796==== 2- Auto-assignment ==== 
    98  
    9997Instead of choosing the index of a work array by hand, we introduce, for each type of work array, a structure of arrays with an associated counter: 
     98 
    10099{{{ 
    101100   TYPE work_space_3d 
     
    108107}}} 
    109108Then in each routine, we declare local arrays as pointers 
     109 
    110110{{{ 
    111111  REAL(wp), DIMENSION (:,:,:), POINTER ::   zwi, zwz 
    112112}}} 
    113113We then call the subroutine nemo_allocate, which points to a work array and increments the counter, and nemo_deallocate, which decrements it 
     114 
    114115{{{ 
    115116 CALL nemo_allocate(zwi)     ! begin routine 
    116117 CALL nemo_deallocate(zwi)     ! end routine 
    117118}}} 
    118 It was implemented in wrk_nemo_2 and applied as a test in traadv_tvd. To avoid changing all routines before a definitive choice is made, we chose to keep the old way and duplicate wrk_nemo in wrk_nemo_2 (later saved as wrk_nemo_2_simple), so twice the memory is declared; this would of course not be the case if it were implemented in all routines.[[BR]] 
    119 To avoid memory leaks, we could check, for instance at the end of step, that each counter is equal to one.[[BR]] 
    120 This was implemented here : http://forge.ipsl.jussieu.fr/nemo/changeset/2775 [[BR]] 
    121 and wrk_nemo_2 is there http://forge.ipsl.jussieu.fr/nemo/browser/branches/2011/dev_r2769_LOCEAN_dynamic_mem/NEMOGCM/NEMO/OPA_SRC/wrk_nemo_2.F90_simple 
     119It was implemented in wrk_nemo_2 and applied as a test in traadv_tvd. To avoid changing all routines before a definitive choice is made, we chose to keep the old way and duplicate wrk_nemo in wrk_nemo_2 (later saved as wrk_nemo_2_simple), so twice the memory is declared; this would of course not be the case if it were implemented in all routines.[[BR]] To avoid memory leaks, we could check, for instance at the end of step, that each counter is equal to one.[[BR]] This was implemented here : http://forge.ipsl.jussieu.fr/nemo/changeset/2775 [[BR]] and wrk_nemo_2 is there http://forge.ipsl.jussieu.fr/nemo/browser/branches/2011/dev_r2769_LOCEAN_dynamic_mem/NEMOGCM/NEMO/OPA_SRC/wrk_nemo_2.F90_simple 
    122120 
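A minimal sketch of the auto-assignment idea, under stated assumptions: the module toy_wrk_fixed, the type slot_3d, the pool size nmax and the routines toy_init, toy_get, toy_release, toy_in_use are all hypothetical and do not reproduce wrk_nemo_2. Here the counter holds the number of arrays handed out, so the leak check is "counter back to zero" rather than "equal to one".

```fortran
module toy_wrk_fixed
   ! Hypothetical fixed-size pool of 3D work arrays with an automatic counter.
   implicit none
   private
   public :: wp, toy_init, toy_get, toy_release, toy_in_use

   integer, parameter :: wp   = kind(1.0d0)
   integer, parameter :: nmax = 4           ! hard-coded pool size (assumption)

   type :: slot_3d
      real(wp), pointer :: a(:,:,:) => null()
   end type slot_3d

   type(slot_3d), save :: pool(nmax)
   integer :: in_use = 0                    ! number of arrays currently handed out

contains

   subroutine toy_init(ni, nj, nk)
      ! Allocate the whole pool once, at start-up.
      integer, intent(in) :: ni, nj, nk
      integer :: n
      do n = 1, nmax
         allocate(pool(n)%a(ni, nj, nk))
      end do
      in_use = 0
   end subroutine toy_init

   subroutine toy_get(p)
      ! Point p to the next free work array and increment the counter.
      real(wp), pointer :: p(:,:,:)
      if (in_use >= nmax) error stop 'work-array pool exhausted'
      in_use = in_use + 1
      p => pool(in_use)%a
   end subroutine toy_get

   subroutine toy_release(p)
      ! Give the array back: only the counter changes, nothing is freed.
      real(wp), pointer :: p(:,:,:)
      nullify(p)
      in_use = in_use - 1
   end subroutine toy_release

   integer function toy_in_use()
      toy_in_use = in_use
   end function toy_in_use

end module toy_wrk_fixed

program demo_fixed
   use toy_wrk_fixed
   implicit none
   real(wp), pointer :: zwi(:,:,:), zwz(:,:,:)

   call toy_init(4, 3, 2)

   call toy_get(zwi)                 ! begin routine
   call toy_get(zwz)
   zwi = 1.0_wp
   zwz = 2.0_wp
   ! the two pointers must refer to distinct storage
   if (sum(zwi) /= 24.0_wp) error stop 'work arrays overlap'
   print *, 'arrays in use inside the routine:', toy_in_use()
   call toy_release(zwz)             ! end routine, reverse order
   call toy_release(zwi)

   ! leak check, e.g. at the end of a time step
   if (toy_in_use() /= 0) error stop 'work array not released'
   print *, 'arrays in use after the routine :', toy_in_use()
end program demo_fixed
```

The pool size is still fixed at compile time in this variant, which is the memory limitation that the next section addresses.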
    123121==== 3- Dynamic dynamic memory ==== 
    124  
    125 The point here is to avoid hard-coding the maximum number of work arrays potentially in use, and to optimize the memory size, especially for memory-intensive applications (biogeochemistry, assimilation).[[BR]] 
    126 Here again, as a preliminary test, the implementation is done in wrk_nemo_2.F90 and the previous one is renamed wrk_nemo_2_simple.[[BR]] 
    127 For each type of work array, we use an associated linked list to build the work arrays needed. When we exit a routine, we do not destroy the work arrays that were created; we just point back to the beginning of the list. If the following routine needs one more array, we just add an element to the chain. At the end of the first time step we should therefore have built exactly the total amount of memory needed.  
     122The point here is to avoid hard-coding the maximum number of work arrays potentially in use, and to optimize the memory size, especially for memory-intensive applications (biogeochemistry, assimilation).[[BR]] Here again, as a preliminary test, the implementation is done in wrk_nemo_2.F90 and the previous one is renamed wrk_nemo_2_simple.[[BR]] For each type of work array, we use an associated linked list to build the work arrays needed. When we exit a routine, we do not destroy the work arrays that were created; we just point back to the beginning of the list. If the following routine needs one more array, we just add an element to the chain. At the end of the first time step we should therefore have built exactly the total amount of memory needed. 
     123 
    128124{{{ 
    129125   TYPE work_space_3d 
     
    136132   TYPE(work_space_3d), POINTER :: s_wrk_3d_root, s_wrk_3d 
    137133}}} 
    138 Then, in the same way as above, 
    139 in each routine, we declare local arrays as pointers 
     134Then, in the same way as above, in each routine, we declare local arrays as pointers 
     135 
    140136{{{ 
    141137 REAL(wp), DIMENSION (:,:,:), POINTER ::   zwi, zwz 
     
    144140 CALL nemo_deallocate(zwi)     ! to come back  
    145141}}} 
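The growing chain can be sketched as follows, again as an assumption-laden toy and not the content of wrk_nemo_2.F90: the module toy_wrk_chain, the type node_3d and the routines toy_setup, toy_get, toy_release, toy_count are hypothetical. In this sketch a release steps the cursor back by one element (via a prev pointer) instead of jumping to the head of the list, which is one possible way to keep nested routines safe.

```fortran
module toy_wrk_chain
   ! Hypothetical linked list of 3D work arrays that grows on demand.
   implicit none
   private
   public :: wp, toy_setup, toy_get, toy_release, toy_count

   integer, parameter :: wp = kind(1.0d0)

   type :: node_3d
      real(wp), pointer      :: a(:,:,:) => null()
      type(node_3d), pointer :: next => null()
      type(node_3d), pointer :: prev => null()
   end type node_3d

   type(node_3d), pointer :: root   => null()   ! first element of the chain
   type(node_3d), pointer :: cursor => null()   ! last element handed out
   integer :: ni = 0, nj = 0, nk = 0
   integer :: n_nodes = 0                        ! arrays built so far

contains

   subroutine toy_setup(ki, kj, kk)
      integer, intent(in) :: ki, kj, kk
      ni = ki
      nj = kj
      nk = kk
   end subroutine toy_setup

   subroutine toy_get(p)
      ! Hand out the next array of the chain, building it if it does not exist yet.
      real(wp), pointer :: p(:,:,:)
      type(node_3d), pointer :: nxt
      if (associated(cursor)) then
         nxt => cursor%next
      else
         nxt => root
      end if
      if (.not. associated(nxt)) then
         allocate(nxt)
         allocate(nxt%a(ni, nj, nk))
         nxt%prev => cursor
         if (associated(cursor)) then
            cursor%next => nxt
         else
            root => nxt
         end if
         n_nodes = n_nodes + 1
      end if
      cursor => nxt
      p => nxt%a
   end subroutine toy_get

   subroutine toy_release(p)
      ! Step back in the chain; the array itself is kept for later reuse.
      real(wp), pointer :: p(:,:,:)
      nullify(p)
      if (associated(cursor)) cursor => cursor%prev
   end subroutine toy_release

   integer function toy_count()
      toy_count = n_nodes
   end function toy_count

end module toy_wrk_chain

program demo_chain
   use toy_wrk_chain
   implicit none
   real(wp), pointer :: zwi(:,:,:), zwz(:,:,:), zwx(:,:,:)
   integer :: kt

   call toy_setup(4, 3, 2)
   do kt = 1, 3
      ! "routine A" needs two work arrays
      call toy_get(zwi)
      call toy_get(zwz)
      zwi = 1.0_wp
      zwz = 2.0_wp
      call toy_release(zwz)
      call toy_release(zwi)
      ! "routine B" needs three: one element is added during the first step only
      call toy_get(zwi)
      call toy_get(zwz)
      call toy_get(zwx)
      zwx = 3.0_wp
      call toy_release(zwx)
      call toy_release(zwz)
      call toy_release(zwi)
   end do

   ! after the first time step the chain holds exactly the maximum needed
   if (toy_count() /= 3) error stop 'unexpected number of work arrays'
   print *, 'work arrays built:', toy_count()
end program demo_chain
```

The calling routines look the same as with the fixed pool; only the module behind the interface changes.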
    146  
    147142At this point, this looks very nice, but I wonder whether we shouldn't simply use a standard dynamic memory implementation instead of trying to do complicated things. I don't have the knowledge to answer. I guess the answer can be formulated in terms of performance : 
     143 
    148144 * is it more expensive to allocate/deallocate at each call in a standard way 
    149145 * or to CALL a subroutine pointing toward an existing already allocated array 
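One rough way to look at this question is a toy benchmark such as the one below. It is only a sketch: the array sizes and call count are arbitrary assumptions, the pointer variant does not include the overhead of a subroutine call, and the outcome depends heavily on compiler, options and machine, so no result is claimed here.

```fortran
program demo_alloc_cost
   ! Toy comparison: ALLOCATE/DEALLOCATE at each call versus pointing to an
   ! array that already exists. Sizes and call count are arbitrary.
   implicit none
   integer, parameter :: wp = kind(1.0d0)
   integer, parameter :: ni = 64, nj = 64, nk = 31, ncall = 2000
   real(wp), allocatable, target :: pool(:,:,:)
   real(wp), allocatable         :: tmp(:,:,:)
   real(wp), pointer             :: p(:,:,:)
   real(wp) :: acc
   real     :: t0, t1, t2
   integer  :: n

   allocate(pool(ni, nj, nk))
   acc = 0.0_wp

   call cpu_time(t0)
   do n = 1, ncall                    ! standard way: allocate at each call
      allocate(tmp(ni, nj, nk))
      tmp = real(n, wp)
      acc = acc + tmp(1, 1, 1)
      deallocate(tmp)
   end do
   call cpu_time(t1)
   do n = 1, ncall                    ! work-array way: reuse existing storage
      p => pool
      p = real(n, wp)
      acc = acc + p(1, 1, 1)
      nullify(p)
   end do
   call cpu_time(t2)

   ! both loops add 1 + 2 + ... + ncall, so the checksum is ncall*(ncall+1)
   if (acc /= real(ncall, wp) * real(ncall + 1, wp)) error stop 'bad checksum'

   print '(a,f8.4,a)', 'allocate/deallocate at each call: ', t1 - t0, ' s'
   print '(a,f8.4,a)', 'pointer to an existing array    : ', t2 - t1, ' s'
end program demo_alloc_cost
```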
    150 I also have some concerns about future evolutions. If we imagine having a large variety of arrays (not only jpi,jpj,jpk), it could become hard to maintain.[[BR]] 
    151 [[BR]] 
    152  
     146 
     147I also have some concerns about future evolutions. If we imagine having a large variety of arrays (not only jpi,jpj,jpk), it could become hard to maintain.[[BR]] [[BR]] 
    153148 
    154149Anyway, an example can be found there http://forge.ipsl.jussieu.fr/nemo/changeset/2776[[BR]] 
    155  and wrk_nemo_2 is there http://forge.ipsl.jussieu.fr/nemo/browser/branches/2011/dev_r2769_LOCEAN_dynamic_mem/NEMOGCM/NEMO/OPA_SRC/wrk_nemo_2.F90 
     150 
     151  and wrk_nemo_2 is there http://forge.ipsl.jussieu.fr/nemo/browser/branches/2011/dev_r2769_LOCEAN_dynamic_mem/NEMOGCM/NEMO/OPA_SRC/wrk_nemo_2.F90 
    156152 
    157153---- 
     
    159155Testing could consider (where appropriate) other configurations in addition to NVTK]. 
    160156 
    161 ||NVTK Tested||!'''YES/NO!'''|| 
    162 ||Other model configurations||!'''YES/NO!'''|| 
    163 ||Processor configurations tested||[ Enter processor configs tested here ]|| 
    164 ||If adding new functionality please confirm that the [[BR]]New code doesn't change results when it is switched off [[BR]]and !''works!'' when switched on||!'''YES/NO/NA!'''|| 
     157|| NVTK Tested || !'''YES/NO!''' || 
     158|| Other model configurations || !'''YES/NO!''' || 
     159|| Processor configurations tested || [ Enter processor configs tested here ] || 
     160|| If adding new functionality please confirm that the [[BR]]New code doesn't change results when it is switched off [[BR]]and !''works!'' when switched on || !'''YES/NO/NA!''' || 
    165161 
    166162(Answering UNSURE is likely to generate further questions from reviewers.) 
    167163 
    168 'Please add further summary details here' 
     164'''Testing dynamic memory:''' 
     165 
     166A series of tests was run on Power6 (vargas) and titane (Bull novascale) to compare the 3 ways of coding the dynamic allocation.[[BR]]The GYRE configuration was used, and the interface for the new dynamic allocation was coded for this configuration.[[BR]]Testing was done for 2 grid sizes (CFG=24 and CFG=96, equivalent to global 1/4°). 
     167 
     168In all tests the model runs properly, and for a given configuration the elapsed and CPU times are equivalent for the 3 solutions. Since the interface is identical for the 2 new implementations, it seems reasonable to implement the best solution, i.e. the last one, which optimises memory. 
     169 
     170Implementation for GYRE took around 2 days. 
    169171 
    170172 * Processor configurations tested 
     
    172174 
    173175=== Bit Comparability === 
    174 ||Does this change preserve answers in your tested standard configurations (to the last bit) ?||!'''YES/NO !'''|| 
    175 ||Does this change bit compare across various processor configurations. (1xM, Nx1 and MxN are recommended)||!'''YES/NO!'''|| 
    176 ||Is this change expected to preserve answers in all possible model configurations?||!'''YES/NO!'''|| 
    177 ||Is this change expected to preserve all diagnostics? [[BR]]!,,!''Preserving answers in model runs does not necessarily imply preserved diagnostics. !''||!'''YES/NO!'''|| 
     176|| Does this change preserve answers in your tested standard configurations (to the last bit) ? || !'''YES/NO !''' || 
     177|| Does this change bit compare across various processor configurations. (1xM, Nx1 and MxN are recommended) || !'''YES/NO!''' || 
     178|| Is this change expected to preserve answers in all possible model configurations? || !'''YES/NO!''' || 
     179|| Is this change expected to preserve all diagnostics? [[BR]]!,,!''Preserving answers in model runs does not necessarily imply preserved diagnostics. !'' || !'''YES/NO!''' || 
    178180 
    179181If you answered !'''NO!''' to any of the above, please provide further details: 
     
    187189---- 
    188190=== System Changes === 
    189 ||Does your change alter namelists?||!'''YES/NO !'''|| 
    190 ||Does your change require a change in compiler options?||!'''YES/NO !'''|| 
     191|| Does your change alter namelists? || !'''YES/NO !''' || 
     192|| Does your change require a change in compiler options? || !'''YES/NO !''' || 
    191193 
    192194If any of these apply, please document the changes required here....... 
     
    198200---- 
    199201=== IPR issues === 
    200 ||Has the code been wholly (100%) produced by NEMO developers staff working exclusively on NEMO?||!'''YES/ NO !'''|| 
     202|| Has the code been wholly (100%) produced by NEMO developers staff working exclusively on NEMO? || !'''YES/ NO !''' || 
    201203 
    202204If No: