Changes between Version 16 and Version 17 of ticket/0829
Timestamp: 2011-10-24T12:15:26+02:00
[[PageOutline]] Last edited [[Timestamp]]

'''Author''' : rblod (Rachid Benshila)

'''ticket''' : #829

'''Branch''' : [https://forge.ipsl.jussieu.fr/nemo/browser/branches/2011/dev_r2769_LOCEAN_dynamic_mem dev_r2769_LOCEAN_dynamic_mem]

----
=== Description ===
Computing aspects of the dynamic memory implementation are already described at http://forge.ipsl.jussieu.fr/nemo/wiki/2011WP/2011Stream2/DynamicMemory, and possible consequences at https://forge.ipsl.jussieu.fr/nemo/wiki/2011Stream2/DynamicMemory_improvments. This branch deals with the first aspect, i.e. the practical implementation. [[BR]]
Dynamic memory implementation is clearly a step forward, and the current implementation from branch dev_r2586_dynamic_mem is quite clean, with careful checks of the availability of the work arrays.
However:

 * Assigning the work arrays by hand leads to difficulties, given the number of options and combinations of options available in NEMO
 * In terms of memory, the number of work arrays has to be hard-coded to the maximum combination, i.e. we always use more memory than needed

The investigation of improvements follows these steps:

 * Implementation of timing functionality: this topic has been discussed within the NEMO group for years; since the dynamic memory developments impact all the routines, implementing timing at the same time makes sense
 * Small changes to the current implementation: the work arrays are held in a list that is automatically incremented and decremented, and no longer assigned by hand. This solves limitation 1 above.
 * A more radical change: the working space is built dynamically. This solves limitation 2.
==== 1- Timing ====
This functionality does not aim to replace the advanced software used for optimisation, but rather:

 * to give a rough idea of performance (CPU and elapsed)
 * to use the same tools and format on all computers (the Fortran intrinsic CPU_TIME and MPI_WTIME)

It is based on a linked chain of information, so that new sections and sub-sections can be added dynamically.[[BR]]
Implementation:

 * CALL timing_init in nemogcm_init
 * CALL timing_finalize at the end of nemogcm
 * at the end of step, IF( kt == nit000 ) CALL timing_reset (once the list of variables has been built)
 * in each routine to instrument: CALL timing_start('NAME') ... CALL timing_stop('NAME')

Nested sub-sections are allowed, and their time is then subtracted from the parent section, unless the section is timed in the following way: CALL timing_start('NAME') ... CALL timing_stop('NAME',section)

Sample of output:

{{{
CNRS - NERC - Met OFFICE - MERCATOR-ocean - CMCC - INGV
…
}}}

Comparison with prof output:

{{{
Name                  %Time  Seconds  Cumsecs  #Calls  msec/call
…
.__traadv_tvd_NMOD_n    4.6     8.91   139.55     400      22.27
}}}
It was done in https://forge.ipsl.jussieu.fr/nemo/changeset/2771[[BR]]
Note that timing is not needed for the dynamic allocation change; it is just an opportunity, since we are editing all the routines anyway.
==== 2- Auto-assignment ====
Instead of choosing the number of a work array by hand, we introduce for each type of work array a structure of arrays, with an associated counter:

{{{
TYPE work_space_3d
…
}}}
Then, in each routine, we declare the local arrays as pointers

{{{
REAL(wp), DIMENSION (:,:,:), POINTER :: zwi, zwz
}}}
and we call the subroutine nemo_allocate, which points to a work array and increments the counter, and nemo_deallocate, which decrements it

{{{
CALL nemo_allocate(zwi)   ! begin routine
CALL nemo_deallocate(zwi) ! end routine
}}}
This was implemented in wrk_nemo_2 and tested in traadv_tvd.
To avoid changing all the routines before a definitive choice is made, we chose to keep the old way and to duplicate wrk_nemo in wrk_nemo_2 (later saved as wrk_nemo_2_simple); we therefore declare twice the amount of memory, which would of course not be the case if it were implemented in all routines.[[BR]]
To avoid memory leaks, we could check, for instance at the end of step, that each counter is equal to one.[[BR]]
This was implemented here: http://forge.ipsl.jussieu.fr/nemo/changeset/2775 [[BR]]
and wrk_nemo_2 is here: http://forge.ipsl.jussieu.fr/nemo/browser/branches/2011/dev_r2769_LOCEAN_dynamic_mem/NEMOGCM/NEMO/OPA_SRC/wrk_nemo_2.F90_simple

==== 3- Dynamic dynamic memory ====
The point here is to avoid hard-coding the maximum number of potential work arrays in use, and to optimise the memory size, especially for applications that are expensive in memory (biogeochemistry, assimilation).[[BR]]
Here again, as a preliminary test, the implementation is done in wrk_nemo_2.F90 and the previous one is renamed wrk_nemo_2_simple.[[BR]]
For each type of work array, we use an associated chained list to build the working arrays needed.
But when we exit a routine, we do not destroy the working arrays created; we just point back to the beginning of the list. If the following routine needs one more array, we just add an element to the chain. In this way, at the end of the first time step we should have built exactly the total amount of memory needed.

{{{
TYPE work_space_3d
…
TYPE(work_space_3d), POINTER :: s_wrk_3d_root, s_wrk_3d
}}}
Then, in the same way as above, in each routine we declare the local arrays as pointers

{{{
REAL(wp), DIMENSION (:,:,:), POINTER :: zwi, zwz
…
CALL nemo_deallocate(zwi) ! to come back
}}}
At this point, this looks very nice, but I wonder whether we should not simply use a standard dynamic memory implementation, instead of trying to do complicated things. I do not have the knowledge to answer. I guess the answer can be formulated in terms of performance:

 * is it more expensive to allocate/deallocate at each call in the standard way,
 * or to call a subroutine pointing to an existing, already allocated array?

I also have some concerns about future evolutions.
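The chained-list scheme itself can be sketched as follows (again Python for illustration; `WrkChain` and `WrkNode` are hypothetical names standing in for the Fortran `s_wrk_3d_root`/`s_wrk_3d` pointers above). The key property is that nothing is ever freed: releasing an array only moves the cursor back, and the chain grows only while a new peak demand is being discovered.

```python
class WrkNode:
    """One element of the chained list: a work array plus a link."""
    def __init__(self, shape):
        self.array = bytearray(8 * shape[0] * shape[1] * shape[2])
        self.next = None

class WrkChain:
    def __init__(self, shape):
        self.shape = shape
        self.root = None        # first element (s_wrk_3d_root)
        self.cur = None         # last element handed out (s_wrk_3d)
        self.allocated = 0      # total number of arrays ever built

    def allocate(self):
        """Hand out the next array, growing the chain only if exhausted."""
        nxt = self.root if self.cur is None else self.cur.next
        if nxt is None:         # peak demand not reached before: grow
            nxt = WrkNode(self.shape)
            self.allocated += 1
            if self.cur is None:
                self.root = nxt
            else:
                self.cur.next = nxt
        self.cur = nxt
        return nxt.array

    def deallocate(self):
        """Come back one step: the array is kept, only the cursor moves."""
        if self.cur is self.root:
            self.cur = None
        else:
            p = self.root       # linear rewind is enough for a sketch
            while p.next is not self.cur:
                p = p.next
            self.cur = p
```

After the first time step the chain holds exactly the peak number of arrays that were ever in use at once, which is the memory optimisation claimed above.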
If we imagine having a large variety of arrays (not only jpi,jpj,jpk), it could become hard to maintain.[[BR]]
[[BR]]

Anyway, an example can be found at http://forge.ipsl.jussieu.fr/nemo/changeset/2776[[BR]]
and wrk_nemo_2 is at http://forge.ipsl.jussieu.fr/nemo/browser/branches/2011/dev_r2769_LOCEAN_dynamic_mem/NEMOGCM/NEMO/OPA_SRC/wrk_nemo_2.F90

----
Testing could consider (where appropriate) other configurations in addition to NVTK.

|| NVTK Tested || !'''YES/NO!''' ||
|| Other model configurations || !'''YES/NO!''' ||
|| Processor configurations tested || [ Enter processor configs tested here ] ||
|| If adding new functionality please confirm that the [[BR]]new code doesn't change results when it is switched off [[BR]]and !''works!'' when switched on || !'''YES/NO/NA!''' ||

(Answering UNSURE is likely to generate further questions from reviewers.)

'''Testing dynamical memory:'''

A sequence of tests has been done on Power6 (vargas) and on titane (Bull Novascale) in order to compare the three ways of coding the dynamical allocation.[[BR]]
The GYRE configuration has been used, and the interface for the new dynamical allocation has been coded for this configuration.[[BR]]
Testing has been done for two dimensions (CFG=24 and CFG=96, equivalent to global 1/4°).
For all tests, the model runs properly, and elapsed and CPU times are equivalent for the three solutions for a given configuration. Since the interface is identical for the two new routines, it seems reasonable to implement the best solution, i.e. the last one, which optimises memory.

Implementation for GYRE took around 2 days.

 * Processor configurations tested
…

=== Bit Comparability ===
|| Does this change preserve answers in your tested standard configurations (to the last bit)? || !'''YES/NO !''' ||
|| Does this change bit compare across various processor configurations? (1xM, Nx1 and MxN are recommended) || !'''YES/NO!''' ||
|| Is this change expected to preserve answers in all possible model configurations? || !'''YES/NO!''' ||
|| Is this change expected to preserve all diagnostics? [[BR]]!,,!''Preserving answers in model runs does not necessarily imply preserved diagnostics. !'' || !'''YES/NO!''' ||

If you answered !'''NO!''' to any of the above, please provide further details:

----
=== System Changes ===
|| Does your change alter namelists? || !'''YES/NO !''' ||
|| Does your change require a change in compiler options? || !'''YES/NO !''' ||

If any of these apply, please document the changes required here.......

----
=== IPR issues ===
|| Has the code been wholly (100%) produced by NEMO developers staff working exclusively on NEMO? || !'''YES/NO !''' ||

If No: