Changes between Version 5 and Version 6 of 2011WP/2011Stream2/Dynamic Memory


Timestamp:
2010-11-20T08:50:57+01:00
Author:
gm
Comment:
Gurvan's comments added

= Discussion of coding approach/style for dynamic memory =

Last edited [[Timestamp]]

[[PageOutline]]

== S2-1 : Andrew P.  : opening of the discussion ==

As a basis for the discussion, here's how I've currently coded NEMO (v.3.2) to use dynamic memory. I've used !''...!'' to indicate places where I've missed out chunks of code for clarity/brevity.
     
{{{
!''...!''
#endif
}}}

The inclusion of the new code is currently controlled by the cpp key 'key_mpp_dyndist' but it would greatly improve the cleanliness of the code if we make a complete break from the static-memory version and thus can drop the use of this cpp key. My understanding of the conclusion of the meeting in Southampton was that this is what we're going to do.

This addition to opa_init() calls two further new routines, opa_partition() and opa_alloc():

{{{
   SUBROUTINE opa_partition
      !''...!''
   END SUBROUTINE opa_partition
}}}

''opa_alloc()'' oversees the allocation of all of the module-level arrays:

{{{
   SUBROUTINE opa_alloc
      !''...!''
   END SUBROUTINE opa_alloc
}}}

Each of the modules must then be edited so that the declaration of any array whose dimensions depend upon jpi or jpj is changed, e.g. for OPA_SRC/DOM/dom_oce.F90:

{{{
MODULE dom_oce
!''...!''
#endif
}}}
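For illustration, the kind of edit involved in each module is sketched below; the array name is invented, not taken from dom_oce:

{{{
!                  static version: jpi and jpj are PARAMETERs, so the shape is fixed at compile time
!REAL(wp), PUBLIC, DIMENSION(jpi,jpj) ::   zwork_example

! dynamic version: deferred shape, allocated later by opa_alloc() once jpi and jpj are known
REAL(wp), PUBLIC, ALLOCATABLE, DIMENSION(:,:) ::   zwork_example
}}}

The corresponding ALLOCATE(zwork_example(jpi,jpj)) then moves into opa_alloc().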
Finally (I think), there's the issue of work-space or local arrays. These are arrays local to subroutines that are declared with their dimensions in terms of jpi and jpj. Since jpi and jpj are no longer parameters, these arrays are now 'automatic' -- that is, they are allocated on the stack when the program enters the subroutine and freed again when it leaves. Running out of stack memory will cause the program to abort abruptly, and therefore it is more controlled to use allocate for big workspace arrays. (Although the amount of available stack can be set to 'unlimited' by using the shell's '''''ulimit''''' command, what limit this actually results in will be OS/set-up dependent.) I suggest therefore that we edit the code such that any large (3d or higher?) workspace arrays are made explicitly allocatable and then allocated the first time the subroutine is called. If we declare them with the SAVE attribute then the allocated memory will remain available to us in future calls, ''e.g.'':

{{{
   SUBROUTINE dia_ptr( kt )
      !''...!''
   END SUBROUTINE dia_ptr
}}}
If an allocate() is done in this way then handling any failure is difficult since there's no guarantee that the allocate() call will have failed for all MPI processes. Therefore, I think the best bet is to print out an error message and then do an MPI_Abort on MPI_COMM_WORLD.
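A minimal sketch of such a handler (routine and variable names invented for illustration, not taken from the NEMO source):

{{{
   SUBROUTINE alloc_abort( kerror, cdroutine )
      !! Stop the whole model when a local ALLOCATE has failed.  The failure
      !! may not occur on every MPI process, so a collective stop via
      !! MPI_Abort on MPI_COMM_WORLD is the only safe way out.
      USE mpi
      INTEGER         , INTENT(in) ::   kerror      ! Stat= value returned by ALLOCATE
      CHARACTER(len=*), INTENT(in) ::   cdroutine   ! name of the calling routine
      INTEGER ::   ierr                             ! MPI error code
      !
      IF( kerror /= 0 ) THEN
         WRITE(*,*) 'allocation failed in ', cdroutine, ': Stat = ', kerror
         CALL MPI_Abort( MPI_COMM_WORLD, 1, ierr )
      ENDIF
   END SUBROUTINE alloc_abort
}}}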

----

== S2-2 : Richard H.  comments ==

I agree with the sentiment about dropping 'key_mpp_dyndist' in favour of only supporting the dynamic memory code. (On the basis that the proliferation of cpp keys makes maintenance and development difficult in the long term, and implies the need to test model developments using equivalent configurations under both static and dynamic configurations.)

Re the local array work space: agree that we can't rely on ulimit. With regard to allocating and saving large workspace arrays, would it be viable to allocate space for these in some generic sense at the start of the run rather than locally within each subroutine or code area? That might give us the opportunity of recycling the same space rather than allocating space specifically for each subroutine or code area. It might, however, imply the need to pass the generic array around through argument lists, e.g.:

{{{
    !''...!''
    CALL SOME_ROUTINE_Z(GEN_WORK_ARRAY1(1,1,1,1), GEN_WORK_ARRAY2(1,1,1,2), etc)
}}}

Then in the routines we have:

{{{
    SUBROUTINE SOME_ROUTINE_X(local_work_array_X1, local_work_array_X2, etc)
    !''...!''
    END SUBROUTINE
}}}

Anyway, just a thought.

Aborting using MPI_COMM_WORLD is particularly pertinent to coupled (OASIS-based) models (otherwise things just tend to dangle).

----

== S2-3 : Gurvan M.  comments ==

 * Definitely, we have to make a complete break from the static-memory version. The key_mpp_dyndist should disappear; we all agreed on that at the developer committee.

 * A namelist (namcfg, where cfg stands for ConFiGuration) will provide the domain size (jpiglo, jpjglo, jpk), the total number of processors and the cutting in the i and j directions (jpnij, jpni, jpnj), as well as the configuration name and resolution (cp_cfg, jp_cfg), etc. In fact, all the information currently given in par_oce_...h90.

 * Obviously the full dynamical allocation will result in the suppression of almost all CPP keys. A priori, the only exceptions are the keys related to the CPP "substitute" (vertical coordinate, eddy coefficient 2D and 3D...). Please, for the first implementation, do not suppress the keys; that should be done as a second step. It will require significant changes to the namelists and the OPA documentation... In other words, this will be a version 4.0, not simply a 3.4.

 * Note that since v3.3 there is a step_oce.F90 module corresponding to almost all the modules used in step.F90. A simple USE step_oce in opa_alloc will simplify the routine.

 * Note also that, for modularity reasons, the sea-ice and biogeochemical tracers should not be allocated by the opa_alloc routine. Instead, a lim_alloc can be created in LIM_SRC_3 (lim_alloc_2 in LIM_SRC_2), called by sbc_init (itself called in opa_init) if sea-ice is activated, and a trc_alloc (or even pisces_alloc, lobster_alloc, etc. for TOP_SRC...), called by trc_init.

 * Issue of work-space or local arrays:

 In my opinion, we can simply return to what was done in earlier versions of OPA (v1.0 to v6.0 !!): declare and allocate once and for all 4 3D work arrays and 4 2D work arrays, then use them as workspace in the subroutines. I say 4, as that was sufficient in those releases. Currently some more may be required, and with the Griffies operator and the merge of the TRA and TRC routines some 4D local arrays have appeared.

 We can check in the code the maximum number of 4D, 3D and 2D arrays required in order to decide the exact number. It should not be that large.

 Note that such a technique is already used in some modules. For example in zdftke, I use the fact that the after fields (ua, va, ta, sa) are only used in the momentum and tracer parts, so that during the computation of the physics they can be considered as workspace.

 So what I suggest is a new module, wrk_nemo (_nemo since it will probably be used in OPA, LIM, CICE, TOP...):

{{{
MODULE wrk_nemo
   !!======================================================================
   !!                       ***  MODULE  wrk_nemo  ***
   !! NEMO work space:  define and allocate work space arrays used in all components of NEMO
   !!======================================================================
   !! History :  4.0  !  2011-01  (xxxx)  Original code
   !!----------------------------------------------------------------------

   !!----------------------------------------------------------------------
   !!   wrk_malloc    : allocate the NEMO work space arrays
   !!----------------------------------------------------------------------
   USE par_oce        ! ocean parameters
   USE in_out_manager ! I/O manager

   IMPLICIT NONE
   PRIVATE

   PUBLIC   wrk_malloc   ! routine called in opa module (opa_init routine)

   REAL(wp), ALLOCATABLE, DIMENSION(:,:)    , PUBLIC ::   wrk_2d_1, wrk_2d_2, ...   !: 2D workspace
   REAL(wp), ALLOCATABLE, DIMENSION(:,:,:)  , PUBLIC ::   wrk_3d_1, wrk_3d_2, ...   !: 3D workspace
   REAL(wp), ALLOCATABLE, DIMENSION(:,:,:,:), PUBLIC ::   wrk_4d_1, wrk_4d_2, ...   !: 4D workspace

   !! * Substitutions
#  include "domzgr_substitute.h90"
#  include "vectopt_loop_substitute.h90"
   !!----------------------------------------------------------------------
   !! NEMO/OPA 4.0 , NEMO Consortium (2010)
   !! $Id$
   !! Software governed by the CeCILL licence     (NEMOGCM/NEMO_CeCILL.txt)
   !!----------------------------------------------------------------------
CONTAINS

   SUBROUTINE wrk_malloc
      !!----------------------------------------------------------------------
      !!                   ***  ROUTINE wrk_malloc  ***
      !!
      !! ** Purpose :   Define in memory once and for all the NEMO 2D, 3D and 4D work space arrays
      !!----------------------------------------------------------------------
      INTEGER :: ierror   ! local integer
      !!----------------------------------------------------------------------
      !
      ALLOCATE(wrk_2d_1(jpi,jpj)          , wrk_2d_2(jpi,jpj)          , ...     &
         &     wrk_3d_1(jpi,jpj,jpk)      , wrk_3d_2(jpi,jpj,jpk)      , ...     &
         &     wrk_4d_1(jpi,jpj,jpk,jpts) , wrk_4d_2(jpi,jpj,jpk,jpts) , ...     , Stat=ierror )
      !
      IF( ierror /= 0 )   CALL ctl_stop( 'wrk_malloc: unable to allocate work arrays' )
      !
   END SUBROUTINE wrk_malloc

   !!======================================================================
END MODULE wrk_nemo
}}}

Then, your example of the dia_ptr routine becomes:

{{{
   SUBROUTINE dia_ptr( kt )
      !!----------------------------------------------------------------------
      !!                  ***  ROUTINE dia_ptr  ***
      ...
      !!----------------------------------------------------------------------
      USE wrk_nemo,   vt  =>   wrk_3d_1   ! use wrk_3d_1 as workspace
      USE wrk_nemo,   vs  =>   wrk_3d_2   ! use wrk_3d_2 as workspace
      !
      INTEGER, INTENT(in) ::   kt   ! ocean time step index
      !
      INTEGER  ::   jk, jj, ji   ! dummy loop indices
      REAL(wp) ::   zsverdrup    ! conversion from m3/s to Sverdrup
      REAL(wp) ::   zpwatt       ! conversion from W    to PW
      REAL(wp) ::   zggram       ! conversion from g    to Pg
      !!----------------------------------------------------------------------
      !!
      IF( kt == nit000 .OR. MOD( kt, nf_ptr ) == 0 )   THEN

      ...
      ...
}}}

 Note that in this example I had already introduced a 'USE oce, vt => ua'... since dia_ptr is a diagnostic, so that the after arrays are available as work space.

 The main '''DANGER''' of this approach is that the developer must carefully check that the work space he wants to use is not already being used to store some information. In particular, in the case of a subroutine, sub1, calling another one, sub2, if sub2 needs a work space it is safer to pass that work space as an argument of sub2.

 I think we can impose the rule that, for all subroutines called at a lower level than step or opa_init, the required work space has to be passed as an argument.

 I don't see any other drawbacks to this technique... I also think that if we carefully check and rewrite some parts of the code, then hopefully only 2D and 3D work space arrays will be necessary.

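The rule above can be sketched as follows (routine and argument names are invented for illustration, not taken from the NEMO source):

{{{
   SUBROUTINE sub1
      USE wrk_nemo, ONLY:   wrk_3d_1, wrk_3d_2   ! shared NEMO work space
      !
      wrk_3d_1(:,:,:) = 0._wp      ! sub1, at the step level, may use wrk_3d_1 directly...
      !
      CALL sub2( wrk_3d_2 )        ! ...but a lower-level routine receives its work
      !                            !    space explicitly, so the two routines cannot
      !                            !    silently reuse the same array
   END SUBROUTINE sub1

   SUBROUTINE sub2( pwrk )
      REAL(wp), DIMENSION(:,:,:), INTENT(inout) ::   pwrk   ! work space supplied by the caller
      !
      pwrk(:,:,:) = 0._wp
      !''...!''
   END SUBROUTINE sub2
}}}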
----


== S2-x : XXX'  comments ==


----