New URL for NEMO forge! http://forge.nemo-ocean.eu

Since March 2022, along with the NEMO 4.2 release, code development has moved to a self-hosted GitLab.
This forge is now archived and remains online for historical reference.

2011WP/2011Stream2/DynamicMemory (diff) – NEMO


Changes between Version 14 and Version 15 of 2011WP/2011Stream2/DynamicMemory

Timestamp:: 2011-02-09T17:27:55+01:00 (13 years ago)
Author:: gm
Comment:: --

2011WP/2011Stream2/DynamicMemory

-                      v14
+                      v15
 = Discussion of coding approach/style for dynamic memory =
 Last edited [[Timestamp]]
 …
 == S2-1 : Andrew P.  : opening of the discussion ==
 As a basis for the discussion, here's how I've currently coded NEMO (v3.2) to use dynamic memory. I've used !''...!'' to indicate places where I've left out chunks of code for clarity/brevity.
 …
 ----
 == S2-2 : Richard H.  comments ==
 I agree with the sentiment about dropping 'key_mpp_dyndist' in favour of only supporting the dynamic memory code, on the basis that a proliferation of cpp keys makes maintenance and development difficult in the long term and implies the need to test model developments using equivalent configurations under both static and dynamic memory.
 …
 Aborting via MPI_COMM_WORLD is particularly pertinent to coupled (OASIS-based) models; otherwise processes tend to be left dangling.
+----
 == S2-3 : Gurvan M.  comments ==
  * Definitely, we have to make a complete break from the static-memory version: key_mpp_dyndist should disappear. We all agreed on that at the developer committee.
 …
  * issue of work-space or local arrays:
+  In my opinion, we can simply go back to what was done in earlier versions of OPA (v1.0 to v6.0 !!): declare and allocate, once and for all, 4 3D work arrays and 4 2D work arrays, then use them as workspace in the subroutines. I say 4 as that was sufficient in those releases. Currently more may be required, and with the Griffies operator and the merge of the TRA and TRC routines some 4D local arrays have also appeared.
   We can check in the code the maximum number of 4D, 3D and 2D arrays required, to decide the exact number. It should not be that large.
+  Note that such a technique is already used in some modules. For example, in zdftke I use the fact that the after fields (ua, va, ta, sa) are only used in the momentum and tracer parts, so during the computation of the physics they can be considered as workspace.
+  So what I suggest is a new module wrk_nemo (_nemo since it will probably be used in OPA, LIM, CICE, TOP...):
 {{{
 …
 END MODULE wrk_nemo
 }}}
+Then, your example of the dia_ptr routine becomes:
 {{{
 …
       ...
 }}}
   Note that in this example I have already introduced a 'USE oce, vt => ua' ... since dia_ptr is a diagnostic, so the after arrays are available as workspace.
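A minimal sketch of the renaming trick mentioned above. It assumes a hypothetical stand-in module `oce_sketch` that holds only the after field `ua`; the real oce module holds many more arrays, and the actual dia_ptr computation is not reproduced. The local name `vt` refers to the same storage as `ua`, so no extra memory is used. This is only safe because, as noted, the after arrays are not needed when the diagnostic runs.

{{{
module oce_sketch
   ! Hypothetical stand-in for NEMO's oce module, reduced to one "after" field.
   implicit none
   integer, parameter :: wp = kind(1.0d0)
   real(wp), allocatable :: ua(:,:,:)
end module oce_sketch

module ptr_sketch
   ! ua is visible here under the local name vt (USE rename, no copy made).
   use oce_sketch, vt => ua
   implicit none
contains
   subroutine dia_ptr_sketch(psum)
      ! Hypothetical diagnostic that uses vt as scratch space.
      real(wp), intent(out) :: psum
      vt(:,:,:) = 1.0_wp      ! overwrites ua: only valid when ua is not needed
      psum = sum(vt)
   end subroutine dia_ptr_sketch
end module ptr_sketch
}}}

Any value written into `vt` lands in `ua`. That is exactly why the trick has to be restricted to places, such as diagnostics, where the after arrays are free.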
 …
 ----
 == S2-4 : Italo E.  comments ==
 Hi all, I have just a couple of comments.
+Re the opa_partition routine and the policy for choosing the "best" partition, I suggest setting jpni and jpnj so that the local subdomain is as "square" as possible. Indeed, with the current domain decomposition, the best performance is reached when the local subdomain is square (for a given area, a square minimises the perimeter, and hence the number of halo points to exchange). I suggest modifying opa_partition as follows:
 {{{
 ...
 …
 ...
 }}}
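The "as square as possible" policy can be sketched as follows. This is not the elided code above. `square_partition` is a hypothetical name, halo points are ignored, and only exact factorisations jpni*jpnj = nproc are tried. For each candidate the sketch computes the local subdomain size and keeps the one with the smallest aspect ratio.

{{{
module partition_sketch
   implicit none
contains
   subroutine square_partition(nproc, jpiglo, jpjglo, jpni, jpnj)
      ! Choose jpni x jpnj = nproc so that the local subdomain is closest to square.
      ! Assumption: halo points are ignored; local size = ceiling(global / ncut).
      integer, intent(in)  :: nproc, jpiglo, jpjglo
      integer, intent(out) :: jpni, jpnj
      integer :: ni, nj, ilocal, jlocal
      real    :: zratio, zbest
      zbest = huge(1.0)
      jpni = nproc ; jpnj = 1
      do ni = 1, nproc
         if (mod(nproc, ni) /= 0) cycle        ! only exact factorisations
         nj = nproc / ni
         ilocal = (jpiglo + ni - 1) / ni       ! ceiling(jpiglo / ni)
         jlocal = (jpjglo + nj - 1) / nj       ! ceiling(jpjglo / nj)
         zratio = real(max(ilocal, jlocal)) / real(min(ilocal, jlocal))
         if (zratio < zbest) then              ! keep the most square candidate
            zbest = zratio
            jpni = ni ; jpnj = nj
         end if
      end do
   end subroutine square_partition
end module partition_sketch
}}}

For example, 8 processes on a 100 x 50 global grid give jpni = 4 and jpnj = 2, i.e. 25 x 25 local subdomains.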
+Re the allocation of work arrays: sharing work arrays among different routines gives us the possibility to save significant memory, so the idea of a module such as wrk_nemo could be useful. However, using those arrays could have several drawbacks: 1. the code could be less readable; 2. when I write a new routine that calls others already available, I must be sure that I do not use the same work arrays.
 Some measures can be adopted to reduce the danger of such work arrays, but I would avoid using routine arguments to pass work arrays. The use of work arrays is typically tied to the particular implementation of a routine, whereas the routine prototype should stay as stable as possible while its implementation is refined, optimised or modified. Code maintenance becomes very heavy if updating the implementation of a routine also requires modifying its prototype.
+For those lower-level routines, I suggest declaring their work arrays locally as allocatable arrays with the SAVE attribute.
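A minimal sketch of that suggestion, with hypothetical names. The work array `zwk` is local to the routine. It is allocated on the first call, reallocated only if the field shape changes, and kept between calls by the SAVE attribute. Without SAVE, a local allocatable array is deallocated automatically on return. The argument list never mentions `zwk`, so changing the workspace does not change the prototype.

{{{
module lowlevel_sketch
   implicit none
   integer, parameter :: wp = kind(1.0d0)
contains
   subroutine dx2_sketch(pf, pd2)
      ! Hypothetical low-level routine: second difference of pf along i.
      ! Its workspace is private, so the argument list never mentions it.
      real(wp), intent(in)  :: pf(:,:)
      real(wp), intent(out) :: pd2(:,:)
      real(wp), allocatable, save :: zwk(:,:)     ! local work array, kept between calls
      integer :: jpi, jpj
      jpi = size(pf, 1) ; jpj = size(pf, 2)
      if (allocated(zwk)) then                    ! shape changed since the last call?
         if (size(zwk, 1) /= jpi .or. size(zwk, 2) /= jpj) deallocate(zwk)
      end if
      if (.not. allocated(zwk)) allocate(zwk(jpi, jpj))   ! first call (or new shape)
      zwk(1:jpi-1, :) = pf(2:jpi, :) - pf(1:jpi-1, :)      ! forward difference
      zwk(jpi, :)     = 0.0_wp
      pd2(1, :)       = 0.0_wp
      pd2(2:jpi, :)   = zwk(2:jpi, :) - zwk(1:jpi-1, :)    ! f(i+1) - 2 f(i) + f(i-1) inside
   end subroutine dx2_sketch
end module lowlevel_sketch
}}}

The price of this design is that each such routine keeps its own workspace alive for the whole run. Memory is therefore not shared between routines as it would be with wrk_nemo.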
+----
 == S2-5 : Andrew P's follow-up  comments ==
 I like Gurvan's suggestion of a module containing globally-accessible work-space arrays. We could add some error-checking functionality to this by having an 'in_use' flag for each work-space array in the module. Before using a work-space array, a developer should check that the appropriate flag is .FALSE. and if it is, set it to .TRUE. while they are using it. Once they are done using the array the flag should be set back to .FALSE.
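The 'in_use' protocol could look like the sketch below. It is a hypothetical extension of a wrk_nemo-style module, not existing NEMO code. Four shared 2D arrays are stored as slices of one allocatable array, each with its own flag. `wrk_claim` performs the check-then-set step described above, and `wrk_release` performs the reset step.

{{{
module wrk_flags_sketch
   ! Hypothetical shared 2D workspace with one in_use flag per array
   ! (names and sizes are illustrative, not NEMO's API).
   implicit none
   integer, parameter :: wp = kind(1.0d0)
   integer, parameter :: jpwrk2d = 4                 ! number of shared 2D arrays (assumed)
   real(wp), allocatable :: wrk_2d(:,:,:)            ! wrk_2d(:,:,n) is work array n
   logical :: in_use_2d(jpwrk2d) = .false.
contains
   subroutine wrk_alloc_sketch(jpi, jpj)
      integer, intent(in) :: jpi, jpj
      allocate(wrk_2d(jpi, jpj, jpwrk2d))
   end subroutine wrk_alloc_sketch

   logical function wrk_claim(n)
      ! .true. if array n was free and is now marked as in use by the caller
      integer, intent(in) :: n
      wrk_claim = .not. in_use_2d(n)
      if (wrk_claim) in_use_2d(n) = .true.
   end function wrk_claim

   subroutine wrk_release(n)
      ! Mark array n as free again once the caller has finished with it
      integer, intent(in) :: n
      in_use_2d(n) = .false.
   end subroutine wrk_release
end module wrk_flags_sketch
}}}

A routine would call `wrk_claim(n)` before touching `wrk_2d(:,:,n)`. If the call returns .FALSE., the array is already in use higher up the call tree and the routine should stop with an error. When done, the routine calls `wrk_release(n)`. The flags are plain module variables, so the check-then-set step is not safe if several threads share the module.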
 …
 ----
 == S2-6 : Marie-Alice Foujols'  comments ==
 As this modification will impact all the code, I suggest using a script so that the modification can easily be redone. It will be useful for NEMO users who want to compare old parts of their own copy of the code with the new one: if the script is distributed, they could use it to convert their code and easily incorporate their modifications into the new version. I also suggest avoiding cosmetic changes (moving comments, splitting lines, ...) for the same reason: to reduce the time users need to compare their own copy with the new version of NEMO including dynamic allocation.
 …
 Hope this helps.
+----
 == S2-7 : Andy Porter's 3rd set of comments ==
 I can appreciate that an almost global change like this will be difficult for users who have locally modified versions. Ideally the source-code revision-control system would facilitate applying the changes to a locally-modified version/branch - one that isn't in the official repository. Unfortunately I don't think subversion has this functionality (although I'd be very pleased to learn otherwise). Certainly I'll do my best to avoid unnecessary cosmetic changes. However, while I can imagine that scripting the change of module arrays from static to dynamic might be possible, I don't think the same can be said of the work-space arrays and they account for a lot of the code changes.
+In fact, I'm discovering that some routines have an awful lot of workspace arrays, e.g.:
 {{{
 …
       IF( kt == nit000 ) THEN             !* initialisation
 }}}
+I make that 21 2D workspace arrays! Should the global workspace module contain that many or should we make some of these into module-wide arrays?
+Do people want jpk to be treated like jpi and jpj and become a run-time variable, or is it OK to leave it as a compile-time parameter? My thinking is that one doesn't change the number of levels in a model lightly, and it has no bearing on the MPP domain decomposition.
+----
+== S2-8 : Gurvan's 2nd comments ==
+I don't think having many 2D work arrays is a problem: 21 2D arrays are still smaller than a single 3D array, since jpk is usually between 30 and 70.[[BR]]As a starting point, I prefer the solution in which we define as many 2D and 3D allocatable work arrays as necessary in the worst case.[[BR]]As a first step this will be much simpler. In a second stage, if the large number of work arrays is needed only by a few modules that are not systematically used, we can decide to allocate systematically only, say, 10 work arrays and to allocate the additional ones in those modules (obviously testing first whether they are already allocated).
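The size comparison above can be written out. Count array elements, since bytes scale identically for a fixed real kind. Take N_2 2D work arrays on a jpi x jpj grid and compare them with one 3D array of jpk levels:

{{{
M_{2D} = N_2 \, jpi \, jpj , \qquad M_{3D} = jpk \, jpi \, jpj

\frac{M_{2D}}{M_{3D}} = \frac{N_2}{jpk}, \qquad
N_2 = 21,\; 30 \le jpk \le 70 \;\Rightarrow\; 0.30 \le \frac{M_{2D}}{M_{3D}} \le 0.70
}}}

So the 21 2D arrays need between 30% and 70% of the memory of a single 3D array.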
+For jpk, it is true that it will not be changed at run-time, BUT with AGRIF the mother and child grids can have different jpk values (a new feature planned for introduction this year). Therefore jpk MUST be treated as a run-time variable together with jpi and jpj.
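A minimal sketch of what treating jpk like jpi and jpj means for the declarations, with hypothetical module and routine names. The dimensions become ordinary integers, set at run-time before the arrays are allocated, instead of compile-time PARAMETERs; reading them from a namelist would be one option, but that is an assumption here. How AGRIF would hold separate mother and child values is not shown.

{{{
module dom_size_sketch
   ! Grid dimensions as run-time variables instead of PARAMETERs (illustrative names).
   implicit none
   integer, parameter :: wp = kind(1.0d0)
   integer :: jpi = 0, jpj = 0, jpk = 0
   real(wp), allocatable :: tn(:,:,:)        ! example 3D field
contains
   subroutine dom_alloc_sketch(kpi, kpj, kpk, kstat)
      ! Set the dimensions, then (re)allocate the 3D field; kstat /= 0 on failure
      integer, intent(in)  :: kpi, kpj, kpk
      integer, intent(out) :: kstat
      jpi = kpi ; jpj = kpj ; jpk = kpk
      if (allocated(tn)) deallocate(tn)
      allocate(tn(jpi, jpj, jpk), stat=kstat)
   end subroutine dom_alloc_sketch
end module dom_size_sketch
}}}

A second call with a different number of levels reallocates `tn` with the new vertical dimension. A PARAMETER jpk fixed at compile time cannot do that.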
+About computing jpni and jpnj at run-time or reading them from the namelist: the problem I have in mind is the suppression of land-only processors. At the moment the user gives the i- and j-direction processor splitting AND the number of processors actually used (jpnij). It is unclear to me how this can be chosen at run-time...
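To illustrate why jpnij is a separate input, the sketch below takes a given jpni x jpnj cut and counts how many subdomains contain at least one ocean point. Once land-only subdomains are dropped, that count is a candidate for jpnij. The sketch assumes the global land/ocean mask is available when the decomposition is chosen, ignores halos and uses hypothetical names. It does not answer how jpni and jpnj themselves should be chosen at run-time.

{{{
module land_sketch
   implicit none
contains
   function count_ocean_subdomains(tmask, jpni, jpnj) result(nij)
      ! tmask(jpiglo, jpjglo): 1 = ocean, 0 = land (global surface mask, assumed available)
      ! Returns the number of jpni x jpnj subdomains holding at least one ocean point.
      integer, intent(in) :: tmask(:,:)
      integer, intent(in) :: jpni, jpnj
      integer :: nij
      integer :: jpiglo, jpjglo, ii, jj, i1, i2, j1, j2, ilocal, jlocal
      jpiglo = size(tmask, 1) ; jpjglo = size(tmask, 2)
      ilocal = (jpiglo + jpni - 1) / jpni       ! ceiling division, halos ignored
      jlocal = (jpjglo + jpnj - 1) / jpnj
      nij = 0
      do jj = 1, jpnj
         do ii = 1, jpni
            i1 = (ii - 1) * ilocal + 1 ; i2 = min(ii * ilocal, jpiglo)
            j1 = (jj - 1) * jlocal + 1 ; j2 = min(jj * jlocal, jpjglo)
            if (i1 > i2 .or. j1 > j2) cycle   ! empty subdomain at the grid edge
            if (any(tmask(i1:i2, j1:j2) /= 0)) nij = nij + 1
         end do
      end do
   end function count_ocean_subdomains
end module land_sketch
}}}

Take an 8 x 6 grid whose western half is land: a 2 x 2 cut gives a count of 2, because the two western subdomains are land-only.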
+----
 == S2-x : XXX'  comments ==
+----