Changeset 11536 for NEMO/trunk/doc/latex/NEMO/subfiles/chap_LBC.tex

Timestamp: 2019-09-11T15:54:18+02:00
Author: smasson
Message: trunk: merge dev_r10984_HPC-13 into the trunk

File: 1 edited

Legend:
  ' ' unmodified
  '-' removed
  '+' added
  '…' lines not shown
  • NEMO/trunk/doc/latex/NEMO/subfiles/chap_LBC.tex

r11445 → r11536
 {Boundary condition at the coast (\protect\np{rn\_shlat})}
 \label{sec:LBC_coast}
-%--------------------------------------------nam_lbc-------------------------------------------------------
+%--------------------------------------------namlbc-------------------------------------------------------
 
 \nlst{namlbc}
…
 \begin{description}
 
-\item[free-slip boundary condition (\np{rn\_shlat}\forcode{ = 0}):] the tangential velocity at
+\item[free-slip boundary condition (\np{rn\_shlat}\forcode{=0}):] the tangential velocity at
   the coastline is equal to the offshore velocity,
   \ie\ the normal derivative of the tangential velocity is zero at the coast,
…
   (\autoref{fig:LBC_shlat}-a).
 
-\item[no-slip boundary condition (\np{rn\_shlat}\forcode{ = 2}):] the tangential velocity vanishes at the coastline.
+\item[no-slip boundary condition (\np{rn\_shlat}\forcode{=2}):] the tangential velocity vanishes at the coastline.
   Assuming that the tangential velocity decreases linearly from
   the closest ocean velocity grid point to the coastline,
…
 \label{sec:LBC_mpp}
 
-For massively parallel processing (mpp), a domain decomposition method is used.
-The basic idea of the method is to split the large computation domain of a numerical experiment into
-several smaller domains and solve the set of equations by addressing independent local problems.
-Each processor has its own local memory and computes the model equation over a subdomain of the whole model domain.
-The subdomain boundary conditions are specified through communications between processors which
-are organized by explicit statements (message passing method).
-
-A big advantage is that the method does not need many modifications of the initial \fortran code.
-From the modeller's point of view, each sub domain running on a processor is identical to the "mono-domain" code.
-In addition, the programmer manages the communications between subdomains,
-and the code is faster when the number of processors is increased.
-The porting of OPA code on an iPSC860 was achieved during Guyon's PhD [Guyon et al. 1994, 1995]
-in collaboration with CETIIS and ONERA.
-The implementation in the operational context and the studies of performance on
-a T3D and T3E Cray computers have been made in collaboration with IDRIS and CNRS.
-The present implementation is largely inspired by Guyon's work [Guyon 1995].
+%-----------------------------------------nammpp--------------------------------------------
+
+\nlst{nammpp}
+%-----------------------------------------------------------------------------------------------
+
+For massively parallel processing (mpp), a domain decomposition method is used. The basic idea of the method is to split the large computation domain of a numerical experiment into several smaller domains and solve the set of equations by addressing independent local problems. Each processor has its own local memory and computes the model equation over a subdomain of the whole model domain. The subdomain boundary conditions are specified through communications between processors which are organized by explicit statements (message passing method). The present implementation is largely inspired by Guyon's work [Guyon 1995].
 
 The parallelization strategy is defined by the physical characteristics of the ocean model.
…
 depend at the very most on one neighbouring point.
 The only non-local computations concern the vertical physics
-(implicit diffusion, turbulent closure scheme, ...) (delocalization over the whole water column),
-and the solving of the elliptic equation associated with the surface pressure gradient computation
-(delocalization over the whole horizontal domain).
+(implicit diffusion, turbulent closure scheme, ...).
 Therefore, a pencil strategy is used for the data sub-structuration:
 the 3D initial domain is laid out on local processor memories following a 2D horizontal topological splitting.
…
 each processor sends to its neighbouring processors the updated values of the points corresponding to
 the interior overlapping area to its neighbouring sub-domain (\ie\ the innermost of the two overlapping rows).
-The communication is done through the Message Passing Interface (MPI).
+Communications are first done in the east-west direction and then in the north-south direction. There are no specific communications for the corners. The communication is done through the Message Passing Interface (MPI) and requires \key{mpp\_mpi}. Also use \key{mpi2} if MPI3 is not available on your computer.
 The data exchanges between processors are required at the very place where
 lateral domain boundary conditions are set in the mono-domain computation:
 the \rou{lbc\_lnk} routine (found in \mdl{lbclnk} module) which manages such conditions is interfaced with
-routines found in \mdl{lib\_mpp} module when running on an MPP computer (\ie\ when \key{mpp\_mpi} defined).
-It has to be pointed out that when using the MPP version of the model,
-the east-west cyclic boundary condition is done implicitly,
-whilst the south-symmetric boundary condition option is not available.
+routines found in the \mdl{lib\_mpp} module.
+The output file \textit{communication\_report.txt} lists how many communications each routine performs during one time step of the model.\\
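
As a purely illustrative aside (not NEMO's lbc_lnk implementation), the exchange ordering described above can be sketched in Python with mpi4py; the one-point halo, the neighbour ranks and the array layout are assumptions made for the example. Doing the north-south pass on full rows, halo columns included, is what makes a dedicated corner exchange unnecessary.

    # Schematic 2D halo exchange: east-west pass first, then north-south on
    # full rows, so the freshly updated east-west halos are forwarded and
    # corner values arrive without a dedicated corner message.
    import numpy as np
    from mpi4py import MPI

    def halo_exchange(field, comm, west, east, south, north):
        """field: local 2D array with a one-point halo on each side (row 0 = south).
        west/east/south/north: neighbour ranks, or MPI.PROC_NULL at a closed edge."""
        def swap(send, dest, halo, source):
            recv = np.array(halo)   # contiguous copy; left untouched if source is MPI.PROC_NULL
            comm.Sendrecv(np.ascontiguousarray(send), dest=dest,
                          recvbuf=recv, source=source)
            return recv
        # east-west direction
        field[:, 0]  = swap(field[:, -2], east,  field[:, 0],  west)
        field[:, -1] = swap(field[:, 1],  west,  field[:, -1], east)
        # north-south direction (rows include the halo columns updated just above)
        field[0, :]  = swap(field[-2, :], north, field[0, :],  south)
        field[-1, :] = swap(field[1, :],  south, field[-1, :], north)

At a closed edge the neighbour is MPI.PROC_NULL and the corresponding exchange degenerates to a no-op; a cyclic east-west condition would simply wire the two end subdomains of a row to each other.
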
 
 %>>>>>>>>>>>>>>>>>>>>>>>>>>>>
…
 %>>>>>>>>>>>>>>>>>>>>>>>>>>>>
 
-In the standard version of \NEMO, the splitting is regular and arithmetic.
-The i-axis is divided by \jp{jpni} and
-the j-axis by \jp{jpnj} for a number of processors \jp{jpnij} most often equal to $jpni \times jpnj$
-(parameters set in \nam{mpp} namelist).
-Each processor is independent and without message passing or synchronous process,
-programs run alone and access just its own local memory.
-For this reason, the main model dimensions are now the local dimensions of the subdomain (pencil) that
-are named \jp{jpi}, \jp{jpj}, \jp{jpk}.
-These dimensions include the internal domain and the overlapping rows.
-The number of rows to exchange (known as the halo) is usually set to one (\jp{jpreci}=1, in \mdl{par\_oce}).
-The whole domain dimensions are named \jp{jpiglo}, \jp{jpjglo} and \jp{jpk}.
-The relationship between the whole domain and a sub-domain is:
-\[
-  jpi = ( jpiglo-2*jpreci + (jpni-1) ) / jpni + 2*jpreci
-  jpj = ( jpjglo-2*jprecj + (jpnj-1) ) / jpnj + 2*jprecj
-\]
-where \jp{jpni}, \jp{jpnj} are the number of processors following the i- and j-axis.
-
-One also defines variables nldi and nlei which correspond to the internal domain bounds,
-and the variables nimpp and njmpp which are the position of the (1,1) grid-point in the global domain.
+In \NEMO, the splitting is regular and arithmetic. The total number of subdomains corresponds to the number of MPI processes allocated to \NEMO\ when the model is launched (\ie\ mpirun -np x ./nemo will automatically give x subdomains). The i-axis is divided by \np{jpni} and the j-axis by \np{jpnj}. These parameters are defined in the \nam{mpp} namelist. If \np{jpni} and \np{jpnj} are < 1, they will be automatically redefined in the code to give the best domain decomposition (see below).
+
+Each processor is independent: without message passing or synchronous process, each program runs alone and accesses just its own local memory. For this reason, the main model dimensions are now the local dimensions of the subdomain (pencil), which are named \jp{jpi}, \jp{jpj}, \jp{jpk}.
+These dimensions include the internal domain and the overlapping rows. The number of rows to exchange (known as the halo) is usually set to one (nn\_hls=1, in \mdl{par\_oce}, and must be kept to one until further notice). The whole domain dimensions are named \jp{jpiglo}, \jp{jpjglo} and \jp{jpk}. The relationship between the whole domain and a sub-domain is:
+\[
+  jpi = ( jpiglo - 2\times nn\_hls + (jpni-1) ) / jpni + 2\times nn\_hls
+\]
+\[
+  jpj = ( jpjglo - 2\times nn\_hls + (jpnj-1) ) / jpnj + 2\times nn\_hls
+\]
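
As a quick worked example of the relationship above (global sizes and decomposition are invented placeholders, with nn_hls = 1):

    # Worked example of the jpi/jpj formula (integer division, halo width nn_hls).
    def local_dim(n_glo, n_proc, nn_hls=1):
        return (n_glo - 2 * nn_hls + (n_proc - 1)) // n_proc + 2 * nn_hls

    jpiglo, jpjglo = 182, 149      # hypothetical global grid size
    jpni, jpnj = 4, 2              # hypothetical decomposition
    print(local_dim(jpiglo, jpni), local_dim(jpjglo, jpnj))   # -> 47 76
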
+
+One also defines variables nldi and nlei which correspond to the internal domain bounds, and the variables nimpp and njmpp which are the position of the (1,1) grid-point in the global domain (\autoref{fig:mpp}). Note that since version 4 there is no more extra-halo area as defined in \autoref{fig:mpp}, so \jp{jpi} is now always equal to nlci and \jp{jpj} equal to nlcj.
+
 An element of $T_{l}$, a local array (subdomain) corresponds to an element of $T_{g}$,
 a global array (whole domain) by the relationship:
…
   T_{g} (i+nimpp-1,j+njmpp-1,k) = T_{l} (i,j,k),
 \]
-with  $1 \leq i \leq jpi$, $1  \leq j \leq jpj $ , and  $1  \leq k \leq jpk$.
-
-Processors are numbered from 0 to $jpnij-1$, the number is saved in the variable nproc.
-In the standard version, a processor has no more than
-four neighbouring processors named nono (for north), noea (east), noso (south) and nowe (west) and
-two variables, nbondi and nbondj, indicate the relative position of the processor:
-\begin{itemize}
-\item       nbondi = -1    an east neighbour, no west processor,
-\item       nbondi =  0 an east neighbour, a west neighbour,
-\item       nbondi =  1    no east processor, a west neighbour,
-\item       nbondi =  2    no splitting following the i-axis.
-\end{itemize}
-During the simulation, processors exchange data with their neighbours.
-If there is effectively a neighbour, the processor receives variables from this processor on its overlapping row,
-and sends the data issued from internal domain corresponding to the overlapping row of the other processor.
-
-
-The \NEMO\ model computes equation terms with the help of mask arrays (0 on land points and 1 on sea points).
-It is easily readable and very efficient in the context of a computer with vectorial architecture.
-However, in the case of a scalar processor, computations over the land regions become more expensive in
-terms of CPU time.
-It is worse when we use a complex configuration with a realistic bathymetry like the global ocean where
-more than 50 \% of points are land points.
-For this reason, a pre-processing tool can be used to choose the mpp domain decomposition with a maximum number of
-only land points processors, which can then be eliminated (\autoref{fig:mppini2})
-(For example, the mpp\_optimiz tools, available from the DRAKKAR web site).
-This optimisation is dependent on the specific bathymetry employed.
-The user then chooses optimal parameters \jp{jpni}, \jp{jpnj} and \jp{jpnij} with $jpnij < jpni \times jpnj$,
-leading to the elimination of $jpni \times jpnj - jpnij$ land processors.
-When those parameters are specified in \nam{mpp} namelist,
-the algorithm in the \rou{inimpp2} routine sets each processor's parameters (nbound, nono, noea,...) so that
-the land-only processors are not taken into account.
-
-\gmcomment{Note that the inimpp2 routine is general so that the original inimpp
-routine should be suppressed from the code.}
-
-When land processors are eliminated,
-the value corresponding to these locations in the model output files is undefined.
-Note that this is a problem for the meshmask file which requires to be defined over the whole domain.
-Therefore, user should not eliminate land processors when creating a meshmask file
-(\ie\ when setting a non-zero value to \np{nn\_msh}).
+with $1 \leq i \leq jpi$, $1 \leq j \leq jpj$, and $1 \leq k \leq jpk$.
+
+The 1-d arrays $mig(1:\jp{jpi})$ and $mjg(1:\jp{jpj})$, defined in the \rou{dom\_glo} routine (\mdl{domain} module), should be used to get global domain indices from local domain indices. The 1-d arrays $mi0(1:\jp{jpiglo})$, $mi1(1:\jp{jpiglo})$ and $mj0(1:\jp{jpjglo})$, $mj1(1:\jp{jpjglo})$ have the reverse purpose and should be used to define loop indices expressed in global domain indices (see examples in the \mdl{dtastd} module).\\
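
A minimal sketch of the index arithmetic behind these arrays, with invented values of nimpp, njmpp and the local sizes (this is not the dom_glo code itself):

    # Local -> global index maps in the spirit of mig/mjg (1-based indices).
    nimpp, njmpp = 91, 75          # hypothetical position of the local (1,1) point
    jpi, jpj = 47, 76              # hypothetical local sizes

    mig = {i: i + nimpp - 1 for i in range(1, jpi + 1)}
    mjg = {j: j + njmpp - 1 for j in range(1, jpj + 1)}

    # T_g(mig[i], mjg[j], k) corresponds to T_l(i, j, k)
    assert mig[1] == nimpp and mig[jpi] == nimpp + jpi - 1
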
+
+The \NEMO\ model computes equation terms with the help of mask arrays (0 on land points and 1 on sea points). It is therefore possible that an MPI subdomain contains only land points. To save resources, we try to suppress as many land subdomains as possible from the computational domain. For example, if $N_{mpi}$ processes are allocated to NEMO, the domain decomposition will be given by the following equation:
+\[
+  N_{mpi} = jpni \times jpnj - N_{land} + N_{useless}
+\]
+$N_{land}$ is the total number of land subdomains in the domain decomposition defined by \np{jpni} and \np{jpnj}. $N_{useless}$ is the number of land subdomains that are kept in the computational domain in order to make sure that $N_{mpi}$ MPI processes are indeed allocated to a subdomain. The values of $N_{mpi}$, \np{jpni}, \np{jpnj}, $N_{land}$ and $N_{useless}$ are printed in the output file \texttt{ocean.output}. $N_{useless}$ must, of course, be as small as possible to limit the waste of resources. A warning is issued in \texttt{ocean.output} if $N_{useless}$ is not zero. Note that a non-zero value of $N_{useless}$ is usually required when using AGRIF as, up to now, the parent grid and each of the child grids must use all the $N_{mpi}$ processes.
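
A small numerical illustration of this bookkeeping (all counts invented):

    # How N_useless follows from the equation above.
    jpni, jpnj = 10, 8     # hypothetical decomposition
    N_land = 23            # land-only subdomains for this (hypothetical) bathymetry
    N_mpi = 60             # MPI processes actually allocated (e.g. mpirun -np 60)

    N_useless = N_mpi - (jpni * jpnj - N_land)
    print(N_useless)       # -> 3 land subdomains kept only to occupy the 60 processes
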
+
+If the domain decomposition is automatically defined (when \np{jpni} and \np{jpnj} are < 1), the decomposition chosen by the model will minimise the sub-domain size (defined as $max_{all\ domains}(jpi \times jpj)$) and maximise the number of eliminated land subdomains. This means that no other domain decomposition (a set of \np{jpni} and \np{jpnj} values) will use fewer processes than $(jpni \times jpnj - N_{land})$ and get a smaller subdomain size.
+In order to specify $N_{mpi}$ properly (\ie\ to minimise $N_{useless}$), you must run the model once with \np{ln\_list} activated. In this case, the model will start the initialisation phase, print the list of optimum decompositions ($N_{mpi}$, \np{jpni} and \np{jpnj}) in \texttt{ocean.output} and directly abort. The maximum value of $N_{mpi}$ tested in this list is given by $max(N_{MPI\_tasks}, \np{jpni} \times \np{jpnj})$. For example, running the model on 40 nodes with ln\_list activated and $\np{jpni} = 10000$ and $\np{jpnj} = 1$ will print the list of optimum domain decompositions from 1 to about 10000.
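
The selection criterion can be pictured with the brute-force sketch below; it is not the algorithm implemented in NEMO, and count_land_subdomains is a hypothetical helper that would need the model bathymetry:

    # Among decompositions whose ocean subdomains fit within n_mpi processes,
    # keep the one with the smallest local subdomain size (jpi * jpj).
    def local_dim(n_glo, n_proc, nn_hls=1):
        return (n_glo - 2 * nn_hls + (n_proc - 1)) // n_proc + 2 * nn_hls

    def best_decomposition(jpiglo, jpjglo, n_mpi, count_land_subdomains):
        best = None
        for jpni in range(1, n_mpi + 1):
            for jpnj in range(1, n_mpi + 1):
                n_land = count_land_subdomains(jpni, jpnj)  # needs the bathymetry
                if jpni * jpnj - n_land > n_mpi:
                    continue                                # too many ocean subdomains
                size = local_dim(jpiglo, jpni) * local_dim(jpjglo, jpnj)
                if best is None or size < best[0]:
                    best = (size, jpni, jpnj)
        return best                                         # (size, jpni, jpnj) or None

    # e.g. a flat-bottom toy case with no land subdomains at all:
    print(best_decomposition(182, 149, 12, lambda jpni, jpnj: 0))
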
+
+Processors are numbered from 0 to $N_{mpi} - 1$. Subdomains containing some ocean points are numbered first, from 0 to $jpni \times jpnj - N_{land} - 1$. The remaining $N_{useless}$ land subdomains are numbered next, which means that, for a given (\np{jpni}, \np{jpnj}), the numbers attributed to the ocean subdomains do not vary with $N_{useless}$.
+
+When land processors are eliminated, the value corresponding to these locations in the model output files is undefined. \np{ln\_mskland} must be activated in order to avoid Not a Number values in output files. Note that it is better not to eliminate land processors when creating a meshmask file (\ie\ when setting a non-zero value to \np{nn\_msh}).
 
 %>>>>>>>>>>>>>>>>>>>>>>>>>>>>
…
 %-----------------------------------------------------------------------------------------------
 
-Options are defined through the \nam{bdy} \nam{bdy\_dta} namelist variables.
+Options are defined through the \nam{bdy} and \nam{bdy\_dta} namelist variables.
 The BDY module is the core implementation of open boundary conditions for regional configurations on
-temperature, salinity, barotropic and baroclinic velocities, as well as ice concentration, ice and snow thicknesses.
+ocean temperature, salinity, barotropic and baroclinic velocities, ice-snow concentration, thicknesses, temperatures, salinity, and melt pond concentration and thickness.
 
 The BDY module was modelled on the OBC module (see \NEMO\ 3.4) and shares many features and
…
 \label{subsec:BDY_namelist}
 
-The BDY module is activated by setting \np{ln\_bdy}\forcode{ = .true.} .
+The BDY module is activated by setting \np{ln\_bdy}\forcode{=.true.}.
 It is possible to define more than one boundary ``set'' and apply different boundary conditions to each set.
 The number of boundary sets is defined by \np{nb\_bdy}.
-Each boundary set may be defined as a set of straight line segments in a namelist
-(\np{ln\_coords\_file}\forcode{ = .false.}) or read in from a file (\np{ln\_coords\_file}\forcode{ = .true.}).
-If the set is defined in a namelist, then the namelists \nam{bdy\_index} must be included separately, one for each set.
-If the set is defined by a file, then a ``\ifile{coordinates.bdy}'' file must be provided.
+Each boundary set can either be defined as a series of straight line segments directly in the namelist
+(\np{ln\_coords\_file}\forcode{=.false.}, in which case a \nam{bdy\_index} namelist block must be included for each set) or read in from a file (\np{ln\_coords\_file}\forcode{=.true.}, in which case a ``\ifile{coordinates.bdy}'' file must be provided).
 The coordinates.bdy file is analogous to the usual \NEMO\ ``\ifile{coordinates}'' file.
 In the example above, there are two boundary sets, the first of which is defined via a file and
-the second is defined in a namelist.
+the second is defined in the namelist.
 For more details of the definition of the boundary geometry, see section \autoref{subsec:BDY_geometry}.
 
 For each boundary set a boundary condition has to be chosen for the barotropic solution
 (``u2d'': sea-surface height and barotropic velocities), for the baroclinic velocities (``u3d''),
-for the active tracers \footnote{The BDY module does not deal with passive tracers at this version} (``tra''), and sea-ice (``ice'').
-For each set of variables there is a choice of algorithm and a choice for the data,
-eg. for the active tracers the algorithm is set by \np{cn\_tra} and the choice of data is set by \np{nn\_tra\_dta}.\\
+for the active tracers \footnote{The BDY module does not deal with passive tracers at this version} (``tra''), and for sea-ice (``ice'').
+For each set of variables one has to choose an algorithm and the boundary data (set respectively by \np{cn\_tra} and \np{nn\_tra\_dta} for tracers).\\
 
 The choice of algorithm is currently as follows:
…
 \end{description}
 
-The main choice for the boundary data is to use initial conditions as boundary data
-(\np{nn\_tra\_dta}\forcode{ = 0}) or to use external data from a file (\np{nn\_tra\_dta}\forcode{ = 1}).
+The boundary data is either set to initial conditions
+(\np{nn\_tra\_dta}\forcode{=0}) or forced with external data from a file (\np{nn\_tra\_dta}\forcode{=1}).
 In case the 3d velocity data contain the total velocity (ie, baroclinic and barotropic velocity),
-the bdy code can derived baroclinic and barotropic velocities by setting \np{ln\_full\_vel}\forcode{ = .true. }
+the bdy code can derive baroclinic and barotropic velocities by setting \np{ln\_full\_vel}\forcode{=.true.}.
 For the barotropic solution there is also the option to use tidal harmonic forcing either by
-itself (\np{nn\_dyn2d\_dta}\forcode{ = 2}) or in addition to other external data (\np{nn\_dyn2d\_dta}\forcode{ = 3}).\\
-Sea-ice salinity, temperature and age data at the boundary are constant and defined repectively by \np{rn\_ice\_sal}, \np{rn\_ice\_tem} and \np{rn\_ice\_age}.
+itself (\np{nn\_dyn2d\_dta}\forcode{=2}) or in addition to other external data (\np{nn\_dyn2d\_dta}\forcode{=3}).\\
+If not set to initial conditions, sea-ice salinity, temperatures and melt pond data at the boundary can either be read from a file or defined as constant (by \np{rn\_ice\_sal}, \np{rn\_ice\_tem}, \np{rn\_ice\_apnd} and \np{rn\_ice\_hpnd}). Ice age is constant and defined by \np{rn\_ice\_age}.
 
 If external boundary data is required then the \nam{bdy\_dta} namelist must be defined.
…
 For each required variable, the filename, the frequency of the files and
 the frequency of the data in the files are given.
-Also whether or not time-interpolation is required and whether the data is climatological (time-cyclic) data.\\
+Also whether or not time-interpolation is required and whether the data is climatological (time-cyclic) data.
+For sea-ice salinity, temperatures and melt ponds, reading the files is skipped and constant values are used if the filenames are defined as {'NOT USED'}.\\
 
 There is currently an option to vertically interpolate the open boundary data onto the native grid at run-time.
-If \np{nn\_bdy\_jpk} $< -1$, it is assumed that the lateral boundary data are already on the native grid.
+If \np{nn\_bdy\_jpk}$<-1$, it is assumed that the lateral boundary data are already on the native grid.
 However, if \np{nn\_bdy\_jpk} is set to the number of vertical levels present in the boundary data,
 a bilinear interpolation onto the native grid will be triggered at runtime.
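
The idea of this run-time re-gridding can be illustrated with a simple one-dimensional interpolation of a single boundary profile; depths and values are invented and this is not the BDY interpolation code itself:

    # Re-grid a boundary profile given on nn_bdy_jpk levels onto hypothetical
    # native model depths with linear interpolation.
    import numpy as np

    bdy_depth = np.array([5.0, 50.0, 200.0, 1000.0])   # depths of the boundary data
    bdy_temp  = np.array([18.0, 15.0, 8.0, 3.0])       # e.g. a temperature profile
    native_depth = np.linspace(5.0, 1000.0, 31)        # hypothetical native levels

    temp_on_native = np.interp(native_depth, bdy_depth, bdy_temp)
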
     
 \jp{jpinft} give the start and end $i$ indices for each segment with similar for the other boundaries.
 These segments define a list of $T$ grid points along the outermost row of the boundary ($nbr\,=\, 1$).
-The code deduces the $U$ and $V$ points and also the points for $nbr\,>\, 1$ if \np{nn\_rimwidth}\forcode{ > 1}.
+The code deduces the $U$ and $V$ points and also the points for $nbr\,>\, 1$ if \np{nn\_rimwidth}\forcode{>1}.
 
 The boundary geometry may also be defined from a ``\ifile{coordinates.bdy}'' file.
…
 There is an option to force the total volume in the regional model to be constant.
 This is controlled by the \np{ln\_vol} parameter in the namelist.
-A value of \np{ln\_vol}\forcode{ = .false.} indicates that this option is not used.
+A value of \np{ln\_vol}\forcode{=.false.} indicates that this option is not used.
 Two options to control the volume are available (\np{nn\_volctl}).
-If \np{nn\_volctl}\forcode{ = 0} then a correction is applied to the normal barotropic velocities around the boundary at
+If \np{nn\_volctl}\forcode{=0} then a correction is applied to the normal barotropic velocities around the boundary at
 each timestep to ensure that the integrated volume flow through the boundary is zero.
-If \np{nn\_volctl}\forcode{ = 1} then the calculation of the volume change on
+If \np{nn\_volctl}\forcode{=1} then the calculation of the volume change on
 the timestep includes the change due to the freshwater flux across the surface and
 the correction velocity corrects for this as well.
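
A schematic version of this correction (not NEMO's implementation; the sign convention for the surface freshwater term is an assumption of the sketch):

    # Uniform correction of the normal barotropic velocity so that the net volume
    # flux through the open boundary is zero (nn_volctl=0) or balances the surface
    # freshwater loss (nn_volctl=1).
    import numpy as np

    def corrected_boundary_velocity(u_normal, face_area, nn_volctl=0, emp_flux=0.0):
        """u_normal: outward normal barotropic velocity on each boundary face [m/s]
        face_area: corresponding face areas [m^2]
        emp_flux:  net surface freshwater loss over the domain [m^3/s] (assumed sign)."""
        net_outflow = np.sum(u_normal * face_area)      # current net outflow [m^3/s]
        target_outflow = -emp_flux if nn_volctl == 1 else 0.0
        return u_normal - (net_outflow - target_outflow) / np.sum(face_area)
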