#2598 closed Task (fixed)
HPC-01_SebEtCie_MPI_Interface
| Reported by: | smasson | Owned by: | smasson |
|---|---|---|---|
| Priority: | low | Milestone: | 2021 WP |
| Component: | LBC | Version: | trunk |
| Severity: | minor | Keywords: | |
| Cc: | | | |
Description
Context
1) Review, simplification and cleaning of the MPI interface is needed
2) There are a few bugs:
- potential bug with the corners if nn_hls > 1 (or 2?) with point-to-point communications
- collective communications are not working with BDY (as BDY has its own exchanges)
3) There are other things to further improve:
- do not send/receive if the exchange involves only land points (depends on nn_hls)
- send/receive only the ocean values (checking the mask; depends on nn_hls)
- keep the send/receive buffers in memory from one call to the next (if they are not too big, e.g. < jpi*jpj) -> for point-to-point, put the mpi_waitall at the beginning of the next call (see the sketch after this list)
- make sure the north pole folding does not need the east-west comm.: done if nogather = F, to be done if nogather = T... so we could place it where we want (no need to wait for the east-west comm.)
- if north pole folding: force the last row of MPI subdomains to have a much smaller jpj (the smallest value?) so they go faster and the others don't wait for them
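As an illustration of the buffer-reuse item above, here is a minimal sketch (module and variable names such as halo_buf_sketch, zsnd and zrcv are mine, not the NEMO code): the buffers are SAVEd across calls and the MPI_Waitall that completes the previous sends is deferred to the beginning of the next call.
```fortran
MODULE halo_buf_sketch
   USE mpi
   IMPLICIT NONE
   REAL(KIND=8), ALLOCATABLE, SAVE ::   zsnd(:), zrcv(:)   ! buffers kept from one call to the next
   INTEGER, SAVE ::   nreq = 0                             ! send requests still pending from the previous call
   INTEGER, SAVE ::   ireq(16)                             ! request handles of the previous call
CONTAINS
   SUBROUTINE lbc_lnk_sketch( ksize )
      INTEGER, INTENT(in) ::   ksize   ! buffer size needed by the current call
      INTEGER ::   ierr
      ! complete the non-blocking SENDS posted by the PREVIOUS call, so that
      ! the persistent send buffer can safely be refilled
      IF( nreq > 0 )   CALL MPI_Waitall( nreq, ireq, MPI_STATUSES_IGNORE, ierr )
      nreq = 0
      ! keep the buffers if they fit and are not too big (e.g. < jpi*jpj),
      ! reallocate them otherwise
      IF( ALLOCATED(zsnd) .AND. SIZE(zsnd) < ksize )   DEALLOCATE( zsnd, zrcv )
      IF( .NOT. ALLOCATED(zsnd) )   ALLOCATE( zsnd(ksize), zrcv(ksize) )
      ! ... pack the halos into zsnd, post MPI_Isend (handles kept in ireq and
      !     nreq until the next call) and MPI_Irecv, wait for the receives,
      !     then unpack zrcv ...
   END SUBROUTINE lbc_lnk_sketch
END MODULE halo_buf_sketch
```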
Workplan action
Commit History (30)
Changeset | Author | Time | ChangeLog |
---|---|---|---|
14611 | smasson | 2021-03-15T17:00:43+01:00 | |
14433 | smasson | 2021-02-11T09:06:49+01:00 | trunk: merge dev_r14312_MPI_Interface into the trunk, #2598 |
14432 | smasson | 2021-02-11T09:04:33+01:00 | dev_r14312_MPI_Interface: use AGRIF/dev branch, #2598 |
14417 | smasson | 2021-02-08T11:48:23+01:00 | dev_r14312_MPI_Interface: put back proper alignment, #2598 |
14403 | smasson | 2021-02-05T13:13:14+01:00 | dev_r14312_MPI_Interface: merge with trunk@14402, #2598 |
14379 | smasson | 2021-02-03T09:11:51+01:00 | dev_r14312_MPI_Interface: refactoring of lbc_lnk_pt2pt_generic, #2598 |
14369 | smasson | 2021-02-02T14:37:11+01:00 | |
14368 | smasson | 2021-02-02T10:57:21+01:00 | dev_r14312_MPI_Interface: AGRIF working with sp and dp arrays, needed by [14367], #2598 |
14367 | smasson | 2021-02-02T08:51:42+01:00 | dev_r14312_MPI_Interface: keep send/recv buffers in memory if smaller than jpi*jpj, #2598 |
14366 | smasson | 2021-02-01T18:06:01+01:00 | dev_r14312_MPI_Interface: properly define corner in point-2-point communication when needed, #2598 |
14365 | smasson | 2021-02-01T10:50:40+01:00 | dev_r14312_MPI_Interface: use NetCDF global attributes to store configs info, follows [14336], #2598 |
14363 | smasson | 2021-02-01T08:34:52+01:00 | dev_r14312_MPI_Interface: suppress communications involving only land points, #2598 |
14355 | smasson | 2021-01-27T17:12:40+01:00 | dev_r14312_MPI_Interface: update CANAL and TSUNAMI tests, missing part of [14346], #2598 |
14352 | smasson | 2021-01-27T15:20:22+01:00 | dev_r14312_MPI_Interface: use the new AGRIF branch, #2598 |
14351 | smasson | 2021-01-27T15:17:23+01:00 | dev_r14312_MPI_Interface: update AGRIF branch to accept CONTIGUOUS attribute, #2598 |
14350 | smasson | 2021-01-27T15:02:28+01:00 | dev_r14312_MPI_Interface: create AGRIF branch, #2598 |
14349 | smasson | 2021-01-27T14:57:31+01:00 | dev_r14312_MPI_Interface: further simplifications of lbclnk and lbcnfd, #2598 |
14346 | smasson | 2021-01-27T08:21:11+01:00 | dev_r14312_MPI_Interface: minor fixes in tests, #2598 |
14344 | smasson | 2021-01-26T12:30:42+01:00 | dev_r14312_MPI_Interface: missing one piece in [14343], #2598 |
14343 | smasson | 2021-01-26T12:07:24+01:00 | dev_r14312_MPI_Interface: bugfix when suppressing land-domains along the NP folding, #2598 |
14338 | smasson | 2021-01-25T08:50:49+01:00 | dev_r14312_MPI_Interface: simplification of lbclnk and lbcnfd and their generic interfaces, #2598 |
14337 | smasson | 2021-01-23T16:06:56+01:00 | dev_r14312_MPI_Interface: 2 bugfixes following [14336], #2598 |
14336 | smasson | 2021-01-23T11:09:54+01:00 | dev_r14312_MPI_Interface: replace jperio by l_Iperio, l_Jperio, l_NFold, c_NFtype, #2598 |
14331 | smasson | 2021-01-22T15:48:14+01:00 | dev_r14312_MPI_Interface: temporary bugfix for SAS and OFF, #2598 |
14327 | smasson | 2021-01-22T11:50:04+01:00 | dev_r14312_MPI_Interface: bugfix for icebergs and NP folding, #2598 |
14326 | smasson | 2021-01-22T11:49:55+01:00 | |
14320 | smasson | 2021-01-20T13:02:04+01:00 | dev_r14312_MPI_Interface: further cleaning remove N[ij][es][12] and N[ij]_[12], #2598 |
14319 | smasson | 2021-01-20T12:53:10+01:00 | dev_r14312_MPI_Interface: merge with trunk@14318, #2598 |
14314 | smasson | 2021-01-19T13:07:35+01:00 | dev_r14312_MPI_Interface: first implementation, #2598 |
14313 | smasson | 2021-01-19T13:00:36+01:00 | create MPI Interface branch, #2598 |
Change History (48)
comment:1 Changed 3 years ago by smasson
In 14313:
comment:2 Changed 3 years ago by smasson
In 14314:
comment:3 Changed 3 years ago by smasson
With nn_hls=1 and point-to-point comm., sette tests passed but:
- ORCA2_ICE_PISCES is no longer working with the icebergs
- OFF_PISCES differs from trunk@14312
- SAS_ICE differs from trunk@14312
- AGRIF differs from trunk@14312
- ORCA2_ICE_OBS: incomplete test
I will check these points...
Collective comm. are also working but I need to double-check this...
Brief description of what was done and what should be done...
- review, cleaning and simplification
- CRS is no longer working. Large parts of obsolete code have been commented out...
- deep rewriting of the MPI interface
- land-only domains no longer communicate with the others, even if they are not suppressed
- simplification and convergence of the point-to-point and collective interfaces
- MPI neighbour addresses are stored in mpinei, an integer array of dimension 8 (one entry per neighbour) -> could later become dimension (8,n_hlsmax) when we also test the mask to see if we need to send ocean values
- buffer size is the same for send and receive (isizei(8), isizej(8)); could later be generalized to isize(8,2,n_hlsmax) for the send/receive of the 8 corners
- the collective communicators (mpi_nc_com4 and mpi_nc_com8) could later be arrays so as to pick up different graphs (BDY and different values of nn_hls, for example); see the sketch after this list for how such a communicator is built
- following the collective comm., we are back to 1D buffers in point-to-point, so later we will be able to adjust their size and reuse the buffer if it fits and is not too big (so we can keep it in memory)
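As background on the communicator item above, a neighbourhood-collective communicator of this kind is typically built with MPI_Dist_graph_create_adjacent; here is a stand-alone sketch (variable names inei and icom8 are illustrative, not the NEMO code):
```fortran
PROGRAM nc_com_sketch
   USE mpi
   IMPLICIT NONE
   INTEGER ::   inei(8)         ! ranks of the 8 neighbours (4 sides + 4 corners)
   INTEGER ::   irank, icom8, ierr
   CALL MPI_Init( ierr )
   CALL MPI_Comm_rank( MPI_COMM_WORLD, irank, ierr )
   ! in NEMO the neighbour ranks would come from the domain decomposition
   ! (mpinei); here each process lists itself 8 times so the example runs
   ! stand-alone on any number of processes
   inei(:) = irank
   ! symmetric graph: the same ranks are sources and destinations, no weights
   CALL MPI_Dist_graph_create_adjacent( MPI_COMM_WORLD,               &
      &      8, inei, MPI_UNWEIGHTED, 8, inei, MPI_UNWEIGHTED,        &
      &      MPI_INFO_NULL, .FALSE., icom8, ierr )
   ! icom8 can then be used with MPI_Neighbor_alltoallv to exchange the halos
   CALL MPI_Comm_free( icom8, ierr )
   CALL MPI_Finalize( ierr )
END PROGRAM nc_com_sketch
```
Making mpi_nc_com4/mpi_nc_com8 arrays would then simply amount to building one such communicator per graph (per nn_hls value, or for BDY).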
comment:4 Changed 3 years ago by smasson
In 14319:
comment:5 Changed 3 years ago by smasson
In 14320:
comment:6 Changed 3 years ago by smasson
In 14326:
comment:7 Changed 3 years ago by smasson
In 14327:
comment:8 Changed 3 years ago by smasson
With [14327], all sette tests passed (except SWG, as in the trunk).
Compared with trunk@14310, all files are identical except:
- WORCA2_OFF_PISCES_ST: tracer.stat files are DIFFERENT (results are different after time steps)
- WORCA2_SAS_ICE_ST: run.stat files are DIFFERENT (results are different after 4 time steps)
- WICE_AGRIF_ST: run.stat files are DIFFERENT (results are different after 2 time steps)
comment:9 Changed 3 years ago by smasson
In 14331:
comment:10 Changed 3 years ago by smasson
- With [14331], all sette tests give the same results as the trunk@14310. :-)
- This is a temporary fix, as the handling of jperio will be completely reviewed.
comment:11 Changed 3 years ago by smasson
In 14336:
comment:12 Changed 3 years ago by smasson
In 14337:
comment:13 Changed 3 years ago by smasson
[14337] passes all sette tests and gives the same results as trunk@14310
comment:14 Changed 3 years ago by smasson
In 14338:
comment:15 Changed 3 years ago by smasson
[14338] passes all sette tests and gives the same results as trunk@14310
In this commit, I reduced the complexity of lbclnk and lbcnfd.
- I replaced lbc_lnk_multi by lbc_lnk -> we now always use what corresponded to lbc_lnk_multi, even if we have only 1 array
- I strongly reduced the number of interfaces in lbclnk and lbcnfd in order to simplify these routines and speed up their compilation
- I renamed some of the lbc generic routines to make the hierarchy of routines more understandable (hopefully)
- I simplified the C preprocessing in the generic routines
- I added a new namelist parameter "nn_comm" in nammpp (better name to be found...) to choose between point-to-point and neighbourhood collective communications (illustrated below)
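As an illustration, the switch might look as follows in the nammpp block (sketch only: the comment text is mine; nn_comm = 1 meaning point-to-point is confirmed by comment:27 below, while the exact value selecting collectives is an assumption):
```fortran
&nammpp
   nn_comm = 1   ! communication scheme: 1 = point-to-point (MPI_Isend/MPI_Irecv)
                 !                       otherwise, neighbourhood collectives
/
```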
comment:16 Changed 3 years ago by smasson
In 14343:
comment:17 Changed 3 years ago by smasson
In 14344:
comment:18 Changed 3 years ago by smasson
In 14346:
comment:19 Changed 3 years ago by smasson
In 14349:
comment:20 Changed 3 years ago by smasson
In 14350:
comment:21 Changed 3 years ago by smasson
We needed to modify AGRIF so that it recognizes the CONTIGUOUS attribute, which we were forced to add in lbc_blk_call.h90 in [14350].
In lbc_blk_call.h90, this Fortran 2008 attribute is indeed needed in the TARGET declarations:
```fortran
REAL(PRECISION), DIMENSION(DIMS)          , TARGET, CONTIGUOUS, INTENT(inout) ::   pt1   ! arrays on which the lbc is applied
REAL(PRECISION), DIMENSION(DIMS), OPTIONAL, TARGET, CONTIGUOUS, INTENT(inout) ::   pt2 , pt3 , pt4 , pt5 , pt6 , pt7 , pt8 , pt9, &
   &                                                                               pt10, pt11, pt12, pt13, pt14, pt15, pt16
...
REAL(PRECISION), DIMENSION(DIMS), TARGET, INTENT(inout), CONTIGUOUS ::   ptab   ! arrays on which the lbc is applied
```
in order to compile the following line:
```fortran
ptab_ptr(kfld)%pt4d(1:SIZE(ptab, dim=1),1:SIZE(ptab, dim=2),1:ISZ3,1:ISZ4) => ptab
```
This line and CONTIGUOUS are the key tricks found by Andrew to use a 4D pointer even with 2D or 3D input arrays, and therefore to suppress all the 2D and 3D interfaces of lbc_lnk_pt2pt, lbc_lnk_neicoll, lbc_nfd, mpp_nfd and lbc_nfd_nogather.
This greatly reduces the size of the lbclnk and lbcnfd routines and makes the generic routines much more readable (as the only remaining cpp parameters in lbc_lnk_pt2pt, lbc_lnk_neicoll, lbc_nfd, mpp_nfd and lbc_nfd_nogather are PRECISION and MPI_TYPE).
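A stand-alone illustration of the same trick (names are mine; this is not the NEMO routine): a CONTIGUOUS assumed-shape dummy is "simply contiguous", which is exactly what the Fortran 2008 bounds-remapping pointer assignment requires, so a 2D array can be viewed through a 4D pointer and handled by a single generic loop nest.
```fortran
PROGRAM remap_sketch
   IMPLICIT NONE
   REAL ::   z2d(10,20)
   z2d(:,:) = 1.
   CALL view4d( z2d )
CONTAINS
   SUBROUTINE view4d( ptab )
      REAL, DIMENSION(:,:), TARGET, CONTIGUOUS, INTENT(inout) ::   ptab
      REAL, POINTER ::   pt4d(:,:,:,:)
      ! Fortran 2008 bounds-remapping pointer assignment: it requires a
      ! simply contiguous target, which CONTIGUOUS guarantees here
      pt4d(1:SIZE(ptab,1),1:SIZE(ptab,2),1:1,1:1) => ptab
      PRINT *, 'viewed as 4D with shape', SHAPE(pt4d)
   END SUBROUTINE view4d
END PROGRAM remap_sketch
```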
comment:22 Changed 3 years ago by smasson
In 14351:
comment:23 Changed 3 years ago by smasson
In 14352:
comment:24 Changed 3 years ago by smasson
[14352] passes all sette tests and gives the same results as trunk@14310
comment:25 Changed 3 years ago by smasson
In 14355:
comment:26 Changed 3 years ago by smasson
In 14363:
comment:27 Changed 3 years ago by smasson
[14363] passes all sette tests and gives the same results as trunk@14310
For an unexplained reason, this optimisation is not working with neighbourhood collective communications (nn_comm /= 1).
As a consequence (and hopefully temporarily), I have been forced to bypass this optimisation when nn_comm /= 1...
This is done in init_excl_landpt in mppini.F90, line 1186:
```fortran
IF( nn_comm == 1 ) THEN   ! SM: NOT WORKING FOR NEIGHBOURHOOD COLLECTIVE COMMUNICATIONS, I DON'T KNOW WHY...
   ...
ENDIF
```
comment:28 Changed 3 years ago by smasson
In 14365:
comment:29 Changed 3 years ago by smasson
In 14366:
comment:30 Changed 3 years ago by smasson
[14366] passes all sette tests and gives the same results as trunk@14310
comment:31 Changed 3 years ago by nicolasmartin
- Owner set to smasson
- Status changed from new to assigned
comment:32 Changed 3 years ago by smasson
In 14367:
comment:33 Changed 3 years ago by smasson
In 14368:
comment:34 Changed 3 years ago by smasson
In 14369:
comment:35 Changed 3 years ago by smasson
[14369] passes all sette tests and gives the same results as trunk@14310
comment:36 Changed 3 years ago by smasson
In 14379:
comment:37 Changed 3 years ago by smasson
[14379] requires adding a few lines in ext/FCM/lib/Fcm/BuildTask.pm (thank you Andrew!)
With this change, [14379] passes all sette tests and gives the same results as trunk@14310
ext/FCM/lib/Fcm/BuildTask.pm (the new block is inserted between `return unless $self->actiontype;` and `my $uptodate = 1;`):
```perl
return unless $self->actiontype;

if ( defined $self->srcfile ) {
    my $bname = basename($self->srcfile->src);
    for my $depend (@{ $self->dependency }) {
        if ( $bname eq $depend ) {
            # Recursion suspected
            return;
        }
    }
}

my $uptodate = 1;
my $dep_uptodate = 1;
```
comment:38 Changed 3 years ago by smasson
other developments to consider:
- fix the optimization for lbc_lnk_neicoll. As specified in ticket:2598#comment:27, suppressing communications involving only land points is not working with lbc_lnk_neicoll. I don't know why.
- review the specific lbc in BDY. This first needs a review of lsend_bdy and lrecv_bdy in bdyinit, with the help of Gaston Irrmann. For example, how do we deal with the corners?
- review/suppress the specific lbc in ICB…
- Now that all calls to lbc_lnk use what was called the "4d multi" interface, it should be possible to call lbc_lnk with arrays of different shapes as long as the first 2 dimensions (i and j) are equal. I think this is quite easy to do in lbc_lnk_pt2pt and in lbc_lnk_neicoll. We mostly have to declare ipk and ipl as 1d arrays and rewrite the loops as DO jf = 1, ipf ; DO jl = 1, ipl(jf) ; DO jk = 1, ipk(jf). The problem comes from the NP folding: the NP folding routines should also be adapted to deal with different values of ipl(jf) and ipk(jf) -> see the next point
- North Pole folding: should be compliant with 3 new constraints:
(1) should work with a halo size khls /= nn_hls. I think this first point is needed for ICB.
(2) should not assume that the east-west communication has already been done (-> do the NP folding only with inner-domain points). This suppresses the dependency between communications.
(3) should be able to deal with different values of ipl(jf) and ipk(jf), see the previous point.
I modified lbc_nfd_generic.h90 so it is compliant with constraints 1 and 2 (I hope), but I think that mpp_nfd_generic.h90 and lbc_nfd_nogather_generic.h90 are not working if khls /= nn_hls. They should be modified/updated.
I think there is room for some optimization in the "nogather" version when looking at the neighbours: (1) exclude communications involving only land points and (2) I think we should not consider the halos when looking for neighbours.
- optimize the size of the northern cores when doing the NP folding:
As pointed out by Jesus Labarta, the NP folding slows down the northern cores and quickly de-synchronizes all cores, which slows down everything. A quick and simple fix/patch is to minimize the j-size of the northern domains so they compute faster and can afford a longer time doing the NP folding without slowing down the other cores.
A first optimization in this direction already exists: we force all domain rows, except the northern one, to have jpjmax lines, which minimizes the j-size of the remaining northern row of domains (depending on the domain decomposition). So today we first do the domain decomposition and then play with the potential difference between jpj and jpjmax to minimize the northern domains.
We could modify this strategy. We could, for example, first fix the j-size of the northern domains at a minimal value (to be defined) and then build the domain decomposition on the remaining part of the domain. This would ensure that the j-size of the northern domains is always minimized. This is not very difficult to do once we have a good criterion to define the "minimal j-size" we want to use…
- activate in lbc_lnk the optional argument ld4only whenever it is possible to communicate with only 4 neighbours. Would additional options, such as doing the exchanges only in a given direction, be helpful? If yes, this is very easy to do.
- open questions: we could also fill the buffers and send only the ocean points of the halos. Maybe this would make a difference when working with 3D arrays? This could easily be done with the Fortran PACK/UNPACK intrinsics (see the sketch below). The problem would be to find which mask must be used…
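A minimal stand-alone sketch of the PACK/UNPACK idea from the last item (illustrative names, not the NEMO code; a random field stands in for the ocean/land mask): only the masked points go into the exchange buffer, and the receive side restores them in place.
```fortran
PROGRAM pack_sketch
   IMPLICIT NONE
   REAL    ::   zhalo(5,3), zbuf(15)
   LOGICAL ::   llmsk(5,3)
   INTEGER ::   icnt
   CALL RANDOM_NUMBER( zhalo )
   llmsk = zhalo > 0.5                           ! stand-in for the ocean/land mask
   icnt  = COUNT( llmsk )
   zbuf(1:icnt) = PACK( zhalo, llmsk )           ! sender: buffer holds ocean points only
   zhalo = UNPACK( zbuf(1:icnt), llmsk, 0.0 )    ! receiver: land points filled with 0
   PRINT *, icnt, 'ocean points exchanged out of', SIZE(llmsk)
END PROGRAM pack_sketch
```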
comment:39 Changed 3 years ago by smasson
In 14403:
comment:40 Changed 3 years ago by smasson
[14403] passes all sette tests and gives the same results as trunk@14401
Note that we still need the patch on ext/FCM/lib/Fcm/BuildTask.pm, as described in comment:37
comment:41 Changed 3 years ago by smasson
[14403] passes all sette tests and gives the same results as trunk@14401, including with debug options (X64_IRENE_DEBUG) or with gcc (X64_IRENE_GCC)
comment:42 Changed 3 years ago by smasson
In 14417:
comment:43 Changed 3 years ago by smasson
In 14430:
comment:44 Changed 3 years ago by smasson
In 14431:
comment:45 Changed 3 years ago by smasson
In 14432:
comment:46 Changed 3 years ago by smasson
In 14433:
comment:47 Changed 3 years ago by smasson
- Resolution set to fixed
- Status changed from assigned to closed
[14434] passes all sette tests and gives the same results as trunk@14401
comment:48 Changed 3 years ago by smasson
In 14611: