Opened 3 months ago

Closed 2 months ago

Last modified 4 weeks ago

#2598 closed Task (fixed)

HPC-01_SebEtCie_MPI_Interface

Reported by: smasson Owned by: smasson
Priority: low Milestone: 2021 WP
Component: LBC Version: trunk
Severity: minor Keywords:
Cc: Branch review:
MP ready?: Task progress:

Description

Context

1) Review, simplification and cleaning of the MPI interface is needed.
2) There are a few bugs:

  • potential bug with the corners if nn_hls > 1 (or 2?) with point-to-point communication
  • collective communication is not working with BDY (as BDY has its own exchanges)

3) There are other things to further improve:

  • do not send/receive when only land points would be exchanged (depends on nn_hls)
  • send/receive only the ocean values (checking the mask; depends on nn_hls)
  • keep the send/receive buffers in memory from one call to the next (if they are not too big, e.g. < jpi*jpj) → for point-to-point, move the mpi_waitall to the beginning of the next call
  • make sure the north pole folding does not need the east-west communication: done if nogather = F, to be done if nogather = T… so we could place it wherever we want (no need to wait for the east-west communication)
  • if north pole folding: force the last row of MPI subdomains to have a much smaller jpj (the smallest possible value?) so they run faster and the others don't wait for them
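The buffer-reuse idea in the list above can be sketched as follows (a minimal illustration in Python, not NEMO code; the class and method names are hypothetical):

```python
# Sketch of the proposed send/recv buffer reuse policy: keep a buffer
# across calls when it is small enough (here: smaller than jpi*jpj),
# otherwise drop it after use. Illustrative only, not NEMO code.

class HaloBufferPool:
    def __init__(self, jpi, jpj):
        self.limit = jpi * jpj   # keep buffers only below this size
        self.cached = {}         # neighbour id -> reusable buffer

    def get(self, neighbour, size):
        buf = self.cached.get(neighbour)
        if buf is not None and len(buf) >= size:
            return buf           # reuse the buffer kept from the last call
        return [0.0] * size      # (re)allocate

    def release(self, neighbour, buf):
        if len(buf) < self.limit:
            self.cached[neighbour] = buf   # keep for the next call
        else:
            self.cached.pop(neighbour, None)

pool = HaloBufferPool(jpi=30, jpj=20)
b1 = pool.get("west", 100)
pool.release("west", b1)
b2 = pool.get("west", 100)
assert b2 is b1        # small buffer was kept in memory and reused
b3 = pool.get("east", 1000)
pool.release("east", b3)
assert pool.get("east", 1000) is not b3   # too big: not cached
```

In the point-to-point case the same policy would pair naturally with deferring mpi_waitall to the start of the next call, since the cached buffer must not be refilled before the previous sends complete.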

Workplan action

Wikipage: wiki:2021WP/HPC-01_SebEtCie_MPI_Interface

Commit History (30)

Changeset  Author   Time

14611  smasson  2021-03-15T17:00:43+01:00

AGRIF: delete dev old branch, #2598 and #2615

14433  smasson  2021-02-11T09:06:49+01:00

trunk: merge dev_r14312_MPI_Interface into the trunk, #2598

14432  smasson  2021-02-11T09:04:33+01:00

dev_r14312_MPI_Interface: use AGRIF/dev branch, #2598

14417  smasson  2021-02-08T11:48:23+01:00

dev_r14312_MPI_Interface: put back proper alignment, #2598

14403  smasson  2021-02-05T13:13:14+01:00

dev_r14312_MPI_Interface: merge with trunk@14402, #2598

14379  smasson  2021-02-03T09:11:51+01:00

dev_r14312_MPI_Interface: refactoring of lbc_lnk_pt2pt_generic, #2598

14369  smasson  2021-02-02T14:37:11+01:00

dev_r14312_MPI_Interface: syntax error in [14368], #2598

14368  smasson  2021-02-02T10:57:21+01:00

dev_r14312_MPI_Interface: AGRIF working with sp and dp arrays, needed by [14367], #2598

14367  smasson  2021-02-02T08:51:42+01:00

dev_r14312_MPI_Interface: keep send/recv buffers in memory if smaller than jpi*jpj, #2598

14366  smasson  2021-02-01T18:06:01+01:00

dev_r14312_MPI_Interface: properly define corners in point-2-point communication when needed, #2598

14365  smasson  2021-02-01T10:50:40+01:00

dev_r14312_MPI_Interface: use NetCDF global attributes to store configs info, follows [14336], #2598

14363  smasson  2021-02-01T08:34:52+01:00

dev_r14312_MPI_Interface: suppress communications involving only land points, #2598

14355  smasson  2021-01-27T17:12:40+01:00

dev_r14312_MPI_Interface: update CANAL and TSUNAMI tests, missing part of [14346], #2598

14352  smasson  2021-01-27T15:20:22+01:00

dev_r14312_MPI_Interface: use the new AGRIF branch, #2598

14351  smasson  2021-01-27T15:17:23+01:00

dev_r14312_MPI_Interface: update AGRIF branch to accept CONTIGUOUS attribute, #2598

14350  smasson  2021-01-27T15:02:28+01:00

dev_r14312_MPI_Interface: create AGRIF branch, #2598

14349  smasson  2021-01-27T14:57:31+01:00

dev_r14312_MPI_Interface: further simplifications of lbclnk and lbcnfd, #2598

14346  smasson  2021-01-27T08:21:11+01:00

dev_r14312_MPI_Interface: minor fixes in tests, #2598

14344  smasson  2021-01-26T12:30:42+01:00

dev_r14312_MPI_Interface: missing one piece in [14343], #2598

14343  smasson  2021-01-26T12:07:24+01:00

dev_r14312_MPI_Interface: bugfix when suppressing land-domains along the NP folding, #2598

14338  smasson  2021-01-25T08:50:49+01:00

dev_r14312_MPI_Interface: simplification of lbclnk and lbcnfd and their generic interfaces, #2598

14337  smasson  2021-01-23T16:06:56+01:00

dev_r14312_MPI_Interface: 2 bugfixes following [14336], #2598

14336  smasson  2021-01-23T11:09:54+01:00

dev_r14312_MPI_Interface: replace jperio by l_Iperio, l_Jperio, l_NFold, c_NFtype, #2598

14331  smasson  2021-01-22T15:48:14+01:00

dev_r14312_MPI_Interface: temporary bugfix for SAS and OFF, #2598

14327  smasson  2021-01-22T11:50:04+01:00

dev_r14312_MPI_Interface: bugfix for icebergs and NP folding, #2598

14326  smasson  2021-01-22T11:49:55+01:00

dev_r14312_MPI_Interface: bugfix following [14320], #2598

14320  smasson  2021-01-20T13:02:04+01:00

dev_r14312_MPI_Interface: further cleaning, remove N[ij][es][12] and N[ij]_[12], #2598

14319  smasson  2021-01-20T12:53:10+01:00

dev_r14312_MPI_Interface: merge with trunk@14318, #2598

14314  smasson  2021-01-19T13:07:35+01:00

dev_r14312_MPI_Interface: first implementation, #2598

14313  smasson  2021-01-19T13:00:36+01:00

create MPI Interface branch, #2598

Change History (48)

comment:1 Changed 3 months ago by smasson

In 14313:

create MPI Interface branch, #2598

comment:2 Changed 3 months ago by smasson

In 14314:

dev_r14312_MPI_Interface: first implementation, #2598

comment:3 Changed 3 months ago by smasson

With nn_hls=1 and point-to-point communications, sette tests passed but:

  • ORCA2_ICE_PISCES is no longer working with the icebergs
  • OFF_PISCES differs from trunk@14312
  • SAS_ICE differs from trunk@14312
  • AGRIF differs from trunk@14312
  • ORCA2_ICE_OBS: incomplete test

I will check these points…
Collective communications are also working but I need to double-check this…

Brief description of what was done and what should be done…

  • review, cleaning and simplification
  • CRS is no longer working; large parts of obsolete code have been commented out…
  • deep rewriting of the MPI interface
  • land-only domains no longer communicate with the others, even if they are not suppressed
  • simplification and convergence of the point-to-point and collective interfaces
  • MPI addresses are stored in mpinei, an integer(8) array → this could later become integer(8,n_hlsmax) when we also test the mask to see if we need to send ocean values
  • the buffer size is the same for send and receive (isizei(8), isizej(8)); this could later be generalized to isize(8,2,n_hlsmax) for send/receive of the 8 corners
  • the collective communicators (mpi_nc_com4 and mpi_nc_com8) could later become arrays to pick up different graphs (BDY and different values of nn_hls, for example)
  • following the collective communications, back to 1D buffers in point-to-point, so that later we will be able to adjust their size and re-use a buffer if it fits and is not too big (so we can keep it in memory)
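The 8-neighbour bookkeeping described above (the mpinei array) can be illustrated with a small Python sketch of a jpni × jpnj Cartesian decomposition (the function name and the -1 "no neighbour" convention are illustrative assumptions, not NEMO code):

```python
# Illustrative computation of the 8 neighbour ranks (W, E, S, N and
# the 4 corners) of a subdomain in a jpni x jpnj decomposition, in the
# spirit of the mpinei array. -1 marks "no neighbour" on a
# non-periodic boundary. A sketch only, not NEMO code.

def neighbours_8(rank, jpni, jpnj):
    i, j = rank % jpni, rank // jpni
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1),      # W, E, S, N
               (-1, -1), (1, -1), (-1, 1), (1, 1)]    # SW, SE, NW, NE
    nei = []
    for di, dj in offsets:
        ii, jj = i + di, j + dj
        if 0 <= ii < jpni and 0 <= jj < jpnj:
            nei.append(jj * jpni + ii)
        else:
            nei.append(-1)
    return nei

# 3x3 decomposition: the centre rank 4 has all 8 neighbours,
# the corner rank 0 only has E, N and NE.
assert neighbours_8(4, 3, 3) == [3, 5, 1, 7, 0, 2, 6, 8]
assert neighbours_8(0, 3, 3) == [-1, 1, -1, 3, -1, -1, -1, 4]
```

A per-halo-width generalisation (the integer(8,n_hlsmax) idea above) would simply keep one such 8-entry list per tested value of nn_hls.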
Last edited 3 months ago by smasson (previous) (diff)

comment:4 Changed 3 months ago by smasson

In 14319:

dev_r14312_MPI_Interface: merge with trunk@14318, #2598

comment:5 Changed 3 months ago by smasson

In 14320:

dev_r14312_MPI_Interface: further cleaning, remove N[ij][es][12] and N[ij]_[12], #2598

comment:6 Changed 3 months ago by smasson

In 14326:

dev_r14312_MPI_Interface: bugfix following [14320], #2598

comment:7 Changed 3 months ago by smasson

In 14327:

dev_r14312_MPI_Interface: bugfix for icebergs and NP folding, #2598

comment:8 Changed 3 months ago by smasson

With [14327], all sette tests passed (except SWG, as in the trunk).
Compared with trunk@14310, all files are identical except:

WORCA2_OFF_PISCES_ST  tracer.stat files are DIFFERENT (results are different after   time steps)
WORCA2_SAS_ICE_ST     run.stat    files are DIFFERENT (results are different after  4  time steps)
WICE_AGRIF_ST         run.stat    files are DIFFERENT (results are different after  2  time steps)

comment:9 Changed 3 months ago by smasson

In 14331:

dev_r14312_MPI_Interface: temporary bugfix for SAS and OFF, #2598

comment:10 Changed 3 months ago by smasson

  • With [14331], all sette tests give the same results as the trunk@14310. :-)
  • This is a temporary fix, as the handling of jperio will be completely reviewed.
Last edited 3 months ago by smasson (previous) (diff)

comment:11 Changed 3 months ago by smasson

In 14336:

dev_r14312_MPI_Interface: replace jperio by l_Iperio, l_Jperio, l_NFold, c_NFtype, #2598

comment:12 Changed 3 months ago by smasson

In 14337:

dev_r14312_MPI_Interface: 2 bugfixes following [14336], #2598

comment:13 Changed 3 months ago by smasson

[14337] passes all sette tests and gives the same results as trunk@14310

comment:14 Changed 3 months ago by smasson

In 14338:

dev_r14312_MPI_Interface: simplification of lbclnk and lbcnfd and their generic interfaces, #2598

comment:15 Changed 3 months ago by smasson

[14338] passes all sette tests and gives the same results as trunk@14310

In this commit, I reduced the complexity of lbclnk and lbcnfd.

  • I replaced lbc_lnk_multi by lbc_lnk → we now always use what corresponded to lbc_lnk_multi, even if we have only 1 array
  • I strongly reduced the number of interfaces in lbclnk and lbcnfd, in order to simplify these routines and speed up their compilation
  • I renamed some of the lbc generic routines to make the hierarchy of routines more understandable (hopefully)
  • I simplified the C preprocessing in the generic routines
  • I added a new namelist parameter "nn_comm" in nammpp (a better name is to be found…) to choose between point-to-point and neighbourhood collective communications

comment:16 Changed 3 months ago by smasson

In 14343:

dev_r14312_MPI_Interface: bugfix when suppressing land-domains along the NP folding, #2598

comment:17 Changed 3 months ago by smasson

In 14344:

dev_r14312_MPI_Interface: missing one piece in [14343], #2598

comment:18 Changed 2 months ago by smasson

In 14346:

dev_r14312_MPI_Interface: minor fixes in tests, #2598

comment:19 Changed 2 months ago by smasson

In 14349:

dev_r14312_MPI_Interface: further simplifications of lbclnk and lbcnfd, #2598

comment:20 Changed 2 months ago by smasson

In 14350:

dev_r14312_MPI_Interface: create AGRIF branch, #2598

comment:21 Changed 2 months ago by smasson

We need to modify AGRIF so that it recognizes the CONTIGUOUS attribute that we were forced to add in lbc_lnk_call.h90 in [14350].

In lbc_lnk_call.h90, this Fortran 2008 attribute is indeed needed in the TARGET declarations:

REAL(PRECISION), DIMENSION(DIMS)          , TARGET, CONTIGUOUS, INTENT(inout) ::   pt1        ! arrays on which the lbc is applied
REAL(PRECISION), DIMENSION(DIMS), OPTIONAL, TARGET, CONTIGUOUS, INTENT(inout) ::   pt2 , pt3 , pt4 , pt5 , pt6 , pt7 , pt8 , pt9, &
         &                                                                         pt10, pt11, pt12, pt13, pt14, pt15, pt16

...

REAL(PRECISION), DIMENSION(DIMS), TARGET, INTENT(inout), CONTIGUOUS ::   ptab       ! arrays on which the lbc is applied

in order to compile the following line:

ptab_ptr(kfld)%pt4d(1:SIZE(ptab, dim=1),1:SIZE(ptab, dim=2),1:ISZ3,1:ISZ4) => ptab

This line and CONTIGUOUS are the key tricks found by Andrew to use a 4d pointer even with 2d or 3d input arrays, and therefore to suppress all 2d and 3d interfaces of lbc_lnk_pt2pt, lbc_lnk_neicoll, lbc_nfd, mpp_nfd and lbc_nfd_nogather.
This greatly reduces the size of the lbclnk and lbcnfd routines and makes the generic routines much more readable (the only remaining cpp parameters in lbc_lnk_pt2pt, lbc_lnk_neicoll, lbc_nfd, mpp_nfd and lbc_nfd_nogather are PRECISION and MPI_TYPE).
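The rank-remapping trick can be illustrated by analogy in Python/NumPy (an analogy only: a contiguous reshape plays the role of the Fortran 4d pointer bound to a CONTIGUOUS target; the helper name is hypothetical):

```python
import numpy as np

# Analogy for the CONTIGUOUS + 4d-pointer trick: view any contiguous
# 2d/3d/4d array as a 4d array, so one generic routine can handle all
# ranks. Trailing size-1 axes play the role of 1:ISZ3 and 1:ISZ4.

def as_4d(ptab):
    a = np.ascontiguousarray(ptab)
    shape = a.shape + (1,) * (4 - a.ndim)   # pad up to rank 4
    return a.reshape(shape)                 # a view, no copy

a2 = np.arange(6.0).reshape(2, 3)           # a 2d field
a3 = np.arange(24.0).reshape(2, 3, 4)       # a 3d field
assert as_4d(a2).shape == (2, 3, 1, 1)
assert as_4d(a3).shape == (2, 3, 4, 1)

v = as_4d(a2)
v[0, 0, 0, 0] = 99.0
assert a2[0, 0] == 99.0    # it is a view on the original array
```

As in the Fortran version, the trick only works because the underlying storage is contiguous, which is exactly what the CONTIGUOUS attribute guarantees to the compiler.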

Last edited 2 months ago by smasson (previous) (diff)

comment:22 Changed 2 months ago by smasson

In 14351:

dev_r14312_MPI_Interface: update AGRIF branch to accept CONTIGUOUS attribute, #2598

comment:23 Changed 2 months ago by smasson

In 14352:

dev_r14312_MPI_Interface: use the new AGRIF branch, #2598

comment:24 Changed 2 months ago by smasson

[14352] passes all sette tests and gives the same results as trunk@14310
The number of lines in lbclnk.f90 + lbcnfd.f90 has been divided by 5 in comparison with trunk@14310 :-)

Last edited 2 months ago by smasson (previous) (diff)

comment:25 Changed 2 months ago by smasson

In 14355:

dev_r14312_MPI_Interface: update CANAL and TSUNAMI tests, missing part of [14346], #2598

comment:26 Changed 2 months ago by smasson

In 14363:

dev_r14312_MPI_Interface: suppress communications involving only land points, #2598

comment:27 Changed 2 months ago by smasson

[14363] passes all sette tests and gives the same results as trunk@14310
For an unexplained reason, this optimisation is not working with neighbourhood collective communications (nn_comm /= 1).
As a consequence (and hopefully temporarily), I have been forced to bypass this optimisation when nn_comm /= 1…
This is done in init_excl_landpt in mppini.F90 line 1186:

IF( nn_comm == 1 ) THEN       ! SM: NOT WORKING FOR NEIGHBOURHOOD COLLECTIVE COMMUNICATIONS, I DON'T KNOW WHY... 
...
ENDIF
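The idea behind init_excl_landpt can be sketched as: before the exchange, test whether the halo strip destined for each neighbour contains any ocean point, and skip the send if not (illustrative Python; the function name and the mask convention are assumptions, not NEMO code):

```python
# Sketch of suppressing communications involving only land points: a
# halo strip is sent only if its mask holds at least one ocean point
# (mask value 1; 0 is land). Illustrative only, not NEMO code.

def must_send(mask, side, nn_hls):
    """mask: 2d list of 0/1; side: which halo strip to test."""
    nj, ni = len(mask), len(mask[0])
    if side == "west":
        strip = [row[:nn_hls] for row in mask]
    elif side == "east":
        strip = [row[ni - nn_hls:] for row in mask]
    elif side == "south":
        strip = mask[:nn_hls]
    else:  # north
        strip = mask[nj - nn_hls:]
    return any(any(v == 1 for v in row) for row in strip)

# A subdomain whose western column and northern row are all land:
mask = [[0, 1, 1],
        [0, 1, 1],
        [0, 0, 0]]
assert must_send(mask, "west", 1) is False    # skip this send
assert must_send(mask, "east", 1) is True
assert must_send(mask, "south", 1) is True
assert must_send(mask, "north", 1) is False   # skip this send
```

The receiving side must of course apply the same test with the same mask, so that no process waits for a message that was never sent.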

comment:28 Changed 2 months ago by smasson

In 14365:

dev_r14312_MPI_Interface: use NetCDF global attributes to store configs info, follows [14336], #2598

comment:29 Changed 2 months ago by smasson

In 14366:

dev_r14312_MPI_Interface: properly define corners in point-2-point communication when needed, #2598

comment:30 Changed 2 months ago by smasson

[14366] passes all sette tests and gives the same results as trunk@14310

comment:31 Changed 2 months ago by nicolasmartin

  • Owner set to smasson
  • Status changed from new to assigned

comment:32 Changed 2 months ago by smasson

In 14367:

dev_r14312_MPI_Interface: keep send/recv buffers in memory if smaller than jpi*jpj, #2598

comment:33 Changed 2 months ago by smasson

In 14368:

dev_r14312_MPI_Interface: AGRIF working with sp and dp arrays, needed by [14367], #2598

comment:34 Changed 2 months ago by smasson

In 14369:

dev_r14312_MPI_Interface: syntaxe error in [14368], #2598

comment:35 Changed 2 months ago by smasson

[14369] passes all sette tests and gives the same results as trunk@14310

comment:36 Changed 2 months ago by smasson

In 14379:

dev_r14312_MPI_Interface: refactoring of lbc_lnk_pt2pt_generic, #2598

comment:37 Changed 2 months ago by smasson

[14379] requires adding a few lines in ext/FCM/lib/Fcm/BuildTask.pm (thank you Andrew!).
With this change, [14379] passes all sette tests and gives the same results as trunk@14310

  • ext/FCM/lib/Fcm/BuildTask.pm

    @@ -124,5 +124,15 @@
     
       return unless $self->actiontype;
     
    +  if ( defined $self->srcfile ) {
    +    my $bname = basename($self->srcfile->src) ;
    +    for my $depend (@{ $self->dependency }) {
    +      if ( $bname eq $depend ) {
    +        # Recursion suspected
    +        return;
    +      }
    +    }
    +  }
    +
       my $uptodate     = 1;
       my $dep_uptodate = 1;

comment:38 Changed 2 months ago by smasson

Other developments to consider:

  • fix the optimisation for lbc_lnk_neicoll. As noted in ticket:2598#comment:27, suppressing communications involving only land points is not working with lbc_lnk_neicoll. I don't know why.
  • review the specific lbc in BDY. This first needs a review of lsend_bdy and lrecv_bdy in bdyinit, with the help of Gaston Irrmann. For example, how do we deal with corners?
  • review/suppress the specific lbc in ICB…
  • Now that all calls to lbc_lnk use what was called the "4d multi" interface, it should be possible to call lbc_lnk with arrays of different shapes as long as the first 2 dimensions (i and j) are equal. I think this is quite easy to do in lbc_lnk_pt2pt and in lbc_lnk_neicoll. We mostly have to declare ipk and ipl as 1d arrays and rewrite the loops as DO jf = 1, ipf ; DO jl = 1, ipl(jf) ; DO jk = 1, ipk(jf). The problem comes from the NP folding: the NP folding routines should also be adapted to deal with different values of ipl(jf) and ipk(jf) → see the next point
  • North Pole folding: should comply with 3 new constraints:

(1) should work with a halo size khls /= nn_hls. I think this first point is needed for ICB.
(2) should not assume that the east-west communication has already been done (→ do the NP folding only with inner-domain points). This removes the dependency between communications.
(3) should be able to deal with different values of ipl(jf) and ipk(jf), see the previous point.
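The ragged per-field loop structure this implies (each field carrying its own number of levels and extra dimension) can be sketched as (illustrative Python, not NEMO code):

```python
# Sketch of looping over a multi-field argument list where each field
# jf has its own ipk(jf) levels and ipl(jf) extra dimension, so 2d,
# 3d and 4d fields can share one call. Illustrative, not NEMO code.

def process_fields(fields):
    """fields: list of (name, ipk, ipl); returns the work items visited."""
    visited = []
    for name, ipk, ipl in fields:        # DO jf = 1, ipf
        for jl in range(ipl):            # DO jl = 1, ipl(jf)
            for jk in range(ipk):        # DO jk = 1, ipk(jf)
                visited.append((name, jl, jk))
    return visited

# A 2d field (ipk=1, ipl=1) mixed with a 3d field (ipk=3, ipl=1):
out = process_fields([("ssh", 1, 1), ("tem", 3, 1)])
assert out == [("ssh", 0, 0), ("tem", 0, 0), ("tem", 0, 1), ("tem", 0, 2)]
```

The point of the sketch is that only the i/j extents must match across fields; the k and l extents become per-field loop bounds.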

I modified lbc_nfd_generic.h90 so that it complies with constraints 1 and 2 (I hope), but I think that mpp_nfd_generic.h90 and lbc_nfd_nogather_generic.h90 do not work if khls /= nn_hls. They should be modified/updated.
I think there is room for some optimisation in the "nogather" version when looking at the neighbours: (1) exclude communications involving only land points, and (2) I think we should not consider the halos when looking for neighbours.

  • optimize the size of the northern cores when doing the NP folding:

As pointed out by Jesus Labarta, the NP folding slows down the northern cores and quickly desynchronizes all cores, which slows everything down. A quick and simple fix is to minimize the j-size of the northern domains so that they compute faster and can afford to take longer on the NP folding without slowing down the other cores.
A first optimization in this direction already exists: we force all domain rows, except the northern one, to have jpjmax lines, which minimizes the j-size of the remaining northern row of domains (depending on the domain decomposition). So today we first do the domain decomposition and then play with the potential difference between jpj and jpjmax to minimize the northern domains.
We could modify this strategy. We could, for example, first fix the j-size of the northern domains at a minimal value (to be defined) and then build the domain decomposition on the remaining part of the domain. This would ensure that the j-size of the northern domains is always minimized. This is not very difficult to do once we have a good criterion to define the "minimal j-size" we want to use…
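The alternative strategy can be sketched as follows (illustrative Python; the function name and the choice of minimal size are assumptions, since the actual criterion is still to be defined):

```python
# Sketch of the proposed decomposition strategy: first fix the
# northern row of subdomains at a minimal j-size, then split the
# remaining global j-points evenly over the other rows.
# Illustrative only, not NEMO code.

def row_sizes(jpjglo, jpnj, jpj_north_min):
    rest = jpjglo - jpj_north_min          # j-points left for jpnj-1 rows
    base, extra = divmod(rest, jpnj - 1)
    sizes = [base + (1 if r < extra else 0) for r in range(jpnj - 1)]
    return sizes + [jpj_north_min]         # northern row last, minimal

sizes = row_sizes(jpjglo=100, jpnj=4, jpj_north_min=10)
assert sizes == [30, 30, 30, 10]
assert sum(sizes) == 100
assert min(sizes) == sizes[-1]   # the northern row is the smallest
```

Compared with today's jpjmax approach, this guarantees a minimal northern row by construction instead of relying on the leftover of an even split.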

  • activate in lbc_lnk the optional argument ld4only whenever it is possible to communicate with only 4 neighbours. Would additional options, such as doing the exchanges only in a given direction, be helpful? If yes, this is very easy to do.
  • open questions: we could also fill the buffers and send only the ocean points of the halos. Maybe this would make a difference when working with 3D arrays? This could easily be done with the Fortran PACK/UNPACK intrinsics. The problem would be to find which mask must be used…
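The pack/unpack idea in the last bullet can be sketched as: fill the send buffer with ocean points only, then scatter the received values back using the same mask (a Python analogue of the Fortran PACK/UNPACK intrinsics; illustrative only):

```python
# Sketch of sending only the ocean points of a halo: pack with the
# mask on the sender, unpack with the same mask on the receiver.
# Python analogue of Fortran PACK/UNPACK; illustrative only.

def pack_ocean(halo, mask):
    return [v for v, m in zip(halo, mask) if m == 1]

def unpack_ocean(buf, mask, fill=0.0):
    it = iter(buf)
    return [next(it) if m == 1 else fill for m in mask]

mask = [1, 0, 1, 1, 0]                   # 1 = ocean, 0 = land
halo = [2.0, 9.0, 4.0, 6.0, 9.0]
buf = pack_ocean(halo, mask)
assert buf == [2.0, 4.0, 6.0]            # only ocean values are sent
assert unpack_ocean(buf, mask) == [2.0, 0.0, 4.0, 6.0, 0.0]
```

This only works if both sides agree on which mask (T, U, V, F point…) applies to the exchanged field, which is exactly the open problem noted above.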


comment:39 Changed 2 months ago by smasson

In 14403:

dev_r14312_MPI_Interface: merge with trunk@14402, #2598

comment:40 Changed 2 months ago by smasson

[14403] passes all sette tests and gives the same results as trunk@14401
Note that we still need the patch on ext/FCM/lib/Fcm/BuildTask.pm as described in comment:37

comment:41 Changed 2 months ago by smasson

[14403] passes all sette tests and gives the same results as trunk@14401, including with debug options (X64_IRENE_DEBUG) or with gcc (X64_IRENE_GCC)

comment:42 Changed 2 months ago by smasson

In 14417:

dev_r14312_MPI_Interface: put back proper alignment, #2598

comment:43 Changed 2 months ago by smasson

In 14430:

FCM: update BuildTask.pm to avoid circular dependency, ticket:2598#comment:37

comment:44 Changed 2 months ago by smasson

In 14431:

agrif: merge AGRIF/dev_r14312_MPI_Interface into AGRIF/dev, ticket:2598#comment:21

comment:45 Changed 2 months ago by smasson

In 14432:

dev_r14312_MPI_Interface: use AGRIF/dev branch, #2598

comment:46 Changed 2 months ago by smasson

In 14433:

trunk: merge dev_r14312_MPI_Interface into the trunk, #2598

comment:47 Changed 2 months ago by smasson

  • Resolution set to fixed
  • Status changed from assigned to closed

[14434] passes all sette tests and gives the same results as trunk@14401

comment:48 Changed 4 weeks ago by smasson

In 14611:

AGRIF: delete dev old branch, #2598 and #2615

Note: See TracTickets for help on using tickets.