#2213 closed Defect (fixed)

Model freeze when using BDY in some particular cases

Reported by: molines Owned by: systeam
Priority: low Milestone:
Component: BDY Version: trunk
Severity: minor Keywords: bdy, bdyini mpp_lnk_bdy_xxx
Cc: smasson@…

Description

Context

When using a configuration with open boundaries, the model freezes at the first time step under certain circumstances. This behaviour is sensitive to the domain decomposition.

Analysis

After some tedious debugging in bdyini.F90 (in the routine bdy_segs, which is very busy, by the way!), I found that a deadlock appears when the end of a boundary segment lies exactly on the first point of the halo between two adjacent processors. For instance, in the I direction (W and E), this happens when the ending point is at nlci-1 on the W processor, which corresponds to 1 on the E processor. In this particular case, an east communication is triggered without the corresponding west communication.

The critical piece of code (in bdyini.F90) is:

 940                      ! check if point has to be sent
 941                      ii = idx_bdy(ib_bdy)%nbi(icount,igrd)
 942                      ij = idx_bdy(ib_bdy)%nbj(icount,igrd)
 943                      if((com_east .ne. 1) .and. (ii == (nlci-1)) .and. (nbondi .le. 0)) then
 944                         com_east = 1
 945                      elseif((com_west .ne. 1) .and. (ii == 2) .and. (nbondi .ge. 0) .and. (nbondi .ne. 2)) then
 946                         com_west = 1
 947                      endif
 948                      if((com_south .ne. 1) .and. (ij == 2) .and. (nbondj .ge. 0) .and. (nbondj .ne. 2)) then
 949                         com_south = 1
 950                      elseif((com_north .ne. 1) .and. (ij == (nlcj-1)) .and. (nbondj .le. 0)) then
 951                         com_north = 1
 952                      endif

Suppose the ending point is at ii=nlci-1 on W, hence at ii=1 on E. Lines 943-944 set com_east=1 on W, but the test on line 945 requires ii == 2, so lines 945-946 leave com_west=0 on E.
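The mismatch can be reproduced with a small toy model. This is only a sketch in Python, not NEMO code: the helper names `local_points` and `original_flags` are hypothetical, W is assumed to start at global index 1, E is assumed to overlap W by two columns (so W's nlci-1 is E's 1), and the nbondi conditions are assumed to hold on both sides.

```python
def local_points(segment, offset, nlci):
    """Return local i-indices of the segment points inside one subdomain."""
    return [g - offset for g in segment if 1 <= g - offset <= nlci]


def original_flags(segment, nlci_w, nlci_e):
    """Evaluate the tests of lines 943 and 945 on two neighbouring subdomains.

    Toy-model assumptions: W starts at global index 1, E overlaps W by two
    columns (W's nlci-1 is E's 1), and the nbondi conditions are satisfied.
    """
    pts_w = local_points(segment, 0, nlci_w)
    pts_e = local_points(segment, nlci_w - 2, nlci_e)
    com_east_w = int((nlci_w - 1) in pts_w)  # line 943, evaluated on W
    com_west_e = int(2 in pts_e)             # line 945, evaluated on E
    return com_east_w, com_west_e


# Segment ending exactly at W's nlci-1 (global index 9 when nlci_w = 10).
print(original_flags(range(5, 10), nlci_w=10, nlci_e=10))  # (1, 0)
```

In this model, W raises com_east while E leaves com_west at 0, which is the one-sided communication described above.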

The problem will certainly be the same in the N-S direction (lines 948-949 and lines 950-951).

The problem was fixed successfully (in my case of a structured BDY) by changing line 943 to:

 943                      if((com_east .ne. 1) .and. (ii == (nlci)) .and. (nbondi .le. 0)) then

Of course, the corresponding tests for com_west_b and com_east_b must be adapted accordingly, for instance:

 962                        if((com_west_b .ne. 1) .and. (ii == (nlcit(nowe+1)))) then
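To see why testing ii == nlci restores consistency, the two variants of the east test can be compared in a toy model. This is a hedged sketch in Python rather than NEMO code: the function `flags` and its parameters are hypothetical, a two-column overlap between W and E is assumed, the nbondi conditions are ignored, and the _b variables are not modelled.

```python
def flags(segment_end, east_trigger, nlci_w=10, nlci_e=10):
    """Return (com_east on W, com_west on E) for a segment 1..segment_end.

    east_trigger is the local index tested on W: nlci_w - 1 reproduces the
    original line 943, nlci_w the modified one. E always tests ii == 2.
    """
    segment = range(1, segment_end + 1)
    pts_w = [g for g in segment if 1 <= g <= nlci_w]
    # E overlaps W by two columns, so its local index is g - (nlci_w - 2).
    offset_e = nlci_w - 2
    pts_e = [g - offset_e for g in segment if 1 <= g - offset_e <= nlci_e]
    return int(east_trigger in pts_w), int(2 in pts_e)


# Sweep the segment end across the seam between W and E.
for end in range(7, 13):
    print(end, flags(end, east_trigger=9), flags(end, east_trigger=10))
```

Under these assumptions, the original trigger (nlci-1) gives differing flags only when the segment ends at global index 9, that is at W's nlci-1, while the modified trigger (nlci) gives the same flag on both sides for every segment end.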

This problem was not present in 3.6, although bdy_ini was the same, because the change is in lib_mpp.F90 (now mpp_bdy_generic.h90). In 3.6, send-receive messages were triggered by both (e.g.) nbondi_bdy AND nbondi_bdy_b. Now they are triggered by nbondi_bdy ONLY, and nbondi_bdy_b is used just to put the exchanged values in the right place.

Recommendation

The bug was activated by the 'simplification' in ROUTINE_BDY, but I think the fix belongs in bdyini. On the other hand, the simplification was made to avoid one communication, and I personally doubt that such a change has much impact on performance.

Commit History (3)

Changeset | Author | Time | ChangeLog
10630 | smasson | 2019-02-04T17:09:57+01:00 | v4.0: bugfix in mpp for bdy, back to v3.6, see #2213, #2224, #2225
10629 | smasson | 2019-02-04T17:07:39+01:00 | trunk: bugfix in mpp for bdy, back to v3.6, see #2213, #2224, #2225
10537 | smasson | 2019-01-16T21:41:21+01:00 | trunk: bugfix in bdyini, see #2213

Change History (7)

comment:1 Changed 15 months ago by smasson

In 10537:

trunk: bugfix in bdyini, see #2213

comment:2 Changed 15 months ago by smasson

Jean-Marc, following your recommendations, I made some corrections in [10537]. Could you tell me whether it works, so that I can close the ticket?

comment:3 Changed 15 months ago by smasson

  • Cc smasson@… added

comment:4 Changed 14 months ago by smasson

In 10629:

trunk: bugfix in mpp for bdy, back to v3.6, see #2213, #2224, #2225

comment:5 Changed 14 months ago by smasson

In 10630:

v4.0: bugfix in mpp for bdy, back to v3.6, see #2213, #2224, #2225

comment:6 Changed 14 months ago by smasson

See the discussion in #2224.
I am closing the ticket.

comment:7 Changed 14 months ago by smasson

  • Resolution set to fixed
  • Status changed from new to closed