Opened 5 years ago

Closed 5 years ago

#1704 closed Bug (fixed)

NEMO reproducibility fails with land domains exclusion

Reported by: lovato
Owned by: nemo
Priority: high
Milestone:
Component: OCE
Version: release-3.6
Severity:
Keywords:
Cc:

Description

When running SETTE with the default settings, all tests of NEMO 3.6 pass.
I set up a specific test to verify the reproducibility of the code with land domain exclusion,
as this feature is very useful in high-resolution configurations.

I tried to use configuration 4 (ORCA2_LIM_PISCES) to test the reproducibility for a run with a total of
88 PEs (80 ocean+8 land) and one with only the ocean domains (80 PEs). See the wiki of this ticket for the diff file of changes used in the test.

The test for reproducibility failed.

After digging in the code for a while, I realised that the problem is due to the land domain exclusion criterion. Excluding a land domain when all points of its internal domain are zero does not take into account the need to preserve the ocean points that lie in the overlapping region of that domain. These points are necessary to maintain the coherence of the model between the full domain decomposition and the reduced one without land, since they enter, e.g., the definition of boundary conditions through the MPI data exchange.

I made the following change in the mppini_2.h subroutine to modify the exclusion criterion, which now considers a domain as land only when all points within the whole region (inner + overlap) are zero (I also modified the offline decomposition tool accordingly):

   isurf = 0
-  DO jj = 1+jprecj, ilj-jprecj
-     DO ji = 1+jpreci, ili-jpreci
+  DO jj = 1, ilj
+     DO ji = 1, ili
         IF( imask(ji+iimppt(ii,ij)-1, jj+ijmppt(ii,ij)-1) == 1) isurf = isurf+1
      END DO
   END DO
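The effect of the change can be illustrated with a toy example. The sketch below is written in Python, not in NEMO's Fortran, and is only an illustration: the 6x4 mask, the two overlapping subdomains A and B, the 0-based indexing and the helper names count_ocean_points and is_land are all assumptions made for this example. Only the loop bounds mirror the two versions of the loop shown above.

```python
# Toy illustration only. Names such as ili, ilj, jpreci, jprecj echo the
# Fortran snippet, but the mask, the layout and the helpers are hypothetical.

def count_ocean_points(imask, i0, j0, ili, ilj, jpreci=1, jprecj=1, include_halo=True):
    """Count ocean points (mask == 1) in one subdomain.

    imask is a global mask indexed as imask[j][i] (0-based).
    (i0, j0) is the global index of the subdomain's first point, halo included.
    ili, ilj are the local sizes, halo included.
    """
    if include_halo:
        # New criterion: scan the whole local region (inner + overlap).
        i_range = range(0, ili)
        j_range = range(0, ilj)
    else:
        # Old criterion: scan only the inner points, skipping the overlap.
        i_range = range(jpreci, ili - jpreci)
        j_range = range(jprecj, ilj - jprecj)
    return sum(1 for jj in j_range for ji in i_range if imask[j0 + jj][i0 + ji] == 1)


# Global mask: the only ocean points sit in column 2, rows 1 and 2.
IMASK = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

# Two neighbouring 4x4 subdomains with a halo of width 1.
# A covers columns 0..3 (inner 1..2); B covers columns 2..5 (inner 3..4).
SUBDOMAINS = {"A": (0, 0, 4, 4), "B": (2, 0, 4, 4)}


def is_land(name, include_halo):
    """Return True when the subdomain would be excluded as land."""
    i0, j0, ili, ilj = SUBDOMAINS[name]
    return count_ocean_points(IMASK, i0, j0, ili, ilj, include_halo=include_halo) == 0


if __name__ == "__main__":
    for name in SUBDOMAINS:
        old = "land" if is_land(name, include_halo=False) else "ocean"
        new = "land" if is_land(name, include_halo=True) else "ocean"
        print(f"subdomain {name}: inner-only test -> {old}, inner+overlap test -> {new}")
```

In this toy case the inner points of B are all land, so the inner-only test would exclude B, although its overlap column holds ocean points that belong to the inner part of A. The inner+overlap test keeps B, which corresponds to the situation described above where a neighbouring domain still needs those points in the MPI exchange.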

After these modifications, I reran SETTE with the configuration 4 test, again using 88 PEs, but this time the decomposition was 81 ocean + 7 land.

The reproducibility test passed.

If confirmed by other members of the ST, this issue applies to both 3.6 and the trunk (and also to NEMO 3.4). I think it is also necessary to make the log information produced by mppini and mppini_2 uniform.

I attach to this ticket a substantially revised version of the offline tool, which provides a more concise summary of the domain decomposition structure and metrics.


Commit History (2)

Changeset  Author  Time                       ChangeLog
6413       lovato  2016-03-31T18:22:52+02:00  Revise domain decomposition with land PEs exclusion (see ticket #1704)
6412       lovato  2016-03-31T18:22:32+02:00  Revise domain decomposition with land PEs exclusion (see ticket #1704)

Attachments (2)

mpp_domain_decomposition.f90 (14.1 KB) - added by lovato 5 years ago.
Revised offline decomposition tool routine
namelist_mpp (1.2 KB) - added by lovato 5 years ago.
Revised offline decomposition tool NAMELIST example


Change History (4)

Changed 5 years ago by lovato

Revised offline decomposition tool routine

Changed 5 years ago by lovato

Revised offline decomposition tool NAMELIST example

comment:1 Changed 5 years ago by smasson

great job!

Is this the bug we have been hunting for years, the one that was preventing the merge between mppini and mppini_2?

A long time ago, we had problems in iom.F90 with land domain exclusion. At that time, a quick patch was proposed by introducing a specific case when (jpni * jpnj ) == jpnij. We added the logical llnoov, which is false when some land domains are excluded. Could you check whether your bugfix also fixes this old problem by modifying the definition of llnoov in iom.F90, replacing it with a simple llnoov = .NOT. lk_agrif?

Btw, I don't see why agrif interferes in the definition of llnoov. This was introduced by Rachid at revision 1200: http://forge.ipsl.jussieu.fr/nemo/changeset/1200/#file38 We should ask him, maybe he remembers. In the end, could we remove from iom.F90 all modifications associated with llnoov (starting with the line: ! JMM + SM: ugly patch before getting the new version of lib_mpp)?
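The llnoov question can be sketched as a small truth test. This is a Python illustration and not the code in iom.F90: the form of the old definition is inferred from the description in this comment, and the 8 x 11 processor grid is a hypothetical layout chosen only because it gives the 88 PEs mentioned in the ticket.

```python
# Hypothetical sketch of the two llnoov definitions discussed in this comment.
# The old form is an assumption inferred from the text, not copied from iom.F90.

def llnoov_old(jpni, jpnj, jpnij, lk_agrif):
    """Old definition: false as soon as some land domains are excluded."""
    return (jpni * jpnj == jpnij) and not lk_agrif


def llnoov_proposed(lk_agrif):
    """Proposed definition: depends on AGRIF only."""
    return not lk_agrif


if __name__ == "__main__":
    # Assumed 8 x 11 grid (88 PEs); 81 ocean domains kept, no AGRIF.
    print("old, land excluded:", llnoov_old(8, 11, 81, lk_agrif=False))
    print("old, no exclusion: ", llnoov_old(8, 11, 88, lk_agrif=False))
    print("proposed:          ", llnoov_proposed(lk_agrif=False))
```

With the proposed definition, the value no longer changes between a run with land domains excluded and a run without exclusion, which is the behaviour the test suggested above is meant to check.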

comment:2 Changed 5 years ago by lovato

  • Resolution set to fixed
  • Status changed from new to closed

The proposed solution to fix the reproducibility bug was implemented in nemo_v3_6_STABLE at r6412 and in the trunk at r6413.
The code was tested with SETTE using the configurations described in the wiki page associated with this ticket.

The information printed by mppini and mppini_2 was aligned so that both write the same set of information to the NEMO log file.

The MPP_REP tools were also updated accordingly.

Although NEMO V3.4 is considered obsolete, there are still users working with this release, and we should consider providing the same update for it.
