Opened 5 years ago

Closed 3 years ago

Last modified 3 years ago

#1522 closed Bug (fixed)

Bad MPP decomposition with key_nemocice_decomp

Reported by: joakim
Owned by: nemo
Priority: normal
Milestone:
Component: OCE
Version: release-3.6
Severity:
Keywords: CICE MPI MPP decomposition


With key_nemocice_decomp defined, NEMO changes how it computes the size of the subdomains in nemogcm.F90 and mppini.F90.
In some cases, subdomains can end up with a negative jpi and/or jpj, which raises MPI errors and crashes NEMO.
For example, when using 1680 cores, NEMO automatically sets jpi=16, jpj=72, jpnij=1680, jpni=105, jpnj=8. The subdomains on the eastern boundary then get jpi=-14, and NEMO crashes.
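
The arithmetic behind the negative width can be sketched as follows. This is an illustration, not the NEMO source. It assumes that every subdomain except the eastern-most one takes jpi-2 interior columns and that the last one receives whatever remains. It also assumes a global zonal size of jpiglo=1442, which is not stated in this ticket. The function name is hypothetical.

```python
def last_subdomain_width(jpiglo: int, jpni: int, jpi: int, halo: int = 1) -> int:
    """Columns left over for the eastern-most subdomain.

    Assumption (not taken from the NEMO source): the first jpni-1
    subdomains each cover jpi - 2*halo interior columns, and the last
    subdomain gets the remainder of the global grid.
    """
    interior = jpi - 2 * halo
    return jpiglo - (jpni - 1) * interior


# ASSUMED global zonal size; the ticket does not give jpiglo.
JPIGLO = 1442

# Values reported in the ticket: jpni=105, jpi=16.
print(last_subdomain_width(JPIGLO, jpni=105, jpi=16))  # -14
```

Under these assumptions, 104 subdomains of 14 interior columns already cover more than the whole grid, so the 105th is left with a negative width.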

Commit History (0)

(No commits)

Change History (11)

comment:1 Changed 5 years ago by charris

I'm aware you can get problems with high numbers of processors. This is because for NEMO you specify the number of processors E-W and N-S, whereas for CICE you specify the block size (equivalent to jpi-2 and jpj-2) directly. So in this case CICE decides that for jpi=16 (block_size_x=14) it only needs 104 processors E-W, which is actually the more sensible thing to do, because otherwise you are using more processors than you need (regardless of whether you are using key_nemocice_decomp). I'm not sure of the best way of persuading NEMO to do the same. Some better error trapping is required, but really the answer here is just to use jpni=104. Hopefully we could get the land-suppression option to do this automatically, but that would probably require some code changes to the mpp_init2 routine to ensure that ilci/j are set correctly. There could also be assumptions in the NEMO code that the boundaries are always on the first / last processors.

More generally, I wonder whether you are really planning to run with this kind of decomposition for ORCA025, or whether you are just testing scaling etc.


comment:2 Changed 5 years ago by joakim

Hi Chris

Yes, I've dug into the code, and it seems that NEMO produces a much better MPI decomposition when using LIM instead.
I'm still in the process of making NEMO run with CICE in a regional "southern ocean only" configuration, so I'm not yet sure how many processors I'll be using.
I might never even use jpnij=1680.

I'll see what I can come up with, but it does seem that the key_nemocice_decomp parts of the code need some more work to prevent the problem, or at least to warn the user, when the decomposition makes jpi or jpj negative.
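
A minimal sketch of the kind of error trapping discussed above, written in Python for illustration only. It is not NEMO code: the function name, the remainder rule for the last subdomain, and the global size used in the example (jpiglo=1442) are all assumptions.

```python
def check_axis(name: str, nglo: int, nproc: int, nloc: int, halo: int = 1) -> int:
    """Return the size of the last subdomain along one axis, or raise.

    Assumption (hypothetical rule, not taken from the NEMO source): the
    first nproc-1 subdomains each cover nloc - 2*halo interior points
    and the last one receives the remainder of the global axis.
    """
    if nproc < 1:
        raise ValueError(f"{name}: need at least one subdomain, got {nproc}")
    interior = nloc - 2 * halo
    if interior < 1:
        raise ValueError(f"{name}: local size {nloc} leaves no interior points")
    last = nglo - (nproc - 1) * interior
    if last < 1:
        raise ValueError(
            f"{name}: last subdomain would have size {last}; "
            f"{nproc} subdomains are too many for a global size of {nglo}"
        )
    return last


# Values from the ticket (jpni=105, jpi=16) with an ASSUMED jpiglo of 1442.
try:
    check_axis("jpi", nglo=1442, nproc=105, nloc=16)
except ValueError as err:
    print(f"Decomposition rejected: {err}")
```

Failing early with a message like this would replace the MPI errors with a direct statement of which axis is over-decomposed.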


comment:3 Changed 4 years ago by nicolasmartin

  • Keywords MPI added; mpi removed

comment:4 Changed 4 years ago by nicolasmartin

  • Keywords CICE added; cice removed

comment:5 Changed 4 years ago by nicolasmartin

  • Keywords nemo_v3_6* added

comment:6 Changed 4 years ago by nicolasmartin

  • Keywords MPP nemo_v3_6_alpha added; mpp removed

comment:7 Changed 4 years ago by nicolasmartin

  • Keywords nemo_v3_6_alpha removed

comment:8 Changed 4 years ago by nicolasmartin

  • Keywords decomposition added; subdomain removed

comment:9 Changed 3 years ago by clevy

  • Resolution set to fixed
  • Status changed from new to closed

comment:10 Changed 3 years ago by nemo

  • Keywords release-3.6* added; nemo_v3_6* removed

comment:11 Changed 3 years ago by nemo

  • Keywords release-3.6* removed