
[land-proc] ORCA2_LIM setup fails while removing - land only grid cells
 solved

Hi all,

This topic is a follow-up to http://forge.ipsl.jussieu.fr/nemo/discussion/topic/64. I am now trying out a standard input configuration (based on the ORCA2_LIM configuration) of nemo-3.6 + xios-2.0 (compiled with GNU compilers) with minimal tweaks, hence posting this issue as a new topic. The purpose is to see whether the setup works after eliminating land-only processors.

As the cpp keys were not being picked up, I had to add them manually to the FPPKEYS variable (in ./EXTERNAL/fcm/lib/Fcm/Config.pm):

key_trabbl key_lim2 key_dynspg_flt key_diaeiv key_ldfslp key_traldf_c2d key_traldf_eiv key_dynldf_c3d key_zdftke key_zdfddm key_zdftmx key_iomput key_mpp_mpi key_diaobs key_asminc key_nosignedzero key_xios2
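
(For reference, in an unmodified NEMO 3.6 tree these cpp keys are normally read from the configuration's cpp file rather than patched into Config.pm. A minimal sketch, assuming the standard CONFIG layout and the MY_ORCA2_LIM configuration name used below:

# CONFIG/MY_ORCA2_LIM/cpp_MY_ORCA2_LIM.fcm  -- assumed path, adjust to your tree
bld::tool::fppkeys key_trabbl key_lim2 key_dynspg_flt key_diaeiv key_ldfslp key_traldf_c2d key_traldf_eiv key_dynldf_c3d key_zdftke key_zdfddm key_zdftmx key_iomput key_mpp_mpi key_diaobs key_asminc key_nosignedzero key_xios2

If this file is being read correctly, editing Config.pm should not be necessary.)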

Here are the steps I followed to set up NEMO's ORCA2_LIM configuration -

./makenemo -m XC40_NCM_gnu -r ORCA2_LIM -n MY_ORCA2_LIM  2>&1|tee logs
cd MY_ORCA2_LIM/EXP00
cp ../../ORCA2_LIM/EXP00/* .
cp ../../GYRE_XIOS/EXP00/*.xml .
tar xvf ../../ORCA2_LIM_nemo_v3.6.tar
gunzip *.gz */*.gz

Here are the modifications carried out in the namelist_cfg file (a rough way to estimate the ocean-only domain count behind jpnij is sketched just after the namelist) -

!-----------------------------------------------------------------------
&nammpp        !   Massively Parallel Processing                        ("key_mpp_mpi")
!-----------------------------------------------------------------------
   cn_mpi_send =  'I'      !  mpi send/receive type   ='S', 'B', or 'I' for standard send,
                           !  buffer blocking send or immediate non-blocking sends, resp.
   nn_buffer   =   0       !  size in bytes of exported buffer ('B' case), 0 no exportation
   ln_nnogather=  .true.   !  activate code to avoid mpi_allgather use at the northfold
   jpni        =  10       !  jpni   number of processors following i (set automatically if < 1)
   jpnj        =   8       !  jpnj   number of processors following j (set automatically if < 1)
   jpnij       =  76       !  jpnij  number of local domains (set automatically if < 1)
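
The jpnij = 76 here is jpni x jpnj = 80 minus the subdomains that contain only land points. Purely as an illustration of how that count can be estimated (this is not an official NEMO tool: the file/variable names bathy_meter.nc / Bathymetry are assumptions, and the real NEMO decomposition includes halo overlaps, so its count can differ slightly - the MPP_PREP utility shipped under TOOLS/, if present in your tree, or a trial run's ocean.output give the authoritative numbers):

import numpy as np
from netCDF4 import Dataset

jpni, jpnj = 10, 8                                    # decomposition from the namelist

with Dataset("bathy_meter.nc") as nc:                 # assumed ORCA2 bathymetry file
    bathy = nc.variables["Bathymetry"][:]             # (jpj, jpi), zero over land

jpj, jpi = bathy.shape
i_edges = np.linspace(0, jpi, jpni + 1, dtype=int)    # naive equal splits, no halos
j_edges = np.linspace(0, jpj, jpnj + 1, dtype=int)

ocean_domains = 0
for j in range(jpnj):
    for i in range(jpni):
        block = bathy[j_edges[j]:j_edges[j + 1], i_edges[i]:i_edges[i + 1]]
        if (block > 0).any():                         # keep subdomains with ocean points
            ocean_domains += 1

print("estimated jpnij (ocean-containing domains):", ocean_domains, "of", jpni * jpnj)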

On running the simulation with -

aprun -n 76 ./nemo.exe

the error message I get is -

....
-> info : If domain grid_T does not have overlapped regions between processes something must be wrong with mask index
*** Error in `*** Error in `./nemo.exe': *** Error in `./nemo.exe': malloc(): memory corruption: 0x0000000005e2a590 ***
*** Error in `./nemo.exe': malloc(): memory corruption: 0x0000000005e297a0 ***
./nemo.exe': malloc(): memory corruption: 0x0000000005e297f0 ***
malloc(): memory corruption: 0x0000000005db6980 ***
======= Backtrace: =========
======= Backtrace: =========
======= Backtrace: =========
======= Backtrace: =========
/lib64/libc.so.6/lib64/libc.so.6(+0x721af)[0x2aaaadde71af]
....

I have uploaded the relevant files for this run at - https://bitbucket.org/puneet336/nemo3.6_issue1/src/master/ORCA2_LIM/1672152/
standard error/output - https://bitbucket.org/puneet336/nemo3.6_issue1/src/master/ORCA2_LIM/1672152/ORCA2_LIM.o1672152
namelist - https://bitbucket.org/puneet336/nemo3.6_issue1/src/master/ORCA2_LIM/1672152/namelist_cfg
ocean.output - https://bitbucket.org/puneet336/nemo3.6_issue1/src/master/ORCA2_LIM/1672152/ocean.output





On changing namelist_cfg's jpnij to 80 -

!-----------------------------------------------------------------------
&nammpp        !   Massively Parallel Processing                        ("key_mpp_mpi")
!-----------------------------------------------------------------------
   cn_mpi_send =  'I'      !  mpi send/receive type   ='S', 'B', or 'I' for standard send,
                           !  buffer blocking send or immediate non-blocking sends, resp.
   nn_buffer   =   0       !  size in bytes of exported buffer ('B' case), 0 no exportation
   ln_nnogather=  .true.   !  activate code to avoid mpi_allgather use at the northfold
   jpni        =  10       !  jpni   number of processors following i (set automatically if < 1)
   jpnj        =   8       !  jpnj   number of processors following j (set automatically if < 1)
   jpnij       =  80       !  jpnij  number of local domains (set automatically if < 1)

aprun -n 80 ./nemo.exe

The simulation runs to completion (5475 steps). Relevant output/configuration files for this simulation can be found at - https://bitbucket.org/puneet336/nemo3.6_issue1/src/master/ORCA2_LIM/1672307/


I am unable to figure out why the setup with land-only processors eliminated fails to run. Am I missing something in an input file?


1) Is there a rule of thumb for selecting jpnij when eliminating land-only processors?
2) Also, could you please suggest an appropriate (any working) combination of jpni & jpnj with jpni x jpnj > jpnij for this ORCA2_LIM configuration?

Please let me know if I can provide any further information from my end.
Eagerly awaiting your replies.

  • Message #188

    Both of the following commands were failing with the error above -

    aprun -n 76 ./nemo.exe
    aprun -n 76 ./nemo.exe : -n 4 ./xios_server.exe (iodef.xml : use_server = true)
    

    with the nammpp ("key_mpp_mpi") section of namelist_cfg set to -

       jpni        =  10       !  jpni   number of processors following i (set automatically if < 1)
       jpnj        =  8       !  jpnj   number of processors following j (set automatically if < 1)
       jpnij       =  76     !  jpnij  number of local domains (set automatically if < 1)
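
    For the MPMD case, the detached I/O server is switched on in iodef.xml; a minimal sketch of the relevant setting, assuming the standard XIOS-2 iodef.xml layout (where the flag is usually named using_server):

    <variable_group id="parameters">
      <variable id="using_server" type="bool">true</variable>
    </variable_group>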
    

    I found out that the issue shows up when multiple gcc versions are loaded in the environment while compiling xios.

    Why I had to load multiple gcc versions: my default gcc (/usr/bin/gcc) was 4.8 and I was trying to compile nemo 3.6 + xios r1630 (with intel 2017). xios r1630 uses "regex" types/constructs in its source code, because of which I had to load gcc >= 4.9 via module load.
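
    (As a side note, a quick sanity check of the compiler stack before building xios can expose this kind of mismatch; a rough sketch - module names and availability depend on the site:

    module list 2>&1 | grep -i gcc    # which gcc modules are currently loaded
    which gcc g++                     # which compiler binaries are first in PATH
    gcc --version | head -1
    g++ --version | head -1

    If the module-provided gcc and the one found in PATH disagree, the build can end up mixing runtime libraries.)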

    Solution: nemo_3.6 + xios_r1322 compiled fine with gcc 4.8 (hence no need to load additional modules), and the ORCA2_LIM setup worked fine in both MPMD and non-MPMD configurations. Please mark this issue as solved.

    It is highly likely that multiple gcc versions are also the root cause of http://forge.ipsl.jussieu.fr/nemo/discussion/topic/64 .

