Opened 2 years ago

Closed 2 years ago

Last modified 2 years ago

#2070 closed Enhancement (fixed)

Reorganisation of nemogcm.F90 and mppini.F90 to better separate domain decomposition functions from the rest of the initialisation

Reported by: acc
Owned by: acc
Priority: high
Milestone: 2018 release-4.0
Component: OCE
Version: trunk
Severity: minor
Keywords:
Cc:
Review:
MP ready?:
Progress:

Description (last modified by acc)

Context

Reorganisation of nemogcm.F90 and mppini.F90 to better separate domain decomposition functions from the rest of the initialisation. This reorganisation (thanks to Sebastien) also fixes issues with land suppression and offers the opportunity to tidy up some long-standing issues relating to halo use in prt_ctl (prtctl.F90).
A few other changes have been made in the process, which remove redundant operations or add new features. These are detailed separately here and have been submitted as separate changesets for clarity.

Stages

1. Main reorganisation

Relocation of the partition functions (nemo_partition, factorise and nemo_nfdcom) from nemogcm.F90 to mppini.F90, with the associated move of allocations from dom_oce.F90 to mppini.F90. mpp_init is now called before nemo_alloc and contains all code related to setting the individual domain sizes. The algorithm has been corrected so that it sets domain sizes and identifies neighbours properly in all cases, including land suppression. layout.dat now labels entries with processor rank rather than narea. Modules affected are:

NEMOGCM/NEMO/OFF_SRC/nemogcm.F90
NEMOGCM/NEMO/OPA_SRC/nemogcm.F90
NEMOGCM/NEMO/SAO_SRC/nemogcm.F90
NEMOGCM/NEMO/SAS_SRC/nemogcm.F90
NEMOGCM/NEMO/OPA_SRC/DOM/dom_oce.F90
NEMOGCM/NEMO/OPA_SRC/LBC/mppini.F90

Submitted at Changeset: [9436]
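
As an illustration of what land suppression means for the decomposition, here is a minimal Python sketch (not the NEMO Fortran, and not the algorithm in mppini.F90). It splits a toy land-sea mask into jpni x jpnj blocks, keeps only the blocks containing at least one ocean point, and numbers the kept blocks by rank. The function names split_sizes and ocean_subdomains are hypothetical, halos are ignored, and the mask is invented for the example.

```python
from typing import List, Tuple


def split_sizes(n_points: int, n_parts: int) -> List[int]:
    """Split n_points into n_parts contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(n_points, n_parts)
    return [base + (1 if k < extra else 0) for k in range(n_parts)]


def ocean_subdomains(mask: List[List[int]], jpni: int, jpnj: int) -> List[Tuple[int, int]]:
    """Return the (i, j) block indices that contain at least one ocean point.

    mask[j][i] is 1 for ocean and 0 for land. Halos are ignored in this sketch.
    """
    i_sizes = split_sizes(len(mask[0]), jpni)
    j_sizes = split_sizes(len(mask), jpnj)
    kept = []
    j0 = 0
    for jj, sj in enumerate(j_sizes):
        i0 = 0
        for ii, si in enumerate(i_sizes):
            if any(mask[j][i] for j in range(j0, j0 + sj) for i in range(i0, i0 + si)):
                kept.append((ii, jj))
            i0 += si
        j0 += sj
    return kept


# Toy 16x16 mask: all ocean except a 2x2 land block in each corner.
mask = [[1] * 16 for _ in range(16)]
for j in (0, 1, 14, 15):
    for i in (0, 1, 14, 15):
        mask[j][i] = 0

kept = ocean_subdomains(mask, jpni=8, jpnj=8)
# Label the surviving blocks by rank, in the spirit of the rank labels in layout.dat.
ranks = {block: rank for rank, block in enumerate(kept)}
print(len(kept))  # 60: four of the 64 blocks are land-only and are dropped
```

With this toy mask the 8x8 layout needs 60 processes rather than 64, which matches the shape of the 60-processor, 8x8 test described in the SETTE notes, although the real mask and algorithm differ.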

2. Code update:

Line 2396 of iom.F90 contained the statement:

      ! do we read the overlap 
      ! ugly patch SM+JMM+RB to overwrite global definition in some cases
      llnoov = (jpni * jpnj ) == jpnij .AND. .NOT. lk_agrif

With the algorithm fixes, there should no longer be a need to read overlaps in the land-suppressed case. The statement has been rationalised to:

      llnoov = .NOT. lk_agrif

subject to deeper testing beyond SETTE. Module affected is:

NEMOGCM/NEMO/OPA_SRC/IOM/iom.F90

Submitted at Changeset: [9437]
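
To make the effect of this change concrete, the following Python sketch evaluates the old and new expressions for llnoov side by side. It only mirrors the two logical expressions quoted above; the function names and the sample values of jpni, jpnj and jpnij are chosen for illustration.

```python
def llnoov_old(jpni: int, jpnj: int, jpnij: int, lk_agrif: bool) -> bool:
    """Old expression: true only for a fully populated layout without AGRIF."""
    return (jpni * jpnj == jpnij) and not lk_agrif


def llnoov_new(lk_agrif: bool) -> bool:
    """New expression: depends on AGRIF only."""
    return not lk_agrif


# (jpni, jpnj, jpnij, lk_agrif): fully populated and land-suppressed layouts.
cases = [(8, 8, 64, False), (8, 8, 60, False), (8, 8, 64, True), (8, 8, 60, True)]
for jpni, jpnj, jpnij, lk_agrif in cases:
    old = llnoov_old(jpni, jpnj, jpnij, lk_agrif)
    new = llnoov_new(lk_agrif)
    print(jpni, jpnj, jpnij, lk_agrif, old, new)
```

The two expressions differ only for a land-suppressed layout (jpnij < jpni * jpnj) without AGRIF, which is exactly the case the algorithm fix is meant to cover.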

3. New feature:

Currently a bad ctl_opn call only reports an error if the master processor raises the error first. Code has been added to ctl_opn so that any other processor raising the error reports to stderr. Module affected:

NEMOGCM/NEMO/OPA_SRC/LBC/lib_mpp.F90

Submitted at Changeset: [9438]
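
The intent can be sketched in Python as follows. This is not the code added to lib_mpp.F90: the function names, the message text and the assumption that the master keeps writing to its usual log are all hypothetical. It only shows the routing idea, namely that a failure on a non-master process is sent to stderr instead of being lost.

```python
import sys
from typing import Optional, Tuple


def route_open_error(rank: int, filename: str, iostat: int,
                     master_rank: int = 0) -> Optional[Tuple[str, str]]:
    """Decide where an open failure should be reported.

    Returns None when the open succeeded (iostat == 0), otherwise a
    (destination, message) pair. The destination is "log" for the master
    (assumed to have a normal output file) and "stderr" for any other rank.
    """
    if iostat == 0:
        return None
    message = f"ctl_opn: rank {rank} failed to open {filename} (iostat={iostat})"
    destination = "log" if rank == master_rank else "stderr"
    return destination, message


def report_open_error(rank: int, filename: str, iostat: int) -> None:
    """Write the failure, if any, to the stream chosen by route_open_error."""
    routed = route_open_error(rank, filename, iostat)
    if routed is None:
        return
    destination, message = routed
    stream = sys.stdout if destination == "log" else sys.stderr
    stream.write(message + "\n")
```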

4. Code update:

Line 546 of sbcmod.F90 contains a redundant lbc_lnk call for emp. The call is a legacy workaround for an old problem with non-symmetric forcing data across the north-fold (i.e. grid cells that are meant to be the same point receiving different emp fluxes, leading to divergence errors). Modern datasets and/or interpolation on the fly should negate the need for this exchange. It has been commented out, subject to deeper testing beyond SETTE. Module affected:

NEMOGCM/NEMO/OPA_SRC/SBC/sbcmod.F90

Submitted at Changeset: [9439]

5. New feature/ reduction of variants:

In the mono-processor case, prt_ctl can be used to print mean values for a sub-domain (a pseudo-mpp domain). Code has been added to produce a layout.dat-like file (layout_prtctl.dat) detailing the sub-domain parameters. At the same time, it is now risky to allow sums over the whole sub-domain, since many routines (zdftke, for example) only compute interior values. The optional argument 'overl', which enabled this, has been suppressed and all references to it removed elsewhere. Main module affected is:

NEMOGCM/NEMO/OPA_SRC/IOM/prtctl.F90

Other modules where the use of overl has been suppressed:

NEMOGCM/NEMO/OFF_SRC/dtadyn.F90
NEMOGCM/NEMO/OPA_SRC/TRA/eosbn2.F90
NEMOGCM/NEMO/SAS_SRC/sbcssm.F90
NEMOGCM/NEMO/OPA_SRC/DYN/sshwzv.F90
NEMOGCM/NEMO/OPA_SRC/TRD/trdmxl.F90
NEMOGCM/NEMO/OPA_SRC/ZDF/zdfddm.F90
NEMOGCM/NEMO/OPA_SRC/ZDF/zdfgls.F90
NEMOGCM/NEMO/OPA_SRC/ZDF/zdfiwm.F90
NEMOGCM/NEMO/OPA_SRC/ZDF/zdfmxl.F90
NEMOGCM/NEMO/OPA_SRC/ZDF/zdftke.F90

Submitted at Changeset: [9440]
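
The risk with whole-sub-domain sums can be shown with a small Python sketch. It is an illustration only, not prtctl.F90: the halo width of one point, the field values and the function names are assumptions. The halo holds stale values because the routine that filled the field computed the interior only.

```python
from typing import List


def full_sum(field: List[List[float]]) -> float:
    """Sum over the whole sub-domain, halo included."""
    return sum(sum(row) for row in field)


def interior_sum(field: List[List[float]], halo: int = 1) -> float:
    """Sum over interior points only, skipping a halo of the given width."""
    nj, ni = len(field), len(field[0])
    return sum(field[j][i]
               for j in range(halo, nj - halo)
               for i in range(halo, ni - halo))


# 4x4 sub-domain: the 2x2 interior was computed (1.0), the halo is stale (99.0).
field = [[99.0] * 4 for _ in range(4)]
for j in (1, 2):
    for i in (1, 2):
        field[j][i] = 1.0

print(interior_sum(field))  # 4.0
print(full_sum(field))      # 1192.0, dominated by the stale halo values
```

Only the interior sum is meaningful here, which is the behaviour left in place once the 'overl' option is removed.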

6. New feature:

Add a NetCDF version of the run.stat file to stpctl.F90. It contains the same information as is written to the run.stat text file. The current implementation is basic and will overwrite any existing run.stat.nc file. Output is flushed to disk (NF90_SYNC) every 100 timesteps. This interval may need adjusting if the output is being used for run-time monitoring. Modules affected are:

NEMOGCM/NEMO/OPA_SRC/stpctl.F90
NEMOGCM/NEMO/SAS_SRC/stpctl.F90
NEMOGCM/CONFIG/TEST_CASES/CANAL/MY_SRC/stpctl.F90

Submitted at Changeset: [9441]
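
The consequence of the sync interval for run-time monitoring can be sketched in Python. This does not use NetCDF and is not the stpctl.F90 code: the class RunStatBuffer is hypothetical, and it assumes the sync happens on timesteps that are exact multiples of the interval. Records written since the last sync are treated as invisible to an external reader.

```python
from typing import List, Tuple


class RunStatBuffer:
    """Toy stand-in for a file that is only synced to disk every sync_interval steps."""

    def __init__(self, sync_interval: int = 100) -> None:
        self.sync_interval = sync_interval
        self.pending: List[Tuple[int, float]] = []   # written but not yet synced
        self.on_disk: List[Tuple[int, float]] = []   # visible to an external monitor

    def write(self, kt: int, value: float) -> None:
        """Record one timestep and sync when kt is a multiple of the interval."""
        self.pending.append((kt, value))
        if kt % self.sync_interval == 0:
            self.sync()

    def sync(self) -> None:
        """Move all pending records to the 'on disk' list."""
        self.on_disk.extend(self.pending)
        self.pending.clear()


buf = RunStatBuffer(sync_interval=100)
for kt in range(1, 251):
    buf.write(kt, float(kt))

print(len(buf.on_disk), len(buf.pending))  # 200 50
```

After 250 steps a monitor reading the file would see data up to step 200 only, so a smaller interval gives fresher data at the cost of more frequent disk flushes.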

7. Local architecture changes:

Changes to local architecture and batch files to update compiler and XIOS versions:

NEMOGCM/SETTE/BATCH_TEMPLATE/batch-X64_MOBILIS
NEMOGCM/ARCH/arch-X64_MOBILIS.fcm

Submitted at Changeset: [9442]

SETTE notes

Full SETTE testing was carried out on the NOCS Mobilis platform. Tests were successful with the Intel 17.0.4 compiler (-i4 -r8 -O3 -fp-model source -xAVX), but there are problems with the older Intel 14.0/2013_sp1.3.174 compiler, which has previously been reliable. Recommend discontinuing use of v14 compilers.

SETTE validation report :  @ r9442
WSAS_ST               ice restarts are IDENTICAL  passed :  20180327
WGYREPIS_ST           run.stat    restartability  passed :  20180327
WGYREPIS_ST           tracer.stat restartability  passed :  20180327
WORCA2LIM3PIS_ST      run.stat    restartability  passed :  20180327
WORCA2LIM3PIS_ST      tracer.stat restartability  passed :  20180327
WORCA2OFFPIS_ST       tracer.stat restartability  passed :  20180327
WAMM12_ST             run.stat    restartability  passed :  20180327
WISOMIP_ST            run.stat    restartability  passed :  20180327
WORCA2AGR_ST          run.stat    restartability  passed :  20180327
WGYREPIS_ST           run.stat    reproducibility passed :  20180327
WGYREPIS_ST           tracer.stat reproducibility passed :  20180327
WORCA2LIM3PIS_ST      run.stat    reproducibility passed :  20180327
WORCA2LIM3PIS_ST      tracer.stat reproducibility passed :  20180327
WORCA2OFFPIS_ST       tracer.stat reproducibility passed :  20180327
WAMM12_ST             run.stat    reproducibility passed :  20180327
WISOMIP_ST            run.stat    reproducibility passed :  20180327
WORCA2_LIM3_OBS_ST    run.stat    reproducibility passed :  20180327
WORCA2AGR_ST          run.stat    reproducibility FAILED :  20180327
AGRIFNOZ             AGRIFNO  AGRIF: run.stat unchanged - test  passed :  20180327 20180327

The WORCA2AGR_ST reproducibility test reports a failure, but it only diverges after 70 time steps, which is as good as I've ever been able to get with this test on this platform. The tests included an additional ORCA2LIM3PISCES reproducibility test with a 60-processor, 8x8 decomposition to test land suppression. Results were identical to the standard 4x8 and 8x4 fully populated cases over 1000 time steps.

Commit History (8)

Changeset  Author  Time  ChangeLog
9446  smasson  2018-03-28T17:17:03+02:00

dev_merge_2017: bugfix of a very nice typo error following the reorganisation of nemogcm.F90 and mppini.F90 in r9436 see #2070

9444  smasson  2018-03-28T14:25:17+02:00

dev_merge_2017: minor bugfix following the reorganisation of nemogcm.F90 and mppini.F90 r9436 see #2070

9441  acc  2018-03-27T15:57:02+02:00

Branch 2017/dev_merge_2017. Reorganisation of nemogcm.F90 and mppini.F90. Stage 7: Add netcdf version of the run.stat files to stpctl.F90. Current implementation is very basic but may introduce opportunities for run-time monitoring; see ticket #2070

9440  acc  2018-03-27T15:52:54+02:00

Branch 2017/dev_merge_2017. Reorganisation of nemogcm.F90 and mppini.F90. Stage 5: Enhancement to prtctl and suppression of overlap option; see ticket #2070

9439  acc  2018-03-27T15:39:22+02:00

Branch 2017/dev_merge_2017. Reorganisation of nemogcm.F90 and mppini.F90. Stage 4: code update to sbcmod.F90 to remove redundant lbc_lnk call (emp); see ticket #2070

9438  acc  2018-03-27T15:37:21+02:00

Branch 2017/dev_merge_2017. Reorganisation of nemogcm.F90 and mppini.F90. Stage 3: Better error handling in ctl_opn; see ticket #2070

9437  acc  2018-03-27T15:35:53+02:00

Branch 2017/dev_merge_2017. Reorganisation of nemogcm.F90 and mppini.F90. Stage 2: code update to iom.F90 to take advantage of algorithmic fix; see ticket #2070

9436  acc  2018-03-27T15:30:51+02:00

Branch 2017/dev_merge_2017. Reorganisation of nemogcm.F90 and mppini.F90 to better separate domain decomposition functions from the rest of the initialisation. Stage 1: Main reorganisation; see ticket #2070

Change History (10)

comment:1 Changed 2 years ago by acc

In 9436:

Branch 2017/dev_merge_2017. Reorganisation of nemogcm.F90 and mppini.F90 to better separate domain decomposition functions from the rest of the initialisation. Stage 1: Main reorganisation; see ticket #2070

comment:2 Changed 2 years ago by acc

In 9437:

Branch 2017/dev_merge_2017. Reorganisation of nemogcm.F90 and mppini.F90. Stage 2: code update to iom.F90 to take advantage of algorithmic fix; see ticket #2070

comment:3 Changed 2 years ago by acc

In 9438:

Branch 2017/dev_merge_2017. Reorganisation of nemogcm.F90 and mppini.F90. Stage 3: Better error handling in ctl_opn; see ticket #2070

comment:4 Changed 2 years ago by acc

In 9439:

Branch 2017/dev_merge_2017. Reorganisation of nemogcm.F90 and mppini.F90. Stage 4: code update to sbcmod.F90 to remove redundant lbc_lnk call (emp); see ticket #2070

comment:5 Changed 2 years ago by acc

In 9440:

Branch 2017/dev_merge_2017. Reorganisation of nemogcm.F90 and mppini.F90. Stage 5: Enhancement to prtctl and suppression of overlap option; see ticket #2070

comment:6 Changed 2 years ago by acc

In 9441:

Branch 2017/dev_merge_2017. Reorganisation of nemogcm.F90 and mppini.F90. Stage 7: Add netcdf version of the run.stat files to stpctl.F90. Current implementation is very basic but may introduce opportunities for run-time monitoring; see ticket #2070

comment:7 Changed 2 years ago by acc

  • Description modified (diff)

comment:8 Changed 2 years ago by acc

  • Description modified (diff)
  • Resolution set to fixed
  • Status changed from new to closed

comment:9 Changed 2 years ago by smasson

In 9444:

dev_merge_2017: minor bugfix following the reorganisation of nemogcm.F90 and mppini.F90 r9436 see #2070

comment:10 Changed 2 years ago by smasson

In 9446:

dev_merge_2017: bugfix of a very nice typo error following the reorganisation of nemogcm.F90 and mppini.F90 in r9436 see #2070
