Opened 7 weeks ago

Closed 2 weeks ago

Last modified 8 days ago

#2492 closed Bug (fixed)

Out-of-bounds error in ORCA2-based mono-processor configuration

Reported by: smueller
Owned by: smueller
Priority: low
Milestone:
Component: ICB
Version: release-4.0
Severity: minor
Keywords: ICB, LBC, non-MPP
Cc: smasson, pierre.mathiot@…

Description

Context

A user of an ORCA2-based mono-processor configuration (key_mpp_mpi undefined) has reported an out-of-bounds error which occurs during the north-fold boundary exchange in subroutine lbc_nfd_2d_ext.

Analysis

This out-of-bounds error can readily be reproduced in the reference configuration ORCA2_ICE_PISCES by removing the CPP keys key_mpp_mpi and key_iomput.

The error is caused by the initialisation of array tmask_e(0:jpi+1,0:jpj+1) in subroutine icb_init, so it does not occur when iceberg handling is disabled (ln_icebergs = .FALSE.). The initialisation of tmask_e is finalised by a call to subroutine mpp_lnk_2d_icb (interface lbc_lnk_icb). The north-fold treatment in that subroutine depends on whether jpni is 1 (including the mono-processor case) or greater. If jpni = 1, it calls subroutine lbc_nfd_2d_ext, implemented in source:/NEMO/releases/r4.0/r4.0-HEAD/src/OCE/LBC/lbc_nfd_ext_generic.h90. Subroutine mpp_lnk_2d_icb passes the subset tmask_e(1:jpi,1:jpj+1) of array tmask_e(0:jpi+1,0:jpj+1) to subroutine lbc_nfd_2d_ext. That subroutine refers to the subset as ptab(1:jpi,0:jpj), yet it accesses array elements with a dimension-2 subscript of nlcj+1. Since jpj == nlcj here, this results in out-of-bounds array access. Because the callee's row 0 corresponds to row 1 of tmask_e, its row numbering is also shifted by one, which can leave incorrect content in the array.
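The index shift can be illustrated with a small Python model. This is a sketch, not NEMO code. It assumes only what the analysis states: the callee numbers the rows it receives from 0 (i.e. 1-kextj with kextj = 1). jpj = 149 is an illustrative value. callee_bounds and caller_index are hypothetical helpers that mimic how a Fortran callee renumbers the rows of an actual-argument section from its own declared lower bound.

```python
def callee_bounds(section_lo, section_hi, dummy_lo):
    """Index range (lo, hi) that the callee sees along one array dimension.

    Models how a Fortran callee renumbers the rows of an actual-argument
    section: only the extent of the section is passed, and the callee
    numbers it from its own declared lower bound dummy_lo.
    """
    extent = section_hi - section_lo + 1
    return dummy_lo, dummy_lo + extent - 1


def caller_index(callee_index, section_lo, dummy_lo):
    """Caller (actual-argument) row of the element the callee calls callee_index."""
    return section_lo + (callee_index - dummy_lo)


jpj = 149      # illustrative value (ORCA2 meridional size), mono-processor run
nlcj = jpj     # no domain decomposition, so nlcj == jpj
kextj = 1      # one extra halo row, as for tmask_e(0:jpi+1,0:jpj+1)

# Dimension 2 of the section passed before the fix: pt2d(1:jpi, 1:jpj+kextj)
lo, hi = callee_bounds(1, jpj + kextj, 1 - kextj)
print(lo, hi)                            # 0 149  -> callee sees ptab(:,0:jpj)
print(nlcj + 1 <= hi)                    # False  -> ptab(:,nlcj+1) is out of bounds
print(caller_index(jpj, 1, 1 - kextj))   # 150    -> callee row jpj is caller row jpj+1
```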

The same error also affects arrays {u,v}mask_e initialised in subroutine icb_init (source:/NEMO/releases/r4.0/r4.0-HEAD/src/OCE/ICB/icbini.F90) and arrays {uo,vo,ff,tt,fr,ua,va,hi,vi}_e initialised in subroutine icb_utl_copy (source:/NEMO/releases/r4.0/r4.0-HEAD/src/OCE/ICB/icbutl.F90). A second, separate problem concerns the lbc_lnk_icb calls that initialise arrays {u,v}mask_e. These calls appear to specify an incorrect grid type ('T' instead of 'U' and 'V', respectively), which could produce incorrect boundary exchanges for these two arrays.
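To see why the grid type matters, the following Python sketch models only the column mapping of a T-point-pivot north fold, the fold type used by ORCA2. The source-column formulas are assumptions modelled on NEMO's standard north-fold routine: jpiglo - ji + 2 for 'T' points and jpiglo - ji + 1 for 'U' points. The sign factor psgn, the row mapping, and the end columns are omitted. The 8-column umask row is a toy example.

```python
def fold_source_column(ji, jpiglo, cd_type):
    """1-based source column feeding column ji of the folded row (T-point pivot).

    Assumed formulas: 'U' points lie half a cell from 'T' points, so the
    column they mirror from differs by one.
    """
    if cd_type == "T":
        return jpiglo - ji + 2
    if cd_type == "U":
        return jpiglo - ji + 1
    raise ValueError(f"grid type not modelled here: {cd_type!r}")


def fold_interior(src_row, cd_type):
    """Values written into interior columns 2..jpiglo-1 of the folded row."""
    jpiglo = len(src_row)
    return [src_row[fold_source_column(ji, jpiglo, cd_type) - 1]
            for ji in range(2, jpiglo)]


# Toy umask row (1 = ocean, 0 = land) on a hypothetical 8-column global grid
umask_row = [0, 0, 1, 1, 1, 0, 0, 0]

print(fold_interior(umask_row, "U"))  # [0, 0, 1, 1, 1, 0]  fold with the correct grid type
print(fold_interior(umask_row, "T"))  # [0, 0, 0, 1, 1, 1]  shifted by one column
```

Folding a U-point mask with the 'T' rule mirrors it about the wrong pivot column. This is the kind of "further incorrect boundary exchange" described above.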

Fix

Subroutine mpp_lnk_2d_icb could be adjusted to retain the lower dimension-2 bound of the array section passed to subroutine lbc_nfd_2d_ext. In other words, it would pass rows 1-kextj:jpj+kextj instead of 1:jpj+kextj:

  • src/OCE/LBC/lbclnk.F90

     
    @@ -381,7 +381,7 @@
           IF( npolj /= 0 ) THEN
              !
              SELECT CASE ( jpni )
    -                   CASE ( 1 )     ;   CALL lbc_nfd          ( pt2d(1:jpi,1:jpj+kextj), cd_type, psgn, kextj )
    +                   CASE ( 1 )     ;   CALL lbc_nfd          ( pt2d(1:jpi,1-kextj:jpj+kextj), cd_type, psgn, kextj )
                        CASE DEFAULT   ;   CALL mpp_lbc_north_icb( pt2d(1:jpi,1:jpj+kextj), cd_type, psgn, kextj )
              END SELECT
              !
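A minimal Python sketch of what this one-line change does to dimension 2. It assumes, as the analysis reports, that the callee numbers rows from 1-kextj and may access rows up to nlcj+kextj. callee_bounds and check_call are hypothetical helpers, and jpj = 149 is an illustrative value.

```python
def callee_bounds(section_lo, section_hi, dummy_lo):
    """Row range (lo, hi) seen by the callee when a section is renumbered from dummy_lo."""
    extent = section_hi - section_lo + 1
    return dummy_lo, dummy_lo + extent - 1


def check_call(section_lo, section_hi, kextj, nlcj):
    """Return (in_bounds, row_offset) for the jpni == 1 call to lbc_nfd.

    Assumption: the callee may touch rows 1-kextj .. nlcj+kextj.
    row_offset is caller row minus callee row; 0 means both number rows alike.
    """
    dummy_lo = 1 - kextj
    lo, hi = callee_bounds(section_lo, section_hi, dummy_lo)
    in_bounds = lo <= 1 - kextj and nlcj + kextj <= hi
    return in_bounds, section_lo - dummy_lo


jpj = nlcj = 149   # illustrative mono-processor value
kextj = 1

print(check_call(1, jpj + kextj, kextj, nlcj))          # (False, 1): section 1:jpj+kextj
print(check_call(1 - kextj, jpj + kextj, kextj, nlcj))  # (True, 0):  section 1-kextj:jpj+kextj
```

Including the lower halo row fixes both symptoms at once. The callee's extent now covers row nlcj+kextj, and its row numbers coincide with those of pt2d.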

Further, the grid types specified in the lbc_lnk_icb calls for arrays {u,v}mask_e could be adjusted according to

  • src/OCE/ICB/icbini.F90

     
    @@ -239,8 +239,8 @@
           umask_e(:,:) = 0._wp   ;   umask_e(1:jpi,1:jpj) = umask(:,:,1)
           vmask_e(:,:) = 0._wp   ;   vmask_e(1:jpi,1:jpj) = vmask(:,:,1)
           CALL lbc_lnk_icb( 'icbini', tmask_e, 'T', +1._wp, 1, 1 )
    -      CALL lbc_lnk_icb( 'icbini', umask_e, 'T', +1._wp, 1, 1 )
    -      CALL lbc_lnk_icb( 'icbini', vmask_e, 'T', +1._wp, 1, 1 )
    +      CALL lbc_lnk_icb( 'icbini', umask_e, 'U', +1._wp, 1, 1 )
    +      CALL lbc_lnk_icb( 'icbini', vmask_e, 'V', +1._wp, 1, 1 )
           !
           ! assign each new iceberg with a unique number constructed from the processor number
           ! and incremented by the total number of processors

Commit History (2)

Changeset 13350 by smueller (2020-07-28T14:28:29+02:00):

Remedy for the bugs reported in ticket #2492

Changeset 13276 by mathiot (2020-07-09T09:47:18+02:00):

ticket #2494 and #2375: wrong point type inn lbc_lnk_icb for umask_e and vmask_e (see ticket #2492)

Change History (11)

comment:1 Changed 7 weeks ago by smasson

  • Cc smasson added
Last edited 5 weeks ago by smasson

comment:2 Changed 5 weeks ago by mathiot

  • Cc pierre.mathiot@… added

comment:3 Changed 5 weeks ago by mathiot

In 13276:

ticket #2494 and #2375: wrong point type inn lbc_lnk_icb for umask_e and vmask_e (see ticket #2492)

comment:4 Changed 2 weeks ago by smueller

  • Version 4.0-HEAD deleted

Subroutine mpp_lnk_2d_icb appears to be called only from within the ICB source code (subroutines icb_init and icb_utl_copy). Its modification should therefore only affect model runs with iceberg handling enabled (ln_icebergs = .true.). Further, the proposed modification of mpp_lnk_2d_icb only changes a subroutine call made when jpni = 1. Runs of a model compiled with key_mpp_mpi that combine ln_icebergs = .true. and jpni = 1 are explicitly prevented (source:/NEMO/releases/r4.0/r4.0-HEAD/src/OCE/ICB/icbini.F90:#L112). Hence the change should only affect mono-processor runs without key_mpp_mpi.
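The argument above amounts to a small reachability check. The Python sketch below encodes it as a predicate over the three settings involved. modified_branch_reachable is a hypothetical helper, not part of NEMO, and jpni = 4 merely stands for any value greater than 1.

```python
from itertools import product


def modified_branch_reachable(ln_icebergs, key_mpp_mpi, jpni):
    """Whether a run can reach the jpni == 1 branch changed in mpp_lnk_2d_icb.

    Encodes the reasoning of comment:4 (a sketch, not NEMO code):
    - mpp_lnk_2d_icb is only called from ICB code, so icebergs must be enabled;
    - the modified call sits in the jpni == 1 branch;
    - icbini stops runs that combine key_mpp_mpi, icebergs and jpni == 1.
    """
    if not ln_icebergs or jpni != 1:
        return False
    return not key_mpp_mpi


reachable = [combo for combo in product((False, True), (False, True), (1, 4))
             if modified_branch_reachable(*combo)]
print(reachable)  # [(True, False, 1)]: icebergs on, no key_mpp_mpi, jpni = 1
```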

comment:5 Changed 2 weeks ago by smueller

  • Version set to release-4.0

comment:6 Changed 2 weeks ago by smueller

  • Owner changed from systeam to smueller
  • Status changed from new to assigned

In an email discussion, it was proposed to test source:/NEMO/releases/r4.0/r4.0-HEAD with the above fix for module lbclnk. The test would compare the run.stat output files produced by LONG runs of the ORCA2_ICE_PISCES reference configuration in three cases:
i) as used by SETTE (with key_mpp_mpi, jpni=4, and jpnj=8);
ii) with jpni=1, after disabling line source:/NEMO/releases/r4.0/r4.0-HEAD/src/OCE/ICB/icbini.F90@13346:#L112;
iii) without key_mpp_mpi.
It was also suggested that the second bug reported above should be fixed as proposed (see also [13276]). That bug is the specification of incorrect grid types in the initialisation of arrays {u,v}mask_e.
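The proposed comparison is a byte-for-byte check of run.stat across the three runs. Below is a minimal Python sketch, assuming each case has its own run directory containing run.stat. compare_outputs and demo are hypothetical helpers, and the demo uses synthetic placeholder files rather than real model output.

```python
import filecmp
import itertools
import tempfile
from pathlib import Path


def compare_outputs(run_dirs, filename="run.stat"):
    """Pairwise byte-for-byte comparison of one output file across runs.

    run_dirs maps a case label (e.g. "i", "ii", "iii") to a run directory.
    Returns {(label_a, label_b): True | False | None}; None means the file is
    missing in at least one directory (e.g. a run that did not complete).
    """
    results = {}
    for (a, dir_a), (b, dir_b) in itertools.combinations(run_dirs.items(), 2):
        file_a, file_b = Path(dir_a) / filename, Path(dir_b) / filename
        if not (file_a.is_file() and file_b.is_file()):
            results[(a, b)] = None
        else:
            results[(a, b)] = filecmp.cmp(file_a, file_b, shallow=False)
    return results


def demo():
    """Self-check on synthetic placeholder files, not real NEMO output."""
    contents = {"i": "same\n", "ii": None, "iii": "same\n"}
    with tempfile.TemporaryDirectory() as tmp:
        dirs = {}
        for case, text in contents.items():
            run_dir = Path(tmp) / case
            run_dir.mkdir()
            if text is not None:          # case "ii" simulates a missing run.stat
                (run_dir / "run.stat").write_text(text)
            dirs[case] = run_dir
        return compare_outputs(dirs)


if __name__ == "__main__":
    print(demo())
```

For the actual test, run_dirs would map "i", "ii" and "iii" to the experiment directories of the three LONG runs; their paths are site-specific.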

comment:7 Changed 2 weeks ago by smueller

After disabling line source:/NEMO/releases/r4.0/r4.0-HEAD/src/OCE/ICB/icbini.F90@13346:#L112, the model in the ORCA2_ICE_PISCES reference configuration with jpni=1 and jpnj=32 crashes at time step 693. With the proposed fixes for modules lbclnk and icbini included, the crash no longer occurs.

comment:8 Changed 2 weeks ago by smueller

The proposed test (see comment:6) has been successful: run.stat files produced using source:/NEMO/releases/r4.0/r4.0-HEAD/@13346 with the proposed fixes of modules lbclnk and icbini are identical across all three cases; further, in cases i and iii, the run.stat output files are also identical to the corresponding run.stat files produced using source:/NEMO/releases/r4.0/r4.0-HEAD@13346 without the proposed fixes (in case ii, one of the runs did not complete, see comment:7).

It has also been found that the tracer.stat output files differ between the MPP cases (i, ii) and the mono-processor case without key_mpp_mpi (iii). However, in both cases i and iii, the tracer.stat output remained unchanged after the proposed fixes were applied. This difference in tracer.stat output appears to be unrelated to the bugs detailed above and should be reported in a separate ticket.

comment:9 Changed 2 weeks ago by smueller

In 13350:

Remedy for the bugs reported in ticket #2492

comment:10 Changed 2 weeks ago by smueller

  • Resolution set to fixed
  • Status changed from assigned to closed

source:/NEMO/releases/r4.0/r4.0-HEAD@13350 has passed the standard SETTE tests. Further, source:/NEMO/releases/r4.0/r4.0-HEAD@13350 compiled with debug options (incl. bounds checking) and without key_mpp_mpi and key_iomput runs successfully.

comment:11 Changed 8 days ago by mathiot

Should this fix also be added to the trunk? Is the plan to have a big push of bug fixes into the trunk based on the NEMO 4.0.3 release?
