New URL for NEMO forge!   http://forge.nemo-ocean.eu

Since March 2022 along with NEMO 4.2 release, the code development moved to a self-hosted GitLab.
This present forge is now archived and remained online for history.
2019WP/ENHANCE-04_AndrewC-reporting – NEMO
wiki:2019WP/ENHANCE-04_AndrewC-reporting

Version 2 (modified by acc, 5 years ago) (diff)

--

ENHANCE-04_AndrewC-reporting

Last edition: Wikinfo(changed_ts)? by Wikinfo(changed_by)?

The PI is responsible to closely follow the progress of the action, and especially to contact NEMO project manager if the delay on preview (or review) are longer than the 2 weeks expected.

  1. Summary
  2. Preview
  3. Tests
  4. Review

Summary

Investigate ways of improving the code's reporting facilities. Currently, errors away from the lead process are not fully reported and the ln_ctl mechanism produces an overwhelming volume of output for large processor counts. Options are needed to produce more selective output (either by output type or processor range). This is an un-started task from the 2018WP (formerly ROBUST-06_AndrewC-reporting) that has been carried forward to 2019. #2167

Preview

Error: Failed to load processor box
No macro or processor named 'box' found

One of the features of NEMO 4.0 is the large reduction in global communications at the cost of suppressing some global diagnostics which tend to be useful only during SETTE testing, debugging or new configuration development. Such diagnostics can be re-activated using the ln_ctl namelist variable but this is rather a blunt instrument in that it activates all extra output for all processing elements. This list ican include:

run.stat
run.stat.nc
tracer.stat
ocean.output_XXXX
layout.dat_XXXX
mpp.output_XXXX
mpp.top.output_XXX     <---- correct this to 4-digits for consistency
EMPave.dat_XXXX
icebergs.stat_XXXX

and possibly others depending on runtime options. One of the 2019 NEMO workplan entries (ENHANCE-04_AndrewC-reporting) plans to introduce more control over the choice of output created but it may be best to introduce the basic structure prior to the v4.0 release. Below is a summary of the overall plans and what can be implemented immediately if there is consensus.

At this stage, the plan is to leave ln_ctl as an all-or-nothing switch but to add the basic structure for finer control. For the v4.0 release, this new structure will only be capable of activating the global stats files (run.stat, run.stat.nc and tracer.stat) independently of ln_ctl but the placeholders will be there for extended control over other outputs. Minimal changes to achieve this will affect:

1. OCE/IOM/in_out_manager.F90 cfgs/SHARED/namelist_ref
2. OCE/nemogcm.F90 OFF/nemogcm.F90 SAO/nemogcm.F90 SAS/nemogcm.F90
3. OCE/stpctl.F90
4. TOP/prtctl_trc.F90 TOP/trcini.F90 TOP/trcstp.F90
5. OCE/LBC/mppini.F90
6. sette/sette.sh

1. OCE/IOM/in_out_manager.F90 cfgs/SHARED/namelist_ref

These changes introduce a derived-type structure into the in_out_manager module and populates the reference namelist with default values.

There are 5 basic sets of parameters in this structure:

* l_config                        Activates use of the settings in the rest of the structure (ln_ctl is ignored if this is true)
* l_runstat, l_trcstat            Activates production of global stats files. Only a single file of each of these is ever produced.
* l_oceout, l_layout, l_EMPave    Normal operation is to produce a single version of each of these. If true then a version for each area is produced
* l_mppout, l_mpptrc, l_icbstat   Suppressed if false, otherwise produce a version for each area.
* procmin, procmax, procincr      Allow subsetting of areas when producing output in the previous two categories. Default values will ensure all areas report.
Index: OCE/IOM/in_out_manager.F90
===================================================================
--- OCE/IOM/in_out_manager.F90  (revision 10459)
+++ OCE/IOM/in_out_manager.F90  (working copy)
@@ -99,6 +99,27 @@
    !!                    output monitoring
    !!----------------------------------------------------------------------
    LOGICAL ::   ln_ctl           !: run control for debugging
+   TYPE :: sn_ctl                !: optional use structure for finer control over output selection
+      LOGICAL :: l_config  = .FALSE.  !: activate/deactivate finer control
+                                      !  Note if l_config is True then ln_ctl is ignored.
+                                      !  Otherwise setting ln_ctl True is equivalent to setting
+                                      !  all the following logicals in this structure True
+      LOGICAL :: l_runstat = .FALSE.  !: Produce/do not produce run.stat file (T/F)
+      LOGICAL :: l_trcstat = .FALSE.  !: Produce/do not produce tracer.stat file (T/F)
+      LOGICAL :: l_oceout  = .FALSE.  !: Produce all ocean.outputs    (T) or just one (F)
+      LOGICAL :: l_layout  = .FALSE.  !: Produce all layout.dat files (T) or just one (F)
+      LOGICAL :: l_EMPave  = .FALSE.  !: Produce all EMPave.dat files (T) or just one (F) (if active)
+      LOGICAL :: l_mppout  = .FALSE.  !: Produce/do not produce mpp.output_XXXX files (T/F)
+      LOGICAL :: l_mpptop  = .FALSE.  !: Produce/do not produce mpp.top.output_XXXX files (T/F)
+      LOGICAL :: l_icbstat = .FALSE.  !: Produce/do not produce icebergs.stat_XXXX files (T/F)
+                                      !  Optional subsetting of processor report files
+                                      !  Default settings of 0/1000000/1 should ensure all areas report.
+                                      !  Set to a more restrictive range to select specific areas
+      INTEGER :: procmin   = 0        !: Minimum narea to output
+      INTEGER :: procmax   = 1000000  !: Maximum narea to output
+      INTEGER :: procincr  = 1        !: narea increment to output
+   END TYPE
+   TYPE (sn_ctl) :: sn_cfctl     !: run control structure for selective output
    LOGICAL ::   ln_timing        !: run control for timing
    LOGICAL ::   ln_diacfl        !: flag whether to create CFL diagnostics
    INTEGER ::   nn_print         !: level of print (0 no print)

Index: cfgs/SHARED/namelist_ref
===================================================================
--- cfgs/SHARED/namelist_ref    (revision 10459)
+++ cfgs/SHARED/namelist_ref    (working copy)
@@ -1301,7 +1301,19 @@
 !-----------------------------------------------------------------------
 &namctl        !   Control prints                                       (default: OFF)
 !-----------------------------------------------------------------------
-   ln_ctl      = .false.   !  trends control print (expensive!)
+   ln_ctl = .FALSE.                 ! Toggle all report printing on/off (T/F); Ignored if sn_cfctl%l_config is T
+     sn_cfctl%l_config = .TRUE.     ! IF .true. then control which reports are written with the following
+       sn_cfctl%l_runstat = .FALSE. ! switches and which areas produce reports with the proc integer settings.
+       sn_cfctl%l_trcstat = .FALSE. ! The default settings for the proc integers should ensure
+       sn_cfctl%l_oceout  = .FALSE. ! that  all areas report.
+       sn_cfctl%l_layout  = .FALSE. !
+       sn_cfctl%l_EMPave  = .FALSE. !
+       sn_cfctl%l_mppout  = .FALSE. !
+       sn_cfctl%l_mpptop  = .FALSE. !
+       sn_cfctl%l_icbstat = .FALSE. !
+       sn_cfctl%procmin   = 0       ! Minimum area number for reporting [default:0]
+       sn_cfctl%procmax   = 1000000 ! Maximum area number for reporting [default:1000000]
+       sn_cfctl%procincr  = 1       ! Increment for optional subsetting of areas [default:1]
    nn_print    =    0      !  level of print (0 no extra print)
    nn_ictls    =    0      !  start i indice of control sum (use to compare mono versus
    nn_ictle    =    0      !  end   i indice of control sum        multi processor runs

The new structure is read and reported in nemogcm.F90. Changes for all variants are similar OCE/nemogcm.F90 is shown here:

Index: OCE/nemogcm.F90
===================================================================
--- OCE/nemogcm.F90     (revision 10459)
+++ OCE/nemogcm.F90     (working copy)
@@ -256,8 +256,8 @@
       INTEGER  ::   ios, ilocal_comm   ! local integers
       CHARACTER(len=120), DIMENSION(60) ::   cltxt, cltxt2, clnam
       !!
-      NAMELIST/namctl/ ln_ctl   , nn_print, nn_ictls, nn_ictle,   &
-         &             nn_isplt , nn_jsplt, nn_jctls, nn_jctle,   &
+      NAMELIST/namctl/ ln_ctl   , sn_cfctl, nn_print, nn_ictls, nn_ictle,   &
+         &             nn_isplt , nn_jsplt, nn_jctls, nn_jctle,             &
          &             ln_timing, ln_diacfl
       NAMELIST/namcfg/ ln_read_cfg, cn_domcfg, ln_closea, ln_write_cfg, cn_domcfg_out, ln_use_jattr
       !!----------------------------------------------------------------------
@@ -327,6 +327,17 @@

       narea = narea + 1                                     ! mynode return the rank of proc (0 --> jpnij -1 )

+      IF( sn_cfctl%l_config ) THEN
+         ! Activate finer control of report outputs
+         ! optionally switch off output from selected areas (note this only
+         ! applies to output which does not involve global communications)
+         IF( ( narea < sn_cfctl%procmin .OR. narea > sn_cfctl%procmax  ) .OR. &
+           & ( MOD( narea - sn_cfctl%procmin, sn_cfctl%procincr ) /= 0 ) )    &
+           &   CALL nemo_set_cfctl( sn_cfctl, .FALSE., .FALSE. )
+      ELSE
+         CALL nemo_set_cfctl( sn_cfctl, ln_ctl, .TRUE. )
+      ENDIF
+
       lwm = (narea == 1)                                    ! control of output namelists
       lwp = (narea == 1) .OR. ln_ctl                        ! control of all listing output print

@@ -489,6 +500,18 @@
          WRITE(numout,*) '~~~~~~~~'
          WRITE(numout,*) '   Namelist namctl'
          WRITE(numout,*) '      run control (for debugging)     ln_ctl     = ', ln_ctl
+         WRITE(numout,*) '       finer control over o/p sn_cfctl%l_config  = ', sn_cfctl%l_config
+         WRITE(numout,*) '                              sn_cfctl%l_runstat = ', sn_cfctl%l_runstat
+         WRITE(numout,*) '                              sn_cfctl%l_trcstat = ', sn_cfctl%l_trcstat
+         WRITE(numout,*) '                              sn_cfctl%l_oceout  = ', sn_cfctl%l_oceout
+         WRITE(numout,*) '                              sn_cfctl%l_layout  = ', sn_cfctl%l_layout
+         WRITE(numout,*) '                              sn_cfctl%l_EMPave  = ', sn_cfctl%l_EMPave
+         WRITE(numout,*) '                              sn_cfctl%l_mppout  = ', sn_cfctl%l_mppout
+         WRITE(numout,*) '                              sn_cfctl%l_mpptop  = ', sn_cfctl%l_mpptop
+         WRITE(numout,*) '                              sn_cfctl%l_icbstat = ', sn_cfctl%l_icbstat
+         WRITE(numout,*) '                              sn_cfctl%procmin   = ', sn_cfctl%procmin  
+         WRITE(numout,*) '                              sn_cfctl%procmax   = ', sn_cfctl%procmax  
+         WRITE(numout,*) '                              sn_cfctl%procincr  = ', sn_cfctl%procincr 
          WRITE(numout,*) '      level of print                  nn_print   = ', nn_print
          WRITE(numout,*) '      Start i indice for SUM control  nn_ictls   = ', nn_ictls
          WRITE(numout,*) '      End i indice for SUM control    nn_ictle   = ', nn_ictle
@@ -635,6 +658,35 @@
       !
    END SUBROUTINE nemo_alloc

+   SUBROUTINE nemo_set_cfctl(sn_cfctl, setto, for_all )
+      !!----------------------------------------------------------------------
+      !!                     ***  ROUTINE nemo_set_cfctl  ***
+      !!
+      !! ** Purpose :   Set elements of the output control structure to setto.
+      !!                for_all should be .false. unless all areas are to be
+      !!                treated identically.
+      !!
+      !! ** Method  :   Note this routine can be used to switch on/off some
+      !!                types of output for selected areas but any output types
+      !!                that involve global communications (e.g. mpp_max, glob_sum)
+      !!                should be protected from selective switching by the
+      !!                for_all argument
+      !!----------------------------------------------------------------------
+      LOGICAL :: setto, for_all
+      TYPE (sn_ctl) :: sn_cfctl
+      !!----------------------------------------------------------------------
+      IF( for_all ) THEN
+         sn_cfctl%l_runstat = setto
+         sn_cfctl%l_trcstat = setto
+      ENDIF
+      sn_cfctl%l_oceout  = setto
+      sn_cfctl%l_layout  = setto
+      sn_cfctl%l_EMPave  = setto
+      sn_cfctl%l_mppout  = setto
+      sn_cfctl%l_mpptop  = setto
+      sn_cfctl%l_icbstat = setto
+   END SUBROUTINE nemo_set_cfctl
+
    !!======================================================================
 END MODULE nemogcm

Eventually, setting sn_cfctl%l_config true will force ln_ctl to be false but the current implementation is incomplete and can only be used the activate run.stat, tracer.stat and multiple layout.dats independently of ln_ctl. ln_ctl, therefore, remains as the overriding control for all outputs.

3. OCE/stpctl.F90

The logic changes required in stpctl.F90 to control the production of run.stat are relatively simple. Two new local logicals are introduced to avoid having to evaluate logical constructs repeatedly. These determine whether or not to collect the global maximums (lcolruns) and whether or not to handle the actual writing (lwrtruns).

 
Index: OCE/stpctl.F90
===================================================================
--- OCE/stpctl.F90      (revision 10459)
+++ OCE/stpctl.F90      (working copy)
@@ -33,7 +33,7 @@
    PUBLIC stp_ctl           ! routine called by step.F90

    INTEGER  ::   idrun, idtime, idssh, idu, ids1, ids2, idt1, idt2, idc1, idw1, istatus
-   LOGICAL  ::   lsomeoce
+   LOGICAL  ::   lsomeoce, lcolruns, lwrtruns
    !!----------------------------------------------------------------------
    !! NEMO/OCE 4.0 , NEMO Consortium (2018)
    !! $Id$
@@ -69,6 +69,8 @@
       CHARACTER(len=20) :: clname
       !!----------------------------------------------------------------------
       !
+      IF( kt == nit000 )   lcolruns = ln_ctl .OR. ( sn_cfctl%l_config .AND. sn_cfctl%l_runstat )
+      IF( kt == nit000 )   lwrtruns = lcolruns .AND. lwm
       IF( kt == nit000 .AND. lwp ) THEN
          WRITE(numout,*) 
          WRITE(numout,*) 'stp_ctl : time-stepping control'
@@ -76,7 +78,7 @@
          !                                ! open time.step file
          IF( lwm ) CALL ctl_opn( numstp, 'time.step', 'REPLACE', 'FORMATTED', 'SEQUENTIAL', -1, numout, lwp, narea )
          !                                ! open run.stat file
-         IF( ln_ctl .AND. lwm ) THEN
+         IF( lwrtruns ) THEN
             CALL ctl_opn( numrun, 'run.stat', 'REPLACE', 'FORMATTED', 'SEQUENTIAL', -1, numout, lwp, narea )
             clname = 'run.stat.nc' 
             IF( .NOT. Agrif_Root() )   clname = TRIM(Agrif_CFixed())//"_"//TRIM(clname)
@@ -120,12 +122,12 @@
          zmax(9) = MAXVAL(   Cu_adv(:,:,:)   , mask = tmask(:,:,:) == 1._wp ) !       cell Courant no. max
       ENDIF
       !
-      IF( ln_ctl ) THEN
+      IF( lcolruns ) THEN
          CALL mpp_max( "stpctl", zmax )          ! max over the global domain
          nstop = NINT( zmax(7) )                 ! nstop indicator sheared among all local domains
       ENDIF
       !                                   !==  run statistics  ==!   ("run.stat" files)
-      IF( ln_ctl .AND. lwm ) THEN
+      IF( lwrtruns ) THEN
          WRITE(numrun,9500) kt, zmax(1), zmax(2), -zmax(3), zmax(4)
          istatus = NF90_PUT_VAR( idrun, idssh, (/ zmax(1)/), (/kt/), (/1/) )
          istatus = NF90_PUT_VAR( idrun,   idu, (/ zmax(2)/), (/kt/), (/1/) )

4. TOP/prtctl_trc.F90 TOP/trcini.F90 TOP/trcstp.F90

Changes to control the production of tracer.stat follow similar lines with the introduction of a lltrcstat local logical. Note also changes ti prtctl_trc.F90 to make mpp.top.output filenames compatible with other similar filenames (i.e. use I4.4 for area number).

Index: TOP/prtctl_trc.F90
===================================================================
--- TOP/prtctl_trc.F90  (revision 10459)
+++ TOP/prtctl_trc.F90  (working copy)
@@ -209,7 +209,7 @@
       IF( lk_mpp ) THEN
          sind = narea
          eind = narea
-         clb_name = "('mpp.top.output_',I3.3)"
+         clb_name = "('mpp.top.output_',I4.4)"
          cl_run = 'MULTI processor run'
          ! use indices for each area computed by mpp_init subroutine
          nlditl(1:jpnij) = nldit(:)
@@ -228,7 +228,7 @@
       ELSE
          sind = 1
          eind = ijsplt
-         clb_name = "('mono.top.output_',I3.3)"
+         clb_name = "('mono.top.output_',I4.4)"
          cl_run   = 'MONO processor run '
          ! compute indices for each area as done in mpp_init subroutine
          CALL sub_dom

Index: TOP/trcini.F90
===================================================================
--- TOP/trcini.F90      (revision 10459)
+++ TOP/trcini.F90      (working copy)
@@ -71,7 +71,9 @@
       CALL trc_ini_trp   ! passive tracers transport
       CALL trc_ice_ini   ! Tracers in sea ice
       !
-      IF(lwm) CALL ctl_opn( numstr, 'tracer.stat', 'REPLACE', 'FORMATTED', 'SEQUENTIAL', -1, numout, lwp , narea )
+      IF( lwm .AND. ( ln_ctl .OR. (sn_cfctl%l_config .AND. sn_cfctl%l_trcstat) ) ) THEN
+         CALL ctl_opn( numstr, 'tracer.stat', 'REPLACE', 'FORMATTED', 'SEQUENTIAL', -1, numout, lwp , narea )
+      ENDIF
       !
       CALL trc_ini_state  !  passive tracers initialisation : from a restart or from clim
       IF( nn_dttrc /= 1 ) &

Index: TOP/trcstp.F90
===================================================================
--- TOP/trcstp.F90      (revision 10459)
+++ TOP/trcstp.F90      (working copy)
@@ -31,6 +31,7 @@
    PUBLIC   trc_stp    ! called by step

    LOGICAL  ::   llnew                   ! ???
+   LOGICAL  ::   lltrcstat               ! ???
    REAL(wp) ::   rdt_sampl               ! ???
    INTEGER  ::   nb_rec_per_day, ktdcy   ! ???
    REAL(wp) ::   rsecfst, rseclast       ! ???
@@ -67,6 +68,7 @@
          r2dttrc = 2. * rdttrc       ! = 2 rdttrc (leapfrog)
       ENDIF
       !
+      IF( kt == nittrc000 )  lltrcstat = ln_ctl .OR. (sn_cfctl%l_config .AND. sn_cfctl%l_trcstat)
       IF( kt == nittrc000 .AND. lk_trdmxl_trc )  CALL trd_mxl_trc_init    ! trends: Mixed-layer
       !
       IF( .NOT.ln_linssh ) THEN                                           ! update ocean volume due to ssh temporal evolution
@@ -108,7 +110,7 @@
          !
       ENDIF
       !
-      IF (ln_ctl ) THEN
+      IF (lltrcstat) THEN
          ztrai = 0._wp                                                   !  content of all tracers
          DO jn = 1, jptra
             ztrai = ztrai + glob_sum( 'trcstp', trn(:,:,:,jn) * cvol(:,:,:)   )

5. OCE/LBC/mppini.F90

Control over the creation of multiple layout.dat files is easy to implement and so has been done as an example of how output in this category will be handled in future. For this category, the area subsetting can be used to restrict which areas produce files. The standard layout.dat file, produced by narea = 1 is always produced.

Index: OCE/LBC/mppini.F90
===================================================================
--- OCE/LBC/mppini.F90  (revision 10459)
+++ OCE/LBC/mppini.F90  (working copy)
@@ -150,6 +150,7 @@
       INTEGER ::   ierr, ios                  !
       INTEGER ::   inbi, inbj, iimax,  ijmax, icnt1, icnt2
       LOGICAL ::   llbest
+      LOGICAL ::   llwrtlay
       INTEGER, ALLOCATABLE, DIMENSION(:)     ::   iin, ii_nono, ii_noea          ! 1D workspace
       INTEGER, ALLOCATABLE, DIMENSION(:)     ::   ijn, ii_noso, ii_nowe          !  -     -
       INTEGER, ALLOCATABLE, DIMENSION(:,:) ::   iimppt, ilci, ibondi, ipproc   ! 2D workspace
@@ -166,6 +167,7 @@
            &             ln_vol, nn_volctl, nn_rimwidth, nb_jpk_bdy
       !!----------------------------------------------------------------------

+      llwrtlay = lwp .OR. ln_ctl .OR. ( sn_cfctl%l_config .AND.  sn_cfctl%l_layout )
       ! do we need to take into account bdy_msk?
       REWIND( numnam_ref )              ! Namelist nambdy in reference namelist : BDY
       READ  ( numnam_ref, nambdy, IOSTAT = ios, ERR = 903)
@@ -553,7 +555,7 @@
       END DO
       
       ! Save processor layout in ascii file
-      IF (lwp) THEN
+      IF (llwrtlay) THEN
          CALL ctl_opn( inum, 'layout.dat', 'REPLACE', 'FORMATTED', 'SEQUENTIAL', -1, numout, .FALSE., narea )
          WRITE(inum,'(a)') '   jpnij   jpimax  jpjmax    jpk  jpiglo  jpjglo'//& 
    &           ' ( local:    narea     jpi     jpj )'    
@@ -614,6 +616,8 @@ 
             WRITE(numout,*)
             WRITE(numout,*) '   ==>>>   North fold boundary prepared for jpni >1'
             ! additional prints in layout.dat 
+         ENDIF
+         IF (llwrtlay) THEN
             WRITE(inum,*)
             WRITE(inum,*)
             WRITE(inum,*) 'number of subdomains located along the north fold : ', ndim_rank_north
@@ -628,7 +632,7 @@
       !
       IF( ln_nnogather ) THEN
          CALL mpp_init_nfdcom     ! northfold neighbour lists
-         IF (lwp) THEN
+         IF (llwrtlay) THEN
             WRITE(inum,*)
             WRITE(inum,*)
             WRITE(inum,*) 'north fold exchanges with explicit point-to-point messaging :'
@@ -639,7 +643,7 @@
          ENDIF
       ENDIF
       !
-      IF (lwp) CLOSE(inum)
+      IF (llwrtlay) CLOSE(inum)
       !
       DEALLOCATE(iin, ijn, ii_nono, ii_noea, ii_noso, ii_nowe,    &
          &       iimppt, ijmppt, ibondi, ibondj, ipproc, ipolj,   &

6. sette/sette.sh

Finally, once implemented, these cahnges allow SETTE tests to be run without creating the full set of multiple output files. It is simply be a case of replacing all ln_ctl settings like so:

290c291,294
<     set_namelist namelist_cfg ln_ctl .true.
---
>     set_namelist namelist_cfg ln_ctl .false.
>     set_namelist namelist_cfg sn_cfctl%l_config .true.
>     set_namelist namelist_cfg sn_cfctl%l_runstat .true.
>     set_namelist namelist_cfg sn_cfctl%l_trcstat .true.

Tests

Error: Failed to load processor box
No macro or processor named 'box' found

Review

Error: Failed to load processor box
No macro or processor named 'box' found