New URL for NEMO forge!   http://forge.nemo-ocean.eu

Since March 2022 along with NEMO 4.2 release, the code development moved to a self-hosted GitLab.
This present forge is now archived and remained online for history.
2019WP/ENHANCE-04_AndrewC-reporting – NEMO
wiki:2019WP/ENHANCE-04_AndrewC-reporting

Version 4 (modified by nicolasmartin, 6 years ago) (diff)

Improve diff rendering with #!diff wikiprocessor

ENHANCE-04_AndrewC-reporting

Last edition: Wikinfo(changed_ts)? by Wikinfo(changed_by)?

The PI is responsible to closely follow the progress of the action, and especially to contact NEMO project manager if the delay on preview (or review) are longer than the 2 weeks expected.

  1. Summary
  2. Preview
  3. Tests
  4. Review

Summary

Investigate ways of improving the code's reporting facilities. Currently, errors away from the lead process are not fully reported and the ln_ctl mechanism produces an overwhelming volume of output for large processor counts. Options are needed to produce more selective output (either by output type or processor range). This is an un-started task from the 2018WP (formerly ROBUST-06_AndrewC-reporting) that has been carried forward to 2019. #2167

Preview

Error: Failed to load processor box
No macro or processor named 'box' found

One of the features of NEMO 4.0 is the large reduction in global communications at the cost of suppressing some global diagnostics which tend to be useful only during SETTE testing, debugging or new configuration development. Such diagnostics can be re-activated using the ln_ctl namelist variable but this is rather a blunt instrument in that it activates all extra output for all processing elements. This list ican include:

run.stat
run.stat.nc
tracer.stat
ocean.output_XXXX
layout.dat_XXXX
mpp.output_XXXX
mpp.top.output_XXX     <---- correct this to 4-digits for consistency
EMPave.dat_XXXX
icebergs.stat_XXXX

and possibly others depending on runtime options. One of the 2019 NEMO workplan entries (ENHANCE-04_AndrewC-reporting) plans to introduce more control over the choice of output created but it may be best to introduce the basic structure prior to the v4.0 release. Below is a summary of the overall plans and what can be implemented immediately if there is consensus.

At this stage, the plan is to leave ln_ctl as an all-or-nothing switch but to add the basic structure for finer control. For the v4.0 release, this new structure will only be capable of activating the global stats files (run.stat, run.stat.nc and tracer.stat) independently of ln_ctl but the placeholders will be there for extended control over other outputs. Minimal changes to achieve this will affect:

1. OCE/IOM/in_out_manager.F90 cfgs/SHARED/namelist_ref
2. OCE/nemogcm.F90 OFF/nemogcm.F90 SAO/nemogcm.F90 SAS/nemogcm.F90
3. OCE/stpctl.F90
4. TOP/prtctl_trc.F90 TOP/trcini.F90 TOP/trcstp.F90
5. OCE/LBC/mppini.F90
6. sette/sette.sh

1. OCE/IOM/in_out_manager.F90 cfgs/SHARED/namelist_ref

These changes introduce a derived-type structure into the in_out_manager module and populates the reference namelist with default values.

There are 5 basic sets of parameters in this structure:

* l_config                        Activates use of the settings in the rest of the structure (ln_ctl is ignored if this is true)
* l_runstat, l_trcstat            Activates production of global stats files. Only a single file of each of these is ever produced.
* l_oceout, l_layout, l_EMPave    Normal operation is to produce a single version of each of these. If true then a version for each area is produced
* l_mppout, l_mpptrc, l_icbstat   Suppressed if false, otherwise produce a version for each area.
* procmin, procmax, procincr      Allow subsetting of areas when producing output in the previous two categories. Default values will ensure all areas report.
  • OCE/IOM/in_out_manager.F90

     
    9999   !!                    output monitoring 
    100100   !!---------------------------------------------------------------------- 
    101101   LOGICAL ::   ln_ctl           !: run control for debugging 
     102   TYPE :: sn_ctl                !: optional use structure for finer control over output selection 
     103      LOGICAL :: l_config  = .FALSE.  !: activate/deactivate finer control 
     104                                      !  Note if l_config is True then ln_ctl is ignored. 
     105                                      !  Otherwise setting ln_ctl True is equivalent to setting 
     106                                      !  all the following logicals in this structure True 
     107      LOGICAL :: l_runstat = .FALSE.  !: Produce/do not produce run.stat file (T/F) 
     108      LOGICAL :: l_trcstat = .FALSE.  !: Produce/do not produce tracer.stat file (T/F) 
     109      LOGICAL :: l_oceout  = .FALSE.  !: Produce all ocean.outputs    (T) or just one (F) 
     110      LOGICAL :: l_layout  = .FALSE.  !: Produce all layout.dat files (T) or just one (F) 
     111      LOGICAL :: l_EMPave  = .FALSE.  !: Produce all EMPave.dat files (T) or just one (F) (if active) 
     112      LOGICAL :: l_mppout  = .FALSE.  !: Produce/do not produce mpp.output_XXXX files (T/F) 
     113      LOGICAL :: l_mpptop  = .FALSE.  !: Produce/do not produce mpp.top.output_XXXX files (T/F) 
     114      LOGICAL :: l_icbstat = .FALSE.  !: Produce/do not produce icebergs.stat_XXXX files (T/F) 
     115                                      !  Optional subsetting of processor report files 
     116                                      !  Default settings of 0/1000000/1 should ensure all areas report. 
     117                                      !  Set to a more restrictive range to select specific areas 
     118      INTEGER :: procmin   = 0        !: Minimum narea to output 
     119      INTEGER :: procmax   = 1000000  !: Maximum narea to output 
     120      INTEGER :: procincr  = 1        !: narea increment to output 
     121   END TYPE 
     122   TYPE (sn_ctl) :: sn_cfctl     !: run control structure for selective output 
    102123   LOGICAL ::   ln_timing        !: run control for timing 
    103124   LOGICAL ::   ln_diacfl        !: flag whether to create CFL diagnostics 
    104125   INTEGER ::   nn_print         !: level of print (0 no print) 
  • cfgs/SHARED/namelist_ref

     
    13011301!----------------------------------------------------------------------- 
    13021302&namctl        !   Control prints                                       (default: OFF) 
    13031303!----------------------------------------------------------------------- 
    1304    ln_ctl      = .false.   !  trends control print (expensive!) 
     1304   ln_ctl = .FALSE.                 ! Toggle all report printing on/off (T/F); Ignored if sn_cfctl%l_config is T 
     1305     sn_cfctl%l_config = .TRUE.     ! IF .true. then control which reports are written with the following 
     1306       sn_cfctl%l_runstat = .FALSE. ! switches and which areas produce reports with the proc integer settings. 
     1307       sn_cfctl%l_trcstat = .FALSE. ! The default settings for the proc integers should ensure 
     1308       sn_cfctl%l_oceout  = .FALSE. ! that  all areas report. 
     1309       sn_cfctl%l_layout  = .FALSE. ! 
     1310       sn_cfctl%l_EMPave  = .FALSE. ! 
     1311       sn_cfctl%l_mppout  = .FALSE. ! 
     1312       sn_cfctl%l_mpptop  = .FALSE. ! 
     1313       sn_cfctl%l_icbstat = .FALSE. ! 
     1314       sn_cfctl%procmin   = 0       ! Minimum area number for reporting [default:0] 
     1315       sn_cfctl%procmax   = 1000000 ! Maximum area number for reporting [default:1000000] 
     1316       sn_cfctl%procincr  = 1       ! Increment for optional subsetting of areas [default:1] 
    13051317   nn_print    =    0      !  level of print (0 no extra print) 
    13061318   nn_ictls    =    0      !  start i indice of control sum (use to compare mono versus 
    13071319   nn_ictle    =    0      !  end   i indice of control sum        multi processor runs 

The new structure is read and reported in nemogcm.F90. Changes for all variants are similar OCE/nemogcm.F90 is shown here:

  • OCE/nemogcm.F90

     
    256256      INTEGER  ::   ios, ilocal_comm   ! local integers 
    257257      CHARACTER(len=120), DIMENSION(60) ::   cltxt, cltxt2, clnam 
    258258      !! 
    259       NAMELIST/namctl/ ln_ctl   , nn_print, nn_ictls, nn_ictle,   & 
    260          &             nn_isplt , nn_jsplt, nn_jctls, nn_jctle,   & 
     259      NAMELIST/namctl/ ln_ctl   , sn_cfctl, nn_print, nn_ictls, nn_ictle,   & 
     260         &             nn_isplt , nn_jsplt, nn_jctls, nn_jctle,             & 
    261261         &             ln_timing, ln_diacfl 
    262262      NAMELIST/namcfg/ ln_read_cfg, cn_domcfg, ln_closea, ln_write_cfg, cn_domcfg_out, ln_use_jattr 
    263263      !!---------------------------------------------------------------------- 
     
    327327 
    328328      narea = narea + 1                                     ! mynode return the rank of proc (0 --> jpnij -1 ) 
    329329 
     330      IF( sn_cfctl%l_config ) THEN 
     331         ! Activate finer control of report outputs 
     332         ! optionally switch off output from selected areas (note this only 
     333         ! applies to output which does not involve global communications) 
     334         IF( ( narea < sn_cfctl%procmin .OR. narea > sn_cfctl%procmax  ) .OR. & 
     335           & ( MOD( narea - sn_cfctl%procmin, sn_cfctl%procincr ) /= 0 ) )    & 
     336           &   CALL nemo_set_cfctl( sn_cfctl, .FALSE., .FALSE. ) 
     337      ELSE 
     338         CALL nemo_set_cfctl( sn_cfctl, ln_ctl, .TRUE. ) 
     339      ENDIF 
     340 
    330341      lwm = (narea == 1)                                    ! control of output namelists 
    331342      lwp = (narea == 1) .OR. ln_ctl                        ! control of all listing output print 
    332343 
     
    489500         WRITE(numout,*) '~~~~~~~~' 
    490501         WRITE(numout,*) '   Namelist namctl' 
    491502         WRITE(numout,*) '      run control (for debugging)     ln_ctl     = ', ln_ctl 
     503         WRITE(numout,*) '       finer control over o/p sn_cfctl%l_config  = ', sn_cfctl%l_config 
     504         WRITE(numout,*) '                              sn_cfctl%l_runstat = ', sn_cfctl%l_runstat 
     505         WRITE(numout,*) '                              sn_cfctl%l_trcstat = ', sn_cfctl%l_trcstat 
     506         WRITE(numout,*) '                              sn_cfctl%l_oceout  = ', sn_cfctl%l_oceout 
     507         WRITE(numout,*) '                              sn_cfctl%l_layout  = ', sn_cfctl%l_layout 
     508         WRITE(numout,*) '                              sn_cfctl%l_EMPave  = ', sn_cfctl%l_EMPave 
     509         WRITE(numout,*) '                              sn_cfctl%l_mppout  = ', sn_cfctl%l_mppout 
     510         WRITE(numout,*) '                              sn_cfctl%l_mpptop  = ', sn_cfctl%l_mpptop 
     511         WRITE(numout,*) '                              sn_cfctl%l_icbstat = ', sn_cfctl%l_icbstat 
     512         WRITE(numout,*) '                              sn_cfctl%procmin   = ', sn_cfctl%procmin   
     513         WRITE(numout,*) '                              sn_cfctl%procmax   = ', sn_cfctl%procmax   
     514         WRITE(numout,*) '                              sn_cfctl%procincr  = ', sn_cfctl%procincr  
    492515         WRITE(numout,*) '      level of print                  nn_print   = ', nn_print 
    493516         WRITE(numout,*) '      Start i indice for SUM control  nn_ictls   = ', nn_ictls 
    494517         WRITE(numout,*) '      End i indice for SUM control    nn_ictle   = ', nn_ictle 
     
    635658      ! 
    636659   END SUBROUTINE nemo_alloc 
    637660 
     661   SUBROUTINE nemo_set_cfctl(sn_cfctl, setto, for_all ) 
     662      !!---------------------------------------------------------------------- 
     663      !!                     ***  ROUTINE nemo_set_cfctl  *** 
     664      !! 
     665      !! ** Purpose :   Set elements of the output control structure to setto. 
     666      !!                for_all should be .false. unless all areas are to be 
     667      !!                treated identically. 
     668      !! 
     669      !! ** Method  :   Note this routine can be used to switch on/off some 
     670      !!                types of output for selected areas but any output types 
     671      !!                that involve global communications (e.g. mpp_max, glob_sum) 
     672      !!                should be protected from selective switching by the 
     673      !!                for_all argument 
     674      !!---------------------------------------------------------------------- 
     675      LOGICAL :: setto, for_all 
     676      TYPE (sn_ctl) :: sn_cfctl 
     677      !!---------------------------------------------------------------------- 
     678      IF( for_all ) THEN 
     679         sn_cfctl%l_runstat = setto 
     680         sn_cfctl%l_trcstat = setto 
     681      ENDIF 
     682      sn_cfctl%l_oceout  = setto 
     683      sn_cfctl%l_layout  = setto 
     684      sn_cfctl%l_EMPave  = setto 
     685      sn_cfctl%l_mppout  = setto 
     686      sn_cfctl%l_mpptop  = setto 
     687      sn_cfctl%l_icbstat = setto 
     688   END SUBROUTINE nemo_set_cfctl 
     689 
    638690   !!====================================================================== 
    639691END MODULE nemogcm 

Eventually, setting sn_cfctl%l_config true will force ln_ctl to be false but the current implementation is incomplete and can only be used the activate run.stat, tracer.stat and multiple layout.dats independently of ln_ctl. ln_ctl, therefore, remains as the overriding control for all outputs.

3. OCE/stpctl.F90

The logic changes required in stpctl.F90 to control the production of run.stat are relatively simple. Two new local logicals are introduced to avoid having to evaluate logical constructs repeatedly. These determine whether or not to collect the global maximums (lcolruns) and whether or not to handle the actual writing (lwrtruns).

  • OCE/stpctl.F90

     
     
    3333   PUBLIC stp_ctl           ! routine called by step.F90 
    3434 
    3535   INTEGER  ::   idrun, idtime, idssh, idu, ids1, ids2, idt1, idt2, idc1, idw1, istatus 
    36    LOGICAL  ::   lsomeoce 
     36   LOGICAL  ::   lsomeoce, lcolruns, lwrtruns 
    3737   !!---------------------------------------------------------------------- 
    3838   !! NEMO/OCE 4.0 , NEMO Consortium (2018) 
    3939   !! $Id$ 
     
    6969      CHARACTER(len=20) :: clname 
    7070      !!---------------------------------------------------------------------- 
    7171      ! 
     72      IF( kt == nit000 )   lcolruns = ln_ctl .OR. ( sn_cfctl%l_config .AND. sn_cfctl%l_runstat ) 
     73      IF( kt == nit000 )   lwrtruns = lcolruns .AND. lwm 
    7274      IF( kt == nit000 .AND. lwp ) THEN 
    7375         WRITE(numout,*)  
    7476         WRITE(numout,*) 'stp_ctl : time-stepping control' 
     
    7678         !                                ! open time.step file 
    7779         IF( lwm ) CALL ctl_opn( numstp, 'time.step', 'REPLACE', 'FORMATTED', 'SEQUENTIAL', -1, numout, lwp, narea ) 
    7880         !                                ! open run.stat file 
    79          IF( ln_ctl .AND. lwm ) THEN 
     81         IF( lwrtruns ) THEN 
    8082            CALL ctl_opn( numrun, 'run.stat', 'REPLACE', 'FORMATTED', 'SEQUENTIAL', -1, numout, lwp, narea ) 
    8183            clname = 'run.stat.nc'  
    8284            IF( .NOT. Agrif_Root() )   clname = TRIM(Agrif_CFixed())//"_"//TRIM(clname) 
     
    120122         zmax(9) = MAXVAL(   Cu_adv(:,:,:)   , mask = tmask(:,:,:) == 1._wp ) !       cell Courant no. max 
    121123      ENDIF 
    122124      ! 
    123       IF( ln_ctl ) THEN 
     125      IF( lcolruns ) THEN 
    124126         CALL mpp_max( "stpctl", zmax )          ! max over the global domain 
    125127         nstop = NINT( zmax(7) )                 ! nstop indicator sheared among all local domains 
    126128      ENDIF 
    127129      !                                   !==  run statistics  ==!   ("run.stat" files) 
    128       IF( ln_ctl .AND. lwm ) THEN 
     130      IF( lwrtruns ) THEN 
    129131         WRITE(numrun,9500) kt, zmax(1), zmax(2), -zmax(3), zmax(4) 
    130132         istatus = NF90_PUT_VAR( idrun, idssh, (/ zmax(1)/), (/kt/), (/1/) ) 
    131133         istatus = NF90_PUT_VAR( idrun,   idu, (/ zmax(2)/), (/kt/), (/1/) ) 

4. TOP/prtctl_trc.F90 TOP/trcini.F90 TOP/trcstp.F90

Changes to control the production of tracer.stat follow similar lines with the introduction of a lltrcstat local logical. Note also changes ti prtctl_trc.F90 to make mpp.top.output filenames compatible with other similar filenames (i.e. use I4.4 for area number).

  • TOP/prtctl_trc.F90

     
    209209      IF( lk_mpp ) THEN 
    210210         sind = narea 
    211211         eind = narea 
    212          clb_name = "('mpp.top.output_',I3.3)" 
     212         clb_name = "('mpp.top.output_',I4.4)" 
    213213         cl_run = 'MULTI processor run' 
    214214         ! use indices for each area computed by mpp_init subroutine 
    215215         nlditl(1:jpnij) = nldit(:) 
     
    228228      ELSE 
    229229         sind = 1 
    230230         eind = ijsplt 
    231          clb_name = "('mono.top.output_',I3.3)" 
     231         clb_name = "('mono.top.output_',I4.4)" 
    232232         cl_run   = 'MONO processor run ' 
    233233         ! compute indices for each area as done in mpp_init subroutine 
    234234         CALL sub_dom 
  • TOP/trcini.F90

     
    7171      CALL trc_ini_trp   ! passive tracers transport 
    7272      CALL trc_ice_ini   ! Tracers in sea ice 
    7373      ! 
    74       IF(lwm) CALL ctl_opn( numstr, 'tracer.stat', 'REPLACE', 'FORMATTED', 'SEQUENTIAL', -1, numout, lwp , narea ) 
     74      IF( lwm .AND. ( ln_ctl .OR. (sn_cfctl%l_config .AND. sn_cfctl%l_trcstat) ) ) THEN 
     75         CALL ctl_opn( numstr, 'tracer.stat', 'REPLACE', 'FORMATTED', 'SEQUENTIAL', -1, numout, lwp , narea ) 
     76      ENDIF 
    7577      ! 
    7678      CALL trc_ini_state  !  passive tracers initialisation : from a restart or from clim 
    7779      IF( nn_dttrc /= 1 ) & 
  • TOP/trcstp.F90

     
    3131   PUBLIC   trc_stp    ! called by step 
    3232 
    3333   LOGICAL  ::   llnew                   ! ??? 
     34   LOGICAL  ::   lltrcstat               ! ??? 
    3435   REAL(wp) ::   rdt_sampl               ! ??? 
    3536   INTEGER  ::   nb_rec_per_day, ktdcy   ! ??? 
    3637   REAL(wp) ::   rsecfst, rseclast       ! ??? 
     
    6768         r2dttrc = 2. * rdttrc       ! = 2 rdttrc (leapfrog) 
    6869      ENDIF 
    6970      ! 
     71      IF( kt == nittrc000 )  lltrcstat = ln_ctl .OR. (sn_cfctl%l_config .AND. sn_cfctl%l_trcstat) 
    7072      IF( kt == nittrc000 .AND. lk_trdmxl_trc )  CALL trd_mxl_trc_init    ! trends: Mixed-layer 
    7173      ! 
    7274      IF( .NOT.ln_linssh ) THEN                                           ! update ocean volume due to ssh temporal evolution 
     
    108110         ! 
    109111      ENDIF 
    110112      ! 
    111       IF (ln_ctl ) THEN 
     113      IF (lltrcstat) THEN 
    112114         ztrai = 0._wp                                                   !  content of all tracers 
    113115         DO jn = 1, jptra 
    114116            ztrai = ztrai + glob_sum( 'trcstp', trn(:,:,:,jn) * cvol(:,:,:)   ) 

5. OCE/LBC/mppini.F90

Control over the creation of multiple layout.dat files is easy to implement and so has been done as an example of how output in this category will be handled in future. For this category, the area subsetting can be used to restrict which areas produce files. The standard layout.dat file, produced by narea = 1 is always produced.

  • OCE/LBC/mppini.F90

     
    150150      INTEGER ::   ierr, ios                  ! 
    151151      INTEGER ::   inbi, inbj, iimax,  ijmax, icnt1, icnt2 
    152152      LOGICAL ::   llbest 
     153      LOGICAL ::   llwrtlay 
    153154      INTEGER, ALLOCATABLE, DIMENSION(:)     ::   iin, ii_nono, ii_noea          ! 1D workspace 
    154155      INTEGER, ALLOCATABLE, DIMENSION(:)     ::   ijn, ii_noso, ii_nowe          !  -     - 
    155156      INTEGER, ALLOCATABLE, DIMENSION(:,:) ::   iimppt, ilci, ibondi, ipproc   ! 2D workspace 
     
    166167           &             ln_vol, nn_volctl, nn_rimwidth, nb_jpk_bdy 
    167168      !!---------------------------------------------------------------------- 
    168169 
     170      llwrtlay = lwp .OR. ln_ctl .OR. ( sn_cfctl%l_config .AND.  sn_cfctl%l_layout ) 
    169171      ! do we need to take into account bdy_msk? 
    170172      REWIND( numnam_ref )              ! Namelist nambdy in reference namelist : BDY 
    171173      READ  ( numnam_ref, nambdy, IOSTAT = ios, ERR = 903) 
     
    553555      END DO 
    554556       
    555557      ! Save processor layout in ascii file 
    556       IF (lwp) THEN 
     558      IF (llwrtlay) THEN 
    557559         CALL ctl_opn( inum, 'layout.dat', 'REPLACE', 'FORMATTED', 'SEQUENTIAL', -1, numout, .FALSE., narea ) 
    558560         WRITE(inum,'(a)') '   jpnij   jpimax  jpjmax    jpk  jpiglo  jpjglo'//&  
    559561   &           ' ( local:    narea     jpi     jpj )'     
     
    614616            WRITE(numout,*) 
    615617            WRITE(numout,*) '   ==>>>   North fold boundary prepared for jpni >1' 
    616618            ! additional prints in layout.dat  
     619         ENDIF 
     620         IF (llwrtlay) THEN 
    617621            WRITE(inum,*) 
    618622            WRITE(inum,*) 
    619623            WRITE(inum,*) 'number of subdomains located along the north fold : ', ndim_rank_north 
     
    628632      ! 
    629633      IF( ln_nnogather ) THEN 
    630634         CALL mpp_init_nfdcom     ! northfold neighbour lists 
    631          IF (lwp) THEN 
     635         IF (llwrtlay) THEN 
    632636            WRITE(inum,*) 
    633637            WRITE(inum,*) 
    634638            WRITE(inum,*) 'north fold exchanges with explicit point-to-point messaging :' 
     
    639643         ENDIF 
    640644      ENDIF 
    641645      ! 
    642       IF (lwp) CLOSE(inum) 
     646      IF (llwrtlay) CLOSE(inum) 
    643647      ! 
    644648      DEALLOCATE(iin, ijn, ii_nono, ii_noea, ii_noso, ii_nowe,    & 
    645649         &       iimppt, ijmppt, ibondi, ibondj, ipproc, ipolj,   & 

6. sette/sette.sh

Finally, once implemented, these changes allow SETTE tests to be run without creating the full set of multiple output files. It is simply a case of replacing all ln_ctl settings like so:

290c291,294
<     set_namelist namelist_cfg ln_ctl .true.
---
>     set_namelist namelist_cfg ln_ctl .false.
>     set_namelist namelist_cfg sn_cfctl%l_config .true.
>     set_namelist namelist_cfg sn_cfctl%l_runstat .true.
>     set_namelist namelist_cfg sn_cfctl%l_trcstat .true.

Tests

Error: Failed to load processor box
No macro or processor named 'box' found

Review

Error: Failed to load processor box
No macro or processor named 'box' found