Version 9 (modified by acc, 18 months ago)

ENHANCE-04_AndrewC-reporting

Last edition: 01/17/19 13:32:32 by acc

The PI is responsible for closely following the progress of the action, and especially for contacting the NEMO project manager if the delay on preview (or review) is longer than the 2 weeks expected.

  1. Summary
  2. Preview
  3. Sebastien's comments on the first draft, many of which have already been incorporated into version2 above:
  4. Tests
  5. Review

Summary

Investigate ways of improving the code's reporting facilities. Currently, errors away from the lead process are not fully reported and the ln_ctl mechanism produces an overwhelming volume of output for large processor counts. Options are needed to produce more selective output (either by output type or processor range). This is an un-started task from the 2018WP (formerly ROBUST-06_AndrewC-reporting) that has been carried forward to 2019. #2167

Preview

(version 3 following initial preview by Sebastien (see his comments below) and tidying up of the existing nn_verbose_level settings to control the icebergs.stat files)

Since the preview step must be completed before the PI starts the coding, the previewers' answers are expected within two weeks of the PI sending the request to the previewer(s).
Then an iterative process should take place between the PI and previewer(s) in order to reach a consensus.

Possible bottlenecks:

  • the methodology
  • the flowchart and list of routines to be changed
  • the new list of variables wrt coding rules
  • the summary of updates in literature

Once an agreement has been reached, preview is ended and the PI can start the development into his branch.

One of the features of NEMO 4.0 is the large reduction in global communications at the cost of suppressing some global diagnostics which tend to be useful only during SETTE testing, debugging or new configuration development. Such diagnostics can be re-activated using the ln_ctl namelist variable but this is a rather blunt instrument in that it activates all extra output for all processing elements. The extra output can include:

run.stat
run.stat.nc
tracer.stat
ocean.output_XXXX
layout.dat_XXXX
mpp.output_XXXX
mpp.top.output_XXX     <---- correct this to 4-digits for consistency
EMPave.dat_XXXX        <---- These are all identical, suppress writing to write-master only
icebergs.stat_XXXX     <---- These should already be controlled by the nn_verbose_level variable; tidy up 
                             and ensure silent running with nn_verbose_level=0

and possibly others depending on runtime options. One of the 2019 NEMO workplan entries (ENHANCE-04_AndrewC-reporting) plans to introduce more control over the choice of output created but it may be best to introduce the basic structure prior to the v4.0 release. Below is a summary of the overall plans and what can be implemented immediately if there is consensus.

At this stage, the plan is to leave ln_ctl as an all-or-nothing switch but to add the basic structure for finer control. For the v4.0 release, this new structure will only be capable of activating the global stats files (run.stat, run.stat.nc and tracer.stat) independently of ln_ctl but the placeholders will be there for extended control over other outputs. Minimal changes to achieve this will affect:

1. OCE/IOM/in_out_manager.F90 cfgs/SHARED/namelist_ref
2. OCE/nemogcm.F90 OFF/nemogcm.F90 SAO/nemogcm.F90 SAS/nemogcm.F90
3. OCE/stpctl.F90
4. TOP/prtctl_trc.F90 TOP/trcini.F90 TOP/trcstp.F90
5. OCE/LBC/mppini.F90  OCE/SBC/sbcfwb.F90
6. Tidy up icb modules so that the existing nn_verbose_level parameter can be used to turn off icebergs.stat files
7. sette/sette.sh

To reiterate, these changes allow run.stat and tracer.stat to be produced even when ln_ctl is false. They also introduce additional controls, such as updating run.stat, tracer.stat and time.step at integer multiples of the time step rather than every time step, and restricting the production of some types of output (e.g. layout.dat) to a subset of processing regions. There are good arguments for separating out the multiple uses that ln_ctl has grown to support (see Sebastien's comments on its history) but I propose these changes as a quick, pre-release addition.

1. OCE/IOM/in_out_manager.F90 cfgs/SHARED/namelist_ref

These changes introduce a derived-type structure into the in_out_manager module and populate the reference namelist with default values.

There are 6 basic sets of parameters in this structure:

* l_config                        Activates use of the settings in the rest of the structure (specifically for when ln_ctl is false)
* l_runstat, l_trcstat            Activates production of global stats files. Only a single file of each of these is ever produced.
* l_oceout, l_layout              Normal operation is to produce a single version of each of these. If true then a version for each area is produced
* l_mppout, l_mpptop              Suppressed if false, otherwise produce a version for each area.
* procmin, procmax, procincr      Allow subsetting of areas when producing output in the previous two categories. Default values will ensure all areas report.
* ptimincr                        Timestep increment for outputting of time step status information less frequently (affects run.stat, tracer.stat and time.step only)
  • OCE/IOM/in_out_manager.F90

        !!                    output monitoring
        !!----------------------------------------------------------------------
        LOGICAL ::   ln_ctl           !: run control for debugging
    +   TYPE :: sn_ctl                !: optional use structure for finer control over output selection
    +      LOGICAL :: l_config  = .FALSE.  !: activate/deactivate finer control
    +                                      !  Note if l_config is True then ln_ctl is ignored.
    +                                      !  Otherwise setting ln_ctl True is equivalent to setting
    +                                      !  all the following logicals in this structure True
    +      LOGICAL :: l_runstat = .FALSE.  !: Produce/do not produce run.stat file (T/F)
    +      LOGICAL :: l_trcstat = .FALSE.  !: Produce/do not produce tracer.stat file (T/F)
    +      LOGICAL :: l_oceout  = .FALSE.  !: Produce all ocean.outputs    (T) or just one (F)
    +      LOGICAL :: l_layout  = .FALSE.  !: Produce all layout.dat files (T) or just one (F)
    +      LOGICAL :: l_mppout  = .FALSE.  !: Produce/do not produce mpp.output_XXXX files (T/F)
    +      LOGICAL :: l_mpptop  = .FALSE.  !: Produce/do not produce mpp.top.output_XXXX files (T/F)
    +                                      !  Optional subsetting of processor report files
    +                                      !  Default settings of 0/1000000/1 should ensure all areas report.
    +                                      !  Set to a more restrictive range to select specific areas
    +      INTEGER :: procmin   = 0        !: Minimum narea to output
    +      INTEGER :: procmax   = 1000000  !: Maximum narea to output
    +      INTEGER :: procincr  = 1        !: narea increment to output
    +      INTEGER :: ptimincr  = 1        !: timestep increment to output (time.step and run.stat)
    +   END TYPE
    +   TYPE (sn_ctl) :: sn_cfctl     !: run control structure for selective output
        LOGICAL ::   ln_timing        !: run control for timing
        LOGICAL ::   ln_diacfl        !: flag whether to create CFL diagnostics
        INTEGER ::   nn_print         !: level of print (0 no print)
  • ../cfgs/SHARED/namelist_ref

     !-----------------------------------------------------------------------
     &namctl        !   Control prints                                       (default: OFF)
     !-----------------------------------------------------------------------
    -   ln_ctl      = .false.   !  trends control print (expensive!)
    +   ln_ctl = .FALSE.                 ! Toggle all report printing on/off (T/F); Ignored if sn_cfctl%l_config is T
    +     sn_cfctl%l_config = .TRUE.     ! IF .true. then control which reports are written with the following
    +       sn_cfctl%l_runstat = .FALSE. ! switches and which areas produce reports with the proc integer settings.
    +       sn_cfctl%l_trcstat = .FALSE. ! The default settings for the proc integers should ensure
    +       sn_cfctl%l_oceout  = .FALSE. ! that  all areas report.
    +       sn_cfctl%l_layout  = .FALSE. !
    +       sn_cfctl%l_mppout  = .FALSE. !
    +       sn_cfctl%l_mpptop  = .FALSE. !
    +       sn_cfctl%procmin   = 0       ! Minimum area number for reporting [default:0]
    +       sn_cfctl%procmax   = 1000000 ! Maximum area number for reporting [default:1000000]
    +       sn_cfctl%procincr  = 1       ! Increment for optional subsetting of areas [default:1]
    +       sn_cfctl%ptimincr  = 1       ! Timestep increment for writing time step progress info
        nn_print    =    0      !  level of print (0 no extra print)
        nn_ictls    =    0      !  start i indice of control sum (use to compare mono versus
        nn_ictle    =    0      !  end   i indice of control sum        multi processor runs
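
For illustration, a configuration namelist could override the reference defaults as sketched below to obtain the global statistics files while ln_ctl stays off. The fragment uses only the entries listed above; placing it in a configuration's own namelist file is an assumption, and the values are examples rather than recommendations.

```fortran
!-----------------------------------------------------------------------
&namctl        !   Control prints (sketch: global stats files only)
!-----------------------------------------------------------------------
   ln_ctl = .FALSE.                 ! keep the all-or-nothing switch off
     sn_cfctl%l_config  = .TRUE.    ! use the finer controls below
     sn_cfctl%l_runstat = .TRUE.    ! produce run.stat (and run.stat.nc)
     sn_cfctl%l_trcstat = .TRUE.    ! produce tracer.stat
/
```

All other sn_cfctl entries are left at their reference values, so no additional per-area files are requested.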

2. OCE/nemogcm.F90, OFF/nemogcm.F90, SAO/nemogcm.F90, SAS/nemogcm.F90

The new structure is read and reported in nemogcm.F90. Changes for all variants are similar; OCE/nemogcm.F90 is shown here:

  • OCE/nemogcm.F90

           INTEGER  ::   ios, ilocal_comm   ! local integers
           CHARACTER(len=120), DIMENSION(60) ::   cltxt, cltxt2, clnam
           !!
    -      NAMELIST/namctl/ ln_ctl   , nn_print, nn_ictls, nn_ictle,   &
    -         &             nn_isplt , nn_jsplt, nn_jctls, nn_jctle,   &
    +      NAMELIST/namctl/ ln_ctl   , sn_cfctl, nn_print, nn_ictls, nn_ictle,   &
    +         &             nn_isplt , nn_jsplt, nn_jctls, nn_jctle,             &
              &             ln_timing, ln_diacfl
           NAMELIST/namcfg/ ln_read_cfg, cn_domcfg, ln_closea, ln_write_cfg, cn_domcfg_out, ln_use_jattr
           !!----------------------------------------------------------------------
    ...

           narea = narea + 1                                     ! mynode return the rank of proc (0 --> jpnij -1 )

    +      IF( sn_cfctl%l_config ) THEN
    +         ! Activate finer control of report outputs
    +         ! optionally switch off output from selected areas (note this only
    +         ! applies to output which does not involve global communications)
    +         IF( ( narea < sn_cfctl%procmin .OR. narea > sn_cfctl%procmax  ) .OR. &
    +           & ( MOD( narea - sn_cfctl%procmin, sn_cfctl%procincr ) /= 0 ) )    &
    +           &   CALL nemo_set_cfctl( sn_cfctl, .FALSE., .FALSE. )
    +      ELSE
    +         ! Use ln_ctl to turn on or off all options.
    +         CALL nemo_set_cfctl( sn_cfctl, ln_ctl, .TRUE. )
    +      ENDIF
    +
           lwm = (narea == 1)                                    ! control of output namelists
           lwp = (narea == 1) .OR. ln_ctl                        ! control of all listing output print

    ...
              WRITE(numout,*) '~~~~~~~~'
              WRITE(numout,*) '   Namelist namctl'
              WRITE(numout,*) '      run control (for debugging)     ln_ctl     = ', ln_ctl
    +         WRITE(numout,*) '       finer control over o/p sn_cfctl%l_config  = ', sn_cfctl%l_config
    +         WRITE(numout,*) '                              sn_cfctl%l_runstat = ', sn_cfctl%l_runstat
    +         WRITE(numout,*) '                              sn_cfctl%l_trcstat = ', sn_cfctl%l_trcstat
    +         WRITE(numout,*) '                              sn_cfctl%l_oceout  = ', sn_cfctl%l_oceout
    +         WRITE(numout,*) '                              sn_cfctl%l_layout  = ', sn_cfctl%l_layout
    +         WRITE(numout,*) '                              sn_cfctl%l_mppout  = ', sn_cfctl%l_mppout
    +         WRITE(numout,*) '                              sn_cfctl%l_mpptop  = ', sn_cfctl%l_mpptop
    +         WRITE(numout,*) '                              sn_cfctl%procmin   = ', sn_cfctl%procmin
    +         WRITE(numout,*) '                              sn_cfctl%procmax   = ', sn_cfctl%procmax
    +         WRITE(numout,*) '                              sn_cfctl%procincr  = ', sn_cfctl%procincr
    +         WRITE(numout,*) '                              sn_cfctl%ptimincr  = ', sn_cfctl%ptimincr
              WRITE(numout,*) '      level of print                  nn_print   = ', nn_print
              WRITE(numout,*) '      Start i indice for SUM control  nn_ictls   = ', nn_ictls
              WRITE(numout,*) '      End i indice for SUM control    nn_ictle   = ', nn_ictle
    ...
           !
        END SUBROUTINE nemo_alloc

    +   SUBROUTINE nemo_set_cfctl(sn_cfctl, setto, for_all )
    +      !!----------------------------------------------------------------------
    +      !!                     ***  ROUTINE nemo_set_cfctl  ***
    +      !!
    +      !! ** Purpose :   Set elements of the output control structure to setto.
    +      !!                for_all should be .false. unless all areas are to be
    +      !!                treated identically.
    +      !!
    +      !! ** Method  :   Note this routine can be used to switch on/off some
    +      !!                types of output for selected areas but any output types
    +      !!                that involve global communications (e.g. mpp_max, glob_sum)
    +      !!                should be protected from selective switching by the
    +      !!                for_all argument
    +      !!----------------------------------------------------------------------
    +      LOGICAL :: setto, for_all
    +      TYPE (sn_ctl) :: sn_cfctl
    +      !!----------------------------------------------------------------------
    +      IF( for_all ) THEN
    +         sn_cfctl%l_runstat = setto
    +         sn_cfctl%l_trcstat = setto
    +      ENDIF
    +      sn_cfctl%l_oceout  = setto
    +      sn_cfctl%l_layout  = setto
    +      sn_cfctl%l_mppout  = setto
    +      sn_cfctl%l_mpptop  = setto
    +   END SUBROUTINE nemo_set_cfctl
    +
        !!======================================================================
     END MODULE nemogcm

Eventually, setting sn_cfctl%l_config true will force ln_ctl to be false, but the current implementation is incomplete and can only be used to activate run.stat, tracer.stat and multiple layout.dat files independently of ln_ctl. ln_ctl, therefore, remains the overriding control for all outputs.

3. OCE/stpctl.F90

The logic changes required in stpctl.F90 to control the production of run.stat are relatively simple. Three new local logicals are introduced to make the purpose of the logical constructs clearer. These determine whether or not to collect the global maxima (ll_colruns), whether or not to handle the actual writing (ll_wrtruns) and whether or not to update the time.step file (ll_wrtstp).

  • OCE/stpctl.F90

           INTEGER, DIMENSION(3)  ::   iu, is1, is2        ! min/max loc indices
           REAL(wp)               ::   zzz                 ! local real
           REAL(wp), DIMENSION(9) ::   zmax
    +      LOGICAL                ::   ll_wrtstp, ll_colruns, ll_wrtruns
           CHARACTER(len=20) :: clname
           !!----------------------------------------------------------------------
           !
    +      ll_wrtstp  = ( MOD( kt, sn_cfctl%ptimincr ) == 0 ) .OR. ( kt == nitend )
    +      ll_colruns = ll_wrtstp .AND. ( ln_ctl .OR. sn_cfctl%l_runstat )
    +      ll_wrtruns = ll_colruns .AND. lwm
           IF( kt == nit000 .AND. lwp ) THEN
              WRITE(numout,*)
              WRITE(numout,*) 'stp_ctl : time-stepping control'
              WRITE(numout,*) '~~~~~~~'
              !                                ! open time.step file
              IF( lwm ) CALL ctl_opn( numstp, 'time.step', 'REPLACE', 'FORMATTED', 'SEQUENTIAL', -1, numout, lwp, narea )
    -         !                                ! open run.stat file
    -         IF( ln_ctl .AND. lwm ) THEN
    +         !                                ! open run.stat file(s) at start whatever
    +         !                                ! the value of sn_cfctl%ptimincr
    +         IF( lwm .AND. ( ln_ctl .OR. sn_cfctl%l_runstat ) ) THEN
                 CALL ctl_opn( numrun, 'run.stat', 'REPLACE', 'FORMATTED', 'SEQUENTIAL', -1, numout, lwp, narea )
                 clname = 'run.stat.nc'
                 IF( .NOT. Agrif_Root() )   clname = TRIM(Agrif_CFixed())//"_"//TRIM(clname)
    ...
           ENDIF
           IF( kt == nit000 )   lsomeoce = COUNT( ssmask(:,:) == 1._wp ) > 0
           !
    -      IF(lwm) THEN                        !==  current time step  ==!   ("time.step" file)
    +      IF(lwm .AND. ll_wrtstp) THEN        !==  current time step  ==!   ("time.step" file)
              WRITE ( numstp, '(1x, i8)' )   kt
              REWIND( numstp )
           ENDIF
    ...
              zmax(9) = MAXVAL(   Cu_adv(:,:,:)   , mask = tmask(:,:,:) == 1._wp ) !       cell Courant no. max
           ENDIF
           !
    -      IF( ln_ctl ) THEN
    +      IF( ll_colruns ) THEN
              CALL mpp_max( "stpctl", zmax )          ! max over the global domain
              nstop = NINT( zmax(7) )                 ! nstop indicator sheared among all local domains
           ENDIF
           !                                   !==  run statistics  ==!   ("run.stat" files)
    -      IF( ln_ctl .AND. lwm ) THEN
    +      IF( ll_wrtruns ) THEN
              WRITE(numrun,9500) kt, zmax(1), zmax(2), -zmax(3), zmax(4)
              istatus = NF90_PUT_VAR( idrun, idssh, (/ zmax(1)/), (/kt/), (/1/) )
              istatus = NF90_PUT_VAR( idrun,   idu, (/ zmax(2)/), (/kt/), (/1/) )
4. TOP/prtctl_trc.F90 TOP/trcini.F90 TOP/trcstp.F90

Changes to control the production of tracer.stat follow similar lines with the introduction of a ll_trcstat local logical. Note also changes to prtctl_trc.F90 to make mpp.top.output filenames compatible with other similar filenames (i.e. use I4.4 for area number). This one is a little off message because the control for this TOP output is read in OCE/nemogcm.F90. I think this makes sense, and moving it to the TOP namelist just for the sake of keeping OCE and TOP fully independent seems unnecessary; TBD.

  • TOP/prtctl_trc.F90

           IF( lk_mpp ) THEN
              sind = narea
              eind = narea
    -         clb_name = "('mpp.top.output_',I3.3)"
    +         clb_name = "('mpp.top.output_',I4.4)"
              cl_run = 'MULTI processor run'
              ! use indices for each area computed by mpp_init subroutine
              nlditl(1:jpnij) = nldit(:)
    ...
           ELSE
              sind = 1
              eind = ijsplt
    -         clb_name = "('mono.top.output_',I3.3)"
    +         clb_name = "('mono.top.output_',I4.4)"
              cl_run   = 'MONO processor run '
              ! compute indices for each area as done in mpp_init subroutine
              CALL sub_dom
  • TOP/trcini.F90

           CALL trc_ini_trp   ! passive tracers transport
           CALL trc_ice_ini   ! Tracers in sea ice
           !
    -      IF(lwm) CALL ctl_opn( numstr, 'tracer.stat', 'REPLACE', 'FORMATTED', 'SEQUENTIAL', -1, numout, lwp , narea )
    +      IF( lwm .AND. ( ln_ctl .OR. (sn_cfctl%l_config .AND. sn_cfctl%l_trcstat) ) ) THEN
    +         CALL ctl_opn( numstr, 'tracer.stat', 'REPLACE', 'FORMATTED', 'SEQUENTIAL', -1, numout, lwp , narea )
    +      ENDIF
           !
           CALL trc_ini_state  !  passive tracers initialisation : from a restart or from clim
           IF( nn_dttrc /= 1 ) &
  • TOP/trcstp.F90

           !
           INTEGER ::   jk, jn   ! dummy loop indices
           REAL(wp)::   ztrai    ! local scalar
    +      LOGICAL ::   ll_trcstat ! local logical
           CHARACTER (len=25) ::   charout   !
           !!-------------------------------------------------------------------
           !
    ...
              r2dttrc = 2. * rdttrc       ! = 2 rdttrc (leapfrog)
           ENDIF
           !
    +      ll_trcstat  = ( ln_ctl .OR. sn_cfctl%l_trcstat ) .AND. &
    +     &              ( ( MOD( kt, sn_cfctl%ptimincr ) == 0 ) .OR. ( kt == nitend ) )
           IF( kt == nittrc000 .AND. lk_trdmxl_trc )  CALL trd_mxl_trc_init    ! trends: Mixed-layer
           !
           IF( .NOT.ln_linssh ) THEN                                           ! update ocean volume due to ssh temporal evolution
    ...
              !
           ENDIF
           !
    -      IF (ln_ctl ) THEN
    +      IF (ll_trcstat) THEN
              ztrai = 0._wp                                                   !  content of all tracers
              DO jn = 1, jptra
                 ztrai = ztrai + glob_sum( 'trcstp', trn(:,:,:,jn) * cvol(:,:,:)   )
5. OCE/LBC/mppini.F90 OCE/SBC/sbcfwb.F90

Control over the creation of multiple layout.dat files is easy to implement and so has been done as an example of how output in this category will be handled in future. For this category, the area subsetting can be used to restrict which areas produce files. The standard layout.dat file, written by narea = 1, is always produced. Also, OCE/SBC/sbcfwb.F90 has been fixed so that only one EMPave.dat file is ever read or produced. Ultimately, only one processor should read this file and MPI_BCAST the values to all others as a more scalable solution (for namelists too?).
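
A sketch of how the area subsetting might be used for layout.dat, assuming the test added to OCE/nemogcm.F90 is applied exactly as shown earlier: an area keeps its settings only if procmin <= narea <= procmax and MOD( narea - procmin, procincr ) == 0. The values below are illustrative only.

```fortran
&namctl
   ln_ctl = .FALSE.                 ! all-or-nothing switch stays off
     sn_cfctl%l_config  = .TRUE.    ! use the finer controls
     sn_cfctl%l_layout  = .TRUE.    ! request per-area layout.dat files
     sn_cfctl%procmin   = 2         ! first area allowed to report
     sn_cfctl%procmax   = 7         ! last area allowed to report
     sn_cfctl%procincr  = 2         ! every second area within that range
/
```

With these values areas 2, 4 and 6 pass the test and keep l_layout true, so each writes its own layout.dat file in addition to the standard file written by narea = 1. Every other area has l_layout reset to false by nemo_set_cfctl.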

  • OCE/LBC/mppini.F90

           INTEGER ::   ierr, ios                  !
           INTEGER ::   inbi, inbj, iimax,  ijmax, icnt1, icnt2
           LOGICAL ::   llbest
    +      LOGICAL ::   llwrtlay
           INTEGER, ALLOCATABLE, DIMENSION(:)     ::   iin, ii_nono, ii_noea          ! 1D workspace
           INTEGER, ALLOCATABLE, DIMENSION(:)     ::   ijn, ii_noso, ii_nowe          !  -     -
           INTEGER, ALLOCATABLE, DIMENSION(:,:) ::   iimppt, ilci, ibondi, ipproc   ! 2D workspace
    ...
                &             ln_vol, nn_volctl, nn_rimwidth, nb_jpk_bdy
           !!----------------------------------------------------------------------

    +      llwrtlay = lwp .OR. ln_ctl .OR. ( sn_cfctl%l_config .AND.  sn_cfctl%l_layout )
           ! do we need to take into account bdy_msk?
           REWIND( numnam_ref )              ! Namelist nambdy in reference namelist : BDY
           READ  ( numnam_ref, nambdy, IOSTAT = ios, ERR = 903)
    ...
           END DO

           ! Save processor layout in ascii file
    -      IF (lwp) THEN
    +      IF (llwrtlay) THEN
              CALL ctl_opn( inum, 'layout.dat', 'REPLACE', 'FORMATTED', 'SEQUENTIAL', -1, numout, .FALSE., narea )
              WRITE(inum,'(a)') '   jpnij   jpimax  jpjmax    jpk  jpiglo  jpjglo'//&
        &           ' ( local:    narea     jpi     jpj )'
    ...
                 WRITE(numout,*)
                 WRITE(numout,*) '   ==>>>   North fold boundary prepared for jpni >1'
                 ! additional prints in layout.dat
    +         ENDIF
    +         IF (llwrtlay) THEN
                 WRITE(inum,*)
                 WRITE(inum,*)
                 WRITE(inum,*) 'number of subdomains located along the north fold : ', ndim_rank_north
    ...
           !
           IF( ln_nnogather ) THEN
              CALL mpp_init_nfdcom     ! northfold neighbour lists
    -         IF (lwp) THEN
    +         IF (llwrtlay) THEN
                 WRITE(inum,*)
                 WRITE(inum,*)
                 WRITE(inum,*) 'north fold exchanges with explicit point-to-point messaging :'
    ...
              ENDIF
           ENDIF
           !
    -      IF (lwp) CLOSE(inum)
    +      IF (llwrtlay) CLOSE(inum)
           !
           DEALLOCATE(iin, ijn, ii_nono, ii_noea, ii_noso, ii_nowe,    &
              &       iimppt, ijmppt, ibondi, ibondj, ipproc, ipolj,   &
  • OCE/SBC/sbcfwb.F90

                 qns(:,:) = qns(:,:) - zcoef * sst_m(:,:) * tmask(:,:,1) ! account for change to the heat budget due to fw correction
              ENDIF
              !
    -         IF( kt == nitend .AND. lwp ) THEN            ! save fwfold value in a file
    +         IF( kt == nitend .AND. lwm ) THEN            ! save fwfold value in a file (only one required)
                 CALL ctl_opn( inum, 'EMPave.dat', 'REPLACE', 'FORMATTED', 'SEQUENTIAL', -1, numout, .FALSE., narea )
                 WRITE( inum, "(24X,I8,2ES24.16)" ) nyear, a_fwb_b, a_fwb
                 CLOSE( inum )

6. Tidy up icb modules so that the existing nn_verbose_level parameter can be used to turn off icebergs.stat files

The icb modules already contain their own mechanism for controlling the verbosity of the icebergs.stat file reporting. This is done via the nn_verbose_level namelist parameter, which is set to a default value of 1. Output is minimal for nn_verbose_level=0 but is not switched off completely. Thus the icebergs.stat files are always created even when not required. This is particularly annoying at larger processor counts. Only a little work is required to ensure that nn_verbose_level=0 runs silently and does not create the unwanted files:

Changes required in:
OCE/ICB/icbdia.F90
OCE/ICB/icbdyn.F90
OCE/ICB/icbini.F90
OCE/ICB/icblbc.F90
OCE/ICB/icbstp.F90
OCE/ICB/icbutl.F90
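
Once these changes are in place, silent running would be requested from the namelist along the lines sketched here. The text above does not name the namelist group that holds nn_verbose_level; &namberg is assumed, so check it against the reference namelist.

```fortran
&namberg       ! iceberg parameters (group name is an assumption)
   nn_verbose_level = 0    ! no icebergs.stat files are opened or written
/
```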
  • OCE/ICB/icbdia.F90

     
    326326            CALL report_consistant( 'bot interface','kg','sent',calving_out_net, & 
    327327               &                    'returned',calving_ret_net) 
    328328         ENDIF 
    329          WRITE( numicb, '("calved by class = ",i6,20(",",i6))') (nbergs_calved_by_class(ik),ik=1,nclasses) 
    330          IF( nspeeding_tickets > 0 )   WRITE( numicb, '("speeding tickets issued = ",i6)') nspeeding_tickets 
     329         IF (nn_verbose_level > 0) THEN 
     330            WRITE( numicb, '("calved by class = ",i6,20(",",i6))') (nbergs_calved_by_class(ik),ik=1,nclasses) 
     331            IF( nspeeding_tickets > 0 )   WRITE( numicb, '("speeding tickets issued = ",i6)') nspeeding_tickets 
     332         ENDIF 
    331333         ! 
    332334         nbergs_start              = nbergs_end 
    333335         stored_start              = stored_end 
     
    436438      IF( kt == nit000 ) THEN 
    437439         stored_start = SUM( berg_grid%stored_ice(:,:,:) ) 
    438440         CALL mpp_sum( 'icbdia', stored_start ) 
    439          WRITE(numicb,'(a,es13.6,a)')   'icb_dia_income: initial stored mass=',stored_start,' kg' 
    440441         ! 
    441442         stored_heat_start = SUM( berg_grid%stored_heat(:,:) ) 
    442443         CALL mpp_sum( 'icbdia', stored_heat_start ) 
    443          WRITE(numicb,'(a,es13.6,a)')    'icb_dia_income: initial stored heat=',stored_heat_start,' J' 
     444         IF (nn_verbose_level > 0) THEN 
     445            WRITE(numicb,'(a,es13.6,a)')   'icb_dia_income: initial stored mass=',stored_start,' kg' 
     446            WRITE(numicb,'(a,es13.6,a)')   'icb_dia_income: initial stored heat=',stored_heat_start,' J' 
     447         ENDIF 
    444448      ENDIF 
    445449      ! 
    446450      calving_rcv_net = calving_rcv_net + SUM( berg_grid%calving(:,:) ) * berg_dt 
     
    514518      INTEGER,       INTENT(in), OPTIONAL :: kbergs 
    515519      !!---------------------------------------------------------------------- 
    516520      ! 
     521      IF (nn_verbose_level == 0) RETURN 
    517522      IF( PRESENT(kbergs) ) THEN 
    518523         WRITE(numicb,100) cd_budgetstr // ' state:',                                    & 
    519524            &              cd_startstr  // ' start',  pstartval,         cd_budgetunits, & 
     
    538543      REAL(wp),      INTENT(in) :: pstartval, pendval 
    539544      !!---------------------------------------------------------------------- 
    540545      ! 
     546      IF (nn_verbose_level == 0) RETURN 
    541547      WRITE(numicb,200) cd_budgetstr // ' check:',                 & 
    542548         &              cd_startstr,    pstartval, cd_budgetunits, & 
    543549         &              cd_endstr,      pendval,   cd_budgetunits, & 
     
    557563      REAL(wp) ::   zval 
    558564      !!---------------------------------------------------------------------- 
    559565      ! 
     566      IF (nn_verbose_level == 0) RETURN 
    560567      zval = ( ( pendval - pstartval ) - ( pinval - poutval ) ) /   & 
    561568         &   MAX( 1.e-30, MAX( ABS( pendval - pstartval ) , ABS( pinval - poutval ) ) ) 
    562569         ! 
     
    577584      INTEGER      , INTENT(in) ::   pstartval, pendval 
    578585      !!---------------------------------------------------------------------- 
    579586      ! 
     587      IF (nn_verbose_level == 0) RETURN 
    580588      WRITE(numicb,100) cd_budgetstr // ' state:',           & 
    581589         &              cd_startstr  // ' start', pstartval, & 
    582590         &              cd_endstr    // ' end',   pendval,   & 
     
    594602      INTEGER,       INTENT(in) :: pinval, poutval, pstartval, pendval 
    595603      !!---------------------------------------------------------------------- 
    596604      ! 
     605      IF (nn_verbose_level == 0) RETURN 
    597606      WRITE(numicb,200) cd_budgetstr // ' budget:', & 
    598607         &              cd_instr     // ' in',      pinval, & 
    599608         &              cd_outstr    // ' out',     poutval, & 
  • OCE/ICB/icbdyn.F90

     
    370370         ENDIF 
    371371      ENDIF 
    372372      !                                      ! check the speed and acceleration limits 
    373       IF( ABS( zuveln ) > pp_vel_lim   .OR. ABS( zvveln ) > pp_vel_lim   )   & 
    374          WRITE(numicb,'("pe=",i3,x,a)') narea,'Dump triggered by excessive velocity' 
    375       IF( ABS( pax    ) > pp_accel_lim .OR. ABS( pay    ) > pp_accel_lim )   & 
    376          WRITE(numicb,'("pe=",i3,x,a)') narea,'Dump triggered by excessive acceleration' 
     373      IF (nn_verbose_level > 0) THEN 
     374         IF( ABS( zuveln ) > pp_vel_lim   .OR. ABS( zvveln ) > pp_vel_lim   )   & 
     375            WRITE(numicb,'("pe=",i3,x,a)') narea,'Dump triggered by excessive velocity' 
     376         IF( ABS( pax    ) > pp_accel_lim .OR. ABS( pay    ) > pp_accel_lim )   & 
     377            WRITE(numicb,'("pe=",i3,x,a)') narea,'Dump triggered by excessive acceleration' 
     378      ENDIF 
    377379      ! 
    378380   END SUBROUTINE icb_accel 
    379381 
  • OCE/ICB/icbini.F90

     
       !                          ! open ascii output file or files for iceberg status information
       !                          ! note that we choose to do this on all processors since we cannot
       !                          ! predict where icebergs will be ahead of time
-      CALL ctl_opn( numicb, 'icebergs.stat', 'REPLACE', 'FORMATTED', 'SEQUENTIAL', -1, numout, lwp, narea )
+      IF( nn_verbose_level > 0) THEN
+         CALL ctl_opn( numicb, 'icebergs.stat', 'REPLACE', 'FORMATTED', 'SEQUENTIAL', -1, numout, lwp, narea )
+      ENDIF

       ! set parameters (mostly from namelist)
       !

          ENDIF
          CALL iom_close( inum )                                     ! close file
          !
-         WRITE(numicb,*)
-         WRITE(numicb,*) '          calving read in a file'
+         IF( nn_verbose_level > 0) THEN
+            WRITE(numicb,*)
+            WRITE(numicb,*) '          calving read in a file'
+         ENDIF
          ALLOCATE( sf_icb(1), STAT=istat1 )         ! Create sf_icb structure (calving)
          ALLOCATE( sf_icb(1)%fnow(jpi,jpj,1), STAT=istat2 )
          ALLOCATE( sf_icb(1)%fdta(jpi,jpj,1,2), STAT=istat3 )

       !
       ibergs = icb_utl_count()
       CALL mpp_sum('icbini', ibergs)
-      WRITE(numicb,'(a,i6,a)') 'diamonds, icb_ini_gen: ',ibergs,' were generated'
+      IF( nn_verbose_level > 0) THEN
+         WRITE(numicb,'(a,i6,a)') 'diamonds, icb_ini_gen: ',ibergs,' were generated'
+      ENDIF
       !
    END SUBROUTINE icb_ini_gen

  • OCE/ICB/icblbc.F90

     

             zsbergs(0) = narea
             zsbergs(1) = nicbfldnsend(jn)
-            !IF ( nicbfldnsend(jn) .GT. 0) write(numicb,*) 'ICB sending ',nicbfldnsend(jn),' to ', ifldproc
+            !IF ( nicbfldnsend(jn) .GT. 0 .AND. nn_verbose_level > 0 ) write(numicb,*) 'ICB sending ',nicbfldnsend(jn),' to ', ifldproc
             CALL mppsend( 21, zsbergs(0:1), 2, ifldproc-1, nicbfldreq(jn))
          ENDIF
          !

             DO jjn = 1,jpni
              IF( nicbfldproc(jjn) .eq. INT(znbergs(1)) ) EXIT
             END DO
-            IF( jjn .GT. jpni ) write(numicb,*) 'ICB ERROR'
+            IF( jjn .GT. jpni .AND. nn_verbose_level > 0 ) write(numicb,*) 'ICB ERROR'
             nicbfldexpect(jjn) = INT( znbergs(2) )
-            !IF ( nicbfldexpect(jjn) .GT. 0) write(numicb,*) 'ICB expecting ',nicbfldexpect(jjn),' from ', nicbfldproc(jjn)
-            !CALL FLUSH(numicb)
+            !IF ( nicbfldexpect(jjn) .GT. 0 .AND. nn_verbose_level > 0 ) write(numicb,*) 'ICB expecting ',nicbfldexpect(jjn),' from ', nicbfldproc(jjn)
+            !IF (nn_verbose_level > 0) CALL FLUSH(numicb)
          ENDIF
          !
       END DO

             DEALLOCATE(old)
          ENDIF
          old => new
-        !WRITE( numicb,*) 'icb_increase_ibuffer',narea,' increased to',inew_size
+         !IF (nn_verbose_level > 0) WRITE( numicb,*) 'icb_increase_ibuffer',narea,' increased to',inew_size
       ENDIF
       !
    END SUBROUTINE icb_increase_ibuffer
  • OCE/ICB/icbstp.F90

     

       IF(lwp) WRITE(numout,'(a,i6)') 'icebergs: icb_end complete', narea
       !
-      CALL flush( numicb )
-      CLOSE( numicb )
+      IF( nn_verbose_level > 0 ) THEN
+         CALL flush( numicb )
+         CLOSE( numicb )
+      ENDIF
       !
    END SUBROUTINE icb_end

  • OCE/ICB/icbutl.F90

     
       INTEGER                :: kt      ! timestep number
       !!----------------------------------------------------------------------
       !
+      IF (nn_verbose_level == 0) RETURN
       pt => berg%current_point
       WRITE(numicb, 9200) kt, berg%number(1), &
                    pt%xi, pt%yj, pt%lon, pt%lat, pt%uvel, pt%vvel,  &

       TYPE(iceberg), POINTER :: this
       !!----------------------------------------------------------------------
       !
+      IF (nn_verbose_level == 0) RETURN
       this => first_berg
       IF( ASSOCIATED(this) ) THEN
          WRITE(numicb,'(a," pe=(",i3,")")' ) cd_label, narea

7. sette/sette.sh

Finally, once implemented, these changes allow SETTE tests to be run without creating the full set of multiple output files. It is simply a case of replacing all ln_ctl settings like so:

  • sette.sh

     
     set_namelist namelist_cfg nn_stock 495
     set_namelist namelist_cfg jpni 4
     set_namelist namelist_cfg jpnj 8
-    set_namelist namelist_cfg ln_ctl .true.
+    set_namelist namelist_cfg ln_ctl .false.
+    set_namelist namelist_cfg sn_cfctl%l_config .true.
+    set_namelist namelist_cfg sn_cfctl%l_runstat .true.
+    set_namelist namelist_cfg sn_cfctl%l_trcstat .true.
+    set_namelist namelist_cfg nn_verbose_level 0
     set_namelist namelist_cfg ln_use_calving .true.
     set_namelist namelist_cfg ln_wave .true.
     set_namelist namelist_cfg ln_cdgw .true.
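
The replacement above can be sketched as a small shell helper. This is only an illustration: set_namelist_sketch is a hypothetical, simplified stand-in for SETTE's own set_namelist function (whose real behaviour may differ), and apply_selective_reporting is a hypothetical wrapper name. It assumes each variable sits on its own "name = value" line in the namelist file.

```shell
#!/bin/sh
# set_namelist_sketch FILE VAR VALUE
# Hypothetical stand-in for SETTE's set_namelist: rewrite the line that
# assigns VAR in FILE so that it reads "VAR = VALUE".
set_namelist_sketch() {
    file=$1; var=$2; val=$3
    tmp="${file}.tmp.$$"
    awk -v var="$var" -v val="$val" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)                # ignore leading blanks
            rest = substr(line, length(var) + 1)    # text following the name
            if (index(line, var) == 1 && rest ~ /^[ \t]*=/) {
                print "   " var " = " val           # rewrite the assignment
            } else {
                print                               # keep every other line
            }
        }' "$file" > "$tmp" && mv "$tmp" "$file"
}

# apply_selective_reporting FILE
# Apply the five settings that replace "ln_ctl .true." in the diff above.
apply_selective_reporting() {
    nml=$1
    set_namelist_sketch "$nml" ln_ctl .false.
    set_namelist_sketch "$nml" 'sn_cfctl%l_config' .true.
    set_namelist_sketch "$nml" 'sn_cfctl%l_runstat' .true.
    set_namelist_sketch "$nml" 'sn_cfctl%l_trcstat' .true.
    set_namelist_sketch "$nml" nn_verbose_level 0
}
```

Calling apply_selective_reporting namelist_cfg would then apply all five settings at once. Note that this sketch drops any trailing comment on a rewritten line and does not add a variable that is missing from the file.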

Sebastien's comments on the first draft, many of which have already been incorporated into version2 above:

Maybe you know this story, but in case you don’t know… :-)

1) A long time ago - in a galaxy far far away - … ln_ctl was there (since forever? at least since rev3) and was a control print of the trend over the first core.

2) At rev258 https://forge.ipsl.jussieu.fr/nemo/changeset/258 the concept was modified and extended to be able to debug mpi reproducibility. prtctl was added but, to my knowledge, never documented. The idea is to compare the mean value of every trend in the code over (1) the local mpi domain and (2) the same domain in a 1-core simulation (or an n-core simulation, as long as the domain you want to test is included in your mpi subdomain). You compare your mpp.output_xxxx and mono.output_xxxx files, and the first difference tells you which core and which trend is the first to diverge.

3) Next, as prtctl was dealing with mpp.output_xxxx and mono.output_xxxx files, we again extended the use of ln_ctl to control the creation of any file through the definition of lwp: https://forge.ipsl.jussieu.fr/nemo/changeset/1579 This was a bad idea because, with 1 variable (ln_ctl), we control 2 different concepts: (1) mpp debug, which requires the production of mpp.output_xxxx and mono.output_xxxx files, and (2) the control of all other output files (ocean.output, run.stat, layout.dat…).

4) For performance reasons, Eric and I wanted to be able to remove the globsum in stpctl. At some point we thought that we should introduce a new namelist variable, something like ln_fast or ln_prod or ln_debug or nn_debug, which would allow us to reduce the printed information and use faster code (for example without globsum in stpctl). After some discussions, we thought that our idea was not clear enough to be introduced in the code before the release in December. We therefore decided to postpone this and simply use ln_ctl to switch the globsum on/off (because if ln_ctl = T, you don’t care about performance, so you can do globsum in stpctl). Once you switch off the globsum, the data in run.stat are useless, so we also switch off the run.stat files.

So today, ln_ctl is mixing different functionalities coming from these 4 layers of development. I think we should take advantage of your development to clean up this mess!
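
The file comparison described in point 2 can be sketched in shell as follows. This is a minimal illustration, not part of NEMO or SETTE: first_diff_line and report_divergence are hypothetical helper names, and the sketch assumes that each mpp.output_xxxx file is paired with the mono.output_xxxx file carrying the same suffix and that matching lines of the two files hold the same control print.

```shell
#!/bin/sh
# first_diff_line FILE_A FILE_B
# Print the number of the first line at which the two files differ.
# Print nothing if the files are identical.
first_diff_line() {
    awk '
        NR == FNR { ref[FNR] = $0; nref = FNR; next }   # store the first file
        { nsec = FNR }                                  # lines seen in second file
        !(FNR in ref) || ref[FNR] != $0 { print FNR; found = 1; exit }
        END { if (!found && nsec < nref) print nsec + 1 }   # second file is shorter
    ' "$1" "$2"
}

# report_divergence [DIR]
# For every mpp.output_<suffix> in DIR that has a mono.output_<suffix>
# partner (assumed pairing), report the first differing line, if any.
report_divergence() {
    dir=${1:-.}
    for mpp in "$dir"/mpp.output_*; do
        [ -e "$mpp" ] || continue                # no mpp.output_* files at all
        suffix=${mpp##*/mpp.output_}
        mono="$dir/mono.output_$suffix"
        [ -e "$mono" ] || continue               # no partner file to compare
        line=$(first_diff_line "$mpp" "$mono")
        [ -n "$line" ] && echo "$suffix: first difference at line $line"
    done
    return 0
}
```

Running report_divergence in the run directory lists only the suffixes whose files differ; the content of the reported line then identifies the trend that diverged first.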

One proposal (to be discussed) could be to split ln_ctl into 3 functionalities:

  • mpp debug associated with prtctl. This could be renamed, for example to prttrd or trdprt for "trend print", or mppdbg for mpp debug. This part requires the creation of the mpp.output_xxxx and mono.output_xxxx files which, to me, should be controlled only by whether or not prtctl is activated
  • something (a logical, an integer, a structure) related to the balance between verbosity-control-debug and performance. This would control/replace/be linked with the creation of run.stat files, nn_print, ln_timing and maybe also layout.dat
  • use your sn_ctl to control other files

Other minor points :

  • EMPave.dat: I had a quick look and it seems that this file is the same for each core (it contains only the year and global mean). So we should not offer the possibility to create EMPave.dat_xxxx files. Either all cores read the same file (as we do for the namelist) or only core 0 reads it and uses mpi_broadcast to send the information to all other cores (as, I think, we should also do for the namelists).
    Response: First part done; will consider use of mpi_bcast in 2019.
  • We have files created by oce, si3 and top. Maybe it is strange if a part of OCE controls the creation of top files like tracer.stat, no?
    Response: To be discussed.
  • Instead of having to test sn_cfctl%l_config everywhere, as for example in "sn_cfctl%l_config .AND. sn_cfctl%l_trcstat", I would, in nemo_set_cfctl, do sn_cfctl%l_trcstat = setto .AND. sn_cfctl%l_config
    Response: Actually the test of l_config isn't needed at the lower levels; removed.
  • There is also the time.step file.
    Response: Not sure we ever want to switch this off, but it can now be updated less frequently according to ptimincr.
  • Do we really need the sn_ctl structure? Why not a simple list of logicals and integers?
    Response: To be discussed.
  • I would like to “promote” the use of ln_timing, except in production mode, so people get used to looking at this file to see the main bottlenecks in computational cost and communications…
    Response: Think this is a separate issue unless multiple files are produced?

Tests

Once the development is done, the PI should complete the tests section below and then ask the reviewers to start their review.

This part should contain the detailed results of the SETTE tests (restartability and reproducibility for each of the reference configurations) and the detailed results of restartability and reproducibility when the option is activated on the specified configurations used for this test.

Regular checks:

  • Can this change be shown to produce expected impact (option activated)?
  • Can this change be shown to have a null impact (option not activated)?
  • Have the required bit comparability tests been run, and are there no differences when activating the development?
  • If some differences appear, is the reason for the change valid/understood?
  • If some differences appear, is the impact as expected on model configurations?
  • Is this change expected to preserve all diagnostics?
  • If no, is the reason for the change valid/understood?
  • Are there significant changes in run time/memory?

Review

A successful review is needed to schedule the merge of this development into the future NEMO release during the next Merge Party (usually in November).

Assessments:

  • Is the proposed methodology now implemented?
  • Are the code changes in agreement with the flowchart defined at preview step?
  • Are the code changes in agreement with list of routines and variables as proposed at preview step?
    If not, are the discrepancies acceptable?
  • Is the in-line documentation accurate and sufficient?
  • Do the code changes comply with NEMO coding standards?
  • Is the development documented with sufficient details for others to understand the impact of the change?
  • Is the project literature (manual, guide, web, …) now updated or completed following the proposed summary in preview section?

Finding:

Is the review fully successful? If not, please indicate what is still missing.


Once the review is successful, the development must be scheduled for merge during the next Merge Party Meeting.