Version 4 (modified by nicolasmartin, 6 years ago) (diff) |
---|
ENHANCE-04_AndrewC-reporting
Last edition: Wikinfo(changed_ts)? by Wikinfo(changed_by)?
The PI is responsible to closely follow the progress of the action, and especially to contact NEMO project manager if the delay on preview (or review) are longer than the 2 weeks expected.
Summary
Investigate ways of improving the code's reporting facilities. Currently, errors away from the lead process are not fully reported and the ln_ctl mechanism produces an overwhelming volume of output for large processor counts. Options are needed to produce more selective output (either by output type or processor range). This is an un-started task from the 2018WP (formerly ROBUST-06_AndrewC-reporting) that has been carried forward to 2019. #2167
Preview
One of the features of NEMO 4.0 is the large reduction in global communications at the cost of suppressing some global diagnostics which tend to be useful only during SETTE testing, debugging or new configuration development. Such diagnostics can be re-activated using the ln_ctl namelist variable but this is rather a blunt instrument in that it activates all extra output for all processing elements. This list ican include:
run.stat run.stat.nc tracer.stat ocean.output_XXXX layout.dat_XXXX mpp.output_XXXX mpp.top.output_XXX <---- correct this to 4-digits for consistency EMPave.dat_XXXX icebergs.stat_XXXX
and possibly others depending on runtime options. One of the 2019 NEMO workplan entries (ENHANCE-04_AndrewC-reporting) plans to introduce more control over the choice of output created but it may be best to introduce the basic structure prior to the v4.0 release. Below is a summary of the overall plans and what can be implemented immediately if there is consensus.
At this stage, the plan is to leave ln_ctl as an all-or-nothing switch but to add the basic structure for finer control. For the v4.0 release, this new structure will only be capable of activating the global stats files (run.stat, run.stat.nc and tracer.stat) independently of ln_ctl but the placeholders will be there for extended control over other outputs. Minimal changes to achieve this will affect:
1. OCE/IOM/in_out_manager.F90 cfgs/SHARED/namelist_ref 2. OCE/nemogcm.F90 OFF/nemogcm.F90 SAO/nemogcm.F90 SAS/nemogcm.F90 3. OCE/stpctl.F90 4. TOP/prtctl_trc.F90 TOP/trcini.F90 TOP/trcstp.F90 5. OCE/LBC/mppini.F90 6. sette/sette.sh
1. OCE/IOM/in_out_manager.F90 cfgs/SHARED/namelist_ref
These changes introduce a derived-type structure into the in_out_manager module and populates the reference namelist with default values.
There are 5 basic sets of parameters in this structure:
* l_config Activates use of the settings in the rest of the structure (ln_ctl is ignored if this is true) * l_runstat, l_trcstat Activates production of global stats files. Only a single file of each of these is ever produced. * l_oceout, l_layout, l_EMPave Normal operation is to produce a single version of each of these. If true then a version for each area is produced * l_mppout, l_mpptrc, l_icbstat Suppressed if false, otherwise produce a version for each area. * procmin, procmax, procincr Allow subsetting of areas when producing output in the previous two categories. Default values will ensure all areas report.
-
OCE/IOM/in_out_manager.F90
99 99 !! output monitoring 100 100 !!---------------------------------------------------------------------- 101 101 LOGICAL :: ln_ctl !: run control for debugging 102 TYPE :: sn_ctl !: optional use structure for finer control over output selection 103 LOGICAL :: l_config = .FALSE. !: activate/deactivate finer control 104 ! Note if l_config is True then ln_ctl is ignored. 105 ! Otherwise setting ln_ctl True is equivalent to setting 106 ! all the following logicals in this structure True 107 LOGICAL :: l_runstat = .FALSE. !: Produce/do not produce run.stat file (T/F) 108 LOGICAL :: l_trcstat = .FALSE. !: Produce/do not produce tracer.stat file (T/F) 109 LOGICAL :: l_oceout = .FALSE. !: Produce all ocean.outputs (T) or just one (F) 110 LOGICAL :: l_layout = .FALSE. !: Produce all layout.dat files (T) or just one (F) 111 LOGICAL :: l_EMPave = .FALSE. !: Produce all EMPave.dat files (T) or just one (F) (if active) 112 LOGICAL :: l_mppout = .FALSE. !: Produce/do not produce mpp.output_XXXX files (T/F) 113 LOGICAL :: l_mpptop = .FALSE. !: Produce/do not produce mpp.top.output_XXXX files (T/F) 114 LOGICAL :: l_icbstat = .FALSE. !: Produce/do not produce icebergs.stat_XXXX files (T/F) 115 ! Optional subsetting of processor report files 116 ! Default settings of 0/1000000/1 should ensure all areas report. 117 ! Set to a more restrictive range to select specific areas 118 INTEGER :: procmin = 0 !: Minimum narea to output 119 INTEGER :: procmax = 1000000 !: Maximum narea to output 120 INTEGER :: procincr = 1 !: narea increment to output 121 END TYPE 122 TYPE (sn_ctl) :: sn_cfctl !: run control structure for selective output 102 123 LOGICAL :: ln_timing !: run control for timing 103 124 LOGICAL :: ln_diacfl !: flag whether to create CFL diagnostics 104 125 INTEGER :: nn_print !: level of print (0 no print) -
cfgs/SHARED/namelist_ref
1301 1301 !----------------------------------------------------------------------- 1302 1302 &namctl ! Control prints (default: OFF) 1303 1303 !----------------------------------------------------------------------- 1304 ln_ctl = .false. ! trends control print (expensive!) 1304 ln_ctl = .FALSE. ! Toggle all report printing on/off (T/F); Ignored if sn_cfctl%l_config is T 1305 sn_cfctl%l_config = .TRUE. ! IF .true. then control which reports are written with the following 1306 sn_cfctl%l_runstat = .FALSE. ! switches and which areas produce reports with the proc integer settings. 1307 sn_cfctl%l_trcstat = .FALSE. ! The default settings for the proc integers should ensure 1308 sn_cfctl%l_oceout = .FALSE. ! that all areas report. 1309 sn_cfctl%l_layout = .FALSE. ! 1310 sn_cfctl%l_EMPave = .FALSE. ! 1311 sn_cfctl%l_mppout = .FALSE. ! 1312 sn_cfctl%l_mpptop = .FALSE. ! 1313 sn_cfctl%l_icbstat = .FALSE. ! 1314 sn_cfctl%procmin = 0 ! Minimum area number for reporting [default:0] 1315 sn_cfctl%procmax = 1000000 ! Maximum area number for reporting [default:1000000] 1316 sn_cfctl%procincr = 1 ! Increment for optional subsetting of areas [default:1] 1305 1317 nn_print = 0 ! level of print (0 no extra print) 1306 1318 nn_ictls = 0 ! start i indice of control sum (use to compare mono versus 1307 1319 nn_ictle = 0 ! end i indice of control sum multi processor runs
The new structure is read and reported in nemogcm.F90. Changes for all variants are similar OCE/nemogcm.F90 is shown here:
-
OCE/nemogcm.F90
256 256 INTEGER :: ios, ilocal_comm ! local integers 257 257 CHARACTER(len=120), DIMENSION(60) :: cltxt, cltxt2, clnam 258 258 !! 259 NAMELIST/namctl/ ln_ctl , nn_print, nn_ictls, nn_ictle, &260 & nn_isplt , nn_jsplt, nn_jctls, nn_jctle, &259 NAMELIST/namctl/ ln_ctl , sn_cfctl, nn_print, nn_ictls, nn_ictle, & 260 & nn_isplt , nn_jsplt, nn_jctls, nn_jctle, & 261 261 & ln_timing, ln_diacfl 262 262 NAMELIST/namcfg/ ln_read_cfg, cn_domcfg, ln_closea, ln_write_cfg, cn_domcfg_out, ln_use_jattr 263 263 !!---------------------------------------------------------------------- … … 327 327 328 328 narea = narea + 1 ! mynode return the rank of proc (0 --> jpnij -1 ) 329 329 330 IF( sn_cfctl%l_config ) THEN 331 ! Activate finer control of report outputs 332 ! optionally switch off output from selected areas (note this only 333 ! applies to output which does not involve global communications) 334 IF( ( narea < sn_cfctl%procmin .OR. narea > sn_cfctl%procmax ) .OR. & 335 & ( MOD( narea - sn_cfctl%procmin, sn_cfctl%procincr ) /= 0 ) ) & 336 & CALL nemo_set_cfctl( sn_cfctl, .FALSE., .FALSE. ) 337 ELSE 338 CALL nemo_set_cfctl( sn_cfctl, ln_ctl, .TRUE. ) 339 ENDIF 340 330 341 lwm = (narea == 1) ! control of output namelists 331 342 lwp = (narea == 1) .OR. ln_ctl ! control of all listing output print 332 343 … … 489 500 WRITE(numout,*) '~~~~~~~~' 490 501 WRITE(numout,*) ' Namelist namctl' 491 502 WRITE(numout,*) ' run control (for debugging) ln_ctl = ', ln_ctl 503 WRITE(numout,*) ' finer control over o/p sn_cfctl%l_config = ', sn_cfctl%l_config 504 WRITE(numout,*) ' sn_cfctl%l_runstat = ', sn_cfctl%l_runstat 505 WRITE(numout,*) ' sn_cfctl%l_trcstat = ', sn_cfctl%l_trcstat 506 WRITE(numout,*) ' sn_cfctl%l_oceout = ', sn_cfctl%l_oceout 507 WRITE(numout,*) ' sn_cfctl%l_layout = ', sn_cfctl%l_layout 508 WRITE(numout,*) ' sn_cfctl%l_EMPave = ', sn_cfctl%l_EMPave 509 WRITE(numout,*) ' sn_cfctl%l_mppout = ', sn_cfctl%l_mppout 510 WRITE(numout,*) ' sn_cfctl%l_mpptop = ', sn_cfctl%l_mpptop 511 WRITE(numout,*) ' sn_cfctl%l_icbstat = ', sn_cfctl%l_icbstat 512 WRITE(numout,*) ' sn_cfctl%procmin = ', sn_cfctl%procmin 513 WRITE(numout,*) ' sn_cfctl%procmax = ', sn_cfctl%procmax 514 WRITE(numout,*) ' sn_cfctl%procincr = ', sn_cfctl%procincr 492 515 WRITE(numout,*) ' level of print nn_print = ', nn_print 493 516 WRITE(numout,*) ' Start i indice for SUM control nn_ictls = ', nn_ictls 494 517 WRITE(numout,*) ' End i indice for SUM control nn_ictle = ', nn_ictle … … 635 658 ! 636 659 END SUBROUTINE nemo_alloc 637 660 661 SUBROUTINE nemo_set_cfctl(sn_cfctl, setto, for_all ) 662 !!---------------------------------------------------------------------- 663 !! *** ROUTINE nemo_set_cfctl *** 664 !! 665 !! ** Purpose : Set elements of the output control structure to setto. 666 !! for_all should be .false. unless all areas are to be 667 !! treated identically. 668 !! 669 !! ** Method : Note this routine can be used to switch on/off some 670 !! types of output for selected areas but any output types 671 !! that involve global communications (e.g. mpp_max, glob_sum) 672 !! should be protected from selective switching by the 673 !! for_all argument 674 !!---------------------------------------------------------------------- 675 LOGICAL :: setto, for_all 676 TYPE (sn_ctl) :: sn_cfctl 677 !!---------------------------------------------------------------------- 678 IF( for_all ) THEN 679 sn_cfctl%l_runstat = setto 680 sn_cfctl%l_trcstat = setto 681 ENDIF 682 sn_cfctl%l_oceout = setto 683 sn_cfctl%l_layout = setto 684 sn_cfctl%l_EMPave = setto 685 sn_cfctl%l_mppout = setto 686 sn_cfctl%l_mpptop = setto 687 sn_cfctl%l_icbstat = setto 688 END SUBROUTINE nemo_set_cfctl 689 638 690 !!====================================================================== 639 691 END MODULE nemogcm
Eventually, setting sn_cfctl%l_config true will force ln_ctl to be false but the current implementation is incomplete and can only be used the activate run.stat, tracer.stat and multiple layout.dats independently of ln_ctl. ln_ctl, therefore, remains as the overriding control for all outputs.
3. OCE/stpctl.F90
The logic changes required in stpctl.F90 to control the production of run.stat are relatively simple. Two new local logicals are introduced to avoid having to evaluate logical constructs repeatedly. These determine whether or not to collect the global maximums (lcolruns) and whether or not to handle the actual writing (lwrtruns).
-
OCE/stpctl.F90
33 33 PUBLIC stp_ctl ! routine called by step.F90 34 34 35 35 INTEGER :: idrun, idtime, idssh, idu, ids1, ids2, idt1, idt2, idc1, idw1, istatus 36 LOGICAL :: lsomeoce 36 LOGICAL :: lsomeoce, lcolruns, lwrtruns 37 37 !!---------------------------------------------------------------------- 38 38 !! NEMO/OCE 4.0 , NEMO Consortium (2018) 39 39 !! $Id$ … … 69 69 CHARACTER(len=20) :: clname 70 70 !!---------------------------------------------------------------------- 71 71 ! 72 IF( kt == nit000 ) lcolruns = ln_ctl .OR. ( sn_cfctl%l_config .AND. sn_cfctl%l_runstat ) 73 IF( kt == nit000 ) lwrtruns = lcolruns .AND. lwm 72 74 IF( kt == nit000 .AND. lwp ) THEN 73 75 WRITE(numout,*) 74 76 WRITE(numout,*) 'stp_ctl : time-stepping control' … … 76 78 ! ! open time.step file 77 79 IF( lwm ) CALL ctl_opn( numstp, 'time.step', 'REPLACE', 'FORMATTED', 'SEQUENTIAL', -1, numout, lwp, narea ) 78 80 ! ! open run.stat file 79 IF( l n_ctl .AND. lwm) THEN81 IF( lwrtruns ) THEN 80 82 CALL ctl_opn( numrun, 'run.stat', 'REPLACE', 'FORMATTED', 'SEQUENTIAL', -1, numout, lwp, narea ) 81 83 clname = 'run.stat.nc' 82 84 IF( .NOT. Agrif_Root() ) clname = TRIM(Agrif_CFixed())//"_"//TRIM(clname) … … 120 122 zmax(9) = MAXVAL( Cu_adv(:,:,:) , mask = tmask(:,:,:) == 1._wp ) ! cell Courant no. max 121 123 ENDIF 122 124 ! 123 IF( l n_ctl) THEN125 IF( lcolruns ) THEN 124 126 CALL mpp_max( "stpctl", zmax ) ! max over the global domain 125 127 nstop = NINT( zmax(7) ) ! nstop indicator sheared among all local domains 126 128 ENDIF 127 129 ! !== run statistics ==! ("run.stat" files) 128 IF( l n_ctl .AND. lwm) THEN130 IF( lwrtruns ) THEN 129 131 WRITE(numrun,9500) kt, zmax(1), zmax(2), -zmax(3), zmax(4) 130 132 istatus = NF90_PUT_VAR( idrun, idssh, (/ zmax(1)/), (/kt/), (/1/) ) 131 133 istatus = NF90_PUT_VAR( idrun, idu, (/ zmax(2)/), (/kt/), (/1/) )
4. TOP/prtctl_trc.F90 TOP/trcini.F90 TOP/trcstp.F90
Changes to control the production of tracer.stat follow similar lines with the introduction of a lltrcstat local logical. Note also changes ti prtctl_trc.F90 to make mpp.top.output filenames compatible with other similar filenames (i.e. use I4.4 for area number).
-
TOP/prtctl_trc.F90
209 209 IF( lk_mpp ) THEN 210 210 sind = narea 211 211 eind = narea 212 clb_name = "('mpp.top.output_',I 3.3)"212 clb_name = "('mpp.top.output_',I4.4)" 213 213 cl_run = 'MULTI processor run' 214 214 ! use indices for each area computed by mpp_init subroutine 215 215 nlditl(1:jpnij) = nldit(:) … … 228 228 ELSE 229 229 sind = 1 230 230 eind = ijsplt 231 clb_name = "('mono.top.output_',I 3.3)"231 clb_name = "('mono.top.output_',I4.4)" 232 232 cl_run = 'MONO processor run ' 233 233 ! compute indices for each area as done in mpp_init subroutine 234 234 CALL sub_dom -
TOP/trcini.F90
71 71 CALL trc_ini_trp ! passive tracers transport 72 72 CALL trc_ice_ini ! Tracers in sea ice 73 73 ! 74 IF(lwm) CALL ctl_opn( numstr, 'tracer.stat', 'REPLACE', 'FORMATTED', 'SEQUENTIAL', -1, numout, lwp , narea ) 74 IF( lwm .AND. ( ln_ctl .OR. (sn_cfctl%l_config .AND. sn_cfctl%l_trcstat) ) ) THEN 75 CALL ctl_opn( numstr, 'tracer.stat', 'REPLACE', 'FORMATTED', 'SEQUENTIAL', -1, numout, lwp , narea ) 76 ENDIF 75 77 ! 76 78 CALL trc_ini_state ! passive tracers initialisation : from a restart or from clim 77 79 IF( nn_dttrc /= 1 ) & -
TOP/trcstp.F90
31 31 PUBLIC trc_stp ! called by step 32 32 33 33 LOGICAL :: llnew ! ??? 34 LOGICAL :: lltrcstat ! ??? 34 35 REAL(wp) :: rdt_sampl ! ??? 35 36 INTEGER :: nb_rec_per_day, ktdcy ! ??? 36 37 REAL(wp) :: rsecfst, rseclast ! ??? … … 67 68 r2dttrc = 2. * rdttrc ! = 2 rdttrc (leapfrog) 68 69 ENDIF 69 70 ! 71 IF( kt == nittrc000 ) lltrcstat = ln_ctl .OR. (sn_cfctl%l_config .AND. sn_cfctl%l_trcstat) 70 72 IF( kt == nittrc000 .AND. lk_trdmxl_trc ) CALL trd_mxl_trc_init ! trends: Mixed-layer 71 73 ! 72 74 IF( .NOT.ln_linssh ) THEN ! update ocean volume due to ssh temporal evolution … … 108 110 ! 109 111 ENDIF 110 112 ! 111 IF (l n_ctl) THEN113 IF (lltrcstat) THEN 112 114 ztrai = 0._wp ! content of all tracers 113 115 DO jn = 1, jptra 114 116 ztrai = ztrai + glob_sum( 'trcstp', trn(:,:,:,jn) * cvol(:,:,:) )
5. OCE/LBC/mppini.F90
Control over the creation of multiple layout.dat files is easy to implement and so has been done as an example of how output in this category will be handled in future. For this category, the area subsetting can be used to restrict which areas produce files. The standard layout.dat file, produced by narea = 1 is always produced.
-
OCE/LBC/mppini.F90
150 150 INTEGER :: ierr, ios ! 151 151 INTEGER :: inbi, inbj, iimax, ijmax, icnt1, icnt2 152 152 LOGICAL :: llbest 153 LOGICAL :: llwrtlay 153 154 INTEGER, ALLOCATABLE, DIMENSION(:) :: iin, ii_nono, ii_noea ! 1D workspace 154 155 INTEGER, ALLOCATABLE, DIMENSION(:) :: ijn, ii_noso, ii_nowe ! - - 155 156 INTEGER, ALLOCATABLE, DIMENSION(:,:) :: iimppt, ilci, ibondi, ipproc ! 2D workspace … … 166 167 & ln_vol, nn_volctl, nn_rimwidth, nb_jpk_bdy 167 168 !!---------------------------------------------------------------------- 168 169 170 llwrtlay = lwp .OR. ln_ctl .OR. ( sn_cfctl%l_config .AND. sn_cfctl%l_layout ) 169 171 ! do we need to take into account bdy_msk? 170 172 REWIND( numnam_ref ) ! Namelist nambdy in reference namelist : BDY 171 173 READ ( numnam_ref, nambdy, IOSTAT = ios, ERR = 903) … … 553 555 END DO 554 556 555 557 ! Save processor layout in ascii file 556 IF (l wp) THEN558 IF (llwrtlay) THEN 557 559 CALL ctl_opn( inum, 'layout.dat', 'REPLACE', 'FORMATTED', 'SEQUENTIAL', -1, numout, .FALSE., narea ) 558 560 WRITE(inum,'(a)') ' jpnij jpimax jpjmax jpk jpiglo jpjglo'//& 559 561 & ' ( local: narea jpi jpj )' … … 614 616 WRITE(numout,*) 615 617 WRITE(numout,*) ' ==>>> North fold boundary prepared for jpni >1' 616 618 ! additional prints in layout.dat 619 ENDIF 620 IF (llwrtlay) THEN 617 621 WRITE(inum,*) 618 622 WRITE(inum,*) 619 623 WRITE(inum,*) 'number of subdomains located along the north fold : ', ndim_rank_north … … 628 632 ! 629 633 IF( ln_nnogather ) THEN 630 634 CALL mpp_init_nfdcom ! northfold neighbour lists 631 IF (l wp) THEN635 IF (llwrtlay) THEN 632 636 WRITE(inum,*) 633 637 WRITE(inum,*) 634 638 WRITE(inum,*) 'north fold exchanges with explicit point-to-point messaging :' … … 639 643 ENDIF 640 644 ENDIF 641 645 ! 642 IF (l wp) CLOSE(inum)646 IF (llwrtlay) CLOSE(inum) 643 647 ! 644 648 DEALLOCATE(iin, ijn, ii_nono, ii_noea, ii_noso, ii_nowe, & 645 649 & iimppt, ijmppt, ibondi, ibondj, ipproc, ipolj, &
6. sette/sette.sh
Finally, once implemented, these changes allow SETTE tests to be run without creating the full set of multiple output files. It is simply a case of replacing all ln_ctl settings like so:
290c291,294
< set_namelist namelist_cfg ln_ctl .true.
---
> set_namelist namelist_cfg ln_ctl .false.
> set_namelist namelist_cfg sn_cfctl%l_config .true.
> set_namelist namelist_cfg sn_cfctl%l_runstat .true.
> set_namelist namelist_cfg sn_cfctl%l_trcstat .true.