Does XIOS add sufficient and accurate attribute metadata to rebuild zoom datasets correctly?

There appears to be insufficient or incorrect information in zoom domain files to rebuild whole datasets when the zoom region spans more than one XIOS server and is written to multiple files.

For example, consider this zoom in an 8x4 decomposition of ORCA2_ICE_PISCES, defined by the following additions to the XML:

domain_def_nemo.xml:
     <!--   My zoom: example of hand defined zoom   -->
     <domain id="myzoomT" domain_ref="grid_T" >
       <zoom_domain ibegin="25" jbegin="20" ni="90" nj="45"/>
     </domain>

grid_def_nemo.xml:
       <grid id="zoom_T_3D" >
         <domain domain_ref="myzoomT" />
         <axis axis_ref="deptht" />
       </grid>

file_def_nemo-oce.xml:
    <file_definition type="multiple_file" name="@expname@_@freq@_@startdate@_@enddate@" sync_freq="1mo" min_digits="4">

      <file_group id="5d" output_freq="5d"  output_level="10" enabled=".TRUE.">  <!-- 5d files -->
        <file id="file66" name_suffix="_zoom_T" description="ocean T grid variables" >
          <field field_ref="e3t"  grid_ref="zoom_T_3D"    />
          <field field_ref="toce" grid_ref="zoom_T_3D" name="thetao"   operation="instant" freq_op="5d" > @toce_e3t / @e3t </field>
        </file>
        <file id="file11" ....

In an 8x4 decomposition using 4 external XIOS servers, the following output files are produced for the 90x45 zoom region:

O2L3P_LONG_5d_00010101_00010303_zoom_T_0000.nc
O2L3P_LONG_5d_00010101_00010303_zoom_T_0001.nc

with the following attribute data in each respectively:

ncdump -h O2L3P_LONG_5d_00010101_00010303_zoom_T_0000.nc
// global attributes:
                .
                .
		:ibegin = 25 ;
		:ni = 90 ;
		:jbegin = 20 ;
		:nj = 17 ;
		:DOMAIN_number_total = 4 ;
		:DOMAIN_number = 0 ;
		:DOMAIN_dimensions_ids = 2, 3 ;
		:DOMAIN_size_global = 180, 148 ;
		:DOMAIN_size_local = 90, 17 ;
		:DOMAIN_position_first = 26, 21 ;
		:DOMAIN_position_last = 115, 37 ;
		:DOMAIN_halo_size_start = 0, 0 ;
		:DOMAIN_halo_size_end = 0, 0 ;
		:DOMAIN_type = "box" ;

ncdump -h O2L3P_LONG_5d_00010101_00010303_zoom_T_0001.nc
                .
                .
		:ibegin = 25 ;
		:ni = 90 ;
		:jbegin = 37 ;
		:nj = 28 ;
		:DOMAIN_number_total = 4 ;
		:DOMAIN_number = 1 ;
		:DOMAIN_dimensions_ids = 2, 3 ;
		:DOMAIN_size_global = 180, 148 ;
		:DOMAIN_size_local = 90, 28 ;
		:DOMAIN_position_first = 26, 38 ;
		:DOMAIN_position_last = 115, 65 ;
		:DOMAIN_halo_size_start = 0, 0 ;
		:DOMAIN_halo_size_end = 0, 0 ;
		:DOMAIN_type = "box" ;

The production of two files is correct because only two of the 4 XIOS servers are dealing with the zoom region. The data within each file is also correct but two issues with the attribute metadata prevent REBUILD_NEMO (and similar tools) from rebuilding the files correctly:

  • DOMAIN_number_total needs to be 2, not 4; otherwise REBUILD_NEMO will fail.
  • DOMAIN_size_global will be used to determine the size of the collated dataset (180x148). What is actually wanted is to collate these data into a dataset covering only the zoom region (90x45). This size is not recorded in the metadata of either file.
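
Although neither file records the zoom size, it could in principle be derived from the files taken together: the union of the DOMAIN_position_first/DOMAIN_position_last boxes spans the zoom. The following is a minimal C++ sketch of that arithmetic only. The Tile struct and the zoomExtent function are hypothetical names for illustration; they are not part of XIOS or REBUILD_NEMO, and the sketch assumes at least one tile is supplied.

```cpp
#include <algorithm>
#include <array>
#include <vector>

// Hypothetical container for two of the per-file attributes shown by ncdump.
struct Tile {
    std::array<int, 2> first;  // DOMAIN_position_first (1-based i, j)
    std::array<int, 2> last;   // DOMAIN_position_last  (1-based i, j)
};

// Size (ni, nj) of the bounding box enclosing all tiles.
// Precondition (assumed): tiles is not empty.
std::array<int, 2> zoomExtent(const std::vector<Tile>& tiles) {
    std::array<int, 2> lo = tiles.front().first;
    std::array<int, 2> hi = tiles.front().last;
    for (const Tile& t : tiles) {
        for (int d = 0; d < 2; ++d) {
            lo[d] = std::min(lo[d], t.first[d]);
            hi[d] = std::max(hi[d], t.last[d]);
        }
    }
    return {hi[0] - lo[0] + 1, hi[1] - lo[1] + 1};
}
```

With the two files above, the boxes (26,21)-(115,37) and (26,38)-(115,65) combine to give 115-26+1 = 90 points in i and 65-21+1 = 45 points in j, which is the requested zoom size.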

The first issue can be worked around with an ncatted command on the first dataset. For example, with the files as written by XIOS, rebuild_nemo fails:

rebuild_nemo -n nl.reb O2L3P_LONG_5d_00010101_00010303_zoom_T 2
file O2L3P_LONG_5d_00010101_00010303_zoom_T,  num_domains 2, num_threads 1
 Rebuilding the following files:
 O2L3P_LONG_5d_00010101_00010303_zoom_T_0000.nc
 O2L3P_LONG_5d_00010101_00010303_zoom_T_0001.nc
 ERROR! : number of files to rebuild in file does not agree with namelist
 Attribute DOMAIN_number_total is :            4
 Number of files specified in namelist is:            2
2

This can be fixed by correcting the attribute and rerunning:

ncatted -a DOMAIN_number_total,global,m,d,2 O2L3P_LONG_5d_00010101_00010303_zoom_T_0000.nc

rebuild_nemo -n nl.reb O2L3P_LONG_5d_00010101_00010303_zoom_T 2
file O2L3P_LONG_5d_00010101_00010303_zoom_T,  num_domains 2, num_threads 1
 Rebuilding the following files:
 O2L3P_LONG_5d_00010101_00010303_zoom_T_0000.nc
 O2L3P_LONG_5d_00010101_00010303_zoom_T_0001.nc
 Size of global arrays:          180         148
.
.
 Closing input files...
 Closing output file...
 NEMO rebuild completed successfully

This successfully rebuilds the zoom, but places it in an otherwise empty global domain (180x148).

Fixing the second issue is trickier. Simply editing the DOMAIN_size_global settings will not suffice because REBUILD_NEMO also uses the DOMAIN_position_first information to place data within the global arrays. Changing the size but not the offset results in Bus errors.
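
A plausible reading of the failure is that, with the global size reduced to 90x45 but DOMAIN_position_first left at (26, 21), the first file's data would be placed at i indices 26 to 115, beyond the end of a 90-point array. The C++ sketch below only illustrates that bounds condition. fitsInArray is a hypothetical helper and is not taken from REBUILD_NEMO, which is written in Fortran.

```cpp
#include <array>

// True if a tile of size `local`, placed at 1-based position `first`,
// lies entirely inside an array of size `global` (all given as (i, j)).
bool fitsInArray(const std::array<int, 2>& first,
                 const std::array<int, 2>& local,
                 const std::array<int, 2>& global) {
    for (int d = 0; d < 2; ++d) {
        const int last = first[d] + local[d] - 1;  // 1-based last index
        if (first[d] < 1 || last > global[d]) return false;
    }
    return true;
}
```

For the first file, a 90x17 tile at (26, 21) fits a 180x148 array but not a 90x45 one; the same tile at (1, 1) does fit 90x45, which is why the offset has to change together with the size.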

Proposed action

Fixing the metadata at source (XIOS) may be possible. It appears to involve only one source file (see the notes below), but it isn't clear how XIOS distinguishes between global domains and zooms (if it does at all). A pragmatic alternative is to add the missing zoom domain information via the XML files and to adapt REBUILD_NEMO to use this information if present. For example, adding two variable entries to file66 in file_def_nemo-oce.xml:

file_def_nemo-oce.xml:
    <file_definition type="multiple_file" name="@expname@_@freq@_@startdate@_@enddate@" sync_freq="1mo" min_digits="4">

      <file_group id="5d" output_freq="5d"  output_level="10" enabled=".TRUE.">  <!-- 5d files -->
        <file id="file66" name_suffix="_zoom_T" description="ocean T grid variables" >
          <field field_ref="e3t"  grid_ref="zoom_T_3D"    />
          <field field_ref="toce" grid_ref="zoom_T_3D" name="thetao"   operation="instant" freq_op="5d" > @toce_e3t / @e3t </field>
          <variable name="DOMAIN_size_zoom_i" type="int"> 90 </variable>
          <variable name="DOMAIN_size_zoom_j" type="int"> 45 </variable>
        </file>
        <file id="file11" ....

results in:

ncdump -h O2L3P_LONG_5d_00010101_00010303_zoom_T_0000.nc
// global attributes:
                .
                .
		:ibegin = 25 ;
		:ni = 90 ;
		:jbegin = 20 ;
		:nj = 17 ;
		:DOMAIN_number_total = 2 ;
		:DOMAIN_number = 0 ;
		:DOMAIN_dimensions_ids = 2, 3 ;
		:DOMAIN_size_global = 180, 148 ;
		:DOMAIN_size_local = 90, 17 ;
		:DOMAIN_position_first = 26, 21 ;
		:DOMAIN_position_last = 115, 37 ;
		:DOMAIN_halo_size_start = 0, 0 ;
		:DOMAIN_halo_size_end = 0, 0 ;
		:DOMAIN_type = "box" ;
		:DOMAIN_size_zoom_i = 90 ;
		:DOMAIN_size_zoom_j = 45 ;

The remaining task is then to adapt REBUILD_NEMO so that if these new attributes are present:

  • DOMAIN_size_zoom_i and DOMAIN_size_zoom_j are used in place of DOMAIN_size_global
  • The ibegin and jbegin offsets are subtracted from the DOMAIN_position_first values when deciding where to place values into the output array.
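
The offset arithmetic in the second point can be sketched in C++ as follows. zoomRelativeStart is a hypothetical helper used only for illustration. The sketch assumes that the ibegin and jbegin values used are those of the zoom origin (25 and 20 here, as found in the first file), since the second file carries its own local jbegin of 37.

```cpp
#include <array>

// Hypothetical helper: 1-based position of a tile inside the zoom array.
// `first` is DOMAIN_position_first (1-based, global indexing);
// `zoomBegin` is (ibegin, jbegin) of the zoom origin (0-based).
std::array<int, 2> zoomRelativeStart(const std::array<int, 2>& first,
                                     const std::array<int, 2>& zoomBegin) {
    return {first[0] - zoomBegin[0], first[1] - zoomBegin[1]};
}
```

With the attributes listed above, the first file starts at (26-25, 21-20) = (1, 1) and the second at (26-25, 38-20) = (1, 18). The second file's 28 rows then occupy j = 18 to 45, exactly filling the 90x45 zoom array.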

The following changes to rebuild_nemo.F90 achieve the required result (a modified version of the full code is attached, see: rebuild_nemo_modified.F90):

  • rebuild_nemo.F90

    --- rebuild_nemo.F90 (old)
    +++ rebuild_nemo.F90 (new)
    @@ -70,8 +70,8 @@
        CHARACTER(LEN=50)  :: clibnc ! netcdf library version

        INTEGER :: ndomain, ifile, ndomain_file, nslicesize, deflate_level
    -   INTEGER :: ncid, outid, idim, istop
    -   INTEGER :: natts, attid, xtype, varid, rbdims
    +   INTEGER :: ncid, outid, idim, istop, istat
    +   INTEGER :: natts, attid, xtype, varid, rbdims, rbdims1
        INTEGER :: jv, ndims, nvars, dimlen, dimids(4)
        INTEGER :: dimid, unlimitedDimId, di, dj, dr
        INTEGER :: nmax_unlimited, nt, ntslice
    @@ -92,6 +92,8 @@
        INTEGER, DIMENSION(2) :: halo_start, halo_end, local_sizes
        INTEGER, DIMENSION(2) :: idomain, jdomain, rdomain, start_pos
        INTEGER :: ji, jj, jk, jl, jr
    +   INTEGER :: ni_zoom, nj_zoom, ni_off, nj_off
    +   LOGICAL :: iszoom
        INTEGER :: nargs                 ! number of arguments
        INTEGER, EXTERNAL :: iargc

    @@ -252,10 +254,16 @@

        CALL check_nf90( nf90_get_att( ncid, nf90_global, 'DOMAIN_number_total', ndomain_file ) )
        IF( ndomain /= ndomain_file ) THEN
    -      WRITE(numerr,*) 'ERROR! : number of files to rebuild in file does not agree with namelist'
    +      WRITE(numerr,*) 'WARNING! : number of files to rebuild in file does not agree with namelist'
           WRITE(numerr,*) 'Attribute DOMAIN_number_total is : ', ndomain_file
           WRITE(numerr,*) 'Number of files specified in namelist is: ', ndomain
    -      STOP 2
    +      istat = nf90_inquire_attribute( ncid, nf90_global, 'DOMAIN_size_zoom_i', xtype, rbdims, attid )
    +      IF ( istat == nf90_noerr ) THEN
    +          WRITE(numerr,*) 'This looks like a zoom region so I will assume you know what you are doing'
    +      ELSE
    +          WRITE(numerr,*) 'This is a potentially fatal error'
    +          STOP 2
    +      ENDIF
        ENDIF

     !2.1 Set up the output file
    @@ -275,6 +283,26 @@

        ALLOCATE(global_sizes(rbdims))
        CALL check_nf90( nf90_get_att( ncid, nf90_global, 'DOMAIN_size_global', global_sizes ) )
    +
    +!2.2.0.1 Override global sizes if zoom attributes are found
    +   iszoom = .false.
    +   istat = nf90_inquire_attribute( ncid, nf90_global, 'DOMAIN_size_zoom_i', xtype, rbdims1, attid )
    +   IF ( istat == nf90_noerr ) THEN
    +       CALL check_nf90( nf90_get_att( ncid, nf90_global, 'DOMAIN_size_zoom_i', ni_zoom ) )
    +       ! Need both zoom_i and zoom_j attributes to determine zoom size
    +       istat = nf90_inquire_attribute( ncid, nf90_global, 'DOMAIN_size_zoom_j', xtype, rbdims1, attid )
    +       IF ( istat == nf90_noerr ) THEN
    +           CALL check_nf90( nf90_get_att( ncid, nf90_global, 'DOMAIN_size_zoom_j', nj_zoom ) )
    +           iszoom = .true.
    +           global_sizes(1) = ni_zoom
    +           global_sizes(2) = nj_zoom
    +           CALL check_nf90( nf90_get_att( ncid, nf90_global, 'ibegin', ni_off ) )
    +           CALL check_nf90( nf90_get_att( ncid, nf90_global, 'jbegin', nj_off ) )
    +       ELSE
    +           iszoom = .false.
    +       ENDIF
    +   ENDIF
    +
        IF (l_verbose) WRITE(numout,*) 'Size of global arrays: ', global_sizes


    @@ -773,6 +801,9 @@
                    halo_start(2) = 0
                    di=rebuild_dims(1)
                    dj=3-di
    +            ELSEIF ( iszoom ) THEN
    +               start_pos(di) = start_pos(di) - ni_off
    +               start_pos(dj) = start_pos(dj) - nj_off
                 ENDIF

     !3.3.1 Generate local domain interior sizes from local_sizes and halo sizes

This modified rebuild_nemo will, if both DOMAIN_size_zoom_i and DOMAIN_size_zoom_j are found, rebuild just the zoom region. When DOMAIN_size_zoom_i is present, it also downgrades any mismatch between the DOMAIN_number_total attribute and the number of files specified by the user from an error to a warning, trusting that the user has specified the true number. Without that attribute the mismatch remains fatal.

Notes for possibly tackling the problem at source

The attributes are written by XIOS in:

XIOS_2.5/src/io/nc4_data_output.cpp

by:

    if (server->intraCommSize > 1)
    {
       this->writeLocalAttributes(domain->zoom_ibegin,
                                  domain->zoom_ni,
                                  domain->zoom_jbegin,
                                  domain->zoom_nj,
                                  appendDomid);

       if (singleDomain)
       this->writeLocalAttributes_IOIPSL(dimXid, dimYid,
                                         domain->zoom_ibegin,
                                         domain->zoom_ni,
                                         domain->zoom_jbegin,
                                         domain->zoom_nj,
                                         domain->ni_glo,domain->nj_glo,
                                         server->intraCommRank,server->intraCommSize);


    }

and these functions are:

      void CNc4DataOutput::writeLocalAttributes
         (int ibegin, int ni, int jbegin, int nj, StdString domid)
      {
        try
        {
         SuperClassWriter::addAttribute(StdString("ibegin").append(domid), ibegin);
         SuperClassWriter::addAttribute(StdString("ni"    ).append(domid), ni);
         SuperClassWriter::addAttribute(StdString("jbegin").append(domid), jbegin);
         SuperClassWriter::addAttribute(StdString("nj"    ).append(domid), nj);
        }
        catch (CNetCdfException& e)
        {
           StdString msg("On writing Local Attributes: ");
           msg.append("In the context : ");
           CContext* context = CContext::getCurrent() ;
           msg.append(context->getId()); msg.append("\n");
           msg.append(e.what());
           ERROR("CNc4DataOutput::writeLocalAttributes \
                  (int ibegin, int ni, int jbegin, int nj, StdString domid)", << msg);
        }

      }

and

      void CNc4DataOutput::writeLocalAttributes_IOIPSL(const StdString& dimXid, const StdString& dimYid,
                                                       int ibegin, int ni, int jbegin, int nj, int ni_glo, int nj_glo, int rank, int size)
      {
         CArray<int,1> array(2) ;

         try
         {
           SuperClassWriter::addAttribute("DOMAIN_number_total",size ) ;
           SuperClassWriter::addAttribute("DOMAIN_number", rank) ;
           array = SuperClassWriter::getDimension(dimXid) + 1, SuperClassWriter::getDimension(dimYid) + 1;
           SuperClassWriter::addAttribute("DOMAIN_dimensions_ids",array) ;
           array=ni_glo,nj_glo ;
           SuperClassWriter::addAttribute("DOMAIN_size_global", array) ;
           array=ni,nj ;
           SuperClassWriter::addAttribute("DOMAIN_size_local", array) ;
           array=ibegin+1,jbegin+1 ;
           SuperClassWriter::addAttribute("DOMAIN_position_first", array) ;
           array=ibegin+ni-1+1,jbegin+nj-1+1 ;
           SuperClassWriter::addAttribute("DOMAIN_position_last",array) ;
           array=0,0 ;
           SuperClassWriter::addAttribute("DOMAIN_halo_size_start", array) ;
           SuperClassWriter::addAttribute("DOMAIN_halo_size_end", array);
           SuperClassWriter::addAttribute("DOMAIN_type",string("box")) ;
         }
         catch (CNetCdfException& e)
         {
           StdString msg("On writing Local Attributes IOIPSL \n");
           msg.append("In the context : ");
           CContext* context = CContext::getCurrent() ;
           msg.append(context->getId()); msg.append("\n");
           msg.append(e.what());
           ERROR("CNc4DataOutput::writeLocalAttributes_IOIPSL \
                  (int ibegin, int ni, int jbegin, int nj, int ni_glo, int nj_glo, int rank, int size)", << msg);
         }
      }
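
As a cross-check, the arithmetic in writeLocalAttributes_IOIPSL can be reproduced outside XIOS and compared with the ncdump listings. The following standalone C++ sketch mirrors only the integer calculations; the IoipslAttrs struct and the ioipslAttrs function are hypothetical names, use no XIOS types, and omit the attributes that are constant or dimension-dependent.

```cpp
#include <array>

// Hypothetical plain struct holding a subset of the attributes written by
// writeLocalAttributes_IOIPSL; it does not exist in XIOS.
struct IoipslAttrs {
    int numberTotal;                   // DOMAIN_number_total
    int number;                        // DOMAIN_number
    std::array<int, 2> sizeGlobal;     // DOMAIN_size_global
    std::array<int, 2> sizeLocal;      // DOMAIN_size_local
    std::array<int, 2> positionFirst;  // DOMAIN_position_first
    std::array<int, 2> positionLast;   // DOMAIN_position_last
};

// Same argument order as the XIOS function, minus the dimension ids.
IoipslAttrs ioipslAttrs(int ibegin, int ni, int jbegin, int nj,
                        int ni_glo, int nj_glo, int rank, int size) {
    IoipslAttrs a;
    a.numberTotal = size;   // server count, not the number of files written
    a.number = rank;
    a.sizeGlobal = {ni_glo, nj_glo};
    a.sizeLocal = {ni, nj};
    a.positionFirst = {ibegin + 1, jbegin + 1};    // 0-based to 1-based
    a.positionLast = {ibegin + ni, jbegin + nj};   // ibegin+ni-1+1
    return a;
}
```

Called with the first file's values (ibegin=25, ni=90, jbegin=20, nj=17, ni_glo=180, nj_glo=148, rank=0, size=4), this gives DOMAIN_position_first = 26, 21 and DOMAIN_position_last = 115, 37, matching the listing. It also shows where the two problem values come from: DOMAIN_number_total is the size argument (server->intraCommSize in the calling code) and DOMAIN_size_global is ni_glo, nj_glo, neither of which reflects the zoom.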
Last modified on 2020-08-18T11:54:19+02:00
