New URL for NEMO forge!   http://forge.nemo-ocean.eu

Since March 2022, along with the NEMO 4.2 release, code development has moved to a self-hosted GitLab.
This forge is now archived and remains online for historical reference.
2021WP/VLD-05_Coward_SETTE_inputs (diff) – NEMO

Changes between Version 5 and Version 6 of 2021WP/VLD-05_Coward_SETTE_inputs


Timestamp:
2021-03-15T17:30:00+01:00 (3 years ago)
Author:
acc
Comment:

--

  • 2021WP/VLD-05_Coward_SETTE_inputs

    v5 v6  
    77the delay on preview (or review) is longer than the 2 weeks expected.
    88 
    9 [[PageOutline(2, , inline)]] 
     9[[PageOutline(2-3, , inline)]] 
    1010 
    1111== Summary 
     
    2020||=Ticket       || #2637                                                 || 
    2121 
    22 === Description 
     22== Description 
    2323Collection and rationalisation of SETTE inputs. The set of input files for the full suite of SETTE tests has evolved rapidly to keep pace with changes to the code (such as the removal of haloes from external files). The current set needs to be cleaned of unused data, and chunked and compressed with settings consistent with exascale ambitions. A definitive set then needs to be hosted in a publicly available location with a fixed DOI. A lighter version (for example, with fewer time levels in the forcing data) may also be appropriate for future containerisation or cloud deployment of testing services.
    2424 
     25'''Stage 1''': Collect all input file sets for SETTE 4.2_RC (current trunk). Confirm successful SETTE results and store a reference set of SETTE results. Remove redundant files, and redundant variables from the remaining files. Make sure sensible chunking and compression choices have been made in all cases. Produce a clean 'r4.2_RC_FULL' set. See details [#Stage1details here]
     26
     27'''Stage 2''': Reduce data volumes by retaining only enough forcing data to cover twice the period of each standard test. Run SETTE with the reduced set and confirm that the results are unchanged. Create the recommended 'r4.2_RC' set. See details [#Stage2details here]
     28
     29'''Stage 3''': Reduce data volumes further by limiting the number of significant digits in all fields other than domain and coordinate variables. Confirm that the SETTE tests still succeed (results WILL be different). Create the optional 'r4.2_RC_LITE' set.
     30
     31'''Stage 4''': Find a hosting and distribution option.
     32
     33'''Stage 5''': Document and archive scripts so that the process can be repeated if (when) changes are made to the FULL set.
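For Stage 3, NCO's precision-preserving compression ({{{ncks --ppc}}}) is one possible way to limit significant digits. This page does not say which tool was used, so the sketch below is an assumption. The helper names {{{ppc_spec}}} and {{{limit_precision}}}, the digit counts and the variable names in the usage line are all illustrative. Variables are listed explicitly so that domain and coordinate variables are left untouched.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for limiting significant digits with NCO.

# Build an NCO --ppc argument "var1,var2,...=nsd" from a digit count and names.
ppc_spec() {
  local nsd=$1
  shift
  local IFS=,          # "$*" joins the remaining arguments with commas
  echo "$*=$nsd"
}

# Round only the listed variables to nsd significant digits.
# -7   : write netCDF-4 classic model output
# -L 1 : deflate level 1, so the simplified trailing bits actually compress
limit_precision() {
  local infile=$1 outfile=$2 nsd=$3
  shift 3
  ncks -O -7 -L 1 --ppc "$(ppc_spec "$nsd" "$@")" "$infile" "$outfile"
}
```

For example, {{{limit_precision dyna_grid_T.nc dyna_grid_T_lite.nc 4 votemper vosaline}}} would round only the two named fields. Rounding by itself does not shrink the file. The saving comes from deflate compressing the now-redundant mantissa bits.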
    2534 
    2635''...'' 
    2736 
    28 === Implementation 
    29  
    30 '''Stage 1''': Collect all input file sets for SETTE 4.2_RC (current trunk). Confirm successful SETTE results and store a reference set of SETTE results. Remove redundant files, and redundant variables from the remaining files. Make sure sensible chunking and compression choices have been made in all cases. Produce a clean 'r4.2_RC_FULL' set. See details [#Stage1details here]
    31
    32 '''Stage 2''': Reduce data volumes by retaining only enough forcing data to cover twice the period of each standard test. Run SETTE with the reduced set and confirm that the results are unchanged. Create the recommended 'r4.2_RC' set. See details [#Stage2details here]
    33
    34 '''Stage 3''': Reduce data volumes further by limiting the number of significant digits in all fields other than domain and coordinate variables. Confirm that the SETTE tests still succeed (results WILL be different). Create the optional 'r4.2_RC_LITE' set.
    35
    36 '''Stage 4''': Find a hosting and distribution option.
    37
    38 '''Stage 5''': Document and archive scripts so that the process can be repeated if (when) changes are made to the FULL set.
    39  
    40  
    41 == Stage 1 details 
     37== Implementation 
     38 
     39 
     40 
     41 
     42=== Stage 1 details 
    4243 
    4344Stage 1 involves gathering all SETTE input files and systematically checking their contents for redundancy and for opportunities to compress. Where chunking and compression have already been applied, the existing settings must also be checked for validity. Here is a typical example from the {{{AGRIF_DEMO_v4.x.tar}}} set:
     
    8485where {{{nccnkrpt}}} is my bash function defined as: 
    8586{{{  
    86 function nccnkrpt { ncks --cdl -m ${1} | grep '=' ; ncdump -s -h ${1} | grep -e ") ;" -e _ChunkSizes | sed -e 's/.*:/\t/' | sed -e ':x /) ;$/ { N; s/;\n//g ; bx }' ; } 
     87function nccnkrpt { if [ `ncdump -k ${1} | awk '{print $1}'` == "netCDF-4" ] ; then ncks --cdl -m ${1} | grep '=' | grep -v ":" ; ncdump -s -h ${1} | grep -e ") ;" -e _ChunkSizes | sed -e 's/.*:/\t/' | sed -e ':x /) ;$/ { N; s/;\n//g ; bx }'; fi ; } 
    8788}}} 
    8889which condenses the verbose {{{ncdump -s -h}}} output into a more digestible form. In this example the dataset has already been chunked and compressed, but the chunk sizes are an odd choice. Chunk sizes that span the entire dataset will restrict future scalability, since any access to these data will require reading and uncompressing these large chunks irrespective of the size of the calling domain. A chunk size greater than 1 for the t dimension is also wasteful and confusing, given that the file contains only a single time level.
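One way to act on this is to rechunk with {{{nccopy}}} so that each chunk holds a single time level and tiles the horizontal domain rather than spanning it. The minimal sketch below assumes dimensions named {{{time_counter}}}, {{{y}}} and {{{x}}}, as in the reports on this page. The helper names and the 4 x 4 split in the usage line are hypothetical. They are not the settings actually adopted for r4.2_RC_FULL.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for rechunking one NetCDF file.

# Ceiling division: chunk length when a dimension of size $1 is split
# into $2 roughly equal pieces.
chunk_len() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Build an nccopy -c chunk specification with one time level per chunk.
# Arguments: ny nx pieces_y pieces_x [time dimension name]
make_chunkspec() {
  local ny=$1 nx=$2 py=$3 px=$4 tdim=${5:-time_counter}
  echo "${tdim}/1,y/$(chunk_len "$ny" "$py"),x/$(chunk_len "$nx" "$px")"
}

# Rewrite infile as netCDF-4 (-k nc4) with deflate level 1 (-d 1),
# the shuffle filter (-s) and the given per-dimension chunk lengths (-c).
rechunk_nc() {
  local infile=$1 outfile=$2 chunkspec=$3
  nccopy -k nc4 -d 1 -s -c "$chunkspec" "$infile" "$outfile"
}
```

For example, {{{rechunk_nc sss_data.nc sss_data_new.nc "$(make_chunkspec 148 180 4 4)"}}} would use the chunk specification {{{time_counter/1,y/37,x/45}}}.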
     
    213214rm WED025_r4.2_RC_FULL/coordinates_WED025.nc 
    214215}}} 
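Redundant variables, as opposed to whole files, can be removed with NCO's {{{ncks -x -v}}}, which copies everything except the listed variables. The sketch below uses hypothetical helpers {{{vars_to_drop}}} and {{{drop_vars}}}. Which variables count as redundant still has to be decided per configuration, and the names in the usage line are invented.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for stripping unused variables.

# Print, comma-separated, the names in $1 (variables present in a file)
# that do not appear in $2 (variables that must be kept).
vars_to_drop() {
  local have=$1 keep=$2 v out=
  for v in $(printf '%s' "$have" | tr ',' ' '); do
    case ",$keep," in
      *",$v,"*) ;;                 # listed as needed: keep it
      *) out="${out:+$out,}$v" ;;  # not needed: append to the drop list
    esac
  done
  printf '%s\n' "$out"
}

# Copy infile to outfile without the listed variables (plain copy if none).
drop_vars() {
  local infile=$1 outfile=$2 vars=$3
  if [ -z "$vars" ]; then
    cp "$infile" "$outfile"
    return
  fi
  ncks -O -x -v "$vars" "$infile" "$outfile"
}
```

For example, {{{drop_vars in.nc out.nc "$(vars_to_drop nav_lon,nav_lat,sss,old_mask nav_lon,nav_lat,sss)"}}} would drop only {{{old_mask}}}.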
    215 The full set of commands used to create the r4.2_RC_FULL set is (in addition to those already shown):
     216Below are a few selected oddities in the original chunk settings that have been corrected in this tidy-up:
     217{{{ 
     218nccnkrpt ICE_AGRIF_v4.x/initice.nc 
     219    time_counter = UNLIMITED ; // (1 currently) 
     220    x = 97 ; 
     221    y = 97 ;                                                       | 
     222        float nav_lon(y, x)     _ChunkSizes = 97, 97 ;             | 
     223        float nav_lat(y, x)     _ChunkSizes = 97, 97 ;             V 
     224        float time_counter(time_counter)        _ChunkSizes = 1048576 ; 
     225        float ati(time_counter, y, x)   _ChunkSizes = 1, 97, 97 ; 
     226        float hti(time_counter, y, x)   _ChunkSizes = 1, 97, 97 ; 
     227        float hts(time_counter, y, x)   _ChunkSizes = 1, 97, 97 ; 
     228        float smi(time_counter, y, x)   _ChunkSizes = 1, 97, 97 ; 
     229        float tmi(time_counter, y, x)   _ChunkSizes = 1, 97, 97 ; 
     230        float tsu(time_counter, y, x)   _ChunkSizes = 1, 97, 97 ; 
     231 
     232nccnkrpt ORCA2_ICE_v4.x/sss_data.nc 
     233    time_counter = UNLIMITED ; // (12 currently) 
     234    x = 180 ; 
     235    y = 148 ; 
     236        float nav_lat(y, x)     _ChunkSizes = 148, 180 ;        | 
     237        float nav_lon(y, x)     _ChunkSizes = 148, 180 ;        V 
     238        float time_counter(time_counter)        _ChunkSizes = 1024 ; 
     239        float sss(time_counter, y, x)   _ChunkSizes = 21, 148, 180 ; 
     240                                                       ^ 
     241                                                       |  
     242 
     243nccnkrpt ORCA2_OFF_v4.x/dyna_grid_T.nc 
     244    deptht = 31 ; 
     245    time_counter = UNLIMITED ; // (73 currently) 
     246    x = 180 ; 
     247    y = 148 ; 
     248        float nav_lon(y, x)     _ChunkSizes = 148, 180 ; 
     249        float nav_lat(y, x)     _ChunkSizes = 148, 180 ; 
     250        float deptht(deptht)    _ChunkSizes = 31 ; 
     251        float time_counter(time_counter)        _ChunkSizes = 1024 ; 
     252        float votemper(time_counter, deptht, y, x)      _ChunkSizes = 1, 31, 148, 180 ; 
     253        float vosaline(time_counter, deptht, y, x)      _ChunkSizes = 1, 31, 148, 180 ; 
     254        float sosstsst(time_counter, y, x)      _ChunkSizes = 54, 126, 154 ; 
     255        float sosaline(time_counter, y, x)      _ChunkSizes = 54, 126, 154 ; 
     256        float sossheig(time_counter, y, x)      _ChunkSizes = 54, 126, 154 ; 
     257        float iowaflup(time_counter, y, x)      _ChunkSizes = 54, 126, 154  
     258?? 
     259}}} 
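Some of these oddities can be caught mechanically. Examples are a {{{time_counter}}} chunk of 1048576 for a single record, or an {{{sss}}} chunk of 21 for 12 records. The sketch below (hypothetical helper {{{chunkcheck}}}) assumes its input is exactly the {{{nccnkrpt}}} text format shown above and flags any chunk longer than its dimension. It does not flag sizes that are merely awkward, such as 54, 126, 154.

```shell
#!/usr/bin/env bash
# Sketch only: read nccnkrpt-style text on stdin and report every
# _ChunkSizes entry that is larger than the length of its dimension.
chunkcheck() {
  awk '
    # Dimension lines, e.g. "x = 180 ;" or "time_counter = UNLIMITED ; // (12 currently)"
    !/_ChunkSizes/ && /=/ {
      if ($3 == "UNLIMITED") {
        n = $0
        sub(/.*\(/, "", n)          # keep the text after the last "("
        sub(/ currently.*/, "", n)  # leaves the current record count
        len[$1] = n + 0
      } else {
        len[$1] = $3 + 0
      }
      next
    }
    # Variable lines, e.g. "float sss(time_counter, y, x)   _ChunkSizes = 21, 148, 180 ;"
    /_ChunkSizes/ {
      decl = $0; sub(/\).*/, "", decl)                        # "float sss(time_counter, y, x"
      dims = decl; sub(/.*\(/, "", dims)                      # "time_counter, y, x"
      var = decl; sub(/\(.*/, "", var); sub(/.* /, "", var)   # "sss"
      cs = $0; sub(/.*_ChunkSizes = /, "", cs); sub(/ *;.*/, "", cs)
      nd = split(dims, d, /, */)
      split(cs, c, /, */)
      for (i = 1; i <= nd; i++)
        if (c[i] + 0 > len[d[i]])
          printf "%s: chunk %d on %s exceeds length %d\n", var, c[i], d[i], len[d[i]]
    }
  '
}
```

Typical use would be {{{nccnkrpt ORCA2_ICE_v4.x/sss_data.nc | chunkcheck}}}.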
     260The full set of commands used to create the r4.2_RC_FULL set is (in addition to those already shown):
    216261{{{ 
    217262########## AMMcmds ########## 
     
    467512}}} 
    468513 
    469 == Stage 2 details 
     514The files created by all these commands have been fully SETTE-tested with trunk revision 14595 and produce results identical to those of the same tests run with the original files.
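A minimal way to check this kind of bit-for-bit agreement independently is to compare the {{{run.stat}}} files written by the two runs, plus the {{{tracer.stat}}} files if present. This sketch assumes the two runs' output directories are passed as arguments. {{{same_results}}} is a hypothetical helper and is not necessarily how SETTE itself reports results.

```shell
#!/usr/bin/env bash
# Sketch only: report whether two run directories hold identical
# run.stat (and tracer.stat, when present) files.
same_results() {
  local ref=$1 new=$2 f
  for f in run.stat tracer.stat; do
    # Skip a file only when it is missing from both directories.
    if [ ! -e "$ref/$f" ] && [ ! -e "$new/$f" ]; then
      continue
    fi
    # cmp -s is silent; non-zero status means different or missing.
    if ! cmp -s "$ref/$f" "$new/$f"; then
      echo "DIFFER: $f"
      return 1
    fi
  done
  echo "IDENTICAL"
}
```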
     515 
     516=== Stage 2 details 
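Stage 2 keeps only enough forcing data to cover twice the period of each standard test. The sketch below shows the arithmetic and the trimming. It assumes the forcing files start at the test start date, have a fixed record interval and use a record dimension named {{{time_counter}}}. {{{records_needed}}} and {{{trim_forcing}}} are hypothetical helpers, and the 5-day, 6-hourly case is only an example. Climatological files (e.g. 12 monthly records) would presumably be left whole.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for shortening forcing files.

# Number of records needed to cover twice a test's length (rounded up).
# Arguments: test length in days, forcing interval in hours.
records_needed() {
  local days=$1 freq_h=$2
  local hours=$(( 2 * days * 24 ))
  echo $(( (hours + freq_h - 1) / freq_h ))
}

# Keep only the first nrec records along time_counter. ncks hyperslab
# indices are 0-based and both limits are inclusive.
trim_forcing() {
  local infile=$1 outfile=$2 nrec=$3
  ncks -O -d "time_counter,0,$(( nrec - 1 ))" "$infile" "$outfile"
}
```

For example, {{{trim_forcing forcing.nc forcing_short.nc "$(records_needed 5 6)"}}} would keep records 0 to 39.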
    470517=== Documentation updates 
    471518