2021WP/VLD-05_Coward_SETTE_inputs (diff) – NEMO

Changes between Version 5 and Version 6 of 2021WP/VLD-05_Coward_SETTE_inputs

Timestamp:: 2021-03-15T17:30:00+01:00
Author:: acc
Comment:: --

2021WP/VLD-05_Coward_SETTE_inputs

-                      v5
+                      v6
 the delay on preview (or review) are longer than the 2 weeks expected.
 [[PageOutline(2, , inline)]]
+[[PageOutline(2-3, , inline)]]
 == Summary
 …
 ||=Ticket       || #2637                                                 ||
 === Description
+== Description
 Collection and rationalisation of SETTE inputs. The set of input files for the full suite of SETTE tests has evolved rapidly to keep pace with changes to the code (such as the removal of haloes from external files). The current set needs to be cleaned of unused data, then chunked and compressed with settings consistent with exascale ambitions. A definitive set then needs to be hosted in a publicly available location with a fixed DOI. A lighter version (for example, reduced time-levels in forcing data) may also be appropriate for future containerisation or cloud-deployment of testing services.
+'''Stage 1''': Collect all input file sets for SETTE 4.2_RC (current trunk). Confirm successful SETTE results and store reference set of SETTE results. Remove redundant files and variables from contents. Make sure sensible chunking and compression choices have been made in all cases. Produce clean 'r4.2_RC_FULL' set. See details [#Stage1details here]
+'''Stage 2''': Reduce data volumes by selecting only sufficient forcing data for twice the period of each standard test. Run SETTE with reduced set and confirm unchanged results. Create recommended 'r4.2_RC' set. See details [#Stage2details here]
+'''Stage 3''': Reduce data volumes further by limiting the number of significant digits in all fields other than domain and coordinate variables. Confirm SETTE tests are still successful (results WILL be different). Create optional 'r4.2_RC_LITE' set.
+'''Stage 4''': Find a hosting and distribution option.
+'''Stage 5''': Document and archive scripts so that the process can be repeated if (when) changes are made to the FULL set.
 ''...''
+=== Implementation
+'''Stage 1''': Collect all input file sets for SETTE 4.2_RC (current trunk). Confirm successful SETTE results and store reference set of SETTE results. Remove redundant files and variables from contents. Make sure sensible chunking and compression choices have been made in all cases. Produce clean 'r4.2_RC_FULL' set. See details [#Stage1details here]
+'''Stage 2''': Reduce data volumes by selecting only sufficient forcing data for twice the period of each standard test. Run SETTE with reduced set and confirm unchanged results. Create recommended 'r4.2_RC' set. See details [#Stage2details here]
+'''Stage 3''': Reduce data volumes further by limiting the number of significant digits in all fields other than domain and coordinate variables. Confirm SETTE tests are still successful (results WILL be different). Create optional 'r4.2_RC_LITE' set.
+'''Stage 4''': Find a hosting and distribution option.
+'''Stage 5''': Document and archive scripts so that the process can be repeated if (when) changes are made to the FULL set.
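As a rough illustration of Stages 2 and 3 above, the sketch below computes how many forcing records cover twice the length of a test, then shows (commented out) how NCO's `ncks` could subset and quantise a file. The function name `records_needed`, the file names, the run length, the forcing interval and the choice of 3 significant digits are all assumptions made for illustration; none are taken from the ticket. The extra closing record is also an assumption, added so that time interpolation has a bracketing value.

```shell
# Number of forcing records covering twice a test of run_s seconds when the
# forcing is stored every step_s seconds (rounded up, plus one closing record
# so that the final interval is bracketed). Hypothetical helper.
records_needed() {
    local run_s=$1 step_s=$2
    echo $(( (2 * run_s + step_s - 1) / step_s + 1 ))
}

# Possible usage (hypothetical file names; requires NCO):
# n=$(records_needed 86400 21600)
# ncks -d time_counter,0,$((n - 1)) forcing_full.nc forcing_reduced.nc   # Stage 2: keep the first n records
# ncks --ppc default=3 forcing_reduced.nc forcing_lite.nc               # Stage 3: limit significant digits
```

Any reduced set would still need the SETTE confirmation described in each stage before being adopted.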
+== Stage 1 details
+== Implementation
+=== Stage 1 details
 Stage 1 involves gathering all SETTE input files and systematically checking contents for redundancy and opportunities for compression. Where chunking and compression have already been applied, it is also important to check the current settings for validity. Here is a typical example from the {{{AGRIF_DEMO_v4.x.tar}}} set:
 …
 where {{{nccnkrpt}}} is my bash function defined as:
 {{{
 function nccnkrpt { ncks --cdl -m ${1} | grep '=' ; ncdump -s -h ${1} | grep -e ") ;" -e _ChunkSizes | sed -e 's/.*:/\t/' | sed -e ':x /) ;$/ { N; s/;\n//g ; bx }' ; }
+function nccnkrpt { if [ `ncdump -k ${1} | awk '{print $1}'` == "netCDF-4" ] ; then ncks --cdl -m ${1} | grep '=' | grep -v ":" ; ncdump -s -h ${1} | grep -e ") ;" -e _ChunkSizes | sed -e 's/.*:/\t/' | sed -e ':x /) ;$/ { N; s/;\n//g ; bx }'; fi ; }
 }}}
 which condenses the verbose {{{ncdump -s -h}}} output into a more digestible form. In this example the dataset has already been chunked and compressed, but the chunk sizes are an odd choice. Chunk sizes that span the entire dataset restrict future scalability, since any access to these data requires reading and uncompressing the whole chunk irrespective of the size of the calling domain. A chunk size greater than 1 for the t dimension is also wasteful and confusing, given that the file only contains a single time level.
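One way to act on these two observations is to build the chunk specification programmatically: one time level per chunk, and spatial chunk lengths capped below the full domain. The helper below is a minimal sketch under those assumptions; the name `cnk_spec` and the cap of 64 are hypothetical and are not the settings used for the r4.2_RC_FULL set. It prints a specification in the `dim/size,dim/size` form accepted by `nccopy -c`.

```shell
# Build an nccopy-style chunk specification (dim/size,dim/size,...).
# Arguments: the name of the record (time) dimension, a cap on the other
# chunk lengths, then one name=length pair per dimension.
cnk_spec() {
    local tdim=$1 cap=$2 spec="" pair name len
    shift 2
    for pair in "$@"; do
        name=${pair%%=*}
        len=${pair##*=}
        if [ "$name" = "$tdim" ]; then
            len=1                      # one time level per chunk
        elif [ "$len" -gt "$cap" ]; then
            len=$cap                   # keep chunks smaller than the full domain
        fi
        spec="${spec:+$spec,}$name/$len"
    done
    printf '%s\n' "$spec"
}

# Possible usage (hypothetical file names; requires the netCDF utilities):
# nccopy -d 1 -s -c "$(cnk_spec time_counter 64 time_counter=12 y=148 x=180)" in.nc out.nc
```

Dimensions already shorter than the cap are left whole, so small test domains end up with one chunk per time level.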
 …
 rm WED025_r4.2_RC_FULL/coordinates_WED025.nc
 }}}
+The full set of commands used to create the r4.2_RC_FULL set is (in additional to thise already shown):
+A few selected oddities in the original chunk settings that have been corrected in this tidy-up:
+{{{
+nccnkrpt ICE_AGRIF_v4.x/initice.nc
+    time_counter = UNLIMITED ; // (1 currently)
+    x = 97 ;
+    y = 97 ;                                                       |
+        float nav_lon(y, x)     _ChunkSizes = 97, 97 ;             |
+        float nav_lat(y, x)     _ChunkSizes = 97, 97 ;             V
+        float time_counter(time_counter)        _ChunkSizes = 1048576 ;
+        float ati(time_counter, y, x)   _ChunkSizes = 1, 97, 97 ;
+        float hti(time_counter, y, x)   _ChunkSizes = 1, 97, 97 ;
+        float hts(time_counter, y, x)   _ChunkSizes = 1, 97, 97 ;
+        float smi(time_counter, y, x)   _ChunkSizes = 1, 97, 97 ;
+        float tmi(time_counter, y, x)   _ChunkSizes = 1, 97, 97 ;
+        float tsu(time_counter, y, x)   _ChunkSizes = 1, 97, 97 ;
+nccnkrpt ORCA2_ICE_v4.x/sss_data.nc
+    time_counter = UNLIMITED ; // (12 currently)
+    x = 180 ;
+    y = 148 ;
+        float nav_lat(y, x)     _ChunkSizes = 148, 180 ;        |
+        float nav_lon(y, x)     _ChunkSizes = 148, 180 ;        V
+        float time_counter(time_counter)        _ChunkSizes = 1024 ;
+        float sss(time_counter, y, x)   _ChunkSizes = 21, 148, 180 ;
+                                                       ^
+                                                       |
+nccnkrpt ORCA2_OFF_v4.x/dyna_grid_T.nc
+    deptht = 31 ;
+    time_counter = UNLIMITED ; // (73 currently)
+    x = 180 ;
+    y = 148 ;
+        float nav_lon(y, x)     _ChunkSizes = 148, 180 ;
+        float nav_lat(y, x)     _ChunkSizes = 148, 180 ;
+        float deptht(deptht)    _ChunkSizes = 31 ;
+        float time_counter(time_counter)        _ChunkSizes = 1024 ;
+        float votemper(time_counter, deptht, y, x)      _ChunkSizes = 1, 31, 148, 180 ;
+        float vosaline(time_counter, deptht, y, x)      _ChunkSizes = 1, 31, 148, 180 ;
+        float sosstsst(time_counter, y, x)      _ChunkSizes = 54, 126, 154 ;
+        float sosaline(time_counter, y, x)      _ChunkSizes = 54, 126, 154 ;
+        float sossheig(time_counter, y, x)      _ChunkSizes = 54, 126, 154 ;
+        float iowaflup(time_counter, y, x)      _ChunkSizes = 54, 126, 154
+??
+}}}
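Some of the oddities marked above can be found automatically. The sketch below reads an `nccnkrpt`-style report on standard input and flags any chunk size that exceeds the current length of its dimension. The function name `flag_odd_chunks` is hypothetical, and the parsing assumes the report layout shown above. It does not catch chunk sizes that are smaller than the dimension but do not match it, such as the `54, 126, 154` entries.

```shell
# Flag chunk sizes larger than the dimension they apply to.
# Input: nccnkrpt-style report on stdin. Hypothetical helper.
flag_odd_chunks() {
    awk '
    # Dimension lines: "x = 180 ;" or "t = UNLIMITED ; // (12 currently)"
    !/_ChunkSizes/ && $2 == "=" {
        len = $3
        if (len == "UNLIMITED") {
            len = $0
            sub(/.*\(/, "", len)       # keep the text after "("
            sub(/ .*/, "", len)        # keep the leading number
        }
        dimlen[$1] = len + 0
        next
    }
    # Variable lines: "float sss(time_counter, y, x)  _ChunkSizes = 21, 148, 180 ;"
    /_ChunkSizes/ {
        decl = $0; sub(/\).*/, "", decl)                    # up to the dimension list
        dims = decl; sub(/.*\(/, "", dims)                  # "time_counter, y, x"
        var = decl; sub(/\(.*/, "", var); sub(/.*[ \t]/, "", var)
        cnk = $0; sub(/.*_ChunkSizes = /, "", cnk); sub(/ *;.*/, "", cnk)
        nd = split(dims, d, /, */)
        nc = split(cnk, c, /, */)
        for (i = 1; i <= nd && i <= nc; i++)
            if ((d[i] in dimlen) && c[i] + 0 > dimlen[d[i]])
                printf "%s: chunk %d along %s exceeds length %d\n", var, c[i], d[i], dimlen[d[i]]
    }'
}

# Possible usage: nccnkrpt ORCA2_ICE_v4.x/sss_data.nc | flag_odd_chunks
```

Chunk sizes on an unlimited dimension that currently holds fewer records (for example 1024 against 12 time levels) are reported as well, so the output is a list of candidates to review rather than a list of errors.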
+The full set of commands used to create the r4.2_RC_FULL set is (in addition to those already shown):
 {{{
 ########## AMMcmds ##########
 …
 }}}
+== Stage 2 details
+The files created by all these commands have been fully SETTE tested with trunk revision 14595 and produce identical results to the same tests performed with the original files.
+=== Stage 2 details
 === Documentation updates