Changes between Version 5 and Version 6 of 2021WP/VLD-05_Coward_SETTE_inputs
- Timestamp:
- 2021-03-15T17:30:00+01:00
2021WP/VLD-05_Coward_SETTE_inputs
 v5  v6
  7   7  the delay on preview (or review) are longer than the 2 weeks expected.
  8   8
  9      [[PageOutline(2 , , inline)]]
      9  [[PageOutline(2-3, , inline)]]
 10  10
 11  11  == Summary
  …   …
 20  20  ||=Ticket || #2637 ||
 21  21
 22      == =Description
     22  == Description
 23  23  Collection and rationalisation of SETTE inputs. The set of input files for the full suite of SETTE tests has evolved rapidly to keep pace with changes to the code (such as the removal of haloes from external files). The current set needs to be cleaned of unused data and chunked and compressed with setting consistent with exascale ambitions. A definitive set then needs to be hosted in a publically available location with a fixed DOI. A lighter version (for example, reduced time-levels in forcing data) may also be appropriate for future containerisation or cloud-deployment of testing services.
 24  24
     25  '''Stage 1''': Collect all input file sets for SETTE 4.2_RC (current trunk). Confirm successful SETTE results and store reference set of SETTE results. Remove redundant files and variables from contents. Make sure sensible chunking and compression choices have been made in all cases. Produce clean 'r4.2_RC_FULL' set. See details [#Stage1details here]
     26
     27  '''Stage 2''': Reduce data volumes by selecting only sufficient forcing data for twice the period of each standard test. Run SETTE with reduced set and confirm unchanged results. Create recommended 'r4.2_RC' set. See details [#Stage2details here]
     28
     29  '''Stage3''': Reduce data volumes further by limiting number of significant digits in all fields other than domain and coordinate variables. Confirm SETTE tests are still successful (results WILL be different). Create optional 'r4.2_RC_LITE' set.
     30
     31  '''Stage 4''': Find a hosting and distribution option
     32
     33  '''Stage 5''': Document and archive scripts so that the process can be repeated if(when) changes are made to the FULL set.
 25  34
 26  35  ''...''
 27  36
 28      === Implementation
 29
 30      '''Stage 1''': Collect all input file sets for SETTE 4.2_RC (current trunk). Confirm successful SETTE results and store reference set of SETTE results. Remove redundant files and variables from contents. Make sure sensible chunking and compression choices have been made in all cases. Produce clean 'r4.2_RC_FULL' set. See details [#Stage1details here]
 31
 32      '''Stage 2''': Reduce data volumes by selecting only sufficient forcing data for twice the period of each standard test. Run SETTE with reduced set and confirm unchanged results. Create recommended 'r4.2_RC' set. See details [#Stage2details here]
 33
 34      '''Stage3''': Reduce data volumes further by limiting number of significant digits in all fields other than domain and coordinate variables. Confirm SETTE tests are still successful (results WILL be different). Create optional 'r4.2_RC_LITE' set.
 35
 36      '''Stage 4''': Find a hosting and distribution option
 37
 38      '''Stage 5''': Document and archive scripts so that the process can be repeated if(when) changes are made to the FULL set.
 39
 40
 41      == Stage 1 details
     37  == Implementation
     38
     39
     40
     41
     42  === Stage 1 details
 42  43
 43  44  Stage 1 involves gathering all SETTE input files and systematically checking contents for redundancy and opportunities for compression. Where chunking and compression have already been applied, it is also important to check the current settings for validity.
         Here is a typical example from the {{{AGRIF_DEMO_v4.x.tar}}} set:
  …   …
 84  85  where {{{nccnkrpt}}} is my bash function defined as:
 85  86  {{{
 86      function nccnkrpt { ncks --cdl -m ${1} | grep '=' ; ncdump -s -h ${1} | grep -e ") ;" -e _ChunkSizes | sed -e 's/.*:/\t/' | sed -e ':x /) ;$/ { N; s/;\n//g ; bx }'; }
     87  function nccnkrpt { if [ `ncdump -k ${1} | awk '{print $1}'` == "netCDF-4" ] ; then ncks --cdl -m ${1} | grep '=' | grep -v ":" ; ncdump -s -h ${1} | grep -e ") ;" -e _ChunkSizes | sed -e 's/.*:/\t/' | sed -e ':x /) ;$/ { N; s/;\n//g ; bx }'; fi ; }
 87  88  }}}
 88  89  which helps to reduce the verbosity of the ncdump -s -h output into a more digestible form. In this example the dataset has already been chunked and compressed but the chunksizes are an odd choice. Having chunk sizes which span the entire dataset will restrict future scalability since any access to these data will require reading and uncompressing these large chunks irrespective of the size of the calling domain. A chunksize greater than 1 for the t dimension is also wasteful and confusing given that the file only contains a single time-level.
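The kind of check described above can be partly automated. What follows is a minimal sketch only, assuming the report layout shown in the examples on this page: one {{{name = length ;}}} line per dimension and one {{{type var(dims) _ChunkSizes = ... ;}}} line per variable. The function name {{{cnkcheck}}} is hypothetical and is not part of the original scripts. It flags only chunk sizes that are larger than the current length of the matching dimension; it does not detect chunks that merely span a whole (large) dimension.

```shell
# cnkcheck: hypothetical helper (an assumption, not from the original page).
# Reads an nccnkrpt-style report on stdin and prints one line for every
# variable whose chunk size exceeds the current length of its dimension.
function cnkcheck {
  awk '
    # Dimension lines such as "x = 97 ;" or
    # "time_counter = UNLIMITED ; // (12 currently)"
    $2 == "=" && $0 !~ /_ChunkSizes/ {
      if ($3 == "UNLIMITED") { len = $6; sub(/^\(/, "", len) } else { len = $3 }
      dimlen[$1] = len + 0
      next
    }
    # Variable lines such as
    # "float sss(time_counter, y, x) _ChunkSizes = 21, 148, 180 ;"
    /_ChunkSizes/ {
      lp = index($0, "("); rp = index($0, ")")
      n = split(substr($0, 1, lp - 1), h, " "); name = h[n]
      nd = split(substr($0, lp + 1, rp - lp - 1), dims, /, */)
      sizes = substr($0, index($0, "_ChunkSizes"))
      sub(/^_ChunkSizes *= */, "", sizes); sub(/ *;.*$/, "", sizes)
      nc = split(sizes, cnk, /, */)
      # Compare each chunk size with the length of the matching dimension
      for (i = 1; i <= nd && i <= nc; i++)
        if ((dims[i] in dimlen) && cnk[i] + 0 > dimlen[dims[i]])
          printf "%s: chunk %d > %s length %d\n", name, cnk[i], dims[i], dimlen[dims[i]]
    }
  '
}
```

Under these assumptions it would be used as {{{nccnkrpt somefile.nc | cnkcheck}}}. For a report with 12 time records, a {{{time_counter}}} chunk of 1024 and a data variable whose time chunk is 21, it reports both variables and stays silent about 2-D fields whose chunks exactly match {{{y}}} and {{{x}}}.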
  …   …
213 214  rm WED025_r4.2_RC_FULL/coordinates_WED025.nc
214 215  }}}
215      The full set of commands used to create the r4.2_RC_FULL set is (in additional to thise already shown):
    216  And a few selected oddities in the original chunk settings that have been corrected in this tidy up:
    217  {{{
    218  nccnkrpt ICE_AGRIF_v4.x/initice.nc
    219  time_counter = UNLIMITED ; // (1 currently)
    220  x = 97 ;
    221  y = 97 ;                                       |
    222  float nav_lon(y, x) _ChunkSizes = 97, 97 ;     |
    223  float nav_lat(y, x) _ChunkSizes = 97, 97 ;     V
    224  float time_counter(time_counter) _ChunkSizes = 1048576 ;
    225  float ati(time_counter, y, x) _ChunkSizes = 1, 97, 97 ;
    226  float hti(time_counter, y, x) _ChunkSizes = 1, 97, 97 ;
    227  float hts(time_counter, y, x) _ChunkSizes = 1, 97, 97 ;
    228  float smi(time_counter, y, x) _ChunkSizes = 1, 97, 97 ;
    229  float tmi(time_counter, y, x) _ChunkSizes = 1, 97, 97 ;
    230  float tsu(time_counter, y, x) _ChunkSizes = 1, 97, 97 ;
    231
    232  nccnkrpt ORCA2_ICE_v4.x/sss_data.nc
    233  time_counter = UNLIMITED ; // (12 currently)
    234  x = 180 ;
    235  y = 148 ;
    236  float nav_lat(y, x) _ChunkSizes = 148, 180 ;   |
    237  float nav_lon(y, x) _ChunkSizes = 148, 180 ;   V
    238  float time_counter(time_counter) _ChunkSizes = 1024 ;
    239  float sss(time_counter, y, x) _ChunkSizes = 21, 148, 180 ;
    240                                              ^
    241                                              |
    242
    243  nccnkrpt ORCA2_OFF_v4.x/dyna_grid_T.nc
    244  deptht = 31 ;
    245  time_counter = UNLIMITED ; // (73 currently)
    246  x = 180 ;
    247  y = 148 ;
    248  float nav_lon(y, x) _ChunkSizes = 148, 180 ;
    249  float nav_lat(y, x) _ChunkSizes = 148, 180 ;
    250  float deptht(deptht) _ChunkSizes = 31 ;
    251  float time_counter(time_counter) _ChunkSizes = 1024 ;
    252  float votemper(time_counter, deptht, y, x) _ChunkSizes = 1, 31, 148, 180 ;
    253  float vosaline(time_counter, deptht, y, x) _ChunkSizes = 1, 31, 148, 180 ;
    254  float sosstsst(time_counter, y, x) _ChunkSizes = 54, 126, 154 ;
    255  float sosaline(time_counter, y, x) _ChunkSizes = 54, 126, 154 ;
    256  float sossheig(time_counter, y, x) _ChunkSizes = 54, 126, 154 ;
    257  float iowaflup(time_counter, y, x) _ChunkSizes = 54, 126, 154
    258                                                   ??
    259  }}}
    260  The full set of commands used to create the r4.2_RC_FULL set is (in additional to those already shown):
216 261  {{{
217 262  ########## AMMcmds ##########
  …   …
467 512  }}}
468 513
469      == Stage 2 details
    514  The files created by all these commands have been fully SETTE tested with trunk revision 14595 and produce identical results to the same tests performed with the original files.
    515
    516  === Stage 2 details
470 517  === Documentation updates
471 518