Version 8 (modified by acc, 3 years ago)
VLD-05_Coward_SETTE_inputs
The PI is responsible for closely following the progress of the action, and especially for contacting the NEMO project manager if the delay on preview (or review) is longer than the expected 2 weeks.
Summary
Action | VLD-05_Coward_SETTE_inputs |
---|---|
PI(S) | Andrew Coward |
Digest | Rationalisation of SETTE inputs |
Dependencies | If any |
Branch | N/A |
Previewer(s) | Names |
Reviewer(s) | Names |
Ticket | #2637 |
Description
Collection and rationalisation of SETTE inputs. The set of input files for the full suite of SETTE tests has evolved rapidly to keep pace with changes to the code (such as the removal of haloes from external files). The current set needs to be cleaned of unused data, then chunked and compressed with settings consistent with exascale ambitions. A definitive set then needs to be hosted in a publicly available location with a fixed DOI. A lighter version (for example, with fewer time-levels in the forcing data) may also be appropriate for future containerisation or cloud deployment of testing services.
Stage 1: Collect all input file sets for SETTE 4.2_RC (current trunk). Confirm successful SETTE results and store a reference set of SETTE results. Remove redundant files and variables from the contents. Make sure sensible chunking and compression choices have been made in all cases. Produce a clean 'r4.2_RC_FULL' set. See "Stage 1 details" under Implementation.
Stage 2: Reduce data volumes by selecting only sufficient forcing data for twice the period of each standard test. Run SETTE with the reduced set and confirm unchanged results. Create the recommended 'r4.2_RC' set. See "Stage 2 details" under Implementation.
Stage 3: Reduce data volumes further by limiting the number of significant digits in all fields other than domain and coordinate variables. Confirm SETTE tests are still successful (results WILL be different). Create an optional 'r4.2_RC_LITE' set.
Stage 4: Find a hosting and distribution option.
Stage 5: Document and archive scripts so that the process can be repeated if (when) changes are made to the FULL set.
...
Implementation
Stage 1 details
Stage 1 involves gathering all SETTE input files and systematically checking contents for redundancy and opportunities for compression. Where chunking and compression have already been applied, it is also important to check the current settings for validity. Here is a typical example from the AGRIF_DEMO_v4.x.tar set:
    nccnkrpt ORCA_R2_zps_domcfg_agrif.nc
    t = UNLIMITED ; // (1 currently)
    x = 180 ;
    y = 148 ;
    z = 31 ;
    double time_counter(t) _ChunkSizes = 512 ;
    double glamt(t, y, x) _ChunkSizes = 4, 148, 180 ;
    double glamu(t, y, x) _ChunkSizes = 4, 148, 180 ;
    double glamv(t, y, x) _ChunkSizes = 4, 148, 180 ;
    double glamf(t, y, x) _ChunkSizes = 4, 148, 180 ;
    double gphit(t, y, x) _ChunkSizes = 4, 148, 180 ;
    double gphiu(t, y, x) _ChunkSizes = 4, 148, 180 ;
    double gphiv(t, y, x) _ChunkSizes = 4, 148, 180 ;
    double gphif(t, y, x) _ChunkSizes = 4, 148, 180 ;
    double e1t(t, y, x) _ChunkSizes = 4, 148, 180 ;
    double e1u(t, y, x) _ChunkSizes = 4, 148, 180 ;
    double e1v(t, y, x) _ChunkSizes = 4, 148, 180 ;
    double e1f(t, y, x) _ChunkSizes = 4, 148, 180 ;
    double e2t(t, y, x) _ChunkSizes = 4, 148, 180 ;
    double e2u(t, y, x) _ChunkSizes = 4, 148, 180 ;
    double e2v(t, y, x) _ChunkSizes = 4, 148, 180 ;
    double e2f(t, y, x) _ChunkSizes = 4, 148, 180 ;
    double ff_f(t, y, x) _ChunkSizes = 4, 148, 180 ;
    double ff_t(t, y, x) _ChunkSizes = 4, 148, 180 ;
    double e3t_1d(t, z) _ChunkSizes = 1, 31 ;
    double e3w_1d(t, z) _ChunkSizes = 1, 31 ;
    double e3t_0(t, z, y, x) _ChunkSizes = 1, 31, 148, 180 ;
    double e3u_0(t, z, y, x) _ChunkSizes = 1, 31, 148, 180 ;
    double e3v_0(t, z, y, x) _ChunkSizes = 1, 31, 148, 180 ;
    double e3f_0(t, z, y, x) _ChunkSizes = 1, 31, 148, 180 ;
    double e3w_0(t, z, y, x) _ChunkSizes = 1, 31, 148, 180 ;
    double e3uw_0(t, z, y, x) _ChunkSizes = 1, 31, 148, 180 ;
    double e3vw_0(t, z, y, x) _ChunkSizes = 1, 31, 148, 180 ;
    int bottom_level(t, y, x) _ChunkSizes = 6, 148, 180 ;
    int top_level(t, y, x) _ChunkSizes = 6, 148, 180 ;
    float bathy_metry(t, y, x) _ChunkSizes = 6, 148, 180 ;
where nccnkrpt is my bash function defined as:
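    # Print the dimensions and per-variable chunk sizes of a netCDF-4 file
    function nccnkrpt {
      if [ "$(ncdump -k "${1}" | awk '{print $1}')" == "netCDF-4" ] ; then
        # Dimension sizes only (lines with '=' but no attribute ':')
        ncks --cdl -m "${1}" | grep '=' | grep -v ":"
        # Variable declarations joined with their _ChunkSizes attribute
        ncdump -s -h "${1}" | grep -e ") ;" -e _ChunkSizes | sed -e 's/.*:/\t/' | sed -e ':x /) ;$/ { N; s/;\n//g ; bx }'
      fi
    }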
which helps to reduce the verbosity of the ncdump -s -h output into a more digestible form. In this example the dataset has already been chunked and compressed but the chunksizes are an odd choice. Having chunk sizes which span the entire dataset will restrict future scalability since any access to these data will require reading and uncompressing these large chunks irrespective of the size of the calling domain. A chunksize greater than 1 for the t dimension is also wasteful and confusing given that the file only contains a single time-level.
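As a rough illustration of the cost of the original chunking, the uncompressed size of a single chunk is simply the product of its chunk sizes and the word size. The sketch below uses a hypothetical helper, chunk_bytes (not part of nccnkrpt or NCO), with chunk sizes taken from the listing above and 8 bytes assumed for a double:

```shell
# Uncompressed size in bytes of one chunk.
# usage: chunk_bytes WORDSIZE CHUNK_DIM [CHUNK_DIM ...]
chunk_bytes() {
    local n=$1; shift
    local d
    for d in "$@"; do
        n=$(( n * d ))
    done
    echo "$n"
}

chunk_bytes 8 4 148 180      # glamt etc.: prints 852480
chunk_bytes 8 1 148 180      # one time-level of the same field: prints 213120
chunk_bytes 8 1 31 148 180   # e3t_0 etc.: prints 6606720
```

In other words, any read of a 3D scale-factor field touches a chunk of about 6.6 MB before compression, and the nominal chunk for the 2D fields is four times larger than the single time-level that actually exists.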
The 'right' choice for chunksizes is somewhat arbitrary but given our exascale ambitions of efficient performance with processor domains of O(10x10) in size, a target chunk-size around 64x64 would seem a reasonable compromise. Chunk-sizes which are too small will compromise compressibility and require more chunk meta-data in the file. Sizes which are too large will affect scalability and cause unnecessary delays at start-up. For the ORCA2 domain a chunk-size of 60x50 is chosen since this also avoids any underpopulated chunks. For the vertical dimension, I have used a chunk-size of 4 but there is probably little gain here since the volume data is always read as a whole and a full-depth chunk is equally appropriate. Breaking the vertical dimension into smaller chunks, however, may help other applications that use the domain configuration file and only wish to select specific levels.
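One way to check a candidate chunk size for under-populated chunks is to count how many points fall in the final chunk along a dimension. The helper below, chunk_fill, is a hypothetical illustration rather than part of the original tool set; the dimension lengths are those of the ORCA2 domain:

```shell
# For a dimension of length $1 split into chunks of size $2, print
# "<number of chunks> <points in the last chunk>".
chunk_fill() {
    local len=$1 cnk=$2
    local n=$(( (len + cnk - 1) / cnk ))      # chunks needed (rounded up)
    local last=$(( len - (n - 1) * cnk ))     # population of the final chunk
    echo "$n $last"
}

chunk_fill 180 60   # x with the chosen size: prints "3 60"
chunk_fill 148 50   # y with the chosen size: prints "3 48"
chunk_fill 148 64   # y with a 64-point chunk: prints "3 20"
```

With 60x50 the last chunk in y holds 48 of its 50 rows, whereas a 64-point chunk would leave a final chunk only 20 rows deep.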
All the other files in the original AGRIF_DEMO set are in classic NetCDF3 format and therefore unchunked and uncompressed. All these files have been converted to NetCDF4 with suitable chunk-size choices. The actual sizes vary slightly for each of the 3 AGRIF-level sets so as to avoid under-populated chunks.
ORCA_R2_zps_domcfg_agrif.nc | 1_ORCA_R2_zps_domcfg_agrif.nc | 2_ORCA_R05_zps_domcfg_agrif.nc | 3_ORCA_R017_zps_domcfg_agrif.nc |
---|---|---|---|
t = UNLIMITED ( 1 ) | t = UNLIMITED ( 1 ) | t = UNLIMITED ( 1 ) | t = UNLIMITED ( 1 ) |
x = 180 | x = 48 | x = 132 | x = 134 |
y = 148 | y = 50 | y = 140 | y = 128 |
z = 31 | z = 31 | z = 31 | z = 31 |
The complete set of ncks commands used to create AGRIF_DEMO_v4.2_RC_FULL (part of the r4.2_RC_FULL set) from a copy of AGRIF_DEMO_v4.x is:
    # AGRIFcmds
    #### Mother grid ####
    #
    ncks --no_abc --cnk_plc='xpl' --cnk_dmn t,1 --cnk_dmn x,60 --cnk_dmn y,50 --cnk_dmn z,4 ORCA_R2_zps_domcfg_agrif.nc new_ORCA_R2_zps_domcfg_agrif.nc
    mv new_ORCA_R2_zps_domcfg_agrif.nc ORCA_R2_zps_domcfg_agrif.nc
    #
    #### Nest level 1 ####
    #
    for f in 1_chlorophyll.nc 1_geothermal_heating.nc 1_runoff_core_monthly.nc 1_sss_data.nc
    do
      ncks --no_abc --4 --dfl_lvl 3 --cnk_plc='xpl' --cnk_dmn time_counter,1 --cnk_dmn x,48 --cnk_dmn y,50 $f new_$f
    done
    #
    for f in 1_data_1m_potential_temperature_nomask.nc 1_data_1m_salinity_nomask.nc
    do
      ncks --no_abc --4 --dfl_lvl 3 --cnk_plc='xpl' --cnk_dmn time_counter,1 --cnk_dmn x,48 --cnk_dmn y,50 --cnk_dmn z,4 $f new_$f
    done
    #
    for f in 1_eddy_viscosity_3D.nc 1_ORCA_R2_zps_domcfg_agrif.nc
    do
      ncks --no_abc --4 --dfl_lvl 3 --cnk_plc='xpl' --cnk_dmn t,1 --cnk_dmn x,48 --cnk_dmn y,50 --cnk_dmn z,4 $f new_$f
    done
    #
    for f in 1_resto.nc
    do
      ncks --no_abc --4 --dfl_lvl 3 --cnk_plc='xpl' --cnk_dmn x,48 --cnk_dmn y,50 --cnk_dmn z,4 $f new_$f
    done
    #
    for f in 1_weights_core_orca2_bicubic_noc.nc 1_weights_core_orca2_bilinear_noc.nc
    do
      ncks --no_abc --4 --dfl_lvl 3 --cnk_plc='xpl' --cnk_dmn lon,48 --cnk_dmn lat,50 $f new_$f
    done
    #
    #### Nest level 2 ####
    #
    for f in 2_chlorophyll.nc 2_geothermal_heating.nc
    do
      ncks --no_abc --4 --dfl_lvl 3 --cnk_plc='xpl' --cnk_dmn time_counter,1 --cnk_dmn x,66 --cnk_dmn y,70 $f new_$f
    done
    #
    for f in 2_data_1m_potential_temperature_nomask.nc 2_data_1m_salinity_nomask.nc
    do
      ncks --no_abc --4 --dfl_lvl 3 --cnk_plc='xpl' --cnk_dmn time_counter,1 --cnk_dmn x,66 --cnk_dmn y,70 --cnk_dmn z,4 $f new_$f
    done
    #
    for f in 2_ORCA_R05_zps_domcfg_agrif.nc
    do
      ncks --no_abc --4 --dfl_lvl 3 --cnk_plc='xpl' --cnk_dmn t,1 --cnk_dmn x,66 --cnk_dmn y,70 --cnk_dmn z,4 $f new_$f
    done
    #
    for f in 2_weights_core2_nordic1_bicub.nc 2_weights_core2_nordic1_bilin.nc
    do
      ncks --no_abc --4 --dfl_lvl 3 --cnk_plc='xpl' --cnk_dmn lon,66 --cnk_dmn lat,70 $f new_$f
    done
    #
    #### Nest level 3 ####
    #
    for f in 3_chlorophyll.nc 3_geothermal_heating.nc
    do
      ncks --no_abc --4 --dfl_lvl 3 --cnk_plc='xpl' --cnk_dmn time_counter,1 --cnk_dmn x,67 --cnk_dmn y,64 $f new_$f
    done
    #
    for f in 3_data_1m_potential_temperature_nomask.nc 3_data_1m_salinity_nomask.nc
    do
      ncks --no_abc --4 --dfl_lvl 3 --cnk_plc='xpl' --cnk_dmn time_counter,1 --cnk_dmn x,67 --cnk_dmn y,64 --cnk_dmn z,4 $f new_$f
    done
    #
    for f in 3_ORCA_R017_zps_domcfg_agrif.nc
    do
      ncks --no_abc --4 --dfl_lvl 3 --cnk_plc='xpl' --cnk_dmn t,1 --cnk_dmn x,67 --cnk_dmn y,64 --cnk_dmn z,4 $f new_$f
    done
    #
    for f in 3_weights_core2_nordic2_bicub.nc 3_weights_core2_nordic2_bilin.nc
    do
      ncks --no_abc --4 --dfl_lvl 3 --cnk_plc='xpl' --cnk_dmn lon,67 --cnk_dmn lat,64 $f new_$f
    done
    #
    # Replace each original with its rechunked copy
    for f in new*; do ff=${f/new_}; mv $f $ff; done
    #
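Every file above goes through the same two steps: write a rechunked copy with ncks, then move the copy over the original. A minimal sketch of that pattern as a reusable function follows. The name rechunk_cmd is hypothetical, the function only prints the command line so that it can be inspected before use, and the ncks options are simply those used in the nest-level commands above:

```shell
# Print (do not run) the commands that rechunk FILE in place.
# usage: rechunk_cmd FILE DIM,SIZE [DIM,SIZE ...]
rechunk_cmd() {
    local f=$1; shift
    local opts="--no_abc --4 --dfl_lvl 3 --cnk_plc=xpl"
    local d
    for d in "$@"; do
        opts="$opts --cnk_dmn $d"     # one --cnk_dmn per dimension
    done
    echo "ncks $opts $f new_$f && mv new_$f $f"
}

rechunk_cmd 1_resto.nc x,48 y,50 z,4
```

Piping the output to bash (or wrapping the call in eval) would run it. Note that the mother-grid file, which was already chunked and compressed, was processed above without the --4 --dfl_lvl 3 options.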
The process is similar for the other configurations based on the following original sets:
    mkdir -p ../r4.2_RC_FULL
    cp -pr AGRIF_DEMO_v4.x ../r4.2_RC_FULL/AGRIF_DEMO_v4.2_RC_FULL
    cp -pr AMM12_v4.0 ../r4.2_RC_FULL/AMM12_v4.2_RC_FULL
    cp -pr ICE_AGRIF_v4.x ../r4.2_RC_FULL/ICE_AGRIF_v4.2_RC_FULL
    cp -pr ISOMIP+_v4.0 ../r4.2_RC_FULL/ISOMIP+_v4.2_RC_FULL
    cp -pr ORCA2_ICE_v4.x ../r4.2_RC_FULL/ORCA2_ICE_v4.2_RC_FULL
    cp -pr ORCA2_OFF_v4.x ../r4.2_RC_FULL/ORCA2_OFF_v4.2_RC_FULL
    cp -pr SAS_v4.x ../r4.2_RC_FULL/SAS_v4.2_RC_FULL
    cp -pr WED025_v4.2 ../r4.2_RC_FULL/WED025_v4.2_RC_FULL
It is worth noting a few redundant files that can simply be removed:
    #
    # Remove missing (and erroneously placed) links
    #
    rm AMM12_v4.2_RC_FULL/bdydta/bdydta
    rm AMM12_v4.2_RC_FULL/fluxes/fluxes
    #
    # Remove old, unused versions
    #
    rm ORCA2_ICE_v4.2_RC_FULL/weights_core2_orca2_bicub.nc.old
    rm ORCA2_ICE_v4.2_RC_FULL/weights_core2_orca2_bilin.nc.old
    #
    # Remove old, unused versions superseded by domain configuration files
    #
    rm WED025_v4.2_RC_FULL/bathy_meter_WED025.nc
    rm WED025_v4.2_RC_FULL/coordinates_WED025.nc
And a few selected oddities in the original chunk settings that have been corrected in this tidy up:
    nccnkrpt ICE_AGRIF_v4.x/initice.nc
    time_counter = UNLIMITED ; // (1 currently)
    x = 97 ;
    y = 97 ;
    float nav_lon(y, x) _ChunkSizes = 97, 97 ;
    float nav_lat(y, x) _ChunkSizes = 97, 97 ;
    float time_counter(time_counter) _ChunkSizes = 1048576 ;    <--
    float ati(time_counter, y, x) _ChunkSizes = 1, 97, 97 ;
    float hti(time_counter, y, x) _ChunkSizes = 1, 97, 97 ;
    float hts(time_counter, y, x) _ChunkSizes = 1, 97, 97 ;
    float smi(time_counter, y, x) _ChunkSizes = 1, 97, 97 ;
    float tmi(time_counter, y, x) _ChunkSizes = 1, 97, 97 ;
    float tsu(time_counter, y, x) _ChunkSizes = 1, 97, 97 ;

    nccnkrpt ORCA2_ICE_v4.x/sss_data.nc
    time_counter = UNLIMITED ; // (12 currently)
    x = 180 ;
    y = 148 ;
    float nav_lat(y, x) _ChunkSizes = 148, 180 ;
    float nav_lon(y, x) _ChunkSizes = 148, 180 ;
    float time_counter(time_counter) _ChunkSizes = 1024 ;    <--
    float sss(time_counter, y, x) _ChunkSizes = 21, 148, 180 ;    <--

    nccnkrpt ORCA2_OFF_v4.x/dyna_grid_T.nc
    deptht = 31 ;
    time_counter = UNLIMITED ; // (73 currently)
    x = 180 ;
    y = 148 ;
    float nav_lon(y, x) _ChunkSizes = 148, 180 ;
    float nav_lat(y, x) _ChunkSizes = 148, 180 ;
    float deptht(deptht) _ChunkSizes = 31 ;
    float time_counter(time_counter) _ChunkSizes = 1024 ;
    float votemper(time_counter, deptht, y, x) _ChunkSizes = 1, 31, 148, 180 ;
    float vosaline(time_counter, deptht, y, x) _ChunkSizes = 1, 31, 148, 180 ;
    float sosstsst(time_counter, y, x) _ChunkSizes = 54, 126, 154 ;
    float sosaline(time_counter, y, x) _ChunkSizes = 54, 126, 154 ;
    float sossheig(time_counter, y, x) _ChunkSizes = 54, 126, 154 ;
    float iowaflup(time_counter, y, x) _ChunkSizes = 54, 126, 154 ??
The full set of commands used to create the r4.2_RC_FULL set is (in addition to those already shown):
    ########## AMMcmds ##########
    cd AMM12_v4.2_RC_FULL
    ncks --no_abc --cnk_plc='xpl' --cnk_dmn x,32 --cnk_dmn y,32 --cnk_dmn z,6 --cnk_dmn t,1 --4 --dfl_lvl 3 amm12_restart_oce.nc new_amm12_restart_oce.nc
    mv new_amm12_restart_oce.nc amm12_restart_oce.nc
    #
    ncks --4 --no_abc --cnk_dmn=x,32 --cnk_dmn=y,32 --cnk_dmn=z,6 --cnk_dmn=t,1 --dfl_lvl 3 --cnk_plc='xpl' AMM_R12_sco_domcfg.nc new_AMM_R12_sco_domcfg.nc
    mv new_AMM_R12_sco_domcfg.nc AMM_R12_sco_domcfg.nc
    #
    ncks --no_abc --cnk_plc='xpl' --cnk_dmn x,32 --cnk_dmn y,32 --cnk_dmn time_counter,1 --4 --dfl_lvl 3 amm12_rivers.nc new_amm12_rivers.nc
    mv new_amm12_rivers.nc amm12_rivers.nc
    #
    cd fluxes/
    mkdir new
    for f in *.nc; do ncks --no_abc --cnk_plc='xpl' --cnk_dmn x,32 --cnk_dmn y,32 --cnk_dmn t,1 --4 --dfl_lvl 3 $f new/$f; done
    cd new
    mv amm* ../
    cd ../
    rmdir new
    cd ../
    #
    cd bdydta
    mkdir new
    for f in amm12_bdyT_tra*; do ncks --no_abc --cnk_plc='xpl' --cnk_dmn x,64 --cnk_dmn y,1 --cnk_dmn=deptht,51 --cnk_dmn time_counter,1 --4 --dfl_lvl 3 $f new/$f; done
    cd new
    mv amm* ../
    cd ../
    rmdir new
    cd ../
    #
    cd ../
    ########## ICEcmds ##########
    cd ICE_AGRIF_v4.2_RC_FULL
    ncks --no_abc --cnk_plc='xpl' --cnk_dmn time_counter,1 --cnk_dmn x,49 --cnk_dmn y,49 initice.nc new_initice.nc
    mv new_initice.nc initice.nc
    cd ../
    ########## ISOcmds ##########
    cd ISOMIP+_v4.2_RC_FULL
    ncks --no_abc --cnk_dmn x,52 --cnk_dmn y,42 --cnk_dmn nav_lev,1 --cnk_dmn time_counter,1 domain_cfg.nc new_domain_cfg.nc
    ncks --no_abc --cnk_dmn x,52 --cnk_dmn y,21 --cnk_dmn t,1 isomip+_NEMO_242_geom_ocean3.nc new_isomip+_NEMO_242_geom_ocean3.nc
    ncks --no_abc --cnk_dmn x,52 --cnk_dmn y,21 --cnk_dmn t,1 isomip+_NEMO_242_geom_ocean4.nc new_isomip+_NEMO_242_geom_ocean4.nc
    ncks --no_abc --cnk_dmn x,52 --cnk_dmn y,21 --cnk_dmn z,10 nemo_base_COLD.nc new_nemo_base_COLD.nc
    ncks --no_abc --cnk_dmn x,52 --cnk_dmn y,21 --cnk_dmn z,10 nemo_base_WARM.nc new_nemo_base_WARM.nc
    ncks --no_abc --cnk_dmn x,52 --cnk_dmn y,21 --cnk_dmn z,10 resto.nc new_resto.nc
    mv new_domain_cfg.nc domain_cfg.nc
    mv new_isomip+_NEMO_242_geom_ocean3.nc isomip+_NEMO_242_geom_ocean3.nc
    mv new_isomip+_NEMO_242_geom_ocean4.nc isomip+_NEMO_242_geom_ocean4.nc
    mv new_nemo_base_COLD.nc nemo_base_COLD.nc
    mv new_nemo_base_WARM.nc nemo_base_WARM.nc
    mv new_resto.nc resto.nc
    cd ../
    ########## ORCA2_ICEcmds ##########
    cd ORCA2_ICE_v4.2_RC_FULL
    ncks --no_abc --cnk_plc='xpl' --cnk_dmn lon,30 --cnk_dmn lat,30 weights_core2_orca2_bicub.nc new_weights_core2_orca2_bicub.nc
    ncks --no_abc --cnk_plc='xpl' --cnk_dmn lon,30 --cnk_dmn lat,30 weights_core2_orca2_bilin.nc new_weights_core2_orca2_bilin.nc
    mv new_weights_core2_orca2_bilin.nc weights_core2_orca2_bilin.nc
    mv new_weights_core2_orca2_bicub.nc weights_core2_orca2_bicub.nc
    #
    # ncar_precip.15JUNE2009_fill.nc ncar_rad.15JUNE2009_fill.nc slp.15JUNE2009_fill.nc
    # LAT LON TIME
    # q_10.15JUNE2009_fill.nc t_10.15JUNE2009_fill.nc u_10.15JUNE2009_fill.nc v_10.15JUNE2009_fill.nc
    # lat lon time
    #
    # Note CORE forcing datasets contain some redundant variables; only keep those actually used
    #
    ncks --4 --dfl_lvl 1 --no_abc -v T_10_MOD --cnk_plc='xpl' --cnk_dmn lon,32 --cnk_dmn lat,32 --cnk_dmn time,1 t_10.15JUNE2009_fill.nc new_t_10.15JUNE2009_fill.nc
    #
    ncks --4 --dfl_lvl 1 --no_abc -v U_10_MOD --cnk_plc='xpl' --cnk_dmn lon,32 --cnk_dmn lat,32 --cnk_dmn time,1 u_10.15JUNE2009_fill.nc new_u_10.15JUNE2009_fill.nc
    #
    ncks --4 --dfl_lvl 1 --no_abc -v V_10_MOD --cnk_plc='xpl' --cnk_dmn lon,32 --cnk_dmn lat,32 --cnk_dmn time,1 v_10.15JUNE2009_fill.nc new_v_10.15JUNE2009_fill.nc
    #
    ncks --4 --dfl_lvl 1 --no_abc -v Q_10_MOD --cnk_plc='xpl' --cnk_dmn lon,32 --cnk_dmn lat,32 --cnk_dmn time,1 q_10.15JUNE2009_fill.nc new_q_10.15JUNE2009_fill.nc
    #
    ncks --4 --dfl_lvl 1 --no_abc -v LWDN_MOD,SWDN_MOD --cnk_plc='xpl' --cnk_dmn LON,32 --cnk_dmn LAT,32 --cnk_dmn time,1 ncar_rad.15JUNE2009_fill.nc new_ncar_rad.15JUNE2009_fill.nc
    #
    ncks --4 --dfl_lvl 1 --no_abc -v PRC_MOD1,SNOW --cnk_plc='xpl' --cnk_dmn LON,32 --cnk_dmn LAT,32 --cnk_dmn time,1 ncar_precip.15JUNE2009_fill.nc new_ncar_precip.15JUNE2009_fill.nc
    #
    ncks --4 --dfl_lvl 1 --no_abc -v SLP --cnk_plc='xpl' --cnk_dmn LON,32 --cnk_dmn LAT,32 --cnk_dmn TIME,1 slp.15JUNE2009_fill.nc new_slp.15JUNE2009_fill.nc
    #
    for f in new*; do ff=${f/new_}; mv $f $ff; done
    #
    ncks --4 --no_abc --cnk_plc='xpl' --cnk_dmn x,30 --cnk_dmn y,30 --cnk_dmn time_counter,1 sdw_ecwaves_orca2.nc new_sdw_ecwaves_orca2.nc
    mv new_sdw_ecwaves_orca2.nc sdw_ecwaves_orca2.nc
    #
    ncks --4 --no_abc --cnk_plc='xpl' --cnk_dmn z,4 --cnk_dmn x,30 --cnk_dmn y,30 --cnk_dmn time_counter,1 data_1m_potential_temperature_nomask.nc new_data_1m_potential_temperature_nomask.nc
    mv new_data_1m_potential_temperature_nomask.nc data_1m_potential_temperature_nomask.nc
    #
    ncks --4 --no_abc --cnk_plc='xpl' --cnk_dmn z,4 --cnk_dmn x,30 --cnk_dmn y,30 --cnk_dmn time_counter,1 data_1m_salinity_nomask.nc new_data_1m_salinity_nomask.nc
    mv new_data_1m_salinity_nomask.nc data_1m_salinity_nomask.nc
    #
    ncks --no_abc --cnk_plc='xpl' --cnk_dmn t,1 --cnk_dmn x,60 --cnk_dmn y,50 --cnk_dmn z,4 ORCA_R2_zps_domcfg.nc new_ORCA_R2_zps_domcfg.nc
    mv new_ORCA_R2_zps_domcfg.nc ORCA_R2_zps_domcfg.nc
    #
    ncks --no_abc --cnk_plc='xpl' --cnk_dmn time_counter,1 --cnk_dmn x,60 --cnk_dmn y,50 int_wave_mix.nc new_int_wave_mix.nc
    mv new_int_wave_mix.nc int_wave_mix.nc
    #
    ncks --4 --no_abc --cnk_plc='xpl' --cnk_dmn z,4 --cnk_dmn x,30 --cnk_dmn y,30 --cnk_dmn time_counter,1 sali_ref_clim_monthly.nc new_sali_ref_clim_monthly.nc
    mv new_sali_ref_clim_monthly.nc sali_ref_clim_monthly.nc
    #
    ncks --4 --no_abc --cnk_plc='xpl' --cnk_dmn x,30 --cnk_dmn y,30 --cnk_dmn time_counter,1 runoff_core_monthly.nc new_runoff_core_monthly.nc
    mv new_runoff_core_monthly.nc runoff_core_monthly.nc
    #
    ncks --4 --no_abc --cnk_plc='xpl' --cnk_dmn x,30 --cnk_dmn y,30 --cnk_dmn time_counter,1 sst_data.nc new_sst_data.nc
    ncks --4 --no_abc --cnk_plc='xpl' --cnk_dmn x,30 --cnk_dmn y,30 --cnk_dmn time_counter,1 sss_data.nc new_sss_data.nc
    mv new_sst_data.nc sst_data.nc
    mv new_sss_data.nc sss_data.nc
    #
    ncks --4 --no_abc --cnk_plc='xpl' --cnk_dmn X1,30 --cnk_dmn Y1,30 --cnk_dmn time_counter,1 subbasins.nc new_subbasins.nc
    mv new_subbasins.nc subbasins.nc
    #
    ncks --4 --no_abc --cnk_plc='xpl' --cnk_dmn x,30 --cnk_dmn y,30 slaReferenceLevel.nc new_slaReferenceLevel.nc
    mv new_slaReferenceLevel.nc slaReferenceLevel.nc
    #
    ncks --4 --no_abc --cnk_plc='xpl' --cnk_dmn x,30 --cnk_dmn y,30 --cnk_dmn z,4 resto.nc new_resto.nc
    mv new_resto.nc resto.nc
    #
    ncks --4 --no_abc --cnk_plc='xpl' --cnk_dmn d01,30 --cnk_dmn d02,30 mask_itf.nc new_mask_itf.nc
    mv new_mask_itf.nc mask_itf.nc
    #
    ncks --4 --no_abc --cnk_plc='xpl' --cnk_dmn x,30 --cnk_dmn y,30 --cnk_dmn z,4 --cnk_dmn t,1 eddy_viscosity_2D.nc new_eddy_viscosity_2D.nc
    ncks --4 --no_abc --cnk_plc='xpl' --cnk_dmn x,30 --cnk_dmn y,30 --cnk_dmn z,4 --cnk_dmn t,1 eddy_viscosity_3D.nc new_eddy_viscosity_3D.nc
    mv new_eddy_viscosity_2D.nc eddy_viscosity_2D.nc
    mv new_eddy_viscosity_3D.nc eddy_viscosity_3D.nc
    #
    ncks --4 --no_abc --cnk_plc='xpl' --cnk_dmn x,30 --cnk_dmn y,30 --cnk_dmn t,1 geothermal_heating.nc new_geothermal_heating.nc
    mv new_geothermal_heating.nc geothermal_heating.nc
    #
    ncks --4 --no_abc --cnk_plc='xpl' --cnk_dmn x,30 --cnk_dmn y,30 --cnk_dmn time_counter,1 K1rowdrg.nc new_K1rowdrg.nc
    ncks --4 --no_abc --cnk_plc='xpl' --cnk_dmn x,30 --cnk_dmn y,30 --cnk_dmn time_counter,1 M2rowdrg.nc new_M2rowdrg.nc
    mv new_K1rowdrg.nc K1rowdrg.nc
    mv new_M2rowdrg.nc M2rowdrg.nc
    #
    ncks --4 --no_abc --cnk_plc='xpl' --cnk_dmn x,30 --cnk_dmn y,30 --cnk_dmn time_counter,1 calving.nc new_calving.nc
    mv new_calving.nc calving.nc
    #
    ncks --4 --no_abc --cnk_plc='xpl' --cnk_dmn x,30 --cnk_dmn y,30 --cnk_dmn time_counter,1 chlorophyll.nc new_chlorophyll.nc
    mv new_chlorophyll.nc chlorophyll.nc
    #
    ncks --4 --no_abc --cnk_plc='xpl' --cnk_dmn x,30 --cnk_dmn y,30 --cnk_dmn time_counter,1 --cnk_dmn z,4 assim_background_increments.nc new_assim_background_increments.nc
    mv new_assim_background_increments.nc assim_background_increments.nc
    #
    ncks --4 --no_abc --cnk_plc='xpl' --cnk_dmn x,30 --cnk_dmn y,30 ahmcoef.nc new_ahmcoef.nc
    mv new_ahmcoef.nc ahmcoef.nc
    #
    cd ../
    ########## ORCA2_OFFcmds ##########
    cd ORCA2_OFF_v4.2_RC_FULL
    ncks --no_abc --cnk_plc='xpl' --cnk_dmn t,1 --cnk_dmn x,60 --cnk_dmn y,50 --cnk_dmn z,4 ORCA_R2_zps_domcfg.nc new_ORCA_R2_zps_domcfg.nc
    mv new_ORCA_R2_zps_domcfg.nc ORCA_R2_zps_domcfg.nc
    #
    ncks --no_abc --cnk_plc='xpl' --cnk_dmn time_counter,1 --cnk_dmn x,60 --cnk_dmn y,50 --cnk_dmn deptht,4 dyna_grid_T.nc new_dyna_grid_T.nc
    #
    ncks --no_abc --cnk_plc='xpl' --cnk_dmn time_counter,1 --cnk_dmn x,60 --cnk_dmn y,50 --cnk_dmn depthu,4 dyna_grid_U.nc new_dyna_grid_U.nc
    #
    ncks --no_abc --cnk_plc='xpl' --cnk_dmn time_counter,1 --cnk_dmn x,60 --cnk_dmn y,50 --cnk_dmn depthv,4 dyna_grid_V.nc new_dyna_grid_V.nc
    #
    ncks --no_abc --cnk_plc='xpl' --cnk_dmn time_counter,1 --cnk_dmn x,60 --cnk_dmn y,50 --cnk_dmn depthw,4 dyna_grid_W.nc new_dyna_grid_W.nc
    #
    mv new_dyna_grid_T.nc dyna_grid_T.nc
    mv new_dyna_grid_U.nc dyna_grid_U.nc
    mv new_dyna_grid_V.nc dyna_grid_V.nc
    mv new_dyna_grid_W.nc dyna_grid_W.nc
    #
    cd ../
    ########## SAScmds ##########
    cd SAS_v4.2_RC_FULL
    ncks --no_abc --cnk_plc='xpl' --cnk_dmn time_counter,1 --cnk_dmn x,60 --cnk_dmn y,50 --cnk_dmn deptht,4 sas_grid_T.nc new_sas_grid_T.nc
    ncks --no_abc --cnk_plc='xpl' --cnk_dmn time_counter,1 --cnk_dmn x,60 --cnk_dmn y,50 --cnk_dmn depthu,4 sas_grid_U.nc new_sas_grid_U.nc
    ncks --no_abc --cnk_plc='xpl' --cnk_dmn time_counter,1 --cnk_dmn x,60 --cnk_dmn y,50 --cnk_dmn depthv,4 sas_grid_V.nc new_sas_grid_V.nc
    mv new_sas_grid_T.nc sas_grid_T.nc
    mv new_sas_grid_U.nc sas_grid_U.nc
    mv new_sas_grid_V.nc sas_grid_V.nc
    cd ../
    ########## WEDcmds ##########
    cd WED025_v4.2_RC_FULL
    # u10 v10 t10 precip rsds q10 slp rlds snow
    # WED025 starts in middle of January; need 20 days from then = 35 days = 280 records @ 3hourly
    for v in u10 v10 t10 precip rsds q10 slp rlds snow
    do
      ncks --cnk_dmn longitude,64 --cnk_dmn latitude,64 ${v}_JRA_y2000.nc 20d_${v}_JRA_y2000.nc
    done
    #
    for v in u10 v10 t10 precip rsds q10 slp rlds snow
    do
      mv 20d_${v}_JRA_y2000.nc ${v}_JRA_y2000.nc
    done
    #
    ncks --cnk_dmn xbT,50 WED025_bdyT_tra_y1999.nc new_WED025_bdyT_tra_y1999.nc
    ncks --cnk_dmn xbT,50 WED025_bdyT_tra_y2000.nc new_WED025_bdyT_tra_y2000.nc
    ncks --cnk_dmn xbU,50 WED025_bdyU_u3d_y1999.nc new_WED025_bdyU_u3d_y1999.nc
    ncks --cnk_dmn xbU,50 WED025_bdyU_u3d_y2000.nc new_WED025_bdyU_u3d_y2000.nc
    ncks --cnk_dmn xbV,44 WED025_bdyV_u3d_y1999.nc new_WED025_bdyV_u3d_y1999.nc
    ncks --cnk_dmn xbV,44 WED025_bdyV_u3d_y2000.nc new_WED025_bdyV_u3d_y2000.nc
    mv new_WED025_bdyT_tra_y1999.nc WED025_bdyT_tra_y1999.nc
    mv new_WED025_bdyT_tra_y2000.nc WED025_bdyT_tra_y2000.nc
    mv new_WED025_bdyU_u3d_y1999.nc WED025_bdyU_u3d_y1999.nc
    mv new_WED025_bdyU_u3d_y2000.nc WED025_bdyU_u3d_y2000.nc
    mv new_WED025_bdyV_u3d_y1999.nc WED025_bdyV_u3d_y1999.nc
    mv new_WED025_bdyV_u3d_y2000.nc WED025_bdyV_u3d_y2000.nc
    ncks --cnk_dmn time_counter,1 WED025_bdyT_tra_y1999.nc new_WED025_bdyT_tra_y1999.nc
    ncks --cnk_dmn time_counter,1 WED025_bdyT_tra_y2000.nc new_WED025_bdyT_tra_y2000.nc
    ncks --cnk_dmn time_counter,1 WED025_bdyU_u3d_y1999.nc new_WED025_bdyU_u3d_y1999.nc
    ncks --cnk_dmn time_counter,1 WED025_bdyU_u3d_y2000.nc new_WED025_bdyU_u3d_y2000.nc
    ncks --cnk_dmn time_counter,1 WED025_bdyV_u3d_y1999.nc new_WED025_bdyV_u3d_y1999.nc
    ncks --cnk_dmn time_counter,1 WED025_bdyV_u3d_y2000.nc new_WED025_bdyV_u3d_y2000.nc
    mv new_WED025_bdyT_tra_y1999.nc WED025_bdyT_tra_y1999.nc
    mv new_WED025_bdyT_tra_y2000.nc WED025_bdyT_tra_y2000.nc
    mv new_WED025_bdyU_u3d_y1999.nc WED025_bdyU_u3d_y1999.nc
    mv new_WED025_bdyU_u3d_y2000.nc WED025_bdyU_u3d_y2000.nc
    mv new_WED025_bdyV_u3d_y1999.nc WED025_bdyV_u3d_y1999.nc
    mv new_WED025_bdyV_u3d_y2000.nc WED025_bdyV_u3d_y2000.nc
    #
    ncks --no_abc --cnk_dmn x,64 --cnk_dmn y,64 domain_cfg.nc new_domain_cfg.nc
    mv new_domain_cfg.nc domain_cfg.nc
    #
    ncks --no_abc --cnk_dmn lon,64 --cnk_dmn lat,64 weights_bilin_JRA.nc new_weights_bilin_JRA.nc
    ncks --no_abc --cnk_dmn lon,64 --cnk_dmn lat,64 weights_bicubic_JRA.nc new_weights_bicubic_JRA.nc
    mv new_weights_bilin_JRA.nc weights_bilin_JRA.nc
    mv new_weights_bicubic_JRA.nc weights_bicubic_JRA.nc
    #
    ncks --no_abc --cnk_dmn x,64 --cnk_dmn y,64 --cnk_dmn time_counter,1 WED025_icb_y1999.nc new_WED025_icb_y1999.nc
    ncks --no_abc --cnk_dmn x,64 --cnk_dmn y,64 --cnk_dmn time_counter,1 WED025_icb_y2000.nc new_WED025_icb_y2000.nc
    mv new_WED025_icb_y1999.nc WED025_icb_y1999.nc
    mv new_WED025_icb_y2000.nc WED025_icb_y2000.nc
    #
    ncks --no_abc --cnk_dmn x,64 --cnk_dmn y,64 --cnk_dmn time_counter,1 --cnk_dmn z,15 WED025_init_JRA_200001.nc new_WED025_init_JRA_200001.nc
    mv new_WED025_init_JRA_200001.nc WED025_init_JRA_200001.nc
    #
    ncks --no_abc --cnk_dmn x,64 --cnk_dmn y,64 --cnk_dmn time_counter,1 chlorophyll_WED025.nc new_chlorophyll_WED025.nc
    mv new_chlorophyll_WED025.nc chlorophyll_WED025.nc
    #
    ncks --no_abc --cnk_dmn x,64 --cnk_dmn y,64 --cnk_dmn time_counter,1 isfmlt_par.nc new_isfmlt_par.nc
    mv new_isfmlt_par.nc isfmlt_par.nc
    #
    ncks --no_abc --cnk_dmn xbt,50 --cnk_dmn xbu,50 --cnk_dmn xbv,44 coordinates_bdy_WED025.nc new_coordinates_bdy_WED025.nc
    mv new_coordinates_bdy_WED025.nc coordinates_bdy_WED025.nc
    #
    cd ../
    #
The files created by all these commands have been fully SETTE tested with trunk revision 14595 and produce identical results to the same tests performed with the original files.
Stage 2 details
The second stage is to take the sets produced in stage 1 and to reduce the time period of any forcing data to more closely match that required for the standard SETTE testing. To permit some additional testing for those cases where longer tests may be required a period of twice the standard test length has been chosen. The standard test lengths are:
GYRE_PISCES | 90 days |
ORCA2_ICE_PISCES | 62 days |
ORCA2_OFF_PISCES | 95 days |
AMM12 | 4 days |
SAS | 16 days |
ORCA2_ICE_OBS | 5 days |
AGRIF | 5d and 9h |
WED025 | 10 days |
where the maximum length returned by grepping the setting of ITEND from sette_reference_configurations.sh has been taken. There are two cases where simply doubling this length and selecting a number of time records based on that number does not work though:
- ORCA2_OFF_PISCES which uses a climatological set of inputs and starts on 1st January => the last record is also needed at the start.
- WED025 which starts on 15th January => 35 days of records are required for a 20 day test.
More on these cases later but first the additional commands required to limit the time records are:
    mkdir r4.2_RC
    cd r4.2_RC
    cp -pr ../r4.2_RC_FULL/AGRIF_DEMO_v4.2_RC_FULL AGRIF_DEMO_v4.2_RC
    cp -pr ../r4.2_RC_FULL/AMM12_v4.2_RC_FULL AMM12_v4.2_RC
    cp -pr ../r4.2_RC_FULL/ICE_AGRIF_v4.2_RC_FULL ICE_AGRIF_v4.2_RC
    cp -pr ../r4.2_RC_FULL/ISOMIP+_v4.2_RC_FULL ISOMIP+_v4.2_RC
    cp -pr ../r4.2_RC_FULL/ORCA2_ICE_v4.2_RC_FULL ORCA2_ICE_v4.2_RC
    cp -pr ../r4.2_RC_FULL/ORCA2_OFF_v4.2_RC_FULL ORCA2_OFF_v4.2_RC
    cp -pr ../r4.2_RC_FULL/SAS_v4.2_RC_FULL SAS_v4.2_RC
    cp -pr ../r4.2_RC_FULL/WED025_v4.2_RC_FULL WED025_v4.2_RC
    ########## ORCA2_ICEcmds ##########
    cd ORCA2_ICE_v4.2_RC
    #
    #
    # ncar_precip.15JUNE2009_fill.nc ncar_rad.15JUNE2009_fill.nc slp.15JUNE2009_fill.nc
    # LAT LON TIME
    # q_10.15JUNE2009_fill.nc t_10.15JUNE2009_fill.nc u_10.15JUNE2009_fill.nc v_10.15JUNE2009_fill.nc
    # lat lon time
    # Need 180 days @ 6 hourly = 720 records
    #
    ncks --4 --dfl_lvl 1 --no_abc -v T_10_MOD -d time,0,719 --cnk_plc='xpl' --cnk_dmn lon,32 --cnk_dmn lat,32 --cnk_dmn time,1 t_10.15JUNE2009_fill.nc new_t_10.15JUNE2009_fill.nc
    #
    ncks --4 --dfl_lvl 1 --no_abc -v U_10_MOD -d time,0,719 --cnk_plc='xpl' --cnk_dmn lon,32 --cnk_dmn lat,32 --cnk_dmn time,1 u_10.15JUNE2009_fill.nc new_u_10.15JUNE2009_fill.nc
    #
    ncks --4 --dfl_lvl 1 --no_abc -v V_10_MOD -d time,0,719 --cnk_plc='xpl' --cnk_dmn lon,32 --cnk_dmn lat,32 --cnk_dmn time,1 v_10.15JUNE2009_fill.nc new_v_10.15JUNE2009_fill.nc
    #
    ncks --4 --dfl_lvl 1 --no_abc -v Q_10_MOD -d time,0,719 --cnk_plc='xpl' --cnk_dmn lon,32 --cnk_dmn lat,32 --cnk_dmn time,1 q_10.15JUNE2009_fill.nc new_q_10.15JUNE2009_fill.nc
    #
    ncks --4 --dfl_lvl 1 --no_abc -v LWDN_MOD,SWDN_MOD -d time,0,179 --cnk_plc='xpl' --cnk_dmn LON,32 --cnk_dmn LAT,32 --cnk_dmn time,1 ncar_rad.15JUNE2009_fill.nc new_ncar_rad.15JUNE2009_fill.nc
    #
    ncks --4 --dfl_lvl 1 --no_abc -v PRC_MOD1,SNOW --cnk_plc='xpl' --cnk_dmn LON,32 --cnk_dmn LAT,32 --cnk_dmn time,1 ncar_precip.15JUNE2009_fill.nc new_ncar_precip.15JUNE2009_fill.nc
    #
    ncks --4 --dfl_lvl 1 --no_abc -v SLP -d TIME,0,719 --cnk_plc='xpl' --cnk_dmn LON,32 --cnk_dmn LAT,32 --cnk_dmn TIME,1 slp.15JUNE2009_fill.nc new_slp.15JUNE2009_fill.nc
    #
    for f in new*; do ff=${f/new_}; mv $f $ff; done
    #
    ncks --4 --no_abc --cnk_plc='xpl' -d time_counter,0,719 --cnk_dmn x,30 --cnk_dmn y,30 --cnk_dmn time_counter,1 sdw_ecwaves_orca2.nc new_sdw_ecwaves_orca2.nc
    mv new_sdw_ecwaves_orca2.nc sdw_ecwaves_orca2.nc
    cd ../
    # WEDcmds
    cd WED025_v4.2_RC
    # u10 v10 t10 precip rsds q10 slp rlds snow
    # WED025 starts in middle of January; need 20 days from then = 35 days = 280 records @ 3hourly
    for v in u10 v10 t10 precip rsds q10 slp rlds snow
    do
      ncks -O -d time,0,279 --cnk_dmn longitude,64 --cnk_dmn latitude,64 ${v}_JRA_y2000.nc 20d_${v}_JRA_y2000.nc
    done
    #
    for v in u10 v10 t10 precip rsds q10 slp rlds snow
    do
      mv 20d_${v}_JRA_y2000.nc ${v}_JRA_y2000.nc
    done
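The record ranges passed to -d in these commands follow from the period to be retained and the forcing frequency. As a small worked check, the index of the last record to keep can be computed as below; last_record is a hypothetical helper rather than part of the original scripts, and the daily spacing of the radiation file is an inference from its 0,179 range:

```shell
# Index of the last record needed to cover $1 days of forcing that is
# supplied every $2 hours (records are numbered from 0).
last_record() {
    local days=$1 hours=$2
    echo $(( days * 24 / hours - 1 ))
}

last_record 180 6    # 6-hourly CORE fields: prints 719
last_record 180 24   # daily radiation (assumed): prints 179
last_record 35 3     # 3-hourly JRA fields for WED025: prints 279
```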
These changes reduce the overall volume from 5.5GB to 4.5GB but the two cases mentioned earlier are still large contributors to this volume. The next stage will reduce the volume further by reducing the number of significant digits used for most data. This will allow for much more efficient compression but the SETTE results will be changed as a result. There is, however, an intermediate option which is to reduce the precision of the data that has to be supplied in order to maintain contiguous time records but which are not actually used.
- In the ORCA2_OFF_PISCES case records 0-37 and 72 of each 5-day mean, annual dataset need to be kept at the original precision but all other records can be reduced to the minimum precision.
- For WED025, records 0-110 are not used and can, therefore, be held at minimum precision.
Using ncks to achieve this is troublesome with compressed data: it appears to be reluctant to insert later records into compressed datasets. However, it does work if all operations are carried out on netCDF-3 format files, records are replaced in decreasing time order, and the conversion to netCDF-4 is left until the end. Thus, this rather tortuous combination achieves the desired result:
    #!/bin/bash
    if [ 1 == 1 ] ; then
    cd ORCA2_OFF_v4.2_RC
    for var in T
    do
      echo ${var}
      # Create minimal precision netcdf3 version of entire dataset
      ncks -3 -O --no_abc --ppc default=1 dyna_grid_${var}.nc new2_dyna_grid_${var}.nc
      # Create netcdf3 version of original dataset
      ncks -3 dyna_grid_${var}.nc new3_dyna_grid_${var}.nc
      # Overwrite records 0-71 with minimal precision
      ncks -3 --no_abc -A -d time_counter,0,71 new2_dyna_grid_${var}.nc new3_dyna_grid_${var}.nc
      # Overwrite records 0-37 with original precision
      ncks -3 --no_abc -A -d time_counter,0,37 dyna_grid_${var}.nc new3_dyna_grid_${var}.nc
      # Create netcdf-4 final version with required chunking and compression
      nccopy -7 -c "time_counter/1,x/60,y/50,deptht/4" -d 4 new3_dyna_grid_${var}.nc new4_dyna_grid_${var}.nc
    done
    # Tidy up
    rm new[23]*.nc
    mv new4_dyna_grid_T.nc dyna_grid_T.nc
    for var in U
    do
      echo ${var}
      ncks -3 -O --no_abc --ppc default=1 dyna_grid_${var}.nc new2_dyna_grid_${var}.nc
      ncks -3 dyna_grid_${var}.nc new3_dyna_grid_${var}.nc
      ncks -3 --no_abc -A -d time_counter,0,71 new2_dyna_grid_${var}.nc new3_dyna_grid_${var}.nc
      ncks -3 --no_abc -A -d time_counter,0,37 dyna_grid_${var}.nc new3_dyna_grid_${var}.nc
      nccopy -7 -c "time_counter/1,x/60,y/50,depthu/4" -d 4 new3_dyna_grid_${var}.nc new4_dyna_grid_${var}.nc
    done
    rm new[23]*.nc
    mv new4_dyna_grid_U.nc dyna_grid_U.nc
    for var in V
    do
      echo ${var}
      ncks -3 -O --no_abc --ppc default=1 dyna_grid_${var}.nc new2_dyna_grid_${var}.nc
      ncks -3 dyna_grid_${var}.nc new3_dyna_grid_${var}.nc
      ncks -3 --no_abc -A -d time_counter,0,71 new2_dyna_grid_${var}.nc new3_dyna_grid_${var}.nc
      ncks -3 --no_abc -A -d time_counter,0,37 dyna_grid_${var}.nc new3_dyna_grid_${var}.nc
      nccopy -7 -c "time_counter/1,x/60,y/50,depthv/4" -d 4 new3_dyna_grid_${var}.nc new4_dyna_grid_${var}.nc
    done
    rm new[23]*.nc
    mv new4_dyna_grid_V.nc dyna_grid_V.nc
    for var in W
    do
      echo ${var}
      ncks -3 -O --no_abc --ppc default=1 dyna_grid_${var}.nc new2_dyna_grid_${var}.nc
      ncks -3 dyna_grid_${var}.nc new3_dyna_grid_${var}.nc
      ncks -3 --no_abc -A -d time_counter,0,71 new2_dyna_grid_${var}.nc new3_dyna_grid_${var}.nc
      ncks -3 --no_abc -A -d time_counter,0,37 dyna_grid_${var}.nc new3_dyna_grid_${var}.nc
      nccopy -7 -c "time_counter/1,x/60,y/50,depthw/4" -d 4 new3_dyna_grid_${var}.nc new4_dyna_grid_${var}.nc
    done
    rm new[23]*.nc
    mv new4_dyna_grid_W.nc dyna_grid_W.nc
    cd ../
    fi
    #
    if [ 1 == 1 ] ; then
    cd WED025_v4.2_RC
    # WEDcmds
    # u10 v10 t10 precip rsds q10 slp rlds snow
    # WED025 starts in middle of January; need 20 days from then = 35 days = 280 records @ 3hourly
    # But the first 13.875 days records (111 @ 3 hourly) can be stored at minimal precision
    for v in u10 v10 t10 precip rsds q10 slp rlds snow
    do
      ncks -3 -O --no_abc --ppc default=1 ${v}_JRA_y2000.nc new2_${v}_JRA_y2000.nc
      ncks -3 ${v}_JRA_y2000.nc new3_${v}_JRA_y2000.nc
      ncks -3 --no_abc -A -d time,0,110 new2_${v}_JRA_y2000.nc new3_${v}_JRA_y2000.nc
      nccopy -7 -c "time/1,longitude/64,latitude/64" -d 4 new3_${v}_JRA_y2000.nc new4_${v}_JRA_y2000.nc
    done
    rm new[23]*.nc
    #
    for v in u10 v10 t10 precip rsds q10 slp rlds snow
    do
      mv new4_${v}_JRA_y2000.nc ${v}_JRA_y2000.nc
    done
    #
    cd ../
    fi
With this secondary manipulation the overall volume reduces from 4.5GB to 3.7GB and there is no change in SETTE results when compared with results using the r4.2_RC_FULL set.
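The four loops in the ORCA2_OFF part of the script differ only in the grid letter and the matching depth dimension (deptht, depthu, depthv, depthw). One possible way to collapse them is sketched below with a hypothetical helper, chunk_spec, that builds the argument passed to nccopy -c; the chunk sizes are those used in the script:

```shell
# Build the nccopy chunk specification for dyna_grid_<GRID>.nc,
# where GRID is one of T, U, V or W.
chunk_spec() {
    local grid
    grid=$(printf '%s' "$1" | tr 'TUVW' 'tuvw')   # depth dimension suffix
    echo "time_counter/1,x/60,y/50,depth${grid}/4"
}

for var in T U V W; do
    echo "dyna_grid_${var}.nc -> $(chunk_spec "$var")"
done
```

Inside such a loop the ncks and nccopy steps would be written once, with -c "$(chunk_spec "$var")" supplied to nccopy.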
Documentation updates
...
Preview
...
Tests
...
Review
...