How to check whether two (netcdf) files are identical

Author: S.Luyssaert and A.S. Lansø

revised: 2020/02/28, P. Maugis
revised: 2023/04/07, S. Luyssaert

cdo diffv

If available (i.e. on obelix), you can use the following command:

cdo diffv path_file_1 path_file_2

The comparison only works if both files contain the same variables in the same order. Otherwise the cdo command returns error=100. If one of the files contains more variables, the attached script diff100err.sh by Josefine Ghattas can be used. This script checks for differences in the variable names and asks the user to remove variables so that the command "cdo diffv" can be applied. It first checks variables of type float and, if no differences are found, then checks variables of type double. The script can therefore be used on both diagnostic history files and restart files from ORCHIDEE. Use the script in the following way:

./diff100err.sh path_file_1 path_file_2

Variables with 5 dimensions are ignored by the cdo diffv command, so not all variables in the restart files can be compared with this method.

ADVANTAGE: the output file tells you which fields are different. Be aware, though, that this method works best for smaller netCDF files. If your history file is larger than a few megabytes, the output text file may reach many hundreds of megabytes. In that case, the md5sum command may be a better option.

DISADVANTAGE: only works for netCDF files, and only for variables with fewer than 5 dimensions.
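
The requirement that both files hold the same variables in the same order can be checked before calling cdo diffv. The following is a minimal sketch, not part of cdo or ORCHIDEE: the function names normalise_names and same_var_order are hypothetical, and the variable lists are assumed to come from "cdo -s showname", which prints the variable names of a file on one line.

```shell
#!/bin/bash
# Sketch only: the function names and the two-argument calling convention
# are hypothetical and are not part of cdo or ORCHIDEE.

# Collapse runs of blanks and newlines so only the names and their order remain.
normalise_names() {
    printf '%s\n' "$1" | tr -s '[:space:]' ' ' | sed -e 's/^ //' -e 's/ $//'
}

# Succeed (exit status 0) only if both lists hold the same names in the same order.
same_var_order() {
    [ "$(normalise_names "$1")" = "$(normalise_names "$2")" ]
}

# Only run the comparison when cdo is installed and two existing files are given.
if command -v cdo >/dev/null 2>&1 && [ -f "$1" ] && [ -f "$2" ]; then
    if same_var_order "$(cdo -s showname "$1")" "$(cdo -s showname "$2")"; then
        cdo diffv "$1" "$2"
    else
        echo "Variable names or their order differ between $1 and $2." >&2
    fi
fi
```

Saved under a name of your choice, the sketch would be called with the two file paths as arguments. When the lists differ it only prints a message; it does not reorder or remove variables.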

md5sum

If you expect the files to be identical (bit by bit), you can use

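#!/bin/bash
# Read each file from standard input so its name is not written next to the checksum
md5sum < path_file1 > sum1
md5sum < path_file2 > sum2
# cmp -s is silent; print its exit status (0 = identical, 1 = different)
cmp -s sum1 sum2
echo $?

The first two commands write a checksum for each file to the files 'sum1' and 'sum2' (which will thus be created or overwritten). The files are read from standard input because md5sum otherwise stores the file name next to the checksum, so 'sum1' and 'sum2' would always differ. cmp -s prints nothing, so the last line prints its exit status: 0 if the files are identical, 1 otherwise.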

ADVANTAGE: works for all files.

DISADVANTAGE: you only know whether the files are identical or not. If not, you have no idea which fields are different.
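
The same check can be done without intermediate files by comparing the two checksums directly. This is a minimal sketch under the assumption that md5sum and cut are available; same_md5 is a hypothetical helper name, not an existing tool.

```shell
#!/bin/bash
# Sketch only: same_md5 is a hypothetical helper name.

# Return 0 if both files have the same MD5 checksum, 1 if the checksums
# differ, and 2 if one of the arguments is not an existing regular file.
same_md5() {
    [ -f "$1" ] && [ -f "$2" ] || return 2
    # Reading from stdin keeps the file name out of md5sum's output;
    # cut keeps only the checksum itself.
    md5_a=$(md5sum < "$1" | cut -d ' ' -f 1)
    md5_b=$(md5sum < "$2" | cut -d ' ' -f 1)
    [ "$md5_a" = "$md5_b" ]
}

# Example call, only executed when two file names are passed to the script.
if [ "$#" -eq 2 ]; then
    if same_md5 "$1" "$2"; then
        echo "identical"
    else
        echo "different or missing"
    fi
fi
```

Because the function reports its result through the exit status, it can be used directly in an if statement or in a loop over several pairs of files.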

Matlab

The matlab function nccmp is able to compare all variables contained within two netcdf files. The original version can be found here.

Pascal Maugis has made some small modifications so that the information produced by the script is written to a file instead of being printed to the screen. The updated version can be found here.

Sadly, matlab is not available on obelix, only on IRENE. To open matlab on IRENE, type Matlab or, if you wish to run it from the terminal, type matlab -nodesktop.

Next run the function by typing:

NCCMP(ncfile1, ncfile2, tolerance, forceCompare)

tolerance sets how much variation is allowed in the variables between the two files. We want identical files, so put [] here.

forceCompare can be set to True or False.

  • True - write every occurrence of a difference in a variable (with all the indices) to the file 'all_diff.txt'.
  • False - for each variable that differs, write only the first occurrence of a difference to the file 'first_diff.txt'.

For global simulations, the True option can produce a large file, and the information may be hard to process if there are many differences between the compared restart files. In addition, the True option makes the script much slower. For small simulations, however, the True option is very useful.

It is recommended to use the re-ordered files from the diff100err.sh script as inputs to nccmp.
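
The comparison can also be started from the shell in a single command. The sketch below makes several assumptions that should be checked locally: the executable is called matlab, the NCCMP function file is on the MATLAB path, the file names are passed as strings, and forceCompare accepts the MATLAB logical false. The helper nccmp_statement is hypothetical; it only builds the MATLAB statement, wrapped in try/catch so that MATLAB exits even when the comparison raises an error.

```shell
#!/bin/bash
# Sketch only: nccmp_statement is a hypothetical helper. It assumes NCCMP is
# on the MATLAB path and that forceCompare is passed as a MATLAB logical.

# Print the MATLAB statement that compares two files with an empty tolerance.
nccmp_statement() {
    printf "try, NCCMP('%s', '%s', [], %s); catch err, disp(err.message); end; exit\n" "$1" "$2" "$3"
}

# Run the comparison without the desktop when MATLAB is available and two
# existing files are given.
if command -v matlab >/dev/null 2>&1 && [ -f "$1" ] && [ -f "$2" ]; then
    matlab -nodesktop -nosplash -r "$(nccmp_statement "$1" "$2" false)"
fi
```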

restart_daily.py and restart_monthly.py

When developing the age class code, there is at least one specific test case that could be used to ensure the technical integrity of the code (this test case is described in the script and will be added to the trusting). If the trusting fails, these scripts can be used to find the exact moment and variable where the restart files start to diverge. This is quite challenging because the dimensions of the restart files are different. The scripts need to pair the matching PFT between the run with 1 age class and the run with 4 age classes. The monthly script cycles through the monthly files. If a file is not found, the script continues. If a difference is found, the script also continues. The daily script includes more testing and stops as soon as a difference is found. Both scripts only work on a single pixel.

Quick test

Most of the above tests are very precise. We know from experience that most restart problems quickly propagate throughout the restart file. So a quick test that handles large spatial domains could focus on a few selected sechiba variables that have no PFT dimension (that way runs with and without age classes can be compared). One such approach on obelix could be:

rm -f 39pft.txt 15pft.txt diff.txt
ncdump -v fluxlat,mcw,mcr,mcs,temp_sol,qsurf,evapot,fluxsens,snow,z0m TEST4DCLEANMORT/SRF/Restart/TEST4DCLEANMORT_19101231_sechiba_rest.nc > 39pft.txt
ncdump -v fluxlat,mcw,mcr,mcs,temp_sol,qsurf,evapot,fluxsens,snow,z0m TESTCLEANMORT/SRF/Restart/TESTCLEANMORT_19101231_sechiba_rest.nc > 15pft.txt
diff 15pft.txt 39pft.txt > diff.txt
vi diff.txt

and the equivalent on CCRT

module load nco
module load cdo
rm -f 39pft.nc 15pft.nc 
ncks -C -v fluxlat TEST4D/SRF/Restart/TEST4D_19031231_sechiba_rest.nc 39pft.nc
ncks -C -v fluxlat TEST1D/SRF/Restart/TEST1D_19031231_sechiba_rest.nc 15pft.nc
cdo diffv 15pft.nc 39pft.nc
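
When two ncdump text files are compared with diff, the header lines (the "netcdf name {" line and, for runs with a different number of PFTs, the dimension sizes) are reported as differences even if the selected variables are identical. A minimal sketch that compares only the data section of the dumps is given below. The function names data_section and same_data and the output names dump_1.txt and dump_2.txt are hypothetical; the sketch relies on ncdump starting the data section with a line reading "data:".

```shell
#!/bin/bash
# Sketch only: the function names and the dump file names are hypothetical.

# Print only the data section of an ncdump text file (from the "data:" line
# onwards), so that the file name and dimension sizes are ignored.
data_section() {
    sed -n '/^data:/,$p' "$1"
}

# Return 0 if the data sections of two dump files are identical; otherwise
# print the differing lines and return diff's non-zero status.
same_data() {
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 2
    data_section "$1" > "$tmp_a"
    data_section "$2" > "$tmp_b"
    diff "$tmp_a" "$tmp_b"
    status=$?
    rm -f "$tmp_a" "$tmp_b"
    return "$status"
}

vars=fluxlat,mcw,mcr,mcs,temp_sol,qsurf,evapot,fluxsens,snow,z0m
# Only run when ncdump is installed and two existing restart files are given.
if command -v ncdump >/dev/null 2>&1 && [ -f "$1" ] && [ -f "$2" ]; then
    ncdump -v "$vars" "$1" > dump_1.txt
    ncdump -v "$vars" "$2" > dump_2.txt
    same_data dump_1.txt dump_2.txt && echo "selected variables are identical"
fi
```

The sketch would be called with the two restart files as arguments. An empty diff output together with the final message indicates that the selected variables hold the same values in both files.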

Last modified on 2023-07-12T10:46:04+02:00
