
How to check whether two (netCDF) files are identical

Authors: S. Luyssaert and A.S. Lansø

Last revised: 2020/02/28, P. Maugis

cdo diffv

If CDO is available (as it is on obelix), you can use the following command:

cdo diffv path_file_1 path_file_2

The comparison only works if the files contain the same variables in the same order; otherwise the cdo command returns error=100. If one of the files contains more variables, you can use the attached script diff100err.sh by Josefine Ghattas. This script checks for differences in the variable names and asks the user to remove variables, so that "cdo diffv" can be applied. It first checks variables of type float and, if no differences are found, it checks variables of type double. The script can therefore be used on both diagnostic history files and restart files from ORCHIDEE. Use the script as follows:

./diff100err.sh path_file_1 path_file_2
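The attached script is not reproduced on this page. As a rough sketch of the same idea (not its actual code), the shell functions below list the variables that appear in only one file and remove them with CDO. The helper names only_in_first, list_vars and drop_vars are hypothetical; the sketch assumes the CDO operators showname and delname are available in your CDO version, and it does not reorder variables.

```shell
#!/bin/bash
# Hypothetical helpers: a sketch of the idea behind diff100err.sh, not its code.

# Print the names in list file $1 that do not appear in list file $2
# (one name per line; -x whole line, -F literal strings, -v invert match).
only_in_first() {
  grep -vxF -f "$2" "$1"
}

# Print the variable names of netCDF file $1, one per line.
# "cdo -s showname" prints all names on a single line, separated by spaces.
list_vars() {
  cdo -s showname "$1" | tr -s ' ' '\n' | sed '/^$/d'
}

# Copy netCDF file $2 to $3 without the comma-separated variables in $1.
drop_vars() {
  cdo delname,"$1" "$2" "$3"
}
```

For example, run list_vars path_file_1 > vars1.txt and list_vars path_file_2 > vars2.txt; then only_in_first vars1.txt vars2.txt shows which variables would have to be removed from path_file_1 (joined with commas, for instance with paste -sd, -, they can be passed to drop_vars).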

Five-dimensional variables are ignored by the cdo diffv command, so not all variables in the restart files can be compared with this method.
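To see in advance which variables cdo diffv will skip, you can read the variable declarations from the file header. The sketch below assumes the netCDF utility ncdump is available and that its -h output declares each variable as "type name(dim1, dim2, ...) ;"; the function name vars_with_rank is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read "ncdump -h" output on stdin and print the names
# of the variables that have at least $1 dimensions.
vars_with_rank() {
  awk -v n="$1" '
    # Variable declarations look like:  <tab>float tair(time_counter, lat, lon) ;
    # Attribute lines ("tair:units = ...") and dimension lines do not match.
    /^[ \t]+[a-z0-9]+ [^ (]+\(.*\) ;/ {
      name = $2; sub(/\(.*/, "", name)   # "tair(time_counter," -> "tair"
      dims = $0
      sub(/^[^(]*\(/, "", dims)          # drop everything up to "("
      sub(/\).*/, "", dims)              # drop ")" and what follows
      if (split(dims, d, ",") >= n) print name
    }'
}
```

For example, ncdump -h path_file_1 | vars_with_rank 5 lists the five-dimensional variables of path_file_1, i.e. those that the cdo diffv comparison will leave out.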

ADVANTAGE: the output tells you which fields are different. Be aware, though, that this method works best for small netCDF files: if your history file is more than a few megabytes, the output text file may be many hundreds of megabytes. In that case, the md5sum command may be a better option.
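Since the report can become very large, it helps to send it straight to a file and check its size before opening it. A minimal sketch, assuming cdo is on your PATH; the function name diffv_to_file is hypothetical, and its return value is simply cdo's own exit status.

```shell
#!/bin/bash
# Hypothetical helper: compare two netCDF files with "cdo diffv", write the
# report to a file, and print the size of that report.
diffv_to_file() {
  # $1, $2: files to compare; $3: report file (created or overwritten)
  local status bytes lines
  cdo diffv "$1" "$2" > "$3"
  status=$?                  # cdo's exit status (non-zero if cdo failed)
  bytes=$(wc -c < "$3")
  lines=$(wc -l < "$3")
  # $((...)) strips the padding that some wc implementations add
  echo "report $3: $((bytes)) bytes, $((lines)) lines"
  return $status
}
```

For example, run diffv_to_file path_file_1 path_file_2 diffv_report.txt and, if the report is large, look only at its beginning with head -n 50 diffv_report.txt.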

DISADVANTAGE: only works for netCDF files, and only for arrays of rank lower than 4.

md5sum

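If you expect the files to be identical (bit by bit), you can use:

#!/bin/bash
# Read each file on stdin so that md5sum prints only the checksum, not the
# file name (otherwise the two lines would always differ by the name)
md5sum < path_file1 > sum1
md5sum < path_file2 > sum2
# cmp -s prints nothing; its exit status is 0 if identical, 1 otherwise
cmp -s sum1 sum2
echo $?

The first two commands compute a signature string (checksum) of each file's content and write it to the files 'sum1' and 'sum2', which are created or overwritten. The third command prints nothing itself; its exit status, printed by the last line, is 0 if the files are identical and 1 otherwise.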
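The same bit-by-bit check can be wrapped in a reusable function that needs no temporary files. In this sketch the name same_content is hypothetical; it returns 0 for identical content, 1 for different content and 2 if a file cannot be read.

```shell
#!/bin/bash
# Hypothetical helper: 0 if the two files are bit-identical, 1 if they
# differ, 2 if either file cannot be read.
same_content() {
  local sum1 sum2
  # Reading on stdin keeps the file name out of md5sum's output
  sum1=$(md5sum < "$1") || return 2
  sum2=$(md5sum < "$2") || return 2
  [ "$sum1" = "$sum2" ]
}
```

Usage: if same_content path_file1 path_file2; then echo identical; else echo different; fi. For two files on the same machine, cmp -s path_file1 path_file2 gives the same answer directly; checksums are mainly convenient when the files sit on different machines, because only the short checksum strings need to be compared.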

ADVANTAGE: works for all files.

DISADVANTAGE: you only know whether the files are identical or not. If not, you have no idea which fields are different.

Matlab

The MATLAB function nccmp compares all variables contained in two netCDF files. The original version can be found here. Pascal Maugis has made some small modifications so that the information produced by the script is written to a file instead of being printed to the screen. The updated version can be found here.

Unfortunately, MATLAB is not available on obelix, but it is on IRENE. To open MATLAB on IRENE, type matlab; to run it from the terminal without the desktop, type matlab -nodesktop.

Next, run the function by typing:

nccmp(ncfile1, ncfile2, tolerance, forceCompare)

tolerance sets how much the variables are allowed to differ between the two files. Since we want identical files, pass [] here.

forceCompare can be set to true or false:

  • true - write every difference in a variable (including all its indices) to the file all_diff.txt.
  • false - for each variable that differs, write only the first difference to the file first_diff.txt.
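On IRENE the comparison can also be launched from the shell without opening MATLAB interactively. This is a sketch under assumptions: matlab is on your PATH, nccmp.m is in the current directory (or on the MATLAB path), and the file names contain no single quotes. run_nccmp is a hypothetical name; -nodesktop, -nosplash and -r are standard MATLAB start-up options.

```shell
#!/bin/bash
# Hypothetical helper: run nccmp(file1, file2, [], forceCompare) in MATLAB
# without the desktop, then quit MATLAB.
run_nccmp() {
  # $1, $2: netCDF files; $3: forceCompare, "true" or "false" (default false)
  local force=${3:-false}
  case $force in
    true|false) ;;
    *) echo "forceCompare must be true or false" >&2; return 2 ;;
  esac
  # try/catch makes MATLAB exit with status 1 on error instead of
  # staying open at the prompt
  matlab -nodesktop -nosplash -r \
    "try, nccmp('$1', '$2', [], $force); catch err, disp(err.message); exit(1); end; exit(0)"
}
```

For example, run_nccmp path_file_1 path_file_2 false should write only the first difference of each differing variable to first_diff.txt.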

For global simulations with many differences between the compared restart files, the true option can produce a large file whose information is hard to process. The true option also makes the script much slower. For small simulations, however, it is very useful.

It is recommended to use the re-ordered files produced by the diff100err.sh script as inputs to nccmp.

Last modified on 12/29/20 11:13:32

Attachments (2)
