
What is FLUXNET_DATA ?

FLUXNET_DATA is a set of Python scripts that generates a meteorological input file for ORCHIDEE (.nc file) from FLUXNET site data (.csv files).
The output file also stores other monitored variables that can be used for model/data comparison.
The main feature of FLUXNET_DATA is that it fills the gaps in the meteorological time series using the ERA5 hourly product (or, alternatively, the ERA-interim 3-hourly product).

This gap-filling procedure is detailed and evaluated in Vuichard and Papale (2015).

Input requirements

The FLUXNET_DATA scripts read two .txt files, siteinfo.txt and configinfo.txt, plus the input data. The input data must follow the format of the file FR-Hes.2000.synth.hourly.allvars.csv in the Inputs directory, with one file per year and per site. Each file is named according to the convention ID.YEAR.synth.hourly.allvars.csv, where ID is the site ID as specified in siteinfo.txt.

siteinfo.txt

siteinfo.txt lists the FLUXNET sites for which ORCHIDEE input files are to be generated. For each site, it gives the site ID, the first and last years for which data are available, the latitude and longitude of the site and, optionally, the shift to UTC time (in hours) and the time step of the input data (in hours). The site ID has to be a six-character string: the first two characters have to be upper case and the third has to be a "-". The last three characters can mix upper-case letters, lower-case letters and digits. Record fields are separated by a tab character.
siteinfo.txt looks like:

Site    FirstY  LastY	Lat	     Lon	  UTCtime(optional)
FR-Hes  2000    2001    48.674198       7.064620        1       0.5

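This siteinfo.txt refers to files named FR-Hes.XXXX.synth.hourly.allvars.csv with XXXX from 2000 to 2001. The last two values on the FR-Hes line are the optional shift to UTC time (1 hour) and the time step of the input data (0.5 hour).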
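
The record format and the file naming convention described above can be sketched in a few lines of Python. This is only an illustration, not the code used by FLUXNET_DATA: the function names parse_site_line and expected_files and the dictionary keys are hypothetical, and the sketch accepts any whitespace as a separator so that it also works on copies of the file where tabs were turned into spaces.

```python
import re

# Site ID as described above: two upper-case letters, "-", then three
# characters that can be letters or digits (e.g. "FR-Hes").
SITE_ID = re.compile(r"^[A-Z]{2}-[A-Za-z0-9]{3}$")


def parse_site_line(line):
    """Parse one siteinfo.txt record into a dict (hypothetical helper)."""
    # The real file is tab-separated; split() also tolerates spaces.
    fields = line.split()
    if len(fields) < 5:
        raise ValueError("expected at least 5 fields, got %d" % len(fields))
    if not SITE_ID.match(fields[0]):
        raise ValueError("invalid site ID: %r" % fields[0])
    return {
        "site": fields[0],
        "first_year": int(fields[1]),
        "last_year": int(fields[2]),
        "lat": float(fields[3]),
        "lon": float(fields[4]),
        # Optional columns: None means "not given in the file".
        "utc_shift_h": float(fields[5]) if len(fields) > 5 else None,
        "time_step_h": float(fields[6]) if len(fields) > 6 else None,
    }


def expected_files(record):
    """List the input file names implied by one record, one per year."""
    years = range(record["first_year"], record["last_year"] + 1)
    return ["%s.%d.synth.hourly.allvars.csv" % (record["site"], y) for y in years]


record = parse_site_line("FR-Hes\t2000\t2001\t48.674198\t7.064620\t1\t0.5")
for name in expected_files(record):
    print(name)
```

Checking the site ID up front gives an early error for a malformed line instead of a missing-file error later in the run.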

configinfo.txt

configinfo.txt defines seven settings:

  • name_path_weather : the location of the .csv files
  • name_path_reanalysis : the location of the reanalysis products (ERA-interim or ERA5)
  • name_path_out : the directory where the ORCHIDEE input files are stored
  • gapmax : the shortest gap length for which the reanalysis is used for gap filling. Gaps shorter than gapmax are filled by linear interpolation. It is expressed in time steps of the in-situ data and its default value is 6 (= 3 hours for half-hourly data)
  • format_weather : the format of the site input data. Several formats are supported; the default one is "fluxnet"
  • format_reanalysis : the format of the reanalysis data. Several formats are supported; the default one is "2D_ERA5"
  • flux : specifies whether additional site-level variables (i.e. carbon and energy flux variables) are added to the output file (=1) or not (=0)
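
The role of gapmax can be illustrated with a short Python sketch. It is not taken from gap_fill.py: the function names find_gaps and choose_method are hypothetical, and missing values are represented by None purely for the sake of the example.

```python
def find_gaps(values):
    """Return (start_index, length) for each run of missing values (None)."""
    gaps = []
    start = None
    for i, value in enumerate(values):
        if value is None:
            if start is None:
                start = i          # a new gap begins here
        elif start is not None:
            gaps.append((start, i - start))
            start = None
    if start is not None:          # the series ends inside a gap
        gaps.append((start, len(values) - start))
    return gaps


def choose_method(gap_length, gapmax=6):
    """Pick the filling method for a gap, following the gapmax rule above."""
    # gapmax is the shortest gap filled with the reanalysis;
    # anything shorter is filled by linear interpolation.
    if gap_length >= gapmax:
        return "reanalysis"
    return "linear_interpolation"


series = [1.0, None, None, 4.0] + [None] * 6 + [2.0]
for start, length in find_gaps(series):
    print(start, length, choose_method(length))
```

With the default gapmax of 6 and half-hourly data, the first gap (2 steps, 1 hour) would be interpolated and the second one (6 steps, 3 hours) would be filled from the reanalysis.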

Functioning

FLUXNET_DATA creates continuous time series of air temperature, rainfall, snowfall, shortwave and longwave downward radiation, surface pressure, surface specific humidity, and northward and eastward wind speed. It builds them from the available FLUXNET data (air temperature, precipitation, shortwave and longwave downward radiation, vapour pressure deficit, wind speed) and uses ERA-interim products when FLUXNET data are missing or are not original measurements.
To fill the gaps with reanalysis data, we first compare FLUXNET and ERA-interim data at the time resolution of the reanalysis, over the periods when both datasets are available. We then correct for systematic bias, so that bias-corrected reanalysis data are used in the gaps.
The variables that are compared are:

  • Air Temperature
  • Shortwave and longwave down radiation
  • Water vapour pressure
  • Precipitation
  • Wind speed

Pressure is absent in the fluxnet datasets and is consequently directly taken from the reanalysis dataset.

For all fields except precipitation, we estimate the slope and intercept of the linear regression between Fluxnet and ERA-interim time series in order to correct for ERA-interim bias.
We constrain the intercept of the linear regression to equal 0 for the shortwave down radiation and the wind speed.
For precipitation, we simply compare the cumulative amounts over all the reanalysis time steps where fluxnet and reanalysis data are available.
Water vapour pressure from the ERA-interim and Fluxnet products is calculated using the Magnus-Tetens relationship, based on air water deficit, air temperature and surface pressure (Murray, 1967, http://cires.colorado.edu/~voemel/vp.html).
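
The bias estimates described above can be sketched as follows. This is a minimal pure-Python illustration, not the code of gap_fill.py: the function names are hypothetical, and the direction of the regression (Fluxnet values explained by reanalysis values, so that the fitted line can be applied directly to the reanalysis) is an assumption.

```python
def fit_linear(reanalysis, site, through_origin=False):
    """Least-squares fit: site ~ slope * reanalysis + intercept."""
    if through_origin:
        # Intercept forced to 0 (used above for shortwave radiation and wind speed).
        slope = sum(x * y for x, y in zip(reanalysis, site)) / sum(
            x * x for x in reanalysis
        )
        return slope, 0.0
    n = len(reanalysis)
    mean_x = sum(reanalysis) / n
    mean_y = sum(site) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(reanalysis, site))
    sxx = sum((x - mean_x) ** 2 for x in reanalysis)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def correct(reanalysis, slope, intercept):
    """Apply the fitted correction to reanalysis values."""
    return [slope * x + intercept for x in reanalysis]


def precipitation_factor(reanalysis, site):
    """Ratio of cumulative amounts over the common time steps."""
    return sum(site) / sum(reanalysis)


# Toy values on the time steps where both datasets are available.
era_temp = [0.0, 1.0, 2.0, 3.0]
site_temp = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_linear(era_temp, site_temp)
print(slope, intercept)
print(correct(era_temp, slope, intercept))
```

Only time steps where both datasets are available should be passed to these functions; handling of missing values is left out of the sketch.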

When there are gaps in the Fluxnet time series, bias-corrected reanalysis data at hourly or 3-hourly time step are used and subsequently interpolated to match the half-hourly time resolution of the Fluxnet data. Air temperature, water vapour pressure, wind speed, longwave downward radiation and pressure are linearly interpolated. Hourly or 3-hourly precipitation is spread over the half-hourly time steps using the observed mean frequency of precipitation. For the shortwave downward radiation, the interpolation accounts for the evolution of the solar angle over the hourly or 3-hourly period.
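
For the linearly interpolated variables, going from hourly to half-hourly values amounts to inserting the midpoint between each pair of consecutive values. The sketch below shows only that case; the function name to_half_hourly is hypothetical, and the precipitation and shortwave radiation schemes, which are not linear, are not covered.

```python
def to_half_hourly(hourly):
    """Linearly interpolate hourly values onto a half-hourly grid."""
    if not hourly:
        return []
    out = []
    for left, right in zip(hourly, hourly[1:]):
        out.append(left)                   # value on the hour
        out.append(0.5 * (left + right))   # midpoint, 30 minutes later
    out.append(hourly[-1])                 # last hourly value closes the series
    return out


print(to_half_hourly([0.0, 2.0, 4.0]))
```

For a 3-hourly product the same idea applies with five intermediate points per interval instead of one.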

Finally, to match the meteorological fields required by ORCHIDEE, precipitation is split into rainfall and snowfall using a 0°C threshold, and water vapour pressure is converted into air specific humidity using the Magnus-Tetens relationship.
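
The conversions mentioned above can be sketched with standard formulas. Several points are assumptions and may differ from what the tool does: the Tetens coefficients are the commonly quoted ones, pressures are taken in hPa, and a temperature of exactly 0°C is counted as snowfall. All function names are hypothetical.

```python
import math


def saturation_vapour_pressure_hpa(temp_c):
    """Saturation vapour pressure over water in hPa (Tetens form).

    The coefficients 6.1078, 17.27 and 237.3 are the commonly quoted
    ones; the tool may use slightly different values.
    """
    return 6.1078 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def vapour_pressure_hpa(temp_c, vpd_hpa):
    """Vapour pressure from air temperature and vapour pressure deficit."""
    return saturation_vapour_pressure_hpa(temp_c) - vpd_hpa


def specific_humidity(e_hpa, pressure_hpa):
    """Specific humidity (kg/kg) from vapour pressure and surface pressure."""
    return 0.622 * e_hpa / (pressure_hpa - 0.378 * e_hpa)


def split_precipitation(precip, temp_c, threshold_c=0.0):
    """Return (rainfall, snowfall) using a temperature threshold."""
    # Assumption: exactly at the threshold, precipitation counts as snow.
    if temp_c > threshold_c:
        return precip, 0.0
    return 0.0, precip


e = vapour_pressure_hpa(temp_c=15.0, vpd_hpa=5.0)
print(e, specific_humidity(e, pressure_hpa=1000.0))
print(split_precipitation(2.0, 1.5), split_precipitation(2.0, -3.0))
```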

Outputs of FLUXNET_DATA tool

FLUXNET_DATA creates a NetCDF file that contains all the meteorological fields required by the ORCHIDEE model. In addition, it contains a quality flag for each meteorological field indicating whether the data are original (1) or derived from the ERA-interim product (0). It also contains Fluxnet data for:

  • Net Ecosystem Exchange
  • Gross Primary Production
  • Ecosystem Respiration
  • Soil Water Content
  • Sensible Heat Flux
  • Soil Temperature
  • Latent Heat Flux
  • Relative Humidity
  • Canopy conductance
  • Potential Evapotranspiration

FLUXNET_DATA also creates a PDF file with charts comparing the meteorological fields from the Fluxnet and ERA-interim data. It reports the percentage of gaps for each meteorological field, the slope and intercept of the linear regression between the Fluxnet and ERA-interim time series, and the RMSE between these datasets with and without bias correction. An example of a PDF file created by the FLUXNET_DATA tool is shown below.
http://dods.ipsl.jussieu.fr/orchidee/WIKI/RU-Zot_2002-2004.png
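
The two summary statistics reported in the PDF, the percentage of gaps and the RMSE, can be computed as in the sketch below. It is an illustration only, not the code of visu_plotlib.py: the function names are hypothetical and missing values are represented by None.

```python
import math


def gap_percentage(values):
    """Percentage of missing (None) values in a series."""
    missing = sum(1 for v in values if v is None)
    return 100.0 * missing / len(values)


def rmse(site, reanalysis):
    """Root mean square error between two series of equal length."""
    squared = [(a - b) ** 2 for a, b in zip(site, reanalysis)]
    return math.sqrt(sum(squared) / len(squared))


site = [1.0, 3.0, 5.0, 7.0]
era = [0.0, 1.0, 2.0, 3.0]
era_corrected = [2.0 * x + 1.0 for x in era]  # toy slope and intercept

print(gap_percentage([1.0, None, None, 4.0]))
print(rmse(site, era))            # without bias correction
print(rmse(site, era_corrected))  # with bias correction
```

Comparing the two RMSE values shows how much of the mismatch between the datasets is removed by the linear correction.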

What does FLUXNET_DATA tool contain ?

FLUXNET_DATA is a set of 11 Python scripts.
gapfilling.py is the main script. It first reads the two input files, siteinfo.txt and configinfo.txt. It then loops over the set of Fluxnet sites and applies a series of processing steps to each one.
The processing steps for each site are:

  • reading and loading the meteorological fields stored in the ERA-interim dataset for each year during which the Fluxnet site was monitored, at the site location (one grid point)
    • function read_climatology, stored in read_climatology.py
  • reading and loading the meteorological fields and the monitored fields stored in the FLUXNET dataset for each year during which the site was monitored
    • functions read_weather and read_obs, stored in read_weather.py and read_obs.py respectively
  • filling the gaps in the meteorological time series
    • function gap_fill_func, stored in gap_fill.py
  • creating the PDF file
    • function visu_plotlib_func, stored in visu_plotlib.py
  • converting the variable units to match the ORCHIDEE input standard
    • function prepate_for_orchidee, stored in input_nc_orchidee.py
  • creating the NetCDF file
    • function write_nc, stored in input_nc_orchidee.py

How to use FLUXNET_DATA tool on Obelix machine ?

Copy the directory /home/users/vuichard/TEMPLATE_FLUXNET_DATA into one of your own directories:

myprompt>> cp -r /home/users/vuichard/TEMPLATE_FLUXNET_DATA .

To set the current directory as the default path in the configuration files that need it, run:

myprompt>> ./changepath.ksh

To launch a test run of the FLUXNET_DATA tool, simply type:

myprompt>> qsub launch_fluxnet.bat

This uses the information set in siteinfo.txt: the run covers the FR-Hes site for years 2000 and 2001. It also uses the information stored in configinfo.txt: the input data are located in the Inputs directory and the output data are stored in the Outputs directory. configinfo.txt also specifies that the reanalysis data used for gap filling is ERA5. A run takes roughly 30 minutes per year and per site. It creates a log file ouptput_fluxnet at the end of the execution.

To run other sites, add them to siteinfo.txt in the same format as FR-Hes, and add the associated input files to the Inputs directory. To skip one of the sites listed in siteinfo.txt, simply put a "#" at the start of its line. Then relaunch "qsub launch_fluxnet.bat".
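
The effect of the "#" marker can be illustrated with a small Python sketch that keeps only the active site lines. It does not reproduce how gapfilling.py reads the file: the function name active_site_lines is hypothetical, skipping the header line that starts with "Site" is an assumption, and the second site in the example is there only for illustration.

```python
def active_site_lines(text):
    """Return the siteinfo.txt records that are not commented out with '#'."""
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue               # empty line or disabled site
        if line.startswith("Site"):
            continue               # header line (assumed to start with "Site")
        records.append(line)
    return records


example = (
    "Site\tFirstY\tLastY\tLat\tLon\tUTCtime(optional)\n"
    "FR-Hes\t2000\t2001\t48.674198\t7.064620\t1\t0.5\n"
    "#DE-Hai\t2000\t2001\t51.0792\t10.4530\t1\t0.5\n"
)
for record in active_site_lines(example):
    print(record.split("\t")[0])
```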

Last modified on 06/07/21 15:40:51