wiki:Documentation/UserGuide/FLUXNETValidation

Version 14 (modified by mmcgrath, 4 years ago)

Updated to document the changes needed to get r6384 to work with ENSEMBLE, and to give some tips on debugging. Not working yet.

This was tested for ORCHIDEE-CN-CAN (r5678 of ORCHIDEE and r5673 of ORCHIDEE_OL) on obelix.

First, look at Nicolas's page.

http://forge.ipsl.jussieu.fr/orchidee/wiki/Scripts/FluxnetValidation

And then look at the README file in config/ORCHIDEE_OL/ENSEMBLE. And then read this whole page before really starting to create a run.

Be sure you have checked out both CN-CAN modeles/ORCHIDEE and config/ORCHIDEE_OL (Documentation/UserGuide/ORCHIDEEDOFOCOInstall).

Be sure that ioipsl_debug=.FALSE. in modeles/IOIPSL/src/errioipsl.f90. Otherwise, the output files become huge because of the high frequency writes combined with the debug information.
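A quick way to confirm the flag before compiling is to search the source file. This is only a sketch: it assumes the flag is assigned on a single line in a form like ioipsl_debug=.FALSE. (spacing and case may vary), and the function name check_ioipsl_debug is made up for this page.

```shell
# Hypothetical helper: succeeds only if the file assigns ioipsl_debug = .FALSE.
# Assumes the assignment sits on one line; adjust the pattern if your source differs.
check_ioipsl_debug() {
    local src="$1"
    if grep -qiE 'ioipsl_debug[[:space:]]*=[[:space:]]*\.false\.' "$src"; then
        echo "OK: ioipsl_debug is .FALSE. in $src"
    else
        echo "WARNING: ioipsl_debug is not .FALSE. in $src" >&2
        return 1
    fi
}
```

From the top of your install, it would be called as check_ioipsl_debug modeles/IOIPSL/src/errioipsl.f90.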

Start from a clean SVN ENSEMBLE install. Notice that ENSEMBLE/Job_ENSEMBLE is the main driver, and it should not be deleted! This is what I refer to when I say "Nicolas's FLUXNET scripts". It will create jobs based on SPINUP/SUBJOB/OOL_SEC_STO/.

I have found that the following files are used. Take care that conflicting options are not specified in these files (assuming you are running analytical spinup, sechiba, and stomate):

ENSEMBLE/fluxnet.card
ENSEMBLE/PARAM/run.def
SPINUP/COMP/spinup.card
SPINUP/SUBJOB/OOL_SEC_STO/COMP/sechiba.card
SPINUP/SUBJOB/OOL_SEC_STO/COMP/stomate.card

I am uncertain what the priority is. All of the .card files add things onto the end of the run.def, and I believe fluxnet.card takes priority over the others in case of conflict.
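Since the priority is unclear, it helps to see every place a given option is set before launching. The helper below is a sketch, not part of libIGCM: where_set is a made-up name, and it assumes options are written as KEY = value with '#' marking comments.

```shell
# Hypothetical helper: print every active (uncommented) assignment of a key
# in the given files, prefixed by file name and line number.
# Assumes the key is a plain identifier (no regex metacharacters).
where_set() {
    local key="$1"
    shift
    grep -HnE "^[[:space:]]*${key}[[:space:]]*=" "$@"
}
```

For example, from config/ORCHIDEE_OL you could run where_set IMPOSE_VEG ENSEMBLE/fluxnet.card ENSEMBLE/PARAM/run.def SPINUP/COMP/spinup.card and add the two OOL_SEC_STO card files to the list. If the same key shows up with different values, that is a conflict to resolve by hand.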

As of r6358 (and probably much earlier), the CAN branch of config/ORCHIDEE_OL/ENSEMBLE contains a series of fluxnet*card files. These files cover different configurations and different sites. Choose the one that best matches what you want.

cp fluxnet_28sp.card fluxnet.card

As of around r6358, the Python script in the config/ORCHIDEE_OL/MAKE_RUN_DEF folder started generating only orchidee_pft.def_* in a few directories: OOL_SEC_STO_FG1trans, OOL_SEC_STO_FG2, SPINUP, and some others. Make sure that your PARAM directory has all the run.def files it needs, as for a normal run. From the ENSEMBLE folder (or the folder you copied the ENSEMBLE folder to):

cp ../OOL_SEC_STO_FG2/PARAM/* PARAM/

Do the same for the SPINUP/SUBJOB directory, e.g.:

cp ../OOL_SEC_STO_FG2/PARAM/* ../SPINUP/SUBJOB/OOL_SEC_STO/PARAM/

I have noticed that the script will complain if a value is specified in fluxnet.card but not in the run.def. It will not complain if a value is specified in run.def and not in fluxnet.card. Check the [UserChoices] and [SubJobParams] sections of fluxnet.card. Many of the UserChoices are already in SPINUP/COMP/spinup.card, and many of the SubJobParams are in the run.def. It seems that the scripts make decisions based on what is in fluxnet.card, so this should typically take precedence.

Before we get to some specifics, let's create the jobs.

cd config/ORCHIDEE_OL/ENSEMBLE
vi config.card

Change the following lines (this is for obelix; on Irene, the ARCHIVE line should be fine as it is):

JobName=FLUXNET
  ARCHIVE=/home/scratch01/$LOGIN

Then create the job scripts:

../../../libIGCM/ins_job

This creates Job_FLUXNET. Notice that this job will pull from the SPINUP directory as well. ins_job used to create Job files in every directory, but that functionality changed a while ago. Therefore, the following is now necessary (OOL_SEC_STO because we will run a job with sechiba and stomate).

cd ../SPINUP
../../../libIGCM/ins_job
cd SUBJOB/OOL_SEC_STO/
../../../../../libIGCM/ins_job
cd ../../../ENSEMBLE

Now edit the Job file. Notice that this is the Job file that is copied to all the subjobs when they run, so if you want them to run on a different queue (I use the long queue on obelix, as 500 years can take more than 12 hours), you should do that here. I also modify the run directory so I know where the jobs are running and can go to that directory easily if needed.

vi Job_FLUXNET
(change RUN_DIR_PATH=/home/scratch01/mmcgrath/RUN_DIR)
(change JobType=DEV if you are not sure this will work)
mkdir /home/scratch01/mmcgrath/RUN_DIR

Now change the options for the sites to run against.

vi fluxnet.card

Best to run a small test with a single site. If you are running with age classes (not recommended) or any number of PFTs other than the standard 13 in ORCHIDEE, you will need to change the following:

  NbPFTs= 13
  Groups= ( TEST )
  TEST =	   ( BR-Sa3 , BR-Sa3_2000-2003.nc , 2000 , 4 , 0,1,0,0,0,0,0,0,0,0,0,0,0 ) 

If you have 13 PFTs, you can add the above lines just as they are. COMMENT OUT ANY OTHER GROUPS LINES. Else, when you submit the job, you will launch a run over all of the sites in groups, and you have to cancel them one at a time.
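Because a stray Groups line launches a run for every site it names, a quick check before submitting can save some cleanup. This is a sketch under the assumption that commented lines in the card start with '#'; check_single_groups is a made-up name.

```shell
# Hypothetical helper: list the active (uncommented) Groups lines in a card
# and fail if there is not exactly one. Assumes comments start with '#'.
check_single_groups() {
    local card="$1" n
    n=$(grep -cE '^[[:space:]]*Groups[[:space:]]*=' "$card" || true)
    grep -nE '^[[:space:]]*Groups[[:space:]]*=' "$card" || true
    if [ "$n" != "1" ]; then
        echo "Expected exactly one active Groups line in $card, found ${n:-0}" >&2
        return 1
    fi
}
```

Run it as check_single_groups fluxnet.card from the ENSEMBLE directory; it prints the active Groups lines so you can see which one would be used.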

The length of the spinup also matters. I use the following for the moment (in fluxnet.card, and also in SPINUP/COMP/spinup.card):

n_iter=1
duree_inistomate=1
duree_sechiba=500
duree_final=1

All of the other duree values I set to 0. This launches a simulation over one loop of the forcing file, then 500 years (regardless of the length of the forcing file), and then one final loop for analysis.

The [SubJobParams] section in fluxnet.card deserves special mention. As of a recent version of CAN, the run.def has been restructured to include two files: orchidee.def and orchidee_pft.def. This makes the run.def much neater and matches what is done in the coupled simulations. However, the Job_ENSEMBLE script attempts to change some variables in the run.def that fall under the [SubJobParams] section. To do this, it looks at the actual run.def file, not at any included file. If it does not find a line in the run.def corresponding to a line in [SubJobParams], it will crash. So make sure all the lines you specify under [SubJobParams] in fluxnet.card also explicitly appear in the PARAM/run.def file.
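This requirement can be checked before launching. The two helpers below are a sketch with made-up names (section_keys, missing_in_rundef); they assume the card uses "[Section]" headers, "key = value" lines, and '#' for comments, which is how the cards on this page look to me.

```shell
# Hypothetical helper: print the keys of one [Section] of a libIGCM-style card.
# Assumes "key = value" lines, '#' comments, and sections opened by "[Name]".
section_keys() {
    local card="$1" section="$2"
    awk -v want="[$section]" '
        /^[ \t]*\[/ { line = $0; gsub(/[ \t]/, "", line); inside = (line == want); next }
        inside && /^[ \t]*[A-Za-z_][A-Za-z0-9_]*[ \t]*=/ {
            key = $0
            sub(/^[ \t]*/, "", key)
            sub(/[ \t]*=.*/, "", key)
            print key
        }' "$card"
}

# Report every [SubJobParams] key that has no active line in run.def itself.
missing_in_rundef() {
    local card="$1" rundef="$2" key status=0
    for key in $(section_keys "$card" SubJobParams); do
        if ! grep -qE "^[[:space:]]*${key}[[:space:]]*=" "$rundef"; then
            echo "missing in $rundef: $key"
            status=1
        fi
    done
    return $status
}
```

From the ENSEMBLE directory, missing_in_rundef fluxnet.card PARAM/run.def lists the keys that would need to be added to run.def. It deliberately ignores orchidee.def and orchidee_pft.def, since the script does not look at included files.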

The addition of orchidee.def and orchidee_pft.def required adding them to [ParametersFiles] in SPINUP/SUBJOB/OOL_SEC_STO/COMP/orchidee_ol.card, so that libIGCM copies the new files to the PARAM directory of the running code. It also required changes to the driver, so that it selects the correct orchidee_pft.def file. To fix this, I simply copied OOL_SEC_STO_FG2/COMP/orchidee_ol.* to SPINUP/SUBJOB/OOL_SEC_STO/COMP/. This also required adding the following to the [UserChoices] section in SPINUP/SUBJOB/OOL_SEC_STO/COMP/orchidee_ol.card:

NORESTART=n
TIMELENGTH=y

I noticed that the following filenames did not match what is written in the SPINUP/SUBJOB/OOL_SEC_STO/COMP/stomate.card file, which will cause problems later. Make sure the filenames in run.def, fluxnet.card, and stomate.card all match, and then copy PARAM/*def to SPINUP/SUBJOB/OOL_SEC_STO/PARAM/.

Nammonium_FILE = ndep_nhx.nc
Nnitrate_FILE = ndep_noy.nc
Nfert_FILE = NONE
Nmanure_FILE = NONE
Nfert_cropland_FILE = nfert_cropland.nc
Nmanure_cropland_FILE = nmanure_cropland.nc
Nfert_pasture_FILE = nfert_pasture.nc
Nmanure_pasture_FILE = nmanure_pasture.nc
Nbnf_FILE= bnf.nc

Similarly, values found in the [UserChoices] section of fluxnet.card seem to be required in SPINUP/COMP/spinup.card, else it crashes.

Some additional variables which need to be in run.def and not orchidee.def (anything with _AUTO_ or _AUTOBLOCKER_ after it?):

STOMATE_HIST_DT = _AUTO_
STOMATE_RESTART_FILEIN = _AUTOBLOCKER_
SECHIBA_restart_in = _AUTOBLOCKER_
XIOS_ORCHIDEE_OK = _AUTOBLOCKER_
WRITE_STEP = _AUTO_
RIVER_DESC = _AUTO_
WRITE_STEP2 = _AUTO_ 
SECHIBA_HISTFILE2 = _AUTO_
STOMATE_IMPOSE_CN = _AUTO_
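To find candidates that still sit in an included file, you can search the included .def files for these markers. This is a sketch with a made-up function name (find_auto_vars); it assumes one "KEY = value" per line and '#' for comments.

```shell
# Hypothetical helper: list active lines in included .def files whose value
# starts with _AUTO_ or _AUTOBLOCKER_; these are candidates to move into run.def.
find_auto_vars() {
    grep -HnE '^[[:space:]]*[^#[:space:]][^=]*=[[:space:]]*_AUTO(BLOCKER)?_' "$@"
}
```

For example, find_auto_vars PARAM/orchidee.def PARAM/orchidee_pft.def. No output (and a non-zero exit status) means nothing was found.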

The following directories are used in the runs (from what I can tell):

ENSEMBLE/PARAM/
SPINUP/COMP/
SPINUP/SUBJOB/OOL_SEC_STO/COMP/

We need to make sure all of the following lines are commented out (or do not exist) in the orchidee_pft.def. The script changes the vegetation for each site by adding lines at the end of the run.def, and if the lines below are present they will override the values the script adds:

SECHIBA_VEG__01=0.0769230769231
...
SECHIBA_VEGMAX__01=0.0769230769231
  ...
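A search like the following can confirm that none of these lines is still active. It is a sketch: check_no_imposed_fractions is a made-up name, and it assumes a leading '#' is what marks a line as commented out.

```shell
# Hypothetical helper: fail if any active SECHIBA_VEG__NN or SECHIBA_VEGMAX__NN
# line remains in the given files. Assumes '#' marks a commented-out line.
check_no_imposed_fractions() {
    if grep -HnE '^[[:space:]]*SECHIBA_VEG(MAX)?__[0-9]+[[:space:]]*=' "$@"; then
        echo "Comment out or delete the lines listed above" >&2
        return 1
    fi
    echo "No active SECHIBA_VEG/SECHIBA_VEGMAX lines found"
}
```

Typical use would be check_no_imposed_fractions PARAM/*.def, repeated for SPINUP/SUBJOB/OOL_SEC_STO/PARAM/*.def.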

Make sure the following line is in the orchidee.def.

IMPOSE_VEG=y

Also confirm that IMPOSE_VEG is not set to a different value in another of the files above (e.g., SPINUP/COMP/spinup.card, SPINUP/SUBJOB/OOL_SEC_STO/COMP/sechiba.card). Another line that may be needed is:

ATM_CO2 =_AUTO_: DEFAULT = 350.

In some versions of the run.def, no DEFAULT value is given, but the .driver expects a default and will crash if it's not there. Make sure these lines don't appear twice in the orchidee.def!

The latest versions of the .card and .driver files expect the following to be present in the PARAM/run.def, as they try to modify these values:

SECHIBA_restart_in=_AUTO_
XIOS_ORCHIDEE_OK=_AUTO_
STOMATE_RESTART_FILEIN=_AUTO_

The scripts expect some variables in SPINUP/SUBJOB/OOL_SEC_STO/COMP/sechiba.card, and will crash if you don't have them. They try to change them (perhaps based on fluxnet.card) and give up if they don't find them in sechiba.card.

[UserChoices]
NEWHYDROL=y
ROUTING=n
LAIMAP=n
IMPOSE_VEG=y
LAND_USE=n
OKCO2=y
CO2varying=n

Something similar is needed in SPINUP/SUBJOB/OOL_SEC_STO/COMP/orchidee_ol.card:

[UserChoices]
NORESTART=n
TIMELENGTH=y

If we turn off XIOS, we have a couple variables undeclared in IOIPSL, so it crashes. Instead, let's leave XIOS on and include the following hack for it to find the iodef.xml file. Note that you will have to change this path!

        (/home/orchidee03/mmcgrath/MYFOLDER/config/ORCHIDEE_OL/SPINUP_ANALYTIC_FG1/PARAM/iodef.xml, .)   ,\

Launch the job (from the README file).

   ./Job_ENSEMBLE fluxnet > out.Job_ENSEMBLE

BE SURE TO CHECK THE USED RUN.DEFs. These can be found by changing to the RUN_DIR when the job is running. The scripts will add flags to the end of the run.def, and sometimes these may conflict with what you want to run.

Debugging

These are some of the errors that I have run into, along with attempts at explaining why and where they may occur, and how to solve them.

Error files can be found in many places, including (assuming a job name of FLUXNET and a site of FI-Hyy):

FLUXNET/out.Job_ENSEMBLE
FLUXNET/FI-HyyFLUXNET/out_qsub_FI-HyyFLUXNET
FLUXNET/FI-HyyFLUXNET/STOI/Script_Output_FI-HyyFLUXNETSTOI.000001
FLUXNET/FI-HyyFLUXNET/STOI/Debug

In my experience, errors come from the following places:

FLUXNET/out.Job_ENSEMBLE: PARAM/run.def
FLUXNET/FI-HyyFLUXNET/out_qsub_FI-HyyFLUXNET: SPINUP/SUBJOB/OOL_SEC_STO/COMP/*card
FLUXNET/FI-HyyFLUXNET/STOI/Script_Output_FI-HyyFLUXNETSTOI.000001: SPINUP/SUBJOB/OOL_SEC_STO/COMP/*card, SPINUP/SUBJOB/OOL_SEC_STO/COMP/*driver, PARAM/run.def, fluxnet.card
FLUXNET/FI-HyyFLUXNET/STOI/Debug: SPINUP/SUBJOB/OOL_SEC_STO/COMP/*card, SPINUP/SUBJOB/OOL_SEC_STO/COMP/*driver, PARAM/run.def, or the ORCHIDEE model itself

I would recommend solving the "deepest" error first (e.g., fix an error in the STOI directory before trying to fix an error in out_qsub_FI-HyyFLUXNET).
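The "deepest first" order can be automated for the libIGCM messages. The function below is a sketch with a made-up name (scan_errors); it only looks for the string IGCM_debug_Exit in the locations listed above, so errors from the ORCHIDEE model itself in the Debug directory still have to be read by hand.

```shell
# Hypothetical helper: search the log locations listed above for libIGCM exit
# messages, starting with the deepest one. The job and site arguments follow
# the naming used on this page (e.g. FLUXNET and FI-Hyy).
scan_errors() {
    local job="$1" site="$2" run f
    run="${job}/${site}${job}"
    for f in "${run}/STOI/Debug" \
             "${run}/STOI/Script_Output_${site}${job}STOI.000001" \
             "${run}/out_qsub_${site}${job}" \
             "${job}/out.Job_ENSEMBLE"; do
        [ -e "$f" ] || continue
        if grep -rn "IGCM_debug_Exit" "$f"; then
            echo "Deepest error found in: $f"
            return 0
        fi
    done
    echo "No IGCM_debug_Exit message found"
    return 1
}
```

Run it from the directory that contains the FLUXNET directory, for example scan_errors FLUXNET FI-Hyy. It stops at the first location with a match, so fix that error and relaunch before looking at the shallower files.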

Here are some errors:

In the file FLUXNET/FI-HyyFLUXNET/STOI/Script_Output_FI-HyyFLUXNETSTOI.000001

IGCM_debug_Exit :  IGCM_comp_modifyDefFile : The variable XIOS_ORCHIDEE_OK cannot be modified. It should be set to AUTO.

One solution is to modify the file SPINUP/SUBJOB/OOL_SEC_STO/COMP/sechiba.driver such that the following two lines

      IGCM_comp_modifyDefFile blocker run.def XIOS_ORCHIDEE_OK y
      ...
      IGCM_comp_modifyDefFile blocker run.def XIOS_ORCHIDEE_OK n

become

      IGCM_comp_modifyDefFile force run.def XIOS_ORCHIDEE_OK y
      ...
      IGCM_comp_modifyDefFile force run.def XIOS_ORCHIDEE_OK n

If you do this, the value of the variable will be overwritten, so you should confirm that all values which trigger this option (in this case, XIOS=y and XIOS_ORCHIDEE_OK=y) are set to match what you want. In this case, the XIOS value was found in SPINUP/SUBJOB/OOL_SEC_STO/COMP/orchidee_ol.card, PARAM/run.def, and fluxnet.card.

Another error that is found:

In the file FLUXNET/FI-HyyFLUXNET/STOI/Script_Output_FI-HyyFLUXNETSTOI.000001

IGCM_debug_Exit :  IGCM_comp_modifyDefFile : Variable STOMATE_OK_STOMATE is not set in correct file. It should be set in run.def.

This is generally a sign that a variable is in PARAM/orchidee.def and it needs to be in PARAM/run.def because libIGCM is trying to modify it, and libIGCM only knows to modify run.def at the moment. You will need to do the same to SPINUP/SUBJOB/OOL_SEC_STO/PARAM/*def.

Another error:

In the file FLUXNET/FI-HyyFLUXNET/STOI/Script_Output_FI-HyyFLUXNETSTOI.000001

IGCM_debug_Exit :  IGCM_comp_modifyDefFile : Error in run.def: Variable=NINPUT_UPDATE is set 2 times

This generally means that a value appears in both PARAM/run.def (likely copied there from fluxnet.card) and PARAM/orchidee.def. You need to delete the line in PARAM/orchidee.def, and then copy the whole PARAM directory to SPINUP/SUBJOB/OOL_SEC_STO/PARAM/.
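Rather than fixing these one crash at a time, you can list every variable that is active in both files up front. This is a sketch with made-up helper names (def_keys, duplicated_keys), assuming one "KEY = value" per line and '#' for comments.

```shell
# Hypothetical helper: print the active keys of a .def file, one per line.
def_keys() {
    sed -n -E 's/^[[:space:]]*([A-Za-z_][A-Za-z0-9_]*)[[:space:]]*=.*/\1/p' "$1"
}

# Print every key that is active in both files (a duplicate across the include).
duplicated_keys() {
    { def_keys "$1" | sort -u; def_keys "$2" | sort -u; } | sort | uniq -d
}
```

For example, duplicated_keys PARAM/run.def PARAM/orchidee.def prints one variable name per line. It does not catch a variable repeated within a single file.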

Cleaning

If an ENSEMBLE run crashes, it can sometimes be difficult to clean up all the files so that you can easily relaunch the run after figuring out what went wrong. In particular, each site creates a new directory, which can add up to a lot of directories. It's possible that some of your runs overlap, too (i.e., they use the same base directory, but the current run only uses forested sites, while a different run used agricultural sites). There may be a libIGCM tool that does this well, but if you aren't familiar with it, here is a short script that works. Copy it to your submission directory (i.e., where you launch the ./Job_ENSEMBLE script), make it executable (e.g., chmod +x clean.sh), and launch it before re-launching the run (e.g., ./clean.sh).

#!/usr/bin/bash
# Remove the per-site directories left behind by a crashed ENSEMBLE run.
simulation="FLUXNET"
basedir="/home/scratch01/mmcgrath/IGCM_OUT/OL2/PROD/ensemble/"
sites=( FI-Hyy FI-Sod )

for site in "${sites[@]}"
do
    # Per-site directory in the submission directory
    rm -fr "${site}${simulation}"
    # Per-site output, directly under basedir and under the simulation directory
    rm -fr "${basedir}${site}${simulation}"*
    rm -fr "${basedir}${simulation}/${site}${simulation}"*
    echo "$simulation $site"
done

# Log written when launching ./Job_ENSEMBLE
rm -fr out.Job_ENSEMBLE

All you need to do is modify the site list, basedir and simulation variables for your particular run.

Speed

Some timing tests were carried out with TAG2.1, TRUNK (r6096), and CAN (r6091) on obelix. This revealed the importance of the NBUFF=0 keyword for running with FLUXNET data for a single site. When running for a single site with forcing that has lower temporal resolution (e.g., CRUNCEP, which has six-hourly resolution instead of the 30 min resolution of FLUXNET), it's much less important. The amount of data output for all runs was adjusted to give approximately the same size of files. The optimized executables were used for all tests (-O3).

I take timings from four locations: CPU Time Global and Real Time Global from out_orchidee, and the real and user times reported by time -p ./orchidee_ol. For the most part, they are similar. For clarity, I only report Real Time Global from out_orchidee below. Error bars are the standard deviation from 5 independent runs, to show the variance.
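For reference, the mean and standard deviation of a handful of timings can be computed as follows. This is a sketch: mean_sd is a made-up name, and it uses the sample standard deviation (n-1 in the denominator), which is an assumption about how the error bars were computed.

```shell
# Hypothetical helper: read one timing (in seconds) per line on stdin and print
# the mean and the sample standard deviation (n-1 in the denominator).
mean_sd() {
    awk '{ n++; sum += $1; sumsq += $1 * $1 }
         END {
             if (n < 2) { print "need at least two values"; exit 1 }
             mean = sum / n
             var = (sumsq - n * mean * mean) / (n - 1)
             if (var < 0) var = 0
             printf "%.1f %.1f\n", mean, sqrt(var)
         }'
}
```

Feed it one number per line, for example the five Real Time Global values copied from out_orchidee into a file: mean_sd < timings.txt (timings.txt is just an example name).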

TRUNK and TAG21 have 15 PFTs and CAN has 28 PFTs. All fractions are set to zero except for NeedleleafEvergreenTemperate (PFT4...4), Deciduous temperate (PFT6...8), C3Grass (PFT10...23), and C3Crop (PFT12...26), which are each set to 0.25. I wanted to simulate a somewhat realistic pixel with a mix of vegetation.

Using NBUFF=1

FLUXNET forcing, XIOS, half-hour sechiba history, one day stomate history, 5 years, no libIGCM. Total time in seconds, with standard deviation from five runs on obelix:

Version     Time (s)
TAG21       1270 ± 60
TRUNK       4600 ± 600
CAN         5800 ± 500

CRUNCEP forcing, XIOS, half-hour sechiba history, one day stomate history, 5 years, no libIGCM. Total time in seconds:

Version     Time (s)
TAG21       1310 ± 70
TRUNK       1700 ± 100
CAN         1810 ± 90

Using NBUFF=0

FLUXNET forcing, XIOS, half-hour sechiba history, one day stomate history, 5 years, no libIGCM. Total time in seconds, with standard deviation from five runs on obelix:

Version     Time (s)
TAG21       1250 ± 140
TRUNK       1480 ± 90
CAN         1700 ± 200

CRUNCEP forcing, XIOS, half-hour sechiba history, one day stomate history, 5 years, no libIGCM. Total time in seconds:

Version     Time (s)
TAG21       1310 ± 50
TRUNK       1440 ± 110
CAN         1700 ± 200