
Documenting and archiving ORCHIDEE parameters

Initial draft: Sebastiaan Luyssaert (29/03/2020)

Attachment: Document summarizing initial discussion between SL, PP and AD

Objective: document the work around the documentation and handling of all ORCHIDEE parameters

Rationale

Although it is clear that parameters are an integral part of the ORCHIDEE model and that the choice of a specific parameter value can integrate a large set of implicit assumptions, parameters have until now received little attention compared to the code itself. There have been efforts to document the origin of our model parameters in the code itself, but this initial approach can no longer account for the flexibility in parameter use that came with the introduction of PFTs, which can draw their parameters either from the default parameters of the meta class they belong to or from an external source, i.e., the orchidee_pft.def file. Along the same lines, the diversity of driver files and set-ups has increased over the past years, and with it the diversity of parameter values. Finally, several parameters are optimized within an observed range to enhance model performance. The initial approach, in which the origin of the parameters is documented in the code, is no longer fit for the diversity and complexity that come with the more recent versions and applications of ORCHIDEE.

Objective

Propose a suitable and future-proof approach to document and archive the ORCHIDEE parameters as a means to: (1) enhance the traceability of the parameter values, (2) ensure the reproducibility of the parameter values, and (3) meet the highest QC/QA standards.

From observation to model parameter

Cases where observed parameter values can enter the model without any further consideration are few. This is due to: (1) scaling issues between the observation (site) and the model (pixel), (2) spatial homogenization in the model, which ignores the observed spatial heterogeneity, (3) the use of PFTs, which cannot be correctly represented in the model unless their exact phylogenetic composition is known, (4) non-linearity of parameters, which makes it undesirable to simply use the mean observed parameter values in the model, and (5) shortcomings in the model that are compensated for in the parameter values. For example, the C-only versions of ORCHIDEE implicitly account for the N-cycle in their parameter values; otherwise no observations could ever have been matched. Whereas some of these issues are explicitly accounted for, others are still implicitly accounted for in the processing chain (Fig. 1).



Fig. 1. The different processing steps from the observed parameter values up to a model parameter used in a specific model version.
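
Point (4) above can be illustrated with a small numerical sketch. The response function and the site values below are made up for illustration and are not taken from ORCHIDEE; any non-linear function would show the same effect, namely that the response at the mean parameter value differs from the mean of the site-level responses.

```python
import math


def absorbed_fraction(k, lai=4.0):
    """Beer-Lambert-type response, used here only as an example of a non-linear function."""
    return 1.0 - math.exp(-k * lai)


# Hypothetical site-level values of a coefficient k (made up for illustration).
site_k = [0.2, 0.5, 0.8]

# Option 1: average the observed parameter first, then run the "model".
mean_k = sum(site_k) / len(site_k)
response_of_mean = absorbed_fraction(mean_k)

# Option 2: run the "model" per site, then average the responses.
mean_of_responses = sum(absorbed_fraction(k) for k in site_k) / len(site_k)

print(f"response at mean k:     {response_of_mean:.3f}")
print(f"mean of site responses: {mean_of_responses:.3f}")
```

Because this example function is concave in k, the first option gives the larger number. The sign and size of the difference depend on the function and on the spread of the observations.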

Sources of parameter values

Some of the main sources of parameter values are: (1) individual research papers reporting experimental or monitoring results, (2) meta-analyses reporting best-available knowledge, (3) analyses of remote sensing data, (4) existing trait and parameter databases, and (5) observational databases which could be used to derive parameter values.

Properly documenting and archiving these sources should be an integral part of each model development, and would benefit from guidelines (as an absolute minimum) or, preferably, a formal repository. This is a huge task that comes with all the challenges of a literature study: which studies/sources will be used and which will be rejected (e.g., peer-reviewed vs grey literature), which values will be used and which will be rejected (e.g., only the control or also the values from the experimental treatments), and which metadata are considered essential (e.g., geolocation, altitude, soil type, methodology). The diversity of sources that we use (e.g., individual sites vs remote sensing) makes this a challenging task.

The current speed at which new papers are published and parameter values become available suggests that it is more realistic to focus on bringing together existing databases than on building our own, although for some more exotic parameters we will still have to search the literature. In addition to the conceptual issues, we will have to deal with more technical issues such as storing and versioning the data sources and maintaining a library of scripts to extract parameter values from databases that are updated (e.g., the TRY database).

This database should enable backward traceability, which implies that none of its entries should ever be overwritten, but that sufficient labels should be present to filter out obsolete parameters (no longer used by ORCHIDEE) and obsolete values (no longer considered best-available). This criterion has implications for model development: if we change the definition of a parameter in ORCHIDEE, we should also change its name. Ideally this database should be shared and maintained by several modelling groups, as we all depend on the same primary sources for our initial parameter values.
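
The "never overwrite, only label" requirement could be sketched as follows. This is a minimal in-memory illustration, not a proposal for the actual storage technology; the class, method, and parameter names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One archived value; frozen so that it cannot be modified after creation."""
    name: str
    value: float
    source: str


class ParameterArchive:
    """Append-only store: entries are added and labelled, never changed or deleted."""

    def __init__(self):
        self._entries = []             # grows only
        self._obsolete_values = set()  # ids of entries no longer best-available
        self._obsolete_names = set()   # parameters no longer used by the model

    def add(self, name, value, source):
        self._entries.append(Entry(name, value, source))
        return len(self._entries) - 1  # entry id

    def flag_value_obsolete(self, entry_id):
        self._obsolete_values.add(entry_id)

    def flag_parameter_obsolete(self, name):
        self._obsolete_names.add(name)

    def all_entries(self):
        """Full history, including obsolete entries (for traceability)."""
        return list(self._entries)

    def current(self):
        """Entries that are neither obsolete values nor obsolete parameters."""
        return [
            entry for entry_id, entry in enumerate(self._entries)
            if entry_id not in self._obsolete_values
            and entry.name not in self._obsolete_names
        ]


# Hypothetical parameter name, values, and sources.
archive = ParameterArchive()
old_id = archive.add("example_param", 0.020, "hypothetical source A")
archive.add("example_param", 0.025, "hypothetical source B")
archive.flag_value_obsolete(old_id)
```

After the last line, `archive.current()` only returns the entry from source B, while `archive.all_entries()` still returns both, so the superseded value remains traceable.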

Querying the database

This database is then the start of the ORCHIDEE-specific processing chain. A set of scripts should enable us to extract the parameter values we need for any specific version of the model. Such an extraction should account for the quality of the observations, the heterogeneity of the observations, the PFT definitions, the soil type definitions, and temporal issues (e.g., should we still use parameters from the 1960s, knowing that ecosystem function has changed considerably?). If no optimization is used, the result of this query will enter the ORCHIDEE-specific database.
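
A query of this kind might look like the sketch below. The record layout (`name`, `pft`, `value`, `quality`, `year`), the quality scale, and the PFT labels are assumptions made for illustration, and the median is only a placeholder aggregation rule; the actual rule would have to deal with the scaling and non-linearity issues listed earlier.

```python
from statistics import median


def extract_parameter(observations, name, pft, min_quality=0, min_year=None):
    """Return one value for a parameter and PFT, or None if no observation qualifies."""
    selected = [
        obs["value"]
        for obs in observations
        if obs["name"] == name
        and obs["pft"] == pft
        and obs["quality"] >= min_quality
        and (min_year is None or obs["year"] >= min_year)
    ]
    # Placeholder aggregation, not a recommendation.
    return median(selected) if selected else None


# Hypothetical observations.
observations = [
    {"name": "example_param", "pft": "PFT_A", "value": 1.0, "quality": 3, "year": 1965},
    {"name": "example_param", "pft": "PFT_A", "value": 2.0, "quality": 3, "year": 2005},
    {"name": "example_param", "pft": "PFT_A", "value": 4.0, "quality": 1, "year": 2010},
    {"name": "example_param", "pft": "PFT_B", "value": 9.0, "quality": 3, "year": 2012},
]

# Keep only well-documented observations made since 1990 for one PFT.
recent_good = extract_parameter(
    observations, "example_param", "PFT_A", min_quality=2, min_year=1990
)
```

Storing the query arguments together with the result would give each query the number that an optimization can later refer to.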

Optimizing and tuning

The database query (see above) could be used as the starting point of an optimization. Each optimization should be linked to a numbered query, a version number of ORCHIDAS, a version number of ORCHIDEE, and the version numbers of the datasets that were the objective of the optimization. With all of this information, the optimization should be reproducible. From that point of view, manual tuning should no longer be accepted as a valid source of parameters because it is not reproducible. Parameters that are manually tuned should then be used as the prior of a formal optimization.
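
The information that makes an optimization reproducible could be bundled in a single record, as in the sketch below. The field names and version strings are hypothetical; the only point is that a record with missing links can be detected automatically.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class OptimizationRecord:
    """Links one optimization to everything needed to repeat it."""
    query_id: Optional[int] = None          # numbered database query
    orchidas_version: Optional[str] = None  # version of the optimization tool
    orchidee_version: Optional[str] = None  # version of the model
    dataset_versions: Dict[str, str] = field(default_factory=dict)

    def missing_fields(self) -> List[str]:
        """Names of the links that are absent; an empty list means fully documented."""
        missing = []
        if self.query_id is None:
            missing.append("query_id")
        if not self.orchidas_version:
            missing.append("orchidas_version")
        if not self.orchidee_version:
            missing.append("orchidee_version")
        if not self.dataset_versions:
            missing.append("dataset_versions")
        return missing


# Hypothetical version labels.
complete = OptimizationRecord(
    query_id=1,
    orchidas_version="example-1",
    orchidee_version="example-2",
    dataset_versions={"example_dataset": "v1"},
)
# A manually tuned value typically knows only the model version.
manual_tuning = OptimizationRecord(orchidee_version="example-2")
```

Here `complete.missing_fields()` is empty, whereas `manual_tuning.missing_fields()` lists the query, the ORCHIDAS version, and the datasets. This is one way to express why a manually tuned value can serve as a prior but not as a documented result.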

Model configurations

Model configurations are documented and archived in the ORCHIDEE_OL folder. This approach meets the standards for traceability and reproducibility and could be further developed for more configurations. Configuration-specific PFT parameters are already accounted for in the orchidee_pft.def files. Other configuration-specific parameters are not yet accounted for.

ORCHIDEE database

A first possible approach for the ORCHIDEE database can be found in ../config/ORCHIDEE_OL/MAKE_RUN_DEFS/processed_parameters.py. The current database only accounts for “observed” parameters and does not yet have an approach for dealing with tuned parameters and/or different parameters that are considered equally reliable. In the current approach the most recent parameter value is used. Previous values are ignored but can be used to reconstruct orchidee_pft.def files for previous model versions.
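
The "most recent value wins, older values remain available" rule can be sketched as follows. This is not the content of processed_parameters.py; the data layout (a list of dated values per parameter) and the function name are assumptions made for illustration.

```python
from datetime import date


def value_as_of(history, when=None):
    """Return the most recent value, optionally restricted to entries dated on or before `when`.

    `history` is a list of (date, value) pairs in any order.
    Returns None if no entry qualifies.
    """
    candidates = [(d, v) for d, v in history if when is None or d <= when]
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]


# Hypothetical history of one parameter for one PFT.
history = [
    (date(2015, 1, 1), 0.8),
    (date(2018, 6, 1), 1.0),
    (date(2020, 3, 1), 1.2),
]

latest = value_as_of(history)                       # value used for the current model version
as_of_2019 = value_as_of(history, date(2019, 1, 1))  # value for an earlier model version
```

Calling the function with the date of an earlier model version is what would allow an older orchidee_pft.def file to be reconstructed from the same history.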

Orchidee_pft.def

This file contains the best available parameter estimates for different PFT configurations (e.g., different numbers of PFTs, with and without age classes). The script has been set up such that it only produces orchidee_pft.def files with parameters that are still used by the model (this list needs to be updated manually). The orchidee_pft.def files are independent of the model configuration and of whether the land-only or land-atmosphere set-up is used.

Implication for the ORCHIDEE source code

The outcome of the optimization is often a modifier. This is a very elegant approach which separates the original parameter value from its optimized value. It could also be seen as quality control of the model itself. If modifiers are getting smaller after model developments or after new observations have been used to estimate the parameter values, the development or new parameter could be considered to be more in line with current knowledge than the previous estimates. Should we introduce in the ORCHIDEE code, for each parameter, a variable that can be linked to observations and a modifier that can be optimized and tuned without affecting the initial parameter value?
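
The separation between an observed value and its modifier could look like the sketch below. It assumes a multiplicative modifier with 1.0 meaning "no adjustment"; the text above does not specify the form of the modifier, and the class and attribute names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModifiedParameter:
    """An observation-based value plus a separately stored modifier."""
    observed: float        # linked to observations, never changed by the optimization
    modifier: float = 1.0  # assumed multiplicative; 1.0 means no adjustment

    def effective(self):
        """Value actually seen by the model."""
        return self.observed * self.modifier

    def relative_adjustment(self):
        """How far the optimization moved the parameter away from the observation."""
        return abs(self.modifier - 1.0)


# Hypothetical numbers.
param = ModifiedParameter(observed=2.0)
param.modifier = 1.1  # result of an optimization or tuning step
```

After the optimization, `param.observed` is still 2.0 while `param.effective()` is about 2.2. Tracking `relative_adjustment()` across model versions is one possible way to follow whether modifiers are getting smaller.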

The quality control and quality assurance chain

Note that this is only part of the QC/QA chain. The chain continues with an svn server for ORCHIDEE, commits that are backed up by a ticket, groups of tickets that are backed up by a milestone, milestones that come with trusting and reference simulations, simulations that are evaluated against a set of benchmarks, and all of this being properly tied together in publications (peer reviewed or wiki).

Last modified on 2020-03-31T12:13:17+02:00
