
Version 20 (modified by dgoll, 7 weeks ago)

Spin up with a Machine Learning approach

What is it about?

Aim: develop a spinup acceleration procedure that is independent of the model version. The idea is to develop a Python tool set which can be applied to the ORCHIDEE family of models.

How can I contribute to this effort?

Please contact D. Goll if you want to join. Some examples of contributions we would benefit from are:

  • data from conventional spinup simulations
  • expertise in linking the tool set to other tools, like libIGCM, ORCHIDAS, etc.
  • expertise in hosting, distributing and maintaining the software
  • expertise in machine learning and Python

Task force members

Daniel Goll, Yan Sun, Jinfeng Chang, Yilong Wang, Yuanyuan Huang, Vladislav Bastrikov, Nicolas Viovy, Matt McGrath

Status reports

26/01/2021

  • DONE: Proof of concept for ORCHIDEE-CNP v1.2
  • ONGOING: Finding a common setup for pixel selection applicable to all ORCHIDEE versions
  • ONGOING: Collecting data from other ORCHIDEE versions for testing
  • ONGOING: Translating matlab into python code
  • ONGOING: Cleaning the code
  • ONGOING: Recruiting task force members

16/02/2021

Yan gave a presentation on progress with the Python coding, results on CNP and trunk, and the timeline for the next 2 months.

  • Input files: restart + climate forcing (not the hist file, as ORCHIDEE might introduce noise)
  • K-means clustering: add a plot which shows the total distance vs k to monitor whether the number-of-clusters parameter is well chosen (part of the monitoring info for the user)
  • Add checks and quality statistics to monitor whether each step performs well, and stop the procedure if results fail minimum quality criteria (e.g. stop if machine learning fails to predict training pixels)
  • Externalize all parameters of the routines in one file.
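
A minimal sketch of the "total distance vs k" monitoring curve mentioned above, written in plain Python so it has no dependencies. It is not the task force's code: the function names (`kmeans_total_distance`, `elbow_curve`), the number of restarts and the synthetic points are assumptions made for illustration only.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_total_distance(points, k, n_iter=100, seed=0):
    """Run plain Lloyd's k-means; return the total squared distance to the nearest centroid."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct data points
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            clusters[nearest].append(p)
        # New centroid = mean of the members; an empty cluster keeps its old centroid.
        updated = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)


def elbow_curve(points, k_values, n_restarts=5):
    """Best total distance over a few restarts for each k (hypothetical helper)."""
    return {
        k: min(kmeans_total_distance(points, k, seed=s) for s in range(n_restarts))
        for k in k_values
    }


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic stand-in for pixel features: three groups of points.
    demo = [
        (cx + rng.uniform(-0.5, 0.5), cy + rng.uniform(-0.5, 0.5))
        for cx, cy in [(0, 0), (10, 10), (20, 0)]
        for _ in range(20)
    ]
    for k, total in elbow_curve(demo, [1, 2, 3, 4, 5]).items():
        print(k, round(total, 2))
```

Plotting the returned values against k gives the requested figure; a k beyond which the curve flattens is a reasonable candidate for the number of clusters.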

Work distribution:

  • Matt: provide trunk v4.0 data (EQ files + results from 200 yr after a start from scratch without analytical spinup)
  • Yilong: refine and extend the code of tools 1 and 2
  • Run tests with the refined tools for other forcings (everyone)
  • Yan will focus next month on her PhD defence (20 March)

03/03/2021

  • A first version of the Python tools is available for testing
  • Yilong gave an overview

Next steps:

  • put code and documentation on github (Daniel, Vlad, Yilong)
  • add documentation on how to run the tools; adapt them to other models (Yan, Yilong)
  • everyone attempts to run the tools with their own model data (keep a log on github of which model data was used)

Information/suggestions on running the tools:

  • user specification files: more information is needed, e.g. which file name corresponds to equilibrium information and which to information from the transient run (Yan)
  • things to improve: figure labelling, user spec file (simplify)
  • try to use qsub to avoid blocking nodes on obelix

16/03/2021

  • github has been set up and some initial tests and exchanges were done
  • next: everyone tries and tests the tool on the two available datasets (CNP, trunk); report bugs, improvements, etc. on github
  • ongoing: acquire data from other models (versions): CABLE, ORCHIDEE-MICT, ORCHIDEE-<any>
  • next meeting will be scheduled after discussion with Yan after her defence

01/04/2021

  • github code status: YY could run the code, DG did some tests modifying some inputs; all detected (minor) problems are listed as issues on github
  • TODO1 (Yan): provide information in the README on how to insert data from other simulations; separate the user specification files into experiment-specific files (e.g. path to model output, forcing period (for tool 3), etc.) and model-version-specific files (e.g. CNP, MICT, trunk, CABLE, etc.).
  • TODO2 (Yan): provide a tool 2 output which condenses the information from the currently multiple files into a single file.
  • TODO3 (Yan): work on the manuscript (incl. results from tests with other model versions (if feasible from TODO4) and CABLE)
  • TODO4 (YY, DG, all): test tools 1 and 2 when TODO1 and TODO2 are ready.
  • TODO5 (DG): discuss the running scripts with the project team.
  • TODO6 (Yan): code an evaluation tool (tool 3); check criteria are (1) high priority: total land C stock, (2) medium priority: land C stock per pixel, (3) others / drift over the forcing period (i.e. climate loop).
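
The check criteria listed for tool 3 could be computed along the following lines. This is only a sketch under stated assumptions: the function names, the area weighting of the total stock and the tolerance values (`total_tol`, `pixel_tol`) are hypothetical and not taken from the actual tool.

```python
def evaluate_stocks(predicted, reference, areas, total_tol=0.01, pixel_tol=0.10):
    """Compare ML-predicted and reference land C stocks.

    predicted, reference: per-pixel land C stock (same units, reference non-zero)
    areas: pixel areas used to weight the total stock
    total_tol, pixel_tol: hypothetical relative-error thresholds
    """
    # (1) high priority: relative error of the area-weighted total land C stock
    total_pred = sum(p * a for p, a in zip(predicted, areas))
    total_ref = sum(r * a for r, a in zip(reference, areas))
    total_rel_err = abs(total_pred - total_ref) / abs(total_ref)

    # (2) medium priority: relative error of the stock on each pixel
    pixel_rel_errs = [abs(p - r) / abs(r) for p, r in zip(predicted, reference)]
    n_bad = sum(1 for e in pixel_rel_errs if e > pixel_tol)

    return {
        "total_rel_err": total_rel_err,
        "total_ok": total_rel_err <= total_tol,
        "max_pixel_rel_err": max(pixel_rel_errs),
        "frac_pixels_above_tol": n_bad / len(pixel_rel_errs),
    }


def drift_per_year(yearly_totals):
    """(3) mean change per year of a stock over the forcing period (climate loop)."""
    return (yearly_totals[-1] - yearly_totals[0]) / (len(yearly_totals) - 1)


if __name__ == "__main__":
    report = evaluate_stocks([10.5, 20.0, 30.0], [10.0, 20.0, 40.0], [1.0, 1.0, 2.0])
    print(report)
    print(drift_per_year([100.0, 101.0, 102.0]))
```

Returning the numbers together with a pass/fail flag lets the calling script decide whether to stop the procedure when the minimum quality criteria are not met.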

14/04/2021

Progress since last meeting:

  • update of README
  • bug detected for biomass pool
  • evaluation tool for developers

To do

  • upload developments to github (e.g. README) (Yan)
  • produce evaluation tool to test if the ML works for training sites (Yan)
  • send the data location for CNP-MIMICS runs and MICT runs (Yan)
  • adapt the tool for MIMICS and MICT (Daniel + all) to test the tool structure and documentation
  • produce restart files for CABLE (Yan)
  • finalize the paper within 4 weeks
  • next meeting in 3 weeks due to Yan's move

05/05/2021

TODO:

  • revise the varlist.json to be more flexible regarding varying variables/dimensions in the restart files of ORCHIDEE versions (Vlad)
  • MICT restart file: which variables are needed and which are not? What do the dimensions stand for? (Jinfeng)
  • visualization of the quality of the training (Daniel)
  • CNP-MIMICS training data (Daniel)

19/05/2021

Progress since last meeting:

  • MICT: deepC_a, deepC_s, deepC_p are state variables. The variable carbon stores the depth-integrated SOC information and can be derived from the other three.
  • new json syntax for more flexibility proposed. NEXT: Yan and Yilong discuss the feasibility of introducing the concept
  • evaluation tool: LOOCV (optional for developers), quick check plots (mandatory for users). NEXT: finalize and upload to github
  • evaluation tools: different statistical variables to be tested; there is a tradeoff between user-friendliness and information content
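
For readers unfamiliar with LOOCV (leave-one-out cross-validation), the idea can be sketched as follows: each training pixel is held out in turn, the model is fitted on the remaining pixels, and the held-out pixel is predicted. The nearest-neighbour predictor used here is only a stand-in for the actual machine learning model, and all function names are hypothetical.

```python
import math


def nearest_neighbour_predict(train_x, train_y, x):
    """Stand-in model: return the target of the closest training sample."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    return train_y[best]


def loocv(features, targets, predict=nearest_neighbour_predict):
    """Predict every sample from a model trained on all the other samples."""
    predictions = []
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = targets[:i] + targets[i + 1:]
        predictions.append(predict(train_x, train_y, features[i]))
    return predictions


def rmse(predictions, targets):
    """Root-mean-square error, one possible summary statistic."""
    return math.sqrt(
        sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)
    )


if __name__ == "__main__":
    x = [(0.0,), (1.0,), (2.0,), (3.0,)]
    y = [0.0, 1.0, 2.0, 3.0]
    held_out = loocv(x, y)
    print(held_out, rmse(held_out, y))
```

Because the model is refitted once per pixel, the cost grows with the number of training pixels, which is consistent with keeping LOOCV optional and reserving the quick check plots for routine use.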

02/06/2021

  • Yan tested the statistic proposed by Philippe and finalized the visualizations of LOOCV
  • code will be uploaded to allow testing other ORCHIDEE versions
  • We discussed the suggestions by Vlad on generalized input reading; it seems feasible
  • The MICT model requires substantial work, but it seems worthwhile as we expect other model versions to inflate dimensions as well in the future; the work is not blocking, so we can move slowly for MICT

NEXT

  • Yilong will tackle the generalized input reading
  • Yan finalizes the paper and sends it to co-authors; submission to journal before August
  • Daniel will test the MIMICS version

07/07/2021

Work for the summer:

  • upload the LOOCV visualization code (YS), which then allows us to ...
  • ... test ORCHIDEE versions: MICT (JC), CNP (DG), CNP-MIMICS (DG), trunk 2.2 (YS), to get a good idea of how well the code works for ORCHIDEE versions
  • test CABLE (YY): for improving github code and documentation
  • finalize the paper (YS): redo the lost data, writing
  • Find a date to discuss with O-PRJ Team how to link with libIGCM and present the performance of the tool

Next meeting after summer break: 1 Sept, 10h00

01/09/2021

  • CNP, CNP-MIMICS, trunk 2.2, MICT run tasks 1-4 (for MICT a minor issue remains in task 4)
  • LOOCV code has been uploaded and tested for trunk 2.2
  • ML performance issues detected with MIMICS and MICT: for MIMICS we need better training data (no drift in pools), for MICT we might need to include climate information into the clustering (e.g. to make sure we sample well permafrost regions) or the climate forcing for ML training might be too short.
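
One possible way to include climate information in the clustering, sketched here under assumptions: the per-pixel model state variables and a few climate variables are each standardized and concatenated into one feature vector per pixel before clustering. The function names, the `climate_weight` parameter and the example values are hypothetical.

```python
import statistics


def zscore(column):
    """Standardize one variable across pixels (zero mean, unit variance)."""
    mean = statistics.fmean(column)
    sd = statistics.pstdev(column)
    return [(v - mean) / sd if sd > 0 else 0.0 for v in column]


def build_features(state_columns, climate_columns, climate_weight=1.0):
    """Return one feature tuple per pixel from state and climate variables.

    state_columns / climate_columns: lists of per-pixel value lists.
    climate_weight: hypothetical factor to give climate more or less
    influence on the distances used by the clustering.
    """
    columns = [zscore(c) for c in state_columns]
    columns += [[climate_weight * v for v in zscore(c)] for c in climate_columns]
    return [tuple(row) for row in zip(*columns)]


if __name__ == "__main__":
    soil_carbon = [1.0, 2.0, 3.0]          # illustrative state variable
    mean_temperature = [-10.0, 0.0, 10.0]  # illustrative climate variable
    for row in build_features([soil_carbon], [mean_temperature], climate_weight=2.0):
        print(row)
```

Standardizing first keeps variables with large numerical ranges from dominating the distances; whether this improves the sampling of permafrost regions would need to be tested on the MICT data.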

NEXT:

  • fix task 4 issue of MICT
  • test LOOCV for CNP (DG), CNP-MIMICS (DG), MICT
  • take up pace with paper writing (YS)
  • produce better MIMICS training data