
Version 14 (modified by dgoll, 4 years ago)

--

How to get started debugging and compile with debug options

Author: M. McGrath
Last revision: J. Ghattas 2020/02/28

You are running ORCHIDEE, just like every other day, when it stops for no apparent reason. You don't have output files from the simulation, and the run.card lists "Fatal". What can you do?

DSG_review: it would be good to indicate that one should first check whether something in libIGCM failed; you don't want people jumping into debugging if just an input file is missing

Compile in debug mode

In order to pin down exactly what the problem is, you can recompile ORCHIDEE with debug flags. These flags enable extra checks on code execution to identify unwanted behavior.

DSG_review: the following paragraph on background information should be (re)moved to somewhere else

Why aren't these checks enabled by default? Speed. If the compiler can make certain assumptions, the code runs much faster. For example, let's say we have a one-dimensional array with ten elements: ARRAY(1:10). The question "Are we correctly accessing this array?" is not difficult to answer: check that the element number we are trying to access is between 1 and 10. The Fortran standard requires every subscript to be within bounds (see Section 6.5.3 of the Fortran 2008 standard, for example: "The value of a subscript in an array element shall be within the bounds for its dimension."), but this is a requirement on the program, and compilers do not verify it unless asked to. Without bounds checking, if you have an array A(1:10,1:5) and you try to access A(11,1), the access is usually silently accepted, because the offset of that address is still within the memory allocated for the array (in Fortran's column-major storage it is the location of A(1,2)). With bounds checking enabled, the program instead stops with a runtime error at that access. In general, checks like this take time, in particular if you want to know exactly which line number is causing the crash.
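The point about the offset still lying inside the allocated memory can be illustrated with a small sketch. It is written in Python purely for illustration and is not part of ORCHIDEE; the function names column_major_offset and checked_offset are hypothetical. It assumes Fortran's column-major storage and 1-based subscripts.

```python
def column_major_offset(index, shape):
    """Return the 0-based linear offset of a 1-based, Fortran-style index.

    No bounds check is done, mimicking code compiled without debug flags.
    """
    offset = 0
    stride = 1
    for i, n in zip(index, shape):
        offset += (i - 1) * stride
        stride *= n
    return offset


def checked_offset(index, shape):
    """Same as column_major_offset, but verify every subscript first."""
    for dim, (i, n) in enumerate(zip(index, shape), start=1):
        if not 1 <= i <= n:
            raise IndexError(f"subscript {i} of dimension {dim} is outside 1:{n}")
    return column_major_offset(index, shape)


shape = (10, 5)  # stands for A(1:10,1:5)

# Without checks, A(11,1) and A(1,2) map to the same storage location.
print(column_major_offset((11, 1), shape))  # 10
print(column_major_offset((1, 2), shape))   # 10

# With checks, the same access is rejected before any memory is touched.
try:
    checked_offset((11, 1), shape)
except IndexError as err:
    print("bounds error:", err)
```

The extra comparison per subscript in checked_offset is the kind of work that debug flags add to every array access, which is why they slow the model down.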

DSG_review: in the following paragraph, it isn't clear what "the shortest time possible" refers to (maybe one could link to the other wiki pages which describe reducing nproc, etc.)

To turn on all these checks, we change the compiler flags. Adding these checks can make your code run 10 times slower, so after turning on these flags, the first step is often to find the conditions that reproduce your crash in the shortest time possible (reducing the number of processors, reducing the spatial domain or using restart files to start the simulation the day before the crash).

The page More about compile methods describes how to compile using debug options. In short, for newer configurations (such as ORCHIDEE_3 or LMDZOR_v6.2 and newer), compilation is done by a script compile_X.sh, and adding the argument -debug activates the debug options.

Other useful tips for debugging

  • If the problem is suspected to be related to reading in .nc files, you can set l_dbg = .TRUE. in errioipsl.f90 to get more information.
  • If the problem is suspected to be related to XIOS, for example the writing of output variables, you can make the following changes to iodef.xml to get more information printed out by XIOS.
      <variable id="info_level" type="int">100</variable>
      <variable id="print_file" type="bool">true</variable>
    
  • DSG_review: The next point is the most important point here and shouldn't be hidden here; I would also rephrase: If the problem is suspected to be due to orchidee_ol, ...

The PRINTLEV flag, which controls the amount of text output from ORCHIDEE, can be very useful in finding out which routine the code is crashing in. If you can't get a line number any other way, turn PRINTLEV up as high as possible, find the last line that was printed out, and then find the next line that should have been printed. The crash must be happening somewhere between those two lines.
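The bracketing idea can be sketched as a small helper. This is only an illustration, written in Python and not part of ORCHIDEE: the function name bracket_crash and the message strings are hypothetical. In practice the ordered list of expected messages would have to come from the source code or from the text output of a run that completed.

```python
def bracket_crash(log_lines, expected_messages):
    """Return (last_seen, next_expected) for a run that stopped early.

    log_lines: lines of the text output of the crashed run.
    expected_messages: messages in the order the code should print them.
    Returns (last_seen, None) if every expected message was found.
    """
    last_seen = None
    position = 0  # index in log_lines from which to keep searching
    for message in expected_messages:
        hit = next(
            (j for j in range(position, len(log_lines)) if message in log_lines[j]),
            None,
        )
        if hit is None:
            return last_seen, message
        last_seen = message
        position = hit + 1
    return last_seen, None


# Hypothetical output of a crashed run and the hypothetical expected sequence.
log = [
    "Entering routine_a",
    "Leaving routine_a",
    "Entering routine_b",
]
expected = [
    "Entering routine_a",
    "Leaving routine_a",
    "Entering routine_b",
    "Leaving routine_b",
]

print(bracket_crash(log, expected))
# ('Entering routine_b', 'Leaving routine_b')
```

The sketch walks through the expected sequence once, so for messages that repeat every time step, the list of expected messages would need to cover the steps up to the crash.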

  • And if you are up against a bug that seems to change every single time you run the code, even with all the above flags on, you might want to check out Valgrind.
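For the XIOS tip in the list above, the two settings can also be changed with a script instead of by hand, for example when preparing several debug runs. The following is a minimal sketch in Python, not an official tool. The function name set_xios_debug is hypothetical, and the layout of the sample file is an assumption made for illustration; only the two variable lines are taken from this page.

```python
import xml.etree.ElementTree as ET


def set_xios_debug(xml_text):
    """Return xml_text with info_level set to 100 and print_file set to true."""
    root = ET.fromstring(xml_text)
    wanted = {"info_level": "100", "print_file": "true"}
    for variable in root.iter("variable"):
        name = variable.get("id")
        if name in wanted:
            variable.text = wanted.pop(name)
    if wanted:
        raise KeyError(f"variable(s) not found: {sorted(wanted)}")
    return ET.tostring(root, encoding="unicode")


# Assumed, simplified file layout; a real iodef.xml contains more than this.
sample = """<simulation>
  <context id="xios">
    <variable_definition>
      <variable id="info_level" type="int">0</variable>
      <variable id="print_file" type="bool">false</variable>
    </variable_definition>
  </context>
</simulation>"""

print(set_xios_debug(sample))
```

Note that the default ElementTree parser drops XML comments, so for a heavily commented iodef.xml, editing the two lines by hand may be preferable.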