Changes between Version 13 and Version 14 of Documentation/UserGuide/flags


Timestamp:
2020-04-20T09:40:00+02:00 (4 years ago)
Author:
dgoll
Comment:

--

  • Documentation/UserGuide/flags

    v13 v14  
    66You are running ORCHIDEE, just like every other day, when it stops for no apparent reason.  You don't have output files from the simulation, and the run.card lists "Fatal".  What can you do? 
    77 
     8''DSG_review: it would be good to indicate that one should first check whether some libIGCM step failed; you don't want people jumping into debugging if just an input file is missing'' 
     9 
    810== Compile in debug mode == 
    911In order to pin down exactly what the problem is, you can recompile ORCHIDEE with debug flags. These flags enable extra checks on code execution to identify unwanted behavior. 
    1012 
     13''DSG_review: the following paragraph on background information should be (re)moved to somewhere else  '' 
     14 
    1115Why aren't these checks enabled by default?  Speed. If the compiler is allowed to make certain assumptions, things will go much faster. For example, let's say we have a one-dimensional array with ten elements: ARRAY(1:10).  The question "Are we correctly accessing this array?" is not too difficult to answer (check that the element number we are trying to access is between 1 and 10), and Fortran is actually ahead of some other languages here in requiring that each subscript be within bounds (see Section 6.5.3 of the Fortran 2008 standard, for example: "The value of a subscript in an array element shall be within the bounds for its dimension.").  With bounds checking enabled, Fortran should stop with a runtime error if you have an array A(1:10,1:5) and you try to access A(11,1); without the check, the access is typically accepted silently because the offset of that address is still within the memory allocated for the array.  In general, checks like this take time, in particular if you want to know exactly which line number is causing the crash.   
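The offset argument in the paragraph above can be made concrete with a little arithmetic. The following is a minimal sketch in Python, not ORCHIDEE or compiler code: the helper names `column_major_offset` and `in_bounds` are hypothetical, and the sketch assumes the usual column-major (first index varies fastest) storage used for Fortran arrays.

```python
def column_major_offset(subscripts, lower_bounds, extents):
    """Zero-based linear offset of an element in a column-major array.

    No bounds check is performed, which mimics a build without
    runtime bounds checking.
    """
    offset = 0
    stride = 1
    for sub, low, extent in zip(subscripts, lower_bounds, extents):
        offset += (sub - low) * stride
        stride *= extent
    return offset


def in_bounds(subscripts, lower_bounds, extents):
    """The per-dimension check that a debug build adds on every access."""
    return all(
        low <= sub < low + extent
        for sub, low, extent in zip(subscripts, lower_bounds, extents)
    )


# Array A(1:10, 1:5): lower bounds (1, 1), extents (10, 5), 50 elements.
lower = (1, 1)
extents = (10, 5)

print(column_major_offset((11, 1), lower, extents))  # 10, same slot as A(1,2)
print(column_major_offset((1, 2), lower, extents))   # 10
print(in_bounds((11, 1), lower, extents))            # False
```

A(11,1) lands on the same memory slot as A(1,2), which lies inside the 50 elements allocated for the array, so nothing flags the error unless the per-dimension check is actually executed.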
     16 
     17''DSG_review: in the following paragraph, it isn't clear what "the shortest time possible" refers to (maybe one could link to the other wiki pages which describe reducing nproc, etc.)'' 
    1218 
    1319To turn on all these checks, we change the compiler flags. Adding these checks can make your code run 10 times slower, so after turning on these flags, the first step is often to find the conditions that reproduce your crash in the shortest time possible (reducing the number of processors, reducing the spatial domain or using restart files to start the simulation the day before the crash). 
     
    1824== Other useful tips for debugging == 
    1925 
    20 * If the problem is related to reading in .nc files, you can change l_dbg = .TRUE. in errioipsl.f90 to get more information. 
     26* If the problem is ''DSG_review: suspected to be'' related to reading in .nc files, you can change l_dbg = .TRUE. in errioipsl.f90 to get more information. 
    2127 
    22 * If the problem is related to XIOS, for example writing of output variables, you can make the following changes to iodef.xml to get more information printed out from XIOS. 
     28* If the problem is ''DSG_review: suspected to be'' related to XIOS, for example writing of output variables, you can make the following changes to iodef.xml to get more information printed out from XIOS. 
    2329{{{ 
    2430  <variable id="info_level"                type="int">100</variable> 
     
    2733 
    2834 
    29 * The [http://forge.ipsl.jussieu.fr/orchidee/wiki/Documentation/UserGuide/Printlev PRINTLEV] flag, which controls the amount of text output from ORCHIDEE, can be very useful in finding out which routine the code is crashing in.  If you can't get a line number any other way, turn up the PRINTLEV as high as possible, check to see the last line printed out, and then check to see the next line which should be printed out.  The crash must be happening somewhere between those two lines.  
     35* ''DSG_review: The next point is the most important one and shouldn't be hidden here; I would also rephrase: If the problem is suspected to be due to orchidee_ol, ...'' 
     36The [http://forge.ipsl.jussieu.fr/orchidee/wiki/Documentation/UserGuide/Printlev PRINTLEV] flag, which controls the amount of text output from ORCHIDEE, can be very useful in finding out which routine the code is crashing in.  If you can't get a line number any other way, turn up the PRINTLEV as high as possible, check to see the last line printed out, and then check to see the next line which should be printed out.  The crash must be happening somewhere between those two lines.  
    3037 
    3138* And if you are up against a bug that seems to change every single time you run the code, even with all the above flags on, you might want to check out [https://forge.ipsl.jussieu.fr/orchidee/wiki/Documentation/UserGuide/valgrind Valgrind].
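
The PRINTLEV tip above amounts to bracketing the crash between the last message that was printed and the next one that should have been. A minimal sketch of that bookkeeping in Python, assuming you have already collected, by reading the source, an ordered list of the messages the run is expected to print. The function name `bracket_crash` and the message strings are hypothetical and are not real ORCHIDEE output.

```python
def bracket_crash(expected_messages, log_lines):
    """Return (last_seen, next_expected) for an ordered list of messages.

    last_seen is the latest expected message found anywhere in the log
    (None if none was found); next_expected is the message that should
    have followed it (None if the last expected message was printed).
    """
    last_index = -1
    for index, message in enumerate(expected_messages):
        if any(message in line for line in log_lines):
            last_index = index
    last_seen = expected_messages[last_index] if last_index >= 0 else None
    next_index = last_index + 1
    if next_index < len(expected_messages):
        next_expected = expected_messages[next_index]
    else:
        next_expected = None
    return last_seen, next_expected


# Hypothetical messages, in the order the code would print them.
expected = ["entering step_one", "entering step_two", "entering step_three"]
# Hypothetical tail of a text output file from a crashed run.
log = ["INFO entering step_one", "INFO entering step_two"]

print(bracket_crash(expected, log))
# ('entering step_two', 'entering step_three')
```

In this example the crash would have happened after the "step_two" message was written and before the "step_three" message was reached.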