Changes between Version 6 and Version 7 of Documentation/UserGuide/flags


Timestamp:
2019-03-28T10:50:19+01:00 (5 years ago)
Author:
mmcgrath
Comment:

In response to ticket #149

  • Documentation/UserGuide/flags

    v6 v7  
    11= How to prepare for debugging = 
    22 
    3 This page will be rewritten: see ticket #149. 
     3You are running ORCHIDEE, just like every other day, when it stops for no apparent reason.  You don't have output files from the simulation, and the run.card lists "Fatal".  What can you do? 
    44 
    5   
    6 Compilation flags let users change compiler's behaviour. For example, they can give you more information about an exception or it can apply more or less optimizations to the code. Be aware that each compiler has its own flags. 
     5In order to pin down exactly what the problem is, you can recompile ORCHIDEE with debug flags.  These flags enable extra checks on code execution to identify unwanted behavior. 
    76 
    8 == Intel == 
     7Why aren't these checks enabled by default?  Speed.  If the compiler is allowed to make certain assumptions, the code runs much faster.  For example, let's say we have a one-dimensional array with ten elements: ARRAY(1:10).  Checking that we are accessing this array correctly is not difficult (we only need to see whether the element number is between 1 and 10), and Fortran is actually ahead of other languages here in requiring that each subscript be within bounds (see Section 6.5.3 of the Fortran 2008 standard, for example: "The value of a subscript in an array element shall be within the bounds for its dimension.").  By that rule, a program that has an array A(1:10,1:5) and tries to access A(11,1) should be stopped with an error.  Without bounds checking, though, the access goes through silently, just as in other languages, because the offset of that address is still within the memory allocated for the array.  In general, checks like this take time, in particular if you want to know exactly which line number is causing the crash.   
    98 
    10 Inside ../../util/AA_make.gdef you can modify the default ORCHIDEE flags. These lines are for the obelix HPC at LSCE and are set up for the Intel Fortran compiler. 
     9To turn on all these checks, we change the compiler flags.  Adding these checks can make your code run 10 times slower, so after turning on these flags, the first step is often to find the conditions that reproduce your crash in the shortest time possible (reducing the number of processors, reducing the spatial domain or using restart files to start the simulation the day before the crash). 
     10 
     11With the new [https://forge.ipsl.jussieu.fr/orchidee/wiki/Documentation/UserGuide/CompileMethods FCM], adding debug flags is fairly straightforward.   
    1112 
    1213{{{ 
    13 ###-Q- lxiv8    F_O = -DCPP_PARA -O3 $(F_D) $(F_P) -I$(MODDIR) -module $(MODDIR) -fp-model precise 
     14cd modipsl/config/ORCHIDEE_OL 
     15vi AA_make                       => change -prod into -debug everywhere you see it (three locations?) 
     16../../util/ins_make 
     17gmake clean && gmake             # "gmake clean" only needs to be done if you have previously compiled with the "prod" line 
     18}}} 
     19 
     20While the lines are scrolling by, you can look for things like, "-DBZ_DEBUG -g -fno-inline" and "-fpe0 -O0 -g -traceback -fp-stack-check -ftrapuv -check bounds -check all -check noarg_temp_created" to reassure yourself that the debug flags are being used. 
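Instead of watching the lines scroll by, one option is to save the build output and search it afterward. The following is only a minimal sketch: the helper name check_debug_flags and the log file name build.log are hypothetical, and the flag list is copied from the flags quoted above (adjust it to whatever your compiler configuration actually uses).

{{{
#!sh
# check_debug_flags LOGFILE
# Report which of the expected debug flags appear in a saved build log.
# Returns 0 only if every flag in the list was found at least once.
# Note: "-g" is left out on purpose, since as a plain substring it would
# match far too many unrelated options.
check_debug_flags() {
    log="$1"
    missing=0
    for flag in "-O0" "-traceback" "-fpe0" "-check bounds"; do
        # -F: fixed string, -q: quiet, -e: pattern follows even though it starts with "-"
        if grep -F -q -e "$flag" "$log"; then
            echo "found:   $flag"
        else
            echo "MISSING: $flag"
            missing=1
        fi
    done
    return "$missing"
}
}}}

A possible way to use it is to compile with "gmake 2>&1 | tee build.log" and then run "check_debug_flags build.log".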
     21 
     22Note that this will NOT compile IOIPSL in debug mode.  Sometimes it is useful to have IOIPSL in debug mode as well.  This can be done by modifying lines in modipsl:util/AA_make.gdef. 
     23 
     24For example, the following lines are for the obelix machine at LSCE, using the Intel Fortran compiler.  The first line (with a single #) is the one used, while the second line (with multiple #s) is ignored. 
     25 
     26{{{ 
     27#-Q- lxiv8    F_O = -DCPP_PARA -O3 $(F_D) $(F_P) -I$(MODDIR) -module $(MODDIR) -fp-model precise 
     28#####-Q- lxiv8    F_O = -DCPP_PARA -p -g -traceback -fp-stack-check -ftrapuv -check bounds $(F_D) $(F_P) -I$(MODDIR) -module $(MODDIR) 
     29}}} 
     30 
     31The debug flags are in the second line.  In order to use debug flags with IOIPSL on obelix, the first line should be commented out (by adding ####) and the second should be uncommented (by removing # until there is only one at the beginning of the line), i.e. 
     32{{{ 
     33#####-Q- lxiv8    F_O = -DCPP_PARA -O3 $(F_D) $(F_P) -I$(MODDIR) -module $(MODDIR) -fp-model precise 
    1434#-Q- lxiv8    F_O = -DCPP_PARA -p -g -traceback -fp-stack-check -ftrapuv -check bounds $(F_D) $(F_P) -I$(MODDIR) -module $(MODDIR) 
    1535}}} 
    16 * First line is for production mode. In this case It is commented. 
    17 * Second line belongs to debug mode. 
    1836 
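The comment swap described above can also be scripted. This is a hedged sketch, not part of modipsl: the function name swap_to_debug is hypothetical, and it assumes the two lxiv8 F_O lines look like the ones shown above (the production line contains -O3, the debug line contains -traceback). It writes the modified text to standard output so that you can inspect it before replacing util/AA_make.gdef.

{{{
#!sh
# swap_to_debug FILE
# Print FILE with the lxiv8 production F_O line commented out (##### prefix)
# and the lxiv8 debug F_O line activated (single # prefix).
# All other lines are passed through unchanged.
swap_to_debug() {
    src="$1"
    sed -e '/^#-Q- lxiv8 .*F_O = .* -O3 /s/^#/#####/' \
        -e '/^#####-Q- lxiv8 .*F_O = .* -traceback /s/^#####/#/' \
        "$src"
}
}}}

For example, "swap_to_debug AA_make.gdef > AA_make.gdef.debug" produces a candidate file to compare with the original using diff. After putting the modified file in place, ins_make still has to be rerun.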
    19 In case you find non desired Nan in the code it is recommended to compile orchidee using the flag below. It triggers an exception instead of using Nan values. 
     37If unwanted NaN values show up in the code, it is recommended to compile ORCHIDEE using the flag below. It triggers an exception and stops execution instead of continuing with NaN values.  NaN means "Not a Number" and results from an invalid operation, such as dividing zero by zero or taking the square root of a negative number. 
    2038 
    21 * -fpe0:  floating point error related with divided by zero. [http://fulla.fnal.gov/intel/compiler_f/main_for/fpops/fortran/fpops_fpew_f.htm More info] 
     39* -fpe0:  [http://fulla.fnal.gov/intel/compiler_f/main_for/fpops/fortran/fpops_fpew_f.htm More info] 
    2240 
    23 {{{ 
    24 #-Q- lxiv8    F_O = -DCPP_PARA -p -g -fpe0 -traceback -fp-stack-check -ftrapuv -check bounds $(F_D) $(F_P) -I$(MODDIR) -module $(MODDIR) 
    25 }}} 
     41Note: if you modify this file, you have to run ins_make again so that the changes are propagated to all of ORCHIDEE's folders. Then you must recompile from scratch (using the "gmake clean && gmake" above). 
    2642 
    27 Note: if you modify this file you have to apply ins_make again. This way it spreads the changes to all Orchidee's folders. Then recompile from zero. 
    28  
    29 Sometimes, when you get an error that you can't track down, I've found it helps to use the following flags (on obelix).  It wasn't trivial to get XIOS, IOIPSL, and ORCHIDEE to all compile with the same flags, and perhaps this is not the best way to do it, but it works.  I noticed this was necessary to catch a memory error one time: the error was in ORCHIDEE, but it was showing up as a crash in XIOS, until I compiled everything with full debug flags...then it pointed straight to the line number in ORCHIDEE that was writing out of bounds. 
     43With the above, you should be able to get XIOS, IOIPSL, and ORCHIDEE to all compile with the same flags.  I noticed this was necessary to catch a memory error one time: the error was in ORCHIDEE, but it was showing up as a crash in XIOS, until I compiled everything with full debug flags...then it pointed straight to the line number in ORCHIDEE that was writing out of bounds. 
    3044 
    3145For XIOS, 
     
    3953}}} 
    4054 
    41 For IOIPSL, the ``ins_make -d'' command doesn't seem to have any effect.  I have to comment out the debug line in util/AA_make.gdef 
    42  
    43 {{{ 
    44   #-Q- lxiv8    F_O = -DCPP_PARA --i4 -r8 -fp-model precise -fpe0 -O0 -g -traceback -fp-stack-check 
    45         -ftrapuv -check bounds -check all -check noarg_temp_created -I$(MODDIR) -module $(MODDIR) 
    46 }}} 
    47  
    48 and then redo ins_make.  It is possible that if this line starts with F_D, using ``ins_make -d'' will trigger it. 
     55= Other useful tips for debugging = 
    4956 
    5057You can set l_dbg = .TRUE. in errioipsl.f90 to get more information printed out about reading in .nc files. 
     
    5764}}} 
    5865 
    59 Make sure to do a full gmake clean, and recompile, checking to see that the flags flashing by on the screen are the same as the ones above. 
     66After running libIGCM/ins_job to submit a job with libIGCM, the Job_ file is created.  This file has a setting for Verbosity.  It's good to make sure that is as high as possible (3 is the maximum value, currently) if you are running into issues that seem to be related to libIGCM. 
    6067 
    61 Don't forget to mention something about increasing libIGCM verbosity in the Job file to get more information, which can help track down unexpected behavior when using libIGCM. 
     68The [https://forge.ipsl.jussieu.fr/orchidee/wiki/Documentation/UserGuide/Printlev printlev] flags can be very useful in finding out which routine the code is crashing in.  If you can't get a line number any other way, turn up the printlev as high as possible, check to see the last line printed out, and then check to see the next line which should be printed out.  The crash must be happening somewhere between those two lines.  
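To find the last line printed out, a small helper can save some scrolling, especially when each process writes its own text output file. This is only a sketch: the function name last_lines is hypothetical, and the file names you pass to it depend on your setup.

{{{
#!sh
# last_lines FILE...
# Print the last non-empty line of each text output file, prefixed by the
# file name. Arguments that are not regular files are skipped.
last_lines() {
    for f in "$@"; do
        [ -f "$f" ] || continue
        # grep -v drops blank lines; tail -n 1 keeps the final remaining line
        printf '%s: %s\n' "$f" "$(grep -v '^[[:space:]]*$' "$f" | tail -n 1)"
    done
}
}}}

Once you have the last message, searching the source for it (for example with "grep -rn") shows where it is printed, which is the starting point for locating the next message that should have appeared.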
     69 
     70And if you are up against a bug that seems to change every single time you run the code, even with all the above flags on, you might want to check out [https://forge.ipsl.jussieu.fr/orchidee/wiki/Documentation/UserGuide/valgrind Valgrind].