Opened 5 months ago

Closed 5 months ago

#756 closed defect (fixed)

rhLitter in XIOS

Reported by: mmcgrath
Owned by: somebody
Priority: minor
Milestone: ORCHIDEE 4.3
Component: Anthropogenic processes
Version:
Keywords:
Cc:

Description

r6995 of the Trunk crashes in XIOS on the first day with a long error message. The run was on Irene, compiled in debug mode, using 2 CPUs (one for ORCHIDEE, one for XIOS). The configuration was copied from the svn version of SPINUP_ANALYTIC_FG1 and modified to run for a single pixel:

LIMIT_WEST=8
LIMIT_NORTH=48
LIMIT_SOUTH=46
LIMIT_EAST=10

[irene1275:58308:0] Caught signal 8 (Floating point exception)

backtrace

2 0x000000000006bc9c mxm_handle_error() /var/tmp/OFED_topdir/BUILD/mxm-3.7.3112/src/mxm/util/debug/debug.c:641
3 0x000000000006c1ec mxm_error_signal_handler() /var/tmp/OFED_topdir/BUILD/mxm-3.7.3112/src/mxm/util/debug/debug.c:616
4 0x00000000000363f0 killpg() ??:0
5 0x00000000040bca78 _ZN5blitz6DivideIddE5applyEdd() mmcgrath/TRUNK.HEAD/modeles/XIOS/extern/blitz/blitz/ops.h:147
6 0x000000000570ed75 _ZN5blitz21_bz_ArrayExprBinaryOpINS_13_bz_ArrayExprINS_17FastArrayIteratorIdLi1EEEEES4_NS_6DivideIddEEE10readHelperIdE5derefERKS4_SB_() mmcgrath/TRUNK.HEAD/modeles/XIOS/extern/blitz/blitz/array/expr.h:846
7 0x000000000570e27b _ZNK5blitz21_bz_ArrayExprBinaryOpINS_13_bz_ArrayExprINS_17FastArrayIteratorIdLi1EEEEES4_NS_6DivideIddEEEdeEv() mcgrathm/TRUNK.HEAD/modeles/XIOS/extern/blitz/blitz/array/expr.h:916
8 0x000000000570edd4 _ZNK5blitz13_bz_ArrayExprINS_21_bz_ArrayExprBinaryOpINS0_INS_17FastArrayIteratorIdLi1EEEEES4_NS_6DivideIddEEEEEdeEv() mcgrathm/TRUNK.HEAD/modeles/XIOS/extern/blitz/blitz/array/expr.h:185
9 0x00000000056acfba _ZN5blitz5ArrayIdLi1EEaSINS_13_bz_ArrayExprINS_21_bz_ArrayExprBinaryOpINS3_INS_17FastArrayIteratorIdLi1EEEEES7_NS_6DivideIddEEEEEEEERS1_RKNS_6ETBaseIT_EE() mcgrathm/TRUNK.HEAD/modeles/XIOS/extern/blitz/blitz/globeval.cc:579
10 0x00000000056769de _ZN5blitz5ArrayIdLi1EEC1INS_21_bz_ArrayExprBinaryOpINS_13_bz_ArrayExprINS_17FastArrayIteratorIdLi1EEEEES7_NS_6DivideIddEEEEEENS4_IT_EE() mcgrathm/TRUNK.HEAD/modeles/XIOS/extern/blitz/blitz/array/methods.cc:89
11 0x000000000566a04f _ZN4xios13COperatorExpr6div_ffERKNS_6CArrayIdLi1EEES4_() mcgrathm/TRUNK.HEAD/modeles/XIOS/src/parse_expr/operator_expr.hpp:294
12 0x000000000562dce0 _ZN4xios27CFieldFieldArithmeticFilter5applyESt6vectorISt10shared_ptrINS_11CDataPacketEESaIS4_EE() mcgrathm/TRUNK.HEAD/modeles/XIOS/src/filter/binary_arithmetic_filter.cpp:290
13 0x000000000550c7f5 _ZN4xios7CFilter12onInputReadyESt6vectorISt10shared_ptrINS_11CDataPacketEESaIS4_EE() mcgrathm/TRUNK.HEAD/modeles/XIOS/src/filter/filter.cpp:16
14 0x0000000004ce4a17 _ZN4xios9CInputPin8setInputEmSt10shared_ptrINS_11CDataPacketEE() mcgrathm/TRUNK.HEAD/modeles/XIOS/src/filter/input_pin.cpp:43
15 0x00000000051ab01f _ZN4xios10COutputPin12deliverOuputESt10shared_ptrINS_11CDataPacketEE() mcgrathm/TRUNK.HEAD/modeles/XIOS/src/filter/output_pin.cpp:52
16 0x00000000051aa3b3 _ZN4xios10COutputPin13onOutputReadyESt10shared_ptrINS_11CDataPacketEE() mcgrathm/TRUNK.HEAD/modeles/XIOS/src/filter/output_pin.cpp:41
17 0x000000000550c88e _ZN4xios7CFilter12onInputReadyESt6vectorISt10shared_ptrINS_11CDataPacketEESaIS4_EE() mcgrathm/TRUNK.HEAD/modeles/XIOS/src/filter/filter.cpp:18
18 0x0000000004ce4a17 _ZN4xios9CInputPin8setInputEmSt10shared_ptrINS_11CDataPacketEE() mcgrathm/TRUNK.HEAD/modeles/XIOS/src/filter/input_pin.cpp:43
19 0x00000000051ab01f _ZN4xios10COutputPin12deliverOuputESt10shared_ptrINS_11CDataPacketEE() mcgrathm/TRUNK.HEAD/modeles/XIOS/src/filter/output_pin.cpp:52
20 0x00000000051aa3b3 _ZN4xios10COutputPin13onOutputReadyESt10shared_ptrINS_11CDataPacketEE() mcgrathm/TRUNK.HEAD/modeles/XIOS/src/filter/output_pin.cpp:41
21 0x000000000550c88e _ZN4xios7CFilter12onInputReadyESt6vectorISt10shared_ptrINS_11CDataPacketEESaIS4_EE() mcgrathm/TRUNK.HEAD/modeles/XIOS/src/filter/filter.cpp:18
22 0x0000000004ce4a17 _ZN4xios9CInputPin8setInputEmSt10shared_ptrINS_11CDataPacketEE() mcgrathm/TRUNK.HEAD/modeles/XIOS/src/filter/input_pin.cpp:43
23 0x00000000051ab01f _ZN4xios10COutputPin12deliverOuputESt10shared_ptrINS_11CDataPacketEE() mcgrathm/TRUNK.HEAD/modeles/XIOS/src/filter/output_pin.cpp:52
24 0x00000000051aa3b3 _ZN4xios10COutputPin13onOutputReadyESt10shared_ptrINS_11CDataPacketEE() mcgrathm/TRUNK.HEAD/modeles/XIOS/src/filter/output_pin.cpp:41
25 0x0000000005261d86 _ZN4xios13CSourceFilter10streamDataILi1EEEvNS_5CDateERKNS_6CArrayIdXT_EEE() mcgrathm/TRUNK.HEAD/modeles/XIOS/src/filter/source_filter.cpp:91
26 0x00000000040c6174 _ZN4xios6CField7setDataILi1EEEvRKNS_6CArrayIdXT_EEE() mcgrathm/TRUNK.HEAD/modeles/XIOS/src/node/field_impl.hpp:24
27 0x000000000491d6f9 cxios_write_data_k81() mcgrathm/TRUNK.HEAD/modeles/XIOS/src/interface/c/icdata.cpp:445
28 0x00000000034c6c15 idata_mp_xios_send_field_r8_1d_() mcgrathm/TRUNK.HEAD/modeles/XIOS/ppsrc/xios/interface/fortran/idata.f90:461
29 0x0000000001b18db0 xios_orchidee_mp_xios_orchidee_send_field_r1d_() mcgrathm/TRUNK.HEAD/modeles/ORCHIDEE/build/ppsrc/parallel/xios_orchidee.f90:846
30 0x0000000001132d0f stomate_lpj_mp_stomate_lpj_vegetation_() /ccc/workmcgrathm/TRUNK.HEAD/modeles/ORCHIDEE/build/ppsrc/stomate/stomate_lpj.f90:3039
31 0x0000000000bb2912 stomate_mp_stomate_main_() modeles/ORCHIDEE/build/ppsrc/stomate/stomate.f90:2991
32 0x00000000009ed5f2 slowproc_mp_slowproc_main_() mcgrathm/TRUNK.HEAD/modeles/ORCHIDEE/build/ppsrc/sechiba/slowproc.f90:936
33 0x0000000000932c7d sechiba_mp_sechiba_main_() mcgrathm/TRUNK.HEAD/modeles/ORCHIDEE/build/ppsrc/sechiba/sechiba.f90:1212
34 0x00000000005b26dc intersurf_mp_intersurf_main_2d_() mcgrathm/TRUNK.HEAD/modeles/ORCHIDEE/build/ppsrc/sechiba/intersurf.f90:584
35 0x0000000000511ef1 MAIN() mcgrathm/TRUNK.HEAD/modeles/ORCHIDEE/build/ppsrc/orchidee_ol/dim2_driver.f90:1285
36 0x000000000044d1ce main() ??:0
37 0x0000000000022545 __libc_start_main() ??:0
38 0x000000000044d0e9 _start() ??:0
===================
./run_file_fuv0yr: line 6: 58308: Floating exception
srun: error: irene1275: task 0: Floating point exception
srun: Terminating job step 6558918.0

This corresponds to a line in stomate_lpj.f90:

CALL xios_orchidee_send_field("rhLitter",SUM(resp_hetero_litter*veget_max_hist(:,:,iage),dim=2)/1e3/one_day)

Change History (4)

comment:1 Changed 5 months ago by mmcgrath

Printing out the values passed to XIOS reveals nothing unusual.

write(numout,*) "jifoew litter 2a ",resp_hetero_litter(:,:)
write(numout,*) "jifoew litter 2b ",veget_max_hist(:,:,iage)
write(numout,*) "jifoew litter 2c ",iage,one_day
write(numout,*) "jifoew litter 2d ",SUM(resp_hetero_litter*veget_max_hist(:,:,iage),dim=2)/1e3/one_day

gives

jifoew litter 2a 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000

jifoew litter 2b 7.542555323038974E-002 0.000000000000000E+000
0.000000000000000E+000 2.126679958694316E-002 1.118539327006717E-002
1.613134121093276E-002 5.223430839837326E-002 1.756854725394173E-002
5.361974694990012E-003 0.113672534159015 2.058946976818429E-003
0.372207578776323 6.383289201626642E-002 0.000000000000000E+000
0.249054130425939

jifoew litter 2c 5 86400.0000000000
jifoew litter 2d 0.000000000000000E+000

The line immediately before it, which sends the XIOS variable rh in a very similar way, causes no problems.

The problem happens even if the hist_level for writing the variable rhLitter is set to 10 (i.e., the variable is not written out).

I confirmed that both XIOS and ORCHIDEE are compiled in debug mode by looking through the compilation output.
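A note on why harmless-looking zeros can still end in signal 8: the backtrace above stops in blitz's Divide, and an IEEE floating-point division by zero normally only sets a status flag and returns inf or NaN. It becomes a fatal signal only when floating-point traps are enabled, which is presumably what the debug build does here (an assumption, the compile flags are not shown in this ticket). The C++ sketch below is not XIOS code; flags_raised_by_division is a hypothetical helper that only illustrates the flag behaviour with traps left masked.

```cpp
#include <cfenv>

// Divide a by b and return the IEEE exception flags (restricted to
// FE_DIVBYZERO and FE_INVALID) that the division raised.
// The volatile locals stop the compiler from folding the division away.
// Hypothetical helper for illustration; not part of XIOS or blitz.
int flags_raised_by_division(double a, double b) {
    volatile double num = a;
    volatile double den = b;
    std::feclearexcept(FE_ALL_EXCEPT);     // start from a clean flag state
    volatile double quotient = num / den;  // traps are masked by default: no signal
    (void)quotient;
    return std::fetestexcept(FE_DIVBYZERO | FE_INVALID);
}
```

With traps masked, a nonzero value divided by zero sets FE_DIVBYZERO and 0.0/0.0 sets FE_INVALID. A build that unmasks these exceptions would turn either case into a floating point exception like the one reported.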

comment:2 Changed 5 months ago by mmcgrath

I have also placed a write statement immediately after the call to XIOS. Its output never appears, confirming that the code stops in the XIOS call above.

comment:3 Changed 5 months ago by mmcgrath

If the xios_orchidee_send_field call for rhLitter is commented out, the code runs without stopping for the full year.

comment:4 Changed 5 months ago by luyssaert

  • Resolution set to fixed
  • Status changed from new to closed

The cause was a division by zero in the XIOS field definition. The operation has been taken out of XIOS and moved into ORCHIDEE, where the division by zero is now handled. The changes have been committed in r7057.
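The ticket does not show how r7057 handles the zero denominator, so the following is only a minimal sketch of one common approach: test the denominator before dividing, so the division is never executed for zero entries. It is written in C++ for illustration (the actual change is in ORCHIDEE's Fortran); safe_divide and its fallback value of 0.0 are assumptions, not names or choices taken from the commit.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Element-wise num/den that never executes a division by zero.
// Where den[i] == 0, the result keeps `fallback` instead.
// Hypothetical helper; the fallback of 0.0 is an assumption.
std::vector<double> safe_divide(const std::vector<double>& num,
                                const std::vector<double>& den,
                                double fallback = 0.0) {
    if (num.size() != den.size()) {
        throw std::invalid_argument("safe_divide: size mismatch");
    }
    std::vector<double> out(num.size(), fallback);
    for (std::size_t i = 0; i < num.size(); ++i) {
        if (den[i] != 0.0) {
            out[i] = num[i] / den[i];  // only reached for nonzero denominators
        }
    }
    return out;
}
```

Because the guard is checked before the division, this stays safe even in a build that traps floating-point exceptions. Whether 0.0 or a missing-value marker is the right fallback depends on how the diagnostic is meant to be read.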
