Opened 9 months ago

Closed 4 months ago

#723 closed defect (fixed)

XIOS crash writing fDeforestToProduct with land use change

Reported by: mmcgrath
Owned by: luyssaert
Priority: minor
Milestone: ORCHIDEE 4.1
Component: Anthropogenic processes
Version:
Keywords:
Cc:

Description

Running a global FG2 simulation on Irene (127 CPUs, no restart file) with an optimized executable for r6870 and the following differences from svn r6869 of OOL_SEC_STO_FG2:

*****************************
Running diff for orchidee.def.
16,18c16
< USE_RESERVE_N = y
< OK_DYNROOT_HA = y
< HACK_E_FRAC = y
---
> 
191,192c189
< OK_READ_FM_MAP=n
< FOREST_MANAGED_FORCED=1
---
> OK_READ_FM_MAP=y

I get a crash in year 91.

In file "nc4_data_output.cpp", function "void xios::CNc4DataOutput::writeFieldData_(xios::CField *)",  line 2669 -> On writing field data: fDeforestToProduct
In the context : orchidee_server
Error when calling function ncPutVaraType(ncid, varId, start, count, data)
NetCDF: Numeric conversion not representable
Unable to write data given the location id: 196608 and the variable whose id: 99 and name: fDeforestToProduct


(1) **************** void cxios_init_server()

(2) **************** bool xios::CContext::checkBuffersAndListen(bool)
Object id="orchidee_server" object type="context"
*** XIOS attributes as defined in XML file(s) or via Fortran interface:
[]
*** Additional information:
[enabled files="sechiba1 sechiba3 stomate1 stomate2 stomate3 "]

(3) **************** static bool xios::CField::dispatchEvent(xios::CEventServer &)

(4) **************** static void xios::CField::recvUpdateData(xios::CEventServer &)

(5) **************** void xios::CField::recvUpdateData(std::map<int, xios::CBufferIn *, std::less<int>, std::allocator<st...)
Object id="__field_undef_id_736" object type="field"
*** XIOS attributes as defined in XML file(s) or via Fortran interface:
[compression_level="2" default_value="9.96921e+36" detect_missing_value="true" enabled="true" field_ref="fDeforestToProduct" freq_offset="0ts" freq_op="1ts" grid_ref="grid_landpoints_out" level="5" long_name="Decomposition out of product pools to CO2 (positive from land to atm)" ]
*** Additional information:
[]

(6) **************** void xios::CField::setData(const xios::CArray<double, N> &) [with int N = 1]
Object id="__field_undef_id_736" object type="field"
*** XIOS attributes as defined in XML file(s) or via Fortran interface:
[compression_level="2" default_value="9.96921e+36" detect_missing_value="true" enabled="true" field_ref="fDeforestToProduct" freq_offset="0ts" freq_op="1ts" grid_ref="grid_landpoints_out" level="5" long_name="Decomposition out of product pools to CO2 (positive from land to atm)" ]
*** Additional information:
[]

(7) **************** void xios::CField::writeUpdateData(const xios::CArray<double, 1> &)
Object id="__field_undef_id_736" object type="field"
*** XIOS attributes as defined in XML file(s) or via Fortran interface:
[compression_level="2" default_value="9.96921e+36" detect_missing_value="true" enabled="true" field_ref="fDeforestToProduct" freq_offset="0ts" freq_op="1ts" grid_ref="grid_landpoints_out" level="5" long_name="Decomposition out of product pools to CO2 (positive from land to atm)" ]
*** Additional information:
[]

(8) **************** void xios::CField::writeField()
Object id="__field_undef_id_736" object type="field"
*** XIOS attributes as defined in XML file(s) or via Fortran interface:
[compression_level="2" default_value="9.96921e+36" detect_missing_value="true" enabled="true" field_ref="fDeforestToProduct" freq_offset="0ts" freq_op="1ts" grid_ref="grid_landpoints_out" level="5" long_name="Decomposition out of product pools to CO2 (positive from land to atm)" ]
*** Additional information:
[]

(9) **************** void xios::CDataOutput::writeFieldData(xios::CField *)

      File                                    Function                                                                                                  Line
(9)   data_output.cpp                         void xios::CDataOutput::writeFieldData(xios::CField *)                                                    125
(8)   field.cpp                               void xios::CField::writeField()                                                                           307
(7)   field.cpp                               void xios::CField::writeUpdateData(const xios::CArray<double, 1> &)                                       277
(6)   field_impl.hpp                          void xios::CField::setData(const xios::CArray<double, N> &) [with int N = 1]                              19
(5)   field.cpp                               void xios::CField::recvUpdateData(std::map<int, xios::CBufferIn *, std::less<int>, std::allocator<st...)  233
(4)   field.cpp                               static void xios::CField::recvUpdateData(xios::CEventServer &)                                            213
(3)   field.cpp                               static bool xios::CField::dispatchEvent(xios::CEventServer &)                                             112
(2)   context.cpp                             bool xios::CContext::checkBuffersAndListen(bool)                                                          423
(1)   icdata.cpp                              void cxios_init_server()                                                                                  52
terminate called after throwing an instance of 'xios::CException'
forrtl: error (76): Abort trap signal
Image              PC                Routine            Line        Source
xios.x             000000000104CDFA  for__signal_handl     Unknown  Unknown
libpthread-2.17.s  00002B5A4B370630  Unknown               Unknown  Unknown
libc-2.17.so       00002B5A4B5B3377  gsignal               Unknown  Unknown
libc-2.17.so       00002B5A4B5B4A68  abort                 Unknown  Unknown
libstdc++.so.6.0.  00002B5A4835F015  _ZN9__gnu_cxx27__     Unknown  Unknown
libstdc++.so.6.0.  00002B5A4835CDE6  Unknown               Unknown  Unknown
libstdc++.so.6.0.  00002B5A4835CE31  Unknown               Unknown  Unknown
libstdc++.so.6.0.  00002B5A4835D0C9  __cxa_rethrow         Unknown  Unknown
xios.x             00000000004893D4  Unknown               Unknown  Unknown
xios.x             0000000000446B19  Unknown               Unknown  Unknown
xios.x             000000000107C022  Unknown               Unknown  Unknown
libc-2.17.so       00002B5A4B59F545  __libc_start_main     Unknown  Unknown
xios.x             0000000000446A29  Unknown               Unknown  Unknown
/ccc/scratch/cont003/drf/mcgrathm/RUN_DIR/5307486_15407/FG2.fDeforest.r6870.15407/./run_file_eWoBNv: line 768: 3090: Abort
srun: error: irene1424: task 127: Aborted
srun: Terminating job step 5307486.90
slurmstepd-irene1421: error: *** STEP 5307486.90 ON irene1421 CANCELLED AT 2020-09-12T04:43:59 ***

The first task is to create a smaller reproducible test case. The files are on Irene under mcgrathm/IGCM_OUT/OL2/TEST/test/FG2.fDeforest.r6870/.
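"NetCDF: Numeric conversion not representable" is the netCDF message for NC_ERANGE, which is returned when a value cannot be represented in the type it is being converted to. One way to narrow the test case would be to dump fDeforestToProduct just before it is sent to XIOS and look for values that do not fit in single precision. The sketch below is a hypothetical helper, not part of ORCHIDEE or XIOS. It assumes the field is written to the file as 32-bit floats, which the log does not confirm, and it also flags NaN and infinite values because they would point to the same kind of upstream problem.

```python
import math
import struct

# Largest finite IEEE-754 single-precision value (bit pattern 0x7f7fffff).
FLOAT32_MAX = struct.unpack(">f", bytes.fromhex("7f7fffff"))[0]


def unrepresentable_as_float32(values):
    """Return (index, value) pairs that cannot be stored as a finite float32.

    Hypothetical diagnostic helper: `values` is any iterable of Python floats,
    for example one time step of fDeforestToProduct dumped from the model.
    """
    bad = []
    for i, v in enumerate(values):
        if math.isnan(v) or math.isinf(v) or abs(v) > FLOAT32_MAX:
            bad.append((i, v))
    return bad


if __name__ == "__main__":
    # 9.96921e36 is the default_value shown in the log; it fits in float32.
    sample = [0.0, 300.0, 9.96921e36, 1e39, float("inf")]
    for index, value in unrepresentable_as_float32(sample):
        print(f"index {index}: {value!r} does not fit in float32")
```

If such a check reports nothing for the time step that crashes, the float32 assumption is probably wrong and the conversion problem lies elsewhere.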

Change History (4)

comment:1 Changed 9 months ago by mmcgrath

  • Priority changed from blocker to major

It is unclear whether this is still an issue in r6874: I was able to run a full FG1, FG1trans, and FG2 with no crash, even though the recent commits do not seem to address this directly. Possibly a memory bug that is getting moved around? It is no longer a blocker, at any rate. I am not sure how to rate it, so I am setting it to "major".

comment:2 Changed 8 months ago by luyssaert

  • Priority changed from major to minor

comment:3 Changed 4 months ago by luyssaert

  • Owner changed from somebody to luyssaert
  • Status changed from new to assigned

Reran this setup with r7089. Compiled in production mode, split the world into 4 quarters that overlap each other by 2 degrees (-180,2,-2,90; -2,180,2,-90;...), and used the FG2 configuration (changed to OK_READ_FM_MAP=y). The setup did not crash. The history files contained values (up to 300 g m-2 y-1).
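The four overlapping windows described above can be sketched as follows. This is a hypothetical helper, not ORCHIDEE code. The tuple order (west, east, south, north) and the choice to extend each quarter 2 degrees past the equator and the prime meridian are assumptions, since only two of the four boxes are listed here and their latitude ordering differs.

```python
def overlapping_quadrants(overlap_deg=2.0):
    """Return four (west, east, south, north) boxes covering the globe.

    Each box extends `overlap_deg` degrees past the prime meridian and the
    equator, so neighbouring boxes share a strip of grid cells.
    """
    west_half = (-180.0, overlap_deg)
    east_half = (-overlap_deg, 180.0)
    south_half = (-90.0, overlap_deg)
    north_half = (-overlap_deg, 90.0)
    return [
        (lon[0], lon[1], lat[0], lat[1])
        for lat in (north_half, south_half)
        for lon in (west_half, east_half)
    ]


if __name__ == "__main__":
    for box in overlapping_quadrants():
        print(box)
```

Running each window as a separate simulation keeps every test small while still covering all land points, at the cost of simulating the overlap strips more than once.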

comment:4 Changed 4 months ago by luyssaert

  • Resolution set to fixed
  • Status changed from assigned to closed