Version 2 (modified by luyssaert, 9 years ago)

How to find where the model is hanging

You launch the model in a parallel run, and you know from previous runs that it should take about 600 seconds. After 1200 seconds the model is still running. That looks suspicious! A likely cause is that one processor is hanging, which prevents the model from crashing properly. Here is some advice:

Check whether the model really hangs

Open the Script file and search for Cd. You should find a path that looks like /ccc/scratch/cont003/dsm/p529grat/RUN_DIR/2440877_122089/ACLFb.122089. Go to that folder and check which files were changed most recently and when. The time of the last changes should tell you whether the model is really hanging or whether you are just too impatient.

Allow the model to crash properly

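!++++++++TEMP+++++++++
! Temporary checkpoint: every process writes a marker, then waits at the barrier.
WRITE(numout,*) "This should be the last sentence in all CPUS!"
CALL MPI_BARRIER(MPI_COMM_ORCH,ierr)
! Level 3 call: the run should stop here if every process got this far. Remove after debugging.
CALL ipslerr_p (3,'forestry', 'Seeing if we reach this point...remove!','','')
!+++++++++++++++++++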

Did you follow the "coding guidelines"? If not, it is time to do so! Check the coding guideline on using CALL ipslerr() instead of STOP, and replace all your STOP statements with a CALL to ipslerr(). Don't be lazy now: add proper information, or else ipslerr may do its job but you still won't know where the model crashed.
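
Here is a minimal sketch of what replacing a STOP could look like. It assumes the five-argument call form used by the ipslerr_p call earlier on this page (a level, the routine name, then three message strings), and it assumes that level 3 is the fatal level. The routine name check_biomass, the variable biomass and the message texts are hypothetical. The small ipslerr stub is only there so the example compiles on its own; it is not the real implementation, so in the model you would call the library routine instead.

```fortran
! Sketch only. The module below is a hypothetical stand-in for the real
! ipslerr so that this example compiles on its own.
MODULE ipslerr_stub
  IMPLICIT NONE
CONTAINS
  SUBROUTINE ipslerr(plev, pcname, pstr1, pstr2, pstr3)
    INTEGER, INTENT(in) :: plev                          ! severity level
    CHARACTER(LEN=*), INTENT(in) :: pcname               ! routine reporting the error
    CHARACTER(LEN=*), INTENT(in) :: pstr1, pstr2, pstr3  ! message lines
    WRITE(*,'(A,I0,A,A)') 'Level ', plev, ' message from ', TRIM(pcname)
    WRITE(*,'(A)') TRIM(pstr1)
    WRITE(*,'(A)') TRIM(pstr2)
    WRITE(*,'(A)') TRIM(pstr3)
    ! Assumption: level 3 is treated as fatal, so the stub stops here.
    IF (plev >= 3) STOP 1
  END SUBROUTINE ipslerr
END MODULE ipslerr_stub

PROGRAM stop_to_ipslerr
  USE ipslerr_stub
  IMPLICIT NONE
  REAL :: biomass                 ! hypothetical model variable
  CHARACTER(LEN=32) :: value_str  ! text version of the offending value

  biomass = -1.0

  ! Before: a bare STOP that tells you nothing about where or why.
  !   IF (biomass < 0.0) STOP
  ! After: name the routine, the problem and the offending value.
  IF (biomass < 0.0) THEN
     WRITE(value_str,'(F10.3)') biomass
     CALL ipslerr(3, 'check_biomass', &
          'Negative biomass found', &
          'biomass = '//TRIM(ADJUSTL(value_str)), &
          'Check the allocation step before this call')
  END IF
END PROGRAM stop_to_ipslerr
```

The point of the three message strings is that the error output alone should tell you which routine stopped the run, what went wrong and with which value, without having to add print statements and rerun.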

Make the model crash