Opened 3 years ago

Closed 3 years ago

#470 closed defect (fixed)

orchideedriver memory corruption error due to mismatch between nbland_loc and nbpoint_loc

Reported by: ajornet Owned by: somebody
Priority: major Milestone: ORCHIDEE 2.0
Component: Driver files Version: trunc
Keywords: orchideedriver segmentation fault Cc:

Description

orchideedriver fails with a memory corruption error. The error message is:

*** glibc detected *** ./orchideedriver: malloc(): memory corruption: 0x000000001ba17af0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x75dee)[0x2b868dd67dee]
/lib64/libc.so.6(+0x7a3aa)[0x2b868dd6c3aa]
/lib64/libc.so.6(__libc_malloc+0x5c)[0x2b868dd6caac]
/smplocal/intel/compilers_and_libraries_2018.2.199/linux/compiler/lib/intel64/libintlc.so.5(_mm_malloc+0x1c)[0x2b868da97c1c]
/smplocal/intel/compilers_and_libraries_2018.2.199/linux/compiler/lib/intel64/libifcoremt.so.5(for_allocate+0x2db)[0x2b868b8be2ab]
...

After some debugging, the issue takes place in this subroutine:

  SUBROUTINE forcing_givegrid (lon, lat, mask, area, corners, lindex, contfrac, calendar_tmp)
...
    REAL(r_std), INTENT(out) :: lon(iim_loc,jjm_loc), lat(iim_loc,jjm_loc)
    REAL(r_std), INTENT(out) :: mask(iim_loc,jjm_loc)
    REAL(r_std), INTENT(out) :: area(iim_loc,jjm_loc)
    REAL(r_std), INTENT(out) :: corners(iim_loc,jjm_loc,4,2)
    INTEGER(i_std), INTENT(out) :: lindex(nbpoint_loc)  ! <--- HERE
    REAL(r_std), INTENT(out) :: contfrac(nbpoint_loc)
...

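The dummy argument lindex is explicitly declared with size nbpoint_loc, but the array the caller passes in is sized with nbland_loc.

In this particular case the two sizes are 427 and 436, so forcing_givegrid writes past the end of the array the caller allocated. That is why the memory is corrupted.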
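
The failure mode can be sketched in a standalone program. This is only an illustration, not ORCHIDEE code: fill_index, n_callee and n_caller are hypothetical names, and which of the two reported sizes belongs to which side is an assumption made for the example. An explicit-shape dummy argument takes its extent from the callee's declaration, not from the array actually passed, so the callee would write n_callee elements regardless of what the caller allocated:

  PROGRAM size_mismatch_sketch
    IMPLICIT NONE
    ! Hypothetical sizes, borrowed from the two values reported above.
    INTEGER, PARAMETER :: n_callee = 436   ! extent the callee assumes
    INTEGER, PARAMETER :: n_caller = 427   ! extent the caller allocated
    INTEGER, ALLOCATABLE :: idx(:)

    ALLOCATE(idx(n_caller))
    IF (SIZE(idx) < n_callee) THEN
       ! Calling fill_index here would write n_callee - n_caller elements
       ! past the end of idx, so the sketch reports the problem instead.
       WRITE(*,*) 'unsafe call: caller has', SIZE(idx), ', callee writes', n_callee
    ELSE
       CALL fill_index(idx)
    ENDIF

  CONTAINS

    SUBROUTINE fill_index(lindex)
      ! Explicit-shape dummy: the extent comes from n_callee, not from the caller.
      INTEGER, INTENT(out) :: lindex(n_callee)
      INTEGER :: ik
      DO ik = 1, n_callee
         lindex(ik) = ik
      ENDDO
    END SUBROUTINE fill_index

  END PROGRAM size_mismatch_sketch

The sketch deliberately skips the out-of-bounds call; in the real driver nothing guards it, which is how the heap ends up corrupted.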

Check the forcing_givegrid caller:

  SUBROUTINE globgrd_getgrid(fid, iim, jjm, nbland, model_guess, lon, lat, mask, area, corners, &
       &                     lindex, contfrac, calendar)
    
    INTEGER(i_std), INTENT(in)   :: fid
    INTEGER(i_std), INTENT(in)   :: iim, jjm, nbland
    CHARACTER(LEN=*), INTENT(in) :: model_guess
    !
    ! OUTPUT
    !
    REAL(r_std),DIMENSION(iim,jjm), INTENT(out)     :: lon, lat, mask, area
    REAL(r_std),DIMENSION(iim,jjm,4,2), INTENT(out) :: corners
    INTEGER(i_std), DIMENSION(nbland), INTENT(out)  :: lindex  ! <--- HERE
    REAL(r_std),DIMENSION(nbland), INTENT(out)      :: contfrac
    CHARACTER(LEN=20), INTENT(out)                  :: calendar
    !

...
   CALL forcing_givegrid(lon, lat, mask, area, corners, lindex, contfrac, calendar)
...

Here is how nbland_loc and nbpoint_loc are calculated.

In the subroutine forcing_zoomgrid, nbpoint_loc is defined as:

          lon_loc(i,j) = longlo_tmp(imin(1),jmin(1))
          lat_loc(i,j) = lat_glo(imin(1),jmin(1))
          mask_loc(i,j) = mask_glo(imin(1),jmin(1))
          !
          zoom_index(i,j,1) = imin(1)
          zoom_index(i,j,2) = jmin(1)
          !
       ENDDO
    ENDDO
    !
    nbpoint_loc = SUM(mask_loc) ! <--- HERE

mask_loc only takes two values, 0 or 1, so its sum is the number of points where the mask is 1.

In the subroutine forcing_zoomgrid, nbland_loc is defined as:

             !
             contfrac_loc(ik) = contfrac_glo(origind(i,j))
             !
             lalo(ik,1) = lat_glo(i,j)
             lalo(ik,2) = longlo_tmp(i,j)
             !
          ENDIF
       ENDDO
    ENDDO
    !
    !
    nbland_loc = SUM(contfrac_loc)

contfrac_loc holds values that range from 0 to 1, so the sum of its elements is not the number of grid cells that contain land (> 0.0).
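
A small standalone example makes the difference concrete. The fractions below are made up for illustration, and nbland_sum and nbland_count are hypothetical names. The sketch assumes the result is stored in an integer, so the real-valued sum is truncated:

  PROGRAM contfrac_count_sketch
    IMPLICIT NONE
    ! Made-up land fractions for four grid cells; three of them contain land.
    REAL :: contfrac(4) = (/ 1.0, 0.5, 0.25, 0.0 /)
    INTEGER :: nbland_sum, nbland_count

    nbland_sum   = INT(SUM(contfrac))     ! sum is 1.75, truncated to 1
    nbland_count = COUNT(contfrac > 0.0)  ! 3 cells have a fraction above zero

    WRITE(*,*) 'SUM   gives', nbland_sum
    WRITE(*,*) 'COUNT gives', nbland_count
  END PROGRAM contfrac_count_sketch

The two results agree only when every land cell has a fraction of exactly 1, which is why the mismatch depends on the forcing file and domain.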

Proposed fix:

             !
             contfrac_loc(ik) = contfrac_glo(origind(i,j))
             !
             lalo(ik,1) = lat_glo(i,j)
             lalo(ik,2) = longlo_tmp(i,j)
             !
          ENDIF
       ENDDO
    ENDDO
    !
    !
    ! nbland_loc = SUM(contfrac_loc) <- old line
    nbland_loc = 0
    DO ik=1, SIZE(contfrac_loc)
       IF (contfrac_loc(ik) > 0.0) THEN
          nbland_loc = nbland_loc + 1
       ENDIF
    ENDDO


An extra check in forcing_givegrid would also help avoid similar issues in the future:

  SUBROUTINE forcing_givegrid (lon, lat, mask, area, corners, lindex, contfrac, calendar_tmp)
    !
    ! This subroutine will return to the caller the grid which has been extracted from the
    ! forcing file. It is assumed that the caller has called forcing_givegridsize before
    ! and knows the dimensions of the fields and thus has done the correct allocations.
    !
    !
    REAL(r_std), INTENT(out) :: lon(iim_loc,jjm_loc), lat(iim_loc,jjm_loc)
    REAL(r_std), INTENT(out) :: mask(iim_loc,jjm_loc)
    REAL(r_std), INTENT(out) :: area(iim_loc,jjm_loc)
    REAL(r_std), INTENT(out) :: corners(iim_loc,jjm_loc,4,2)
    INTEGER(i_std), INTENT(out) :: lindex(nbpoint_loc)
    REAL(r_std), INTENT(out) :: contfrac(nbpoint_loc)
    CHARACTER(LEN=20), INTENT(out) :: calendar_tmp
    !
    IF ( .NOT. is_root_prc ) THEN
       CALL ipslerr (3,'forcing_givegrid'," This routine can only be called on the root processor.", &
            &          "The information requested is only available on root processor.", " ")
    ENDIF
    !
    IF (nbpoint_loc .NE. nbland_loc) THEN ! <- NEW CHECK
       WRITE(numout, *) "forcing_givegrid:: nbpoint_loc=", nbpoint_loc
       WRITE(numout, *) "forcing_givegrid:: nbland_loc=", nbland_loc
       CALL ipslerr(3,'forcing_givegrid','nbpoint_loc and nbland_loc do not match','','')
    ENDIF
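
The check above compares the two module-level counters. A complementary safeguard, sketched here as an assumption and not as part of the committed fix, is to declare the dummy argument as assumed-shape so that the routine can inspect the size of the array it actually received. This requires an explicit interface, for example a module procedure. The names grid_sketch and give_index, and the ok flag, are hypothetical:

  MODULE grid_sketch
    IMPLICIT NONE
    INTEGER :: nbpoint_loc = 3   ! hypothetical module-level point count
  CONTAINS
    SUBROUTINE give_index(lindex, ok)
      ! Assumed-shape dummy: SIZE(lindex) reflects what the caller passed.
      INTEGER, INTENT(out) :: lindex(:)
      LOGICAL, INTENT(out) :: ok
      INTEGER :: ik
      ok = (SIZE(lindex) == nbpoint_loc)
      IF (.NOT. ok) THEN
         lindex(:) = 0   ! stay within the caller's bounds and report failure
         RETURN
      ENDIF
      DO ik = 1, nbpoint_loc
         lindex(ik) = ik
      ENDDO
    END SUBROUTINE give_index
  END MODULE grid_sketch

  PROGRAM assumed_shape_sketch
    USE grid_sketch
    IMPLICIT NONE
    INTEGER :: too_small(2), right_size(3)
    LOGICAL :: ok
    CALL give_index(too_small, ok)
    WRITE(*,*) 'size 2 accepted:', ok
    CALL give_index(right_size, ok)
    WRITE(*,*) 'size 3 accepted:', ok
  END PROGRAM assumed_shape_sketch

In the real code the failure branch would presumably call ipslerr, as the check above does, instead of returning a flag.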

Change History (1)

comment:1 Changed 3 years ago by jgipsl

  • Resolution set to fixed
  • Status changed from new to closed

Commit in trunk [5653]
