Opened 4 years ago

Closed 11 months ago

#1710 closed Defect (fixed)

Instability in results due to new compilation environment

Reported by: nicolasmartin Owned by: mchekki
Priority: low Milestone: Unscheduled
Component: OCE Version: release-3.6
Severity: minor Keywords: HPC compilation optimization
Cc: mchekki

Description

To resolve a crash of the AMM12 configuration tested with the trusting on the trunk, I upgraded the Intel environment on the Ada HPC from 2013.0 (ifort 13) to 2013.1 (ifort 14) for all trusting tests.
At the very next test run, with the same version of the code, AMM12 no longer crashed on the trunk, but ORCA1LIM3 & AMM12 turned red on the 3.6 branch. All outputs changed substantially, far more than machine epsilon or what we would expect from this kind of upgrade.

Analysis

I compared the 2 releases of ifort with optimisation disabled ('-O0'), and in that case I recovered identical results.
Part of the code may be fragile and may be altered either by new default options in the new compiler or by a new optimization.
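
One generic way an optimization level can change results, shown here as a minimal Python sketch rather than anything taken from the NEMO code: floating-point addition is not associative, so a compiler that regroups a sum may produce a different rounded value. The numbers are chosen only for illustration.

```python
# Floating-point addition is not associative: regrouping the same three
# terms (as an optimizing compiler may do) changes the rounded result.
a, b, c = 1.0e16, -1.0e16, 1.0

left_to_right = (a + b) + c  # the large terms cancel first, then 1.0 is added
regrouped = a + (b + c)      # 1.0 is lost when added to -1.0e16, then the terms cancel

print(left_to_right, regrouped)
```

Whether this mechanism is the one at work in this ticket is not established; it only shows why identical source can give different output at '-O2' and identical output at '-O0'.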

Recommendation

Do not upgrade your computing environment unless you are stuck… ;-)
Several tests should be conducted to identify the compiler option or optimization, and the routines, involved in this behaviour.

Commit History (0)

(No commits)

Change History (9)

comment:1 Changed 4 years ago by nicolasmartin

  • Keywords compilation added; compiler removed

comment:2 Changed 4 years ago by nicolasmartin

  • Keywords nemo_v3_6* added

comment:3 Changed 3 years ago by clevy

  • Owner changed from nemo to mchekki

comment:4 Changed 2 years ago by clevy

  • Cc mchekki added
  • Status changed from new to assigned

comment:5 Changed 2 years ago by mchekki

Analysis

The AMM12 configuration crashes with Intel version 13 at -O2 optimization; adding -check bounds solves the problem.

Moving down to -O0 also works, but this should be avoided because it slows down the run.

With Intel 15 (and higher), there is no need to add -check bounds at -O2. So the crash is probably due to a bug in version 13…

We still get differences in solver.stat, even with -O0, when moving from version 15 to 16 (13→15 OK, 16→17 OK).

Intel recommendation for reproducibility of results

This is valid for processors with the same architecture.

Starting with Intel version 17, use:

-fp-model consistent

Before version 17:

-fp-model source -fimf-arch-consistency=true -no-fma
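
The version rule above can be captured in a build helper. The following is only a sketch: the function name `reproducibility_flags` and its integer `ifort_major` argument are hypothetical, and the flag strings are exactly those quoted in this comment.

```python
from typing import List


def reproducibility_flags(ifort_major: int) -> List[str]:
    """Return the reproducibility flags quoted above for an ifort major version.

    Hypothetical helper: only the flag strings come from the ticket.
    """
    if ifort_major >= 17:
        # Version 17 and later: a single option.
        return ["-fp-model", "consistent"]
    # Before version 17: the longer, explicit set of options.
    return ["-fp-model", "source", "-fimf-arch-consistency=true", "-no-fma"]


for version in (13, 15, 16, 17):
    print(version, " ".join(reproducibility_flags(version)))
```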

I compared solver.stat across versions 13/15/16/17 using "-O2 -r8 -traceback -fp-model source -fimf-arch-consistency=true -no-fma" and observed no differences.
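
A comparison of this kind can be scripted. The sketch below treats solver.stat as plain text and makes no assumption about its columns; the function name `first_difference` and the example paths in the closing comment are hypothetical.

```python
from itertools import zip_longest
from pathlib import Path


def first_difference(path_a, path_b):
    """Return (line_number, line_a, line_b) for the first differing line.

    Returns None when the two files are identical line for line.
    A missing line (one file shorter than the other) is reported as None.
    """
    lines_a = Path(path_a).read_text().splitlines()
    lines_b = Path(path_b).read_text().splitlines()
    pairs = zip_longest(lines_a, lines_b)
    for number, (line_a, line_b) in enumerate(pairs, start=1):
        if line_a != line_b:
            return number, line_a, line_b
    return None


# Example call with hypothetical paths, one run directory per compiler version:
# first_difference("run_ifort15/solver.stat", "run_ifort16/solver.stat")
```

An exact text comparison is deliberately strict: it reports any change in the printed digits, which matches the "no differences" criterion used here.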

comment:6 Changed 2 years ago by nemo

  • Keywords release-3.6* added; nemo_v3_6* removed

comment:7 Changed 2 years ago by nemo

  • Keywords release-3.6* removed

comment:8 Changed 2 years ago by nicolasmartin

  • Keywords nemo_v3_6* removed
  • Milestone changed from 2015 nemo_v3_6_STABLE to Unscheduled
  • Severity set to minor
  • Type changed from Bug to Defect

comment:9 Changed 11 months ago by gsamson

  • Resolution set to fixed
  • Status changed from assigned to closed