Version 3 (modified by frrh, 4 years ago)

Bugs in NEMO-CICE-MEDUSA

Testing on the Met Office Cray XC40, using working copy u-aj777RHdebug and its variants, we start by trying a 2-day (2x1-day cycles) set-up with array bounds testing. Compiling both the NEMO and CICE code with -Rb, we find:

  • TOP_SRC/MEDUSA/sms_medusa.F90 doesn't compile: ierr is indexed out of bounds (OOB). It needs to be declared with a dimension of 8.
  • CICE generates:
      lib-4961 : WARNING 
      Subscript -330 is out of range for dimension 2 for array
      'array_g' at line 2206 in file 'ice_gather_scatter.F90' with bounds 1:332.
    
       ARRAY_G 2nd index is OOB in:
    
                  msg_buffer(i,j) = ARRAY_G(this_block%i_glob(i)+nghost,&
                                          this_block%j_glob(j)+nghost)
    
    Heaven knows why we have a negative number here!?

This doesn't cause a failure because it's an OOB read. I suspect it would cause a failure if it were an OOB write.

  • We then get a failure in trc_nam_medusa.F90:
      lib-4213 : UNRECOVERABLE library error 
      A pointer or allocatable array in an I/O list has not been associated
      or allocated.
    
       Encountered during a namelist WRITE to unit 27
       Fortran unit 27 is connected to a sequential formatted text file:
         "output.namelist.pis"
    

This error message doesn't lead us to any particular statement or line number which is a bit rubbish… the compiler must know but it doesn't bother to tell us.

  • trcnam_medusa.F90 has a section where it's initialising variables from the natbio namelist. However, it initialises jdms_input twice, thus:
      jdms_input = 0
      jdms_input = 3

Why?

jdms_model is not initialised at all - is the 2nd occurrence supposed to refer to that?

jq10 is not initialised.

Some variables are declared twice in natbio, e.g. vsed and xhr.

  • Writing natbio causes the above error, suggesting that something in that namelist is unset.

Skipping that, we get a similar error writing natroam! Skip that and natopt seems to be OK but it's the only one of the three namelists that is. The model then goes on to complete (and completes a 2nd 1-day cycle OK).

  • The code also refers to a namelist named "nammeddia", but we have no such namelist. Our namelists refer to something called "nammedia" (only one "d"). Presumably that's a typo. JP says this is not currently used in our configurations (if it were, it would crash looking for a missing namelist!)
  • Checking job.err, the only warning we have is the one about the ARRAY_G reference in CICE. This is present in both the NRUN and the CRUN (why wouldn't it be?)

So we have a number of things to do:

  1. Correct ierr dimension to 8 in sms_medusa.F90
  2. Remove duplicate variable declarations in natbio
  3. Ensure missing fields are given default values in natbio
  4. Replace the 2nd occurrence of jdms_input with jdms_model, presumably
  5. Investigate why the namelist writes fail.
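
Item 1 can be illustrated with a minimal sketch. It assumes that ierr collects one STAT code per ALLOCATE statement, as is common in NEMO allocation routines. The module, function and array names below are hypothetical and are not taken from sms_medusa.F90. The sketch uses three slots for brevity, whereas the real routine needs 8.

```fortran
! Sketch only: names and sizes are hypothetical, not taken from sms_medusa.F90.
module alloc_sketch
  implicit none
  integer, parameter :: n_alloc = 3        ! the real routine needs 8, per item 1
  real, allocatable  :: fld_a(:,:), fld_b(:,:), fld_c(:,:,:)
contains
  integer function sketch_alloc(jpi, jpj, jpk)
    integer, intent(in) :: jpi, jpj, jpk
    integer :: ierr(n_alloc)               ! one slot per ALLOCATE statement
    ierr(:) = 0
    allocate(fld_a(jpi, jpj),      stat=ierr(1))
    allocate(fld_b(jpi, jpj),      stat=ierr(2))
    allocate(fld_c(jpi, jpj, jpk), stat=ierr(3))
    ! Referencing ierr(4) here would be the kind of OOB index that -Rb rejects.
    sketch_alloc = maxval(ierr)            ! non-zero if any allocation failed
  end function sketch_alloc
end module alloc_sketch

program test_alloc
  use alloc_sketch
  implicit none
  if (sketch_alloc(4, 3, 2) /= 0) stop 1   ! tiny hypothetical grid
  print *, 'allocations ok'
end program test_alloc
```

Tying the declared size to a single parameter such as n_alloc makes it harder for the number of ALLOCATE statements and the size of ierr to drift apart.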

Checking with JP, he confirms that jdms_model is what is meant to be used in the above, with a default of 3, and jq10 should be given a default of 1.5. I've applied these to my branch and JP has also applied them to the main MEDUSA_stable branch. Ditto the ierr dimensioning fix.
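
A minimal sketch of the agreed defaults follows. It assumes jdms_input and jdms_model are integers and jq10 is a real; the real natbio namelist contains many more variables than the three shown here. The program name is hypothetical.

```fortran
! Sketch only: a heavily reduced natbio group with assumed variable types.
program natbio_defaults_sketch
  implicit none
  integer :: jdms_input, jdms_model
  real    :: jq10
  namelist /natbio/ jdms_input, jdms_model, jq10

  ! Each variable gets exactly one default before the namelist is read.
  jdms_input = 0
  jdms_model = 3      ! previously a second assignment to jdms_input
  jq10       = 1.5    ! previously left uninitialised

  ! Echo the group, as the model does when writing its output namelist.
  write(*, nml=natbio)
end program natbio_defaults_sketch
```

With every member given a default, the echoed namelist holds defined values even when the input file omits an entry.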

Further investigation into why printing the natbio contents fails reveals that the FRIVER_DEP array is unallocated at the time of the attempted write. natroam seems to have similar issues with other fields. It seems that these arrays are only allocated some time after the writing of the namelist contents (if at all), in sms_medusa_alloc. The various arrays involved seem to be dimensioned by jpi, jpj and jpk, so it seems doubtful that these would ever be given values by the namelist input file! They'd have to be massive (over a million different values at ORCA1 resolution). So is there really any point in having these as part of the namelist definition?
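
The failure mode can be sketched in isolation, assuming a compiler that accepts allocatable arrays as namelist members (a Fortran 2003 feature). The group name natbio_demo, the extents n1 and n2, the rank given to friver_dep and the placeholder values are all hypothetical. The guard shown is only one possible workaround; dropping such arrays from the namelist definition, as questioned above, would avoid the problem altogether.

```fortran
! Sketch only: shapes, values and the group name are assumptions.
program unallocated_nml_sketch
  implicit none
  integer, parameter :: n1 = 4, n2 = 3     ! hypothetical tiny extents
  real, allocatable  :: friver_dep(:,:)    ! rank is an assumption
  real               :: vsed
  namelist /natbio_demo/ vsed, friver_dep

  vsed = 0.0                               ! placeholder value

  ! Writing the group while friver_dep is unallocated is invalid and is
  ! the situation the lib-4213 message describes. Allocate it first.
  if (.not. allocated(friver_dep)) then
     allocate(friver_dep(n1, n2))
     friver_dep = 0.0
  end if

  write(*, nml=natbio_demo)
end program unallocated_nml_sketch
```

Removing the allocated() guard and the ALLOCATE leaves the write referencing an unallocated array, which is not valid Fortran; a compiler's run-time library may then abort, as the Cray one did here.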