New URL for NEMO forge!   http://forge.nemo-ocean.eu

Since March 2022 along with NEMO 4.2 release, the code development moved to a self-hosted GitLab.
This present forge is now archived and remained online for history.
#2653 (Namelist quirks with Cray compilers (trunk only)) – NEMO

Opened 3 years ago

Last modified 3 years ago

#2653 assigned Defect

Namelist quirks with Cray compilers (trunk only)

Reported by: acc Owned by: acc
Priority: low Milestone:
Component: TOP Version: trunk
Severity: minor Keywords:
Cc:

Description

Context

The behaviour of internal file based namelists differs in codes compiled with or without MPI using recent Cray compilers. Expected behaviour can be obtained if, and only if, every .true. or .false. stand-alone assignment is followed by a comma. These commas should not be necessary and are not necessary with intel or gfortran compilers (or even Crayftn without MPI??). This has been reported to Cray. Meanwhile, some simple scripts (see below) can be used to add commas. Details follow, including a simple program that reproduces the behaviour.

Analysis

The background


The current development version of NEMO uses internal files for its namelists. This is to avoid filesystem accesses; instead rank zero reads the entire namelist, strips comments and broadcasts the entire character buffer to everyone else. This is working on a variety of systems (including ARCHER).

On Archer2, I've been encountering errors which the model assumes are misspelt variables in the namelist but the input namelists are fine. It turns out I can replicate these errors with a simple namelist; but, strangely, only in an MPI environment.

The example program


The program: errornml_mpi.F90 (listed below) defines a simple namelist (as a character string) and broadcasts this to other ranks. The program can also be compiled without mpi. Two different namelists can be used, decided by a command-line argument: g (for good) or b (for bad). The two namelists differ only by the
addition of commas immediately after .true. or .false. settings. This is my work-around because it works with the commas but not without them. Commas should not be necessary (and aren't necessary for all the real, integer and character variables in the full namelist).

Currently Loaded compiler:

ftn -V
Cray Fortran : Version 10.0.4

In non-mpi mode, I find no difference in behaviour between the two namelists:

ftn -o errornml -O1 errornml_mpi.F90
 
./errornml g
The whole internal file:
&namctl sn_cfctl%l_trcstat=.true.,  sn_cfctl%l_runstat=.true., /
----
namctl in namelist_cfg:
 &NAMCTL  SN_CFCTL = T, T, F, F, F, F, F, 0, 1000000, 1, 1, LN_TIMING = F,
 LN_DIACFL = F, NN_ISPLT = 0, NN_JSPLT = 0, NN_ICTLS = 0, NN_ICTLE = 0, NN_JCTLS
 = 0, NN_JCTLE = 0 /
----
 STOP
 
./errornml b
The whole internal file:
&namctl sn_cfctl%l_trcstat=.true.  sn_cfctl%l_runstat=.true. /
----
namctl in namelist_cfg:
 &NAMCTL  SN_CFCTL = T, T, F, F, F, F, F, 0, 1000000, 1, 1, LN_TIMING = F,
 LN_DIACFL = F, NN_ISPLT = 0, NN_JSPLT = 0, NN_ICTLS = 0, NN_ICTLE = 0, NN_JCTLS
 = 0, NN_JCTLE = 0 /
----
 STOP

With MPI and srun (even with just 1 rank) the behaviour changes:

ftn -o errornml_mpi -DUSE_MPI  -O1 errornml_mpi.F90
srun -n 1 ./errornml_mpi g
The whole internal file:
&namctl sn_cfctl%l_trcstat=.true.,  sn_cfctl%l_runstat=.true., /
----
namctl in namelist_cfg:
 &NAMCTL  SN_CFCTL = T, T, F, F, F, F, F, 0, 1000000, 1, 1, LN_TIMING = F,
 LN_DIACFL = F, NN_ISPLT = 0, NN_JSPLT = 0, NN_ICTLS = 0, NN_ICTLE = 0, NN_JCTLS
 = 0, NN_JCTLE = 0 /
----
 STOP
 
srun -n 1 ./errornml_mpi b
The whole internal file:
&namctl sn_cfctl%l_trcstat=.true.  sn_cfctl%l_runstat=.true. /
----
namctl in namelist_cfg:
Not found or bad
 
Not found or bad
 
MPICH ERROR [Rank 0] [job id 42135.24] [Sat Nov  7 13:28:42 2020] [unknown]
[nid001114] - Abort(25) (rank 0 in comm 0): application called
MPI_Abort(MPI_COMM_WORLD, 25) - process 0

and just to prove the MPI is working:

srun -n 2 ./errornml_mpi g
The whole internal file:
&namctl sn_cfctl%l_trcstat=.true.,  sn_cfctl%l_runstat=.true., /
----
namctl in namelist_cfg:
 &NAMCTL  SN_CFCTL = T, T, F, F, F, F, F, 0, 1000000, 1, 1, LN_TIMING = F,
 LN_DIACFL = F, NN_ISPLT = 0, NN_JSPLT = 0, NN_ICTLS = 0, NN_ICTLE = 0, NN_JCTLS
 = 0, NN_JCTLE = 0 /
----
 STOP
----
namctl in namelist_cfg:     1
The whole internal file:
&namctl sn_cfctl%l_trcstat=.true.,  sn_cfctl%l_runstat=.true., /
----
 &NAMCTL  SN_CFCTL = T, T, F, F, F, F, F, 0, 1000000, 1, 1, LN_TIMING = F,
 LN_DIACFL = F, NN_ISPLT = 0, NN_JSPLT = 0, NN_ICTLS = 0, NN_ICTLE = 0, NN_JCTLS
 = 0, NN_JCTLE = 0 /
----
 STOP

I can't explain this so I'm assuming it is a compiler bug but it is odd that it seems impossible to replicate without the mpi environment.

The work-around is to add commas (see scripts below) but it is tedious to have to maintain different namelists for different systems and Cray compilers are alone in this behaviour. The actual namelists are some 38k characters but it is only logical variables that need these commas appended.

Recommendation

Wait for Cray to fix their compiler. Meanwhile, run the edit_nmls script in the directory above tests and cfgs on each new checkout before building any configurations with Cray compilers. reverse_edit_nmls should reverse changes before any check-ins. Permanently adding the commas is an option but it is difficult to ensure they are maintained.

errornml_mpi.F90

   PROGRAM errornml
#if defined USE_MPI
      USE MPI
#endif
      CHARACTER(LEN=:), ALLOCATABLE    :: cdnambuff
      CHARACTER(LEN=100)               :: this_works='&namctl sn_cfctl%l_trcstat=.true.,  sn_cfctl%l_runstat=.true., /'
      CHARACTER(LEN=100)               :: this_breaks='&namctl sn_cfctl%l_trcstat=.true.  sn_cfctl%l_runstat=.true. /'
      CHARACTER(LEN=100)               :: use_this
      INTEGER                          :: kout = 6
      LOGICAL                          :: ldwp =.FALSE. !: .true. only for the root broadcaster
      INTEGER                          :: itot, ierr, mpprank, mppsize
      TYPE :: sn_ctl
         LOGICAL :: l_runstat = .FALSE.
         LOGICAL :: l_trcstat = .FALSE.
         LOGICAL :: l_oceout  = .FALSE.
         LOGICAL :: l_layout  = .FALSE.
         LOGICAL :: l_prtctl  = .FALSE.
         LOGICAL :: l_prttrc  = .FALSE.
         LOGICAL :: l_oasout  = .FALSE.
         INTEGER :: procmin   = 0
         INTEGER :: procmax   = 1000000
         INTEGER :: procincr  = 1
         INTEGER :: ptimincr  = 1
      END TYPE
      TYPE(sn_ctl), SAVE :: sn_cfctl
      LOGICAL ::   ln_timing
      LOGICAL ::   ln_diacfl
      INTEGER ::   nn_ictls
      INTEGER ::   nn_ictle
      INTEGER ::   nn_jctls
      INTEGER ::   nn_jctle
      INTEGER ::   nn_isplt
      INTEGER ::   nn_jsplt
      NAMELIST/namctl/ sn_cfctl, ln_timing, ln_diacfl,                                &
         &             nn_isplt,  nn_jsplt,  nn_ictls, nn_ictle, nn_jctls, nn_jctle
      character(len=32)           :: arg
      integer                     :: iarg

      use_this=this_works
      do iarg = 1, command_argument_count()
        call get_command_argument(iarg, arg)

        select case (arg)
            case ('g', 'good')
                use_this = this_works
            case ('b', 'bad')
                use_this = this_breaks
        end select
      end do
!
# if defined USE_MPI
         CALL mpi_init( ierr )
         CALL mpi_comm_size( MPI_COMM_WORLD, mppsize, ierr )
         CALL mpi_comm_rank( MPI_COMM_WORLD, mpprank, ierr )
         if( mpprank .eq. 0 ) ldwp = .true.
         if(ldwp) THEN
#endif
         !
         ! Test the contents of the internal file
         !
         ! Write it all out:
           ! hand-crafted substitute for the load_nml subroutine which reads whole namelists and
           ! condenses them into a character string
             itot=len_trim(use_this)
             IF ( .NOT. ALLOCATED(cdnambuff) ) ALLOCATE( CHARACTER(LEN=itot) :: cdnambuff )
             cdnambuff=TRIM(use_this)
           !
           write(6,'(a)') 'The whole internal file: '
           write(6,'(32A)') cdnambuff
           write(6,'(a)')'----'

# if defined USE_MPI
           call mpp_bcast_nml( cdnambuff , itot )
#endif

           write(6,'(a)') 'namctl in namelist_cfg: '
           read(cdnambuff, namctl, iostat=ios, end=99, err=99)
           write(6,namctl)
           write(6,'(a)')'----'
           goto 101
           !
    99     write(6,'(a)') 'Not found or bad ',ios
           goto 98
           !
   101     CONTINUE
# if defined USE_MPI
         ELSE
           call mpp_bcast_nml( cdnambuff , itot )
           write(*,'(a)')'----'
           write(*,'(a,i5)') 'namctl in namelist_cfg: ',mpprank
           write(6,'(a)') 'The whole internal file: '
           write(6,'(32A)') cdnambuff
           write(6,'(a)')'----'
           read(cdnambuff, namctl, iostat=ios, end=98, err=98)
           write(*,namctl)
           write(*,'(a)')'----'
         ENDIF
         call MPI_BARRIER(MPI_COMM_WORLD, ierr)
         call MPI_FINALIZE(ierr)
#endif
         STOP
    98   write(6,'(a)') 'Not found or bad ',ios
# if defined USE_MPI
         CALL MPI_ABORT(MPI_COMM_WORLD, ios, ierr)
  CONTAINS

   SUBROUTINE mpp_bcast_nml( cdnambuff , kleng )
      CHARACTER(LEN=:)    , ALLOCATABLE, INTENT(INOUT) :: cdnambuff
      INTEGER                          , INTENT(INOUT) :: kleng
      !!----------------------------------------------------------------------
      !!                  ***  routine mpp_bcast_nml  ***
      !!
      !! ** Purpose :   broadcast namelist character buffer
      !!
      !!----------------------------------------------------------------------
      !!
      INTEGER ::   iflag
      !!----------------------------------------------------------------------
      !
      call MPI_BCAST(kleng, 1, MPI_INT, 0, MPI_COMM_WORLD, iflag)
      call MPI_BARRIER(MPI_COMM_WORLD, iflag)
      IF ( .NOT. ALLOCATED(cdnambuff) ) ALLOCATE( CHARACTER(LEN=kleng) :: cdnambuff )
      call MPI_BCAST(cdnambuff, kleng, MPI_CHARACTER, 0, MPI_COMM_WORLD, iflag)
      call MPI_BARRIER(MPI_COMM_WORLD, iflag)
      !
   END SUBROUTINE mpp_bcast_nml
#endif

  END PROGRAM errornml

edit_nmls (better suggestions are welcome):

#!/bin/bash
for d in cfgs tests
do
cd $d
for nml in `find ./ -name '*namelist_*'`
do
chk=$(grep -c -i -e"= *.false.," -e"= *.true.," $nml)
if test $chk -eq 0 ; then
echo "Changing : "$nml
ed - $nml << EOF
%s/=\( *\).false./=\1.false.,/
w
q
EOF
ed - $nml << EOF
%s/=\( *\).FALSE./=\1.FALSE.,/
w
q
EOF
ed - $nml << EOF
%s/=\( *\).true./=\1.true.,/
w
q
EOF
ed - $nml << EOF
%s/=\( *\).TRUE./=\1.TRUE.,/
w
q
EOF
else
echo $nml " may have already been processed: "$chk" lines already correct"
fi
done
cd ../
done

reverse_edit_nmls:

cat reverse_edit_nmls
#!/bin/bash
for d in cfgs tests
do
cd $d
for nml in `find ./ -name '*namelist_*'`
do
chk=$(grep -c -i -e"= *.false.," -e"= *.true.," $nml)
if test $chk -ne 0 ; then
echo "Changing : "$nml
ed - $nml << EOF
%s/=\( *\).false.,/=\1.false./
w
q
EOF
ed - $nml << EOF
%s/=\( *\).FALSE.,/=\1.FALSE./
w
q
EOF
ed - $nml << EOF
%s/=\( *\).true.,/=\1.true./
w
q
EOF
ed - $nml << EOF
%s/=\( *\).TRUE.,/=\1.TRUE./
w
q
EOF
else
echo $nml " may have already been processed: "$chk" lines already reverted"
fi
done
cd ../
done

Commit History (0)

(No commits)

Change History (2)

comment:1 Changed 3 years ago by acc

  • Owner set to acc
  • Status changed from new to assigned
  • Version changed from v4.0.* to trunk

comment:2 Changed 3 years ago by andmirek

The solution works for crayftn 8.7.7. The implementation for MetOffce? configuration will be documented in ticket #2655

Note: See TracTickets for help on using tickets.