Opened 7 years ago

Closed 6 years ago

#1307 closed Bug (fixed)

Slow start-ups with v3.6 on some systems

Reported by: acc Owned by: acc
Priority: low Milestone:
Component: OCE Version: release-3.6
Severity: Keywords:
Cc:

Description

I've been experiencing very slow start-ups on a new cluster with a Lustre filesystem. The cause turns out to be the prevalence of output statements such as:

WRITE ( numond, namsplit )

in the code, which write out the results of merging the new-style namelists. As it stands, every processing element writes this information to the same file, which is unnecessary and, it seems, liable to cause a major problem on some systems. I propose to alter all writes to numond, numoni, numonp, numont, numonb and numonc by prepending an IF(lwp). There will also be a slight change in nemogcm.F90 to make sure the first writes occur after the definition of lwp.

This will mean small changes in about 113 files. I'll make the changes next week unless there are any objections. One remaining problem will be the recurrence of this behaviour if ln_ctl is used. Should we introduce a new lwp equivalent that isn't altered by ln_ctl?

Commit History (0)

(No commits)

Change History (4)

comment:1 Changed 7 years ago by smasson

Why not use (narea == 1), as in the definition of lwp in nemo_init?
narea is defined in dom_oce, so it should be known by (almost?) all routines.

comment:2 Changed 7 years ago by acc

I don't really want to clutter over 100 files with that construct. I was planning to introduce a logical write master (lwm) set to (narea == 1). Something like:

   LOGICAL       ::   lwm      = .FALSE.    !: boolean : true on the 1st processor only (always)
   LOGICAL       ::   lwp      = .FALSE.    !: boolean : true on the 1st processor only .OR. ln_ctl

then all the namelist output statements become simply:

IF(lwm) WRITE ( numond, namsplit )
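A minimal, serial sketch of how the two flags would differ, assuming narea and ln_ctl are simply set by hand (in NEMO they come from the MPI set-up and from namctl). The namelist entry nn_demo is hypothetical and stands in for the real contents of namsplit; opening output.namelist.dyn inside the guarded block is also an assumption made only to keep the example self-contained:

```fortran
PROGRAM lwm_sketch
   ! Illustrative only: not the NEMO implementation.
   IMPLICIT NONE
   INTEGER ::   narea   = 1          ! rank of this processor, counted from 1 (set by hand here)
   LOGICAL ::   ln_ctl  = .FALSE.    ! control-print switch (set by hand here)
   LOGICAL ::   lwm     = .FALSE.    ! true on the 1st processor only (always)
   LOGICAL ::   lwp     = .FALSE.    ! true on the 1st processor only .OR. ln_ctl
   INTEGER ::   numond               ! unit for the merged-namelist output file
   INTEGER ::   nn_demo = 3          ! hypothetical namelist entry
   NAMELIST/namsplit/ nn_demo
   !
   lwm = ( narea == 1 )              ! unaffected by ln_ctl
   lwp = ( narea == 1 ) .OR. ln_ctl  ! becomes true everywhere when ln_ctl is set
   !
   IF( lwm ) THEN                    ! only the first processor touches the file
      OPEN( NEWUNIT=numond, FILE='output.namelist.dyn', STATUS='REPLACE', ACTION='WRITE' )
      WRITE( numond, namsplit )
      CLOSE( numond )
   ENDIF
   !
   IF( lwp ) WRITE(*,*) 'lwp is true on this processor'
END PROGRAM lwm_sketch
```

With ln_ctl set to .TRUE. and narea greater than 1, lwp would be true but lwm would stay false, so the namelist file would still be written by one processor only.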

comment:3 Changed 7 years ago by gm

Nice idea Andrew!
lwm is cool

Gurvan

comment:4 Changed 6 years ago by acc

  • Resolution set to fixed
  • Status changed from new to closed

Fixed at revision 4624. Note that the log message at this revision erroneously referred to ticket #1305; it should have referred to this one (#1307). The log message should have read:

#1307. Fix slow start-up problems on some systems by introducing and using lwm logical to restrict output of merged namelists to the first (or only) processor. lwm is true only on the first processor regardless of ln_ctl. Small changes to all flavours of nemogcm.F90 are also required to write namctl and namcfg after the call to mynode which now opens output.namelist.dyn and writes nammpp.
