New URL for NEMO forge!   http://forge.nemo-ocean.eu

Since March 2022 along with NEMO 4.2 release, the code development moved to a self-hosted GitLab.
This present forge is now archived and remained online for history.
#2396 (Bug in fabm running in coupled mode) – NEMO

Opened 4 years ago

Closed 4 years ago

Last modified 2 years ago

#2396 closed Bug (fixed)

Bug in fabm running in coupled mode

Reported by: jcastill Owned by: mathiot
Priority: normal Milestone: 2020 WP
Component: TOP Version: v4.0
Severity: major Keywords: coupling diurnalcycle fabm radiation v4.0
Cc: dford, smasson, andmirek

Description

Context

The problem arises with inconsistencies in tracer initialization and radiation coupling when using diurnal cycling. Runs using ersem and wave coupling consistently crash, and also some atmosphere coupled runs.

Analysis

Radiation diurnal cycling is used only if the namelist variable ln_dm2dc=.true., both in coupled and uncoupled mode. When using diurnal cycling in a coupled run, it is required that the sum of the coupling frequency of all radiation terms is equal to one day, otherwise the program will stop with an error during coupling initialization (OCE routine sbc_cpl_rcv).

During tracer initialization (TOP routine trc_stp_ctl, which happens after coupling initialization), diurnal cycling is performed either when ln_dm2dc=.true., or when we have a coupled run and the coupling frequency of radiation terms is different to one day. The second case is wrong, as diurnal cycling will happen in coupled mode if there is no radiation coupling (for example, when coupling only to a wave model) or if the coupling frequency is not equal to one day, independently of the value of ln_dm2dc.

Fix

To solve the problem, it is enough to use diurnal cycling in tracer initialization only when ln_dm2dc, as what happens in coupled runs is already taken care of in the coupling routines, which also require ln_dm2dc=.true. That means substituting the lines in routine trc_stp_ctl:

      l_trcdm2dc = ln_dm2dc .OR. ( ln_cpl .AND. ncpl_qsr_freq /= 1 )
      l_trcdm2dc = l_trcdm2dc  .AND. .NOT. l_offline

by the following:

      l_trcdm2dc = ln_dm2dc  .AND. .NOT. l_offline

Commit History (3)

ChangesetAuthorTimeChangeLog
12621mathiot2020-03-27T20:27:46+01:00

remove branch 2020/ticket2396 (ticket #2396)

12552mathiot2020-03-13T15:51:19+01:00

ticket #2396: add discussed fix

12550mathiot2020-03-13T15:42:58+01:00

create branch for ticket #2396 (Bug in fabm running in coupled mode)

Change History (19)

comment:1 Changed 4 years ago by smasson

We cannot use the proposed solution as we want l_trcdm2dc = T even for example when ln_dm2dc = F but when we use for example a 2h coupling.
l_trcdm2dc is T as soon as the qsr is received or computed at higher frequency than 1 day.
ln_dm2dc is T when we use an analytical formulation to redistribute daily mean qsr into a time-step mean value including the diurnal cycle.

Which version and revision number of the code are you using?
This part of the code was modified in

This is available in release 4.0.2:
https://forge.ipsl.jussieu.fr/nemo/browser/NEMO/releases/release-4.0.2

comment:2 Changed 4 years ago by jcastill

Even when you want l_trcdm2dc equal to T as soon as the qsr is received or computed at higher frequency than 1 day, I still see some issues. If we go case by case:

. If we have a purely forced run (ln_cpl and ln_mixcpl=.false.), then ncpl_qsr_freq will be initialized to zero and the value of l_trcdm2dc will only depend on ln_dm2dc -> this seems OK
. If we have a coupled run then there are several situations possible:

  • We couple any of the atmospheric radiation terms 'O_QsrOce', 'O_QsrMix', 'I_QsrOce', or 'I_QsrMix'. The value of ncpl_qsr_freq will be different from zero. If ln_dm2dc is true, then ncpl_qsr_freq will be required to be equal to one, but l_trcdm2dc will be true because ln_dm2dc is true. If ln_dm2dc is false, ncpl_qsr_freq will have any value larger than zero, so l_trcdm2dc will be false if it is equal to one, and true otherwise -> is this the expected behaviour?
  • We couple to an atmospheric model, but we do not couple to any radiation term (for example because we provide these fields via forcing using ln_mixcpl=.true.), or we do not couple to an atmospheric model at all, but just to a wave model. In this case ncpl_qsr_freq will be a NaN (thanks to the line 'ncpl_qsr_freq = 86400 / ncpl_qsr_freq' in the sbc_cpl_rcv routine), and the value of l_trcdm2dc will possibly be TRUE independently of ln_dm2dc (but might be it is compiler dependent). Is in this case where we had a segmentation fault, and where I see the main problem is.

comment:3 Changed 4 years ago by jcastill

  • Status changed from new to assigned

comment:4 Changed 4 years ago by smasson

The first 2 points are ok to me.

You are right for the last point. I propose that in sbccpl, we replace

ncpl_qsr_freq = 86400 / ncpl_qsr_freq

by

IF( ncpl_qsr_freq /= 0) THEN   ;    ncpl_qsr_freq = 86400 / ncpl_qsr_freq
ELSE                           ;    ncpl_qsr_freq = 1   ! so l_trcdm2dc is controlled only by ln_dm2dc
ENDIF

comment:5 Changed 4 years ago by smasson

  • Cc smasson added

comment:6 Changed 4 years ago by jcastill

Another solution would be just to have in sbccpl:

IF( ncpl_qsr_freq /= 0) ncpl_qsr_freq = 86400 / ncpl_qsr_freq

and in trc_stp_ctl:

l_trcdm2dc = ln_dm2dc .OR. ( ln_cpl .AND. ncpl_qsr_freq /= 1 .AND. ncpl_qsr_freq /= 0 )
l_trcdm2dc = l_trcdm2dc .AND. .NOT. l_offline

so that ncpl_qsr_freq has a value of zero if there is no radiation coupling, which is the logical value in this case.

comment:7 Changed 4 years ago by smasson

ok perfect.
seb

comment:8 Changed 4 years ago by mathiot

In 12550:

Error: Failed to load processor CommitTicketReference
No macro or processor named 'CommitTicketReference' found

comment:9 follow-up: Changed 4 years ago by smasson

do we really need to do a branch for such a small bugfix?

comment:10 Changed 4 years ago by mathiot

In 12552:

Error: Failed to load processor CommitTicketReference
No macro or processor named 'CommitTicketReference' found

comment:11 in reply to: ↑ 9 Changed 4 years ago by mathiot

Replying to smasson:

do we really need to do a branch for such a small bugfix?

This is the process we have for all inputs into our operational or official releases of the MetO system (NEMO, UM, DA, post-processing ...). I agree this is a small fix but as I tried to force people to stick with it, I have to do it myself too for NEMO.

comment:12 Changed 4 years ago by smasson

hum... not sure to like the idea to have all theses branches for almost nothing... When do we merge them? delete them?

How do we know that some bugfixes have already been solved in the trunk when we work with it?

In any case, how do you proceed for the same bugfix to be applied on the 4.0-HEAD?
I think the idea was to use 4.0-HEAD to directly incorporate bugfixes which solution was not "too" complicated...
We won't create branches from the 4.0-HEAD and merge them later into the 4.0-HEAD...

comment:13 Changed 4 years ago by mathiot

  • Cc andmirek added

Our process is (not only for our GO package branch):

  • 1 open ticket to described the bug/suggested fix ...
  • 2 create branch
  • 3 implement fix
  • 4 testing fix
  • 5 review branch
  • 6 once review OK, merge branch into trunk (in this case after Juan has review it)
  • 7 delete branch (not necessary, for example the UM keep all the branches in case someone want to implement a specific fix into its branch)
  • 8 close ticket

To know if it has been included into the trunk or NEMO 4.0-HEAD, you can simply check the ticket (comment or status of the ticket).

For NEMO 4.0-HEAD, I will have to follow the same process from point 2 to 7 and close the ticket once it is done for the trunk and v4.0-HEAD. Some people suggested to me it should be from point 1 to 8.

comment:14 Changed 4 years ago by mathiot

Test by Juan successful.

Sette results:

Current code is : NEMO/branches/2020/ticket2396 @ r12618  ( last change @ r12552 )

SETTE validation report generated for : 

       NEMO/branches/2020/ticket2396 @ r12552 (last changed revision)

       on XC40_METO_IFORT arch file


!!---------------1st pass------------------!!

   !----restart----!   
WGYRE_PISCES_ST              run.stat    restartability  passed :  12552
WGYRE_PISCES_ST              tracer.stat restartability  passed :  12552
WORCA2_ICE_PISCES_ST         run.stat    restartability  passed :  12552
WORCA2_ICE_PISCES_ST         tracer.stat restartability  passed :  12552
WORCA2_OFF_PISCES_ST         tracer.stat restartability  passed :  12552
WAMM12_ST                    run.stat    restartability  passed :  12552
WORCA2_SAS_ICE_ST            run.stat    restartability  passed :  12552
WAGRIF_DEMO_ST               run.stat    restartability  passed :  12552
WSPITZ12_ST                  run.stat    restartability  passed :  12552
WISOMIP_ST                   run.stat    restartability  passed :  12552
WOVERFLOW_ST                 run.stat    restartability  passed :  12552
WLOCK_EXCHANGE_ST            run.stat    restartability  passed :  12552
WVORTEX_ST                   run.stat    restartability  passed :  12552
WICE_AGRIF_ST                run.stat    restartability  passed :  12552

   !----repro----!   
WGYRE_PISCES_ST              run.stat    reproducibility passed :  12552
WGYRE_PISCES_ST              tracer.stat reproducibility passed :  12552
WORCA2_ICE_PISCES_ST         run.stat    reproducibility passed :  12552
WORCA2_ICE_PISCES_ST         tracer.stat reproducibility passed :  12552
WORCA2_OFF_PISCES_ST         tracer.stat reproducibility passed :  12552
WAMM12_ST                    run.stat    reproducibility passed :  12552
WORCA2_SAS_ICE_ST            run.stat    reproducibility passed :  12552
WORCA2_ICE_OBS_ST            run.stat    reproducibility passed :  12552
WAGRIF_DEMO_ST               run.stat    reproducibility passed :  12552
WSPITZ12_ST                  run.stat    reproducibility passed :  12552
WISOMIP_ST                   run.stat    reproducibility passed :  12552
WVORTEX_ST                   run.stat    reproducibility passed :  12552
WICE_AGRIF_ST                run.stat    reproducibility passed :  12552

   !----agrif check----!   
ORCA2 AGRIF vs ORCA2 NOAGRIF run.stat    unchanged  -    passed :  12552 12552

   !----result comparison check----!   

check result differences between :
VALID directory : /projects/jmmp/pmathiot/NEMO/NEMO_dev/NEMO/branches/2020/ticket2396/NEMO_VALIDATION at rev 12552
and
REFERENCE directory : /projects/jmmp/pmathiot/NEMO/NEMO_dev/NEMO/trunk/NEMO_VALIDATION at rev 12549

WGYRE_PISCES_ST       run.stat    files are identical  
WGYRE_PISCES_ST       tracer.stat files are identical  
WORCA2_ICE_PISCES_ST  run.stat    files are identical  
WORCA2_OFF_PISCES_ST  tracer.stat files are identical  
WAMM12_ST             run.stat    files are identical  
WISOMIP_ST            run.stat    files are identical  
WORCA2_SAS_ICE_ST     run.stat    files are identical  
WAGRIF_DEMO_ST        run.stat    files are identical  
WSPITZ12_ST           run.stat    files are identical  
WISOMIP_ST            run.stat    files are identical  
WVORTEX_ST            run.stat    files are identical  
WICE_AGRIF_ST         run.stat    files are identical  

comment:15 Changed 4 years ago by mathiot

In 12619:

Error: Failed to load processor CommitTicketReference
No macro or processor named 'CommitTicketReference' found

comment:16 Changed 4 years ago by mathiot

In 12620:

Error: Failed to load processor CommitTicketReference
No macro or processor named 'CommitTicketReference' found

comment:17 Changed 4 years ago by mathiot

  • Resolution set to fixed
  • Status changed from assigned to closed

Fix at revision r12620.

comment:18 Changed 4 years ago by mathiot

In 12621:

Error: Failed to load processor CommitTicketReference
No macro or processor named 'CommitTicketReference' found

comment:19 Changed 2 years ago by nemo

  • Keywords v4.0 added
Note: See TracTickets for help on using tickets.