Opened 6 years ago

Closed 3 years ago

#1464 closed Enhancement (fixed)

Efficiency improvements in the OBS operator.

Reported by: jenniewaters Owned by: jenniewaters
Priority: low Milestone: 2015 WP
Component: OCE Version: trunk
Severity: Keywords:
Cc: Review:
MP ready?:
Progress:

Description

The OBS operator needs to be made more efficient for use with higher resolution models (e.g. ORCA12). In particular, the use of global arrays should be avoided.

Commit History (0)

(No commits)

Change History (9)

comment:1 Changed 5 years ago by jenniewaters

  • Owner changed from frjj to jenniewaters

comment:3 Changed 5 years ago by jenniewaters

This branch has been created for developments to the observation operator to improve memory use. The observation operator uses global arrays, which causes memory inefficiencies when NEMO is run on a large number of processors. This is a particular obstacle for high resolution models such as ORCA12.
This problem was first raised by Clement Bricaud in #1186.
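
To give a rough sense of scale, the sketch below compares the per-process memory of one 2D double-precision field held as a global array with the same field held only on a local subdomain. The grid size (an ORCA12-like 4322 x 3059 horizontal grid), the 32 x 32 decomposition, the halo width and the function names are all assumptions chosen for illustration. They are not taken from this ticket or from the NEMO code.

```python
# Rough per-process memory for ONE 2D double-precision field.
# All sizes below are illustrative assumptions, not values from the ticket.
BYTES_PER_REAL = 8


def global_array_bytes(nx, ny):
    """Bytes used when every process holds the full horizontal grid."""
    return nx * ny * BYTES_PER_REAL


def local_array_bytes(nx, ny, npx, npy, halo=1):
    """Bytes used when a process holds only its subdomain plus a halo."""
    local_nx = -(-nx // npx) + 2 * halo  # ceiling division, then add halo
    local_ny = -(-ny // npy) + 2 * halo
    return local_nx * local_ny * BYTES_PER_REAL


nx, ny = 4322, 3059  # assumed ORCA12-like horizontal grid
npx, npy = 32, 32    # hypothetical decomposition over 1024 processes

per_proc_global = global_array_bytes(nx, ny)
per_proc_local = local_array_bytes(nx, ny, npx, npy)
# Summed over all processes, the global-array cost grows with the process count.
total_global = per_proc_global * npx * npy

print(f"global array per process: {per_proc_global / 2**20:.1f} MiB")
print(f"local array per process:  {per_proc_local / 2**10:.1f} KiB")
print(f"global arrays, all procs: {total_global / 2**30:.1f} GiB")
```

Under these assumptions the per-process cost of a global array does not shrink as processes are added, so the total footprint grows linearly with the processor count, whereas the local array shrinks with the subdomain.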

Last edited 5 years ago by jenniewaters

comment:4 Changed 5 years ago by jenniewaters

Some changes were made to the branch at r5822 to get the SETTE tests working correctly for the obs test.
https://forge.ipsl.jussieu.fr/nemo/changeset?reponame=&new=5822%40branches%2F2015%2Fdev_r5776_UKMO2_OBS_efficiency_improvs%2FNEMOGCM&old=5776%40trunk%2FNEMOGCM

These changes are unrelated to the NEMO code changes, so for review the obs operator NEMO code changes should be compared against r5822.

comment:5 Changed 5 years ago by jenniewaters

The changes to the observation operator code are at r5943 and can be viewed here:
https://forge.ipsl.jussieu.fr/nemo/changeset?reponame=&new=5943%40branches%2F2015%2Fdev_r5776_UKMO2_OBS_efficiency_improvs%2FNEMOGCM&old=5822%40branches%2F2015%2Fdev_r5776_UKMO2_OBS_efficiency_improvs%2FNEMOGCM

In tests with an earlier version of NEMO, these code changes gave a small reduction in memory use and wall-clock time at ORCA025 and a significant reduction in memory use (more than halved) at ORCA12.

The code changes have been tested with SETTE with both ln_grid_global=.TRUE. and ln_grid_global=.FALSE. and have passed all the tests. With ln_grid_global=.TRUE. the results are completely unchanged.

When ln_grid_global=.FALSE. the solver.stat outputs in the ORCA2_LIM_OBS test are identical to the outputs from the trunk. There are some small differences in the "overall RMS obs minus model of good observations" in ocean.output, which I attribute to some observations potentially being assigned to a different processor with the new code. I have checked the full feedback files, and the global obs minus background statistics are identical for the new code and the trunk code.
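
The reasoning above can be illustrated with a small sketch: a statistic built by combining per-process partial sums does not depend on which process holds which observation, while any single process's own statistic does. The departure values, the two assignments and the function names are hypothetical. The sketch makes no claim about how NEMO computes the value written to ocean.output.

```python
import math


def local_stats(values):
    """Per-process partial sums: (sum of squares, number of observations)."""
    return sum(v * v for v in values), len(values)


def global_rms(partitions):
    """Combine the per-process partial sums into one global RMS."""
    stats = [local_stats(p) for p in partitions]
    total_sq = sum(s for s, _ in stats)
    total_n = sum(n for _, n in stats)
    return math.sqrt(total_sq / total_n)


def local_rms(values):
    """RMS of the observations held by a single process."""
    total_sq, n = local_stats(values)
    return math.sqrt(total_sq / n)


# Hypothetical obs-minus-model departures.
departures = [0.5, -1.0, 0.25, 2.0, -0.75, 1.5]

# The same observations assigned to three processes in two different ways.
assignment_a = [departures[0:2], departures[2:4], departures[4:6]]
assignment_b = [departures[0:1], departures[1:5], departures[5:6]]

rms_a = global_rms(assignment_a)
rms_b = global_rms(assignment_b)

# The statistic seen by the first process alone differs between assignments.
first_proc_a = local_rms(assignment_a[0])
first_proc_b = local_rms(assignment_b[0])
```

Here `rms_a` and `rms_b` agree, while `first_proc_a` and `first_proc_b` do not. With real data the global values may still differ at rounding level, because floating-point sums depend on the order of addition.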

comment:6 Changed 5 years ago by jenniewaters

The wiki review ticket can be found here:
https://forge.ipsl.jussieu.fr/nemo/wiki/ticket/1464

comment:7 Changed 5 years ago by jenniewaters

Questions from Clement Bricaud:

  1. INCLUDE 'mpif.h' is not necessary in obs_mpp_find_obs_proc_local, is it?
  2. After your changes, the memory inefficiency with ln_grid_global = T is still present for large configurations. I think you could add a warning for this case.
  3. obs_mpp_find_obs_proc no longer seems to be called. Should it be removed?
  4. Did you compare the elapsed time for ln_grid_global = T and ln_grid_global = F? Is it necessary to keep this method, given that ln_grid_global = T could lead to a memory crash?

comment:8 Changed 5 years ago by jenniewaters

My response:

  1. Yes, you are correct. I have removed the unnecessary INCLUDE 'mpif.h'.
  2. I have included a warning for the case when ln_grid_global=T.
  3. I have removed the original code and renamed my new subroutine as obs_mpp_find_obs_proc. I have added a comment in the history to document this change.
  4. For ORCA025, the elapsed time is very similar for ln_grid_global = T and ln_grid_global = F. I think we still need to keep the ln_grid_global=T option, as there are configurations where it is more efficient. The ln_grid_global=T option is better for load balancing and would therefore be preferable if (for example) you were only reading in sea ice observations in a lower resolution model. We only really see an advantage to ln_grid_global=F when we exceed 200 processors.
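
The load-balancing point in item 4 can be sketched with a toy example. Below, observations confined to high latitudes are assigned to processes either by the latitude band that contains them or evenly regardless of location. Both assignment schemes, the latitude-band decomposition, the observation positions and all names are assumptions for illustration only. They are not the algorithms NEMO uses for ln_grid_global=T or ln_grid_global=F.

```python
from collections import Counter


def assign_by_domain(latitudes, n_procs, lat_min=-90.0, lat_max=90.0):
    """Assign each observation to the latitude band (process) containing it."""
    width = (lat_max - lat_min) / n_procs
    return [min(int((lat - lat_min) / width), n_procs - 1) for lat in latitudes]


def assign_evenly(latitudes, n_procs):
    """Spread observations round-robin, ignoring where they are located."""
    return [i % n_procs for i in range(len(latitudes))]


def max_load(assignment, n_procs):
    """Largest number of observations handled by any single process."""
    counts = Counter(assignment)
    return max(counts.get(p, 0) for p in range(n_procs))


# Hypothetical sea ice observations: 100 in each hemisphere, poleward of 60 degrees.
latitudes = [60.0 + 0.25 * i for i in range(100)]
latitudes += [-60.0 - 0.25 * i for i in range(100)]
n_procs = 8

domain_max = max_load(assign_by_domain(latitudes, n_procs), n_procs)
even_max = max_load(assign_evenly(latitudes, n_procs), n_procs)

print(f"busiest process, by domain: {domain_max} observations")
print(f"busiest process, evenly:    {even_max} observations")
```

In this toy case the domain-based assignment leaves the mid-latitude processes idle and concentrates the work on the polar ones, which is the kind of imbalance an even distribution avoids.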

The updates to the code can be seen here:
https://forge.ipsl.jussieu.fr/nemo/changeset?reponame=&new=6018%40branches%2F2015%2Fdev_r5776_UKMO2_OBS_efficiency_improvs%2FNEMOGCM&old=5943%40branches%2F2015%2Fdev_r5776_UKMO2_OBS_efficiency_improvs%2FNEMOGCM

I have tested these changes in SETTE and they produce the same results as the previous revision of the code.

comment:9 Changed 3 years ago by timgraham

  • Resolution set to fixed
  • Status changed from new to closed

This was merged into the trunk in 2015.
