[[PageOutline]]
Last edited [[Timestamp]][[BR]]

'''Author''' : jenniewaters

'''ticket''' : #1464

'''Branch''' : [https://forge.ipsl.jussieu.fr/nemo/browser/branches/2015/dev_r5776_UKMO2_OBS_efficiency_improvs dev_r5776_UKMO2_OBS_efficiency_improvs]

----
=== Description ===

This branch has been created for developments to the observation operator to improve memory use. The observation operator uses global arrays, which causes memory inefficiencies when NEMO is run on a large number of processors. This is a particular obstacle for high resolution models such as ORCA12. The proposed changes remove all use of global arrays when ln_grid_global=.FALSE. is set. Note that global arrays are still used if ln_grid_global=.TRUE. is set, as removing them in that case would require a much more extensive rewrite of the code.

----
=== Testing ===

Testing could consider (where appropriate) other configurations in addition to NVTK.

||SETTE Tested|| YES ||
||Other model configurations|| YES ||
||Processor configurations tested||[ 2 by 8] and [4 by 4 ]||
||If adding new functionality please confirm that the [[BR]]New code doesn't change results when it is switched off [[BR]]and !''works!'' when switched on|| YES ||

(Answering UNSURE is likely to generate further questions from reviewers.)

'Please add further summary details here'

 * All SETTE tests pass.
 * Tested SETTE configs: AMM12 AMM12_32 AMM12_LONG C1D_PAPA GYRE GYRE_4 GYRE_BFM GYRE_LONG GYRE_PISCES GYRE_XIOS ISOMIP_4 ISOMIP_LONG ORCA2AGUL_1_2 ORCA2LIM3_16 ORCA2LIM3_LONG ORCA2LIMPIS_16 ORCA2LIMPIS_LONG ORCA2OFFPIS_16 ORCA2OFFPIS_LONG ORCA2_LIM ORCA2_LIM3 ORCA2_LIM_CFC_C14b ORCA2_LIM_OBS ORCA2_LIM_PISCES ORCA2_OFF_PISCES ORCA2_SAS_LIM SAS_32 SAS_LONG
 * Testing focused on the outputs of the ORCA2_LIM_OBS test, as this is the only test that uses the observation operator (obsoper).
 * The changes were also tested in an older version of the code at ORCA12 and were found to reduce the memory use (when ln_grid_global=.FALSE.) by more than 50%.
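To illustrate why per-process global arrays become costly at high processor counts (the motivation given in the Description), the following is a rough sketch and is not taken from the branch. The grid size, the decomposition and the function name `per_process_bytes` are hypothetical, halo points are ignored, and only a single 2D double-precision field is counted.

```python
def per_process_bytes(nx_glo, ny_glo, nproc_i, nproc_j, bytes_per_value=8):
    """Return (global_copy_bytes, local_subdomain_bytes) for one 2D field
    held by a single process. Halo points are ignored (assumption)."""
    global_bytes = nx_glo * ny_glo * bytes_per_value
    # Ceiling division gives the largest local subdomain in each direction.
    nx_loc = -(-nx_glo // nproc_i)
    ny_loc = -(-ny_glo // nproc_j)
    local_bytes = nx_loc * ny_loc * bytes_per_value
    return global_bytes, local_bytes


# Hypothetical high-resolution grid and processor decomposition.
glo, loc = per_process_bytes(nx_glo=4000, ny_glo=3000, nproc_i=32, nproc_j=29)

print(f"global copy per process : {glo / 1e6:.1f} MB")
print(f"local subdomain only    : {loc / 1e6:.3f} MB")
print(f"ratio global/local      : {glo / loc:.0f}x")
```

The size of a global copy does not shrink as processors are added, so its total cost grows linearly with the processor count, whereas the local subdomain shrinks roughly in proportion to it.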
=== Bit Comparability ===
||Does this change preserve answers in your tested standard configurations (to the last bit) ?|| YES, when ln_grid_global=.FALSE. (see below for full details) ||
||Does this change bit compare across various processor configurations?|| YES ||
||Is this change expected to preserve answers in all possible model configurations?|| YES ||
||Is this change expected to preserve all diagnostics? [[BR]]!,,!''Preserving answers in model runs does not necessarily imply preserved diagnostics. !''|| YES - see below for more detail ||

If you answered !'''NO!''' to any of the above, please provide further details:

 * solver.stat and ocean.output are completely unchanged when ln_grid_global=.TRUE. is used.
 * solver.stat is unchanged when ln_grid_global=.FALSE. is used.
 * There are very small differences (order {{{10^-12}}}) in the "overall RMS obs minus model of good observations" in ocean.output when ln_grid_global=.FALSE. is used in the new code. However, these differences are of the same magnitude as the differences in the trunk output between runs with ln_grid_global=.FALSE. and runs with ln_grid_global=.TRUE.
 * As an additional check, the obs minus background stats calculated directly from the output feedback files are reproduced by the new code to 4 significant figures (they are likely to agree to more than 4 significant figures; this is the precision which we use as standard).
 * There are no differences in the model output, although differences in the order/distribution of the observations may lead to very tiny differences in the calculation of stats (online or offline).
 * Which routine(s) are causing the difference? '''obs_mpp_find_obs_proc_local'''
 * Why the changes are not protected by a logical switch or new section-version?
'''I have decided to replace the obs_mpp_find_obs_proc routine with obs_mpp_find_obs_proc_local, as any differences in the distribution of observations are arbitrary and should have no scientific impact.'''
 * What is needed to achieve regression with the previous model release (e.g. a regression branch, hand-edits etc). If this is not possible, explain why not. ''' NA '''
 * What do you expect to see occur in the test harness jobs? ''' NA '''
 * Which diagnostics have you altered and why have they changed? Please add details here........ ''' NA '''

----
=== System Changes ===
||Does your change alter namelists?|| NO ||
||Does your change require a change in compiler options?|| NO ||

If any of these apply, please document the changes required here.......

----
=== Resources ===

These changes improve memory use when a large number of processors are used. For ORCA12 on 928 processors, the memory use is reduced by more than 50%.[[BR]]
[[BR]]
An ORCA025 run on 192 processors with ln_grid_global=.TRUE. takes 258 seconds of wall-clock time and uses 244Gb of memory.[[BR]]
An ORCA025 run on 192 processors with ln_grid_global=.FALSE. and these code changes takes 265 seconds of wall-clock time and uses 221Gb of memory.

----
=== IPR issues ===
||Has the code been wholly (100%) produced by NEMO developers staff working exclusively on NEMO?|| YES ||

If No:
 * Identify the collaboration agreement details
 * Ensure the code routine header is in accordance with the agreement (Copyright/Redistribution etc).

Add further details here if required..........
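The Bit Comparability notes attribute the tiny changes in the RMS statistic to the order/distribution of the observations across processors. The sketch below is only an illustration of that effect and does not use NEMO code: the values are synthetic, the two distributions and the helper names (`rms_by_process`, `round_sig`) are hypothetical, and the size of any difference depends on the data, so it is not expected to match the {{{10^-12}}} quoted above.

```python
import math
import random


def rms_by_process(values, owner, nproc):
    """RMS where each 'process' accumulates its own sum of squares and the
    partial sums are then combined, mimicking a distributed reduction."""
    partial = [0.0] * nproc
    for v, p in zip(values, owner):
        partial[p] += v * v
    return math.sqrt(sum(partial) / len(values))


def round_sig(x, sig=4):
    """Round x to `sig` significant figures."""
    if x == 0.0:
        return 0.0
    return round(x, sig - 1 - math.floor(math.log10(abs(x))))


rng = random.Random(1464)  # fixed seed so the sketch is repeatable
obs_minus_model = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # synthetic data

nproc = 16
# Two arbitrary ways of assigning the same observations to processes.
owner_a = [i % nproc for i in range(len(obs_minus_model))]
owner_b = [rng.randrange(nproc) for _ in obs_minus_model]

rms_a = rms_by_process(obs_minus_model, owner_a, nproc)
rms_b = rms_by_process(obs_minus_model, owner_b, nproc)

# Floating-point addition is not associative, so the two results may differ
# in their last few bits even though the input values are identical.
print("RMS, distribution A :", repr(rms_a))
print("RMS, distribution B :", repr(rms_b))
print("relative difference :", abs(rms_a - rms_b) / rms_a)
print("equal to 4 sig. fig.:", round_sig(rms_a) == round_sig(rms_b))
```

Any difference of this kind comes purely from the summation order, which is why a comparison at a fixed number of significant figures is a reasonable check when the distribution of observations changes.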