#2128 closed Defect (fixed)
Uniqueness of iceberg labels cannot be guaranteed if restarting from a collated restart file
Reported by: | acc | Owned by: | acc |
---|---|---|---|
Priority: | low | Milestone: | 2018 release-4.0 |
Component: | ICB | Version: | trunk |
Severity: | minor | Keywords: | icebergs v4.0 |
Cc: | | | |
Description
Context
There is potential for icebergs to be labelled non-uniquely if a run is restarted from a collated iceberg restart file. This does not affect the heat and mass transport by the icebergs, but it can make it impossible to reconstruct all individual trajectories from the time-mean datasets.
Analysis
A bit of background information is needed to explain:
- When new icebergs are created following a calving event, the host processor adds a new entry to its linked list and assigns an iceberg_number to the new structure. Within that structure it also sets properties such as the year and the day of the calving event (the day expressed as a real proportion of the current year).
- The first iceberg created on each processor is labelled with that processor's 'narea' value, and the iceberg_number is incremented by the total number of processes (jpnij) each time a subsequent iceberg is created. Thus each iceberg_number should be unique (see the sketch after this list).
- This uniqueness is relied upon whenever a time-series of distributed output is collected together. An iceberg trajectory that crosses between processors can easily be reconstructed by splicing together the different entries from different files in time order.
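A minimal sketch of this numbering scheme, written in Python for illustration rather than in the NEMO Fortran; 'narea' and 'jpnij' follow the ticket's naming, everything else is hypothetical:

```python
# Illustrative sketch only (not NEMO code): how striding iceberg_number by
# jpnij, starting from narea, keeps the labels disjoint across processors.

def label_sequence(narea, jpnij, n_new):
    """Return the iceberg_number values one processor would assign."""
    labels = []
    current = narea              # first iceberg on this processor gets narea
    for _ in range(n_new):
        labels.append(current)
        current += jpnij         # stride by the total number of processes
    return labels

# With jpnij = 4 the four processors generate disjoint label sets:
#   narea 1 -> [1, 5, 9], narea 2 -> [2, 6, 10], and so on.
for narea in range(1, 5):
    print(narea, label_sequence(narea, jpnij=4, n_new=3))
```

Any two labels issued by different processors differ modulo jpnij, which is why uniqueness holds without any inter-processor communication.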
However....
If iceberg trajectory restart data is collected into a global dataset (by, for example, using tools/REBUILD_NEMO/icb_combrst.py) and the collated dataset is used to restart the model, several things happen in the current code:
- Information about the last number used on each processor is lost (the 'kount' array, a.k.a. 'num_bergs', will just hold the values from the first dataset collated).
- This means the starting offset will be the same for every processor on any subsequent restart, so, for example, the second new iceberg generated by area 1 will have the same iceberg_number as the first generated by area 2, and so on (see the sketch after this list).
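A toy continuation of the sketch above, assuming every processor resumes counting from the same stored offset after a collated restart; the exact pairing of duplicates depends on the code path, but overlapping labels are unavoidable once the offsets no longer carry the narea information:

```python
# Illustrative sketch only: after a collated restart every processor holds the
# same num_bergs offset, so the label streams are no longer disjoint.

def labels_after_collated_restart(shared_offset, jpnij, n_new):
    """Labels a processor assigns when it resumes from a shared offset."""
    return [shared_offset + k * jpnij for k in range(1, n_new + 1)]

jpnij = 4
shared_offset = 9    # e.g. the last number recorded by the first collated area
for narea in range(1, jpnij + 1):
    # Every area now produces the identical sequence 13, 17, 21 -> duplicates.
    print(narea, labels_after_collated_restart(shared_offset, jpnij, n_new=3))
```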
In terms of mass and heat transports this isn't an error: each iceberg is a distinct entry in a linked list and duplicate iceberg_numbers are not an issue there. However, if the time-series datasets are collated into global sets then it is probable that trajectories will be created by splicing together sections from different icebergs that share the same number. An attempt to plot iceberg trajectories from these files will show some of them apparently teleporting across the globe at various stages. It may be possible to reconstruct the correct trajectories by using additional information to distinguish identically labelled icebergs, but this is of no use once the distributed data has been discarded.
Recommendation
This issue could be avoided by storing the num_bergs value for each area in the collated dataset. This will only help, though, if the restart uses the same processor decomposition, and the purpose of creating a global restart dataset is often precisely to change the decomposition.
A more robust solution is to detect invalid num_bergs values when a restart is read and to replace them with appropriate values. This is possible because each num_bergs value should lie in the set {N*jpnij + narea}, where N is an integer >= -1. If invalid values are detected then the smallest value that can guarantee that all new icebergs are uniquely labelled is MAX(iceberg_number) - jpnij + narea. This requires a global mpp_sum call for the error flag, followed (if necessary) by a mpp_max call to determine the current maximum iceberg_number. A sketch of the check and repair follows.
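A minimal sketch of that check and repair, in Python rather than the actual icbrst.F90 change; the function name and the way the global maximum is supplied are assumptions made for illustration (in NEMO the maximum would come from an mpp_max reduction):

```python
# Illustrative sketch only (not the icbrst.F90 fix): validate the num_bergs
# counter read from a restart and repair it if it cannot be trusted.

def repair_num_bergs(num_bergs, narea, jpnij, max_existing_number):
    """Return a num_bergs value guaranteed not to produce duplicate labels."""
    # A valid counter satisfies num_bergs = N*jpnij + narea with integer N >= -1.
    valid = (num_bergs - narea) % jpnij == 0 and num_bergs >= narea - jpnij
    if valid:
        return num_bergs
    # Smallest safe value: the next label issued (num_bergs + jpnij) becomes
    # max_existing_number + narea, which exceeds every label already in use
    # and differs between areas.
    return max_existing_number - jpnij + narea

# Example: jpnij = 4 and a collated restart left every area with num_bergs = 9
# (only valid for narea 1).  Areas 2-4 are repaired against MAX(iceberg_number) = 42.
for narea in range(1, 5):
    print(narea, repair_num_bergs(9, narea, jpnij=4, max_existing_number=42))
```

With the repaired counters, each area next issues a label greater than the current global maximum and distinct from every other area's labels, so uniqueness is restored regardless of the decomposition used for the restart.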
Commit History (1)
Changeset | Author | Time | ChangeLog |
---|---|---|---|
10065 | acc | 2018-08-23T18:14:54+02:00 | Additional sanity checks in icbrst.F90 to avoid the risk of generating duplicate iceberg_numbers when restarting from a collated iceberg restart file. See ticket #2128 |
Change History (3)
comment:1 Changed 6 years ago by acc
comment:2 Changed 6 years ago by davestorkey
- Resolution set to fixed
- Status changed from new to closed
I've reviewed this change and I think it solves the problem.
comment:3 Changed 3 years ago by nemo
- Keywords v4.0 added
In 10065:
Additional sanity checks in icbrst.F90 to avoid the risk of generating duplicate iceberg_numbers when restarting from a collated iceberg restart file. See ticket #2128