Opened 7 years ago

Closed 3 years ago

#108 closed enhancement (wontfix)

Load_balance_orchidee.dat

Reported by: jgipsl Owned by: jgipsl
Priority: minor Milestone: ORCHIDEE 2.0
Component: Driver files Version:
Keywords: Cc:

Description

The file Load_balance_orchidee.dat is created at run time and should be kept as an input file for the next execution. This has to be done in ORCHIDEE_OL configurations.

Attachments (3)

run.card.NoLoad (3.5 KB) - added by jgipsl 3 years ago.
run.card.WithLoad (5.3 KB) - added by jgipsl 3 years ago.
run.card_highresol.WithLoad (4.2 KB) - added by jgipsl 3 years ago.

Change History (7)

comment:1 Changed 5 years ago by jgipsl

  • Priority changed from critical to minor
  • Status changed from new to accepted

Tests need to be done. Reusing the file might improve performance.

comment:2 Changed 3 years ago by jgipsl

Background

The model always tries to read the file Load_balance_orchidee.dat, and it always writes the file at the end of the execution (see for example ORCHIDEE trunk rev 4587).

If the file is not found at the beginning of the execution, the grid cells are distributed uniformly between the processors. If the file exists, the distribution stored in the file is used. At the end of the execution, the time spent by each process is compared and the distribution is readjusted to give a better balance in the next execution. The new distribution is written to the file (the file is overwritten if it already exists).
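The readjustment described above can be illustrated with a minimal Python sketch. This is not the ORCHIDEE routine: the function names `uniform_distribution` and `rebalance` are hypothetical, and the rule used here (give each process a share of grid cells proportional to its measured throughput) is an assumption chosen for illustration only.

```python
def uniform_distribution(n_cells, n_procs):
    """Fallback when no load-balance file exists: split cells as evenly as possible."""
    base, extra = divmod(n_cells, n_procs)
    return [base + 1 if p < extra else base for p in range(n_procs)]


def rebalance(counts, times):
    """Return a new cell count per process from the previous counts and elapsed times.

    Assumption (not taken from ORCHIDEE): each process keeps the same
    throughput (cells per unit time), so its new share is proportional to it.
    All counts and times must be positive.
    """
    if len(counts) != len(times):
        raise ValueError("counts and times must have the same length")
    total = sum(counts)
    speeds = [c / t for c, t in zip(counts, times)]
    total_speed = sum(speeds)
    ideal = [total * s / total_speed for s in speeds]
    new_counts = [int(x) for x in ideal]  # round down first
    # Hand the leftover cells to the processes with the largest remainders,
    # so that the total number of cells is preserved.
    leftover = total - sum(new_counts)
    order = sorted(range(len(ideal)), key=lambda p: ideal[p] - new_counts[p], reverse=True)
    for p in order[:leftover]:
        new_counts[p] += 1
    return new_counts


first_run = uniform_distribution(20, 2)       # no file yet: uniform split
next_run = rebalance(first_run, [2.0, 1.0])   # process 0 was twice as slow
```

In this example the slower process receives fewer cells in the next execution, which is the kind of effect the file makes possible when it is reused.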

Performance is better when the file exists, and it seems to improve further once the file has been reused about 4 times.

The attached files show the execution time for a simulation using CRU-NCEP/v5.3.2/twodeg forcing files on 31 MPI processes (+1 MPI process for the XIOS server).
Case 1) Load_balance_orchidee.dat is never kept when starting a new execution, see attached file run.card.NoLoad
Case 2) Load_balance_orchidee.dat from the previous year is used, except for the 1st and 11th year, see attached file run.card.WithLoad

Case 2 is currently the default when using ORCHIDEE_OL configurations with libIGCM and PeriodNb=10. This is because all files that are produced but not saved (not specified in orchidee_ol.card, sechiba.card or stomate.card) are left in the run directory. The file is therefore reused until a new run directory is created (after PeriodNb executions).
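As a small illustration of the pattern above, the following Python sketch lists which periods start without the file. It assumes, as described in this comment, that a new run directory is created after every PeriodNb executions; the function name `starts_without_file` is hypothetical and not part of libIGCM.

```python
def starts_without_file(cumul_period, period_nb):
    """True if this period (1-based) is the first one in a new run directory,
    so that no Load_balance_orchidee.dat from a previous execution is present."""
    return (cumul_period - 1) % period_nb == 0


# Periods out of 20 that start without the file when PeriodNb=10
cold_starts = [p for p in range(1, 21) if starts_without_file(p, 10)]
print(cold_starts)
```

With PeriodNb=10 this gives periods 1 and 11, matching the years in case 2 that run without a previous file.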

Conclusion

If Load_balance_orchidee.dat is reused, execution time is reduced. The gain is lost each time a new run directory is created, because the file is not carried over to it. Choosing a suitable PeriodNb when running ORCHIDEE_OL configurations with libIGCM is therefore already a way to reuse the Load_balance_orchidee.dat file.

comment:3 Changed 3 years ago by jgipsl

See also the RealCpuTime at a higher resolution. run.card_highresol.WithLoad comes from a global run at half-degree resolution (WFDEI_GPCC forcing). CumulPeriod 1 and 11 start without the file Load_balance_orchidee.dat (PeriodNb=10). The corresponding RealCpuTime values are higher, and they are also higher for the periods that follow (2 and 12).

Note that the results are strictly the same with or without the Load_balance_orchidee.dat.

comment:4 Changed 3 years ago by jgipsl

  • Resolution set to wontfix
  • Status changed from accepted to closed

Test: storing load_balance in the restart file instead of writing a text file
I tested putting the information in the restart file instead of writing a text file Load_Balance_data.txt. Writing is easy to add with restput in the driver restart file, but reading is a problem, because reading of the restart file is initialized after the load balance information is needed. The text file is read in the subroutine Read_Load_balance, which is called from Init_orchidee_data_para_driver, which in turn is called from forcing_info.

In dim2_driver:

CALL forcing_info (filename, iim, jjm, llm, tm, date0, dt_force, force_id)
...
CALL restini &
    (driv_restname_in, iim_g, jjm_g, lon_g, lat_g, llm, tmplev, &
     driv_restname_out, itau_dep_rest, date0_rest, dt_rest, rest_id, driver_reset_time)

The call to restini cannot be made before forcing_info because the dimensions (iim_g, jjm_g, ...) are not yet known, and rest_id is not known until restini has been called. Therefore it is not easy to read load_balance from the restart file.

Conclusion of the ticket
It is important to read the Load_Balance_data.txt file to get good performance. With a high PeriodNb this happens automatically. It would have been good to store the information in the restart file instead of writing a text file, but this is not easy to implement and will not be done (at least not in the current version of ORCHIDEE).

During the ORCHIDEE Day on 8 September 2017, we decided not to add special treatment for Load_Balance_data.txt, as performance is good when using a high PeriodNb.
