
Version 193 (modified by ajornet, 6 years ago)

Performance

Basic Performance Report

Overview

This document investigates the computing time of Orchidee MICT. The latest version, 6.5, is slow: a 1-year simulation at 0.5 degrees takes around 8 hours. It is therefore necessary to understand why this happens. Once the issues are identified, it may be possible to apply different solutions.

To that end, the code is profiled with several tools (vtune, vampir, gprof, ...). They provide an easy way to identify the basic hotspots in the code.

Report

attachment:performance_mict_albert_jornet_150616.pdf

MICT V6 (3344 + PFT interpolation) Module computing time

The tests start from a basic configuration, and each test activates one additional module, so the number of active modules grows from test to test. The purpose is to show the impact of each module when it is used.

Perf_MICT_options

Trunk vs MICT Comparison 11/04/2016

  • Date 11/04/2016
  • ADA Machine
  • IOIPSL production mode
  • Orchidee production mode
  • 1 year simulated
  • 16 cores
  • Forcing:
    • 1 degree
    • 3-hourly

Considerations:

  • MICT is at the same level of modifications as Trunk revision 3346
  • MICT uses parallel interpolation for the aggregate_2D subroutine

Overview

Orchidee vs trunk profiling

Subroutines are placed in 4 different groups described below:

  • ioipsl: all subroutines related to IOIPSL library
  • Top orchidee: subroutines >1% of computing time
  • Interpolation: interpolation time by aggregate_2D subroutine
  • other orchidee: remaining subroutines from orchidee
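
The grouping above can be sketched as a small classifier over gprof symbol names. This is only an illustration: the function name `classify` is hypothetical, the IOIPSL prefixes are limited to the two modules named in this report (mathelp, histcom), and the order of the checks is an assumption.

```python
# Sketch: assign a gprof flat-profile entry to one of the four groups.
# IOIPSL_PREFIXES only lists the IOIPSL modules named in this report;
# extend it if other IOIPSL modules show up in a profile (assumption).
IOIPSL_PREFIXES = ("mathelp_mp_", "histcom_mp_")
INTERPOLATION_SYMBOL = "interpol_help_mp_aggregate_2d_"
TOP_THRESHOLD = 1.0  # percent of computing time


def classify(symbol, percent):
    """Return the group name for one (symbol, % time) pair."""
    if symbol.startswith(IOIPSL_PREFIXES):
        return "ioipsl"
    if symbol == INTERPOLATION_SYMBOL:
        return "Interpolation"
    if percent > TOP_THRESHOLD:
        return "Top orchidee"
    return "other orchidee"


if __name__ == "__main__":
    samples = [
        ("mathelp_mp_ma_fuscat_r21_", 25.66),
        ("interpol_help_mp_aggregate_2d_", 1.57),
        ("thermosoil_mp_thermosoil_cond_pft_", 5.96),
        ("thermosoil_mp_thermosoil_readjust_", 0.77),
    ]
    for name, pct in samples:
        print(f"{classify(name, pct):15s} {pct:6.2f}  {name}")
```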

Mict R3359 (gprof)

This is a profiling test done with the gprof tool:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ks/call  Ks/call  name    
 25.66   1383.92  1383.92  2245127     0.00     0.00  mathelp_mp_ma_fuscat_r21_
  9.62   1902.84   518.92  3835809     0.00     0.00  mathelp_mp_moycum_index_
  9.18   2398.02   495.18  3835826     0.00     0.00  histcom_mp_histwrite_real_
  5.96   2719.41   321.39    17524     0.00     0.00  thermosoil_mp_thermosoil_cond_pft_
  3.81   2924.90   205.49    17520     0.00     0.00  hydrol_mp_hydrol_soil_
  3.62   3119.87   194.97   420480     0.00     0.00  hydrol_mp_hydrol_soil_coef_
  3.59   3313.39   193.52    17524     0.00     0.00  thermosoil_mp_thermosoil_getdiff_
  3.11   3481.04   167.65      365     0.00     0.00  stomate_wet_ch4_pt_ter_wet2_mp_ch4_wet_flux_density_wet2_
  3.05   3645.33   164.29      365     0.00     0.00  stomate_wet_ch4_pt_ter_wet1_mp_ch4_wet_flux_density_wet1_
  2.92   3803.03   157.70      365     0.00     0.00  stomate_wet_ch4_pt_ter_wet3_mp_ch4_wet_flux_density_wet3_
  2.86   3957.34   154.31      365     0.00     0.00  stomate_wet_ch4_pt_ter_0_mp_ch4_wet_flux_density_0_
  2.74   4105.24   147.90      365     0.00     0.00  stomate_wet_ch4_pt_ter_wet4_mp_ch4_wet_flux_density_wet4_
  2.67   4249.50   144.26    17522     0.00     0.00  thermosoil_mp_thermosoil_coef_
  1.63   4337.37    87.87    17520     0.00     0.00  hydrol_mp_hydrol_diag_soil_
  1.59   4423.39    86.02  2666157     0.00     0.00  mod_orchidee_omp_transfert_mp_gather_omp_r1_
  1.57   4507.82    84.43       55     0.00     0.00  interpol_help_mp_aggregate_2d_
  1.37   4581.90    74.08    17520     0.00     0.00  diffuco_mp_diffuco_trans_co2_
  1.36   4655.06    73.16    17520     0.00     0.00  stomate_mp_stomate_main_
  1.22   4720.59    65.53    17520     0.00     0.00  stomate_permafrost_soilcarbon_mp_microactem_
  1.06   4777.86    57.27    17520     0.00     0.00  hydrol_mp_hydrol_main_
  0.96   4829.85    51.99  1602027     0.00     0.00  mathelp_mp_ma_fuscat_r11_
  0.77   4871.20    41.35    17522     0.00     0.00  thermosoil_mp_thermosoil_readjust_
  0.74   4911.35    40.15  2664512     0.00     0.00  mod_orchidee_omp_transfert_mp_gather_omp_i1_

Total Simulation time: 5358 seconds

IO: mathelp + histcom = 25.66 + 9.62 + 9.18 = ~45%
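
The IO share above can be recomputed from the flat profile itself. The sketch below sums the "% time" column for symbols starting with the mathelp and histcom module prefixes; the function name `io_share` and the `min_percent` parameter are hypothetical, and only the first rows of the profile are copied in.

```python
# Sketch: sum the "% time" column of gprof flat-profile rows whose symbol
# belongs to the IOIPSL modules named above (mathelp, histcom).
IO_PREFIXES = ("mathelp_mp_", "histcom_mp_")

# First rows of the MICT R3359 flat profile, copied from this page.
PROFILE = """\
 25.66   1383.92  1383.92  2245127     0.00     0.00  mathelp_mp_ma_fuscat_r21_
  9.62   1902.84   518.92  3835809     0.00     0.00  mathelp_mp_moycum_index_
  9.18   2398.02   495.18  3835826     0.00     0.00  histcom_mp_histwrite_real_
  5.96   2719.41   321.39    17524     0.00     0.00  thermosoil_mp_thermosoil_cond_pft_
"""


def io_share(profile_text, min_percent=0.0):
    """Return the summed % time of the IO routines found in profile_text."""
    total = 0.0
    for line in profile_text.splitlines():
        fields = line.split()
        if len(fields) != 7:
            continue  # not a data row
        try:
            percent = float(fields[0])
        except ValueError:
            continue  # header line
        if fields[-1].startswith(IO_PREFIXES) and percent >= min_percent:
            total += percent
    return total


if __name__ == "__main__":
    print(f"IO share: {io_share(PROFILE):.2f}%")
```

Run on the full profile, the same function would also pick up the smaller mathelp_mp_ma_fuscat_r11_ entry (0.96%), which the figure above leaves out.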

Trunk R3346

This is a profiling test done with the gprof tool:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ks/call  Ks/call  name    
 22.26    441.54   441.54        7     0.06     0.06  interpol_help_mp_aggregate_2d_
 14.52    729.66   288.12  2171415     0.00     0.00  histcom_mp_histwrite_real_
 13.26    992.66   263.00    17520     0.00     0.00  hydrol_mp_hydrol_soil_
 10.28   1196.56   203.90   773813     0.00     0.00  mathelp_mp_ma_fuscat_r21_
  5.07   1297.17   100.61  2171397     0.00     0.00  mathelp_mp_moycum_index_
  4.16   1379.77    82.60    17520     0.00     0.00  diffuco_mp_diffuco_trans_co2_
  3.81   1455.34    75.57    17520     0.00     0.00  hydrol_mp_hydrol_diag_soil_
  3.67   1528.21    72.87   157680     0.00     0.00  hydrol_mp_hydrol_soil_coef_
  2.29   1573.69    45.48  1400412     0.00     0.00  mathelp_mp_ma_fuscat_r11_
  2.27   1618.76    45.07    17520     0.00     0.00  hydrol_mp_hydrol_main_
  1.86   1655.66    36.90    17521     0.00     0.00  thermosoil_mp_thermosoil_getdiff_
  1.46   1684.63    28.97    17521     0.00     0.00  thermosoil_mp_thermosoil_humlev_
  0.99   1704.17    19.54   157680     0.00     0.00  hydrol_mp_hydrol_soil_tridiag_
  0.94   1722.82    18.65    17520     0.00     0.00  stomate_litter_mp_littercalc_
  0.92   1740.99    18.17    17520     0.00     0.00  hydrol_mp_hydrol_split_soil_
  0.86   1758.10    17.11    17520     0.00     0.00  stomate_mp_stomate_main_
  0.81   1774.07    15.98  1133588     0.00     0.00  mod_orchidee_omp_transfert_mp_gather_omp_r1_

Total Simulation time: 1956 seconds

IO: mathelp + histcom = 14.52 + 10.28 + 5.07 = ~30%
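
Percentages hide the difference in absolute cost between the two runs, so the sketch below compares the gprof self seconds of the same three IO routines in both profiles, next to the total simulation times. The dictionary and function names are hypothetical; the numbers are copied from the two flat profiles. The gprof self seconds are sampled values, so they are compared with each other only, not with the total simulation time.

```python
# Self seconds of the three IO routines, copied from the two flat profiles.
MICT_R3359 = {
    "mathelp_mp_ma_fuscat_r21_": 1383.92,
    "mathelp_mp_moycum_index_": 518.92,
    "histcom_mp_histwrite_real_": 495.18,
}
TRUNK_R3346 = {
    "mathelp_mp_ma_fuscat_r21_": 203.90,
    "mathelp_mp_moycum_index_": 100.61,
    "histcom_mp_histwrite_real_": 288.12,
}
# "Total Simulation time" lines of the two runs, in seconds.
TOTAL_SECONDS = {"MICT R3359": 5358, "Trunk R3346": 1956}


def io_seconds(profile):
    """Sum the self seconds of the IO routines of one profile."""
    return sum(profile.values())


if __name__ == "__main__":
    mict_io = io_seconds(MICT_R3359)
    trunk_io = io_seconds(TRUNK_R3346)
    total_ratio = TOTAL_SECONDS["MICT R3359"] / TOTAL_SECONDS["Trunk R3346"]
    print(f"IO self time  MICT: {mict_io:8.2f} s  Trunk: {trunk_io:8.2f} s")
    print(f"IO ratio MICT/Trunk:    {mict_io / trunk_io:.2f}")
    print(f"Total ratio MICT/Trunk: {total_ratio:.2f}")
```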

Trunk vs MICT Comparison 18/02/2016

18/02/2016: trunk revision 2916 and MICT revision 3161 were considered equivalent.

The same run.def file is used to compare both developments.

The simulations were carried out under the following conditions:

  • 1 Year
  • Global
  • CRU-NCEP v5.3.2 (6 hourly)
  • CURIE
  • IO library: IOIPSL/XIOS
    • Yearly output
  • Compilation mode IOIPSL: production
  • Compilation mode Orchidee: production
  • Compilation mode XIOS: production
    • e.g: 64 cores = 64 ORC + 1 XIOS

trunk_vs_mict_performance

Configurations

  • S0: no freeze + no explicitsnow + no ok_pc + no hydrol_cwrr
    • Used by default
  • S1: S0 + freeze + explicitsnow + ok_pc + hydrol_cwrr
  • S2: S1 + ch4_calcul
  • S3: S2 + dgvm
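
Since each configuration extends the previous one, the list above can be written as cumulative sets of options. In this sketch the option labels are the names used in the list, not necessarily the exact run.def keys, and `build_configurations` is a hypothetical helper.

```python
# Sketch: the S0..S3 configurations as cumulative tuples of option labels.
# The labels are the names used in the list above, not necessarily the
# exact run.def keys (assumption).
STEPS = [
    ("S0", ()),
    ("S1", ("freeze", "explicitsnow", "ok_pc", "hydrol_cwrr")),
    ("S2", ("ch4_calcul",)),
    ("S3", ("dgvm",)),
]


def build_configurations(steps):
    """Accumulate the options added at each step into full configurations."""
    configs = {}
    active = []
    for name, added in steps:
        active = active + list(added)
        configs[name] = tuple(active)
    return configs


CONFIGS = build_configurations(STEPS)

if __name__ == "__main__":
    for name, options in CONFIGS.items():
        print(name, "->", ", ".join(options) if options else "(none enabled)")
```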

Trunk R5293 (S1) - Irene - with restarts

N procs 8 16 32 64 128 256 512 1024
0.5 deg
1 deg -
2 deg -

Default options changed:

DO_WOOD_HARVEST=n

Trunk R5293 (S1) - Irene

N procs 8 16 32 64 128 256 512 1024
0.5 deg 1h52 1h06 52m09 44m08 40m57 40m47 42m08
1 deg 44m10 28m30 19m05 15m03 13m41 13m02 13m15 15m19
2 deg 13m30 9m24 7m26 6m46 6m19 6m39 7m21 -

Default options changed:

DO_WOOD_HARVEST=n
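
The timing tables on this page are easier to compare once the cells are converted to seconds. The sketch below parses the cell notation used here ("1h52" as hours and minutes, "52m09" as minutes and seconds) and derives speedup and parallel efficiency relative to the smallest core count. The helper names are hypothetical, and cells that do not match the notation ("-", "??", annotated cells) are skipped.

```python
import re

# "1h52" -> hours + minutes, "52m09" -> minutes + seconds, "9m" -> minutes.
_DURATION = re.compile(r"^(?:(\d+)h(\d+)?|(\d+)m(\d+)?)$")


def to_seconds(cell):
    """Convert one table cell to seconds; return None for '-', '??', etc."""
    match = _DURATION.match(cell)
    if match is None:
        return None
    hours, h_minutes, minutes, seconds = match.groups()
    if hours is not None:
        return int(hours) * 3600 + int(h_minutes or 0) * 60
    return int(minutes) * 60 + int(seconds or 0)


def scaling(cores, cells):
    """Return (cores, seconds, speedup, efficiency) for every usable cell.

    The first usable cell is the reference: speedup = T_ref / T_n and
    efficiency = speedup / (n / n_ref).
    """
    timed = [(n, to_seconds(c)) for n, c in zip(cores, cells)]
    timed = [(n, t) for n, t in timed if t is not None]
    if not timed:
        return []
    n_ref, t_ref = timed[0]
    rows = []
    for n, t in timed:
        speedup = t_ref / t
        rows.append((n, t, speedup, speedup / (n / n_ref)))
    return rows


CORES = [8, 16, 32, 64, 128, 256, 512, 1024]
# "1 deg" row of the Trunk R5293 (S1) - Irene table above.
ROW_1DEG = "44m10 28m30 19m05 15m03 13m41 13m02 13m15 15m19".split()

if __name__ == "__main__":
    for n, t, s, e in scaling(CORES, ROW_1DEG):
        print(f"{n:5d} procs  {t:6d} s  speedup {s:5.2f}  efficiency {e:5.2f}")
```

On this row the run time stops improving at 256 procs (13m02) and rises again at 512 and 1024 procs.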

Trunk R5293 (S1) - Curie - with restarts

N procs 8 16 32 64 128 256 512 1024
0.5 deg 4h50 3h28 58m12 29m09 16m39 12m45 11m19 13m56
1 deg 55m52 27m29 14m14 8m31 6m57 5m04 4m58 -
2 deg 15m23 7m58 5m03 ?? 3m47 3m56 3m48 -

Default options changed:

DO_WOOD_HARVEST=n

Trunk R5293 (S1) - Curie

N procs 8 16 32 64 128 256 512 1024
0.5 deg 4h10 2h21 1h27 1h05 54m35 49m23 46m50 48m08
1 deg 1h03 36m06 23m36 17m47 15m51 14m25 13m58 -
2 deg 18m08 11m40 8m39 7m25 6m34 6m19 6m19 -

Default options changed:

DO_WOOD_HARVEST=n

MICT R5292 (S1) - Irene (NO align array64byte)

N procs 8 16 32 64 128 256 512 1024
0.5 deg 7h08 3h55 1h54 2h27 33m25 23m17 18m34 21m11
1 deg 1h45 56m27 26m 28m01 15m04 9m54 10m10 -
2 deg 25m05 14m15 7m56 5m32 4m33 6m01 8m34 -

MICT R5292 (S1) - Irene (align array64byte)

N procs   8      16     32     64                 128    256    512    1024
0.5 deg   6h57   3h53   1h49   1h07               33m30  22m51  19m03  21m05
1 deg     1h36   55m08  26m08  32m37 (1957 s) (?)  9m38   9m     10m24  -
2 deg     25m54  14m    6m31   5m42               3m41   5m22   8m05   -

MICT R5273 (S1)

N procs 8 16 32 64 128 256 512 1024
0.5 deg 11h21 5h13 2h28 1h13 37m57 22m09 15m04 16m35
1 deg 2h33 1h13 35m25 18m02 10m34 7m14 6m26 -
2 deg 38m14 18m06 9m47 5m42 3m58 3m18 3m25 -
  • GLUC is disabled

MICT R5270 (new GLUC)

N procs 8 16 32 64 128 256 512 1024
2 deg 46m02 13m48 8m47
  • 61 PFT's
  • IOIPSL: no outputs

MICT R5255 (S1)

N procs 8 16 32 64 128 256 512 1024
0.5 deg 11h17 5h14 2h28 1h16 41m56 25m30 18m56 19m13
1 deg 2h30 1h12 34m35 18m50 11m47 8m27 7m25 -
2 deg 37m36 18m36 10m34 6m51 5m04 4m14 4m16 -

This new test includes the following options:

SOILTYPE_CLASSIF = usda

# MICT hydrol
USE_SOILC_TEMPDIFF=y
use_refSOC=y
use_refSOC_hydrol=y

SOIL_REFSOC_FILE = refSOC.nc
SOIL_REFSOC_1d_FILE = refSOC_1d.nc

MICT R5255 (S0)

N procs 8 16 32 64 128 256 512 1024
0.5 deg 10h53 4h57 2h20 1h08 35m54 22m48 17m16 17m29
1 deg 2h23 1h07 32m39 16m48 9m35 6m07 5m53 -
2 deg 34m31 16m25 8m32 5m02 3m34 2m54 2m51 -

This new test includes the following options:

SOILTYPE_CLASSIF = usda

# MICT hydrol
USE_SOILC_TEMPDIFF=y
use_refSOC=y
use_refSOC_hydrol=y

SOIL_REFSOC_FILE = refSOC.nc
SOIL_REFSOC_1d_FILE = refSOC_1d.nc

Orchidee-CN-P R4758

N procs 8 16 32 64 128 256 512 1024
0.5 deg 6h15 2h47 1h16 37m06 20m41 13m07 11m37 13m46
1 deg 1h20 38m05 18m39 10m16 6m56 4m55 5m42 16m49
2 deg 20m16 10m20 5m49 4m05 3m15 3m11 3m32

TRUNK R4788

N procs 8 16 32 64 128 256 512 1024
0.5 deg 3h06 1h38 54m18 34m39 25m44 21m43 19m56 24m54
1 deg 45m21 24m08 14m23 9m34 7m16 6m27 10m14 20m33
2 deg 12m01 6m35 4m21 3m04 2m41 2m33 2m32 -

Note: are these results correct? The Trunk has no parallel interpolation algorithm.

MICT R4755 (S0)

N procs 8 16 32 64 128 256 512 1024
0.5 deg 12h37 5h37 2h38 1h16 38m16 22m32 16m16 18m34
1 deg 2h44 1h18 37m37 19m06 10m35 7m52 7m50 -
2 deg 41m12 19m31 10m02 5m42 3m42 3m09 3m18 -

MICT R4755 (S1)

N procs 8 16 32 64 128 256 512 1024
0.5 deg ??? 5h51 2h45 1h22 43m52 25m25 19m05 19m49
1 deg 2h51 1h23 39m57 20m59 13m28 10m29 9m07 -
2 deg 43m33 21m11 11m24 6m53 4m46 4m03 4m20 -

Note: NetCDF restart file compression is activated

MICT R4414 (S0)

N procs 8 16 32 64 128 256 512 1024
0.5 deg 11h50 5h29 2h33 1h14 37m02 25m 12m52 14m43
1 deg 2h41 1h17 38m41 18m44 10m32 9m
2 deg 40m 19m03 10m17 5m32 4m33 2m42

MICT R4385 (S2)

  • Standard
N procs 8 16 32 64 128 256 512 1024
0.5 deg 42m40
  • SOA ch4
N procs 8 16 32 64 128 256 512 1024
0.5 deg 41m34

MICT R4385 (S1)

  • No restarts (unlimited IOIPSL):
N procs 8 16 32 64 128 256 512 1024
0.5 deg 12h17 5h42 2h41 1h17 38m12 21m39 13m52 14m58
1.0 deg 2h47 1h20 38m16 19m25 10m35 6m58 - -
2.0 deg 42m 20m 10m29 6m12 4m32 3m03 - -
  • No restarts (limited IOIPSL): All running (_latest)
N procs 8 16 32 64 128 256 512 1024
0.5 deg c c 2h39 1h17 37m58 21m02 13m49 14m59
  • With restarts (unlimited IOIPSL): All running (_latest_next)
N procs 8 16 32 64 128 256 512 1024
0.5 deg c c 2h46 1h24 43m51 31m 23m11 22m32
  • With restarts (limited IOIPSL): (_latest_limitedio)
N procs 8 16 32 64 128 256 512 1024
0.5 deg 36m26 20m21 12m52 14m45
  • No restarts (limited IOIPSL): thermosoil_cond_pft + no precise (_latest_refactor)

thermosoil_cond_pft is refactored again, since the earlier performance modifications were lost in previous commits. Refactoring + no precise allows the vectorization of the pow and exp subroutines.

N procs 8 16 32 64 128 256 512 1024
0.5 deg 32m23

MICT R4289 (S1)

N procs 8 16 32 64 128 256 512
0.5 deg 13h 6h11 2h50 1h23 42m06 23m47 15m42
1.0 deg 2h57 1h26 42m13 20m18 10m51 7m19 -
2.0 deg 44m58 21m21 10m54 6m17 4m 2m56 -

Note: IOIPSL alignment is introduced by default

MICT R4277 + interpolation (S1)

N procs 8 16 32 64 128 256 512
0.5 deg 16h39 8h05 3h38 1h42 50m12 26m13 16m47
1.0 deg 3h45 1h44 50m16 22m29 13m11 7m29
2.0 deg 55m12 25m09 12m35 7m13 4m27 3m12

Commit in [4289/branches/ORCHIDEE-MICT/ORCHIDEE]

MICT R4277 + thermosoil refactor + IOIPSL alignment (S1)

N procs 8 16 32 64 128 256 512
0.5 deg 13h42 6h28 2h59 1h28 44m14 24m48 16m35
1.0 deg 3h25 1h29 42m57 21m20 11m24 7m13
2.0 deg 45m 22m48 11m42 7m23 4m15 3m05

MICT R4277 + IOIPSL alignment (S1)

N procs 8 16 32 64 128 256 512
0.5 deg 16h39 8h46 3h 1h46(2nd try) 52m 27m14 17m19
1.0 deg 4h10 1h48(2nd try) 52m51 24m21 12m37 7m40 -
2.0 deg 55m 26m20 13m04 6m46 4m28 3m11 -

MICT R4277 + thermosoil refactor (S1)

N procs 8 16 32 64 128 256 512
0.5 deg 13h38 6h28 3h 1h29 48m16 28m52 20m20
1.0 deg 3h08 1h31 44m56 23m28 14m37 10m18
2.0 deg 49m46 25m35 14m55 10m11 7m59 6m57

Committed in [4280/branches/ORCHIDEE-MICT/ORCHIDEE]

MICT R4277 (S1)

N procs 8 16 32 64 128 256 512
0.5 deg 16h39 8h21 3h43 1h47 55m34 31m18 21m28
1.0 deg 3h27 1h48 53m37 26m21 15m38 10m46 -
2.0 deg 59m17 28m31 16m05 10m42 8m13 7m03 -

MICT R4277 + thermosoil refactor (S0)

N procs 8 16 32 64 128 256 512
0.5 deg 7h42 3h18 1h33 47m07 24m48 15m53
1.0 deg 3h27 1h36 47m 21m38 13m33 9m14
2.0 deg 51m38 24m19 14m08 8m56 5m16 4m18

Committed in [4280/branches/ORCHIDEE-MICT/ORCHIDEE]

MICT R4277 (S0)

N procs 8 16 32 64 128 256 512
0.5 deg 7h33 3h15 1h32 49m49 24m43 15m47
1.0 deg 3h26 1h36 46m05 21m18 13m25 9m05
2.0 deg 51m07 24m06 14m06 8m56 5m12 4m09

MICT R4274 (S0)

N procs 8 16 32 64 128 256 512
0.5 deg - 7h33 3h15 1h32 46m49 24m43 15m47
1.0 deg 3h26 1h36 46m05 21m18 13m25 9m05
2.0 deg 51m07 24m06 14m04 8m56 5m12 4m09

Trunk R3934 (XIOS 2 + S0)

N procs 8 16 32 64 128 256
0.5 deg 3h23 2h01 1h19 1h01 53m18 48m40
1.0 deg 52m07 32m10 21m38 17m18 14m52 13m56
2.0 deg 14m50 9m11 6m38 5m39 5m15 5m13

Notes:

  • Interpolation is sequential

Mict R3932 (XIOS 2) + CROP + IOIPSL restarts

This is an early test with IOIPSL + restarts to 3, 4 and 5 dimensions. This revision is still in a personal directory. It includes the remaining revisions from TRUNK and will be merged into the main MICT branch soon.

Its purpose is to provide a first draft of this modification.

N procs 8 16 32 64 128 256
0.5 deg 15h54 7h50 3h42 2h06 55m01 34m35
  • 0.5: The number of XIOS outputs is reduced so the simulation can finish.

Mict R3932 (XIOS 2) + IOIPSL restarts

This is an early test with IOIPSL + restarts to 3, 4 and 5 dimensions. This revision is still in a personal directory. It includes the remaining revisions from TRUNK and will be merged into the main MICT branch soon.

Its purpose is to provide a first draft of this modification.

N procs   8            16     32     64     128    256
0.5 deg   Out of mem.  5h31   2h40   1h24   44m32  23m15
1 deg     2h44         1h21   39m24  19m15  10m44  7m21
2 deg     43m41        20m37  10m58  6m23   4m15   3m24
  • 0.5: The number of XIOS outputs is reduced so the simulation can finish.

Mict R3811 (XIOS 2)

N procs   8              16             32             64             128    256
0.5 deg   out of memory  out of memory  out of memory  out of memory  1h33   1h27
1 deg     2h48           1h27           44m24          25m21          19m52  13m47
2 deg     43m48          21m31          11m13          7m40           5m16   4m30

Output Netcdf files:

  • 0.5 Degree
Filename             Size  # vars
stomate_rest_out.nc  14G   379 (double)
sechiba_rest_out.nc  8.5G  234 (double)
driver_rest_out.nc   28M   13 (double)
sechiba_history.nc   3.0G  114 (float)
stomate_history.nc   3.3G  297 (float)

Changes

  • CROP restart variables are now only active when CROP is enabled.
  • XIOS history outputs now include 4D/5D dimensions, which reduces the number of variables in the outputs.

Issues

  • 0.5 deg: the out-of-memory failure is due to XIOS

Conclusion

  • 0.5deg - 64 procs: it might be due to 4D/5D variables.
  • 0.5deg computing time: fewer restart variables to write decreased the total time. The CROP module still has this problem.
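
The restart volume behind this conclusion can be totalled from the "Output Netcdf files" tables of this revision and of R3791 (listed further down this page). The sketch below is only an illustration: the helper names are hypothetical, and "G" and "M" are read as gigabytes and megabytes with 1 G = 1024 M, which is an assumption about how the sizes were reported.

```python
# Restart files at 0.5 degree, copied from the "Output Netcdf files" tables:
# file name -> (size, number of variables).
RESTARTS = {
    "R3791": {
        "stomate_rest_out.nc": ("20G", 611),
        "sechiba_rest_out.nc": ("8.6G", 234),
        "driver_rest_out.nc": ("28M", 13),
    },
    "R3811": {
        "stomate_rest_out.nc": ("14G", 379),
        "sechiba_rest_out.nc": ("8.5G", 234),
        "driver_rest_out.nc": ("28M", 13),
    },
}

# Assumption: 1 G = 1024 M.
UNIT_TO_GB = {"G": 1.0, "M": 1.0 / 1024.0}


def size_gb(text):
    """Convert a size such as '8.5G' or '28M' to gigabytes."""
    return float(text[:-1]) * UNIT_TO_GB[text[-1]]


def totals(revision):
    """Return (total size in GB, total number of variables) of the restarts."""
    files = list(RESTARTS[revision].values())
    total_size = sum(size_gb(size) for size, _ in files)
    total_vars = sum(count for _, count in files)
    return total_size, total_vars


if __name__ == "__main__":
    for revision in ("R3791", "R3811"):
        size, count = totals(revision)
        print(f"{revision}: {size:6.2f} GB in {count} restart variables")
```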

Mict R3791 (XIOS 2)

  • Date: 30/09/16
  • Add all XIOS output fields

Time table:

N procs   8              16             32             64     128    256
0.5 deg   out of memory  out of memory  out of memory  14h56  15h42  11h06
1 deg     3h19           2h28           1h52           2h22   1h32   1h23
2 deg     54m38          38m53          23m58          24m43  24m37  16m11

Output Netcdf files:

  • 0.5 Degree
Filename             Size  # vars
stomate_rest_out.nc  20G   611 (double)
sechiba_rest_out.nc  8.6G  234 (double)
driver_rest_out.nc   28M   13 (double)
sechiba_history.nc   2.0G  388 (float)
stomate_history.nc   3.8G  1179 (float)

Changes

  • Add CROP module.

Issues

  • 0.5 deg: memory requirements are high
  • 0.5 deg: simulation time is far too high, even when the module is disabled.

Conclusion

  • 0.5deg memory: the introduction of XIOS increases the memory usage
  • 0.5deg simulation time: many more restart variables to write

Mict R3587 (XIOS 2!)

  • Small fixes
  • Trunk update

Time table:

N procs   8              16     32     64     128    256
0.5 deg   out of memory  4h43   2h21   1h18   58m    47m
1 deg     running        1h05   35m    23m    16m    18m
2 deg     -              -      -      -      -      -

Mict R3587 (XIOS 2 + thermosoil_cond_pft)

This specific branch involves the subroutine thermosoil_cond_pft, which some profiling reports show to be highly time-consuming. The following tests are an effort to improve its performance.

All tests are done at 0.5 degrees. All other parameters are the same as specified in this section.

N procs                      32     64     128    256
Avx + align 32 + vecalign32  2h08   1h16   49m    40m
Align 32 + vecalign32        2h13   1h30   54m    43m
Avx + align 32               2h12   1h18   1h02   47m
Align 16                     2h23   1h13   1h04   50m

Description:

  • avx: 256 bit register
  • align 32: -align array32byte compilation flag
  • align 16: -align array16byte compilation flag
  • vecalign: source code lines to help the compiler improve the performance
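
To quantify the effect of these options, the sketch below expresses each variant as a percentage gain over the Align 16 row of the table above. The helper names are hypothetical, and cells without an hour part are read as minutes, which is an assumption.

```python
def to_minutes(cell):
    """'2h08' -> 128, '49' or '49m' -> 49 (bare numbers read as minutes)."""
    cell = cell.rstrip("m")
    if "h" in cell:
        hours, minutes = cell.split("h")
        return int(hours) * 60 + int(minutes or 0)
    return int(cell)


N_PROCS = [32, 64, 128, 256]
# Rows of the thermosoil_cond_pft table (0.5 degrees), copied from above.
TIMES = {
    "Avx + align 32 + vecalign32": "2h08 1h16 49 40",
    "Align 32 + vecalign32": "2h13 1h30 54 43",
    "Avx + align 32": "2h12 1h18 1h02 47",
    "Align 16": "2h23 1h13 1h04 50",
}


def gain_percent(variant, baseline="Align 16"):
    """Percentage of run time saved by variant, per core count."""
    base = [to_minutes(c) for c in TIMES[baseline].split()]
    new = [to_minutes(c) for c in TIMES[variant].split()]
    return [100.0 * (b - v) / b for b, v in zip(base, new)]


if __name__ == "__main__":
    for variant in TIMES:
        if variant == "Align 16":
            continue
        gains = gain_percent(variant)
        cells = "  ".join(f"{n}: {g:+5.1f}%" for n, g in zip(N_PROCS, gains))
        print(f"{variant:28s} {cells}")
```

With this reading, the full combination (Avx + align 32 + vecalign32) is 20% faster than Align 16 at 256 procs and slightly slower at 64 procs (1h16 against 1h13).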

Mict R3567

  • New driver

Time table:

N procs   8        16     32     64     128    256
0.5 deg   timeout  11h22  7h21   4h51   3h38   2h49
1 deg     4h22     2h21   1h24   54m    39m    30m
2 deg     58m      31m    19m    13m    9m     7m

Mict R3527

  • PFT parallel interpolation

Time table:

N procs   8                  16     32     64     128    256
0.5 deg   >16h39 (322 days)  11h09  7h06   4h50   3h31   2h47
1 deg     4h10               2h14   1h20   52m    37m    30m
2 deg     55m05              30m01  18m05  12m    9m     7m

Mict R3161

Time table:

N procs   4        8        16     32     64     128
0.5 deg   timeout  timeout  13h00  8h46   6h35   5h38
1 deg     6h37     4h20     2h36   1h45   1h21   1h08
2 deg     1h40     56m      35m    24m    19m    16m

Note: the 0.5 deg run on 4 procs did not start because of its memory requirements. The 0.5 deg run on 8 procs could not finish within the maximum wall time allowed by the HPC; it stopped at simulation day 322. Both values can be extrapolated.
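
One simple way to extrapolate the interrupted 8-proc run is to scale the elapsed time by the fraction of the year that was simulated. The sketch below assumes a constant cost per simulated day, which ignores any start-up or end-of-year work. The wall-time limit is not stated in the note above; the 16h39 used in the example is only the lower bound that appears in the Mict R3527 table, taken here as an illustrative assumption. Both function names are hypothetical.

```python
def extrapolate_full_run(elapsed_seconds, days_done, days_total=365):
    """Estimate the full run time, assuming a constant cost per simulated day."""
    if days_done <= 0 or days_done > days_total:
        raise ValueError("days_done must be in 1..days_total")
    return elapsed_seconds * days_total / days_done


def format_hm(seconds):
    """Format a duration as in the tables of this page, e.g. '18h52'."""
    minutes = round(seconds / 60)
    return f"{minutes // 60}h{minutes % 60:02d}"


if __name__ == "__main__":
    # Assumption: the run was stopped after 16h39 of wall time at day 322.
    elapsed = 16 * 3600 + 39 * 60
    estimate = extrapolate_full_run(elapsed, days_done=322)
    print("Estimated full-year run time:", format_hm(estimate))
```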

Trunk R2916

The same simulations with the same options were carried out, with the following results:

N procs   4      8      16     32     64     128
0.5 deg   8h38   5h31   3h26   2h23   1h48   1h31
1 deg     2h07   1h17   47m    32m    25m    21m
2 deg     38m    19m    11m    8m     6m     5m

IOIPSL

Restart File Creation :
