= Performance =
[[PageOutline]]

== Basic Performance Report ==
=== Overview ===
This document investigates the computing time of Orchidee MICT. In the latest version (6.5), a 1-year simulation at 0.5 degrees takes around 8 hours to compute, so it is necessary to understand why. Once the issues are identified, different solutions can be applied.

To make this possible, the code is profiled with several tools (vtune, vampir, gprof, ...). They provide an easy way to identify the basic hotspots in the code.

=== Report ===
attachment:performance_mict_albert_jornet_150616.pdf

== MICT V6 (3344 + PFT interpolation) Module computing time ==
The tests start from a basic configuration, and each test activates one additional module, so the number of active modules grows from test to test. The purpose is to show the impact of each module when it is used.

[[Image(test_perf_matguimb.png, 80%)]]

== Trunk vs MICT Comparison 11/04/2016 ==
 * Date 11/04/2016
 * ADA Machine
 * IOIPSL production mode
 * Orchidee production mode
 * 1Y
 * 16 cores
 * Forcing:
   * 1 Degree
   * 3H

Considerations:
 * MICT is at the same level of modifications as Trunk revision 3346
 * MICT is using '''parallel interpolation''' for the aggregate 2D subroutine

=== Overview ===
[[Image(trunk_vs_mict_grouped.png, 20%, title="main")]]

Subroutines are placed in 4 different groups, described below:
 * ioipsl: all subroutines related to the IOIPSL library
 * Top orchidee: subroutines taking >1% of the computing time
 * Interpolation: interpolation time spent in the aggregate_2D subroutine
 * other orchidee: remaining subroutines from orchidee

=== Mict R3359 (gprof) ===
This is a profiling test done with the gprof tool:
{{{
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ks/call  Ks/call  name
 25.66   1383.92  1383.92  2245127     0.00     0.00  mathelp_mp_ma_fuscat_r21_
  9.62   1902.84   518.92  3835809     0.00     0.00  mathelp_mp_moycum_index_
  9.18   2398.02   495.18  3835826     0.00     0.00  histcom_mp_histwrite_real_
  5.96   2719.41   321.39    17524     0.00     0.00  thermosoil_mp_thermosoil_cond_pft_
  3.81   2924.90   205.49    17520     0.00     0.00  hydrol_mp_hydrol_soil_
  3.62   3119.87   194.97   420480     0.00     0.00  hydrol_mp_hydrol_soil_coef_
  3.59   3313.39   193.52    17524     0.00     0.00  thermosoil_mp_thermosoil_getdiff_
  3.11   3481.04   167.65      365     0.00     0.00  stomate_wet_ch4_pt_ter_wet2_mp_ch4_wet_flux_density_wet2_
  3.05   3645.33   164.29      365     0.00     0.00  stomate_wet_ch4_pt_ter_wet1_mp_ch4_wet_flux_density_wet1_
  2.92   3803.03   157.70      365     0.00     0.00  stomate_wet_ch4_pt_ter_wet3_mp_ch4_wet_flux_density_wet3_
  2.86   3957.34   154.31      365     0.00     0.00  stomate_wet_ch4_pt_ter_0_mp_ch4_wet_flux_density_0_
  2.74   4105.24   147.90      365     0.00     0.00  stomate_wet_ch4_pt_ter_wet4_mp_ch4_wet_flux_density_wet4_
  2.67   4249.50   144.26    17522     0.00     0.00  thermosoil_mp_thermosoil_coef_
  1.63   4337.37    87.87    17520     0.00     0.00  hydrol_mp_hydrol_diag_soil_
  1.59   4423.39    86.02  2666157     0.00     0.00  mod_orchidee_omp_transfert_mp_gather_omp_r1_
  1.57   4507.82    84.43       55     0.00     0.00  interpol_help_mp_aggregate_2d_
  1.37   4581.90    74.08    17520     0.00     0.00  diffuco_mp_diffuco_trans_co2_
  1.36   4655.06    73.16    17520     0.00     0.00  stomate_mp_stomate_main_
  1.22   4720.59    65.53    17520     0.00     0.00  stomate_permafrost_soilcarbon_mp_microactem_
  1.06   4777.86    57.27    17520     0.00     0.00  hydrol_mp_hydrol_main_
  0.96   4829.85    51.99  1602027     0.00     0.00  mathelp_mp_ma_fuscat_r11_
  0.77   4871.20    41.35    17522     0.00     0.00  thermosoil_mp_thermosoil_readjust_
  0.74   4911.35    40.15  2664512     0.00     0.00  mod_orchidee_omp_transfert_mp_gather_omp_i1_
}}}

Total Simulation time: 5358 seconds

IO: mathelp + histcom = 25.66 + 9.62 + 9.18 = ~45%

=== Trunk R3346 ===
This is a profiling test done with gprof tool:
{{{
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ks/call  Ks/call  name
 22.26    441.54   441.54        7     0.06     0.06  interpol_help_mp_aggregate_2d_
 14.52    729.66   288.12  2171415     0.00     0.00  histcom_mp_histwrite_real_
 13.26    992.66   263.00    17520     0.00     0.00  hydrol_mp_hydrol_soil_
 10.28   1196.56   203.90   773813     0.00     0.00  mathelp_mp_ma_fuscat_r21_
  5.07   1297.17   100.61  2171397     0.00     0.00  mathelp_mp_moycum_index_
  4.16   1379.77    82.60    17520     0.00     0.00  diffuco_mp_diffuco_trans_co2_
  3.81   1455.34    75.57    17520     0.00     0.00  hydrol_mp_hydrol_diag_soil_
  3.67   1528.21    72.87   157680     0.00     0.00  hydrol_mp_hydrol_soil_coef_
  2.29   1573.69    45.48  1400412     0.00     0.00  mathelp_mp_ma_fuscat_r11_
  2.27   1618.76    45.07    17520     0.00     0.00  hydrol_mp_hydrol_main_
  1.86   1655.66    36.90    17521     0.00     0.00  thermosoil_mp_thermosoil_getdiff_
  1.46   1684.63    28.97    17521     0.00     0.00  thermosoil_mp_thermosoil_humlev_
  0.99   1704.17    19.54   157680     0.00     0.00  hydrol_mp_hydrol_soil_tridiag_
  0.94   1722.82    18.65    17520     0.00     0.00  stomate_litter_mp_littercalc_
  0.92   1740.99    18.17    17520     0.00     0.00  hydrol_mp_hydrol_split_soil_
  0.86   1758.10    17.11    17520     0.00     0.00  stomate_mp_stomate_main_
  0.81   1774.07    15.98  1133588     0.00     0.00  mod_orchidee_omp_transfert_mp_gather_omp_r1_
}}}

Total Simulation time: 1956 seconds

IO: mathelp + histcom = 14.52 + 10.28 + 5.07 = ~30%

== Trunk vs MICT Comparison 18/02/2016 ==
18/02/2016: revisions trunk 2916 and MICT 3161 were considered to be equivalent. The same run.def file is used to compare both developments.
The simulations were carried out under the following conditions:
 * 1 Year
 * Global
 * CRU-NCEP v5.3.2 (6 hourly)
 * CURIE
 * IO library: IOIPSL/XIOS
 * Yearly output
 * Compilation mode IOIPSL: production
 * Compilation mode Orchidee: production
 * Compilation mode XIOS: production
 * e.g: 64 cores = 64 ORC + 1 XIOS
 * Filesystem:
   * Irene: /ccc/work/cont003/dsmipsl/p529jorn/
   * Curie: WORKDIR

[[Image( trunk_vs_mict_performance.png, 50%, title="MICT execution time")]]

=== Configurations ===
 * S0: no freeze + no explicitsnow + no ok_pc + no hydrol_cwrr
   * Used by default
 * S1: S0 + freeze + explicitsnow + ok_pc + hydrol_cwrr
 * S2: S1 + ch4_calcul
 * S3: S2 + dgvm

=== MICT R5788 (S1) - Irene ===
 * No restarts
||= N procs =||= 12 =||= 24 =||= 48 =||= 96 =||= 192 =||= 384 =||= 768 =||= 1536 =||
||= 0.5 deg =|| 3h03 || 1h44 || 1h08 || 33m36 || *18m47 || 13m18 || 12m23 || 14m55 ||
||= 1 deg =|| 43m46 || 24m25 || *15m55 || 08m28 || 5m56 || 5m41 || 6m36 || - ||
||= 2 deg =|| 11m02 || *6m32 || 4m18 || 2m48 || 2m57 || 3m41 || 4m54 || - ||

Note: * = best compromise

=== MICT R5760 (S1) - Irene ===
 * No restarts
||= N procs =||= 12 =||= 24 =||= 48 =||= 96 =||= 192 =||= 384 =||= 768 =||= 1536 =||
||= 0.5 deg =|| 3h05 || 1h44 || 1h08 || 33m51 || 19m06 || 13m02 || 11m54 || 16m09 ||
||= 1 deg =|| 43m33 || 24m09 || 15m43 || 8m32 || 5m56 || 5m31 || 7m || 13m17 ||
||= 2 deg =|| 10m55 || 6m21 || 4m15 || 2m44 || 2m55 || 3m44 || 5m32 || ||

=== MICT R5449 (S1) - Irene ===
||= N procs =||= 12 =||= 24 =||= 48 =||= 96 =||= 192 =||= 384 =||= 768 =||= 1536 =||
||= 0.5 deg =|| 3h15 || 1h51 || 1h13 || 36m40 || 21m34 || 16m53 || 16m03 || 21m11 ||
||= 1 deg =|| 46m28 || 26m19 || 17m33 || 9m49 || 7m26 || 8m32 || 10m10 || 15m30 ||
||= 2 deg =|| 12m20 || 7m41 || 5m30 || 3m55 || 4m34 || 7m09 || 9m46 || ||

 * No restarts
||= N procs =||= 12 =||= 24 =||= 48 =||= 96 =||= 192 =||= 384 =||= 768 =||= 1536 =||
||= 0.5 deg =|| 3h06 || 1h36 || 58m38 || 31m19 || 18m06 || 12m20 || 10m02 || 13m10 ||
||= 1 deg =|| 43m40 ||
22m38 || 13m13 || 7m29 || 4m49 || 3m46 || 4m43 || 9m37 ||
||= 2 deg =|| 11m19 || 5m56 || 3m42 || 2m11 || 1m38 || 1m42 || 2m41 || ||

=== MICT R5418 (S1) - Irene ===
 * Processors are multiple of Irene core nodes(48c/node):
||= N procs =||= 12 =||= 24 =||= 48 =||= 96 =||= 192 =||= 384 =||= 768 =||= 1536 =||
||= 0.5 deg =|| 4h11 || 2h15 || 1h22 || 41m44 || 23m19 || 17m52 || 16m24 || 21m18 ||
||= 1 deg =|| 55m55 || 30m50 || 19m32 || 10m29 || 7m56 || 8m34 || 10m13 || 15m22 ||
||= 2 deg =|| 14m21 || 8m32 || 5m49 || 4m09 || 4m41 || 7m16 || 8m04 || ||

 * No restarts
||= N procs =||= 12 =||= 24 =||= 48 =||= 96 =||= 192 =||= 384 =||= 768 =||= 1536 =||
||= 0.5 deg =|| 4h09 || 2h || 1h07 || 36m51 || 19m54 || 12m46 || 10m20 || 13m01 ||
||= 1 deg =|| 53m42 || 26m45 || 15m24 || 8m46 || 5m22 || 4m14 || 4m36 || 7m55 ||
||= 2 deg =|| 12m43 || 7m || 4m11 || 2m30 || 1m42 || 1m45 || 2m49 || ||

 * Processors are multiple of Curie core nodes(16c/node) - no restarts
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||= 512 =||= 1024 =||
||= 0.5 deg =|| 6h13 || 3h28 || 1h37 || 1h03 || 31m09 || 20m39 || 16m09 || 18m17 ||
||= 1 deg =|| 1h28 || 48m38 || 23m05 || 18m43 || 8m34 || 7m44 || 8m42 || 11m39 ||
||= 2 deg =|| 22m18 || 11m49 || 6m42 || 10m25 || 3m52 || 8m31 || 7m40 || - ||

=== Trunk R5293 (S1) - Irene - with restarts ===
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||= 512 =||= 1024 =||
||= 0.5 deg =|| 2h38 || 1h22 || 45m34 || 45m22 || 13m16 || 10m25 || 9m46 || 11m05 ||
||= 1 deg =|| 38m25 || 20m52 || 11m19 || 11m30 || 4m59 || 4m32 || 4m31 || 6m06 ||
||= 2 deg =|| 10m48 || 6m16 || 4m19 || 3m44 || 3m13 || 3m12 || 3m19 || - ||

 * 2nd run
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||= 512 =||= 1024 =||
||= 0.5 deg =|| || 1h27 || 41m44 || 46m50 || 13m22 || 10m55 || 9m44 || 11m46 ||
||= 1 deg =|| 39m35 || 21m03 || 10m50 || 6m58 || 4m57 || 4m33 || 4m22 || 7m03 ||
||= 2 deg =|| 11m46 || 6m19 || 4m22 || 4m || 3m17 || 3m15 || 3m35 || - ||

Default
options changed:
{{{
DO_WOOD_HARVEST=n
}}}

=== Trunk R5293 (S1) - Irene ===
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||= 512 =||= 1024 =||
||= 0.5 deg =|| 2h57 || 1h52 || 1h06 || 52m09 || 44m08 || 40m57 || 40m47 || 42m08 ||
||= 1 deg =|| 44m10 || 28m30 || 19m05 || 15m03 || 13m41 || 13m02 || 13m15 || 15m19 ||
||= 2 deg =|| 13m30 || 9m24 || 7m26 || 6m46 || 6m19 || 6m39 || 7m21 || - ||

 * 2nd run
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||= 512 =||= 1024 =||
||= 0.5 deg =|| 2h58 || 1h52 || 1h08 || 52m05 || 43m56 || 41m18 || 40m06 || 41m02 ||
||= 1 deg =|| 45m17 || 28m01 || 19m16 || 15m15 || 13m18 || 13m10 || 13m16 || 15m25 ||
||= 2 deg =|| 13m15 || 9m17 || 7m28 || 6m57 || 6m19 || 6m30 || 7m34 || ||

Default options changed:
{{{
DO_WOOD_HARVEST=n
}}}

=== Trunk R5293 (S1) - Curie - with restarts ===
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||= 512 =||= 1024 =||
||= 0.5 deg =|| 3h54 || 1h49 || 58m12 || 29m09 || 16m39 || 12m45 || 11m19 || 13m56 ||
||= 1 deg =|| 55m52 || 27m29 || 14m14 || 8m31 || 6m57 || 5m04 || 4m58 || - ||
||= 2 deg =|| 15m23 || 7m58 || 5m03 || ??
|| 3m47 || 3m56 || 3m48 || - ||

Default options changed:
{{{
DO_WOOD_HARVEST=n
}}}

=== Trunk R5293 (S1) - Curie ===
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||= 512 =||= 1024 =||
||= 0.5 deg =|| 4h10 || 2h21 || 1h27 || 1h05 || 54m35 || 49m23 || 46m50 || 48m08 ||
||= 1 deg =|| 1h03 || 36m06 || 23m36 || 17m47 || 15m51 || 14m25 || 13m58 || - ||
||= 2 deg =|| 18m08 || 11m40 || 8m39 || 7m25 || 6m34 || 6m19 || 6m19 || - ||

Default options changed:
{{{
DO_WOOD_HARVEST=n
}}}

=== MICT R5292 (S1) - Irene (NO align array64byte) ===
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||= 512 =||= 1024 =||
||= 0.5 deg =|| 7h08 || 3h55 || 1h54 || 2h27 || 33m25 || 23m17 || 18m34 || 21m11 ||
||= 1 deg =|| 1h45 || 56m27 || 26m || 28m01 || 15m04 || 9m54 || 10m10 || - ||
||= 2 deg =|| 25m05 || 14m15 || 7m56 || 5m32 || 4m33 || 6m01 || 8m34 || - ||

=== MICT R5292 (S1) - Irene (align array64byte) ===
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||= 512 =||= 1024 =||
||= 0.5 deg =|| 6h57 || 3h53 || 1h49 || 1h07 || 33m30 || 22m51 || 19m03 || 21m05 ||
||= 1 deg =|| 1h36 || 55m08 || 26m08 || 1957s -> 32m37 (?)
|| 9m38 || 9m || 10m24 || - ||
||= 2 deg =|| 25m54 || 14m || 6m31 || 5m42 || 3m41 || 5m22 || 8m05 || - ||

=== MICT R5273 (S1) ===
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||= 512 =||= 1024 =||
||= 0.5 deg =|| 11h21 || 5h13 || 2h28 || 1h13 || 37m57 || 22m09 || 15m04 || 16m35 ||
||= 1 deg =|| 2h33 || 1h13 || 35m25 || 18m02 || 10m34 || 7m14 || 6m26 || - ||
||= 2 deg =|| 38m14 || 18m06 || 9m47 || 5m42 || 3m58 || 3m18 || 3m25 || - ||

 * GLUC is disabled

=== MICT R5270 (new GLUC) ===
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||= 512 =||= 1024 =||
||= 2 deg =|| || || 46m02 || || 13m48 || || 8m47 || ||

 * 61 PFT's
 * IOIPSL: no outputs

=== MICT R5255 (S1) ===
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||= 512 =||= 1024 =||
||= 0.5 deg =|| 11h17 || 5h14 || 2h28 || 1h16 || 41m56 || 25m30 || 18m56 || 19m13 ||
||= 1 deg =|| 2h30 || 1h12 || 34m35 || 18m50 || 11m47 || 8m27 || 7m25 || - ||
||= 2 deg =|| 37m36 || 18m36 || 10m34 || 6m51 || 5m04 || 4m14 || 4m16 || - ||

This new test includes the following options:
{{{
SOILTYPE_CLASSIF = usda
# MICT hydrol
USE_SOILC_TEMPDIFF=y
use_refSOC=y
use_refSOC_hydrol=y
SOIL_REFSOC_FILE = refSOC.nc
SOIL_REFSOC_1d_FILE = refSOC_1d.nc
}}}

=== MICT R5255 (S0) ===
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||= 512 =||= 1024 =||
||= 0.5 deg =|| 10h53 || 4h57 || 2h20 || 1h08 || 35m54 || 22m48 || 17m16 || 17m29 ||
||= 1 deg =|| 2h23 || 1h07 || 32m39 || 16m48 || 9m35 || 6m07 || 5m53 || - ||
||= 2 deg =|| 34m31 || 16m25 || 8m32 || 5m02 || 3m34 || 2m54 || 2m51 || - ||

This new test includes the following options:
{{{
SOILTYPE_CLASSIF = usda
# MICT hydrol
USE_SOILC_TEMPDIFF=y
use_refSOC=y
use_refSOC_hydrol=y
SOIL_REFSOC_FILE = refSOC.nc
SOIL_REFSOC_1d_FILE = refSOC_1d.nc
}}}

=== Orchidee-CN-P R4758 ===
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||= 512 =||= 1024 =||
||= 0.5 deg =|| 6h15 || 2h47 || 1h16 || 37m06 || 20m41 || 13m07 || 11m37 || 13m46 ||
||= 1 deg
=|| 1h20 || 38m05 || 18m39 || 10m16 || 6m56 || 4m55 || 5m42 || 16m49 ||
||= 2 deg =|| 20m16 || 10m20 || 5m49 || 4m05 || 3m15 || 3m11 || 3m32 || ||

=== TRUNK R4788 ===
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||= 512 =||= 1024 =||
||= 0.5 deg =|| 3h06 || 1h38 || 54m18 || 34m39 || 25m44 || 21m43 || 19m56 || 24m54 ||
||= 1 deg =|| 45m21 || 24m08 || 14m23 || 9m34 || 7m16 || 6m27 || 10m14 || 20m33 ||
||= 2 deg =|| 12m01 || 6m35 || 4m21 || 3m04 || 2m41 || 2m33 || 2m32 || - ||

Note: are results correct? The Trunk has no parallel interpolation algorithm.

=== MICT R4755 (S0) ===
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||= 512 =||= 1024 =||
||= 0.5 deg =|| 12h37 || 5h37 || 2h38 || 1h16 || 38m16 || 22m32 || 16m16 || 18m34 ||
||= 1 deg =|| 2h44 || 1h18 || 37m37 || 19m06 || 10m35 || 7m52 || 7m50 || - ||
||= 2 deg =|| 41m12 || 19m31 || 10m02 || 5m42 || 3m42 || 3m09 || 3m18 || - ||

=== MICT R4755 (S1) ===
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||= 512 =||= 1024 =||
||= 0.5 deg =|| ???
|| 5h51 || 2h45 || 1h22 || 43m52 || 25m25 || 19m05 || 19m49 ||
||= 1 deg =|| 2h51 || 1h23 || 39m57 || 20m59 || 13m28 || 10m29 || 9m07 || - ||
||= 2 deg =|| 43m33 || 21m11 || 11m24 || 6m53 || 4m46 || 4m03 || 4m20 || - ||

Note: netcdf restart files compression is activated

=== MICT R4414 (S0) ===
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||= 512 =||= 1024 =||
||= 0.5 deg =|| 11h50 || 5h29 || 2h33 || 1h14 || 37m02 || 25m || 12m52 || 14m43 ||
||= 1 deg =|| 2h41 || 1h17 || 38m41 || 18m44 || 10m32 || 9m || || ||
||= 2 deg =|| 40m || 19m03 || 10m17 || 5m32 || 4m33 || 2m42 || || ||

=== MICT R4385 (S2) ===
 * Standard
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||= 512 =||= 1024 =||
||= 0.5 deg =|| || || || || 42m40 || || || ||

 * SOA ch4
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||= 512 =||= 1024 =||
||= 0.5 deg =|| || || || || 41m34 || || || ||

=== MICT R4385 (S1) ===
 * No restarts (unlimited IOIPSL):
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||= 512 =||= 1024 =||
||= 0.5 deg =|| 12h17 || 5h42 || 2h41 || 1h17 || 38m12 || 21m39 || 13m52 || 14m58 ||
||= 1.0 deg =|| 2h47 || 1h20 || 38m16 || 19m25 || 10m35 || 6m58 || - || - ||
||= 2.0 deg =|| 42m || 20m || 10m29 || 6m12 || 4m32 || 3m03 || - || - ||

 * No restarts (limited IOIPSL): All running (_latest)
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||= 512 =||= 1024 =||
||= 0.5 deg =|| c || c || 2h39 || 1h17 || 37m58 || 21m02 || 13m49 || 14m59 ||

 * With restarts (unlimited IOIPSL): All running (_latest_next)
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||= 512 =||= 1024 =||
||= 0.5 deg =|| c || c || 2h46 || 1h24 || 43m51 || 31m || 23m11 || 22m32 ||

 * With restarts (limited IOIPSL): (_latest_limitedio)
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||= 512 =||= 1024 =||
||= 0.5 deg =|| || || || || 36m26 || 20m21 || 12m52 || 14m45 ||

 * No restarts (limited IOIPSL): thermosoil_cond_pft + no precise
(_latest_refactor)

thermosoil_cond_pft is refactored again. Performance modifications were lost in previous commits. Refactoring + no precise allows the vectorization of the pow and exp subroutines.
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||= 512 =||= 1024 =||
||= 0.5 deg =|| || || || || 32m23 || || || ||

=== MICT R4289 (S1) ===
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||= 512 =||
||= 0.5 deg =|| 13h || 6h11 || 2h50 || 1h23 || 42m06 || 23m47 || 15m42 ||
||= 1.0 deg =|| 2h57 || 1h26 || 42m13 || 20m18 || 10m51 || 7m19 || - ||
||= 2.0 deg =|| 44m58 || 21m21 || 10m54 || 6m17 || 4m || 2m56 || - ||

Note: IOIPSL alignment is introduced by default

=== MICT R4277 + interpolation (S1) ===
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||= 512 =||
||= 0.5 deg =|| 16h39 || 8h05 || 3h38 || 1h42 || 50m12 || 26m13 || 16m47 ||
||= 1.0 deg =|| 3h45 || 1h44 || 50m16 || 22m29 || 13m11 || 7m29 || ||
||= 2.0 deg =|| 55m12 || 25m09 || 12m35 || 7m13 || 4m27 || 3m12 || ||

Committed in [4289/branches/ORCHIDEE-MICT/ORCHIDEE]

=== MICT R4277 + thermosoil refactor + IOIPSL alignment (S1) ===
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||= 512 =||
||= 0.5 deg =|| 13h42 || 6h28 || 2h59 || 1h28 || 44m14 || 24m48 || 16m35 ||
||= 1.0 deg =|| 3h25 || 1h29 || 42m57 || 21m20 || 11m24 || 7m13 || ||
||= 2.0 deg =|| 45m || 22m48 || 11m42 || 7m23 || 4m15 || 3m05 || ||

=== MICT R4277 + IOIPSL alignment (S1) ===
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||= 512 =||
||= 0.5 deg =|| 16h39 || 8h46 || 3h || 1h46(2nd try) || 52m || 27m14 || 17m19 ||
||= 1.0 deg =|| 4h10 || 1h48(2nd try) || 52m51 || 24m21 || 12m37 || 7m40 || - ||
||= 2.0 deg =|| 55m || 26m20 || 13m04 || 6m46 || 4m28 || 3m11 || - ||

=== MICT R4277 + thermosoil refactor (S1) ===
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||= 512 =||
||= 0.5 deg =|| 13h38 || 6h28 || 3h || 1h29 || 48m16 || 28m52 || 20m20 ||
||= 1.0 deg =|| 3h08 || 1h31 ||
44m56 || 23m28 || 14m37 || 10m18 || ||
||= 2.0 deg =|| 49m46 || 25m35 || 14m55 || 10m11 || 7m59 || 6m57 || ||

Committed in [4280/branches/ORCHIDEE-MICT/ORCHIDEE]

=== MICT R4277 (S1) ===
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||= 512 =||
||= 0.5 deg =|| 16h39 || 8h21 || 3h43 || 1h47 || 55m34 || 31m18 || 21m28 ||
||= 1.0 deg =|| 3h27 || 1h48 || 53m37 || 26m21 || 15m38 || 10m46 || - ||
||= 2.0 deg =|| 59m17 || 28m31 || 16m05 || 10m42 || 8m13 || 7m03 || - ||

=== MICT R4277 + thermosoil refactor (S0) ===
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||= 512 =||
||= 0.5 deg =|| || 7h42 || 3h18 || 1h33 || 47m07 || 24m48 || 15m53 ||
||= 1.0 deg =|| 3h27 || 1h36 || 47m || 21m38 || 13m33 || 9m14 || ||
||= 2.0 deg =|| 51m38 || 24m19 || 14m08 || 8m56 || 5m16 || 4m18 || ||

Committed in [4280/branches/ORCHIDEE-MICT/ORCHIDEE]

=== MICT R4277 (S0) ===
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||= 512 =||
||= 0.5 deg =|| || 7h33 || 3h15 || 1h32 || 49m49 || 24m43 || 15m47 ||
||= 1.0 deg =|| 3h26 || 1h36 || 46m05 || 21m18 || 13m25 || 9m05 || ||
||= 2.0 deg =|| 51m07 || 24m06 || 14m06 || 8m56 || 5m12 || 4m09 || ||

=== MICT R4274 (S0) ===
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||= 512 =||
||= 0.5 deg =|| - || 7h33 || 3h15 || 1h32 || 46m49 || 24m43 || 15m47 ||
||= 1.0 deg =|| 3h26 || 1h36 || 46m05 || 21m18 || 13m25 || 9m05 || ||
||= 2.0 deg =|| 51m07 || 24m06 || 14m04 || 8m56 || 5m12 || 4m09 || ||

=== Trunk R3934 (XIOS 2 + S0) ===
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||
||= 0.5 deg =|| 3h23 || 2h01 || 1h19 || 1h01 || 53m18 || 48m40 ||
||= 1.0 deg =|| 52m07 || 32m10 || 21m38 || 17m18 || 14m52 || 13m56 ||
||= 2.0 deg =|| 14m50 || 9m11 || 6m38 || 5m39 || 5m15 || 5m13 ||

Notes:
 * Interpolation is sequential

=== Mict R3932 (XIOS 2) + CROP + IOIPSL restarts ===
This is an early test with IOIPSL + restarts to 3, 4 and 5 dimensions. This revision is still in a personal directory.
It includes the remaining revisions from TRUNK. It will be merged into the main MICT branch soon. Its purpose is to provide a first draft of this modification.
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||
||= 0.5 deg =|| 15h54 || 7h50 || 3h42 || 2h06 || 55m01 || 34m35 ||

 * 0.5: The number of XIOS outputs is reduced so the simulation can finish.

=== Mict R3932 (XIOS 2) + IOIPSL restarts ===
This is an early test with IOIPSL + restarts to 3, 4 and 5 dimensions. This revision is still in a personal directory. It includes the remaining revisions from TRUNK. It will be merged into the main MICT branch soon. Its purpose is to provide a first draft of this modification.
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||
||= 0.5 deg =|| Out of mem. || 5h31 || 2h40 || 1h24 || 44m32 || 23m15 ||
||= 1 deg =|| 2h44 || 1h21 || 39m24 || 19m15 || 10m44 || 7m21 ||
||= 2 deg =|| 43m41 || 20m37 || 10m58 || 6m23 || 4m15 || 3m24 ||

 * 0.5: The number of XIOS outputs is reduced so the simulation can finish.

=== Mict R3811 (XIOS 2) ===
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||
||= 0.5 deg =|| out of memory || out of memory || out of memory || out of memory || 1h33 || 1h27 ||
||= 1 deg =|| 2h48 || 1h27 || 44m24 || 25m21 || 19m52 || 13m47 ||
||= 2 deg =|| 43m48 || 21m31 || 11m13 || 7m40 || 5m16 || 4m30 ||

Output Netcdf files:
 * 0.5 Degree
|| Filename || Size || # vars ||
|| stomate_rest_out.nc || 14G || 379 (double) ||
|| sechiba_rest_out.nc || 8.5G || 234 (double) ||
|| driver_rest_out.nc || 28M || 13 (double) ||
|| sechiba_history.nc || 3.0G || 114 (float) ||
|| stomate_history.nc || 3.3G || 297 (float) ||

Changes
 * CROP restart variables are now only active when CROP is enabled.
 * XIOS history outputs now include 4D/5D dimensions, which reduces the number of variables in the outputs.

Issues
 * 0.5 deg Out of memory is due to XIOS

Conclusion
 * 0.5deg - 64 procs: it might be due to 4D/5D variables.
 * 0.5deg computing time: writing fewer restart variables decreased the total time. The CROP module still has this problem.

=== Mict R3791 (XIOS 2) ===
 * Date: 30/09/16
 * Add all XIOS output fields

Time table:
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||
||= 0.5 deg =|| out of memory || out of memory || out of memory || 14h56 || 15h42 || 11h06 ||
||= 1 deg =|| 3h19 || 2h28 || 1h52 || 2h22 || 1h32 || 1h23 ||
||= 2 deg =|| 54m38 || 38m53 || 23m58 || 24m43 || 24m37 || 16m11 ||

Output Netcdf files:
 * 0.5 Degree
|| Filename || Size || # vars ||
|| stomate_rest_out.nc || 20G || 611 (double) ||
|| sechiba_rest_out.nc || 8.6G || 234 (double) ||
|| driver_rest_out.nc || 28M || 13 (double) ||
|| sechiba_history.nc || 2.0G || 388 (float) ||
|| stomate_history.nc || 3.8G || 1179 (float) ||

Changes
 * Add CROP module.

Issues
 * 0.5 deg Memory requirements are high
 * 0.5 deg Simulation time is far too high, even when the module is disabled.

Conclusion
 * 0.5deg Memory: the introduction of XIOS increases the memory usage
 * 0.5deg simulation time: many more restart variables to write

=== Mict R3587 (XIOS 2!) ===
 * Small fixes
 * Trunk update

Time table:
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||
||= 0.5 deg =|| out of memory || 4h43 || 2h21 || 1h18 || 58 || 47 ||
||= 1 deg =|| running || 1h05 || 35 || 23 || 16 || 18 ||
||= 2 deg =|| - || - || - || - || - || - ||

=== Mict R3587 (XIOS 2 + thermosoil_cond_pft) ===
This specific branch targets the subroutine thermosoil_cond_pft, which some profiling reports show to be highly time-consuming. The following tests are an effort to improve its performance. All tests are done at 0.5 degrees. All other parameters are the same as specified in this section.
||= N procs =||= 32 =||= 64 =||= 128 =||= 256 =||
||=Avx + align 32 + vecalign32 =|| 2h08 || 1h16 || 49 || 40 ||
||=Align 32 + vecalign32 =|| 2h13 || 1h30 || 54 || 43 ||
||=Avx + align 32 =|| 2h12 || 1h18 || 1h02 || 47 ||
||=Align 16 =|| 2h23 || 1h13 || 1h04 || 50 ||

Description:
 * avx: 256 bit register
 * align 32: -align array32byte compilation flag
 * align 16: -align array16byte compilation flag
 * vecalign: source code lines to help the compiler improve the performance

=== Mict R3567 ===
 * New driver

Time table:
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||
||= 0.5 deg =|| timeout || 11h22 || 7h21 || 4h51 || 3h38 || 2h49 ||
||= 1 deg =|| 4h22 || 2h21 || 1h24 || 54 || 39 || 30 ||
||= 2 deg =|| 58 || 31 || 19 || 13 || 9 || 7 ||

=== Mict R3527 ===
 * PFT parallel interpolation

Time table:
||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||
||= 0.5 deg =|| >16h39 322 days|| 11h09 || 7h06 || 4h50 || 3h31 || 2h47 ||
||= 1 deg =|| 4h10 || 2h14 || 1h20 || 52 || 37 || 30 ||
||= 2 deg =|| 55m05 || 30m01 || 18m05 || 12 || 9 || 7 ||

=== Mict R3161 ===
Time table:
||= N procs =||= 4 =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||
||= 0.5 deg =|| timeout || timeout || 13h00 || 8h46 || 6h35 || 5h38 ||
||= 1 deg =|| 6h37 || 4h20 || 2h36 || 1h45 || 1h21 || 1h08 ||
||= 2 deg =|| 1h40 || 56 || 35 || 24 || 19 || 16 ||

Note: the 0.5 deg run on 4 procs did not start due to memory requirements. The 0.5 deg run on 8 procs could not finish the simulation within the maximum time given by the HPC; it stopped at simulation day 322. Both values can be extrapolated.
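
The extrapolation mentioned in the note can be sketched as a linear scaling of the elapsed wall time by the fraction of simulated days completed. This is a minimal Python sketch, assuming a constant cost per simulated day; the function names are hypothetical, and the 16h39 input is only an illustrative value taken from the `>16h39 322 days` cell of the R3527 table, not a documented queue limit.

```python
def extrapolate_walltime(elapsed_s, days_done, total_days=365):
    """Linearly extrapolate the wall time of a full run from a partial one.

    Assumption: every simulated day costs the same wall time.
    """
    if not 0 < days_done <= total_days:
        raise ValueError("days_done must be between 1 and total_days")
    return elapsed_s * total_days / days_done


def format_hm(seconds):
    """Format a duration in seconds as XhYY, the notation used in the tables."""
    minutes = round(seconds / 60)
    return f"{minutes // 60}h{minutes % 60:02d}"


if __name__ == "__main__":
    # Illustrative input: run stopped at simulated day 322 after 16h39 of wall time.
    elapsed = 16 * 3600 + 39 * 60
    print(format_hm(extrapolate_walltime(elapsed, days_done=322)))
```

Start-up cost and end-of-run restart writing are not proportional to the number of simulated days, so the real full-year time may differ from this linear estimate.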
=== Trunk R2916 ===
The same simulations with the same options were carried out, with the following results:
||= N procs =||= 4 =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||
||= 0.5 deg =|| 8h38 || 5h31 || 3h26 || 2h23 || 1h48 || 1h31 ||
||= 1 deg =|| 2h07 || 1h17 || 47 || 32 || 25 || 21 ||
||= 2 deg =|| 38 || 19 || 11 || 8 || 6 || 5 ||

== IOIPSL ==
Restart File Creation:

[[Image( ioipsl_forcesoil.jpg, 50%, title="IOIPSL restart file creation")]]

== Table template ==
||= N procs =||= 12 =||= 24 =||= 48 =||= 96 =||= 192 =||= 384 =||= 768 =||= 1536 =||
||= 0.5 deg =|| || || || || || || || ||
||= 1 deg =|| || || || || || || || ||
||= 2 deg =|| || || || || || || || ||
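
When the processor counts change (for example 8 to 1024 on Curie versus 12 to 1536 on Irene), an empty table in the same layout can be generated instead of edited by hand. The following is a minimal Python sketch; the function name `trac_timing_table` is hypothetical and not part of any ORCHIDEE tooling.

```python
def trac_timing_table(procs, resolutions):
    """Build an empty Trac wiki timing table.

    One header row with the processor counts, then one row per
    resolution with an empty cell for each processor count.
    """
    header = "||= N procs =||" + "".join(f"= {p} =||" for p in procs)
    rows = [f"||= {res} =||" + " ||" * len(procs) for res in resolutions]
    return "\n".join([header] + rows)


if __name__ == "__main__":
    print(trac_timing_table([12, 24, 48, 96, 192, 384, 768, 1536],
                            ["0.5 deg", "1 deg", "2 deg"]))
```

Generating every row from the same processor list keeps the number of cells per row consistent.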