Changes between Version 17 and Version 18 of ParallelismPerformances


Timestamp:
2012-12-06T11:06:22+01:00 (11 years ago)
Author:
dsolyga

  • ParallelismPerformances

    v17 v18  
    166166=== Patch Evaluation === 
    167167 
     168==== Test : NCC forcing files (1°) ==== 
     169 
    168170To study the influence of both the IO patches and the load balance file, I ran a set of tests using the following setup: 
    169171 * NCC forcing file : 360*180, 15238 land points 
    170  * Loop 10 times over the same year to study the influence of the Load balance file. 
     172 * 10 years starting from scratch to study the influence of the Load balance file. 
    171173 * sechiba_hist_level = 4 
    172174 * stomate_hist_level = 5 
     175 * 125 variables written in the output ! 
    173176 * Monthly outputs 
    174177 * Tests done on Curie 
     
    183186Note that the patch has a significant effect for a high number of processors (>16). For 32 and 48 processors, the gain is about 60% (this appears to be the optimum for NCC). 
    184187Beyond 48 processors, the gain diminishes. [[BR]] 
    185 ''' Recommendations : ''' 
     188''' Recommendations for NCC forcing : ''' 
    186189 * NCC forcing files : 32 processors 
    187  * CRUNCEP : ~128 processors (estimate, as CRUNCEP has about 4 times more points than NCC)  
    188  * Other forcing files : use a linear relationship to estimate your number of processors. If you have 45000 points, you could use 32*3 = 96 processors.  
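
The linear rule of thumb in the list above can be sketched in a few lines of Python. This is only an illustration: the helper name `estimate_processors` is hypothetical, and rounding the point ratio to a whole multiple of 32 is an assumption chosen to match the 32*3 = 96 example. The only reference values used are the 15238 land points and 32 processors quoted for NCC.

```python
# Reference point taken from the text: NCC has 15238 land points
# and runs well on 32 processors.
NCC_LAND_POINTS = 15238
NCC_PROCESSORS = 32


def estimate_processors(land_points: int) -> int:
    """Hypothetical helper: scale the NCC processor count linearly.

    Assumption: the ratio of land points is rounded to the nearest
    whole number (at least 1), as in the "32*3 = 96" example.
    """
    ratio = land_points / NCC_LAND_POINTS
    return NCC_PROCESSORS * max(1, round(ratio))


print(estimate_processors(45000))  # 32 * 3 = 96
print(estimate_processors(60000))  # 32 * 4 = 128
```

Treat the result as a starting guess rather than a measured optimum: the CRU-NCEP test further down this page found 64 processors to be the best choice for ~60000 land points.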
    189  
    190 __Summary__ : ORCHIDEE should not be used on more than 128 processors for the moment. If you increase the number of processors, you could lose time because of MPI communications and the multiple 
    191 writing of output files.[[BR]] 
    192190To see how the parallelization has been improved, you can read the report [wiki:ParallelVersion here] (in French, sorry!). In this report, the optimal number of 
    193191processors was estimated at 6 for NCC forcing files!    
     192 
     193 
     194 
     195====  Test : CRU-NCEP (0.5°) ==== 
     196 
     197Nicolas Viovy and I agreed on a common protocol to compare his version with the standard one: 
     198 * CRU-NCEP forcing file : 0.5°, ~60000 land points 
     199 * 3 years starting from scratch to study the influence of the load balance file. 
     200 * sechiba_hist_level = 1 
     201 * stomate_hist_level = 1 
     202 * ~20 variables written in the output ! 
     203 * Monthly outputs 
     204 * Tests done on Curie 
     205 * REBUILD is done after the run 
     206SECHIBA_hist_level and STOMATE_hist_level are deliberately set low, because Nicolas has only about 20 variables in his output files.[[BR]] 
     207 
     208'''Results :''' 
     209 * Nicolas version : 
     210 
     211|| Number of processors  ||  Time per processor  || 
     212||  32                   ||  ~20 min (estimate)  || 
     213||  64                   ||  10 min              || 
     214||  128                  ||  ~5 min              || 
     215 
     216 * Standard version (trunk, revision 1076)  : 
     217|| Number of processors  ||  Time per processor  || 
     218||  16                ||    24 min        || 
     219||  32                ||    16 min        || 
     220||  48                ||    13 min        || 
     221||  64                ||    11 min        || 
     222||  128               ||     9 min 30     || 
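
To make the scaling behaviour in the two tables easier to read, here is a minimal Python sketch that derives speedup, parallel efficiency and total CPU cost from the timings above. It assumes that "Time per processor" is the elapsed time of the run and that "9 min 30" means 9.5 minutes; the function name `scaling` and the dictionary layout are illustrative only. The smallest processor count in each table is used as the baseline.

```python
# Timings in minutes, copied from the two tables above.
# Assumption: "9 min 30" is 9.5 minutes; the "~" values are approximate.
nicolas = {32: 20.0, 64: 10.0, 128: 5.0}
standard = {16: 24.0, 32: 16.0, 48: 13.0, 64: 11.0, 128: 9.5}


def scaling(timings):
    """Return (procs, speedup, efficiency, cpu_minutes) rows.

    Speedup and efficiency are relative to the smallest processor
    count in the table; cpu_minutes is procs * elapsed time.
    """
    base_procs = min(timings)
    base_time = timings[base_procs]
    rows = []
    for procs in sorted(timings):
        elapsed = timings[procs]
        speedup = base_time / elapsed
        efficiency = speedup / (procs / base_procs)
        rows.append((procs, speedup, efficiency, procs * elapsed))
    return rows


for name, timings in (("Nicolas", nicolas), ("standard", standard)):
    print(name)
    for procs, speedup, efficiency, cpu in scaling(timings):
        print(f"  {procs:4d} procs: speedup {speedup:.2f}, "
              f"efficiency {efficiency:.0%}, {cpu:.0f} CPU-min")
```

With these numbers, the efficiency of the standard version relative to 16 processors falls to about a third at 128 processors, while the total CPU cost roughly triples.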
     223 
     224''' Conclusion  :''' 
     225 * The performance of Nicolas's version and the standard version is similar up to 64 processors. Beyond that, the standard version shows little further improvement. 
     226 * For CRU-NCEP forcing files, the optimal number of processors is 64. Do not use more: you would consume too much computing time. 
     227 * With the standard version, you can use the routing. This is not really possible with Nicolas's version. 
     228 * There are still two problems to solve : 
     229   - Change the output level of some variables: ORCHIDEE writes too many variables. We could assign level 1 to all the variables essential for performing a spin-up.  
     230   - Why do we lose scalability when we use more than 64 processors?   
     231 
     232''' ACTIONS : ''' 
     233 * Redefine the output levels of the ORCHIDEE variables 
     234 * Use Vampir to understand the behaviour of ORCHIDEE on a high number of processors.