# Changeset 1560 for XIOS/dev/branch_openmp

Timestamp:
07/13/18 14:18:28
Message:

report update

Location:
XIOS/dev/branch_openmp/Note
Files:
5 edited

• ## XIOS/dev/branch_openmp/Note/rapport ESIWACE.aux

 r1552 \newlabel{fig:sendrecv}{{6}{5}} \citation{ep:2018} \@writefile{lof}{\contentsline {figure}{\numberline {7}{\ignorespaces }}{6}} \@writefile{lof}{\contentsline {figure}{\numberline {7}{\ignorespaces \input "rapport ESIWACE"-1.cpt\relax }}{6}} \newlabel{fig:bcast}{{7}{6}} \@writefile{lof}{\contentsline {figure}{\numberline {8}{\ignorespaces }}{6}} \@writefile{lof}{\contentsline {figure}{\numberline {8}{\ignorespaces \input "rapport ESIWACE"-2.cpt\relax }}{6}} \newlabel{fig:allreduce}{{8}{6}} \citation{ep:2018} \citation{ep:2018} \bibstyle{plain} \bibdata{reference} \bibcite{ep:2018}{1} \@writefile{toc}{\contentsline {section}{\numberline {3}The multi-threaded XIOS and performance results}{7}} \@writefile{toc}{\contentsline {subsection}{\numberline {3.1}LMDZ work-flow}{7}} \bibcite{Dinan:2013}{2} \bibcite{Sridharan:2014}{3} \@writefile{toc}{\contentsline {section}{\numberline {3}The multi-threaded XIOS and performance results}{7}} \@writefile{toc}{\contentsline {section}{\numberline {4}Future works for XIOS}{7}} \@writefile{toc}{\contentsline {subsection}{\numberline {3.2}CMIP6 work-flow}{8}} \@writefile{toc}{\contentsline {section}{\numberline {4}Future works for XIOS}{8}}
• ## XIOS/dev/branch_openmp/Note/rapport ESIWACE.log

 r1552 This is pdfTeX, Version 3.14159265-2.6-1.40.16 (TeX Live 2015/Debian) (preloaded format=pdflatex 2017.8.24)  27 JUN 2018 15:10 This is pdfTeX, Version 3.14159265-2.6-1.40.16 (TeX Live 2015/Debian) (preloaded format=pdflatex 2017.8.24)  13 JUL 2018 12:17 entering extended mode restricted \write18 enabled. \verbatim@in@stream=\read1 ) (/usr/share/texlive/texmf-dist/tex/latex/cprotect/cprotect.sty Package: cprotect 2011/01/27 v1.0e (Bruno Le Floch) (/usr/share/texlive/texmf-dist/tex/latex/base/ifthen.sty Package: ifthen 2014/09/29 v1.1c Standard LaTeX ifthen package (DPC) ) (/usr/share/texlive/texmf-dist/tex/latex/bigfoot/suffix.sty Package: suffix 2006/07/15 1.5a Variant command support ) \CPT@WriteOut=\write3 \c@CPT@WriteCount=\count109 \c@CPT@numB=\count110 \CPT@commandatend@toks=\toks27 ) (./rapport ESIWACE.aux) \openout1 = "rapport ESIWACE.aux"'. LaTeX Font Info:    Checking defaults for OML/cmm/m/it on input line 17. LaTeX Font Info:    ... okay on input line 17. LaTeX Font Info:    Checking defaults for T1/cmr/m/n on input line 17. LaTeX Font Info:    ... okay on input line 17. LaTeX Font Info:    Checking defaults for OT1/cmr/m/n on input line 17. LaTeX Font Info:    ... okay on input line 17. LaTeX Font Info:    Checking defaults for OMS/cmsy/m/n on input line 17. LaTeX Font Info:    ... okay on input line 17. LaTeX Font Info:    Checking defaults for OMX/cmex/m/n on input line 17. LaTeX Font Info:    ... okay on input line 17. LaTeX Font Info:    Checking defaults for U/cmr/m/n on input line 17. LaTeX Font Info:    ... okay on input line 17. LaTeX Font Info:    Checking defaults for OML/cmm/m/it on input line 18. LaTeX Font Info:    ... okay on input line 18. LaTeX Font Info:    Checking defaults for T1/cmr/m/n on input line 18. LaTeX Font Info:    ... okay on input line 18. LaTeX Font Info:    Checking defaults for OT1/cmr/m/n on input line 18. LaTeX Font Info:    ... okay on input line 18. 
LaTeX Font Info:    Checking defaults for OMS/cmsy/m/n on input line 18. LaTeX Font Info:    ... okay on input line 18. LaTeX Font Info:    Checking defaults for OMX/cmex/m/n on input line 18. LaTeX Font Info:    ... okay on input line 18. LaTeX Font Info:    Checking defaults for U/cmr/m/n on input line 18. LaTeX Font Info:    ... okay on input line 18. (/usr/share/texlive/texmf-dist/tex/context/base/supp-pdf.mkii [Loading MPS to PDF converter (version 2006.09.02).] \scratchcounter=\count109 \scratchcounter=\count111 \scratchdimen=\dimen120 \scratchbox=\box30 \nofMPsegments=\count110 \nofMParguments=\count111 \everyMPshowfont=\toks27 \MPscratchCnt=\count112 \nofMPsegments=\count112 \nofMParguments=\count113 \everyMPshowfont=\toks28 \MPscratchCnt=\count114 \MPscratchDim=\dimen121 \MPnumerator=\count113 \makeMPintoPDFobject=\count114 \everyMPtoPDFconversion=\toks28 \MPnumerator=\count115 \makeMPintoPDFobject=\count116 \everyMPtoPDFconversion=\toks29 ) (/usr/share/texlive/texmf-dist/tex/generic/oberdiek/pdftexcmds.sty Package: pdftexcmds 2011/11/29 v0.20 Utility functions of pdfTeX for LuaTeX (HO e )) \c@lstlisting=\count115 \c@lstlisting=\count117 File: Charge1.png Graphic file (type png) Package pdftex.def Info: Charge1.png used on input line 33. Package pdftex.def Info: Charge1.png used on input line 34. (pdftex.def)             Requested size: 165.01357pt x 91.23924pt. File: Charge2.png Graphic file (type png) Package pdftex.def Info: Charge2.png used on input line 34. Package pdftex.def Info: Charge2.png used on input line 35. (pdftex.def)             Requested size: 165.25446pt x 91.05858pt. [1 File: domain.pdf Graphic file (type pdf) Package pdftex.def Info: domain.pdf used on input line 60. Package pdftex.def Info: domain.pdf used on input line 64. (pdftex.def)             Requested size: 236.1567pt x 71.13055pt. File: omp.pdf Graphic file (type pdf) Package pdftex.def Info: omp.pdf used on input line 68. 
Package pdftex.def Info: omp.pdf used on input line 78. (pdftex.def)             Requested size: 291.64784pt x 126.32893pt. [2 <./domain.pdf>] File: scheme.png Graphic file (type png) Package pdftex.def Info: scheme.png used on input line 86. Package pdftex.def Info: scheme.png used on input line 102. (pdftex.def)             Requested size: 266.18977pt x 207.17032pt. [3 <./omp.pdf>] File: tag.png Graphic file (type png) Package pdftex.def Info: tag.png used on input line 140. Package pdftex.def Info: tag.png used on input line 156. (pdftex.def)             Requested size: 301.11966pt x 41.35376pt. [4 <./scheme.png (PNG copy)>] File: sendrecv.png Graphic file (type png) Package pdftex.def Info: sendrecv.png used on input line 149. Package pdftex.def Info: sendrecv.png used on input line 165. (pdftex.def)             Requested size: 331.63313pt x 277.83307pt. File: bcast.png Graphic file (type png) Package pdftex.def Info: bcast.png used on input line 173. Package pdftex.def Info: bcast.png used on input line 188. (pdftex.def)             Requested size: 153.87605pt x 165.62003pt. \openout3 = "rapport ESIWACE-1.cpt"'. (./rapport ESIWACE-1.cpt) File: allreduce.png Graphic file (type png) Package pdftex.def Info: allreduce.png used on input line 185. Package pdftex.def Info: allreduce.png used on input line 200. (pdftex.def)             Requested size: 223.13535pt x 165.62003pt. [6 <./bcast.png (PNG copy)> <./allreduce.png (PNG copy)>] (./rapport ESIWACE.b bl \openout3 = "rapport ESIWACE-2.cpt"'. (./rapport ESIWACE-2.cpt) [6 <./bcast.png (PNG copy)> <./allreduce.png (PNG co py)>] (./rapport ESIWACE.bbl Underfull \hbox (badness 3354) in paragraph at lines 4--9 []\OT1/cmr/m/n/10 XIOS de-vel-op-per group.  Note for XIOS End-points.  
Tech-ni [] ) [7] (./rapport ESIWACE.aux) ) [7]) [8] (./rapport ESIWACE.aux) ) Here is how much of TeX's memory you used: 4637 strings out of 494953 61251 string characters out of 6180977 137489 words of memory out of 5000000 7849 multiletter control sequences out of 15000+600000 9090 words of font info for 34 fonts, out of 8000000 for 9000 4795 strings out of 494953 63636 string characters out of 6180977 139394 words of memory out of 5000000 7989 multiletter control sequences out of 15000+600000 9397 words of font info for 35 fonts, out of 8000000 for 9000 14 hyphenation exceptions out of 8191 41i,8n,35p,1270b,264s stack positions out of 5000i,500n,10000p,200000b,80000s Output written on "rapport ESIWACE.pdf" (7 pages, 269577 bytes). 41i,8n,35p,1270b,618s stack positions out of 5000i,500n,10000p,200000b,80000s < /usr/share/texlive/texmf-dist/fonts/type1/public/amsfonts/cm/cmr7.pfb> Output written on "rapport ESIWACE.pdf" (8 pages, 273883 bytes). PDF statistics: 86 PDF objects out of 1000 (max. 8388607) 57 compressed objects within 1 object stream 89 PDF objects out of 1000 (max. 8388607) 59 compressed objects within 1 object stream 0 named destinations out of 1000 (max. 500000) 46 words of extra memory for PDF output out of 10000 (max. 10000000)
• ## XIOS/dev/branch_openmp/Note/rapport ESIWACE.tex

 r1552
\usepackage{url}
\usepackage{verbatim}
\usepackage{cprotect}

% Title Page

project develops a new dynamical core for LMD-Z, the atmospheric general circulation model (GCM) part of IPSL-CM Earth System Model. \url{http://www.lmd.polytechnique.fr/~dubos/DYNAMICO/}} all use XIOS as the output back end. M\'et\'eoFrance and MetOffice also choose XIOS to manage the I/O for their models.

Although XIOS copes well with many models, there is one potential optimization in XIOS which needs to be investigated: making XIOS thread-friendly.

This topic comes along with the configuration of the climate models. Take LMDZ as an example: it is designed with a 2-level parallelization scheme. To be more specific, LMDZ uses the domain decomposition method in which each sub-domain is associated with one MPI process. Inside the sub-domain, the model also uses OpenMP directives to accelerate the computation. We can imagine that the sub-domain is divided into sub-sub-domains, each managed by a thread.

\begin{figure}[ht]
\end{figure}

As we know, each sub-domain, or in other words each MPI process, is an XIOS client. The data exchange between the client and the XIOS servers is handled by MPI communications. In order to write an output field, all threads must gather the data to the master thread, which acts as the MPI process, in order to call MPI routines. This method has two disadvantages: first, we have to spend time gathering information to the master thread, which not only increases memory use but also implies an OpenMP barrier; second, while the master thread calls MPI routines, the other threads are idle, which wastes computing resources. What we want to obtain with the thread-friendly XIOS is that all threads can act like MPI processes. They can call the MPI routines directly, so neither memory nor computing resources are wasted, as shown in Figure \ref{fig:omp}.

\begin{figure}[ht]
\end{figure}

There are two ways to make XIOS thread-friendly. The first is to change the structure of XIOS, which demands a lot of modification in the XIOS library. Knowing that XIOS is about 100 000 lines of code, this method would be very time-consuming. What's more, the modification would be local to XIOS. If we want to optimize another code to be thread-friendly, we have to redo the modifications. The second choice is to add an extra interface to MPI in order to manage the threads. When a thread wants to call an MPI routine inside XIOS, it first passes through the interface, in which the communication information is analyzed before the MPI routine is invoked. With this method, we only need to modify a very small part of XIOS in order to make it work. What is more interesting is that the interface we created can be adjusted to suit other MPI-based libraries.

data, execution of the MPI function by all master/root threads, distribution or arrangement of the resulting data among threads.
%The most representative functions of the collective communications are \verb|MPI_Gather| and \verb|MPI_Bcast|.
For example, if we want to perform a broadcast operation, only 2 steps are needed (\textit{c.f.} Figure \ref{fig:bcast}).
Firstly, the root

\centering
\includegraphics[scale=0.3]{bcast.png}
\cprotect\caption{\verb|MPI_Bcast|}
\label{fig:bcast}
\end{figure}

\centering
\includegraphics[scale=0.3]{allreduce.png}
\cprotect\caption{\verb|MPI_Allreduce|}
\label{fig:allreduce}
\end{figure}

Other MPI routines, such as \verb|MPI_Wait|, \verb|MPI_Intercomm_create| \textit{etc.}, can be found in the technical report of the endpoint interface \cite{ep:2018}.

\section{The multi-threaded XIOS and performance results}

The development of the endpoint interface for the thread-friendly XIOS library took about one year and a half. The main difficulty is the co-existence of MPI processes and OpenMP threads. One essential requirement for using the endpoint interface is that the underlying MPI implementation must provide level 3 of thread support, which is \verb|MPI_THREAD_MULTIPLE|. This means that if the MPI process is multi-threaded, multiple threads may call MPI at once with no restrictions.
Another important aspect to be mentioned is that in XIOS, we have variables with the \verb|static| attribute. This means that inside an MPI process, threads share the static variable. To use the endpoint interface correctly, these static variables have to be declared \verb|threadprivate| so that each thread sees only its own copy.

To develop the endpoint interface, we redefined all MPI classes along with all the MPI routines that are used in the XIOS library. The current version of the interface includes about 7000 lines of code and is now available on the forge server: \url{http://forge.ipsl.jussieu.fr/ioserver/browser/XIOS/dev/branch_openmp}. A technical report is also available in which one can find more detail about how the endpoint works and how the routines are implemented \cite{ep:2018}. We must note that the thread-friendly XIOS library is still in the optimization phase. It will be released in the future as a stable version.

All the functionalities of XIOS are preserved in the thread-friendly XIOS library. Single-threaded code can work successfully under the endpoint interface with the new version of XIOS. For multi-threaded models, some modifications are needed in order to work with the multi-threaded XIOS library. For example, the MPI initialization has to be modified to request \verb|MPI_THREAD_MULTIPLE| support. Each thread should have its own data set. Most importantly, the OpenMP master region in which the master thread calls XIOS routines should be removed so that every thread can call XIOS routines simultaneously. More detail can be found in our technical report \cite{ep:2018}.

Even though the multi-threaded XIOS library is not fully accomplished and further optimization is ongoing, we have already done some tests to see the potential of the endpoint framework. We take LMDZ as the target model and have tested with several work-flow loads.

\subsection{LMDZ work-flow}

In the LMDZ work-flow, we have a daily output file.
We have up to 413 two-dimensional variables and 187 three-dimensional variables. According to the user's needs, we can change the ``output\_level'' key argument in the \verb|xml| file to select the desired variables to be written. In our tests, we set ``output\_level=2'' for a light output and ``output\_level=11'' for a full output. We run the LMDZ code for one-, two-, and three-month simulations using 12 MPI client processes and 1 server process. Each client process includes 8 OpenMP threads, which gives us 96 XIOS clients in total.

\subsection{CMIP6 work-flow}

\begin{comment}
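The collective pattern outlined in the report text (collect data inside each MPI process, let the master threads call the MPI routine, then redistribute the result to the threads) can be illustrated with a small simulation. This is only a sketch under stated assumptions: it uses plain Python lists instead of MPI and OpenMP, the function name `endpoint_allreduce_sum` is hypothetical, and it is not the implementation found in the endpoint interface.

```python
from typing import List


def endpoint_allreduce_sum(values: List[List[float]]) -> List[List[float]]:
    """Simulate a sum-allreduce over endpoints.

    values[p][t] is the contribution of thread t inside MPI process p.
    (Hypothetical helper: real code would use MPI and OpenMP, not lists.)
    """
    # Step 1: inside each process, reduce the thread data to the master
    # thread (in the real interface this is a local memory transfer).
    partial = [sum(threads) for threads in values]

    # Step 2: the master threads perform the classic MPI_Allreduce;
    # here it is simulated by summing the per-process partial results.
    total = sum(partial)

    # Step 3: each master thread hands the reduced value back to all
    # threads of its process.
    return [[total for _ in threads] for threads in values]


if __name__ == "__main__":
    # Two processes with two threads each.
    print(endpoint_allreduce_sum([[1, 2], [3, 4]]))
```

Only step 2 would involve inter-process communication in a real run, which is why the number of MPI calls stays equal to the number of processes rather than the number of threads.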
• ## XIOS/dev/branch_openmp/Note/rapport ESIWACE.tex.backup

 r1552
\usepackage{amsmath}
\usepackage{url}
\usepackage{verbatim}

% Title Page

project develops a new dynamical core for LMD-Z, the atmospheric general circulation model (GCM) part of IPSL-CM Earth System Model. \url{http://www.lmd.polytechnique.fr/~dubos/DYNAMICO/}} all use XIOS as the output back end. M\'et\'eoFrance and MetOffice also choose XIOS to manage the I/O for their models.

\caption{This figure shows the classic pattern of a P2P communication with the endpoint interface. Thread/endpoint rank 0 sends a message to thread/endpoint rank 3 with tag=1. The underlying MPI function called by the sender is in fact a send to MPI rank 1 with tag=65537. From the receiver's point of view, endpoint 3 is actually receiving a message from MPI rank 0 with tag=65537.}
\label{fig:sendrecv}

Figure \ref{fig:allreduce} illustrates how the \verb|MPI_Allreduce| function proceeds in the endpoint interface. First of all, we perform an intra-process ``allreduce'' operation: source data is reduced from slave threads to the master thread via local memory transfer. Next, all master threads call the classic \verb|MPI_Allreduce| routine. Finally, each master thread sends the updated reduced data to its slaves via local memory transfer.

endpoint interface.

\section{The multi-threaded XIOS and performance results}

The development of the endpoint interface for the thread-friendly XIOS library took about one year and a half. The main difficulty is the co-existence of MPI processes and OpenMP threads. All MPI classes must be redefined in the endpoint interface along with all the routines. The development is now available on the forge server: \url{http://forge.ipsl.jussieu.fr/ioserver/browser/XIOS/dev/branch_openmp}. A technical report is also available in which one can find more detail about how the endpoint works and how the routines are implemented

future with a stable version. All the functionalities of XIOS are preserved in its thread-friendly version. Single-threaded code can work successfully with the new version of XIOS. For multi-threaded models, some modifications are needed in order to work with the multi-threaded XIOS library. Detail can be found in our technical report \cite{ep:2018}.

Even though the multi-threaded XIOS library is not fully accomplished and further optimization is ongoing, we have already done some tests to see the potential of the endpoint framework. We take LMDZ as the target model and have tested with several work-flow loads.

\subsection{LMDZ work-flow}

In the LMDZ work-flow, we have a daily output file. We have up to 413 two-dimensional variables and 187 three-dimensional variables. According to the user's needs, we can change the ``output\_level'' key argument in the xml file to select the desired variables to be written. In our tests, we set ``output\_level=2'' for a light output and ``output\_level=11'' for a full output.

\subsection{CMIP6 work-flow}

\begin{comment}
\section{Performance of LMDZ using EP\_XIOS}
histmth with daily output
\section{Perspectives of EP\_XIOS}
\end{comment}

\section{Future works for XIOS}
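The point-to-point caption above gives one concrete data point: endpoint 0 sends to endpoint 3 with user tag 1, and the underlying MPI call targets MPI rank 1 with tag 65537. The sketch below shows one possible mapping that reproduces these numbers. Everything in it is an assumption made for illustration: the number of threads per process (2), the bit layout of the combined tag, and the helper names `endpoint_to_mpi`, `encode_tag` and `decode_tag` are hypothetical and are not taken from the endpoint interface itself.

```python
# Assumption: 2 threads per MPI process, so that endpoint 3 lives in MPI rank 1.
THREADS_PER_PROCESS = 2


def endpoint_to_mpi(ep_rank: int) -> tuple:
    """Map a global endpoint rank to (MPI rank, local thread rank)."""
    return divmod(ep_rank, THREADS_PER_PROCESS)


def encode_tag(user_tag: int, src_local: int, dst_local: int) -> int:
    """Pack the user tag and both local thread ranks into one MPI tag.

    Assumed layout: user tag in the high bits, then 8 bits for the
    sender's local rank and 8 bits for the receiver's local rank.
    """
    return (user_tag << 16) | (src_local << 8) | dst_local


def decode_tag(mpi_tag: int) -> tuple:
    """Recover (user_tag, src_local, dst_local) from a packed MPI tag."""
    return mpi_tag >> 16, (mpi_tag >> 8) & 0xFF, mpi_tag & 0xFF


if __name__ == "__main__":
    src_mpi, src_local = endpoint_to_mpi(0)
    dst_mpi, dst_local = endpoint_to_mpi(3)
    tag = encode_tag(1, src_local, dst_local)
    # With the assumptions above this matches the caption:
    # a send from MPI rank 0 to MPI rank 1 with tag 65537.
    assert (src_mpi, dst_mpi, tag) == (0, 1, 65537)
```

Packing the local ranks into the tag lets the receiving process route a message arriving at the MPI level to the right thread without an extra header, which is the idea the caption describes; the actual layout should be checked against the technical report.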