#LyX file created by tex2lyx 2.3 \lyxformat 544 \begin_document \begin_header \save_transient_properties true \origin /homel/ywang/Documents/MPI_Endpoints/Note/ \textclass article \begin_preamble \usepackage{listings} \usepackage[usenames,dvipsnames,svgnames,table]{xcolor} % Title Page \title{Note for MPI Endpoints} \author{} \end_preamble \use_default_options false \maintain_unincluded_children false \language english \language_package none \inputencoding utf8 \fontencoding default \font_roman "default" "default" \font_sans "default" "default" \font_typewriter "default" "default" \font_math "auto" "auto" \font_default_family default \use_non_tex_fonts false \font_sc false \font_osf false \font_sf_scale 100 100 \font_tt_scale 100 100 \use_microtype false \use_dash_ligatures true \graphics default \default_output_format default \output_sync 0 \bibtex_command default \index_command default \paperfontsize 10 \spacing single \use_hyperref false \papersize a4paper \use_geometry false \use_package amsmath 2 \use_package amssymb 0 \use_package cancel 0 \use_package esint 1 \use_package mathdots 0 \use_package mathtools 0 \use_package mhchem 0 \use_package stackrel 0 \use_package stmaryrd 0 \use_package undertilde 0 \cite_engine basic \cite_engine_type default \biblio_style plain \use_bibtopic false \use_indices false \paperorientation portrait \suppress_date false \justification true \use_refstyle 0 \use_minted 0 \index Index \shortcut idx \color #008000 \end_index \secnumdepth 3 \tocdepth 3 \paragraph_separation indent \paragraph_indentation default \is_math_indent 0 \math_numbering_side default \quotes_style english \dynamic_quotes 0 \papercolumns 1 \papersides 1 \paperpagestyle default \tracking_changes false \output_changes false \html_math_output 0 \html_css_as_file 0 \html_be_strict false \end_header \begin_body \begin_layout Standard \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash maketitle \end_layout \end_inset \end_layout \begin_layout 
Standard \begin_inset ERT status collapsed \begin_layout Plain Layout % \backslash begin{abstract} \end_layout \begin_layout Plain Layout \end_layout \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout % \backslash end{abstract} \end_layout \end_inset \end_layout \begin_layout Section Purpose \end_layout \begin_layout Standard The goal is to use threads as if they were MPI processes. Each thread is assigned a rank and is associated with an endpoint communicator (EP_Comm). By convention, one OpenMP thread corresponds to one endpoint. \end_layout \begin_layout Section MPI Endpoints Semantics \end_layout \begin_layout Standard \align center \begin_inset Graphics filename scheme.png scale 40 \end_inset \end_layout \begin_layout Standard Endpoints are created from one MPI communicator and the number of available threads: \end_layout \begin_layout Verbatim int MPI_Comm_create_endpoints(MPI_Comm parent_comm, int num_ep, \end_layout \begin_layout Verbatim MPI_Info info, MPI_Comm out_comm_hdls[]) \end_layout \begin_layout Verbatim \end_layout \begin_layout Standard \begin_inset Quotes eld \end_inset In this collective call, a single output communicator is created, and an array of \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|num_ep| \end_layout \end_inset handles to this new communicator are returned, where the \begin_inset Formula $i^{th}$ \end_inset handle corresponds to the \begin_inset Formula $i^{th}$ \end_inset rank requested by the caller of \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Comm_create_endpoints| \end_layout \end_inset . Ranks in the output communicator are ordered sequentially and in the same order as the parent communicator. After it has been created, the output communicator behaves as a normal communicator, and MPI calls on each endpoint (i.e., communicator handle) behave as though they originated from a separate MPI process. 
In particular, collective calls must be made once per endpoint. \begin_inset Quotes erd \end_inset \begin_inset CommandInset citation LatexCommand cite after "" key "Dinan:2013" literal "true" \end_inset \end_layout \begin_layout Standard \begin_inset Quotes eld \end_inset Once created, endpoints behave as MPI processes. For example, all ranks in an endpoints communicator must participate in collective operations. A consequence of this semantic is that endpoints also have MPI process progress requirements; that operations on that endpoint are required to make progress only when an MPI operation (e.g. \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Test| \end_layout \end_inset ) is performed on that endpoint. This semantic enables an MPI implementation to logically separate endpoints, treat them independently within the progress engine, and eliminate synchronization in updating their state. \begin_inset Quotes erd \end_inset \begin_inset CommandInset citation LatexCommand cite after "" key "Sridharan:2014" literal "true" \end_inset \end_layout \begin_layout Section EP types \end_layout \begin_layout Subsection* MPI_Comm \end_layout \begin_layout Standard \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Comm| \end_layout \end_inset is composed of: \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout \begin_inset Formula $\bullet$ \end_inset \end_layout \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|bool is_ep| \end_layout \end_inset : true \begin_inset Formula $\implies$ \end_inset EP, false \begin_inset Formula $\implies$ \end_inset classic MPI; \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout \begin_inset Formula $\bullet$ \end_inset \end_layout \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|int mpi_comm| \end_layout 
\end_inset : handle to the parent MPI communicator; \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout \begin_inset Formula $\bullet$ \end_inset \end_layout \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|OMPbarrier *ep_barrier| \end_layout \end_inset : OpenMP barrier object used for in-process synchronization; it is distinct from the \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|omp barrier| \end_layout \end_inset directive; \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout \begin_inset Formula $\bullet$ \end_inset \end_layout \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|int[2] size_rank_info[3]| \end_layout \end_inset : topology information of the current endpoint: \end_layout \begin_deeper \begin_layout Itemize rank of the parent MPI process; \end_layout \begin_layout Itemize size of the parent MPI communicator; \end_layout \begin_layout Itemize rank of the endpoint, returned by \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Comm_rank| \end_layout \end_inset ; \end_layout \begin_layout Itemize size of the EP communicator, returned by \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Comm_size| \end_layout \end_inset ; \end_layout \begin_layout Itemize in-process rank of the endpoint; \end_layout \begin_layout Itemize in-process size of the EP communicator, i.e. the number of endpoints in one MPI process. 
\end_layout \end_deeper \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout \begin_inset Formula $\bullet$ \end_inset \end_layout \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Comm *comm_list| \end_layout \end_inset : pointer to the first endpoint communicator of the process; \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout \begin_inset Formula $\bullet$ \end_inset \end_layout \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|Message_list *message_queue| \end_layout \end_inset : location of the incoming messages of each endpoint; \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout \begin_inset Formula $\bullet$ \end_inset \end_layout \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|RANK_MAP *rank_map| \end_layout \end_inset : a map from an integer key to a pair of integers. The integer key is the rank of an endpoint. 
The mapped type (pair of integers) gives the in-process rank of the endpoint and the rank of its parent MPI process: \end_layout \begin_layout Verbatim rank_map->at(ep_rank)=(ep_rank_local, mpi_rank) \end_layout \begin_layout Verbatim \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout \begin_inset Formula $\bullet$ \end_inset \end_layout \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|BUFFER *ep_buffer| \end_layout \end_inset : buffer (of type \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|int| \end_layout \end_inset , \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|float| \end_layout \end_inset , \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|double| \end_layout \end_inset , \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|char| \end_layout \end_inset , \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|long| \end_layout \end_inset , and \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|unsigned long| \end_layout \end_inset ) used for in-process communication. 
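\end_layout \begin_layout Standard As a minimal sketch of the rank_map entry described above, the following C++ fragment builds such a map for a simple layout. The alias RankMap and the function build_rank_map are hypothetical names, and the sketch assumes that every MPI process owns the same number of endpoints and that endpoint ranks are numbered process by process; the real library may build the map differently.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical alias mirroring the RANK_MAP described above:
// endpoint rank -> (in-process rank, rank of the parent MPI process).
using RankMap = std::map<int, std::pair<int, int>>;

// Assumption: every process owns eps_per_proc endpoints and endpoint
// ranks are assigned process by process, in MPI rank order.
RankMap build_rank_map(int num_procs, int eps_per_proc) {
    assert(num_procs > 0 && eps_per_proc > 0);
    RankMap rank_map;
    for (int mpi_rank = 0; mpi_rank < num_procs; ++mpi_rank) {
        for (int ep_rank_local = 0; ep_rank_local < eps_per_proc; ++ep_rank_local) {
            int ep_rank = mpi_rank * eps_per_proc + ep_rank_local;
            rank_map[ep_rank] = std::make_pair(ep_rank_local, mpi_rank);
        }
    }
    return rank_map;
}
```

With 4 processes and 3 endpoints per process, looking up endpoint rank 4 in this sketch gives the pair (1, 1): in-process rank 1 on MPI process 1.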
\end_layout \begin_layout Subsection MPI_Request \end_layout \begin_layout Standard \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Request| \end_layout \end_inset is composed of: \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout \begin_inset Formula $\bullet$ \end_inset \end_layout \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|int mpi_request| \end_layout \end_inset : handle to the MPI request; \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout \begin_inset Formula $\bullet$ \end_inset \end_layout \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|int ep_datatype| \end_layout \end_inset : data type of the communication; \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout \begin_inset Formula $\bullet$ \end_inset \end_layout \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Comm comm| \end_layout \end_inset : handle to the EP communicator; \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout \begin_inset Formula $\bullet$ \end_inset \end_layout \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|int ep_src| \end_layout \end_inset : rank of the source endpoint; \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout \begin_inset Formula $\bullet$ \end_inset \end_layout \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|int ep_tag| \end_layout \end_inset : tag of the communication; 
\end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout \begin_inset Formula $\bullet$ \end_inset \end_layout \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|int type| \end_layout \end_inset : type of the communication: \end_layout \begin_deeper \begin_layout Itemize 1 \begin_inset Formula $\implies$ \end_inset non-blocking send; \end_layout \begin_layout Itemize 2 \begin_inset Formula $\implies$ \end_inset pending non-blocking receive; \end_layout \begin_layout Itemize 3 \begin_inset Formula $\implies$ \end_inset non-blocking matching receive. \end_layout \end_deeper \begin_layout Subsection MPI_Status \end_layout \begin_layout Standard \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Status| \end_layout \end_inset consists of: \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout \begin_inset Formula $\bullet$ \end_inset \end_layout \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|int mpi_status| \end_layout \end_inset : handle to the MPI status; \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout \begin_inset Formula $\bullet$ \end_inset \end_layout \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|int ep_datatype| \end_layout \end_inset : data type of the communication; \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout \begin_inset Formula $\bullet$ \end_inset \end_layout \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|int ep_src| \end_layout \end_inset : rank of the source endpoint; \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout \begin_inset Formula $\bullet$ \end_inset \end_layout \end_inset \begin_inset ERT status collapsed 
\begin_layout Plain Layout \backslash verb|int ep_tag| \end_layout \end_inset : tag of the communication. \end_layout \begin_layout Subsection MPI_Message \end_layout \begin_layout Standard \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Message| \end_layout \end_inset includes: \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout \begin_inset Formula $\bullet$ \end_inset \end_layout \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|int mpi_message| \end_layout \end_inset : handle to the MPI message; \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout \begin_inset Formula $\bullet$ \end_inset \end_layout \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|int ep_src| \end_layout \end_inset : rank of the source endpoint; \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout \begin_inset Formula $\bullet$ \end_inset \end_layout \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|int ep_tag| \end_layout \end_inset : tag of the communication. \end_layout \begin_layout Standard Other types, such as \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Info| \end_layout \end_inset , \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Aint| \end_layout \end_inset , and \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Fint| \end_layout \end_inset , are defined in the same way. \end_layout \begin_layout Section P2P communication \end_layout \begin_layout Standard All EP point-to-point communications use the tag to identify the source and destination endpoints. To be able to add this extra information to the tag, we require that tag values be represented with 31 bits in the underlying MPI implementation. 
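\end_layout \begin_layout Standard To make the idea of an extended tag concrete, here is a hedged C++ sketch of one possible packing. The field widths (8 bits for each in-process endpoint rank and 15 bits for the user tag) and the function names are assumptions chosen only so that the fields fill 31 bits; the actual layout used by the library is the one shown in the tag figure and may differ.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMED layout of the 31-bit MPI tag (illustrative only):
//   bits 30..23 : in-process rank of the source endpoint      (8 bits)
//   bits 22..15 : in-process rank of the destination endpoint (8 bits)
//   bits 14..0  : user-defined EP tag                         (15 bits)
constexpr int kLocalRankBits = 8;
constexpr int kUserTagBits = 15;
constexpr std::int32_t kLocalRankMask = (1 << kLocalRankBits) - 1;
constexpr std::int32_t kUserTagMask = (1 << kUserTagBits) - 1;

// Pack the three fields into one non-negative 31-bit value.
std::int32_t encode_tag(int src_local, int dest_local, int ep_tag) {
    assert(src_local >= 0 && src_local <= kLocalRankMask);
    assert(dest_local >= 0 && dest_local <= kLocalRankMask);
    assert(ep_tag >= 0 && ep_tag <= kUserTagMask);
    return (src_local << (kLocalRankBits + kUserTagBits)) |
           (dest_local << kUserTagBits) | ep_tag;
}

// Recover each field from a packed tag.
int decode_src(std::int32_t mpi_tag) {
    return (mpi_tag >> (kLocalRankBits + kUserTagBits)) & kLocalRankMask;
}
int decode_dest(std::int32_t mpi_tag) {
    return (mpi_tag >> kUserTagBits) & kLocalRankMask;
}
int decode_ep_tag(std::int32_t mpi_tag) {
    return mpi_tag & kUserTagMask;
}
```

Under these assumed widths the packed value never uses the sign bit, which is why the sketch needs exactly 31 usable tag bits from the underlying MPI implementation.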
\end_layout \begin_layout Standard \begin_inset Graphics filename tag.png scale 40 \end_inset \end_layout \begin_layout Standard EP_tag is user-defined. MPI_tag is computed internally and used inside MPI calls. Because the tag is extended in this way, wildcards such as \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_ANY_SOURCE| \end_layout \end_inset and \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_ANY_TAG| \end_layout \end_inset cannot be used directly. An extra tag-analysis step is needed, which leads to the message dequeuing mechanism. \end_layout \begin_layout Standard \align center \begin_inset Graphics filename sendrecv.png scale 50 \end_inset \end_layout \begin_layout Standard In a classic MPI environment, each MPI process has one incoming message queue. In the EP case, the messages for all threads of one MPI process are stored in this MPI queue. Relying on the MPI 3 standard, we use the \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Improbe| \end_layout \end_inset routine to query this queue and move each incoming message to the local message queue of the corresponding thread/endpoint. \end_layout \begin_layout Standard \begin_inset Graphics filename dequeue.png scale 30 \end_inset \end_layout \begin_layout Standard \begin_inset ERT status collapsed \begin_layout Plain Layout % Any EP calls will trigger the message dequeuing and the probing, (matched-)receiving operations are performed upon the local message queue. 
\end_layout \begin_layout Plain Layout \end_layout \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout % \end_layout \begin_layout Plain Layout \end_layout \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout % Example: \backslash verb|EP_Recv(src=2, tag=10, comm1)|: \end_layout \begin_layout Plain Layout \end_layout \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout % \backslash begin{itemize} \end_layout \begin_layout Plain Layout \end_layout \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout % \backslash item[1.] Dequeue MPI message queue; \end_layout \begin_layout Plain Layout \end_layout \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout % \backslash item[2.] call \backslash verb|EP_Improb(src=2, tag=10, comm1, message)|; \end_layout \begin_layout Plain Layout \end_layout \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout % \backslash item[3.] if find corresponding triple (src, tag, comm1), call \backslash verb|EP_Mrecv(src=2, tag=10, comm1, message)|; \end_layout \begin_layout Plain Layout \end_layout \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout % \backslash item[4.] else, repeat from step 2. \end_layout \begin_layout Plain Layout \end_layout \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout % \backslash end{itemize} \end_layout \end_inset \end_layout \begin_layout Paragraph Messages are \shape italic non-overtaking \shape default \end_layout \begin_layout Standard The order of incoming messages matters. If one thread receives several messages from the same source with the same tag, they must be received in the order in which they were sent. That is to say, the n-th message sent must be the n-th message received. 
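\end_layout \begin_layout Standard The non-overtaking rule above can be illustrated with a small C++ sketch of a local message queue. The struct QueuedMessage and the function match_first are hypothetical names, and the integer payload stands in for the real message handle; the sketch only assumes that messages are appended to the back of the queue in arrival order.

```cpp
#include <cassert>
#include <deque>

// Hypothetical entry of an endpoint's local message queue.
struct QueuedMessage {
    int ep_src;   // rank of the source endpoint
    int ep_tag;   // user-defined tag
    int payload;  // stand-in for the real message handle
};

// Remove the OLDEST queued message matching (src, tag) and copy it to *out.
// Scanning from the front keeps messages non-overtaking, provided new
// messages are always appended to the back in arrival order.
// Returns false when nothing matches, i.e. the request stays pending.
bool match_first(std::deque<QueuedMessage>& queue, int src, int tag,
                 QueuedMessage* out) {
    for (auto it = queue.begin(); it != queue.end(); ++it) {
        if (it->ep_src == src && it->ep_tag == tag) {
            *out = *it;
            queue.erase(it);
            return true;
        }
    }
    return false;
}
```

Two receives posted for the same source and tag then obtain the queued messages in sending order, while messages from other sources stay in the queue untouched.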
\end_layout \begin_layout Paragraph Progress \end_layout \begin_layout Standard \begin_inset Quotes eld \end_inset If a pair of matching send and receives have been initiated on two processes, then at least one of these two operations will complete, independently of other actions in the system: the send operation will complete, unless the receive is satisfied by another message, and completes; the receive operation will complete, unless the message sent is consumed by another matching receive that was posted at the same destination process. \begin_inset Quotes erd \end_inset \begin_inset CommandInset citation LatexCommand cite after "" key "MPI" literal "true" \end_inset \end_layout \begin_layout Standard When an \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|EP_Irecv| \end_layout \end_inset is issued, we first dequeue the MPI incoming message queue and distribute all incoming messages to the local queues according to the destination identifier. Next, the non-blocking receive request is appended to the list of pending requests. Finally, the pending list is scanned and every request whose source, tag, and communicator match a queued message is served. \end_layout \begin_layout Standard Because message order is important, the behaviour of the completion functions \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Test| \end_layout \end_inset and \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Wait| \end_layout \end_inset must be discussed here. \begin_inset Quotes eld \end_inset The functions \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Wait| \end_layout \end_inset and \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Test| \end_layout \end_inset are used to complete a nonblocking communication. 
The completion of a send operation indicates that the sender is now free to update the locations in the send buffer (the send operation itself leaves the content of the send buffer unchanged). It does not indicate that the message has been received, rather, it may have been buffered by the communication subsystem. However, if a synchronous mode send was used, the completion of the send operation indicates that a matching receive was initiated, and that the message will eventually be received by this matching receive. The completion of a receive operation indicates that the receive buffer contains the received message, the receiver is now free to access it, and that the status object is set. It does not indicate that the matching send operation has completed (but indicates, of course, that the send was initiated). \begin_inset Quotes erd \end_inset \begin_inset CommandInset citation LatexCommand cite after "" key "MPI" literal "true" \end_inset \end_layout \begin_layout Paragraph Example 1 \end_layout \begin_layout Standard \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Test(MPI_Request *request, int *flag, MPI_Status *status)| \end_layout \end_inset \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout 1. \end_layout \end_inset If \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|request->type == 1| \end_layout \end_inset , the request to be tested was issued by a non-blocking send. The completion status is returned by: \end_layout \begin_layout Verbatim MPI_Test(&request->mpi_request, flag, &status->mpi_status) \end_layout \begin_layout Verbatim \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout 2. 
\end_layout \end_inset If \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|request->type == 2| \end_layout \end_inset , a non-blocking receive has been posted but the corresponding message has not yet been probed. The request is in the pending list and is therefore not yet completed. All incoming messages are probed again and all pending requests are checked. If the matching message is found by this second check, \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Imrecv| \end_layout \end_inset is called and the type is set to 3. Otherwise, the type remains 2 and \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|flag = false| \end_layout \end_inset is returned. \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout 3. \end_layout \end_inset If \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|request->type == 3| \end_layout \end_inset , the request was issued by a non-blocking receive and the matching message has been probed, so the status of the communication is that of the \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Imrecv| \end_layout \end_inset call. The completion result is returned by: \end_layout \begin_layout Verbatim MPI_Test(&request->mpi_request, flag, &status->mpi_status) \end_layout \begin_layout Verbatim \end_layout \begin_layout Paragraph Example 2 \end_layout \begin_layout Standard \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Wait(MPI_Request *request, MPI_Status *status)| \end_layout \end_inset \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout 1. 
\end_layout \end_inset If \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|request->type == 1| \end_layout \end_inset , the request to be waited on was issued by a non-blocking send. Jump to step 4. \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout 2. \end_layout \end_inset If \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|request->type == 2| \end_layout \end_inset , a non-blocking receive has been posted but the corresponding message has not yet been probed. The request is in the pending list and is therefore not yet completed. We repeat the probing of incoming messages and the checking of pending requests until the matching message is found; then \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Imrecv| \end_layout \end_inset is called and the type is set to 3. Jump to step 4. \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout 3. \end_layout \end_inset If \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|request->type == 3| \end_layout \end_inset , the request was issued by a non-blocking receive and the matching message has been probed, so the status of the communication is that of the \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Imrecv| \end_layout \end_inset call. Jump to step 4. \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout 4. 
\end_layout \end_inset We force the completion by calling: \end_layout \begin_layout Verbatim MPI_Wait(&request->mpi_request, &status->mpi_status) \end_layout \begin_layout Verbatim \end_layout \begin_layout Section Collective communication \end_layout \begin_layout Standard All classic MPI collective communications follow the pattern below: \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout 1. \end_layout \end_inset Intra-process communication using OpenMP, \shape italic e.g. \shape default collecting data from the slave threads on the master thread. \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout 2. \end_layout \end_inset Inter-process communication using MPI collective calls on the master threads. \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout 3. \end_layout \end_inset Intra-process communication using OpenMP, \shape italic e.g. \shape default distributing data from the master thread to the slave threads. \end_layout \begin_layout Paragraph Example 1 \end_layout \begin_layout Standard \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|EP_Bcast(buffer, count, datatype, root = 4, comm)| \end_layout \end_inset with \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|comm| \end_layout \end_inset composed of 4 MPI processes and 3 threads per process: \end_layout \begin_layout Standard We can consider the communicator as \begin_inset Formula $\{\underbrace{(0,1,2)}_{\textrm{proc 0}} \quad \underbrace{(3,\textcolor{red}{4},5)}_{\textrm{proc 1}}\quad \underbrace{(6,7,8)}_{\textrm{proc 2}}\quad \underbrace{(9,10,11)}_{\textrm{proc 3}}\}$ \end_inset . \end_layout \begin_layout Standard This collective communication is performed in the following three steps: \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout 1. 
\end_layout \end_inset The endpoint with rank 4 sends the buffer to the endpoint with rank 3, which is the master thread of its process. \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout 2. \end_layout \end_inset The master threads call \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Bcast(buffer, count, datatype, mpi_root = 1, mpi_comm)| \end_layout \end_inset . \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout 3. \end_layout \end_inset Each master thread sends the buffer to its slaves. \end_layout \begin_layout Standard \align center \begin_inset Graphics filename bcast.png scale 30 \end_inset \end_layout \begin_layout Paragraph Example 2 \end_layout \begin_layout Standard \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|EP_Allreduce(sendbuf, recvbuf, count, datatype, op, comm)| \end_layout \end_inset with \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|comm| \end_layout \end_inset the same as in example 1. \end_layout \begin_layout Standard This collective communication is performed in the following three steps: \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout 1. \end_layout \end_inset We perform an intra-process \begin_inset Quotes eld \end_inset allreduce \begin_inset Quotes erd \end_inset operation: each master thread collects data from its slaves and performs the reduce operation. \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout 2. \end_layout \end_inset The master threads call the classic \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Allreduce| \end_layout \end_inset routine. \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout 3. \end_layout \end_inset Each master thread sends the updated reduced data to its slaves. 
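\end_layout \begin_layout Standard The three steps above can be sketched as a plain C++ simulation. This is only an illustration under stated assumptions: no real MPI or OpenMP call is made, the reduction is fixed to an integer sum, values[p][t] stands for the contribution of thread t in process p, and the function name ep_allreduce_sum is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simulated three-step sum-allreduce. values[p][t] is the contribution of
// thread t in process p; no real MPI or OpenMP call is made here.
void ep_allreduce_sum(std::vector<std::vector<int>>& values) {
    assert(!values.empty());

    // Step 1: each master thread reduces the data of its own process.
    std::vector<int> partial(values.size(), 0);
    for (std::size_t p = 0; p < values.size(); ++p) {
        for (int v : values[p]) partial[p] += v;
    }

    // Step 2: stand-in for MPI_Allreduce among the master threads.
    int total = 0;
    for (int s : partial) total += s;

    // Step 3: each master thread hands the result to all its threads.
    for (auto& proc : values) {
        for (int& v : proc) v = total;
    }
}
```

In the library itself, step 2 would be the single point where the underlying MPI implementation is involved; steps 1 and 3 stay inside each process.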
\end_layout \begin_layout Standard \align center \begin_inset Graphics filename allreduce.png scale 30 \end_inset \end_layout \begin_layout Standard The other collective communications follow a similar execution pattern. \end_layout \begin_layout Section Inter-communicator \end_layout \begin_layout Standard In XIOS, the inter-communicator is a very important component. Thus, our EP library must support inter-communications. \end_layout \begin_layout Subsection The splitting of intra-communicator \end_layout \begin_layout Standard Before discussing the inter-communicator, we start with the splitting of an intra-communicator. The C prototype of the splitting routine is \end_layout \begin_layout Verbatim int MPI_Comm_split(MPI_Comm comm, int color, int key, \end_layout \begin_layout Verbatim MPI_Comm *newcomm) \end_layout \begin_layout Verbatim \end_layout \begin_layout Standard \begin_inset Quotes eld \end_inset This function partitions the group associated with \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|comm| \end_layout \end_inset into disjoint subgroups, one for each value of \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|color| \end_layout \end_inset . Each subgroup contains all processes of the same color. Within each subgroup, the processes are ranked in the order defined by the value of the argument \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|key| \end_layout \end_inset , with ties broken according to their rank in the old group. A new communicator is created for each subgroup and returned in \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|newcomm| \end_layout \end_inset . 
A process may supply the color value \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_UNDEFINED| \end_layout \end_inset , in which case \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|newcomm| \end_layout \end_inset returns \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_COMM_NULL| \end_layout \end_inset . This is a collective call, but each process is permitted to provide different values for color and key. \begin_inset Quotes erd \end_inset \begin_inset CommandInset citation LatexCommand cite after "" key "MPI" literal "true" \end_inset \end_layout \begin_layout Standard By definition of the routine, in the case of EP, each thread participating in the split operation has exactly one color ( \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_UNDEFINED| \end_layout \end_inset is also considered to be a color). However, from the point of view of a process, its threads can hold several colors, as shown in the following figure. \end_layout \begin_layout Standard \align center \begin_inset Graphics filename split.png scale 40 \end_inset \end_layout \begin_layout Standard This figure shows the result of splitting an EP communicator. Here we used the EP rank as the key that determines the new rank of each thread in the resulting intra-communicator. If the key is anything other than the EP rank, we follow the convention that the key takes effect only inside a process. This means that the threads are ordered first by MPI process rank and then by the value of the key. \end_layout \begin_layout Standard Because one process can have multiple colors among its threads, the splitting operation is executed in the following steps: \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout 1. 
\end_layout \end_inset Each master thread collects all colors from its slave threads, and the master threads communicate with each other to determine the total number of colors across the communicator. \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout 2. \end_layout \end_inset For each color, the master thread checks all its slave threads to obtain the number of threads having that color. \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout 3. \end_layout \end_inset If at least one of the slave threads holds the color, then the master thread takes this color. If not, the master thread takes color \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_UNDEFINED| \end_layout \end_inset . All master threads call the classic communicator splitting routine with key \begin_inset Formula $=$ \end_inset MPI rank. \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout 4. \end_layout \end_inset For master threads holding a defined color, we execute the endpoint creation routine according to the number of slave threads holding the same color. The resulting EP communicators are then assigned to these slave threads. \end_layout \begin_layout Standard \align center \begin_inset Graphics filename split2.png scale 40 \end_inset \end_layout \begin_layout Subsection The creation of inter-communicator \end_layout \begin_layout Standard In XIOS, the inter-communicators are created by the routine \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Intercomm_create| \end_layout \end_inset which is used to bind two intra-communicators into an inter-communicator. 
The C prototype is \end_layout \begin_layout Verbatim int MPI_Intercomm_create(MPI_Comm local_comm, int local_leader, \end_layout \begin_layout Verbatim MPI_Comm peer_comm, int remote_leader, \end_layout \begin_layout Verbatim int tag, MPI_Comm *newintercomm) \end_layout \begin_layout Verbatim \end_layout \begin_layout Standard According to the MPI standard, \begin_inset Quotes eld \end_inset an inter-communication is a point-to-point communication between processes in different groups \begin_inset Quotes erd \end_inset . \begin_inset Quotes eld \end_inset All inter-communicator constructors are blocking except for \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_COMM_IDUP| \end_layout \end_inset and require that the local and remote groups be disjoint. \begin_inset Quotes erd \end_inset \end_layout \begin_layout Standard Since EP treats threads as processes, the non-overlapping condition translates to \begin_inset Quotes eld \end_inset non-overlapping \begin_inset Quotes erd \end_inset at the thread level, which means that one thread cannot belong to both the local group and the remote group. However, the parent process of a thread may be shared by both groups. As the EP library is built upon an existing MPI implementation which enforces the non-overlapping condition at the process level, this case can cause an issue. \end_layout \begin_layout Standard Before digging into this issue, we first look at the case where the non-overlapping condition is perfectly respected. \end_layout \begin_layout Standard \align center \begin_inset Graphics filename intercomm.png scale 30 \end_inset \end_layout \begin_layout Standard As shown in the figure, we have two intra-communicators A and B, and they are totally disjoint at both the thread and the process level. Each of the communicators has a local leader. We also assume that both leaders belong to a peer communicator and have ranks 4 and 9, respectively. 
\end_layout \begin_layout Standard To create the inter-communicator, all threads from the left intra-comm call: \end_layout \begin_layout Verbatim MPI_Intercomm_create(commA, local_leader = 2, peer_comm, \end_layout \begin_layout Verbatim remote_leader = 9, tag, inter_comm) \end_layout \begin_layout Standard and all threads from the right intra-comm call: \end_layout \begin_layout Verbatim MPI_Intercomm_create(commB, local_leader = 3, peer_comm, \end_layout \begin_layout Verbatim remote_leader = 4, tag, inter_comm) \end_layout \begin_layout Standard To perform the inter-communicator creation, we follow three steps: \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout 1. \end_layout \end_inset Determine the leaders and ranks at the process level; \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout 2. \end_layout \end_inset Call the classic \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Intercomm_create| \end_layout \end_inset ; \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout 3. \end_layout \end_inset Create endpoints from each process and assign them to its threads. \end_layout \begin_layout Standard \align center \begin_inset Graphics filename intercomm_step.png scale 25 \end_inset \end_layout \begin_layout Standard If a process is shared by both groups during the creation of the inter-communicator, we add a \shape italic priority check \shape default to assign the process to only one intra-communicator. There are several possibilities: \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout 1. 
\end_layout \end_inset Process is shared and contains no local leader \begin_inset Formula $\implies$ \end_inset the process belongs to the group with the higher rank in the peer communicator; \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout 2. 
\end_layout \end_inset Process is shared and contains one local leader \begin_inset Formula $\implies$ \end_inset the process belongs to the group with the leader; \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout 3. \end_layout \end_inset Process is shared and contains both local leaders: a leader change is performed. Here the peer communicator is \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_COMM_WORLD| \end_layout \end_inset , and we denote by \begin_inset Quotes eld \end_inset group A \begin_inset Quotes erd \end_inset the group with the smaller peer rank and by \begin_inset Quotes eld \end_inset group B \begin_inset Quotes erd \end_inset the group with the higher peer rank. \end_layout \begin_deeper \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout 3a. \end_layout \end_inset If group A has at least two processes, the leader of group A is changed to the master thread of the process with the smallest rank other than the overlapped process. The overlapped process belongs to group B. \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout 3b. \end_layout \end_inset If group A has only one process and group B has at least two processes, then the leader of group B is changed to the master thread of the process with the smallest rank other than the overlapped process. The overlapped process belongs to group A. \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout 3c. \end_layout \end_inset If both group A and group B have only one process, then a one-process intra-communicator is created, though it is considered (labeled) as an inter-communicator. 
\end_layout \end_deeper \begin_layout Standard \align center \begin_inset Graphics filename intercomm2.png scale 25 \end_inset \end_layout \begin_layout Subsection The merge of inter-communicators \end_layout \begin_layout Standard \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Intercomm_merge(MPI_Comm intercomm, int high, MPI_Comm *newintracomm)| \end_layout \end_inset creates an intra-communicator by merging the local and remote groups of an inter-communicator. All processes should provide the same \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|high| \end_layout \end_inset value within each of the two groups. If processes in one group provided the value \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|high=false| \end_layout \end_inset and processes in the other group provided the value \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|high=true| \end_layout \end_inset then the union orders the “low” group before the “high” group. If all processes provided the same high argument then the order of the union is arbitrary. This call is blocking and collective within the union of the two groups. \begin_inset CommandInset citation LatexCommand cite after "" key "MPI" literal "true" \end_inset \end_layout \begin_layout Standard This routine can be considered as the inverse of \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Intercomm_create| \end_layout \end_inset . In the inter-communicator creation routine, all five cases (1, 2, 3a, 3b and 3c) are eventually transformed into the case where no MPI process is shared by the two groups. The merge function starts from this case. \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout 1. 
\end_layout \end_inset The classic \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Intercomm_merge| \end_layout \end_inset is called, an MPI intra-communicator is created from the two disjoint groups, and the MPI processes are ordered by the high value of the local leader. \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout 2. \end_layout \end_inset Endpoints are created based on the MPI intra-communicator, and the new EP ranks are ordered first according to the high value of each thread and then according to the original EP ranks in the inter-communicator. \end_layout \begin_layout Standard \align center \begin_inset Graphics filename merge.png scale 25 \end_inset \end_layout \begin_layout Section P2P communication on inter-communicators \end_layout \begin_layout Standard For inter-communicators, the \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Comm| \end_layout \end_inset class has three members that determine the topology, along with the original \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|rank_map| \end_layout \end_inset : \end_layout \begin_layout Itemize \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|RANK_MAP local_rank_map[size of commA]| \end_layout \end_inset : composed of the EP rank in commA' or commB'; \end_layout \begin_layout Itemize \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|RANK_MAP remote_rank_map[size of commB]| \end_layout \end_inset : = \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|local_rank_map| \end_layout \end_inset of remote group; \end_layout \begin_layout Itemize \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|RANK_MAP intercomm_rank_map[size of commB']| \end_layout \end_inset : = \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|rank_map| \end_layout 
\end_inset of remote group'; \end_layout \begin_layout Itemize \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|RANK_MAP rank_map| \end_layout \end_inset : rank map of commA' or commB'. \end_layout \begin_layout Standard For example, in the following configuration: \end_layout \begin_layout Standard \align center \begin_inset Graphics filename ranks.png scale 30 \end_inset \end_layout \begin_layout Standard For all endpoints in commA, \end_layout \begin_layout Verbatim local_rank_map={(rank in commA' or commB', \end_layout \begin_layout Verbatim rank of leader in MPI_COMM_WORLD)} \end_layout \begin_layout Verbatim ={(1,0), (0,1), (2,1), (4,1)} \end_layout \begin_layout Verbatim \end_layout \begin_layout Verbatim remote_rank_map={(remote endpoints' rank in commA' or commB', \end_layout \begin_layout Verbatim rank of remote leader in MPI_COMM_WORLD)} \end_layout \begin_layout Verbatim ={(0,0), (1,1), (3,1), (5,1)} \end_layout \begin_layout Standard For all endpoints in commA', \end_layout \begin_layout Verbatim intercomm_rank_map={(remote endpoints local rank in commA' or commB', \end_layout \begin_layout Verbatim remote endpoints MPI rank in commA' or commB')} \end_layout \begin_layout Verbatim ={(0,0), (1,0)} \end_layout \begin_layout Verbatim rank_map={(local rank in commA', mpi rank in commA')} \end_layout \begin_layout Verbatim ={(0,0), (1,0), (0,1), (1,1), (0,2), (1,2)} \end_layout \begin_layout Standard For all endpoints in commB, \end_layout \begin_layout Verbatim local_rank_map={(rank in commA' or commB', \end_layout \begin_layout Verbatim rank of leader in MPI_COMM_WORLD)} \end_layout \begin_layout Verbatim ={(0,0), (1,1), (3,1), (5,1)} \end_layout \begin_layout Verbatim \end_layout \begin_layout Verbatim remote_rank_map={(remote endpoints' rank in commA' or commB', \end_layout \begin_layout Verbatim rank of remote leader in MPI_COMM_WORLD)} \end_layout \begin_layout Verbatim ={(1,0), (0,1), (2,1), (4,1)} \end_layout 
\begin_layout Standard For all endpoints in commB', \end_layout \begin_layout Verbatim intercomm_rank_map={(remote endpoints local rank in commA' or commB', \end_layout \begin_layout Verbatim remote endpoints MPI rank in commA' or commB')} \end_layout \begin_layout Verbatim ={(0,0), (1,0), (0,1), (1,1), (0,2), (1,2)} \end_layout \begin_layout Verbatim rank_map={(local rank in commB', mpi rank in commB')} \end_layout \begin_layout Verbatim ={(0,0), (1,0)} \end_layout \begin_layout Standard When performing a P2P communication on an inter-communicator, we should: \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout 1. \end_layout \end_inset Determine whether the source and the destination endpoints are in the same group by checking the \begin_inset Quotes eld \end_inset labels \begin_inset Quotes erd \end_inset . \end_layout \begin_deeper \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout \begin_inset Formula $\bullet$ \end_inset \end_layout \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|src_label = local_rank_map->at(src).second| \end_layout \end_inset \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout \begin_inset Formula $\bullet$ \end_inset \end_layout \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|dest_label = remote_rank_map->at(dest).second| \end_layout \end_inset \end_layout \end_deeper \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout 2. \end_layout \end_inset If \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|src_label == dest_label| \end_layout \end_inset , then the communication is in fact an intra-communication. 
The new source rank and destination rank, as well as the local ranks, are deduced by: \end_layout \begin_layout Verbatim src_rank = local_rank_map->at(src).first \end_layout \begin_layout Verbatim dest_rank = remote_rank_map->at(dest).first \end_layout \begin_layout Verbatim src_rank_local = rank_map->at(src_rank).first \end_layout \begin_layout Verbatim dest_rank_local = rank_map->at(dest_rank).first \end_layout \begin_layout Verbatim \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout 3. \end_layout \end_inset If \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|src_label != dest_label| \end_layout \end_inset , then an inter-communication is required. The new ranks are obtained by: \end_layout \begin_layout Verbatim src_rank = local_rank_map->at(src).first \end_layout \begin_layout Verbatim dest_rank = remote_rank_map->at(dest).first \end_layout \begin_layout Verbatim src_rank_local = intercomm_rank_map->at(src_rank).first \end_layout \begin_layout Verbatim dest_rank_local = rank_map->at(dest_rank).first \end_layout \begin_layout Verbatim \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout 4. \end_layout \end_inset Call the MPI P2P function to start the communication. \end_layout \begin_deeper \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout \begin_inset Formula $\bullet$ \end_inset \end_layout \end_inset For an intra-communication, \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|mpi_comm = commA'_mpi or commB'_mpi| \end_layout \end_inset ; \end_layout \begin_layout Itemize \begin_inset Argument item:1 status open \begin_layout Plain Layout \begin_inset Formula $\bullet$ \end_inset \end_layout \end_inset For an inter-communication, \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|mpi_comm = inter_comm_mpi| \end_layout \end_inset . 
\end_layout \end_deeper \begin_layout Standard \align center \begin_inset Graphics filename sendrecv2.png scale 30 \end_inset \end_layout \begin_layout Section One-sided communications \end_layout \begin_layout Standard One-sided communication is a type of communication in which a single process specifies all communication parameters, both for the sending side and the receiving side \begin_inset CommandInset citation LatexCommand cite after "Chapter~11" key "MPI" literal "true" \end_inset . Extending this type of communication to endpoints comes with some limitations. In the current work, one-sided communication can only be used in the client-server mode, which means that RMA (remote memory access) can occur only between a server and a client. \end_layout \begin_layout Standard The construction of RMA windows is illustrated by the following figure: \end_layout \begin_layout Standard \align center \begin_inset Graphics filename RMA_schema.pdf scale 50 \end_inset \end_layout \begin_layout Itemize we determine the maximum number of threads N in the endpoint environment (N=3 in the example); \end_layout \begin_layout Itemize on the server side, N windows are declared and associated with the same memory address; \end_layout \begin_layout Itemize we start a loop : i = 0, ..., N-1 \end_layout \begin_deeper \begin_layout Itemize each endpoint with thread number i declares an RMA window; \end_layout \begin_layout Itemize the link between the windows on the client side and the i-th window on the server side is created via \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Win_create| \end_layout \end_inset ; \end_layout \begin_layout Itemize if the number of threads on a certain process is less than N, then a \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|NULL| \end_layout \end_inset pointer is used as memory address. 
\end_layout \end_deeper \begin_layout Standard With the RMA windows created, we can then perform communications such as \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Put| \end_layout \end_inset , \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Get| \end_layout \end_inset , \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Accumulate| \end_layout \end_inset , \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Get_accumulate| \end_layout \end_inset , \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Fetch_and_op| \end_layout \end_inset , \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Compare_and_swap| \end_layout \end_inset , \shape italic etc \shape default . \end_layout \begin_layout Standard The main idea behind each of these communications is to identify the threads involved in the connection. For example, suppose we want to perform a put operation from EP 2 to the server. We know that EP 2 is thread 0 of process 1. Thus the 0-th window (win A) on the server side should be used. Once the sender and the receiver are identified, the \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Put| \end_layout \end_inset communication can be established. \end_layout \begin_layout Standard Other RMA functions, such as \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Win_allocate| \end_layout \end_inset , \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Win_fence| \end_layout \end_inset , and \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash verb|MPI_Win_free| \end_layout \end_inset , remain nearly the same, and we skip the details in this document. 
\end_layout \begin_layout Standard \begin_inset CommandInset bibtex LatexCommand bibtex bibfiles "reference" options "plain" \end_inset \end_layout \end_body \end_document