Introduction to Parallel Computing
|
Many problems can be distributed across several processors and solved faster.
mpirun launches several copies of the same program.
The copies are distinguished by their MPI rank.
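A minimal sketch of what this looks like in practice (the file name and launch command are illustrative): every copy started by mpirun runs the same code and asks for its own rank.

    /* hello_mpi.c
       Build: mpicc hello_mpi.c -o hello_mpi
       Run:   mpirun -n 4 ./hello_mpi        (launch command is illustrative) */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, size;

        MPI_Init(&argc, &argv);                /* start the MPI environment  */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* which copy am I?           */
        MPI_Comm_size(MPI_COMM_WORLD, &size);  /* how many copies are there? */

        printf("Hello from rank %d of %d\n", rank, size);

        MPI_Finalize();                        /* shut MPI down cleanly      */
        return 0;
    }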
|
Serial and Parallel Regions
|
Algorithms can have parallelisable and non-parallelisable sections.
A highly parallel algorithm may be slower on a single processor.
The theoretical maximum speed-up is determined by the serial sections (Amdahl's law).
The other main restriction is communication speed between the processes.
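To make the limit concrete, Amdahl's law gives the maximum speed-up of a code with serial fraction s on n processors; the sketch below simply evaluates the formula (the fractions used are made-up examples).

    /* amdahl.c - maximum theoretical speed-up for a code whose serial
       fraction is s, run on n processors (Amdahl's law). */
    #include <stdio.h>

    static double amdahl_speedup(double s, int n) {
        return 1.0 / (s + (1.0 - s) / n);
    }

    int main(void) {
        /* Even 5% serial code caps the speed-up far below the processor count. */
        printf("s = 0.05, n = 16   -> max speed-up %.2f\n", amdahl_speedup(0.05, 16));
        printf("s = 0.05, n = 1024 -> max speed-up %.2f\n", amdahl_speedup(0.05, 1024));
        return 0;
    }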
|
MPI_Send and MPI_Recv
|
Use MPI_Send to send messages and MPI_Recv to receive them.
MPI_Recv blocks the program until a matching message has been received.
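A minimal sketch of a matching send/receive pair, assuming at least two processes (the file name and the value sent are illustrative):

    /* send_recv.c - rank 0 sends one number to rank 1, which blocks in
       MPI_Recv until the message arrives.  Run with at least two processes. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            double value = 3.14;
            MPI_Send(&value, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            double value;
            /* Blocks until the matching message from rank 0 has arrived. */
            MPI_Recv(&value, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("Rank 1 received %f\n", value);
        }

        MPI_Finalize();
        return 0;
    }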
|
Parallel Paradigms and Parallel Algorithms
|
There are two major paradigms: message passing and data parallel.
MPI implements the Message Passing paradigm.
Several standard patterns exist: Trivial, Queue, Master / Worker, Domain Decomposition and All-to-All; a minimal Master / Worker sketch follows below.
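A sketch of the Master / Worker pattern, assuming one task per worker; the "work" itself (squaring an integer) is a made-up placeholder.

    /* master_worker.c - rank 0 (the master) hands out tasks,
       the other ranks (workers) process them and return results. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
            /* Master: send one task to each worker, then collect the results. */
            for (int worker = 1; worker < size; worker++) {
                int task = worker * 10;               /* made-up task data */
                MPI_Send(&task, 1, MPI_INT, worker, 0, MPI_COMM_WORLD);
            }
            for (int worker = 1; worker < size; worker++) {
                int result;
                MPI_Recv(&result, 1, MPI_INT, worker, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                printf("Result from worker %d: %d\n", worker, result);
            }
        } else {
            /* Worker: receive a task, do the work, send the result back. */
            int task, result;
            MPI_Recv(&task, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            result = task * task;
            MPI_Send(&result, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }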
|
Non-blocking Communication
|
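MPI_Isend and MPI_Irecv return immediately; the transfer is completed later with MPI_Wait (or checked with MPI_Test), so communication can overlap with computation. A minimal sketch of two ranks exchanging one value (the buffer contents are illustrative):

    /* nonblocking.c - start a send and a receive, do other work,
       then wait for both transfers to complete.  Run with two processes. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, size;
        double send_val = 1.0, recv_val = 0.0;
        MPI_Request requests[2];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (size >= 2 && rank < 2) {
            int other = 1 - rank;   /* rank 0 talks to rank 1 and vice versa */
            MPI_Isend(&send_val, 1, MPI_DOUBLE, other, 0, MPI_COMM_WORLD,
                      &requests[0]);
            MPI_Irecv(&recv_val, 1, MPI_DOUBLE, other, 0, MPI_COMM_WORLD,
                      &requests[1]);

            /* ... computation that does not touch the buffers can go here ... */

            MPI_Waitall(2, requests, MPI_STATUSES_IGNORE);
            printf("Rank %d received %f\n", rank, recv_val);
        }

        MPI_Finalize();
        return 0;
    }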
|
Collective Operations
|
Use MPI_Barrier for global synchronisation.
All-to-All, One-to-All and All-to-One communications have efficient implementations in the library.
There are functions for global reductions, such as MPI_Reduce and MPI_Allreduce. Don't write your own.
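A minimal sketch of a global reduction with the library routine MPI_Allreduce; each rank contributes one value and every rank receives the global sum.

    /* allreduce.c - sum one value from every rank with a single collective
       call instead of hand-written sends and receives. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank;
        double local, global;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        local = (double)rank;                    /* this rank's contribution */
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        printf("Rank %d: global sum = %f\n", rank, global);

        MPI_Finalize();
        return 0;
    }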
|
(Optional) Serial to Parallel
|
Start from a working serial code.
Write a parallel implementation for each function or parallel region.
Connect the parallel regions with a minimal amount of communication.
Continuously compare with the working serial code.
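One way to do the comparison step, sketched under the assumption that the quantity of interest can be reduced to a single number: rank 0 recomputes the serial reference and checks it against the parallel result (the summed quantity here is a stand-in for the real work).

    /* compare_serial.c - check the parallel result against the working
       serial code on rank 0. */
    #include <stdio.h>
    #include <math.h>
    #include <mpi.h>

    #define N 1000

    int main(int argc, char **argv) {
        int rank, size;
        double local = 0.0, parallel = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Parallel version: each rank sums its own slice of the indices. */
        for (int i = rank; i < N; i += size)
            local += (double)i * i;
        MPI_Reduce(&local, &parallel, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0) {
            /* Serial reference: the original working loop. */
            double serial = 0.0;
            for (int i = 0; i < N; i++)
                serial += (double)i * i;
            printf("serial %.1f  parallel %.1f  %s\n", serial, parallel,
                   fabs(serial - parallel) < 1e-6 ? "OK" : "MISMATCH");
        }

        MPI_Finalize();
        return 0;
    }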
|
(Optional) Profiling Parallel Applications
|
Use a profiler to find the most important functions and concentrate on those.
The profiler needs to understand MPI. Your cluster probably has one.
If a lot of time is spent in communication, it can often be rearranged, for example by combining messages or overlapping communication with computation.
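Even without a full profiler, MPI_Wtime gives a first idea of where the time goes; the sketch below times one region (the barrier is only a stand-in for the communication of interest).

    /* timing.c - rough manual timing with MPI_Wtime, a first step before
       (or alongside) an MPI-aware profiler. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank;
        double t_start, t_comm;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        t_start = MPI_Wtime();
        MPI_Barrier(MPI_COMM_WORLD);    /* region being timed (stand-in) */
        t_comm = MPI_Wtime() - t_start;

        if (rank == 0)
            printf("Time spent in communication region: %g s\n", t_comm);

        MPI_Finalize();
        return 0;
    }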
|
(Optional) Do it yourself
|
Practice the same workflow as in Serial to Parallel above: start from a working serial code, write a parallel implementation for each function or parallel region, connect the regions with a minimal amount of communication, and continuously compare with the working serial code.
|
Tips and Best Practices
|
|