Introduction to Parallel Computing
|
Many problems can be distributed across several processors and solved faster.
mpirun launches several copies of the same program.
The copies are distinguished by their MPI rank.
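A minimal sketch of what this looks like in practice (the file name and launch command are illustrative): every copy started by mpirun runs the same code and asks for its own rank.

    /* hello_mpi.c
       Build: mpicc hello_mpi.c -o hello_mpi
       Run:   mpirun -n 4 ./hello_mpi        (launch command is illustrative) */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, size;

        MPI_Init(&argc, &argv);                /* start the MPI environment  */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* which copy am I?           */
        MPI_Comm_size(MPI_COMM_WORLD, &size);  /* how many copies are there? */

        printf("Hello from rank %d of %d\n", rank, size);

        MPI_Finalize();                        /* shut MPI down cleanly      */
        return 0;
    }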
|
Serial and Parallel Regions
|
Algorithms can have parallelisable and non-parallelisable sections.
A highly parallel algorithm may be slower on a single processor.
The theoretical maximum speed-up is determined by the serial sections (Amdahl's law).
The other main restriction is communication speed between the processes.
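To make the limit concrete, Amdahl's law gives the maximum speed-up of a code with serial fraction s on n processors; the sketch below simply evaluates the formula (the fractions used are made-up examples).

    /* amdahl.c - maximum theoretical speed-up for a code whose serial
       fraction is s, run on n processors (Amdahl's law). */
    #include <stdio.h>

    static double amdahl_speedup(double s, int n) {
        return 1.0 / (s + (1.0 - s) / n);
    }

    int main(void) {
        /* Even 5% serial code caps the speed-up far below the processor count. */
        printf("s = 0.05, n = 16   -> max speed-up %.2f\n", amdahl_speedup(0.05, 16));
        printf("s = 0.05, n = 1024 -> max speed-up %.2f\n", amdahl_speedup(0.05, 1024));
        return 0;
    }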
|
MPI_Send and MPI_Recv
|
Use MPI_Send to send messages and MPI_Recv to receive them.
MPI_Recv blocks the program until a matching message has been received.
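A minimal sketch of a matching send/receive pair, assuming at least two processes (the file name and the value sent are illustrative):

    /* send_recv.c - rank 0 sends one number to rank 1, which blocks in
       MPI_Recv until the message arrives.  Run with at least two processes. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            double value = 3.14;
            MPI_Send(&value, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            double value;
            /* Blocks until the matching message from rank 0 has arrived. */
            MPI_Recv(&value, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("Rank 1 received %f\n", value);
        }

        MPI_Finalize();
        return 0;
    }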
|
Parallel Paradigms and Parallel Algorithms
|
There are two major paradigms: message passing and data parallel.
MPI implements the Message Passing paradigm.
Several standard patterns exist: Trivial, Queue, Master / Worker, Domain Decomposition and All-to-All; a minimal Master / Worker sketch follows below.
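A sketch of the Master / Worker pattern, assuming one task per worker; the "work" itself (squaring an integer) is a made-up placeholder.

    /* master_worker.c - rank 0 (the master) hands out tasks,
       the other ranks (workers) process them and return results. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
            /* Master: send one task to each worker, then collect the results. */
            for (int worker = 1; worker < size; worker++) {
                int task = worker * 10;               /* made-up task data */
                MPI_Send(&task, 1, MPI_INT, worker, 0, MPI_COMM_WORLD);
            }
            for (int worker = 1; worker < size; worker++) {
                int result;
                MPI_Recv(&result, 1, MPI_INT, worker, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                printf("Result from worker %d: %d\n", worker, result);
            }
        } else {
            /* Worker: receive a task, do the work, send the result back. */
            int task, result;
            MPI_Recv(&task, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            result = task * task;
            MPI_Send(&result, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }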
|
Non-blocking Communication
|
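MPI_Isend and MPI_Irecv return immediately; the transfer is completed later with MPI_Wait (or checked with MPI_Test), so communication can overlap with computation. A minimal sketch of two ranks exchanging one value (the buffer contents are illustrative):

    /* nonblocking.c - start a send and a receive, do other work,
       then wait for both transfers to complete.  Run with two processes. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, size;
        double send_val = 1.0, recv_val = 0.0;
        MPI_Request requests[2];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (size >= 2 && rank < 2) {
            int other = 1 - rank;   /* rank 0 talks to rank 1 and vice versa */
            MPI_Isend(&send_val, 1, MPI_DOUBLE, other, 0, MPI_COMM_WORLD,
                      &requests[0]);
            MPI_Irecv(&recv_val, 1, MPI_DOUBLE, other, 0, MPI_COMM_WORLD,
                      &requests[1]);

            /* ... computation that does not touch the buffers can go here ... */

            MPI_Waitall(2, requests, MPI_STATUSES_IGNORE);
            printf("Rank %d received %f\n", rank, recv_val);
        }

        MPI_Finalize();
        return 0;
    }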
|
Collective Operations
|
Use MPI_Barrier for global synchronisation.
All-to-All, One-to-All and All-to-One communications have efficient implementations in the library.
There are functions for global reductions, such as MPI_Reduce and MPI_Allreduce. Don't write your own.
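A minimal sketch of a global reduction with the library routine MPI_Allreduce; each rank contributes one value and every rank receives the global sum.

    /* allreduce.c - sum one value from every rank with a single collective
       call instead of hand-written sends and receives. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank;
        double local, global;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        local = (double)rank;                    /* this rank's contribution */
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        printf("Rank %d: global sum = %f\n", rank, global);

        MPI_Finalize();
        return 0;
    }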
|
(Optional) Serial to Parallel
|
Start from a working serial code.
Write a parallel implementation for each function or parallel region.
Connect the parallel regions with a minimal amount of communication.
Continuously compare with the working serial code.
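One way to do the comparison step, sketched under the assumption that the quantity of interest can be reduced to a single number: rank 0 recomputes the serial reference and checks it against the parallel result (the summed quantity here is a stand-in for the real work).

    /* compare_serial.c - check the parallel result against the working
       serial code on rank 0. */
    #include <stdio.h>
    #include <math.h>
    #include <mpi.h>

    #define N 1000

    int main(int argc, char **argv) {
        int rank, size;
        double local = 0.0, parallel = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Parallel version: each rank sums its own slice of the indices. */
        for (int i = rank; i < N; i += size)
            local += (double)i * i;
        MPI_Reduce(&local, &parallel, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0) {
            /* Serial reference: the original working loop. */
            double serial = 0.0;
            for (int i = 0; i < N; i++)
                serial += (double)i * i;
            printf("serial %.1f  parallel %.1f  %s\n", serial, parallel,
                   fabs(serial - parallel) < 1e-6 ? "OK" : "MISMATCH");
        }

        MPI_Finalize();
        return 0;
    }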
|
(Optional) Profiling Parallel Applications
|
Use a profiler to find the most important functions and concentrate on those.
The profiler needs to understand MPI. Your cluster probably has one.
If a lot of time is spent in communication, it can often be rearranged, for example by combining messages or overlapping communication with computation.
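Even without a full profiler, MPI_Wtime gives a first idea of where the time goes; the sketch below times one region (the barrier is only a stand-in for the communication of interest).

    /* timing.c - rough manual timing with MPI_Wtime, a first step before
       (or alongside) an MPI-aware profiler. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank;
        double t_start, t_comm;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        t_start = MPI_Wtime();
        MPI_Barrier(MPI_COMM_WORLD);    /* region being timed (stand-in) */
        t_comm = MPI_Wtime() - t_start;

        if (rank == 0)
            printf("Time spent in communication region: %g s\n", t_comm);

        MPI_Finalize();
        return 0;
    }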
|
(Optional) Do it yourself
|
Practice the same workflow as in Serial to Parallel above: start from a working serial code, write a parallel implementation for each function or parallel region, connect the regions with a minimal amount of communication, and continuously compare with the working serial code.
|
Tips and Best Practices
|
|