CCF (Core Computational Facility) @ UQ run by ITS / SMP
Howto - Run MPI Calculations
This is not a whyto/howto on parallelisation itself; there are many reasons to parallelise. MPI (Message Passing Interface) is a protocol, simple to start with and as complicated as you can imagine at the limit, that can be added into computer programs to enable multiple processes to talk to and share data with each other.
The easiest way to set (and change) which MPI-enabled compiler/launcher you are using on linux machines is to use environment-modules. Most *nix/HPC systems use environment-modules these days.
MPI Setup - quick method
The first thing you need to do on the SMP systems is to find an MPI-capable C, C++ or fortran compiler wrapper, generally called mpicc, mpicxx or mpif90. The following commands will help you find what is available:
module list
module avail
locate mpicc
locate mpirun
With modules, the way to compile MPI code is to load the compiler and then the MPI implementation, eg. to use the default system GCC compiler with Open MPI:
module purge
module load mpi/openmpi-x86_64
which mpicc
mpicc -v
which mpirun
mpirun --version
ompi_info
which will show you which compiler and which Open MPI version have been loaded.
To compile your code, then, use one of:
mpif90 mycode.f90
mpicc mycode.c
mpicxx mycode.cpp
You will need to add the same module load commands to your slurm batch file. After compiling, you then use the related launcher mpirun (or mpiexec) to run the program.
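Putting the above together, a typical compile-and-test cycle might look like this (hello.c is a placeholder for your own source file):

```shell
# load the default Open MPI module, as above
module load mpi/openmpi-x86_64

# compile a C source file with the MPI wrapper compiler
mpicc hello.c -o hello

# launch 4 copies of the resulting program with the matching launcher
mpirun -np 4 ./hello
```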
Running MPI code - interactively
So you've been able to write/compile your codes with mpicc etc. In general, once you've loaded the correct MPI module, the MPI launcher mpirun will already be in your path (mpirun, mpiexec and orterun are all synonyms for each other in Open MPI).
For example, to debug a calculation with 8 processors on a standalone machine:
srun --nodes=1 --ntasks=8 --ntasks-per-node=8 --cpus-per-task=1 --mem=4GB --pty bash
module load mpi/openmpi-x86_64
mpirun ./yourmpiprogram.exe
exit
Note that giving mpirun an explicit process count, as in
mpirun -n ${SLURM_NPROCS} ./yourmpiprogram.exe
might confuse slurm and result in the wrong number of tasks being allocated on a node (or nodes). (Although, circa 2020, this form works on some nodes and the plain form does not! - MB. Why???)
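Under slurm, an alternative worth trying is to let slurm launch the MPI ranks itself with srun, which avoids passing any process count to mpirun at all (whether this works depends on how your MPI library was built):

```shell
# inside a slurm allocation: launch one MPI rank per allocated task,
# with the rank count taken from the allocation rather than a -n flag
srun ./yourmpiprogram.exe
```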
Running MPI code - batch
To run on getafix or dogmatix you need to submit a Slurm job. The following is a sample slurm script ( goslurm.sh ) requesting 4 MPI processes (with OpenMP turned off):
#!/bin/bash
#SBATCH --partition=smp
#SBATCH --job-name=go1x4x4x1
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=1
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
echo "This is job '$SLURM_JOB_NAME' (id: $SLURM_JOB_ID) running on the following nodes:"
echo $SLURM_NODELIST
echo "running with OMP_NUM_THREADS= $OMP_NUM_THREADS "
echo "running with SLURM_TASKS_PER_NODE= $SLURM_TASKS_PER_NODE "
echo "running with SLURM_NPROCS= $SLURM_NPROCS "
module load mpi/openmpi-x86_64
time mpirun ./yourmpiprogram.exe
### the following might fix OR break the slurm scheduler...
## time mpirun -n ${SLURM_NPROCS} ./yourmpiprogram.exe
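Once saved, the script is submitted and monitored with the usual Slurm commands (the job id below is a placeholder):

```shell
sbatch goslurm.sh   # submit; prints the assigned job id
squeue -u $USER     # check the job's state in the queue
scancel 12345       # cancel job 12345 if something went wrong
```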
Word of warning: mixing threads
Note that you could potentially link your mpicc, mpic++ or mpif90 compiled code against the Intel MKL library, or your code might have threading/OpenMP enabled. In these cases running with MPI is dangerous, in the sense that each MPI process may spawn multiple threads, so you must request enough slurm resources (cores) to cover every thread of every MPI process.
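The required core count is simply tasks times threads per task. A minimal sketch, with hypothetical values (4 tasks, 4 threads each):

```shell
#!/bin/bash
# hypothetical allocation: 4 MPI tasks, each spawning 4 OpenMP/MKL threads
SLURM_NTASKS=4
SLURM_CPUS_PER_TASK=4
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

# total cores slurm must allocate = MPI tasks x threads per task
echo $(( SLURM_NTASKS * SLURM_CPUS_PER_TASK ))
```

So a 4-task, 4-threads-per-task job must request 16 cores (--ntasks=4 --cpus-per-task=1 would only cover 4 of them).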
MPI Versions - open source
Both getafix and dogmatix are set up for MPI-based calculations, and both connect to the same ethernet network switches (minimum 10 Gb/s, with 40 Gb/s interconnects).
There are multiple open-source implementations of the MPI protocol on the SMP systems (MPI-3.1 is the latest standard). These are:
- Open MPI, which has incongruous version numbering, eg. even v1.X implements the full MPI-3 standard. Instructions to load the CentOS defaults are
module load mpi/openmpi-x86_64
We also have modules for Open MPI v3.X updates, built against a newer GNU compiler, available via
module load gnu openmpi3_eth
and, to use the intel compiler with Open MPI v3.X, eg.
module load intel openmpi3_eth
You can also load older versions via openmpi2_eth (for v2.X) or openmpi_eth (for v1.X).
Note that to compile with either compiler you just use mpicc, mpicxx, mpif90, etc., which will automatically be in your path. (On other systems, eg. goliath, you can use mpigcc or mpigxx for the GNU compilers, or mpiicc or mpiicpc for Intel.)
- MVAPICH is also on our systems. It is supposed to be optimised for faster networks, and possibly for running MPI on Intel Xeon Phi co-processors. Note that MVAPICH2 implements the full MPI-3 standard. Load it with
module load gnu mvapich2_eth
or, with the intel compiler,
module load intel mvapich2_eth
Do not load either of the following:
module load mpi/mvapich2-x86_64
module load mpi/mvapich2-2.0-x86_64
They are the same module... they enable /usr/lib64/mvapich2/bin/mpirun but no mpicc.
- MPICH (V3.0 installed). Load one of the following:
module load gnu mpi/mpich-x86_64
module load gnu mpi/mpich-3.0-x86_64
which are the same module... they enable the CentOS defaults /usr/lib64/mpich/bin/mpirun and /usr/lib64/mpich/bin/mpicc. Note that on our systems there are also an /opt/mpich3/gnu/bin/mpicc compiler and an /opt/mpich3/gnu/bin/mpirun launcher... how to load them with modules though???
- MPICH during V2.X was called MPICH2, ie. some systems may have references to both (older systems may also have implementations of MPICH V1.X). This is no longer on getafix, although dogmatix has /opt/mpich2/gnu/bin/mpirun ... how to load that with current modules though???
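With several implementations installed side by side, it is easy to end up with mpicc from one and mpirun from another, so after any module load it is worth re-running the checks from earlier:

```shell
module list         # confirm which compiler/MPI modules are loaded
which mpicc mpirun  # confirm which wrappers are first in your PATH
mpirun --version    # confirm the MPI implementation and its version
```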
MPI Versions - closed source
There are also closed-source implementations of the MPI protocol:
- The Intel MPI library and the Intel compiler 18.0.1.163 are installed:
module load intel intelmpi
To also build with the Intel Math Kernel Library, which has built-in OpenMP parallelism:
module load intel mkl intelmpi
(see https://software.intel.com/en-us/intel-parallel-studio-xe for a comparison of Intel products). Note that to compile you again just use mpicc, mpicxx, mpif90, etc., which will automatically be in your path.
- On dogmatix we also have the Oracle Message Passing Toolkit (formerly known as Sun HPC ClusterTools).
http://www.oracle.com/technetwork/documentation/hpc-clustertools-193010.html
This is installed in the /opt/SUNWhpc/HPC8.2.1/ directory... (Need to add some reasons to use this...) (Need to add some instructions for this below...)
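When an MKL-linked binary is run as a pure-MPI job (one core per task), MKL's own threading should be pinned down too; a sketch using MKL_NUM_THREADS, MKL's standard thread-count environment variable:

```shell
# pure-MPI run of an MKL-linked binary: one core per MPI rank,
# so force both OpenMP and MKL to stay single-threaded
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
mpirun ./yourmpiprogram.exe
```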
Installing the MPI compilers and environment-modules - other linux systems
On your own linux system/s you can install MPI implementations, eg.
sudo dnf install environment-modules openmpi openmpi-devel blacs-openmpi scalapack-openmpi mpich mpich-devel mpich-doc blacs-mpich scalapack-mpich
Note that openmpi is daemon-free; it just requires passwordless ssh between machines. There used to be a distinct package called mpich2, which still exists in some older repositories; running it on your own machines required starting a daemon first, via
mpd &
The latest mpich etc. packages actually implement up to MPI-3, need no daemon, and so are the default now.
Compiling MPI code - other linux systems
To see which module to use:
[user@linux ~]$ module list
No Modulefiles Currently Loaded.
[user@linux ~]$ module avail
--------------------- /etc/modulefiles ---------------------------
mpi/mpich-x86_64
---------------------------- /usr/share/modulefiles --------------
mpi/openmpi-x86_64
and then load one of them:
[user@linux ~]$ module load mpi/openmpi-x86_64
[user@linux ~]$ module load mpi/mpich-x86_64
Note that the module load command can also be added to your ~/.bashrc to load automatically at login.
Now you should be able to compile your MPI-capable C or fortran code.
For example, to run the calculation with 8 processors on a standalone linux machine:
mpirun -np 8 ./a.out
If you use the machinefile/host options, eg.
mpirun -np 8 -npernode 1 --host ...
you could run these calcs across, eg., multiple linux machines.
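A sketch of the hostfile variant (the hostnames and slot counts are hypothetical):

```shell
# hypothetical hostfile listing two machines with 4 MPI slots each
cat > hosts.txt <<'EOF'
node1 slots=4
node2 slots=4
EOF

# Open MPI syntax: spread 8 processes across the machines listed above
mpirun -np 8 --hostfile hosts.txt ./a.out
```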
[Note that adding the -np option works for standalone systems, but might break the slurm MPI scheduling if used inside slurm scripts.]
This page last updated 3rd July 2020. [Contacts/help]