Message Passing Interface (MPI)

The Message Passing Interface (MPI) is a library specification that allows programs running on HPC to pass messages between processes on its various nodes and clusters. HPC uses MPICH, an open-source, portable implementation of the MPI standard. MPICH contains a complete implementation of version 1.2 of the MPI standard as well as significant parts of MPI-2, particularly in the area of parallel I/O. MPICH provides both a compile-time and a runtime environment for MPI-compliant code.

MPICH-MX is Myricom's port of MPICH to its MX transport for the Myrinet networking architecture. For more information on MPICH-MX, visit www.myricom.com/support/downloads/mx/mpich-mx.html.

Transport Device

MPICH provides an abstraction layer for inter-process communication, regardless of whether the operating environment is a single, massively parallel shared-memory machine or a cluster of distributed machines. Different operating environments use different devices for communication, and each device functions as an underlying transport layer for message passing. Examples of these devices are p4 for clusters of machines on standard TCP/IP networks, mx for clusters of machines on Myrinet networks, shmem for shared-memory SMP machines, and p4shmem for clusters of SMP machines on TCP/IP networks.

HPC’s Linux builds have support for mx and p4 devices. The mx device is Myrinet’s high-speed transport layer and is available on all HPC nodes.

Compilers

MPICH supports several different compiler sets, but the user must specify which one to use for each job.

MPICH supports C, C++, F77, and F90. However, not all compiler sets support all languages. Portland Group's CDK and Intel's compilers support all four languages; KCC supports only C++; Absoft supports F77 and F90; and GNU supports C, C++, and F77.

MPICH is tightly integrated with the operating system and the compiler set being used.

HPC’s version of MPICH was built with the following sets of compilers:

  • GNU32 – Open source compiler from the GNU Project (3.4.3)
  • PGI – Portland Group’s “Cluster Development Kit” (6.0-2)
  • INTEL – Intel’s C++ and Fortran compilers (8.1)

Bindings

Each combination of transport device and compiler set is called a binding, and each binding is distinct from the others. You can use only one binding at a time, and you must take care to use the same binding at both compile time and runtime.

Each binding is installed into:

/usr/usc/mpich/version/device-compiler[-subarch]/

On Linux, the following bindings exist:

  •   mx-gnu34 – mx with GNU 3.4
  •   mx-gnu4  – mx with GNU 4
  •   mx-intel – mx with Intel compilers
  •   mx-pgi   – mx with PGI compilers (default)
  •   p4-gnu32 – p4 with GNU 3.4.3
  •   p4-intel – p4 with Intel compilers

MPICH Tutorials

Setting Up Your Job

With so many device and compiler combinations, setting up your environment can seem complicated. HPC has defined preferred bindings to simplify your environment initialization. The preferred bindings work well on almaak and the Linux cluster; however, you are free to choose the best setup for your needs.

For most ITS software packages, it is recommended that you use the default link, which points to the default version of the software. For MPICH, this is not recommended: changing the default link after a newer version of MPICH is installed will break jobs that are currently running or waiting, so you should always use the path associated with the specific version you built against. For compatibility, and to inform users of ITS's intent for the versioning of its software applications, default, old, and new links are installed. See the mpiexec note below for a workaround.

You should look at the links in /usr/usc/mpich to find the appropriate version, then choose a specific binding from those available in that version. Use the most specific link in your .profile if you use the bash (or sh) shell, or in your .login if you use the csh (or tcsh) shell.
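For example, you might browse the installed versions and the bindings within a version before deciding which setup file to source. This is only an illustrative sketch; the exact contents of /usr/usc/mpich on your system may differ:

# List the installed MPICH versions (default, old, and new are links)
ls -l /usr/usc/mpich/

# List the bindings available in a specific version
ls /usr/usc/mpich/1.2.6..14a/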

To add the preferred device/compiler sets to your environment automatically, add the following lines to your .login if you use the csh or tcsh shell:

if ( -e /usr/usc/mpich/1.2.6..14a/setup.csh ) then
    source /usr/usc/mpich/1.2.6..14a/setup.csh
endif

If you use the bash or sh shell, add the following lines to your .profile:

if [ -e /usr/usc/mpich/1.2.6..14a/setup.sh ]; then
    source /usr/usc/mpich/1.2.6..14a/setup.sh
fi
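After sourcing a setup file and starting a new login shell, you can confirm which build is active. The MPICH compiler wrappers accept a -show option that prints the underlying compiler command they would run; this is a quick sanity check, assuming the setup file placed the build's bin directory on your PATH:

which mpicc
mpicc -show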

Some examples of other available setup files are:

/usr/usc/mpich/1.2.6..14a/gm-intel-P4/setup.csh
/usr/usc/mpich/1.2.6..14a/gm-cdk/setup.csh
/usr/usc/mpich/1.2.6..14a/p4-gnu/setup.csh
/usr/usc/mpich/1.2.6..14a/shmem-spac/setup.csh
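If you want a binding other than the preferred one, you can source that binding's setup file directly. A minimal csh sketch using one of the setup files listed above:

if ( -e /usr/usc/mpich/1.2.6..14a/p4-gnu/setup.csh ) then
    source /usr/usc/mpich/1.2.6..14a/p4-gnu/setup.csh
endif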

Compiling Your Job

MPICH includes wrapper scripts designed to hide the details of compiling with any particular build or compiler. Simply source an MPICH build into your shell environment and use mpicc, mpiCC, mpif77, or mpif90 to compile your code. The arguments to the script are passed to the compiler and interpreted normally. The wrapper script will link in the correct libraries and add rpaths as necessary.
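For example, after sourcing an MPICH build, a C source file and a Fortran 77 source file might be compiled as follows. The file and program names are illustrative; the wrappers accept the same flags as the underlying compiler:

# Compile a C MPI program
mpicc -O2 -o hello_mpi hello_mpi.c

# Compile a Fortran 77 MPI program
mpif77 -O2 -o hello_mpi hello_mpi.f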

Running Your Job

MPICH includes a wrapper script called mpirun(1) to hide the details of getting your program to run. To use mpirun, it is critical that the exact same MPICH build that was used to build the program is also used to run it (see the mpiexec note below to get around this restriction). After sourcing the correct MPICH build, mpirun requires a few arguments to tell it which nodes to run on. The two most common options are -machinefile and -np. Under PBS or TORQUE, simply pass $PBS_NODEFILE to -machinefile. The number of processors will need to be passed to -np.

Under PBS or TORQUE, the easiest way to get the value for -np is:

cat $PBS_NODEFILE | wc -l
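Putting these pieces together, a TORQUE/PBS submission script might look like the following sketch. The resource requests and the program name (hello_mpi) are illustrative assumptions; substitute your own, and make sure the setup file you source matches the build used to compile the program:

#!/bin/bash
#PBS -l nodes=2:ppn=2
#PBS -l walltime=01:00:00

# Source the same MPICH build that was used to compile the program
source /usr/usc/mpich/1.2.6..14a/setup.sh

cd $PBS_O_WORKDIR

# Run one process on every allocated processor
NP=`cat $PBS_NODEFILE | wc -l`
mpirun -machinefile $PBS_NODEFILE -np $NP ./hello_mpi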

With the 1.2.6..14a and later builds of MPICH, you do not have to source setup files before running mpiexec. This greatly simplifies job submission scripts and gets around the need to define an MPICH build at runtime. Users can source any MPICH build in their shell's initialization script.

mpiexec

Linux cluster users are encouraged to use mpiexec(1) instead of mpirun. Using mpiexec is far simpler because it senses the correct transport layer and integrates directly with TORQUE, the job manager on the cluster. By default, mpiexec will use $PBS_NODEFILE and run one process on every allocated processor. See the mpiexec(1) man page for more information on mpiexec.
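With mpiexec, the mpirun invocation shown above can be replaced with a single command, and with the 1.2.6..14a and later builds no setup file needs to be sourced in the job script first. The program name is an illustrative assumption:

# mpiexec reads $PBS_NODEFILE itself and starts one process per allocated processor
mpiexec ./hello_mpi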

Math Libraries

Several math-related libraries are included in each MPICH build for easy linking. At this time, goto (an optimized BLAS), mpiblacs, and scalapack are available. Simply add -llibraryname to your compile commands to link them into your executable file. The various MPICH builds know where to find the libraries.

If you are linking with goto, also link in xerblac (if you are using C code) or xerblaf (if you are using Fortran code). Likewise, mpiblacs users should link in mpiblacsCinit or mpiblacsF77init.

Users of goto may choose from several versions. Options such as -lgoto, -lgoto_p4, -lgoto_p3, -lgoto_p4_512, and others may be available. Please look at /usr/usc/math/1.2.6..14a/<binding>/lib/ for the specific options.
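For example, assuming your code calls BLAS routines, the following commands link goto together with the matching xerbla stub described above. The source and program names are illustrative:

# C code: link goto and the C xerbla stub
mpicc -o solver solver.c -lgoto -lxerblac

# Fortran code: link goto and the Fortran xerbla stub
mpif77 -o solver solver.f -lgoto -lxerblaf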

These libraries have not been thoroughly tested on every MPICH build. If you encounter any problems with these libraries, please send an email to hpc@usc.edu.

Getting Help

Additional information about MPI can be found on the Official Message Passing Interface (MPI) website. To get help with using MPI on HPC, send an email to hpc@usc.edu.