Parallel MATLAB (R2018)

The following documentation assumes you are using release R2018a or R2018b. We recommend that you use R2018a or later going forward. If you must use R2016b, please see the README file in /usr/usc/matlab/R2016b/parallel_scripts/matlab-slurm-master/README. R2017 releases have not been configured to run under Slurm on HPC. Please send email to hpc@usc.edu if you need to use another version.

HPC maintains software licenses for MATLAB’s Parallel Computing Toolbox™ (PCT) and MATLAB Parallel Server™ (formerly named the MATLAB Distributed Computing Server). PCT provides high-level constructs, such as parallel for-loops and special array types, that let you parallelize code without CUDA or MPI programming; it supports single-node multicore and GPU processing. Parallel Server lets you run parallel and distributed computations on multi-node clusters. It includes a built-in job scheduler that has been integrated with HPC’s Slurm resource manager.

Cluster Configuration

To run MATLAB on HPC, you will first create a Cluster Profile. There are two types of clusters: LOCAL and REMOTE. A local cluster is sufficient for a single-node job (think of running a multicore program on your personal computer). A remote cluster is necessary when you want to run MATLAB across multiple compute nodes.

Local Cluster (Single Node)

To create a cluster object for a single-node job, add the following lines to your script.

% Create a local (single-node) cluster object
cluster=parallel.cluster.Local();
% Directory MATLAB uses for temporary job files (must already exist)
cluster.JobStorageLocation='storage_dir';
% Count the CPUs on this node by capturing the output of "nproc --all"
[a,b]=evalc('system(''nproc --all'')');
cluster.NumWorkers=str2num(a);

Where storage_dir is the path to a directory you have created for temporary job storage. We recommend that you make this directory at the top level of your project directory, e.g., /home/rcf-proj/tt1/ttrojan/matlabJobStorage.

Remote Cluster (Multi-Node)

To create a cluster object for a multi-node job, add the following lines to your script. Change the Slurm options as desired.

% Create a generic cluster object that submits jobs through Slurm
cluster = parallel.cluster.Generic;
% Directory MATLAB uses for temporary job files (must already exist)
set(cluster,'JobStorageLocation', 'storage_dir');
% All nodes share a filesystem, so job files need not be copied between them
set(cluster,'HasSharedFilesystem', true);
% Location of HPC's MATLAB/Slurm integration scripts
set(cluster,'IntegrationScriptsLocation','scripts_dir');
% Extra sbatch arguments to pass to the scheduler
cluster.AdditionalProperties.SlurmArgs='sbatch_args';

Where storage_dir, scripts_dir, and sbatch_args are defined below.

storage_dir Path to a directory for temporary job storage. Create it at the top level of your project directory, e.g., /home/rcf-proj/tt1/ttrojan/matlabJobStorage
scripts_dir Path to the Slurm integration scripts, of the form /usr/usc/matlab/<version>/SlurmIntegrationScripts
sbatch_args A string containing Slurm SBATCH arguments to pass to the job scheduler. For example:

  --partition=<queue_name>
  --account=<account_name>
  --constraint=<constraint>

Do not include processor requests such as --ntasks or --nodes; MATLAB will set these for you.

By configuring a cluster profile in this way, MATLAB will submit a job to Slurm on your behalf when you start up a pool of workers. Slurm integration is supported using MATLAB scripts located in /usr/usc/matlab/<version>/SlurmIntegrationScripts/, starting with version R2018a.
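
As a concrete illustration, a filled-in configuration might look like the following. The storage path, MATLAB version, and Slurm arguments here are hypothetical; substitute your own project directory, release, and queue options.

% Hypothetical filled-in remote (multi-node) cluster configuration
cluster = parallel.cluster.Generic;
set(cluster,'JobStorageLocation', '/home/rcf-proj/tt1/ttrojan/matlabJobStorage');
set(cluster,'HasSharedFilesystem', true);
set(cluster,'IntegrationScriptsLocation','/usr/usc/matlab/R2018a/SlurmIntegrationScripts');
cluster.AdditionalProperties.SlurmArgs='--time=01:00:00 --partition=quick --account=lc_tt1';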

Starting a Pool of Workers (Labs)

Once a cluster profile has been configured, the process for starting a pool of workers is the same for both LOCAL and REMOTE clusters. MATLAB refers to its workers as “labs”; the labindex function returns the index of the worker on which it is called.

Assuming you have created a cluster object named cluster, you can start a pool of workers using the parpool command.

pool=parpool(cluster,N)

Where N is the number of workers to use. If starting a LOCAL cluster, N must be at least 1 less than the total number of available CPUs because MATLAB needs one CPU to coordinate the workers.
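
For example, on a LOCAL cluster you can size the pool from the cluster object itself (a minimal sketch, assuming the cluster object from the section above):

% Leave one CPU free for the MATLAB client that coordinates the workers
pool=parpool(cluster, cluster.NumWorkers-1);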

If using a REMOTE cluster, MATLAB will submit a job to the job scheduler that will request enough resources for N workers.

Make sure that you close your parallel pool when you are done with it.

delete(pool)
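
Putting these pieces together, a minimal end-to-end sketch for a LOCAL cluster might look like this. The storage path is a placeholder, and parfor and spmd are standard PCT constructs.

cluster=parallel.cluster.Local();
cluster.JobStorageLocation='/home/rcf-proj/tt1/ttrojan/matlabJobStorage';  % placeholder path
[a,b]=evalc('system(''nproc --all'')');
cluster.NumWorkers=str2num(a);

pool=parpool(cluster, cluster.NumWorkers-1);

% parfor distributes independent loop iterations across the workers
total=0;
parfor i=1:1000
    total=total+i^2;
end

% spmd runs the same block on every worker; labindex identifies each one
spmd
    fprintf('Hello from lab %d of %d\n', labindex, numlabs);
end

delete(pool)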

You may receive warnings about improper timezone formatting when you start a pool. These can be safely ignored. To suppress the warnings, add the following line to your job script.

export TZ=America/Los_Angeles

You can inspect the cluster.Jobs property to display the status of your jobs.

...
>> cluster.NumWorkers=8;
>> pool=parpool(cluster, 7)
Starting parallel pool (parpool) ...
connected to 7 workers.
...
>> cluster.Jobs
...
         ID           Type        State   FinishDateTime  Username  Tasks
       -----------------------------------------------------------------
    1     2           pool      running                   ttrojan     31
    2     3     concurrent      running                   ttrojan      7

Monitoring Your Job

MATLAB will submit a job on your behalf when you use the REMOTE cluster. You can track its progress as you would any other job, with the command squeue -u <username>. If you are on a head node, you can use HPC’s squeue wrapper, myqueue. Note that the job name is generated by MATLAB.

$ squeue -u ttrojan
JOBID   PARTITION     NAME  USER     ST    TIME  NODES  NODELIST(REASON)
2767401  scavenge     Job1  ttrojan  R     0:58      3  hpc[0681-0683]

$ myqueue
JOBID    USER  ACCOUNT  PARTITION  NAME  TASKS  CPUS_PER_TASK  MIN_MEMORY  START_TIME           TIME  TIME_LIMIT  STATE    NODELIST(REASON)
2767401  ttrojan  lc_tt1  scavenge   Job1  24     1              1G          2019-02-26T10:17:19  1:20  1:00:00     RUNNING  hpc[0681-0683]

To check job information after a job completes, you can use Slurm’s sacct command.

$ sacct -j <job_id> --format=account,partition%10,jobname%20,state,exitcode%4,elapsed%10,start,ntasks%4,nnodes%4,reqcpus,reqmem%6,maxrss%10,maxvmsize%10,nodelist
   Account  Partition              JobName      State Exit    Elapsed               Start NTas NNod  ReqCPUS ReqMem     MaxRSS  MaxVMSize        NodeList 
---------- ---------- -------------------- ---------- ---- ---------- ------------------- ---- ---- -------- ------ ---------- ---------- --------------- 
   lc_tt1      quick      matlab_launcher  COMPLETED  0:0   00:01:26 2018-12-17T16:58:34         2       32    1Gc                        hpc[1119,1411] 
   lc_tt1                           batch  COMPLETED  0:0   00:01:26 2018-12-17T16:58:34    1    1       24    1Gc    600060K    295556K         hpc1119 
   lc_tt1                          extern  COMPLETED  0:0   00:01:26 2018-12-17T16:58:34    2    2       32    1Gc          0    107952K  hpc[1119,1411] 

Where <job_id> is the job ID from Slurm.

HPC Helper Scripts

HPC has created two helper scripts, get_LOCAL_cluster.m and get_SLURM_cluster.m, which are described below. You may find it more convenient to use these functions than to repeat similar lines of code in each of your programs.

Single Node Cluster

To use MATLAB on a single node (with multiple cores), create the following file, get_LOCAL_cluster.m, or copy it from /home/rcf-proj/workshop/matlab/get_LOCAL_cluster.m to your working directory.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% get_LOCAL_cluster(storage_dir)
% HPC helper function to create a single node MATLAB cluster under Slurm.
% Place this file in MATLAB's search path and call it from your program.
% Or place the lines of the function directly within your program.
%
% Arguments:
%  storage_dir: The path to an *existing* directory where MATLAB can store
%               files, e.g., "/home/rcf-proj/tt/ttrojan/matlab_storage".
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function cluster = get_LOCAL_cluster(storage_dir)
   cluster=parallel.cluster.Local();
   cluster.JobStorageLocation=storage_dir;
   % Count the CPUs on this node by capturing the output of "nproc --all"
   [a,b]=evalc('system(''nproc --all'')');
   cluster.NumWorkers=str2num(a);
end
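
A typical call, assuming the storage directory already exists (the path shown is a placeholder):

cluster=get_LOCAL_cluster('/home/rcf-proj/tt1/ttrojan/matlabJobStorage');
pool=parpool(cluster, cluster.NumWorkers-1);   % leave one CPU for the client
% ... parallel work here ...
delete(pool)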

Multi-Node Cluster

To use MATLAB across multiple nodes, create the following file, get_SLURM_cluster.m, or copy it from /home/rcf-proj/workshop/matlab/get_SLURM_cluster.m to your working directory.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% get_SLURM_cluster(storage_dir,scripts_dir,sbatch_args)
% HPC helper function to create a multi-node MATLAB cluster under Slurm.
% Place this file in MATLAB's search path and call it from your program.
% Or place the lines of the function directly within your program.
%
% Arguments:
%  storage_dir: The path to an *existing* directory where MATLAB can store
%               files, e.g., "/home/rcf-proj/tt/ttrojan/matlab_storage".
%  scripts_dir: The path to the /usr/usc/matlab/<version>/SlurmIntegrationScripts
%               directory, e.g., "/usr/usc/matlab/R2018a/SlurmIntegrationScripts".
%  sbatch_args: A string to pass to Slurm with all Slurm options *except*
%               ntasks, e.g., '--time=12:00:00 --partition=scec --mem-per-cpu=2G'.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function cluster = get_SLURM_cluster(storage_dir,scripts_dir,sbatch_args)
   cluster = parallel.cluster.Generic;
   set(cluster,'JobStorageLocation', storage_dir);
   set(cluster,'HasSharedFilesystem', true);
   set(cluster,'IntegrationScriptsLocation',scripts_dir);
   cluster.AdditionalProperties.SlurmArgs=sbatch_args;
end
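
A typical call, with placeholder paths and Slurm options:

cluster=get_SLURM_cluster('/home/rcf-proj/tt1/ttrojan/matlabJobStorage', ...
    '/usr/usc/matlab/R2018a/SlurmIntegrationScripts', ...
    '--time=12:00:00 --partition=quick --mem-per-cpu=2G');
pool=parpool(cluster, 16);   % MATLAB submits a Slurm job requesting 16 workers
delete(pool)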

HPC Example Scripts

There are a number of example scripts in /home/rcf-proj/workshop/matlab/ that have been tested under R2018a. Copy them to your working directory to use them. An existing, writable storage directory must be provided for MATLAB; create the directory, then edit the examples, replacing the test storage directory with your own.

  • submit.slurm – a slurm script to batch process the examples
  • estimatePi.m – example that calculates pi (test with an increasing number of cores; a sketch of the same approach follows this list)
  • labIndex.m – simple labindex example (labindex is the index of a worker)
  • broadcastReceive.m – example of SPMD, labindex, and MPI communication
  • submitTasks.m – example of creating job tasks and running them in parallel
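
The example files themselves live in the workshop directory; as an illustration of the general approach (not necessarily the exact contents of estimatePi.m), a Monte Carlo estimate of pi with parfor might look like this:

% Estimate pi from the fraction of random points inside the unit circle
n=1e7;
hits=0;
parfor i=1:n
    if rand()^2+rand()^2 <= 1
        hits=hits+1;
    end
end
fprintf('pi is approximately %f\n', 4*hits/n);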

You can test the examples interactively or use sbatch submit.slurm to run them remotely. Edit the submit.slurm script and select the example you would like to run. Change the job and output file names to reflect your choice.

$ cat submit.slurm
#!/bin/bash

#SBATCH --ntasks=8
#SBATCH --mem-per-cpu=1g
#SBATCH --time=00:20:00
#SBATCH --export=none
#SBATCH --job-name=matlab-ex1
#SBATCH --output=output-ex1

#Examples tested using R2018a
source /usr/usc/matlab/R2018a/setup.sh

#Suppress time warning
export TZ=America/Los_Angeles

#Call an example in this directory
matlab  -nodisplay -r "estimatePi"       #single+multi-node
#matlab -nodisplay -r "labIndex"          #multi-node only
#matlab -nodisplay -r "broadcastReceive"  #multi-node only
#matlab -nodisplay -r "submitTasks"       #single+multi-node

Getting Help

For additional information on MATLAB, please visit the MathWorks help website. For any questions related specifically to using Parallel MATLAB on HPC, send an email to hpc@usc.edu.