Parallel R on HPC

There may be times when you need to run a single program on multiple data sets. There are several ways to do this; the simplest may be to use Slurm's srun command with the --multi-prog option.

For example, below we will create 4 data files, each with one line:
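One way to create them from the shell (a minimal sketch; any method of producing four one-line files will do):

```shell
# Create mydata1..mydata4, each containing a single line: 111, 222, 333, 444
for i in 1 2 3 4; do
    printf '%s%s%s\n' "$i" "$i" "$i" > "mydata$i"
done
```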

$ ls mydata*
mydata1
mydata2
mydata3
mydata4

The first line of mydata1 will be 111, that of mydata2 will be 222, and so on.

$ cat mydata*
111
222
333
444

Paste the following into a file named myjob.slurm. (For this exercise, an example from https://www.tchpc.tcd.ie/node/167 was used and modified to run on HPC.)

#!/bin/bash

#SBATCH --ntasks=12
#SBATCH --nodes=1
#SBATCH --time=00:10:00
#SBATCH --job-name=rtest
#SBATCH --export=none
#SBATCH --mem-per-cpu=2g

srun --label --multi-prog myjob.config

Paste the following lines into a file named myjob.config. This tells srun which program to run for each of the tasks that will be allocated. (On HPC, cpus-per-task=1 by default.)

NOTE: On HPC, the --multi-prog option requires that the number of tasks in the configuration list exactly equal the number of tasks allocated. The %t and %o will be replaced with the task number and the task offset (the task's position within its line's range), respectively.

0-3 /usr/bin/hostname
4,5 /usr/bin/echo task:%t
6 /usr/bin/echo task:%t-%o
7 /usr/bin/echo task:%o
8 /usr/bin/cat mydata1
9 /usr/bin/cat mydata2
10 /usr/bin/cat mydata3
11 /usr/bin/cat mydata4

While the syntax “0-3” and “4,5” can be used as shown, you will more likely use something like the last four lines to run a program with different data sets. If you wanted tasks 1 through 4 to process mydata1 through mydata4, you could use:

1-4 /usr/bin/cat mydata%t

Once you have your file correctly configured, it’s time to submit the job:

sbatch myjob.slurm

When the job ends, you should see output similar to the following:

---------- Begin SLURM Prolog ----------
Job ID:        2542141
Username:      erinshaw
Accountname:   lc_hpcc
Name:          rtest
Partition:     quick
Nodelist:      hpc1118
TasksPerNode:  16
CPUsPerTask:   Default[1]
TMPDIR:        /tmp/2542141.quick
SCRATCHDIR:    /staging/scratch/2542141
Cluster:       uschpc
HSDA Account:  false
---------- 2019-01-14 10:07:00 ---------
 3: hpc1118
 6: task:6-0
 5: task:5
 2: hpc1118
 7: task:0
 4: task:4
 0: hpc1118
 8: 111
 9: 222
11: 444
 1: hpc1118
10: 333

When ordered, tasks 8-11 look like this:

 8: 111
 9: 222
10: 333
11: 444

Parallel R Packages

Several parallel packages have been developed for R, including parallel, doParallel, rslurm, and BiocParallel. See the CRAN HPC Page for a full description of options for running R in parallel.

NOTE: You must use a standard head node (e.g., hpc-login3.usc.edu) to install R packages. To test parallel code, you must be on a compute node.

Here are some examples of running R on HPC using different parallel packages. For the examples, we’ll use the birthday function from the BiocParallel blog post by Leonardo Collado-Torres.

To get started with this tutorial, copy and paste the following lines into R:

> birthday <- function(n) {
+     m <- 10000
+     x <- numeric(m)
+     for(i in seq_len(m)) {
+         b <- sample(seq_len(365), n, replace = TRUE)
+         x[i] <- ifelse(length(unique(b)) == n, 0, 1)
+     }
+     mean(x)
+ }

You can use lapply() or a for loop to calculate the results. In the example below, we use lapply():

> system.time( lapply(seq_len(100), birthday) )
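For comparison, the equivalent for loop version (assuming the birthday() function defined above; shown here for the first 10 values of n) collects the results in a list:

```r
# Run birthday() for n = 1..10 with a plain for loop
results <- vector("list", 10)
for (n in seq_len(10)) {
    results[[n]] <- birthday(n)
}
```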

We can now test the birthday function with some parallel options for R. Note that the parallel package is included with R; the others (rslurm, BiocParallel) must be installed. You should also install the foreach package for testing purposes.
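As a quick illustration of foreach, the following sketch uses it with the doParallel backend (this assumes the birthday() function defined above; foreach automatically exports variables referenced in the loop body to the workers):

```r
library(foreach)
library(doParallel)

# Start a small cluster and register it as the foreach backend
cl <- makeCluster(2)
registerDoParallel(cl)

# %dopar% runs the loop iterations on the cluster workers
res <- foreach(n = 1:10) %dopar% birthday(n)

stopCluster(cl)
```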

Package: parallel

See the following R manual for R's parallel library: Package 'parallel'.

## 20(x2) cores detected on hpc4515
> library(parallel)
# Find out how many cores are available (if you don't already know)
> detectCores()
[1] 20
# Create a cluster with the desired number of cores
> cl <- makeCluster(3)
# Find out how many workers are in the cluster
> length(cl)
[1] 3
> print("Hello World!")
[1] "Hello World!"
> stopCluster(cl)
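To actually run the birthday() test on cluster workers, you can use parLapply() together with clusterExport() (a sketch, assuming birthday() is defined as above; parLapply() splits the input across the workers, and clusterExport() copies the function to them):

```r
library(parallel)

cl <- makeCluster(3)
# Copy the birthday() function to each worker process
clusterExport(cl, "birthday")
# Apply birthday() to n = 1..20, distributed across the 3 workers
system.time( res <- parLapply(cl, seq_len(20), birthday) )
stopCluster(cl)
```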

Package: rslurm

See the manual for R's rslurm library at Package 'rslurm'. The following example is from Parallelize R code on a Slurm cluster, and was modified for use with our birthday function test script.

> library(rslurm)
> pars <- data.frame(n=seq_len(100))
> sjob <- slurm_apply(birthday, pars, jobname = 'myjob', nodes = 2, cpus_per_node = 8, submit = TRUE)
Submitted batch job 2398557
> list.files('_rslurm_myjob', 'results')
[1] "results_0.RDS" "results_1.RDS"
> res <- get_slurm_out(sjob, outtype='table')
> res

Output will be placed in a subdirectory named "_rslurm_myjob":

$ ls _rslurm_myjob
f.RDS  params.RDS  results_0.RDS  results_1.RDS  
slurm_0.out  slurm_1.out  slurm_run.R  submit.sh
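When you no longer need the output files, rslurm's cleanup_files() function removes the _rslurm_myjob directory and its contents:

```r
> cleanup_files(sjob)
```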

Package: BiocParallel

See the manual for R's BiocParallel library at Package 'BiocParallel'. This example is from the BiocParallel blog post by Leonardo Collado-Torres.

Once you have loaded the BiocParallel library, request a compute node with 8 CPUs (e.g., --ntasks=8), run R, and try the following:

## Load library
> library('BiocParallel')

## Test the birthday() function
> system.time( lapply(seq_len(100), birthday) )
   user  system elapsed 
 33.394   0.153  33.600 

## The results of registered() will depend on the compute node you are allocated.
> registered()
$MulticoreParam
class: MulticoreParam
  bpisup: FALSE; bpnworkers: 22; bptasks: 0; bpjobname: BPJOB
  bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
  bptimeout: 2592000; bpprogressbar: FALSE; bpexportglobals: TRUE
  bpRNGseed: 
  bplogdir: NA
  bpresultdir: NA
  cluster type: FORK

$SnowParam
class: SnowParam
  bpisup: FALSE; bpnworkers: 22; bptasks: 0; bpjobname: BPJOB
  bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
  bptimeout: 2592000; bpprogressbar: FALSE; bpexportglobals: TRUE
  bpRNGseed: 
  bplogdir: NA
  bpresultdir: NA
  cluster type: SOCK

$SerialParam
class: SerialParam
  bpisup: TRUE; bpnworkers: 1; bptasks: 0; bpjobname: BPJOB
  bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
  bptimeout: 2592000; bpprogressbar: FALSE; bpexportglobals: TRUE
  bplogdir: NA

## Try Multicore
> system.time( y.multi <- bplapply(1:10, birthday, BPPARAM = MulticoreParam(workers = 4)) )
   user  system elapsed 
  0.034   0.186   1.450 

## Try Snow
> system.time( y.snow <- bplapply(1:10, birthday, BPPARAM = SnowParam(workers = 4)) )
   user  system elapsed 
  0.049   0.043   5.238 

## Try Serial
> system.time( y.serial <- bplapply(1:10, birthday, BPPARAM = SerialParam()) )
   user  system elapsed 
  2.957   0.037   2.996 
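Instead of passing BPPARAM on every call, you can set a default backend with BiocParallel's register() function, and bplapply() will then use it automatically (a sketch, assuming birthday() is defined as above):

```r
library(BiocParallel)

# Make a 4-worker multicore backend the default for this session
register(MulticoreParam(workers = 4))

# bplapply() now uses the registered backend; no BPPARAM argument needed
y <- bplapply(1:10, birthday)
```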

(See the Running R on HPC page for more details on how to do this.)

OMP_NUM_THREADS

Parallel libraries built with OpenMP (e.g., OpenBLAS, Intel MKL) support the environment variable OMP_NUM_THREADS, which sets the maximum number of threads to use for parallel processing. We recommend that you set this variable to 1.

export OMP_NUM_THREADS=1

For optimization, the value can be set automatically from a Slurm environment variable (e.g., $SLURM_CPUS_ON_NODE, $SLURM_CPUS_PER_TASK, $SLURM_TASKS_PER_NODE, or $SLURM_JOB_CPUS_PER_NODE). Some experimentation may be needed.
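For example, in your job script you could derive the thread count from the CPUs Slurm allocated per task, falling back to 1 when the variable is unset:

```shell
# Match OpenMP threads to the CPUs allocated per task by Slurm;
# default to 1 if SLURM_CPUS_PER_TASK is not set
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
```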

Getting Help

For assistance with running parallel jobs on HPC, see our Getting Help page or send an email to hpc@usc.edu.

USC Biostatistics Resources

Researchers from USC's Biostatistics community have developed additional training, code, and documentation for running parallel R on HPC. These resources are available at https://github.com/USCbiostats.

Questions about these resources should be addressed to USC Biostatistics' GitHub developers and not to HPC.