R on HPC

This is a brief guide to running R jobs on HPC. For more details, please see the New User Guide or Running a Job on HPC using Slurm.

Initialization Step (Before Installing R Packages)

You are encouraged to install your own R packages if they are not already installed for the version of R you will be using on HPC.

By default, R will install local or user packages in your home directory under ~/R. To avoid filling up your home directory, you must perform a one-time initialization step to change the install location for R packages.

From your home directory, create a new R_packages directory in your project directory. Then create a symbolic link to the R_packages directory and name it “R”.

cd ~
mkdir /home/rcf-proj/<project>/<user>/R_packages
ln -s /home/rcf-proj/<project>/<user>/R_packages R

A symbolic link appears to be identical to the file or directory it links to. You can see that it is actually a link by typing ls -l.

[ttrojan@hpc-login3]$ ls -l R
lrwxr-xr-x  1 ttrojan hpc  Apr 10 R -> /home/rcf-proj/tt/ttrojan/R_packages/

Now when you install R packages, R will still place them in ~/R by default, but the link will reroute the files to the new R_packages directory in your project directory, which has a much higher disk quota.
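The initialization step can also be written so that it is safe to run more than once. The following is a minimal sketch, not an HPC-provided tool: link_r_library is a hypothetical helper name, and the demonstration uses a temporary directory in place of your real project directory and ~/R.

```shell
#!/bin/bash
# link_r_library is a hypothetical helper, not an HPC-provided command.
# It creates the package directory if needed and links to it, but refuses
# to overwrite anything that already exists at the link location.
link_r_library() {
    local target="$1"   # e.g. /home/rcf-proj/<project>/<user>/R_packages
    local link="$2"     # e.g. ~/R
    mkdir -p "$target" || return 1
    if [ -L "$link" ]; then
        echo "$link is already a link to $(readlink "$link")"
    elif [ -e "$link" ]; then
        echo "$link exists and is not a link; move it aside first" >&2
        return 1
    else
        ln -s "$target" "$link"
    fi
}

# Demonstration in a throwaway directory standing in for the real paths.
demo=$(mktemp -d)
link_r_library "$demo/project/R_packages" "$demo/R"
ls -l "$demo/R"
```

On HPC the call would look like link_r_library /home/rcf-proj/<project>/<user>/R_packages ~/R. If ~/R already exists as a regular directory (for example, from earlier package installs), the helper stops rather than guessing what to do with its contents.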

Installing R Packages

Installing R Packages from CRAN

After the initialization step, you can install R packages by sourcing the environment setup file for the selected version, starting R, installing the package, and loading the library.

[ttrojan@hpc-login3]$ source /usr/usc/R/default/setup.sh
[ttrojan@hpc-login3]$ R
:
> install.packages("<package_name>")
> library("<package_name>")

If you want to use R packages that are stored in other locations, for example, a shared project directory, you must specify the package path, e.g.:

> install.packages("<package_name>", lib="/path/to/packages")
> library("<package_name>", lib.loc="/path/to/packages")

You can explicitly set a package path using R’s R_LIBS_USER environment variable in your Slurm or environment setup script with a line like the following:

export R_LIBS_USER=/home/rcf-proj/tt/ttrojan/R/parallel:$R_LIBS_USER
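If R_LIBS_USER is not already set, the line above leaves a trailing colon in the value. One way to avoid that is the shell's ${VAR:+...} expansion, sketched below. prepend_r_lib is a hypothetical helper and the paths are placeholders.

```shell
#!/bin/bash
# prepend_r_lib is a hypothetical helper, not part of R or Slurm.
# It puts a directory at the front of R_LIBS_USER and adds the ":" separator
# only when R_LIBS_USER already has a value.
prepend_r_lib() {
    export R_LIBS_USER="$1${R_LIBS_USER:+:$R_LIBS_USER}"
}

# Demonstration with placeholder paths, starting from an empty variable.
R_LIBS_USER=""
prepend_r_lib /path/to/packages
echo "$R_LIBS_USER"    # /path/to/packages
prepend_r_lib /path/to/more_packages
echo "$R_LIBS_USER"    # /path/to/more_packages:/path/to/packages
```

Directories listed first are searched first, so the most recently prepended path takes precedence.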

You can display the current library paths with .libPaths(). Note: R packages are installed separately for each version of R, so a package may be available for one version and not for another.

> .libPaths()
[1] "/auto/rcf-proj/tt/ttrojan/R_packages/x86_64-pc-linux-gnu-library/3.3" 
[2] "/auto/usc/R/3.3.1/lib64/R/library"

To list the packages, and their versions, installed under /usr/usc/R/default (or under a particular version), use the installed.packages() function and specify the location of the library.

> installed.packages(lib.loc="/usr/usc/R/3.5.0/lib64/R/library") 
> installed.packages(lib.loc="~/R/x86_64-pc-linux-gnu-library/3.3")

To check if a specific package is installed, use system.file(package="<package_name>") or packageDescription("<package_name>"), e.g.:

> system.file(package="parallel")
[1] "/usr/usc/R/3.5.0/lib64/R/library/parallel"

> packageDescription("parallel")
Package: parallel
Version: 3.5.0
Priority: base
Title: Support for Parallel computation in R
Author: R Core Team
Maintainer: R Core Team <R-core@r-project.org>
Description: Support for parallel computation, including by forking
        (taken from package multicore), by sockets (taken from package
        snow) and random-number generation.
License: Part of R 3.5.0
Imports: tools, compiler
Suggests: methods
Enhances: snow, nws, Rmpi
NeedsCompilation: yes
Built: R 3.5.0; x86_64-pc-linux-gnu; 2018-06-22 22:14:04 UTC; unix

-- File: /auto/usc/R/3.5.0/lib64/R/library/parallel/Meta/package.rds 
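The same check can be made from the shell, for example at the top of a job script, so that a job stops early when a package is missing. This is a sketch under assumptions: r_has_package is a hypothetical helper, it relies on R's requireNamespace() function, and it expects Rscript to be on your PATH (that is, setup.sh has already been sourced).

```shell
#!/bin/bash
# r_has_package is a hypothetical helper, not an HPC-provided command.
# Exit status: 0 = package found, 1 = package not found,
#              2 = Rscript is not on the PATH.
r_has_package() {
    command -v Rscript >/dev/null 2>&1 || return 2
    Rscript \
        -e 'pkg <- commandArgs(trailingOnly = TRUE)[1]' \
        -e 'if (!requireNamespace(pkg, quietly = TRUE)) quit(save = "no", status = 1)' \
        "$1"
}

# Demonstration: report on the "parallel" package.
rc=0
r_has_package parallel || rc=$?
case "$rc" in
    0) echo "parallel is installed" ;;
    1) echo "parallel is not installed" ;;
    2) echo "Rscript not found; source setup.sh first" ;;
    *) echo "unexpected exit status $rc from Rscript" ;;
esac
```

Unlike library(), requireNamespace() returns FALSE instead of stopping with an error when the package is missing, which makes it convenient for this kind of test.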

Installing R Packages from Other Repositories

If you want to install packages from non-default repositories, such as Bioconductor, use setRepositories() and enter the numbers of the repositories you want from the list. The next time you type setRepositories(), you should see a “+” next to each newly selected repository. This example was run using R 3.5.0.

> setRepositories()
Repositories

1: + CRAN
2:   BioC software
3:   BioC annotation
4:   BioC experiment
5:   CRAN (extras)
6:   Omegahat
7:   R-Forge
8:   rforge.net

Enter one or more numbers separated by spaces, or an empty line to cancel
1: 2 3 4
>
> install.packages("BiocManager")
> BiocManager::install("BiocParallel")
> library('BiocParallel')

Running R

Running R Interactively

It is a good idea to test your R program on an “interactive” compute node before submitting a batch (remote) job. The following command requests 8 CPUs, each with 2GB of memory, for 1 hour and, once the resources are allocated, logs you in to the compute node.

[ttrojan@hpc-login3]$ salloc --ntasks=8 --mem-per-cpu=2g --time=1:00:00
salloc: Pending job allocation 2377051
salloc: job 2377051 queued and waiting for resources
salloc: job 2377051 has been allocated resources
salloc: Granted job allocation 2377051
salloc: Waiting for resource configuration
salloc: Nodes hpc3676 are ready for job
---------- Begin SLURM Prolog ----------
Job ID:        2377051
Username:      ttrojan
Accountname:   lc_hpcc
Name:          sh
Partition:     quick
Nodelist:      hpc3676
TasksPerNode:  8
CPUsPerTask:   Default[1]
TMPDIR:        /tmp/2377051.quick
SCRATCHDIR:    /staging/scratch/2377051
Cluster:       uschpc
HSDA Account:  false
---------- 2018-12-12 17:59:36 ---------
[ttrojan@hpc3676]$ 

Once you are on a compute node, select the version of R you wish to run from /usr/usc/R. “Sourcing” the script /usr/usc/R/<version>/setup.sh will configure your environment to find (and use) that version of R and Rscript. You can run your program on the command line with Rscript. For example:

[ttrojan@hpc3676]$ source /usr/usc/R/3.5.0/setup.sh
[ttrojan@hpc3676]$ Rscript hello.R
[1] "Hello Tommy"

Alternatively, you can run your program within R or RStudio (if you have it installed). If your program is not in the same directory as the one in which you opened R, you will need to either specify a path or set R’s working directory to the location of the program before you call it.

[ttrojan@hpc3676]$ R
R version 3.5.0 (2018-04-23) -- "Joy in Playing"
:
# If code in same directory where R was invoked
> source("hello.R")
[1] "Hello Tommy"

# If not, using an absolute path will work
> source("/home/rcf-proj/tt/ttrojan/R/hello.R")
[1] "Hello Tommy"

Running R Remotely

When you have tested your R program and are ready to run it remotely, i.e., in batch mode, create a new text file, called a job script, in which you specify the compute resources and commands needed to run your job. The following job script, example_R.slurm, requests 16 CPUs on a single compute node, each with 2GB of memory, for 4 hours, then sets up R 3.5.0 and runs myscript.R.

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --mem-per-cpu=2GB
#SBATCH --time=4:00:00
# Ensures the job gets a fresh login environment
#SBATCH --export=none

source /usr/usc/R/3.5.0/setup.sh
Rscript myscript.R
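When a job requests several CPUs with --cpus-per-task, the R program still has to be told how many it may use. One possible approach, sketched here, is to pass Slurm's SLURM_CPUS_PER_TASK environment variable to the program as a command-line argument. The fallback value of 1 and the idea that myscript.R reads its first argument (for example with commandArgs(trailingOnly=TRUE)) are assumptions, not part of the original script.

```shell
#!/bin/bash
# Lines like these could follow "source /usr/usc/R/3.5.0/setup.sh" in a job script.
# Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is given;
# fall back to 1 when the variable is absent (e.g. when testing outside Slurm).
ncores="${SLURM_CPUS_PER_TASK:-1}"
echo "Passing $ncores core(s) to R"

# Guarded so the sketch also runs where R or the program is unavailable.
if command -v Rscript >/dev/null 2>&1 && [ -f myscript.R ]; then
    Rscript myscript.R "$ncores"
else
    echo "Rscript or myscript.R not found; nothing to run"
fi
```

Taking the core count from the allocation, rather than hard-coding it in the R program, keeps the program and the #SBATCH request from drifting apart.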

After your R job script has been created, use the command sbatch <job_script> to submit your job:

[ttrojan@hpc-login3]$ sbatch example_R.slurm
Submitted batch job 1131075

To check on the status of your job, use the squeue -u <username> command. If you are on a head node, you can also use the HPC wrapper, myqueue.

[ttrojan@hpc-login3]$ myqueue
JOBID    USER     ACCOUNT  PARTITION  NAME       TASKS  CPUS_PER_TASK  MIN_MEMORY  START_TIME           TIME  TIME_LIMIT  STATE    NODELIST(REASON)
1131075  ttrojan  lc_tt1   quick      example_R  16     1              1G          2018-07-02T11:14:46  1:49  30:00       RUNNING  hpc1046

[ttrojan@hpc-login3]$ squeue -u ttrojan
  JOBID PARTITION      NAME     USER ST       TIME  NODES NODELIST(REASON)
1131095     quick example_R  ttrojan PD       0:00      2 (Resources)

By default, all output sent to the console, including error messages and print statements, is directed to a file named “slurm-%j.out”, where the “%j” is replaced with the job ID number. The file will be generated on the first node of the job allocation.

Any files created by the R program itself will be created as specified by the program.
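While a job is running, it is often handy to look at the newest output file. The following is a minimal sketch, assuming the default slurm-<jobid>.out naming described above. latest_slurm_log is a hypothetical helper, and the demonstration creates two empty stand-in files in a temporary directory.

```shell
#!/bin/bash
# latest_slurm_log is a hypothetical helper, not an HPC-provided command.
# Prints the most recently modified slurm-*.out file in a directory
# (default: the current directory); prints nothing if there is none.
latest_slurm_log() {
    ls -t "${1:-.}"/slurm-*.out 2>/dev/null | head -n 1
}

# Demonstration with two stand-in files that have different timestamps.
demo=$(mktemp -d)
touch -t 201807021114 "$demo/slurm-1131075.out"
touch -t 201807021130 "$demo/slurm-1131095.out"
latest_slurm_log "$demo"
```

From the directory where the job was submitted, tail -f "$(latest_slurm_log)" would then follow the newest file as the job writes to it.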

Viewing R Plots on HPC

The following code produces a PNG file.

> x <- rnorm(50)
> y <- rnorm(x)

## If testing interactively with X-forwarding enabled, this command will display an interactive plot
> plot(x,y)

## If this is a batch job, the following three commands can be used to save your plot to a file.
> png('rplot.png')
> plot(x,y)
> dev.off()
:
> quit()

## Check that plot was created
$ ls *.png
rplot.png

## If X-forwarding is enabled, you can use the ImageMagick utility /usr/bin/display to view the plot, or download it to your personal computer and view it there.
$ display rplot.png

Running RStudio on HPC

Researchers from USC’s Biostatistics community have developed instructions for installing and running RStudio on HPC. These are available under https://github.com/USCbiostats.

Questions regarding RStudio should be addressed to the Biostatistics GitHub developers and not to HPC staff.

Reproducibility

If devtools is installed, you can save the session information for the purposes of reproducibility.

> library('devtools')
> session_info()
─ Session info ───────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 3.5.0 (2018-04-23)
 os       CentOS Linux 7 (Core)       
 system   x86_64, linux-gnu           
 ui       X11                         
 language (EN)                        
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       US/Pacific                  
 date     2018-12-18                  

─ Packages ───────────────────────────────────────────────────────────────────
 package      * version date       lib source        
 assertthat     0.2.0   2017-04-11 [2] CRAN (R 3.5.0)
 backports      1.1.3   2018-12-14 [1] CRAN (R 3.5.0)
 BiocParallel * 1.16.2  2018-11-28 [1] Bioconductor  
 callr          3.1.0   2018-12-10 [1] CRAN (R 3.5.0)
 cli            1.0.0   2017-11-05 [2] CRAN (R 3.5.0)
 crayon         1.3.4   2017-09-16 [2] CRAN (R 3.5.0)
 desc           1.2.0   2018-05-01 [1] CRAN (R 3.5.0)
 devtools     * 2.0.1   2018-10-26 [1] CRAN (R 3.5.0)
 digest         0.6.18  2018-10-10 [1] CRAN (R 3.5.0)
 fs             1.2.6   2018-08-23 [1] CRAN (R 3.5.0)
 glue           1.2.0   2017-10-29 [2] CRAN (R 3.5.0)
 magrittr       1.5     2014-11-22 [2] CRAN (R 3.5.0)
 memoise        1.1.0   2017-04-21 [1] CRAN (R 3.5.0)
 pkgbuild       1.0.2   2018-10-16 [1] CRAN (R 3.5.0)
 pkgload        1.0.2   2018-10-29 [1] CRAN (R 3.5.0)
 prettyunits    1.0.2   2015-07-13 [1] CRAN (R 3.5.0)
 processx       3.2.1   2018-12-05 [1] CRAN (R 3.5.0)
 ps             1.2.1   2018-11-06 [1] CRAN (R 3.5.0)
 R6             2.2.2   2017-06-17 [2] CRAN (R 3.5.0)
 Rcpp           0.12.17 2018-05-18 [2] CRAN (R 3.5.0)
 remotes        2.0.2   2018-10-30 [1] CRAN (R 3.5.0)
 rlang          0.2.1   2018-05-30 [2] CRAN (R 3.5.0)
 rprojroot      1.3-2   2018-01-03 [1] CRAN (R 3.5.0)
 sessioninfo    1.1.1   2018-11-05 [1] CRAN (R 3.5.0)
 snow           0.4-2   2016-10-14 [2] CRAN (R 3.5.0)
 usethis      * 1.4.0   2018-08-14 [1] CRAN (R 3.5.0)
 withr          2.1.2   2018-03-15 [1] CRAN (R 3.5.0)

[1] /auto/rcf-proj/ess/erinshaw/myRPackages/x86_64-pc-linux-gnu-library/3.5
[2] /auto/usc/R/3.5.0/lib64/R/library
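In a batch job, the session information can be written to a file next to the job's other output. The sketch below uses base R's sessionInfo() rather than devtools so that it does not depend on an extra package. The file name is a hypothetical choice, and SLURM_JOB_ID is the job ID that Slurm sets inside a job.

```shell
#!/bin/bash
# Hypothetical file name; "nojob" is used when run outside a Slurm job.
outfile="sessionInfo-${SLURM_JOB_ID:-nojob}.txt"

# Guarded so the sketch also runs where R is not on the PATH.
if command -v Rscript >/dev/null 2>&1; then
    Rscript -e 'print(sessionInfo())' > "$outfile"
    echo "Session information written to $outfile"
else
    echo "Rscript not found; source setup.sh first"
fi
```

Run this way, the file records the R version and platform of a fresh R session. To capture the packages your program actually loaded, call sessionInfo() (or session_info()) at the end of the program itself.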

Getting Help

For assistance with running R on HPC, see our Getting Help page or send an email to hpc@usc.edu.