R on HPC
Initialization Step (Before Installing R Packages)
HPC researchers are encouraged to install their own R packages if the packages are not already installed for the version of R they will be using on HPC.
By default, R will install local or user packages in your home directory under ~/R. To avoid filling up your home directory, you must perform a one-time initialization step to change the install location for R packages.
From your home directory, create a new R_packages directory in your project directory. Then create a symbolic link to the R_packages directory and name it “R”.
cd ~
mkdir /home/rcf-proj/<project>/<user>/R_packages
ln -s /home/rcf-proj/<project>/<user>/R_packages R
A symbolic link appears to be identical to the file or directory it links to. You can see that it is actually a link by typing ls -l.
[ttrojan@hpc-login3]$ ls -l R
lrwxr-xr-x 1 ttrojan hpc Apr 10 R -> /home/rcf-proj/tt/ttrojan/R_packages/
Now when you install R packages, R will still place them in ~/R by default, but the link will reroute the files to the new R_packages directory in your project directory, which has a much higher disk quota.
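The one-time initialization can also be wrapped in a small guarded function so that it is safe to re-run. The following is a minimal sketch, not an official HPC script: the function name link_r_packages and its two arguments are hypothetical, and it refuses to continue if R already exists as a real file or directory.

```shell
# Hypothetical helper: create a package directory and link it as <home_dir>/R.
# Usage: link_r_packages <home_dir> <target_dir>
link_r_packages() {
    local home_dir="$1"
    local target_dir="$2"
    local link="$home_dir/R"

    # Create the package directory (and any missing parent directories).
    mkdir -p "$target_dir" || return 1

    if [ -L "$link" ]; then
        # Already a symbolic link: leave it alone.
        echo "$link is already a link to $(readlink "$link")"
    elif [ -e "$link" ]; then
        # A real file or directory is in the way; do not overwrite it.
        echo "$link exists and is not a link; move it aside first" >&2
        return 1
    else
        ln -s "$target_dir" "$link"
        echo "linked $link -> $target_dir"
    fi
}

# Example call (replace <project> and <user> with your own values):
# link_r_packages "$HOME" "/home/rcf-proj/<project>/<user>/R_packages"
```

Running the function a second time only reports the existing link, so it can be left in a setup script without harm.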
Installing R Packages from CRAN
After the initialization step, you can install R packages by sourcing the environment setup file for the selected version, starting R, installing the package, and loading the library.
[ttrojan@hpc-login3]$ source /usr/usc/R/default/setup.sh
[ttrojan@hpc-login3]$ R
:
> install.packages("<package_name>")
> library("<package_name>")
If you want to use R packages that are stored in other locations, for example, a shared project directory, you must specify the package path, e.g.:
> install.packages("<package_name>", lib="/path/to/packages")
> library(<package_name>, lib.loc="/path/to/packages")
You can explicitly set a package path using R’s R_LIBS_USER environment variable in your Slurm or environment setup script with a line like the following (the path is a placeholder):
export R_LIBS_USER=/path/to/packages
You can display the current library paths with .libPaths(). Note: R packages are built for a specific version of R, so a package library may exist for one version and not for another.
> .libPaths()
[1] "/auto/rcf-proj/tt/ttrojan/R_packages/x86_64-pc-linux-gnu-library/3.3"
[2] "/auto/usc/R/3.3.1/lib64/R/library"
To investigate the packages and versions of packages installed under /usr/usc/R/default (or under a particular version), use the command installed.packages and specify the location of the library.
> installed.packages(lib.loc="/usr/usc/R/3.5.0/lib64/R/library")
> installed.packages(lib.loc="~/R/x86_64-pc-linux-gnu-library/3.3")
To check if a specific package is installed, use system.file(package="<package_name>") or packageDescription("<package_name>"), e.g.:
> system.file(package="parallel")
[1] "/usr/usc/R/3.5.0/lib64/R/library/parallel"
> packageDescription("parallel")
Package: parallel
Version: 3.5.0
Priority: base
Title: Support for Parallel computation in R
Author: R Core Team
Maintainer: R Core Team <R-core@r-project.org>
Description: Support for parallel computation, including by forking (taken from package multicore), by sockets (taken from package snow) and random-number generation.
License: Part of R 3.5.0
Imports: tools, compiler
Suggests: methods
Enhances: snow, nws, Rmpi
NeedsCompilation: yes
Built: R 3.5.0; x86_64-pc-linux-gnu; 2018-06-22 22:14:04 UTC; unix
-- File: /auto/usc/R/3.5.0/lib64/R/library/parallel/Meta/package.rds
Installing R Packages from Other Repositories
If you want to install packages from non-default repositories, such as Bioconductor, use setRepositories() and enter the numbers of the repositories you want to add from the list shown. The next time you type setRepositories(), you should see a “+” next to each newly selected repository. This example was run using R 3.5.0.
> setRepositories()
Repositories
1: + CRAN
2: BioC software
3: BioC annotation
4: BioC experiment
5: CRAN (extras)
6: Omegahat
7: R-Forge
8: rforge.net
Enter one or more numbers separated by spaces, or an empty line to cancel
1: 2 3 4
>
> install.packages("BiocManager")
> BiocManager::install("BiocParallel")
> library('BiocParallel')
Running R Interactively
It is a good idea to test your R program on an “interactive” compute node, before submitting a batch (remote) job. The following will request a compute node with 8 CPUs, each with 2GB of memory, for 1 hour and, when the resource is allocated, log you into the node.
[ttrojan@hpc-login3]$ salloc --ntasks=8 --mem-per-cpu=2g --time=1:00:00
salloc: Pending job allocation 2377051
salloc: job 2377051 queued and waiting for resources
salloc: job 2377051 has been allocated resources
salloc: Granted job allocation 2377051
salloc: Waiting for resource configuration
salloc: Nodes hpc3676 are ready for job
---------- Begin SLURM Prolog ----------
Job ID: 2377051
Username: ttrojan
Accountname: lc_hpcc
Name: sh
Partition: quick
Nodelist: hpc3676
TasksPerNode: 8
CPUsPerTask: Default
TMPDIR: /tmp/2377051.quick
SCRATCHDIR: /staging/scratch/2377051
Cluster: uschpc
HSDA Account: false
---------- 2018-12-12 17:59:36 ---------
[ttrojan@hpc3676]$
Once you are on a compute node, select the version of R you wish to run from /usr/usc/R. “Sourcing” the script /usr/usc/R/<version>/setup.sh will configure your environment to find (and use) that version of R and Rscript. You can run your program on the command line with Rscript. For example:
[ttrojan@hpc3676]$ source /usr/usc/R/3.5.0/setup.sh
[ttrojan@hpc3676]$ Rscript hello.R
[1] "Hello Tommy"
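Because the path of the setup script depends on the version, it can help to check that the script exists before sourcing it. The following is a minimal sketch that assumes the /usr/usc/R/<version>/setup.sh layout described above; the function name use_r_version and its optional second argument (an alternative base directory, mainly useful for testing) are hypothetical.

```shell
# Hypothetical helper: source the setup script for one R version.
# Usage: use_r_version <version> [base_dir]
use_r_version() {
    local version="$1"
    local base_dir="${2:-/usr/usc/R}"
    local setup="$base_dir/$version/setup.sh"

    if [ ! -f "$setup" ]; then
        echo "no setup script found at $setup" >&2
        return 1
    fi

    # Source (rather than execute) so the changes apply to the current shell.
    . "$setup"
}

# Example: configure R 3.5.0, then run a program only if that succeeded.
# use_r_version 3.5.0 && Rscript hello.R
```

A mistyped version number then produces a clear message instead of a "command not found" error from Rscript later on.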
Alternatively, you can run your program within R or RStudio (if you have it installed). If your program is not in the same directory as the one in which you opened R, you will need to either specify a path or set R’s working directory to the location of the program before you call it.
[ttrojan@hpc-login3]$ R
R version 3.5.0 (2018-04-23) -- "Joy in Playing"
:
# If code in same directory where R was invoked
> source("hello.R")
[1] "Hello Tommy"
# If not, using an absolute path will work
> source("/home/rcf-proj/tt/ttrojan/R/hello.R")
[1] "Hello Tommy"
Running R Remotely
When you have tested your R program and are ready to run it remotely, i.e., in batch mode, you will create a new text file, called a job script, where you will specify the compute resources and commands needed to run your job. The following job script, myjob.slurm, will request 16 CPUs on a single compute node, each with 2GB of memory, for 4 hours, and will then set up R 3.5.0 and run myscript.R.
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --mem-per-cpu=2GB
#SBATCH --time=4:00:00
#SBATCH --export=none #Ensures job gets a fresh login environment
source /usr/usc/R/3.5.0/setup.sh
Rscript myscript.R
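A job script like this one requests 16 CPUs, but the R program still has to be told how many workers to start. One possible approach, sketched here as an assumption rather than as HPC's recommended method, is to read Slurm's SLURM_CPUS_PER_TASK environment variable (set when --cpus-per-task is given) and pass it to the program as a command-line argument. The helper name r_worker_count is hypothetical, and myscript.R would need to read the argument itself, for example with commandArgs(trailingOnly = TRUE).

```shell
# Hypothetical helper: print the number of CPUs available to this task,
# falling back to 1 when SLURM_CPUS_PER_TASK is not set (for example,
# when the script is tested outside of a Slurm job).
r_worker_count() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Inside the job script, after sourcing setup.sh, the count could be
# passed to the R program like this:
# Rscript myscript.R "$(r_worker_count)"
```

Keeping the count in one place means the R code does not need to change when the #SBATCH --cpus-per-task value does.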
After your R job script has been created, you can use the command sbatch <job_script> to submit your job:
[ttrojan@hpc-login3]$ sbatch example_R.slurm
Submitted batch job 1131075
To check on the status of your job use the squeue -u <username> command. If you are on a head node, you can use the HPC wrapper, myqueue.
[ttrojan@hpc-login3]$ myqueue
JOBID   USER    ACCOUNT PARTITION NAME      TASKS CPUS_PER_TASK MIN_MEMORY START_TIME          TIME TIME_LIMIT STATE   NODELIST(REASON)
1131075 ttrojan lc_tt1  quick     example_R 16    1             1G         2018-07-02T11:14:46 1:49 30:00      RUNNING hpc1046
[ttrojan@hpc-login3]$ squeue -u ttrojan
JOBID   PARTITION NAME      USER    ST TIME NODES NODELIST(REASON)
1131095 quick     example_R ttrojan PD 0:00 2     (Resources)
By default, all output sent to the console, including error messages and print statements, is directed to a file named “slurm-%j.out”, where the “%j” is replaced with the job ID number. The file will be generated on the first node of the job allocation.
Any files created by the R program itself will be created as specified by the program.
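The job ID printed by sbatch determines the name of the default output file. The following sketch, with hypothetical function names, extracts the ID from a "Submitted batch job" line and builds the matching slurm-<jobid>.out name. It assumes the default output pattern has not been changed with the --output option.

```shell
# Hypothetical helper: extract the job ID from sbatch's confirmation line,
# e.g. "Submitted batch job 1131075".
job_id_from_sbatch() {
    # The job ID is the last whitespace-separated field on the line.
    echo "$1" | awk '{ print $NF }'
}

# Hypothetical helper: default Slurm output file name for a given job ID.
slurm_output_file() {
    echo "slurm-$1.out"
}

# Example, using a confirmation line of the form that sbatch prints:
line="Submitted batch job 1131075"
log_file=$(slurm_output_file "$(job_id_from_sbatch "$line")")
echo "$log_file"    # prints slurm-1131075.out
```

In practice the line would come from the real command, for example line=$(sbatch example_R.slurm), and the resulting file could then be followed with tail -f "$log_file" while the job runs.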
Viewing R Plots on HPC
The following code displays a plot interactively and saves it to a PNG file.
> x <- rnorm(50)
> y <- rnorm(x)
## If testing interactively with X-forwarding enabled, this command will display an interactive plot
> plot(x,y)
## If this is a batch job, the following three commands can be used to save your plot to a file.
> png('rplot.png')
> plot(x,y)
> dev.off()
:
> quit()
## Check that plot was created
$ ls *.png
rplot.png
## If X-forwarding is enabled, you can use the ImageMagick utility /usr/bin/display to view the plot, or download it to your personal computer and view it there.
$ display rplot.png
Running RStudio on HPC
Researchers from USC’s Biostatistics community have developed instructions for installing and running RStudio on HPC. These are available under https://github.com/USCbiostats.
Questions regarding RStudio should be addressed to the Biostatistics GitHub developers and not to HPC staff.
If devtools is installed, you can save the session information for the purposes of reproducibility.
> library('devtools')
> session_info()
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 3.5.0 (2018-04-23)
 os       CentOS Linux 7 (Core)
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       US/Pacific
 date     2018-12-18
─ Packages ───────────────────────────────────────────────────────────────────
 package      * version date       lib source
 assertthat     0.2.0   2017-04-11     CRAN (R 3.5.0)
 backports      1.1.3   2018-12-14     CRAN (R 3.5.0)
 BiocParallel * 1.16.2  2018-11-28     Bioconductor
 callr          3.1.0   2018-12-10     CRAN (R 3.5.0)
 cli            1.0.0   2017-11-05     CRAN (R 3.5.0)
 crayon         1.3.4   2017-09-16     CRAN (R 3.5.0)
 desc           1.2.0   2018-05-01     CRAN (R 3.5.0)
 devtools     * 2.0.1   2018-10-26     CRAN (R 3.5.0)
 digest         0.6.18  2018-10-10     CRAN (R 3.5.0)
 fs             1.2.6   2018-08-23     CRAN (R 3.5.0)
 glue           1.2.0   2017-10-29     CRAN (R 3.5.0)
 magrittr       1.5     2014-11-22     CRAN (R 3.5.0)
 memoise        1.1.0   2017-04-21     CRAN (R 3.5.0)
 pkgbuild       1.0.2   2018-10-16     CRAN (R 3.5.0)
 pkgload        1.0.2   2018-10-29     CRAN (R 3.5.0)
 prettyunits    1.0.2   2015-07-13     CRAN (R 3.5.0)
 processx       3.2.1   2018-12-05     CRAN (R 3.5.0)
 ps             1.2.1   2018-11-06     CRAN (R 3.5.0)
 R6             2.2.2   2017-06-17     CRAN (R 3.5.0)
 Rcpp           0.12.17 2018-05-18     CRAN (R 3.5.0)
 remotes        2.0.2   2018-10-30     CRAN (R 3.5.0)
 rlang          0.2.1   2018-05-30     CRAN (R 3.5.0)
 rprojroot      1.3-2   2018-01-03     CRAN (R 3.5.0)
 sessioninfo    1.1.1   2018-11-05     CRAN (R 3.5.0)
 snow           0.4-2   2016-10-14     CRAN (R 3.5.0)
 usethis      * 1.4.0   2018-08-14     CRAN (R 3.5.0)
 withr          2.1.2   2018-03-15     CRAN (R 3.5.0)
/auto/rcf-proj/ess/erinshaw/myRPackages/x86_64-pc-linux-gnu-library/3.5
/auto/usc/R/3.5.0/lib64/R/library