Transferring Very Large Files to HPC

HPC supports two tools for transferring very large and/or proprietary files from an external repository to HPC: IBM’s Aspera and Globus Connect.

Transferring Large Files with Aspera and Globus

HPC maintains versions of IBM’s Aspera and Globus Connect under /usr/usc/aspera and /usr/usc/globus, respectively.

Aspera Secure Copy

Aspera Secure Copy (ASCP) is a high-speed file transfer utility. For complete instructions on how to use ASCP, refer to the official documentation available at https://download.asperasoft.com/download/docs/ascp/3.5.2/html/index.html.

To use Aspera on HPC, log in to hpc-transfer and set up the environment:

source /usr/usc/aspera/default/setup.sh
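
To verify the setup, confirm that ascp is on your PATH. (A quick sanity check; ascp’s -A flag prints its version and license information.)

$ which ascp
$ ascp -A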

After the environment has been set up, use the following command-line template to transfer one or more files:

ascp <options> <user@source_hostname:source_file1[,source_file2,...]> <user@destination_hostname:target_path>

Aspera requires a private key file for authentication. Use the -i option to specify the key file, located at /usr/usc/aspera/3.5.1/etc/asperaweb_id_dsa.openssh.

The following example transfers over 8 gigabytes of data from the directory /1000genomes/ftp/data/HG00116 on the remote host anonftp@ftp.ncbi.nlm.nih.gov to HPC user ttrojan’s staging directory; on hpc-transfer, the transfer took under 8 minutes. In the command below, -T disables in-transit encryption for higher throughput, -r copies directories recursively, and -k1 allows an interrupted transfer to resume where it left off.

$ source /usr/usc/aspera/default/setup.sh

$ ascp -Tr -k1 -i /usr/usc/aspera/3.5.1/etc/asperaweb_id_dsa.openssh anonftp@ftp.ncbi.nlm.nih.gov:/1000genomes/ftp/data/HG00116  /staging/tt1/ttrojan/data
HG00116.alt_bwamem_GRCh38DH.20150718.GBR.low_                                                                     100% 1353   (skipped)   --:--    
HG00116.alt_bwamem_GRCh38DH.20150718.GBR.low_                                                                     100%    9GB  145Mb/s    07:55    
HG00116.alt_bwamem_GRCh38DH.20150718.GBR.low_                                                                     100%  252KB  145Mb/s    07:55    
Completed: 8443997K bytes transferred in 476 seconds
(145201K bits/sec), in 3 files, 2 directories; 1 file skipped or empty.

Globus Connect (Available on HPC)

Globus documentation is available at https://docs.globus.org/how-to/get-started/. Globus can perform unattended transfers with automatic retries.

Local Transfers

HPC has a Globus endpoint, uschpc#hpc-transfer.usc.edu. To use Globus, first install its client application on your personal computer by going to https://www.globus.org/globus-connect-personal and choosing “Install Globus Connect Personal” for your computer’s operating system.

Under Install, follow the link to create an “endpoint”. Choose “University of Southern California” and provide your login information when prompted.

Globus has a command line interface for advanced users.
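
As a sketch of what a CLI-driven transfer might look like (the globus-cli package, endpoint UUIDs, and paths below are illustrative assumptions, not verified HPC instructions):

$ pip install --user globus-cli                # install the Globus CLI
$ globus login                                 # authenticate in a browser window
$ globus endpoint search "uschpc"              # look up the UUID of the uschpc#hpc-transfer endpoint
$ globus transfer <src_endpoint_UUID>:/path/to/file <hpc_endpoint_UUID>:/staging/tt1/ttrojan/data/file --label "example"

Note that globus transfer submits an asynchronous task; Globus then performs the transfer unattended and retries failed files automatically.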

Third Party Transfers (between remote computers)

HPC researchers who are interested in using Globus for third-party transfers between remote computers will need to authenticate at both endpoints.
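
As an illustrative sketch (the endpoint UUIDs and paths are placeholders), once both endpoints have been activated in your Globus account, a recursive third-party transfer and a status check might look like:

$ globus transfer --recursive <endpoint_A_UUID>:/data/project <endpoint_B_UUID>:/archive/project --label "site-to-site"
$ globus task show <task_ID>

The transfer runs entirely between the two remote endpoints; your own machine only submits and monitors the task.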

Anonymous test endpoints are available; see https://www.globus.org/blog/use-test-endpoints-anticipate-your-data-transfer-rates, https://icnwg.llnl.gov/data-transfer-nodes.html, and https://fasterdata.es.net/data-transfer-tools/globus/.

For researchers transferring data from a National Institutes of Health (NIH) database to HPC, see Globus Platform-as-a-Service for Collaborative Science Applications for a description of the Globus services that NIH supports.

A FAQ is available for more information.