Transferring Very Large Files to HPC
HPC supports IBM’s Aspera and Globus’ Globus Connect to transfer very large and/or proprietary files from an external repository to HPC.
Transferring Large Files with Aspera and Globus
HPC maintains versions of IBM’s Aspera and Globus’ Globus Connect under /usr/usc/aspera and /usr/usc/globus, respectively.
Aspera Secure Copy
Aspera Secure Copy (ASCP) is a high-speed file transfer utility. For complete instructions on how to use ASCP, refer to the official documentation available at https://download.asperasoft.com/download/docs/ascp/3.5.2/html/index.html.
To use Aspera on HPC, log in to hpc-transfer and set up the environment.
After the environment has been set up, use the following command-line template to transfer one or more files.
ascp <options> <user@source_hostname:source_file1[,source_file2,...]> <user@destination_hostname:target_path>
Aspera requires a private key file for authentication. Use the -i option to specify the key file, which is located at /usr/usc/aspera/3.5.1/etc/asperaweb_id_dsa.openssh.
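The template and the key option above can be combined as in the following minimal sketch. It only prints the assembled command for review rather than running it. The helper name build_ascp_cmd and the placeholder source are hypothetical; the key path and the -T, -r, and -k1 options are copied from this page's own example, and their exact meaning should be checked against the official ASCP documentation.

```shell
# Sketch: assemble an ascp command line from the pieces described above.
# build_ascp_cmd is a hypothetical helper, not part of Aspera.
# The key path and the -Tr -k1 options are taken from this page's example.

ASPERA_KEY="/usr/usc/aspera/3.5.1/etc/asperaweb_id_dsa.openssh"

build_ascp_cmd() {
    # $1 = source, in the form user@source_hostname:source_path
    # $2 = target path
    printf 'ascp -Tr -k1 -i %s %s %s\n' "$ASPERA_KEY" "$1" "$2"
}

# Print the command for review instead of executing it.
build_ascp_cmd "user@source_hostname:/path/to/source_dir" "/staging/tt1/ttrojan/data"
```

Once the printed line looks right, it can be pasted at the hpc-transfer prompt after the environment has been set up.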
The following example transfers over 8 gigabytes of data from the directory /1000genomes/ftp/data/HG00116 on the remote host, firstname.lastname@example.org, to HPC user ttrojan’s staging directory. The transfer took just under 8 minutes (476 seconds) on hpc-transfer.
$ source /usr/usc/aspera/default/setup.sh
$ ascp -Tr -k1 -i /usr/usc/aspera/3.5.1/etc/asperaweb_id_dsa.openssh email@example.com:/1000genomes/ftp/data/HG00116 /staging/tt1/ttrojan/data
HG00116.alt_bwamem_GRCh38DH.20150718.GBR.low_ 100% 1353 (skipped) --:--
HG00116.alt_bwamem_GRCh38DH.20150718.GBR.low_ 100% 9GB 145Mb/s 07:55
HG00116.alt_bwamem_GRCh38DH.20150718.GBR.low_ 100% 252KB 145Mb/s 07:55
Completed: 8443997K bytes transferred in 476 seconds (145201K bits/sec), in 3 files, 2 directories; 1 file skipped or empty.
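The average rate can be checked from the figures on the Completed line, which is useful when estimating how long a larger transfer might take. The sketch below is a rough integer calculation. It assumes that "K bytes" means units of 1024 bytes, which is the reading that reproduces the 145 Mb/s shown in the progress lines. The function name avg_mbps is hypothetical.

```shell
# Average rate in megabits per second from ascp's summary figures.
# Assumption: "K bytes" in the summary means units of 1024 bytes.
avg_mbps() {
    # $1 = K bytes transferred, $2 = elapsed seconds
    # Integer arithmetic, so the result is truncated, not rounded.
    echo $(( $1 * 1024 * 8 / $2 / 1000000 ))
}

avg_mbps 8443997 476   # prints 145
```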
Globus Connect (Available on HPC)
Globus documentation is available at https://docs.globus.org/how-to/get-started/. Globus can perform unattended transfers with automatic retries.
HPC has a Globus endpoint, uschpc#hpc-transfer.usc.edu. To use Globus, first install its client application on your personal computer by going to https://www.globus.org/globus-connect-personal and choosing “Install Globus Connect Personal” for your own computer’s operating system.
Under Install, follow the link to create an “endpoint”. Choose “University of Southern California” and provide your login information when prompted.
Globus has a command line interface for advanced users.
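As a rough illustration of the command line interface, the sketch below prints the commands of a typical session as a dry run instead of executing them. It assumes the Globus CLI (the globus command) is installed where you run it, which this page does not confirm for HPC. The endpoint IDs, paths, search text, and label are placeholders, and globus_transfer_cmd is a hypothetical helper. Consult the Globus documentation for the authoritative syntax.

```shell
# Sketch of a Globus CLI session, printed as a dry run rather than executed.
# SRC_ID and DST_ID are placeholder endpoint IDs (hypothetical values);
# real IDs can be looked up with "globus endpoint search".
SRC_ID="SOURCE-ENDPOINT-UUID"
DST_ID="DESTINATION-ENDPOINT-UUID"

globus_transfer_cmd() {
    # $1 = source path, $2 = destination path, $3 = task label
    printf 'globus transfer --recursive --label "%s" %s:%s %s:%s\n' \
        "$3" "$SRC_ID" "$1" "$DST_ID" "$2"
}

echo 'globus login'                           # authenticate first
echo 'globus endpoint search "hpc-transfer"'  # look up endpoint IDs
globus_transfer_cmd "/remote/data/" "/staging/tt1/ttrojan/data/" "hpc-staging-copy"
```

Printing the commands first makes it easy to confirm the endpoint IDs and paths before submitting a transfer.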
Third Party Transfers (between remote computers)
HPC researchers who are interested in using Globus for third-party transfers between remote computers will need to authenticate at both endpoints.
Anonymous endpoints are available for testing. See https://www.globus.org/blog/use-test-endpoints-anticipate-your-data-transfer-rates, https://icnwg.llnl.gov/data-transfer-nodes.html, and https://fasterdata.es.net/data-transfer-tools/globus/.
For researchers transferring data from a National Institutes of Health (NIH) database to HPC, see Globus Platform-as-a-Service for Collaborative Science Applications for a description of the Globus services that are supported by NIH.
A FAQ is available for more information.