Running a Job on HPC using PBS
To run a job on the HPC cluster, you will need to set up a Portable Batch System (PBS) file. This PBS file defines the commands and cluster resources used for the job. A PBS file is a simple text file that can be easily edited with a UNIX editor, such as vi, pico, or emacs. Information on UNIX editors can be found in the UNIX Overview section of the ITS website.
Submitting a Job
In order to use the HPC compute nodes, you must first log in to one of the head nodes, hpc-login1 or hpc-login2, and submit a PBS job. The qsub command is used to submit a job to the PBS queue and to request additional resources. The qstat command is used to check on the status of a job already in the PBS queue. To simplify submitting a job, you can create a PBS script and use the qsub and qstat commands to interact with the PBS queue.
Creating a PBS Script
To set the parameters for your job, you can create a control file that contains the commands to be executed. Typically, this is in the form of a PBS script. This script is then submitted to PBS using the qsub command.
Here is a sample PBS file, named myjob.pbs, followed by an explanation of each line of the file.
#!/bin/bash
#PBS -l nodes=1:ppn=2
#PBS -l walltime=00:00:59
cd /home/rcf-proj3/pv/test/
source /usr/usc/sas/default/setup.sh
sas my.sas
- The first line in the file identifies which shell will be used for the job. In this example, bash is used but csh or other valid shells would also work.
- The second line specifies the number of nodes and processors desired for this job. In this example, one node with two processors is being requested.
- The third line states how much wall-clock time is being requested. In this example, 59 seconds of wall time are requested.
- The fourth line tells the HPC cluster to access the directory where the data is located for this job. In this example, the cluster is instructed to change the directory to the /home/rcf-proj3/pv/test/ directory.
- The fifth line tells the cluster which program you would like to use to analyze your data. In this example, the cluster sources the environment for SAS.
- The sixth line tells the cluster to run the program. In this example, it runs SAS, specifying my.sas as the argument in the current directory, /home/rcf-proj3/pv/test, as defined in the previous line.
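As a rough sketch of how the sample file might be created and sanity-checked from the command line, the snippet below writes the six lines described above to myjob.pbs and asks bash to parse the file without running it. Using bash -n as a pre-submission check is a suggestion added here, not a step required by PBS; it only catches shell syntax errors and does not validate the #PBS resource requests.

```shell
#!/bin/bash
# Write the sample PBS file, one line per step described above.
# The quoted 'EOF' keeps the shell from expanding anything inside the file.
cat > myjob.pbs <<'EOF'
#!/bin/bash
#PBS -l nodes=1:ppn=2
#PBS -l walltime=00:00:59
cd /home/rcf-proj3/pv/test/
source /usr/usc/sas/default/setup.sh
sas my.sas
EOF

# Parse the script without executing it (syntax check only).
if bash -n myjob.pbs; then
    echo "myjob.pbs parsed cleanly"
fi
```

Because the #PBS lines are ordinary comments to the shell, the syntax check ignores them; only PBS interprets them when the file is submitted.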
Using the qsub Command
To submit your job without requesting additional resources, issue the command
qsub myjob.pbs
If myjob.pbs is set up as in the example above and you want to override the options specified in the file, use the -l parameter on the qsub command line.
Below are some examples of these overrides.
Requesting Additional Wall Time
If you need to request more or less wall time after you have already created your PBS script, you can do this by using the qsub command.
In the example script above, we have requested 59 seconds of wall time. If you realize later that your job actually requires five minutes to complete, the command
qsub -l walltime=0:05:00 myjob.pbs
will ask PBS for a limit of five minutes of wall time. If your job does not finish within the specified time, it will be terminated.
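When the required time is known in seconds, the HH:MM:SS value for walltime can be computed rather than typed by hand. The following is a minimal sketch; walltime_from_seconds is a hypothetical helper name, not a PBS command, and the script only prints the qsub command instead of running it.

```shell
#!/bin/bash
# walltime_from_seconds (hypothetical helper): convert a whole number of
# seconds into the HH:MM:SS form accepted by "-l walltime=".
walltime_from_seconds() {
    total=$1
    hours=$((total / 3600))
    minutes=$(((total % 3600) / 60))
    seconds=$((total % 60))
    printf '%02d:%02d:%02d\n' "$hours" "$minutes" "$seconds"
}

# Print (rather than run) the qsub command for a five-minute limit.
echo "qsub -l walltime=$(walltime_from_seconds 300) myjob.pbs"
```

Here 300 seconds becomes 00:05:00, which requests the same five-minute limit as the 0:05:00 used above.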
Requesting Nodes and Processors
You may also alter the number of nodes and processors requested for a job by using the qsub command. In the example script, we have requested one node with two processors, or one dual-processor node.
If you later decide that you need four HPC nodes for your job but will use only one of the two processors on each node, use the following command:
qsub -l walltime=0:05:00,nodes=4 myjob.pbs
If you want to use both processors on each HPC node, you should use the following command:
qsub -l walltime=0:05:00,nodes=4:ppn=2 myjob.pbs
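The two requests above differ only in the ppn value, so the -l argument can be assembled from its parts. The sketch below uses two hypothetical helpers, resource_list and total_processors. It assumes that leaving out ppn is equivalent to ppn=1, which matches the one-processor-per-node behavior described above but should be confirmed for your cluster.

```shell
#!/bin/bash
# resource_list (hypothetical helper): build the comma-separated value
# passed to "qsub -l" from a walltime, a node count, and processors per node.
# Assumption: an omitted ppn is treated as 1.
resource_list() {
    walltime=$1
    nodes=$2
    ppn=${3:-1}
    printf 'walltime=%s,nodes=%s:ppn=%s\n' "$walltime" "$nodes" "$ppn"
}

# total_processors (hypothetical helper): nodes multiplied by ppn.
total_processors() {
    echo $(( $1 * ${2:-1} ))
}

# Print the command for four nodes using both processors on each node.
echo "qsub -l $(resource_list 0:05:00 4 2) myjob.pbs"
echo "Total processors requested: $(total_processors 4 2)"
```

Four nodes with ppn=2 request eight processors in total, while four nodes with one processor each request four.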
Requesting GPU Nodes
You can submit jobs directly to the graphics processing unit (GPU) cluster by specifying the number of GPUs your job requires in your PBS request.
For example, if you need four nodes with 16 cores per node and two GPUs per node, then your command should look like this:
qsub -l walltime=0:05:00,nodes=4:ppn=16:gpus=2 myjob.pbs
Requesting a Specific Network (Myrinet or Infiniband)
To set which network your job should run on, add the myri or IB feature to your PBS script.
If you want to use the Myrinet network, your script should look like this:
#PBS -l nodes=1:ppn=2:myri
NOTE: If you are using the mpich or mpich2 libraries, you must use the Myrinet cluster for your job.
If you wish to use the Infiniband network, your script should look like this:
#PBS -l nodes=1:ppn=2:IB
MPI jobs using OpenMPI 1.6.4 or later can run on the Infiniband network.
NOTE: Only one network should be specified for each job. If no network is specified, the job will be scheduled to run on whichever network is available.
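Because at most one network feature may be given, a small check before writing the #PBS line can catch mistakes early. This is a sketch only; node_spec is a hypothetical helper name, and the set of accepted features is limited to the two named in this section.

```shell
#!/bin/bash
# node_spec (hypothetical helper): build the value for "#PBS -l nodes=..."
# with either no network feature or exactly one of myri or IB.
node_spec() {
    nodes=$1
    ppn=$2
    network=${3:-}
    case "$network" in
        "")      printf 'nodes=%s:ppn=%s\n' "$nodes" "$ppn" ;;
        myri|IB) printf 'nodes=%s:ppn=%s:%s\n' "$nodes" "$ppn" "$network" ;;
        *)       echo "unknown network feature: $network" >&2
                 return 1 ;;
    esac
}

# Print the directive for one dual-processor node on the Infiniband network.
echo "#PBS -l $(node_spec 1 2 IB)"
```

Calling node_spec without a third argument leaves the network unspecified, so the scheduler is free to choose whichever network is available.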
Checking Job Status
To check on the status of your job, you will use the qstat command. The command
qstat -u [your username]
will show you the current status of all your submitted jobs.
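Submission and status checking are often done back to back, so the two commands can be wrapped in one function. The sketch below is illustrative: submit_and_check is a hypothetical name, and it relies on qsub printing the identifier of the new job on standard output. It is meant to be run on a head node where qsub and qstat are available.

```shell
#!/bin/bash
# submit_and_check (hypothetical wrapper): submit a PBS script, report the
# job identifier, then list the current user's jobs.
submit_and_check() {
    script=$1
    if ! command -v qsub >/dev/null 2>&1; then
        echo "qsub not found; run this on a head node" >&2
        return 127
    fi
    # qsub prints the identifier of the newly queued job.
    job_id=$(qsub "$script") || return 1
    echo "Submitted $script as job $job_id"
    # Show the status of all jobs owned by the current user.
    qstat -u "$(id -un)"
}

# Example use on a head node:
#   submit_and_check myjob.pbs
```

The function stops early if qsub is missing or the submission fails, so qstat is only called once a job has actually been queued.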
Submitting Jobs from Multiple Directories and Allocations
The script you submit to the PBS queue may need to run on a system that requires a new login. If so, any setup that your program requires will have to be duplicated in the PBS script. This includes changing your current directory from the home directory to the directory required for your job. PBS provides a variable, PBS_O_WORKDIR, that holds the directory you were in when you executed the qsub command. If you want your job to start from that same directory, you can insert the following line in your PBS script right after the #PBS lines:
cd $PBS_O_WORKDIR
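A slightly more defensive form of that directory change might look like the sketch below. The function name enter_submit_dir is hypothetical, and the fallback to the current directory is an assumption added here so the same script can also be run by hand outside PBS, where PBS_O_WORKDIR is not set.

```shell
#!/bin/bash
#PBS -l nodes=1:ppn=2
#PBS -l walltime=00:00:59

# enter_submit_dir (hypothetical helper): change to the directory from
# which qsub was executed. PBS sets PBS_O_WORKDIR for the job; outside
# PBS the variable is unset, so fall back to the current directory.
enter_submit_dir() {
    cd "${PBS_O_WORKDIR:-$PWD}" || return 1
}

# Stop the job if the directory cannot be entered.
enter_submit_dir || exit 1
echo "Running in $(pwd)"
```

Quoting the variable keeps directory names containing spaces intact, and the explicit failure check prevents the rest of the job from running in the wrong place.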
Access to HPC is governed by the HPC allocation policy. You must submit a request and be granted an allocation for each project. You may have more than one allocation if, for example, you are involved in more than one project or are taking a class in parallel programming. If so, you must specify which account to use when submitting a job, either in your qsub command or in a #PBS line in your PBS script.
For example, the following command would submit your job to PBS using the account lc_drs:
qsub -A lc_drs myjob.pbs
The account is specified in the command with an uppercase A.
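The same account can be given in either of the two places mentioned above. The short sketch below prints both forms side by side for the example account lc_drs; it does not submit anything, and the script name myjob.pbs is reused from the earlier examples.

```shell
#!/bin/bash
# Account name taken from the example above.
account=lc_drs

# Form 1: on the qsub command line.
echo "qsub -A $account myjob.pbs"

# Form 2: as a directive line inside the PBS script, next to the other
# #PBS lines.
echo "#PBS -A $account"
```

Putting the directive in the script keeps the account with the job definition, while the command-line form is convenient for switching accounts without editing the file.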
You can find out which accounts you have access to and what their compute hour balances are with the mybalance command: