The Scavenge Partition

As part of our ongoing efforts to improve cluster efficiency, HPC is pleased to announce a new queue that will enable you to take advantage of idle condo compute nodes. The scavenge partition provides all HPC researchers access to these idle condo compute resources. While this new queue will allow longer run times than other general-access queues, restrictions do apply. This document will detail those restrictions and provide additional information on how to use the scavenge partition.

NOTE: To maintain the data security requirements of HSDA research accounts, these accounts may not use the scavenge partition.

Configuration Limitations

Jobs submitted to this partition will run on idle condo compute nodes; however, the condo owner’s usage needs will preempt any scavenge jobs. In addition, as the nodes associated with the scavenge partition are not general-partition compute nodes, any researchers who misuse this partition may have their access revoked. If you wish to run compute jobs on the scavenge partition, please be aware of the following core-based rule sets and other configuration limitations:

Partition Rule Sets

Unlike general partitions, the scavenge partition allows multiple jobs to run on a single node, so you will need to submit your jobs with the appropriate core and memory resource requirements.

If you do not specify resource requirements, the default values for your job will be:

  • Default memory: 1 gigabyte per core
  • Default cores per task: 1
  • Default tasks per job: 1

The maximum resources you can request are:

  • Maximum runtime per job: 7 days
  • Maximum core count per user: 500
  • Maximum number of jobs submitted by a single user: 200

NOTE: These are the initial configuration limitations. They may be re-evaluated and modified to ensure fair usage and resource optimization.
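
For example, the job script header sketched below requests cores and memory explicitly and stays within these limits; the core, memory, and time values are placeholders that you should adapt to your own workload:

#!/bin/bash
#SBATCH --partition=scavenge
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=2G
#SBATCH --time=2-00:00:00

# Place your job commands here

Requesting only the cores and memory you need makes it easier for the scheduler to place your job alongside other jobs on a shared node.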

Job Start Times and Resource Availability

Job start times are not guaranteed, and jobs are run on a first-in, first-out basis. The nodes for this partition will only be available if the condo owners are not using them. It is not possible to determine when these resources will be available.

Job Preemption

If a condo owner requires the nodes/resources assigned to your job, your job will be preempted. Preempted jobs will not be requeued automatically; you will need to resubmit the job if it is preempted. This means that a scavenge job can run for as little as one second or for as long as the partition’s maximum walltime.
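
If you are not sure whether a job ended because it was preempted, you can check its final state with sacct, replacing 2309896 with your own job ID:

$ sacct -j 2309896 --format=JobID,State,Elapsed

A preempted job will typically be reported with a state of PREEMPTED.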

In order to not lose work on preempted jobs, it is highly recommended that all jobs running on the scavenge partition:

  1. Be able to checkpoint. Otherwise, work could be lost when the job is preempted. See the HPC checkpointing documentation for more about your options (https://hpcc.usc.edu/support/documentation/checkpointing/). You can also use Slurm’s --requeue option to automatically restart a preempted job, as shown in the example after this list. If used properly with Slurm constraints and checkpointing, the job can be restarted from its interrupted state on an equivalent node.
  2. Include signal trapping and process clean-up code in your job scripts so that a clean-up function runs whenever the job receives the TERM signal. Slurm batch scripts that use the bash shell can trap a TERM signal, allowing a clean-up function to run before the job is terminated. Keep this function minimal, as Slurm waits only 30 seconds before forcibly killing the job. See the Capturing Termination Signals section below for an example.
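
As a sketch, a checkpoint-capable batch script could be submitted with the --requeue option so that Slurm places the job back in the queue after preemption; my_job.slurm below is a placeholder for your own script:

$ sbatch --requeue --partition=scavenge my_job.slurm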

Capturing Termination Signals

When you use the scavenge partition, your job may be interrupted at any time. When this happens, Slurm sends a termination signal to your job’s processes. Signals are a form of interprocess communication on UNIX systems.

Examples of these termination signals include:

  • SIGTERM: The SIGTERM signal is sent directly to a process to request its termination. This signal can be caught and interpreted or ignored by the process. This allows the process to perform nice termination, cleaning up all resources and saving state, if appropriate.
  • SIGINT: The SIGINT signal is sent to a process by its controlling terminal when a user wishes to interrupt the process. This signal is typically initiated by pressing Ctrl+C.

You can trap these signals in your Slurm script. As an example, the bash script below traps a signal and calls a function. You can copy and paste these lines into a file named trap.sh and then run the script using ./trap.sh. Type Ctrl-C after you start the script to terminate it.

#!/bin/bash                                                                            
# Usage: ./trap.sh                                                                                                          

# Trap termination signals
trap 'clean_up' SIGINT SIGTERM

# Call the clean_up function if the user cancels the job
function clean_up {
    echo
    echo "Caught SIGINT or SIGTERM!"
    echo "Cleaning up process $$."
    echo "Bye!"
    exit
}

# Main program 
echo "I am running -- try to stop me with a Ctrl-C."
while :
do
    sleep 1
    echo "I am still running.."
done
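
If you make the script executable and run it, pressing Ctrl-C should produce output similar to the following (the process ID will differ on your system):

$ chmod +x trap.sh
$ ./trap.sh
I am running -- try to stop me with a Ctrl-C.
I am still running..
I am still running..
^C
Caught SIGINT or SIGTERM!
Cleaning up process 12345.
Bye!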

To use the code above as part of a Slurm job, insert the trap statement and clean_up function into your Slurm script, write your own clean-up code, and then add your job commands.

#!/bin/bash
#SBATCH --partition=scavenge
#SBATCH --export=none
#SBATCH --ntasks=1
#SBATCH --time=00:05:00

# Trap termination signals
trap 'clean_up' SIGINT SIGTERM

# Call the clean_up function if Slurm cancels the job
function clean_up {
    # Copy results from node-local scratch to permanent storage before exiting
    cp $TMPDIR/myresults.log /staging/tt/ttrojan
    exit
}

# Main program (place your own job commands here)

Requeuing and Restarting Your Job

You can trap the TERM signal, write a checkpoint within the 30-second grace period, and then resume your job from the checkpoint image. Note that this requires the processes to be restored onto an identical node and processor type, which can be specified as allocation constraints.

For example:

$ sbatch --constraint='sl160&X5650' --partition=scavenge ...

where the original node type (sl160) and processor (X5650) are determined by inspecting the features of the first node that the job was run on, e.g.

$ squeue --user ttrojan
   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
 2309896     quick       sh  ttrojan  R       0:25      2 hpc[0950,0951]

$ sinfo -N -o "%10N %45f %10G" | grep hpc0950 | uniq
hpc0950    myri,xeon,X5650,sl160 

Getting Help

For questions about the scavenge partition, please contact hpc@usc.edu.