SLURM Queuing System
Resources on getafix are controlled using the Slurm Workload Manager. In order to run a job on getafix you will need to request resources. This page will give you a list of commonly used commands for requesting resources.
This page is meant as a quick start guide; for more information, consult the man (manual) pages for the various commands. If you are already familiar with other job scheduling tools such as LSF, PBS/Torque or SGE, torque-slurm-sge or this (pdf) comparison between Slurm and other scheduling tools might be useful.
General commands
Slurm is controlled through a number of command line tools. Most tasks can be performed using the following tools:
salloc – Allocate resources.
sbatch – Submit a batch job to the cluster.
squeue – View information about jobs in the queue.
scancel – Cancel or signal a Slurm job.
sinfo – View information about nodes/partitions.
scontrol – Advanced control and configuration.
srun – Run a job on allocated resources.
To view information about any of these commands, you can access the manual page by logging into getafix and running:
man <command>
Running Jobs
Slurm supports two types of jobs: interactive and batch jobs. Interactive jobs are useful when you need to continuously supply your program with input from a user, such as when running an interactive simulation in Matlab with the graphical interface. Jobs that can be run without any user input can be submitted as batch jobs. After you submit a request for an interactive session or a batch job, your request is placed in the queue. Once adequate resources become available, your job will start running.
Different resources are grouped into different partitions. getafix has been set up so that users will typically not need to specify a partition. If you need a specific node or specific hardware, you can request it explicitly with the --constraint argument to salloc, srun and sbatch.
You can also include or exclude specific nodes using the --nodelist and --exclude arguments, although specifying a constraint is usually preferable.
If you really do need a particular partition, you can provide a comma-separated list of partitions with the --partition argument.
To view the partitions available to you, use:
sinfo
To view all partitions, use:
sinfo --all
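For example, the header of a batch script could combine these options as shown below. This is a sketch: the feature name avx512 and the node name node01 are hypothetical placeholders. Check the real feature names on getafix with sinfo -o "%N %f" before using them. When the script is run directly with bash, the #SBATCH lines are treated as plain comments, so the body also works outside Slurm.

```shell
#!/bin/bash
# Hypothetical values: replace "avx512" and "node01" with real names listed by
#   sinfo -o "%N %f"
#SBATCH --constraint=avx512      # only run on nodes advertising this feature
#SBATCH --exclude=node01         # never schedule this job on node01
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=0-0:10

# Record where the job ran, which is useful when checking that the constraint was honoured.
node=${SLURMD_NODENAME:-$(uname -n)}
echo "Job ${SLURM_JOB_ID:-<not under Slurm>} running on ${node}"
```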
Interactive Jobs
Running jobs interactively is not the most efficient use of cluster resources, or of your time if the cluster is busy. However, if you need to, you can launch interactive jobs with srun. For a single job, request the resources you need with the srun command:
srun --cpus-per-task=4 --mem=8G --pty bash
If you're running a series of jobs interactively, it's more efficient to first allocate the resources you need with salloc and then use srun to launch each job:
salloc --cpus-per-task=4 --mem=8G
srun --pty ...
srun --pty ...
The salloc command requests an allocation with 4 CPUs and 8 GB of memory in total. You then use srun to run commands on the allocated resources. Note that to release the reserved resources back to the queue, you need to leave the salloc shell with:
exit
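Inside an salloc or srun session, it can be useful to confirm what was actually granted. The function below is a small sketch (the name report_allocation is made up for this example). It prints the job ID and the CPUs per task that Slurm recorded. It also prints the number of CPUs the current process may use according to nproc, which respects any CPU affinity Slurm applies (this depends on the cluster configuration). SLURM_CPUS_PER_TASK is only set when --cpus-per-task was requested, hence the fallback text.

```shell
#!/bin/bash
# Hypothetical helper: summarise the allocation the current shell is running in.
report_allocation() {
  echo "job id:        ${SLURM_JOB_ID:-none (not inside a Slurm allocation)}"
  echo "cpus per task: ${SLURM_CPUS_PER_TASK:-not set}"
  echo "usable cpus:   $(nproc)"   # nproc counts the CPUs this process may use (respects affinity)
}
```

Usage, from the shell started by srun --pty bash: run report_allocation.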
Interactive Jobs - Graphical User Interface
To log onto the cluster to run a program via a graphical user interface you have two main choices:
1. Windows users can connect via RDP (Remote Desktop Protocol) using the built-in Windows program "Remote Desktop Connection". MacOSX users can install the "Microsoft Remote Desktop" app, e.g. via UQ's "Self Service" app; you do not need to buy the "Apple Remote Desktop" app from the App Store. Log in to either the getafix1.smp.uq.edu.au or the getafix2.smp.uq.edu.au front-end node, then run the "Terminal" program.
2. On Unix systems (e.g. Linux or MacOSX), you first need to enable X11 forwarding. If you log into getafix using ssh, add -X or -Y to your login command:
ssh -X user@getafix.smp.uq.edu.au
Either way, once logged in, start the interactive job as above, adding the arguments --x11 --pty to the srun command, e.g.:
srun --cpus-per-task=4 --mem=8G --x11 --pty matlab
Quitting the program will release your reserved resources back to the system. For more examples see Matlab or Mathematica.
NOTE: X11 forwarding on getafix requires an SSH public key. If you are authenticating by public key, you just need to enable agent forwarding (for example with ssh -A). Otherwise, you'll need to generate an SSH key pair on the cluster and add the public key to your authorized_keys file. To do that, run:
ssh-keygen
Use the default options and leave the passphrase blank. Then add the key to your authorized_keys file with:
ssh-copy-id localhost
Batch Jobs
Jobs should mostly be submitted using the sbatch command. The command requires a script describing the job to be run. The script can be provided as a separate file or on the command line (for example through standard input, or as a single command with the --wrap option). Once submitted, the job waits in the queue until resources become available.
As with salloc, options can be provided on the command line to specify the resources to request. Alternatively, options can be provided inside the script on lines beginning with #SBATCH, placed before any executable commands in the script.
The remainder of this subsection contains examples of common tasks.
A sample slurm script is available at including options for MPI.
Hello World
The following example script starts a single task with 1 CPU. The task writes Hello World to STDERR, generates a series of random numbers, and then writes the sorted random numbers to STDOUT.
The script explicitly specifies the number of tasks (1), the number of CPUs per task, the memory per CPU and the execution time. It is advisable to always specify the maximum memory and time your program will use: under certain circumstances Slurm may run jobs with smaller resource requirements sooner!
The script also explicitly specifies the output files for STDOUT and STDERR, naming them with the node name (%N) and the job ID (%j). If you do not specify these arguments, the default is to redirect STDERR to STDOUT and write both to a file named with the pattern slurm-%j.out.
@@
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=200       # memory (MB)
#SBATCH --time=0-2:00           # time (D-HH:MM)
#SBATCH -o slurm.%N.%j.out      # STDOUT
#SBATCH -e slurm.%N.%j.err      # STDERR

echo Hello World >&2

# Write 100000 random numbers (overwriting any previous run), then sort them numerically
for i in {1..100000}; do echo "$RANDOM"; done > SomeRandomNumbers.txt
sort -n SomeRandomNumbers.txt
@@
Assuming you saved the above script in a file named myscript.sh, you can submit your job with the command:
sbatch myscript.sh
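The --time values in these scripts use Slurm's days-hours:minutes form, so 0-2:00 means 0 days, 2 hours and 0 minutes, i.e. 120 minutes. Slurm also accepts other forms (minutes, minutes:seconds, hours:minutes:seconds, days-hours and days-hours:minutes:seconds). As a worked example, the sketch below converts only the D-HH:MM form used on this page into minutes. The function name is made up for illustration, and the function rejects any other format.

```shell
#!/bin/bash
# Hypothetical helper: convert a Slurm time limit in D-HH:MM form to minutes.
# Only the days-hours:minutes form is handled; other Slurm formats are rejected.
slurm_time_to_minutes() {
  local t=$1
  local re='^([0-9]+)-([0-9]{1,2}):([0-9]{1,2})$'
  if [[ $t =~ $re ]]; then
    # 10# forces base 10 so values such as "08" are not read as octal
    local days=$((10#${BASH_REMATCH[1]}))
    local hours=$((10#${BASH_REMATCH[2]}))
    local minutes=$((10#${BASH_REMATCH[3]}))
    echo $(( (days * 24 + hours) * 60 + minutes ))
  else
    echo "unsupported time format: $t" >&2
    return 1
  fi
}

slurm_time_to_minutes 0-2:00    # prints 120
```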
Submit an array of tasks
Slurm supports submitting job arrays. Job arrays can be useful when submitting multiple tasks that don't need to communicate with each other. Job arrays can be created with the --array option, either on the command line or as an additional line in the batch script (e.g. add #SBATCH --array=0-5 at the start of the script).
It is preferable to use job arrays rather than submitting individual jobs. This is primarily because arrays put less strain on the system, but also because of the convenience and additional control they offer when submitting jobs, including the ability to refer to a whole array of jobs by a single job ID.
The following is an example job script that runs R, a statistical tool, on 30 separate script files named rscript1.r, rscript2.r, rscript3.r, etc. The script includes the --job-name option, which lets you give jobs a text name to refer to them by; the default value is the name of the batch script. The output file names have also been modified to include the job ID of the master (primary) job (%A) and the job array ID (%a). Inside the script, the job ID and array ID can be referred to using the environment variables SLURM_ARRAY_JOB_ID and SLURM_ARRAY_TASK_ID.
@@
#!/bin/bash
#SBATCH --job-name=Rarray
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=4G        # memory per CPU
#SBATCH --time=0-2:00           # time (D-HH:MM)
#SBATCH -o Rarray_%A_%a.out     # Standard output
#SBATCH -e Rarray_%A_%a.err     # Standard error

module load R
module load r-modules

# Run rscriptN.r for this array task, writing R's output to rscriptN.out
R CMD BATCH --quiet --no-restore --no-save \
    rscript"${SLURM_ARRAY_TASK_ID}".r rscript"${SLURM_ARRAY_TASK_ID}".out
@@
Assuming this script is saved as Rarray.sh, the job can then be launched by running:
sbatch --array=1-30 Rarray.sh
If you later created an additional 5 files and wanted to rerun the last 5 existing files together with the 5 new ones, you could then run:
sbatch --array=26-35 Rarray.sh
If you have a lot of work to run (hundreds of array tasks), it is often a good idea to limit the number of simultaneously running tasks. You can do this with the % separator:
sbatch --array=1-1000%100 Rarray.sh
This limits Slurm to running at most 100 tasks at once.
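If your input files are not numbered consecutively, a common alternative is to list them, one per line, in a text file and let each array task pick the line matching its SLURM_ARRAY_TASK_ID. The sketch below assumes a hypothetical list file called inputs.txt and a made-up helper named nth_line. With --array=1-N, task k processes the k-th file (lines are counted from 1, so the array should start at 1).

```shell
#!/bin/bash
# Hypothetical helper: print line number $2 of file $1 (lines counted from 1).
nth_line() {
  local file=$1 n=$2 line
  if [[ ! $n =~ ^[1-9][0-9]*$ ]]; then
    echo "nth_line: line number must be a positive integer, got '$n'" >&2
    return 1
  fi
  line=$(sed -n "${n}p" "$file") || return 1
  if [[ -z $line ]]; then
    echo "nth_line: $file has no line $n (or it is empty)" >&2
    return 1
  fi
  printf '%s\n' "$line"
}

# Inside an array job script (inputs.txt is a hypothetical list of input files):
#   input=$(nth_line inputs.txt "$SLURM_ARRAY_TASK_ID") || exit 1
#   R CMD BATCH --quiet --no-restore --no-save "$input" "${input%.r}.out"
```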
More information about job arrays can be found here.
Run an MPI task across multiple nodes
This example runs an MPI job on 3 nodes with 10 CPUs per node. Each node can have multiple sockets, so we specify that we only want 1 socket per node. We also request 2 GB of memory per node.
@@
#!/bin/bash
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=10     # 10 MPI processes (one per CPU) on each node
#SBATCH --sockets-per-node=1
#SBATCH --cores-per-socket=10
#SBATCH --mem=2G                 # memory per node
#SBATCH --time=0-2:00

mpirun ./program
@@
This program uses mpirun to set up interprocess communication. If you set up your own interprocess communication (using ssh, for example), you should use srun from within your batch script to launch tasks within your job.
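To illustrate what srun does inside a batch script, the sketch below launches one copy of a small command per task. srun sets SLURM_PROCID to a different value (0, 1, 2, ...) in each copy. The function name launch_per_task is made up for this example. When it is run outside a Slurm job, it falls back to a single local copy so the script can be tried on its own.

```shell
#!/bin/bash
# Hypothetical helper: run a command once per Slurm task (via srun) when inside
# a job, or once locally otherwise.
launch_per_task() {
  if [[ -n ${SLURM_JOB_ID:-} ]] && command -v srun >/dev/null 2>&1; then
    srun "$@"
  else
    "$@"
  fi
}

# Each task reports its rank and the node it landed on
launch_per_task bash -c 'echo "task ${SLURM_PROCID:-0} of ${SLURM_NTASKS:-1} on $(uname -n)"'
```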
Run an OpenMP task on a single node
This example requests a single node with 10 CPUs to execute a single task. Instead of specifying the amount of memory per CPU, this example specifies the amount of memory required per node. For OpenMP to know how many threads are available, we need to set the appropriate environment variable (OMP_NUM_THREADS).
The following is a minimal C++ program to demonstrate the functionality.
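@@
#include <iostream>
#include <omp.h>

int main() {
    // Read the thread count outside the parallel region: omp_get_num_threads()
    // returns 1 there, while omp_get_max_threads() reflects OMP_NUM_THREADS.
    const int n = omp_get_max_threads();

    // Iterations may run on different threads; "ordered" prints them in sequence.
    #pragma omp parallel for ordered
    for (int i = 0; i < n; ++i) {
        #pragma omp ordered
        std::cout << "iteration " << i << " on thread " << omp_get_thread_num() << std::endl;
    }
    return 0;
}
@@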
Assuming this source code is saved in test.cpp, it can be compiled with:
@@
module load gnu
g++ test.cpp -fopenmp -o my_omp_program.exe
@@
Environment variables are not copied from your login shell by default. Instead, you will need to load any modules and define any variables your program needs inside your batch script. If you need to load/source a particular variable or script, you can use the -l flag to bash (i.e. #!/bin/bash -l, which runs the script as a login shell) or the --export flag to sbatch.
@@
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --cpus-per-task=10
#SBATCH --mem=2G                 # memory per node
#SBATCH --time=0-2:00

module load gnu
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
./my_omp_program.exe
@@
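SLURM_CPUS_PER_TASK is only set when --cpus-per-task is requested. If that line is dropped from the script, the export above leaves OMP_NUM_THREADS empty, and the OpenMP runtime then picks its own default (often one thread per core on the node). One defensive variant, sketched below with a made-up function name, falls back to a single thread in that case. This also makes the script safe to try outside Slurm.

```shell
#!/bin/bash
# Hypothetical helper: derive the OpenMP thread count from the Slurm allocation,
# defaulting to 1 thread when --cpus-per-task was not given (or outside Slurm).
set_omp_threads() {
  export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
  echo "OMP_NUM_THREADS=${OMP_NUM_THREADS}"
}

set_omp_threads
```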
Run multiple OpenMP tasks on multiple nodes
Jobs that involve a single task that occupies all resources can be invoked in the same way as a task on the command line. However, more complex jobs, such as jobs with multiple parts (job steps) and jobs with multiple tasks, need to use srun to launch each step of the job. srun takes a similar set of arguments to sbatch describing the resources to use for each job step.
The following is an example of the previous OpenMP program being run on two different nodes.
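@@
#!/bin/bash
#SBATCH --ntasks=2
#SBATCH --ntasks-per-node=1
#SBATCH --nodes=2
#SBATCH --cpus-per-task=10
#SBATCH --mem=2G                 # memory per node
#SBATCH --time=0-2:00

module load gnu
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
# Pass the CPU count to srun explicitly; some Slurm versions do not pass
# --cpus-per-task from sbatch on to srun
srun --cpus-per-task="${SLURM_CPUS_PER_TASK}" ./my_omp_program.exe
@@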
Nodes, tasks and cpus
If you're running multi-threaded jobs, multi-process jobs or a mixture of those, you need to specify precisely the CPU resources required.
A task is a process, so a multi-process job involves more than one task. By contrast, a multi-threaded job involves only a single task using more than one CPU. Tasks are requested with the flag --ntasks, whereas CPUs are requested with --cpus-per-task, so the total number of CPUs allocated is the number of tasks multiplied by the CPUs per task. If it doesn't matter how your job is spread across nodes, those parameters are sufficient. If you need to control the allocation further, you can use --nodes to specify how many nodes you want for your job and --ntasks-per-node to specify how many tasks should run on each node.
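As a worked example, the two-node OpenMP job above asks for 2 tasks × 10 CPUs per task = 20 CPUs. Because --mem is per node, it also asks for 2 × 2G = 4 GB of memory in total. The sketch below does the CPU arithmetic from the variables Slurm sets inside a job (SLURM_NTASKS and SLURM_CPUS_PER_TASK). The function name and its defaults of 1 are assumptions, chosen so that it also runs outside Slurm.

```shell
#!/bin/bash
# Hypothetical helper: total CPUs in the allocation = tasks x CPUs per task.
# Both values default to 1 when unset (e.g. outside Slurm, or when
# --ntasks / --cpus-per-task were not requested).
total_cpus() {
  local ntasks=${SLURM_NTASKS:-1}
  local cpus_per_task=${SLURM_CPUS_PER_TASK:-1}
  echo $(( ntasks * cpus_per_task ))
}

# The two-node OpenMP example: --ntasks=2 and --cpus-per-task=10
SLURM_NTASKS=2 SLURM_CPUS_PER_TASK=10 total_cpus    # prints 20
```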
For more information, see the Slurm guide or the man page:
man sbatch
Information on jobs
List all current jobs for a user:
squeue -u <username>
List all running jobs for a user:
squeue -u <username> -t RUNNING
List all jobs with the given name:
squeue -n <name>
List all pending jobs for a user:
squeue -u <username> -t PENDING
List priority order of jobs for the current user (you) in a given partition:
showq -o -u -q <partition>
List all current jobs in the smp partition for a user:
squeue -u <username> -p smp
List detailed information for a job (useful for troubleshooting):
scontrol show jobid -dd <jobid>
List status info for a currently running job:
sstat --format=AveCPU,AvePages,AveRSS,AveVMSize,JobID -j <jobid> --allsteps
Once your job has completed, you can get additional information that was not available during the run, such as run time and memory used.
To get statistics on completed jobs by jobID:
sacct -j <jobid> --format=JobID,JobName,MaxRSS,Elapsed
To view the same information for all jobs of a user:
sacct -u <username> --format=JobID,JobName,MaxRSS,Elapsed
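The MaxRSS value is a good guide when choosing --mem or --mem-per-cpu for the next run. The hypothetical function below is a sketch that reads sacct's pipe-delimited output (from --parsable2 --noheader) and prints the largest MaxRSS across the job's steps, in megabytes and rounded up. It assumes MaxRSS values carry a K, M, G or T suffix and skips lines without one.

```shell
#!/bin/bash
# Hypothetical helper: largest MaxRSS across a job's steps, in MB (rounded up).
# Reads "JobID|MaxRSS" lines such as those produced by
#   sacct -j <jobid> --format=JobID,MaxRSS --parsable2 --noheader
peak_rss_mb() {
  awk -F'|' '
    $2 == "" { next }                  # the job line itself usually has no MaxRSS
    {
      v = $2
      unit = substr(v, length(v))
      n = substr(v, 1, length(v) - 1) + 0
      if (unit == "K") mb = n / 1024
      else if (unit == "M") mb = n
      else if (unit == "G") mb = n * 1024
      else if (unit == "T") mb = n * 1024 * 1024
      else next                        # unrecognised format: skip the line
      if (!seen || mb > max) { max = mb; seen = 1 }
    }
    END {
      if (!seen) exit 1
      r = (max == int(max)) ? max : int(max) + 1
      printf "%d\n", r
    }
  '
}

# Usage: sacct -j <jobid> --format=JobID,MaxRSS --parsable2 --noheader | peak_rss_mb
```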
Controlling jobs
To cancel one job:
scancel <jobid>
To cancel one or more jobs by name (-n is short for --name; job names are not unique, so this can match several jobs):
scancel -n <jobname>
To cancel all the jobs for a user:
scancel -u <username>
To cancel all the pending jobs for a user:
scancel -t PENDING -u <username>
To hold a pending job (stop it from starting):
scontrol hold <jobid>
To release a held job so that it can start:
scontrol release <jobid>
To requeue (cancel and rerun) a particular job:
scontrol requeue <jobid>
A whole job array can be cancelled by cancelling the job name or job ID of the master allocation. If you only want to cancel part of a job array, you can specify the job ID and task ID:
scancel <jobid>_<index>
Advanced (but useful!) commands
The following commands work for individual jobs and for job arrays, and allow easy manipulation of large numbers of jobs. You can combine these commands with the parameters shown above to provide great flexibility and precision in job control. (Note that each of these commands is entered on one line.)
NOTE: Use caution when suspending and resuming jobs. Read the manual first; the behaviour might not be what you expect.
Suspend all of your running jobs (takes into account job arrays):
squeue -ho %A -u $USER -t R | xargs -n 1 scontrol suspend
Resume all suspended jobs for a user:
squeue -o "%.18A %.18t" -u <username> | awk '{if ($2 =="S"){print $1}}' | xargs -n 1 scontrol resume
After resuming, check whether any are still suspended:
squeue -ho %A -u $USER -t S | wc -l
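Because these one-liners act on every matching job at once, it can help to preview the job list first. The function below is a hypothetical helper, not part of Slurm. It applies an scontrol action to each job ID read from standard input. By default it only prints the commands it would run, and it calls scontrol for real only when DRY_RUN=0.

```shell
#!/bin/bash
# Hypothetical helper: apply "scontrol <action>" to each job ID on stdin.
# Dry run by default: set DRY_RUN=0 to actually call scontrol.
scontrol_each() {
  local action=$1 id
  while read -r id; do
    [[ -z $id ]] && continue           # ignore blank lines
    if [[ ${DRY_RUN:-1} == 0 ]]; then
      scontrol "$action" "$id"
    else
      echo "would run: scontrol $action $id"
    fi
  done
}

# Preview, then apply, suspending all of your own running jobs:
#   squeue -ho %A -u "$USER" -t R | scontrol_each suspend
#   squeue -ho %A -u "$USER" -t R | DRY_RUN=0 scontrol_each suspend
```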