Slurm

SLURM Queuing System

Resources on getafix are controlled using the Slurm Workload Manager. In order to run a job on getafix you will need to request resources. This page will give you a list of commonly used commands for requesting resources.

This page is meant as a quick start guide; for more information, consult the man (manual) pages for the various commands. If you are already familiar with other job scheduling tools such as LSF, PBS/Torque or SGE, the torque-slurm-sge page or this (pdf) comparison between Slurm and other scheduling tools might be useful.

General commands

Slurm is controlled through a number of command line tools. Most tasks can be performed using the following tools:

salloc – Allocate resources.
sbatch – Submit a batch job to the cluster.
squeue – View information about jobs in the queue.
scancel – Cancel or signal a Slurm job.
sinfo – View information about nodes/partitions
scontrol – Advanced control and configuration
srun – Run a job on allocated resources.

To view information about any of these commands, you can access the manual page by logging into getafix and running man <command>.

Running Jobs

Slurm supports two types of jobs: interactive and batch jobs. Interactive jobs are useful when you need to continuously supply your program with input from a user, such as when running an interactive simulation in Matlab with the graphical interface. Jobs that can be run without any user input can be submitted as batch jobs. After you submit a request for an interactive session or a batch job, your request is placed in the queue. Once adequate resources become available, your job starts running.

Different resources are grouped into different partitions. getafix has been set up so users will typically not need to specify a partition. If you need a specific node or type of hardware, you can request it explicitly with the --constraint argument to salloc, srun and sbatch. You can also include or exclude specific nodes using the --nodelist and --exclude arguments, although specifying a constraint is usually preferable.

If you really need to specify a specific partition, you can provide a comma separated list using the --partition argument. To view partitions available to the current user use

sinfo

or to view all partitions use

sinfo --all

Interactive Jobs

Running jobs interactively is not the most efficient use of cluster resources or of your time if the cluster is busy. However if you need to, you can launch interactive jobs with srun. For a single job you can request the resources you need with the srun command:

srun --cpus-per-task=4 --mem=8G --pty bash

If you're running a series of jobs interactively it's more efficient to allocate the resources you need first with salloc and then use srun to launch each job:

salloc --cpus-per-task=4 --mem=8G
srun --pty ...
srun --pty ...

will request an allocation with 4 CPUs and 8 GB of memory in total. You then use srun to run commands on the allocated resources. Note that to release the reserved resources back to the queue you need to

exit

Interactive Jobs - Graphical User Interface

To log onto the cluster to run a program via a graphical user interface you have two main choices:

1. On Windows, connect via RDP (Remote Desktop Protocol) using the built-in "Remote Desktop Connection" program. On macOS, install the "Microsoft Remote Desktop" app (e.g. via UQ's "Self Service" app; you do not need to buy the "Apple Remote Desktop" app from the App Store). Log in to either the getafix1.smp.uq.edu.au or getafix2.smp.uq.edu.au front-end node. You can then run the "Terminal" program.

2. On Unix systems (e.g. Linux or macOS), you first need to enable X11 forwarding. If you log in to getafix using ssh, you can do this by adding -X or -Y to your login command:

ssh -X user@getafix.smp.uq.edu.au

Either way, once logged in, start the interactive job as above adding arguments --x11 --pty to the srun command, eg.

srun --cpus-per-task=4 --mem=8G --x11 --pty matlab

Quitting the program will release your reserved resources back to the system. For more examples see Matlab or Mathematica.

NOTE: X11 forwarding on getafix requires an SSH public key. If you are authenticating by public key you just need to enable agent forwarding. Otherwise you'll need to generate an SSH key pair on the cluster and add the public key to your authorized_keys file. To do that run

ssh-keygen

Use the default options, and leave the password blank. Then add the key to your authorized_keys file, with

ssh-copy-id localhost

Batch Jobs

Jobs should mostly be submitted using the sbatch command. The command requires a script describing the job to be run; the script can be provided on the command line or as a separate file. Once the script is provided, the job is placed in the queue, where it waits until resources become available. As with salloc, options can be provided on the command line to specify the resources to request. Alternatively, options can be provided inside the script using lines preceded by #SBATCH before any executable commands in the script. The remainder of this subsection contains examples of common tasks.

A sample slurm script is available at including options for MPI.

Hello World

The following example script starts a single task with 1 CPU that writes Hello World to STDERR, generates a series of random numbers, and then writes the sorted random numbers to STDOUT.

The script explicitly specifies the number of tasks as 1, the number of CPUs per task, the memory per CPU and the execution time. It is advisable to always specify the maximum memory and time your program will use, since under certain circumstances Slurm may run jobs with smaller resource requirements sooner! The script also explicitly specifies the output files for STDOUT and STDERR as files with the node name (%N) and job allocation number (%j). If you do not specify these arguments, the default is to redirect STDERR to STDOUT and write both to a file named with the pattern slurm-%j.out.

@@#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=200   # memory (MB)
#SBATCH --time=0-2:00       # time (D-HH:MM)
#SBATCH -o slurm.%N.%j.out  # STDOUT
#SBATCH -e slurm.%N.%j.err  # STDERR

echo Hello World >&2

for i in {1..100000}; do echo $RANDOM >> SomeRandomNumbers.txt; done

sort SomeRandomNumbers.txt
@@

Assuming you saved the above script in a file named myscript.sh, you can submit your job with the command: sbatch myscript.sh
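If a wrapper script needs the ID of the job it has just submitted (for example to pass it to squeue or scancel later), sbatch's --parsable option prints only the job ID, optionally followed by ";" and a cluster name. The following is a minimal sketch under that assumption; submit_and_get_id is a hypothetical helper name, not part of Slurm, and myscript.sh is the file from the example above.

```shell
# submit_and_get_id is a hypothetical helper, not a Slurm command.
# It submits a batch script and prints only the job ID.
submit_and_get_id() {
    script="$1"
    # --parsable makes sbatch print "jobid" or "jobid;cluster"
    out=$(sbatch --parsable "$script") || return 1
    # Drop the optional ";cluster" suffix
    printf '%s\n' "${out%%;*}"
}

# Only submit when sbatch is available and the script exists
if command -v sbatch >/dev/null 2>&1 && [ -f myscript.sh ]; then
    jobid=$(submit_and_get_id myscript.sh) && echo "Submitted job ${jobid}"
fi
```

The guard at the end means the snippet does nothing on a machine without Slurm, which makes it safe to try out before using it on getafix.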

Submit an array of tasks

Slurm supports submitting job arrays. Job arrays can be useful when submitting multiple tasks that don't need to communicate with each other. Job arrays can be created by using the --array option either on the command line or as an additional line in the batch script (i.e. add #SBATCH --array=0-5 at the start of the script). It is preferable to use job arrays rather than submitting individual jobs, primarily because it puts less strain on the system but also for the convenience and additional control when submitting jobs and the ability to refer to an array of jobs by a single job id.

The following is an example job script that runs R, a statistical tool, on 30 separate script files named rscript1.r, rscript2.r, rscript3.r, etc. The script includes the --job-name option, which allows you to specify a text name to refer to jobs; the default value is the name of the batch script. The output file names have also been modified to include the job id of the master (primary) job (%A) and the job array index (%a). Inside the script, the job id and array index can be referred to using the environment variables SLURM_ARRAY_JOB_ID and SLURM_ARRAY_TASK_ID.

@@#!/bin/bash
#SBATCH --job-name=Rarray
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=4G    # memory per CPU (4 GB)
#SBATCH --time=0-2:00       # time (D-HH:MM)
#SBATCH -o Rarray_%A_%a.out # Standard output
#SBATCH -e Rarray_%A_%a.err # Standard error

module load R
module load r-modules

R CMD BATCH --quiet --no-restore --no-save \
    rscript"${SLURM_ARRAY_TASK_ID}".r rscript"${SLURM_ARRAY_TASK_ID}".out
@@

This job can then be launched by running sbatch --array=1-30 Rarray.sh

If you later created an additional 5 files and wanted to rerun the last 5 files together with the new 5 files, you could then run sbatch --array=26-35 Rarray.sh

If you have a lot of work to run (hundreds of array tasks), it is often a good idea to limit the number of simultaneously running tasks. You can do this using sbatch --array=1-1000%100 Rarray.sh

which will limit Slurm to only run 100 tasks at once.
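When the input files are not numbered consecutively, one common approach is to list them in a text file and let each array task pick the line matching its index. The sketch below assumes a file with one input path per line; input_for_task and inputs.txt are hypothetical names chosen for illustration.

```shell
# input_for_task is a hypothetical helper: it prints the line of the given
# list file whose (1-based) line number equals the array task index.
input_for_task() {
    list_file="$1"
    # Outside a job array SLURM_ARRAY_TASK_ID is unset, so default to 1
    task_id="${SLURM_ARRAY_TASK_ID:-1}"
    sed -n "${task_id}p" "$list_file"
}

# inputs.txt is an assumed file name; skip quietly if it does not exist
if [ -f inputs.txt ]; then
    echo "This task would process: $(input_for_task inputs.txt)"
fi
```

Submitted with --array=1-N, where N is the number of lines in the list, each task then handles exactly one entry.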

More information about job arrays can be found here.

Run an MPI task across multiple nodes

This example runs an MPI job on 3 nodes with 10 CPUs per node. Each node can have multiple sockets, so we specify we only want 1 socket per node. We also request 2 GB of memory per node.

@@#!/bin/bash
#SBATCH --ntasks-per-node=10   # 10 MPI ranks on each of the 3 nodes
#SBATCH --nodes=3
#SBATCH --sockets-per-node=1
#SBATCH --cores-per-socket=10
#SBATCH --mem=2G
#SBATCH --time=0-2:00

mpirun ./program
@@

This program uses mpirun to set up interprocess communication. If you set up your own interprocess communication (using ssh for example), you should use srun from within your batch script to launch tasks within your job.

Run an OpenMP task on a single node

This example requests a single node with 10 cpus to execute a single task. Instead of specifying the amount of memory per CPU, this example specifies the amount of memory required per node. For OpenMP to know how many threads are available we need to set the appropriate environment variable.

The following is a minimal C++ program to demonstrate the functionality.

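@@#include <iostream>
#include <omp.h>

int main(int argc, char** argv)
{
  // omp_get_max_threads() honours OMP_NUM_THREADS; omp_get_num_threads()
  // returns 1 when it is evaluated outside a parallel region.
  const int n = omp_get_max_threads();

  #pragma omp parallel for ordered
  for (int i = 0; i < n; ++i)
  {
    #pragma omp ordered
    std::cout << i << std::endl;
  }

  return 0;
}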
@@

Assuming this source code is saved in test.cpp, it can be compiled with

@@module load gnu
g++ test.cpp -fopenmp -o my_omp_program.exe
@@

Environment variables are not copied from your login shell by default; instead, you will need to load any modules and define any variables your program needs inside your batch script. If you need to load/source a particular variable/script you can use the -l flag to bash or the --export flag to sbatch.

@@#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --cpus-per-task=10
#SBATCH --mem=2G
#SBATCH --time=0-2:00

module load gnu

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
./my_omp_program.exe
@@
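Slurm only sets SLURM_CPUS_PER_TASK when --cpus-per-task is given, so a slightly more defensive variant of the export line falls back to a single thread. This is a sketch rather than a required change; set_omp_threads is a hypothetical helper name.

```shell
# set_omp_threads is a hypothetical helper, not part of Slurm or OpenMP.
# It exports OMP_NUM_THREADS from the CPUs Slurm granted to this task and
# falls back to 1 thread when SLURM_CPUS_PER_TASK is unset
# (for example when the script is tested outside of Slurm).
set_omp_threads() {
    OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
    export OMP_NUM_THREADS
}

set_omp_threads
echo "OpenMP will use ${OMP_NUM_THREADS} thread(s)"
```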

Run multiple OpenMP tasks on multiple nodes

Jobs that involve a single task that occupies all resources can be invoked in the same way as a task on the command line. However, more complex jobs, jobs with multiple parts (job steps), and jobs with multiple tasks need to use srun to launch each step of the job. srun takes a similar set of arguments to sbatch describing the resources to use for each job step. The following is an example of the previous OpenMP program being run on two different nodes.

@@#!/bin/bash
#SBATCH --ntasks=2
#SBATCH --ntasks-per-node=1
#SBATCH --nodes=2
#SBATCH --cpus-per-task=10
#SBATCH --mem=2G
#SBATCH --time=0-2:00

module load gnu

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
srun ./my_omp_program.exe
@@

Nodes, tasks and cpus

If you're running multi-threaded jobs, multi-process jobs or a mixture of those you need to specify precisely the CPU resources required.

A task is a process, so a multi-process job involves more than one task. By contrast, a multi-threaded job involves only a single task using more than one CPU. Tasks are requested with the flag --ntasks whereas CPUs are requested with --cpus-per-task. If it doesn't matter how your job is spread across nodes, those parameters are sufficient, but if you need to control the allocation further you can use --nodes to specify how many nodes you want for your job and --ntasks-per-node to specify how many tasks should run on each node. For more information see the slurm guide or the man page: man sbatch
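As a quick sanity check before submitting, the total number of CPUs a job requests is the number of tasks multiplied by the CPUs per task. The helper below is only an illustrative sketch; total_cpus is a hypothetical name.

```shell
# total_cpus is a hypothetical helper for checking a request by hand.
# $1 = value given to --ntasks, $2 = value given to --cpus-per-task
total_cpus() {
    echo $(( $1 * $2 ))
}

# The two-node OpenMP example: 2 tasks with 10 CPUs each
total_cpus 2 10   # prints 20
```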

Information on jobs

List all current jobs for a user: squeue -u <username>

List all running jobs for a user: squeue -u <username> -t RUNNING

List all jobs with the given name: squeue -n <name>

List all pending jobs for a user: squeue -u <username> -t PENDING

List priority order of jobs for the current user (you) in a given partition: showq -o -u -q <partition>

List all current jobs in the smp partition for a user: squeue -u <username> -p smp

List detailed information for a job (useful for troubleshooting): scontrol show jobid -dd <jobid>

List status info for a currently running job: sstat --format=AveCPU,AvePages,AveRSS,AveVMSize,JobID -j <jobid> --allsteps

Once your job has completed, you can get additional information that was not available during the run. This includes run time, memory used, etc.

To get statistics on completed jobs by jobID: sacct -j <jobid> --format=JobID,JobName,MaxRSS,Elapsed

To view the same information for all jobs of a user: sacct -u <username> --format=JobID,JobName,MaxRSS,Elapsed
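MaxRSS is usually printed with a unit suffix, which makes it awkward to compare against the memory you requested. The sketch below converts it to MB, assuming sacct is run with its --noheader and --parsable2 options so that fields are separated by "|", and that MaxRSS ends in K, M or G. maxrss_to_mb is a hypothetical helper, and the sample line is made up purely to show the format.

```shell
# maxrss_to_mb is a hypothetical helper, not part of Slurm.
# It reads lines of the form JobID|JobName|MaxRSS|Elapsed on stdin and
# prints the JobID with MaxRSS converted to MB.
# Assumption: MaxRSS ends in K, M or G; lines with an empty MaxRSS are skipped.
maxrss_to_mb() {
    awk -F'|' '
        $3 != "" {
            value = $3 + 0                  # numeric part, e.g. 204800 from "204800K"
            unit = substr($3, length($3))   # last character is the unit
            if (unit == "K") value = value / 1024
            else if (unit == "G") value = value * 1024
            printf "%s %.1f MB\n", $1, value
        }'
}

# Illustrative input only; on the cluster you would instead pipe in
#   sacct -j <jobid> --noheader --parsable2 --format=JobID,JobName,MaxRSS,Elapsed
printf '%s\n' '1234.batch|batch|204800K|00:01:40' | maxrss_to_mb
```

If a job used far less than it requested, lowering --mem or --mem-per-cpu on the next submission may help it start sooner.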

Controlling jobs

To cancel one job: scancel <jobid>

To cancel jobs with a certain job name (not unique): scancel -n <jobname>

To cancel all the jobs for a user: scancel -u <username>

To cancel all the pending jobs for a user: scancel -t PENDING -u <username>

To cancel one or more jobs by name: scancel --name myJobName

To hold a pending job so that it does not start: scontrol hold <jobid>

To release a held job: scontrol release <jobid>

To resume a suspended job: scontrol resume <jobid>

To requeue (cancel and rerun) a particular job: scontrol requeue <jobid>

A whole job array can be cancelled by cancelling the job name or job id of the master allocation. If you only want to cancel part of a job array you can specify the job id and task id:

scancel <jobid>_<index>
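To cancel a contiguous range of array tasks, the jobid_index form can be generated in a loop. The sketch below only prints the commands (a dry run) so they can be checked first; cancel_array_range is a hypothetical helper name and 1234 is a placeholder job id.

```shell
# cancel_array_range is a hypothetical helper, not a Slurm command.
# It prints one "scancel <jobid>_<index>" line per array index in the range.
# $1 = job id of the array, $2 = first index, $3 = last index
cancel_array_range() {
    jobid="$1"
    i="$2"
    last="$3"
    while [ "$i" -le "$last" ]; do
        echo "scancel ${jobid}_${i}"
        i=$((i + 1))
    done
}

# Dry run: show the commands for tasks 26 to 28 of job 1234
cancel_array_range 1234 26 28
```

Once the printed commands look right, piping the output to sh on the cluster would run them.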

Advanced (but useful!) commands

The following commands work for individual jobs and for job arrays, and allow easy manipulation of large numbers of jobs. You can combine these commands with the parameters shown above to provide great flexibility and precision in job control. (Note that all of these commands are entered on one line)

'''Use caution when suspending and resuming jobs. Read the manual first; the behaviour might not be what you expect.'''

Suspend all running jobs for a user (takes into account job arrays):
squeue -ho %A -u <username> -t R | xargs -n 1 scontrol suspend
Resume all suspended jobs for a user:
squeue -o "%.18A %.18t" -u <username> | awk '{if ($2 =="S"){print $1}}' | xargs -n 1 scontrol resume
After resuming, check if any are still suspended:
squeue -ho %A -u $USER -t S | wc -l


Page last modified on May 05, 2021, at 08:21 PM