CCF (Core Computational Facility) @ UQ run by ITS / SMP
Howto - Run GPU Calculations (inc. Tensorflow Keras)
This is not a guide to why or how to parallelise calculations using GPUs, although there are many good reasons to do so. Instead, this page gives a quick guide to running GPU calculations on getafix and dogmatix.
Specs and status of the getafix GPUs
There are 9 Nvidia Tesla V100s available via the gpu partition, spread over 3 nodes, i.e. 3 V100 GPUs per node.
Additionally, the getafix nodes smp-6-4, smp-6-5 and smp-6-6 in the smp partition each have 2 x Tesla K20m GPUs. Each K20m has 5GB of memory.
If you want to use the newer V100 GPUs, select the gpu partition in your submission script:
#SBATCH --partition gpu
If you don't care which type of GPU you get, you can select either partition with:
#SBATCH --partition smp,gpu
Specs and status of the dogmatix GPUs
See the asterix and ghost pages for details of the GPUs in those partitions, which are available via dogmatix.
CUDA Compiler
The CUDA compiler is nvcc. You can compile CUDA code on the login node, or on any of the compute nodes with GPUs from an interactive login. The default CUDA version on dogmatix is 7.5 and CUDA jobs will need to be compiled against that version; earlier versions are also available for compatibility with older code from asterix and ghost. You'll need to load the appropriate cuda module first:
module load cuda
module load cuda-3
module load cuda-4
module load cuda-6
nvcc has many compiler switches available: run nvcc -h, or see NVIDIA's documentation.
Submitting jobs
Specify the number of GPUs required for your job with:
#SBATCH --gres=gpu:n
Example submission script:
#!/bin/bash
#SBATCH --gres=gpu:1
#SBATCH --mem=1G
#SBATCH --time=05:00:00
module load cuda
srun cudademo
Using Tensorflow Keras on a GPU
The GPUs on getafix can be used to train neural networks. One interface for training networks is TensorFlow; however, it can be a bit difficult to work with directly. An easier interface is Keras, which can run on top of TensorFlow on either a GPU or a CPU.
To get started with Keras on getafix, you will need Python and Keras installed. The easiest way is to use anaconda. Keras and TensorFlow have been set up in the anaconda2 module. To use Keras, simply create an anaconda environment:
conda create -n tensorflow_gpuenv tensorflow-gpu cudatoolkit=9.2 keras=2.2.2
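Before submitting a long training job, it can be worth checking that TensorFlow can actually see a GPU. A minimal check, assuming the TF 1.x-era install created above (run it on a GPU node, e.g. from an interactive job, with the environment activated):

```python
# Quick sanity check that TensorFlow can see the GPU.
# Uses the TF 1.x device_lib API, matching the tensorflow-gpu /
# cudatoolkit=9.2 environment created above.
from tensorflow.python.client import device_lib

# Lists the CPU and GPU devices visible to TensorFlow; at least one
# entry with device_type 'GPU' should appear if the GPU is usable.
for device in device_lib.list_local_devices():
    print(device.name, device.device_type)
```

If no GPU device is listed, check that the job actually requested a GPU (--gres=gpu:1) and that the cuda module is loaded.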
Then, in your batch script, you will need to load CUDA and activate the anaconda environment you just created:
#!/bin/bash
#SBATCH --ntasks=1 # Run 1 task
#SBATCH --nodes=1 # Run the task on a single node
#SBATCH --gres=gpu:2 # Request 2 GPUs
#SBATCH --cpus-per-task=2 # Request 2 CPUs
#SBATCH --mem=10g
#SBATCH --time=0-01:00 # time (D-HH:MM)
module load compilers/cuda/9.2
module load compilers/anaconda2/5.2.0
. /opt/modules/Anaconda2/5.2.0/etc/profile.d/conda.sh
conda activate tensorflow_gpuenv
python your_job.py
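The your_job.py script is your own training code. As an illustrative sketch only (the model, data and hyperparameters below are made-up assumptions, not anything required by the cluster), a minimal training script using the Keras 2.2 API might look like:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Made-up random data standing in for a real dataset:
# 1000 samples with 20 features, 10 classes (one-hot encoded).
x_train = np.random.random((1000, 20))
y_train = np.eye(10)[np.random.randint(0, 10, size=1000)]

# A small fully connected classifier.
model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(20,)))
model.add(Dense(10, activation='softmax'))

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# With the tensorflow-gpu backend, Keras runs training on the GPU
# automatically; no GPU-specific code is needed in the script.
model.fit(x_train, y_train, epochs=5, batch_size=32)
model.save('model.h5')
```

Note that the script itself contains nothing GPU-specific: Keras picks up the tensorflow-gpu backend from the conda environment, so the same script runs on a CPU if no GPU is available.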
GPU Cluster Help
See the webpages linked within this document; many CUDA resources and tutorials are also available online. In particular, the GPU Technology Conference talks are available to stream online, including tutorials for people new to CUDA as well as more advanced seminars. See http://www.gputechconf.com/gtcnew/on-demand-gtc.php.
For specific dogmatix cluster administrative help (for example new accounts, quotas, etc.) contact ITS. For all other help, email the list (smp-hpc@lists.science.uq.edu.au).
This page last updated 13th March 2019 by Isaac Lenton.