Howto - Run GPU Calculations (inc. <strike>Tensorflow</strike> Keras)
This is not a guide to why (or how) you should parallelise using GPUs; there are many good reasons to do so. Instead, this page is just a quick guide to running GPU calculations on <FONT FACE=Courier>getafix</FONT> and <FONT FACE=Courier>dogmatix</FONT>.
!!!! Specs and status of the <FONT FACE=Courier>getafix</FONT> GPUs
There are 9 NVIDIA Tesla V100s available via the gpu partition, spread across 3 nodes (i.e. 3 V100 GPUs per node).
Additionally, the getafix nodes smp-6-4, smp-6-5 and smp-6-6 each have 2 x Tesla K20m GPUs in the smp partition; each K20m GPU has 5 GB of memory.
If you want to use the new GPUs select partition gpu in your submission script:
@@
#SBATCH --partition gpu
@@
If you don't care about the type of GPU, you can select either partition with:
@@
#SBATCH --partition smp,gpu
@@
!!!! Specs and status of the <FONT FACE=Courier>dogmatix</FONT> GPUs
See the asterix? and ghost? pages for details about the GPUs in the partitions that are available via dogmatix.
!!!! CUDA Compiler
The CUDA compiler is nvcc. You can compile CUDA code on the login node, or on any of the GPU-equipped compute nodes from an interactive login.
The default CUDA version on dogmatix is 7.5, and CUDA jobs will need to be compiled against that version. Earlier versions are also available for compatibility with older code from asterix and ghost. You'll need to load the appropriate cuda module first (choose one):
@@
module load cuda      # default (CUDA 7.5)
module load cuda-3    # earlier versions, for older code
module load cuda-4
module load cuda-6
@@
nvcc has many compiler switches available: run nvcc -h or see NVIDIA's documentation.
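As a quick sanity check after loading a module, you can confirm which toolkit version nvcc reports. This is a stdlib-only sketch; the nvcc_release helper and its regular expression are ours, not part of any CUDA tooling.

```python
import re
import subprocess

def nvcc_release(text):
    """Extract the release number (e.g. "7.5") from `nvcc --version` output."""
    m = re.search(r"release\s+([\d.]+)", text)
    return m.group(1) if m else None

try:
    # `nvcc --version` prints a banner ending in e.g.
    # "Cuda compilation tools, release 7.5, V7.5.17"
    out = subprocess.check_output(["nvcc", "--version"]).decode()
    print("CUDA toolkit release:", nvcc_release(out))
except OSError:
    print("nvcc not on PATH; load a cuda module first.")
```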
!!!! Submitting jobs
Specify the number of GPUs required for your job with:
@@
#SBATCH --gres=gpu:n
@@
Example submission script:
@@
#!/bin/bash
#SBATCH --partition gpu
#SBATCH --gres=gpu:1
#SBATCH --mem=1G
#SBATCH --time=05:00:00
module load cuda
srun cudademo
@@
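Inside a running job, Slurm exposes the GPUs granted by --gres via the CUDA_VISIBLE_DEVICES environment variable. The helper below is an illustrative sketch of our own (not a Slurm API), assuming the usual comma-separated index format.

```python
import os

def allocated_gpus(env=None):
    """Return the GPU indices granted to this job, or [] if none.

    Slurm sets CUDA_VISIBLE_DEVICES to a comma-separated list of
    device indices when --gres=gpu:n is requested.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES", "")
    return [tok for tok in raw.split(",") if tok.strip()]

print("GPUs allocated to this job:", allocated_gpus() or "none")
```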
!!!! Using <strike>Tensorflow</strike> Keras on a GPU
The GPUs on getafix can be used to train neural networks. One interface for training networks is TensorFlow; however, it can be difficult to work with directly. An easier interface is Keras, which can run on top of TensorFlow on either a GPU or a CPU.
To get started with Keras on getafix, you will need Python and Keras installed. The easiest way is to use anaconda; Keras and TensorFlow have been set up in the anaconda2 module. To use Keras, simply create an anaconda environment:
@@
conda create -n tensorflow_gpuenv tensorflow-gpu cudatoolkit=9.2 keras=2.2.2
@@
Then, in your batch script, you will need to load cuda and activate the anaconda environment you just created:
@@
#!/bin/bash
#SBATCH --ntasks=1 # Run 1 task
#SBATCH --nodes=1 # Run the task on a single node
#SBATCH --gres=gpu:2 # Request both GPUs
#SBATCH --cpus-per-task=2 # Request 2 CPUs
#SBATCH --mem=10g
#SBATCH --time=0-01:00 # time (D-HH:MM)
module load compilers/cuda/9.2
module load compilers/anaconda2/5.2.0
. /opt/modules/Anaconda2/5.2.0/etc/profile.d/conda.sh
conda activate tensorflow_gpuenv
python your_job.py
@@
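For completeness, here is a minimal sketch of what your_job.py might contain. The network shape, data, and constants are all illustrative (not from any real workflow); Keras is imported inside a guard so the script degrades gracefully when run outside the tensorflow_gpuenv environment.

```python
import random

INPUT_DIM = 20      # features per sample (illustrative)
NUM_CLASSES = 4
BATCH_SIZE = 32

def make_data(n=1000, seed=0):
    """Random feature rows and integer labels, stdlib only."""
    rng = random.Random(seed)
    xs = [[rng.random() for _ in range(INPUT_DIM)] for _ in range(n)]
    ys = [rng.randrange(NUM_CLASSES) for _ in range(n)]
    return xs, ys

def one_hot(label, num_classes=NUM_CLASSES):
    """One-hot encode a single integer label."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

try:
    import numpy as np
    from keras.models import Sequential
    from keras.layers import Dense

    xs, ys = make_data()
    x = np.array(xs, dtype="float32")
    y = np.array([one_hot(label) for label in ys], dtype="float32")

    # A small two-layer classifier; with the TensorFlow backend,
    # Keras dispatches to the GPU automatically when one is visible.
    model = Sequential([
        Dense(64, activation="relu", input_shape=(INPUT_DIM,)),
        Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x, y, batch_size=BATCH_SIZE, epochs=2, verbose=2)
except ImportError:
    print("Keras/TensorFlow not installed; activate tensorflow_gpuenv first.")
```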
!!!! GPU Cluster Help
See the various webpages linked within this document; many CUDA resources and tutorials are also available online. In particular, the GPU Technology Conference talks can be streamed on demand, and these include tutorials for people new to CUDA as well as more advanced seminars. See http://www.gputechconf.com/gtcnew/on-demand-gtc.php.