GPU

Howto - Run GPU Calculations (inc. <strike>Tensorflow</strike> Keras)

This is not a why-to or how-to on parallelising calculations using GPUs; there are many good reasons to do that, but they are not covered here. Instead, this is just a quick guide to running GPU calculations on <FONT FACE=Courier>dogmatix</FONT>.

!!!! Specs and status of the <FONT FACE=Courier>getafix</FONT> GPUs

There are 9 Nvidia Tesla V100s available via the gpu partition, spread across 3 nodes, i.e. 3 V100 GPUs per node.

Additionally, the getafix nodes smp-6-4, smp-6-5 and smp-6-6 each have 2 x Tesla K20m GPUs in the smp partition. Each GPU has 5GB memory.

If you want to use the new GPUs, select the gpu partition in your submission script:

@@

   #SBATCH --partition gpu

@@

If you don't care which type of GPU you get, you can select either partition with: @@

   #SBATCH --partition smp,gpu

@@
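A minimal sketch of the top of such a submission script (the memory and time values here are placeholders, not recommendations from this page):

```shell
#!/bin/bash
#SBATCH --partition smp,gpu   # take whichever GPU type becomes free first
#SBATCH --gres=gpu:1          # request one GPU (see Submitting jobs below)
#SBATCH --mem=4G              # placeholder memory request
#SBATCH --time=01:00:00       # placeholder walltime
```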

!!!! Specs and status of the <FONT FACE=Courier>dogmatix</FONT> GPUs

See the asterix and ghost pages for details about the GPUs in those partitions, which are available via dogmatix.

!!!! CUDA Compiler

The CUDA compiler is nvcc. You can compile CUDA code on the login node (or on any of the compute nodes with GPUs from an interactive login).

The default CUDA version on dogmatix is 7.5, and CUDA jobs will need to be compiled to use that version; earlier versions are also available for compatibility with older code from asterix and ghost. You'll need to load the appropriate cuda module first: @@
module load cuda
module load cuda-3
module load cuda-4
module load cuda-6
@@

nvcc has many compiler switches available: run nvcc -h or see NVIDIA's documentation.
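As a sketch (the source file name and flags below are illustrative assumptions, not a recipe from this page), a typical compile might look like:

```shell
# Load the CUDA toolchain, then compile. -arch selects the target GPU
# architecture: sm_70 suits the Tesla V100s, sm_35 the older K20m cards.
# (saxpy.cu is a hypothetical source file.)
module load cuda
nvcc -O2 -arch=sm_70 -o saxpy saxpy.cu
```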

!!!! Submitting jobs

Specify the number of GPUs required for your job with: #SBATCH --gres=gpu:n

Example submission script:

@@
#!/bin/bash
#SBATCH --partition gpu
#SBATCH --gres=gpu:1
#SBATCH --mem=1G
#SBATCH --time=05:00:00
module load cuda
srun cudademo
@@
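Inside a running job, Slurm exports the allocated GPUs through the CUDA_VISIBLE_DEVICES environment variable. A small sketch (the variable is assigned by hand here purely so the snippet runs anywhere; in a real job Slurm sets it for you):

```shell
# In a real job Slurm sets this itself, e.g. "0,1" for --gres=gpu:2;
# we assign it by hand here only to illustrate the format.
CUDA_VISIBLE_DEVICES="0,1"
# Count the comma-separated device indices.
ngpus=$(echo "$CUDA_VISIBLE_DEVICES" | tr ',' '\n' | wc -l)
echo "GPUs visible to this job: $ngpus"
```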

!!!! Using <strike>Tensorflow</strike> Keras on a GPU

The GPUs on getafix can be used to train neural networks. One interface for training networks is TensorFlow; however, it can be a bit difficult to work with directly. An easier interface is Keras, which can run on top of TensorFlow on either a GPU or a CPU.

To get started with Keras on getafix, you will need Python and Keras installed. The easiest way is to use Anaconda: Keras and TensorFlow have been set up in the anaconda2 module. To use Keras, simply create an Anaconda environment: @@
conda create -n tensorflow_gpuenv tensorflow-gpu cudatoolkit=9.2 keras=2.2.2
@@
Then in your batch script you will need to load CUDA and activate the Anaconda environment you just created: @@
#!/bin/bash
#SBATCH --ntasks=1 # Run 1 task
#SBATCH --nodes=1 # Run the task on a single node
#SBATCH --gres=gpu:2 # Request both GPUs
#SBATCH --cpus-per-task=2 # Request 2 CPUs
#SBATCH --mem=10g
#SBATCH --time=0-01:00 # time (D-HH:MM)

module load compilers/cuda/9.2
module load compilers/anaconda2/5.2.0
. /opt/modules/Anaconda2/5.2.0/etc/profile.d/conda.sh
conda activate tensorflow_gpuenv

python your_job.py
@@
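Before launching a long training run, it can be worth confirming that TensorFlow actually sees the GPUs. A sketch, assuming the tensorflow_gpuenv environment created above is active (device_lib is the device-listing API of the TensorFlow 1.x generation matching the versions installed here):

```shell
# Run inside the job, after "conda activate tensorflow_gpuenv".
# With --gres=gpu:2 the printed list should include two GPU devices.
python -c "from tensorflow.python.client import device_lib; print([d.name for d in device_lib.list_local_devices()])"
```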
!!!! GPU Cluster Help

See the various webpages linked within this document; many CUDA resources and tutorials are also available on the internet. In particular, the GPU Technology Conference talks are available for streaming online, including tutorials for people new to CUDA as well as more advanced seminars. See http://www.gputechconf.com/gtcnew/on-demand-gtc.php.

Page last modified on January 17, 2021, at 08:17 PM