CCF (Core Computational Facility) @ UQ run by ITS / SMP
SMP dogmatix cluster
The SMP dogmatix cluster is accessed through the login node dogmatix.smp.uq.edu.au. The cluster merges the former obelix, asterix and ghost clusters with new hardware purchased in 2016 by SMP, as a means of continuing, consolidating and extending the existing computational resources.
- Dell FX2 blade chassis with FC430 compute nodes and FD332 disk nodes.
- The master node has been virtualised with VMware and runs the SLURM queueing system. An additional VM running as a login node has been installed to allow better utilisation of the existing resources.
- In total, 1372 CPU cores on 84 compute nodes with 13,376 GB of memory.
- Total of 68 x Nvidia GPUs
- Total of 10 x Intel Xeon Phi MICs
- Located in Prentice DC1 and Parnell chill-out room 7-127.
System status
You can see the CPU loads on the cluster via the webpage http://faculty-cluster.soe.uq.edu.au/ganglia/. This page can only be accessed from a computer within the UQ domain (or after starting a UQ VPN session).
Software
- Running Rocks 6.2 (Sidewinder)
- The usual GNU-based free software is installed.
- The default GNU compilers are version 4.4.7, but versions 4.9.4, 5.4.0 and 6.3.0 are available as modules.
- Intel Composer XE 2016 is installed (compiler v16.0.2 / MKL); see https://software.intel.com/en-us/intel-parallel-studio-xe for a comparison of Intel products. Due to the cost/benefit trade-off we are currently not paying the extra to license the Parallel edition for the Intel MPI library; however, Intel-compatible openmpi and mpich modules are installed.
- The CUDA compiler v7.5 is available as a module (module load cuda), with cuda-3, cuda-4 and cuda-6 modules also available for backward compatibility with existing code.
- matlab is R2017a (9.2.0.538062) 64-bit (glnxa64)
- mathematica is v11.0.1
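For example, to use one of the newer GNU compilers, load the corresponding module on the login node (a sketch only; the exact module names may differ, so check what module avail lists):

module avail
module load gcc/6.3.0
gcc --version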
System documentation
The dogmatix system documentation is found below, but see also the various howto pages, starting with the SLURM queue.
Connecting to dogmatix
The login host of the cluster is dogmatix.smp.uq.edu.au (alt. smp-login-0.smp.uq.edu.au). You should be able to log in to this via ssh with your UQ login name and password, once you have contacted ITS to get an account. If you're off campus, you can either (a) ssh in on port 2022, or (b) run a UQ VPN.
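For example, from a terminal on campus (replace uqname with your UQ login name):

ssh uqname@dogmatix.smp.uq.edu.au

and from off campus without a VPN, the same connection on port 2022:

ssh -p 2022 uqname@dogmatix.smp.uq.edu.au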
Slave node types
There are 91 compute/slave nodes in total in the dogmatix cluster. These are divided into 3 partitions:
- smp - the default partition, including the new nodes and the former obelix cluster nodes.
- asterix - all the nodes from the former asterix cluster, useful primarily for GPU jobs.
- ghost - nodes from the old ghost cluster, used exclusively for GPU jobs and restricted to staff and students in astrophysics.
Obelix had separate queues for different hardware, but this isn't needed on dogmatix. If you want your job to run on a high-memory node, specify the amount of memory you need with --mem=memory. If you want your job to run on a specific type of hardware, specify the hardware type as a constraint, as in the sketch below.
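For example (a sketch only; the feature name gpu is an assumption, as the actual constraint names defined on dogmatix may differ — sinfo -o "%P %f" shows the features each partition advertises):

sbatch --partition=smp --mem=64G myjob.sh
sbatch --partition=asterix --constraint=gpu myjob.sh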
Recommendations
All jobs should be submitted through the SLURM queueing system to ensure optimal use of resources.
Always specify a maximum run time (--time=[days-]hh:mm:ss). The default run time is unlimited, so that jobs are not killed prematurely, but the scheduler will favour jobs with a shorter maximum run time.
Always specify the memory you need (--mem=memory). The default memory allocation is small to ensure optimal use of resources, and your job will stall or fail if it requires more memory than requested. If you specify an appropriate memory limit, your job will likely run sooner, and it keeps the large-memory nodes free for jobs that really need them.
If you want your job to run on a specific type of hardware, specify it with --constraint=hardware. A script combining these recommendations is sketched below.
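A minimal submission script putting these options together (a sketch only: the resource values are placeholders, and the constraint name intel is an assumption — use the feature names actually defined on the nodes you want):

#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --partition=smp
#SBATCH --time=0-04:00:00     # maximum run time: 4 hours
#SBATCH --mem=4G              # memory needed for the job
#SBATCH --cpus-per-task=4     # e.g. for 4 OpenMP threads
##SBATCH --constraint=intel   # uncomment to pin to a hardware type

cd /data/uqname/myjob         # run from /data, not /home (see Storage below)
./my_program

Submit it with sbatch myjob.sh and monitor it with squeue -u uqname.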
Storage
It is important to remember that the files stored on the dogmatix cluster are NOT BACKED UP! This means that you need to keep a backup copy of any important data that you have on the cluster.
You have space to store your files in:
- /home/uqname/ - your home directory on the 2TB of storage attached to the master node. The default quota for the home directory is 40GB (soft) and 50GB (hard). Do not run calculations from this directory (always use /data/uqname/).
- /data/uqname/ - your data directory on the 16TB of storage attached to the master node. The default quota for the data directory is 400GB (soft) and 500GB (hard). (Purchased with obelix I.)
- /data1/uqname/ - your data directory on the 15TB of storage attached to the master node. (Older ACQAO kanuka disks.)
- /data2/uqname/ - your data directory on the 6.5TB of storage attached to the master node. (Purchased with obelix I.)
- /data3/uqname/, /data4/uqname/, /data5/uqname/, /data6/uqname/ - large 4 x 9.5TB data archives attached to dedicated file servers. Best to use these if performing heavy I/O disk operations.
Note that to get access to /data1/, /data2/, /data3/, /data4/, /data5/ or /data6/, you need to specifically request it from ITS and provide a brief justification.
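To keep an eye on your usage against these quotas, the standard tools work on the login node (replace uqname with your own login name; if quota reports nothing for a filesystem, du gives per-directory usage instead):

quota -s
du -sh /data/uqname
df -h /data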
Obelix and Asterix data
You can find your obelix and asterix files in these partitions on dogmatix:
- /data/username - the same as on obelix
- /data[2-6]/username - the same as on obelix
- /obelix-home/username - your home directory from obelix (read only)
- /asterix-home/username - your home directory from asterix (read only)
- /asterix-data/username - your data directory from asterix (read only)
To copy files out of a read-only area into your current space, for example:

cp /obelix-home/username/path path
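For a whole directory tree, a recursive copy into your data area is more useful (an illustrative sketch; myproject is a placeholder name):

cp -r /obelix-home/username/myproject /data/username/myproject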
This page was last updated 13 March 2019.