NOTE: We are constantly sourcing new hardware to support our users. For now, we DO NOT provide any stability guarantee. Check your email frequently for updates. We are sorry for the strong words in some parts of our guide, but please bear with us.
By logging in to our cluster, you confirm that you have fully read our guidelines and agree to our usage terms, including but not limited to our fairshare and queuing policy. Violating our Usage Guidelines, knowingly or not, will lead to account suspension and/or disciplinary action.
This repository serves as a knowledge base to help users get started with using a GPU cluster.
The main use case of this cluster is to run your GPU training code, presumably written in Python and managed by conda. We DO NOT provide graphical access; only shell access via SSH is available. Running anything unrelated to your study/research at NTU is considered an offense and can lead to disciplinary action and/or more.
It is possible to use VSCode and PyCharm to access the cluster. Other possibilities exist, but we cannot cover all of them.
We expect all users to be thoroughly familiar with our Usage Guidelines, and we will act on violations accordingly, including issuing warnings and/or account bans.
We also highly recommend going through the Cluster Overview, as we have many customized functions that may differ from those on other clusters you have used in the past.
Refer to other parts of our documentation as necessary.
To keep things manageable, we have split this guide into multiple files.
- Log in to the login node.
- Do a simple setup on the login node (create your Conda env and install packages).
- Request GPU node(s) to debug/run your code.
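
The three steps above can be sketched as a short shell script. This is a dry run under stated assumptions, not the cluster's official procedure: it only prints the commands instead of executing them. The username, hostname, and environment name are hypothetical placeholders, and the `srun` options are generic Slurm flags that may not match this cluster's customized setup (see the Login Guide, Setup Conda, and Slurm Introduction for the real values).

```shell
# Dry run: prints the commands for each step instead of executing them.
# All three values below are hypothetical placeholders, not real cluster settings.
CLUSTER_USER="your_username"
LOGIN_HOST="login-node.example"
ENV_NAME="my_env"

print_workflow() {
  cat <<EOF
# 1. Log in to the login node (run on your own machine)
ssh ${CLUSTER_USER}@${LOGIN_HOST}
# 2. Create and activate a Conda environment (run on the login node)
conda create --name ${ENV_NAME} python
conda activate ${ENV_NAME}
# 3. Request one GPU for an interactive session (generic Slurm flags, assumed)
srun --gres=gpu:1 --pty bash
EOF
}

print_workflow
```

Printing the commands first lets you substitute your own values and check each line before running anything on the login node.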
- I am super impatient. Quick Start
- I have used an HPC cluster before. What's the tl;dr? Cluster Overview
- What are things that I should look out for? Usage Guidelines
- How do I access more storage? Storage Manager Usage
- What is a GPU cluster? How is it different from running code locally on my laptop/desktop? Linux & Cluster Basics
- Having trouble logging in? Login Guide
- How do I run IDEs and debug? Debugging Guide
- How do I access GPU node(s)? Slurm Introduction
- How do I set up my environments? Setup Conda
- What GPUs do I have access to? Cluster Overview
- I am encountering an error. Troubleshooting Guide
Written with <3 by the EEE Cluster Admins.