NOTE: We are still in beta testing. We DO NOT provide any stability guarantee. Check your email frequently for updates.
This repository serves as a knowledge base to help users get started in utilizing a GPU cluster.
The main use case of this cluster is running GPU training code, typically written in Python with dependencies managed by conda. We DO NOT provide graphical access; only shell access via SSH is available.
It is possible to use VS Code and PyCharm to access the cluster. Other options exist, but we cannot cover all of them, so please adapt the instructions as necessary.
We expect all users to be highly familiar with our Usage Guidelines; violations will be acted upon accordingly, including warnings and/or account bans.
We also highly recommend going through the Cluster Overview, as we have many customized features that may differ from other HPC systems you have used in the past.
Refer to other parts of our documentation as necessary.
To keep things manageable, we have split this guide into multiple files.
- Log in to the login node.
- Perform simple setup on the login node (create your conda env and install packages).
- Request GPU node(s) to debug/run your code.
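The three steps above can be sketched with the commands below. This is a minimal illustration assuming a Slurm-based cluster; the hostname, username, environment name, and resource flags are placeholders — check the Login Guide, Setup Conda, and Slurm Introduction pages for the actual values to use.

```shell
# 1. Log in to the login node (hostname/username are placeholders).
ssh your_username@cluster.example.edu

# 2. One-time setup on the login node: create a conda env and install packages.
conda create -n myproject python=3.10 -y
conda activate myproject
pip install numpy            # install whatever your training code needs

# 3. Request a GPU node for an interactive debugging session
#    (GPU count, time limit, and other flags depend on our Slurm setup).
srun --gres=gpu:1 --time=01:00:00 --pty bash
```

For longer training runs, you would typically submit a batch script with `sbatch` instead of holding an interactive `srun` session; see the Slurm Introduction for details.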
- What is a GPU cluster? How is it different from running code locally on my laptop/desktop? Linux & Cluster Basics
- I have used an HPC before. What's the tl;dr? Cluster Overview
- What are things that I should look out for? Usage Guidelines
- Having trouble logging in? Login Guide
- How do I access GPU node(s)? Slurm Introduction
- What GPUs do I have access to? Cluster Overview
- How do I manage my storage space? Storage Manager Usage
- How do I set up my environment? Setup Conda
- How do I run IDEs and debug? Debugging Guide
- I am encountering an error. Troubleshooting Guide
Written with <3 by the EEE Cluster Admins.