NOTE: We are still in beta testing. We DO NOT provide any stability guarantee. Check your email frequently for updates.
This repository serves as a knowledge base to help users get started in utilizing a GPU cluster.
The main use case of this cluster is running GPU training code, typically written in Python with dependencies managed by conda. We DO NOT provide graphical access; only shell access via SSH is available.
It is possible to use VS Code and PyCharm to access the cluster. Other options exist, but we cannot cover all of them, so please adapt the instructions as necessary.
We expect all users to be highly familiar with our Usage Guidelines; violations will be acted upon accordingly, including warnings and/or account bans.
We also highly recommend going through the Cluster Overview, as we have many customized features that may differ from other HPC systems you have used in the past.
Refer to other parts of our documentation as necessary.
To keep things manageable, we have split this guide into multiple files.
- Log in to the login node.
- Perform simple setup on the login node (create your conda env and install packages).
- Request GPU node(s) to debug/run your code.
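The three steps above can be sketched with the commands below. This is a minimal illustration assuming a Slurm-based cluster; the hostname, username, environment name, and resource flags are placeholders — check the Login Guide, Setup Conda, and Slurm Introduction pages for the actual values to use.

```shell
# 1. Log in to the login node (hostname/username are placeholders).
ssh your_username@cluster.example.edu

# 2. One-time setup on the login node: create a conda env and install packages.
conda create -n myproject python=3.10 -y
conda activate myproject
pip install numpy            # install whatever your training code needs

# 3. Request a GPU node for an interactive debugging session
#    (GPU count, time limit, and other flags depend on our Slurm setup).
srun --gres=gpu:1 --time=01:00:00 --pty bash
```

For longer training runs, you would typically submit a batch script with `sbatch` instead of holding an interactive `srun` session; see the Slurm Introduction for details.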
- What is a GPU cluster? How is it different from running code locally on my laptop/desktop? Linux & Cluster Basics
- I have used an HPC before. What's the tl;dr? Cluster Overview
- What are things that I should look out for? Usage Guidelines
- Having trouble logging in? Login Guide
- How do I access GPU node(s)? Slurm Introduction
- What GPUs do I have access to? Cluster Overview
- How do I manage my storage space? Storage Manager Usage
- How do I set up my environment? Setup Conda
- How do I run IDEs and debug? Debugging Guide
- I am encountering an error. Troubleshooting Guide
Written with <3 by the EEE Cluster Admins.