GPU Cluster Monitoring (GCM): Large-Scale AI Research Cluster Monitoring
-
Updated
Jan 29, 2026 - Python
GPU Cluster Monitoring (GCM): Large-Scale AI Research Cluster Monitoring
Clusterscope is a CLI and python library to extract information from HPC Clusters and Jobs.
Aegis is an LLM-powered AI cluster autonomous operations system, focused on intelligent capabilities such as Fault Diagnosis, Self-healing, Root Cause Analysis, Cluster Inspection, and Alert Optimization.
Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
Scalable Open Source AI cluster
AI OS for multi-node cluster systems powering Gideon this is just the demo coded in python, the rust powerd version is also an repo, we need an team of hardworking engineers and programmers who are able and wanting to help create this project and let it reach its full potential
Add a description, image, and links to the ai-cluster topic page so that developers can more easily learn about it.
To associate your repository with the ai-cluster topic, visit your repo's landing page and select "manage topics."