GPUStack

gpustack Public

Performance-Optimized AI Inference on Your GPUs. Unlock it by selecting and tuning the optimal inference engine for your model.

Python 4.4k 452

runner Public

Collection of Dockerfiles to build images for various inference services across different accelerated backends.

Dockerfile 8 7

runtime Public

Provides a unified interface to detect GPU resources and manages GPU workloads.

Python 8 9

gguf-parser-go Public

Review/Check GGUF files and estimate the memory usage and maximum tokens per second.

vox-box Public

A text-to-speech and speech-to-text server compatible with the OpenAI API, supporting Whisper, FunASR, Bark, and CosyVoice backends.

Python 192 29

Provide feedback