Simple and efficient implementation of the 671B DeepSeek V3, trainable with FSDP+EP on a minimum of 256x A100/H100 GPUs, targeted at the HuggingFace ecosystem
Updated Jan 15, 2026 - Python
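The 256-GPU floor can be sanity-checked with a rough memory estimate. The sketch below is a hypothetical back-of-the-envelope calculation, not code from the repository. It assumes mixed-precision AdamW at about 16 bytes per parameter (bf16 weights and gradients plus fp32 master weights and two fp32 Adam moments). It also assumes that FSDP+EP shards all of this state evenly across GPUs and that activations, buffers, and communication overhead are ignored.

```python
# Hypothetical estimate; the byte counts are assumptions, not the repo's actual layout.
PARAMS = 671e9   # total DeepSeek V3 parameters (from the description)
NUM_GPUS = 256   # minimum GPU count stated in the description

# Assumed bytes per parameter for mixed-precision AdamW training.
BYTES_PER_PARAM = {
    "bf16 weights": 2,
    "bf16 gradients": 2,
    "fp32 master weights": 4,
    "fp32 Adam first moment": 4,
    "fp32 Adam second moment": 4,
}


def per_gpu_state_gb(params: float, num_gpus: int) -> float:
    """Weights + gradients + optimizer state per GPU in GB (1e9 bytes),
    assuming the state is split evenly across all GPUs."""
    total_bytes = params * sum(BYTES_PER_PARAM.values())
    return total_bytes / num_gpus / 1e9


if __name__ == "__main__":
    for n in (128, NUM_GPUS):
        print(f"{n} GPUs: ~{per_gpu_state_gb(PARAMS, n):.1f} GB of sharded state per GPU")
```

Under these assumptions, 256 GPUs hold about 41.9 GB of state each, which leaves headroom for activations on 80 GB cards. With 128 GPUs, the figure rises to about 83.9 GB, which already exceeds 80 GB before activations are counted. This is consistent with the stated minimum but does not prove it.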