This MCP server is designed for planning with Claude Code, Cline, or Cursor and making changes with Cerebras to maximize speed and intelligence while avoiding API limits. Use your preferred AI for planning and strategy, then leverage Cerebras for high-quality code generation.
It will use the Qwen 3 Coder model, and can be embedded in IDEs like Claude Code and Cline, with beta support for Cursor.
- Multiple API Keys with Rate Limiting: Support for parallel API keys with automatic rate limit tracking
- Intelligent Request Routing: Automatically routes requests to available keys to avoid throttling
- Cost Optimization: Prefers free tier keys when available, falls back to paid tier as needed
- Project Restructure: Organized project into smaller, more manageable components for DX purposes
- Stronger Instruction: Improved
writeusage count among models - Claude Code - Enhanced Visual Diffs: Displays changes/edits in a pretty format
- Hide User API Key: For security, doesn't display entered API keys in the terminal
- Update Config Wizard for Messy Configs: Added a removal wizard that helps uninstall
npm install -g cerebras-code-mcpVisit cloud.cerebras.ai and create an API key
[OPTIONAL] Add OpenRouter as a backup in case you hit your Cerebras rate limits Visit OpenRouter and get a key to use as a fallback provider.
You can set this key in your MCP settings under OPENROUTER_API_KEY, and it will trigger automatically if anything goes wrong with calling Cerebras.
cerebras-mcp --configUse the setup wizard to configure the tool on your machine.
If you're using Cursor, it will ask you to copy and paste a prompt into your Cursor User Rules.
cerebras-mcp --removeUse the removal wizard to clean up configurations for any IDE or perform a complete cleanup.
The MCP tool will appear as write in your tool list. It supports:
- Natural language prompts: Just describe what you want in plain English
- Context files: Include multiple files as context for better code understanding
- Visual diffs: See changes with Git-style diffs
Example usage:
Create a REST API with Express.js that handles user authentication
The server now supports using multiple Cerebras API keys in parallel to avoid rate limit errors. This is especially useful when working with models that have restrictive limits like qwen-3-coder-480b.
Rate limiting and multi-key support activates ONLY when both keys are configured:
# BOTH keys must be set to enable rate limiting
export CEREBRAS_FREE_KEY=your-free-key-here
export CEREBRAS_PAID_KEY=your-paid-key-here
# Routing strategy (optional, defaults to 'performance')
export ROUTING_STRATEGY=performance # Options: 'performance', 'cost', 'balanced', 'roundrobin'Backward Compatibility:
- If you only set
CEREBRAS_API_KEY: Original behavior (no rate limiting) - If you set both
CEREBRAS_FREE_KEYandCEREBRAS_PAID_KEY: Rate limiting activated with automatic failover
- Automatic Rate Tracking: The system tracks request counts per model per key across minute/hour/day windows
- Intelligent Routing: Requests are automatically routed to available keys based on the selected strategy
- Seamless Failover: When one key hits its limit, requests automatically shift to other available keys
- Performance by Default: Default strategy prefers paid tier for larger context windows and higher limits
The system respects your Cerebras account's rate limits, which vary by:
- Model: Different models have different limits
- Tier: Paid tiers typically have higher limits
- Subscription: Your specific plan determines exact numbers
To configure your specific limits, see Updating Rate Limits below.
performance(default): Uses paid tier first for 2x context window and higher limits, falls back to freecost: Uses free tier first to minimize costs, falls back to paid when neededbalanced: Distributes load based on available capacityroundrobin: Alternates between available keys
With both free and paid keys configured (performance mode):
- Requests start with paid tier (often has larger context window)
- When paid tier hits rate limits → Automatically shifts to free tier
- When limits reset → Returns to paid tier priority
Benefits of paid-first strategy:
- Larger context windows on paid tiers (e.g., 2x for some models)
- Higher rate limits (varies by subscription)
- Better performance for large codebases
Use ROUTING_STRATEGY=cost if you want to maximize free tier usage first.
This ensures you never hit rate limit errors while maximizing free tier usage!
The system includes default rate limits based on Cerebras' free tier. If your account has different limits, you can update them using the configuration script.
- Copy your rate limits from the Cerebras dashboard
- Run the update script:
node scripts/update-rate-limits.js --tier free # or --tier paid - Paste the table and type
ENDwhen done
$ node scripts/update-rate-limits.js --tier free
📋 Paste the FREE tier rate limits table from Cerebras dashboard
(Type "END" on a new line when done)
[Paste your rate limits table here]
END
✅ Successfully updated rate limitsThe script will parse your pasted rate limits and update the configuration automatically.