Cerebras Code MCP Server v1.2.3

This MCP server lets you plan with Claude Code, Cline, or Cursor and make the actual code changes with Cerebras, which maximizes speed and intelligence while avoiding API limits. Use your preferred AI for planning and strategy, then leverage Cerebras for high-quality code generation.

It uses the Qwen 3 Coder model and can be embedded in IDEs like Claude Code and Cline, with beta support for Cursor.

✨ New in v1.2.3

Multiple API Keys with Rate Limiting: Support for parallel API keys with automatic rate limit tracking
Intelligent Request Routing: Automatically routes requests to available keys to avoid throttling
Cost Optimization: Prefers free tier keys when available, falls back to paid tier as needed

✨ Previous in v1.2.2

Project Restructure: Organized project into smaller, more manageable components for DX purposes
Stronger Instruction: Models now call the write tool more often
Claude Code - Enhanced Visual Diffs: Displays changes/edits in a pretty format
Hide User API Key: For security, doesn't display entered API keys in the terminal
Update Config Wizard for Messy Configs: Added a removal wizard that helps uninstall

1. Install the NPM Package

npm install -g cerebras-code-mcp

2. Get Cerebras API key

Visit cloud.cerebras.ai and create an API key

[OPTIONAL] Add OpenRouter as a backup in case you hit your Cerebras rate limits: visit OpenRouter and get a key to use as a fallback provider.

Set this key in your MCP settings under OPENROUTER_API_KEY. The server switches to it automatically whenever a call to Cerebras fails.
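
The fallback behavior can be pictured roughly as follows. This is a minimal sketch, not the server's actual code: `callWithFallback`, `callCerebras`, and `callOpenRouter` are hypothetical names, and the two provider calls are passed in as plain async functions so the example stays self-contained.

```javascript
// Hypothetical sketch: try Cerebras first, use OpenRouter only if a backup key is set.
async function callWithFallback(prompt, { callCerebras, callOpenRouter, env = process.env }) {
  try {
    return { provider: "cerebras", result: await callCerebras(prompt) };
  } catch (err) {
    // Without OPENROUTER_API_KEY there is nothing to fall back to,
    // so the original error is surfaced unchanged.
    if (!env.OPENROUTER_API_KEY) throw err;
    return { provider: "openrouter", result: await callOpenRouter(prompt) };
  }
}
```

Passing the provider functions as arguments is only a convenience for the sketch; it makes the fallback path easy to exercise without real network calls.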

3. Run the Setup Wizard for Claude Code / Cursor / Cline

cerebras-mcp --config

Use the setup wizard to configure the tool on your machine.

If you're using Cursor, it will ask you to copy and paste a prompt into your Cursor User Rules.

4. Removal/Cleanup (Optional)

cerebras-mcp --remove

Use the removal wizard to clean up configurations for any IDE or perform a complete cleanup.

5. Usage

The MCP tool will appear as write in your tool list. It supports:

Natural language prompts: Just describe what you want in plain English
Context files: Include multiple files as context for better code understanding
Visual diffs: See changes with Git-style diffs

Example usage:

Create a REST API with Express.js that handles user authentication

6. Multiple API Keys & Rate Limiting (Advanced)

The server now supports using multiple Cerebras API keys in parallel to avoid rate limit errors. This is especially useful when working with models that have restrictive limits like qwen-3-coder-480b.

Configuration

Rate limiting and multi-key support activate ONLY when both keys are configured:

# BOTH keys must be set to enable rate limiting
export CEREBRAS_FREE_KEY=your-free-key-here
export CEREBRAS_PAID_KEY=your-paid-key-here

# Routing strategy (optional, defaults to 'performance')
export ROUTING_STRATEGY=performance  # Options: 'performance', 'cost', 'balanced', 'roundrobin'

Backward Compatibility:

If you only set CEREBRAS_API_KEY: Original behavior (no rate limiting)
If you set both CEREBRAS_FREE_KEY and CEREBRAS_PAID_KEY: Rate limiting activated with automatic failover
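
The activation rule above amounts to a small decision on the environment variables. The sketch below illustrates it under assumptions: `resolveKeyMode` and the mode names are hypothetical, and the README does not say what happens when only one of the two tier keys is set, so the sketch simply reports that case as unconfigured.

```javascript
// Hypothetical helper: decide which mode the environment variables select.
function resolveKeyMode(env) {
  if (env.CEREBRAS_FREE_KEY && env.CEREBRAS_PAID_KEY) {
    // Both tier keys present: rate limiting and failover are active.
    return { mode: "multi-key", strategy: env.ROUTING_STRATEGY || "performance" };
  }
  if (env.CEREBRAS_API_KEY) {
    // Original behavior: one key, no rate limiting.
    return { mode: "single-key" };
  }
  // Not covered by the README; reported as-is in this sketch.
  return { mode: "unconfigured" };
}
```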

How It Works

Automatic Rate Tracking: The system tracks request counts per model per key across minute/hour/day windows
Intelligent Routing: Requests are automatically routed to available keys based on the selected strategy
Seamless Failover: When one key hits its limit, requests automatically shift to other available keys
Performance by Default: Default strategy prefers paid tier for larger context windows and higher limits
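
To make the rate tracking idea concrete, here is a minimal sketch of a per-key, per-model counter over minute, hour, and day windows. `RateTracker` and its methods are hypothetical names, the limit numbers are made up, and the server's real data structures may differ.

```javascript
// Window lengths in milliseconds.
const WINDOWS_MS = { minute: 60_000, hour: 3_600_000, day: 86_400_000 };

// Hypothetical tracker: stores request timestamps grouped by "key:model".
class RateTracker {
  constructor(limits) {
    this.limits = limits; // e.g. { minute: 10, hour: 100 } (made-up numbers)
    this.requests = new Map();
  }

  record(key, model, now = Date.now()) {
    const id = `${key}:${model}`;
    if (!this.requests.has(id)) this.requests.set(id, []);
    this.requests.get(id).push(now);
  }

  // A key is available when it is under its limit in every window.
  // Windows without a configured limit are treated as unlimited.
  isAvailable(key, model, now = Date.now()) {
    const stamps = this.requests.get(`${key}:${model}`) || [];
    return Object.entries(WINDOWS_MS).every(([name, span]) => {
      const used = stamps.filter((t) => now - t < span).length;
      return used < (this.limits[name] ?? Infinity);
    });
  }
}
```

A long-running process would also prune timestamps older than a day; that is left out here to keep the sketch short.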

Rate Limits

The system respects your Cerebras account's rate limits, which vary by:

Model: Different models have different limits
Tier: Paid tiers typically have higher limits
Subscription: Your specific plan determines exact numbers

To configure your specific limits, see Updating Rate Limits below.

Routing Strategies

performance (default): Uses paid tier first for its larger context window (2x on some models) and higher limits, falls back to free
cost: Uses free tier first to minimize costs, falls back to paid when needed
balanced: Distributes load based on available capacity
roundrobin: Alternates between available keys
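
The four strategies can be read as different orderings of the keys that still have capacity. The sketch below shows one plausible interpretation; `orderKeys`, the `tier` and `remaining` fields, and the `turn` counter are all hypothetical, and the server's actual selection logic may weigh other factors.

```javascript
// Hypothetical: order candidate keys for one request according to the strategy.
// Each key is assumed to look like { tier: "free" | "paid", remaining: number }.
function orderKeys(keys, strategy = "performance", turn = 0) {
  const usable = keys.filter((k) => k.remaining > 0); // filter returns a new array
  const tierFirst = (first) => (a, b) => Number(b.tier === first) - Number(a.tier === first);
  switch (strategy) {
    case "performance": // paid first, free as fallback
      return usable.sort(tierFirst("paid"));
    case "cost": // free first, paid as fallback
      return usable.sort(tierFirst("free"));
    case "balanced": // most remaining capacity first
      return usable.sort((a, b) => b.remaining - a.remaining);
    case "roundrobin": { // rotate the starting key on every turn
      if (usable.length === 0) return usable;
      const start = turn % usable.length;
      return [...usable.slice(start), ...usable.slice(0, start)];
    }
    default:
      throw new Error(`Unknown routing strategy: ${strategy}`);
  }
}
```

The caller would try the first key in the returned list and move down the list on failure.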

Example Scenario

With both free and paid keys configured (performance mode):

Requests start with paid tier (which often has a larger context window)
When paid tier hits rate limits → Automatically shifts to free tier
When limits reset → Returns to paid tier priority
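
The scenario can be traced with a tiny simulation. This is an illustration only: `simulate` is a hypothetical function, the limit of two requests per minute per tier is made up, and only a one-minute window is modeled.

```javascript
// Hypothetical simulation of paid-first routing with a per-minute limit on each tier.
function simulate(requestTimesMs, limits = { paid: 2, free: 2 }) {
  const sent = { paid: [], free: [] };
  return requestTimesMs.map((now) => {
    for (const tier of ["paid", "free"]) {
      const recent = sent[tier].filter((t) => now - t < 60_000);
      if (recent.length < limits[tier]) {
        sent[tier].push(now);
        return tier;
      }
    }
    return "rate-limited"; // both tiers are exhausted in this window
  });
}

console.log(simulate([0, 1_000, 2_000, 3_000, 4_000, 61_000]).join(", "));
// prints: paid, paid, free, free, rate-limited, paid
```

The fifth request shows that failover only helps while some key still has capacity; the last request returns to the paid tier once its window has reset.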

Benefits of paid-first strategy:

Larger context windows on paid tiers (e.g., 2x for some models)
Higher rate limits (varies by subscription)
Better performance for large codebases

Use ROUTING_STRATEGY=cost if you want to use the free tier first.

Either way, failover between keys minimizes rate limit errors, and the cost strategy maximizes free tier usage.

7. Updating Rate Limits

The system includes default rate limits based on Cerebras' free tier. If your account has different limits, you can update them using the configuration script.

How to Update Your Rate Limits

Copy your rate limits from the Cerebras dashboard

Run the update script:

node scripts/update-rate-limits.js --tier free  # or --tier paid

Paste the table, then type END on a new line when done

Example Usage

$ node scripts/update-rate-limits.js --tier free
📋 Paste the FREE tier rate limits table from Cerebras dashboard
   (Type "END" on a new line when done)

[Paste your rate limits table here]
END

✅ Successfully updated rate limits

The script will parse your pasted rate limits and update the configuration automatically.
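
The END-terminated paste step can be sketched as below. This covers only the input collection; `collectUntilEnd` is a hypothetical name, and how scripts/update-rate-limits.js actually parses the table columns is not shown in this README.

```javascript
// Hypothetical: collect pasted lines until a line containing only "END".
function collectUntilEnd(lines) {
  const table = [];
  for (const line of lines) {
    if (line.trim() === "END") return table;
    if (line.trim() !== "") table.push(line); // skip blank lines in the paste
  }
  return null; // END was never seen, so the input is incomplete
}
```

Returning null for a missing END lets the caller tell an incomplete paste apart from an empty table.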
