# AI-powered code review with deterministic gates
LLMs are expensive. Don't waste tokens on code that fails linting or is too large to review meaningfully.
This agent implements a gate-first architecture:
- **Deterministic checks first** - size limits and linting catch obvious issues for free
- **AI reasoning second** - only well-formed PRs reach the LLM
- **Model selection** - small PRs use Haiku (cheap, fast); large PRs use Sonnet (thorough)
The result: lower costs, faster feedback, and better signal-to-noise in reviews.
```
┌──────────┐    ┌────────────┐    ┌────────────┐    ┌──────────────┐
│    PR    │───▶│ Size Gate  │───▶│ Lint Gate  │───▶│ Model Select │
└──────────┘    └────────────┘    └────────────┘    └──────────────┘
                      │                 │                  │
                      ▼                 ▼                  ▼
                   REJECT            REJECT         ┌────────────┐
                (>500 lines)     (lint errors)      │ LLM Review │
                                                    └────────────┘
                                                           │
                             ┌─────────────────────────────┴───┐
                             ▼                                 ▼
                     ┌──────────────┐                 ┌───────────────┐
                     │ Post Comment │                 │    Log to     │
                     │  to GitHub   │                 │   Supabase    │
                     └──────────────┘                 └───────────────┘
```
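Read as code, the diagram is a short-circuiting pipeline: each gate either rejects the PR outright or passes it along, and tokens are only spent at the final stage. A minimal sketch of that control flow, using the default thresholds; the function and return values are illustrative, not the repo's actual API:

```python
def review_pr(lines_changed: int, lint_errors: int) -> str:
    """Cheap deterministic gates run before any LLM call."""
    if lines_changed > 500:                 # size gate: free to evaluate
        return "REJECT: over the 500-line limit; split the PR"
    if lint_errors > 10:                    # lint gate: cheap and deterministic
        return "REJECT: fix lint errors before requesting review"
    # Only well-formed PRs reach this point; pick a model by diff size.
    model = ("claude-haiku-4-5-20251001" if lines_changed < 50
             else "claude-sonnet-4-20250514")
    return f"LLM review with {model}"
```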
```bash
# Clone the repo
git clone https://github.com/YOUR_USERNAME/code-review-agent.git
cd code-review-agent

# Install with uv
uv sync
```

Set the required environment variables:

```bash
export GITHUB_TOKEN="ghp_..."
export ANTHROPIC_API_KEY="sk-ant-..."

# Optional: for metrics logging
export SUPABASE_URL="https://xxx.supabase.co"
export SUPABASE_KEY="eyJ..."
```

Run a review:

```bash
uv run pr-review-agent --repo owner/repo --pr 123

# To post a comment to the PR:
uv run pr-review-agent --repo owner/repo --pr 123 --post-comment
```

Create `.ai-review.yaml` in your repo root:
```yaml
version: 1

# Size limits - PRs exceeding these are rejected
limits:
  max_lines_changed: 500
  max_files_changed: 20

# Files to exclude from linting and review
ignore:
  - "*.lock"
  - "*.json"
  - "*.md"
  - "package-lock.json"

# Linting configuration
linting:
  enabled: true
  tool: ruff
  fail_on_error: true
  fail_threshold: 10  # Max errors before rejection

# LLM configuration
llm:
  provider: anthropic
  default_model: claude-sonnet-4-20250514  # For larger PRs
  simple_model: claude-haiku-4-5-20251001  # For small PRs (<50 lines)
  simple_threshold_lines: 50
  max_tokens: 4096

# Confidence thresholds for auto-approve/escalate
confidence:
  high: 0.8  # Above this = can auto-approve
  low: 0.5   # Below this = escalate to human

# What the LLM should focus on
review_focus:
  - logic_errors
  - security_issues
  - missing_tests
  - code_patterns
  - naming_conventions
```

| Section | Key | Default | Description |
|---|---|---|---|
| `limits` | `max_lines_changed` | 500 | Max lines (added + removed) |
| `limits` | `max_files_changed` | 20 | Max files in PR |
| `linting` | `enabled` | true | Run linting gate |
| `linting` | `tool` | ruff | Linter to use |
| `linting` | `fail_threshold` | 10 | Max lint errors |
| `llm` | `simple_threshold_lines` | 50 | Lines below this use Haiku |
| `confidence` | `high` | 0.8 | Auto-approve threshold |
| `confidence` | `low` | 0.5 | Escalation threshold |
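Missing keys fall back to the defaults above. A minimal sketch of that merge, assuming PyYAML; the function and structure are illustrative, not the repo's actual config.py:

```python
import yaml  # PyYAML

DEFAULTS = {
    "limits": {"max_lines_changed": 500, "max_files_changed": 20},
    "linting": {"enabled": True, "tool": "ruff", "fail_threshold": 10},
    "llm": {"simple_threshold_lines": 50, "max_tokens": 4096},
    "confidence": {"high": 0.8, "low": 0.5},
}

def load_config(path: str = ".ai-review.yaml") -> dict:
    """Overlay user settings on the defaults, one section at a time."""
    try:
        with open(path) as f:
            user = yaml.safe_load(f) or {}
    except FileNotFoundError:
        user = {}  # no config file: run entirely on defaults
    return {
        section: {**defaults, **(user.get(section) or {})}
        for section, defaults in DEFAULTS.items()
    }
```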
**Size gate.** Rejects PRs that exceed configurable limits:
- Default: 500 lines changed, 20 files
- Large PRs are hard to review well, so the gate nudges authors toward smaller, focused changes
- Rejected PRs get a comment suggesting they split the PR (see the sketch below)
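At its core the gate is a pair of threshold checks; a sketch with illustrative names:

```python
def check_size(lines_added: int, lines_removed: int, files_changed: int,
               max_lines: int = 500, max_files: int = 20) -> tuple[bool, str]:
    """Return (passed, message). Lines count as added + removed."""
    total = lines_added + lines_removed
    if total > max_lines:
        return False, (f"PR changes {total} lines (limit {max_lines}). "
                       "Consider splitting it into smaller, focused PRs.")
    if files_changed > max_files:
        return False, f"PR touches {files_changed} files (limit {max_files})."
    return True, "size gate passed"
```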
**Lint gate.** Runs Ruff on changed Python files:
- Catches syntax errors, unused imports, and formatting issues
- Configurable error threshold (default: 10 errors = rejection)
- Why lint before the LLM? It saves tokens on obviously broken code (sketch below)
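A minimal version of the gate, assuming Ruff's JSON output format (`ruff check --output-format json` emits a JSON array of diagnostics); the function name is illustrative:

```python
import json
import subprocess

def check_lint(changed_files: list[str], threshold: int = 10) -> tuple[bool, int]:
    """Run Ruff on the changed Python files and count diagnostics."""
    py_files = [f for f in changed_files if f.endswith(".py")]
    if not py_files:
        return True, 0  # nothing to lint
    result = subprocess.run(
        ["ruff", "check", "--output-format", "json", *py_files],
        capture_output=True, text=True,
    )
    errors = json.loads(result.stdout or "[]")
    return len(errors) <= threshold, len(errors)
```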
**Model selection.** Picks the right model for the job, as sketched below:
- Haiku for small PRs (<50 lines): fast, cheap, good enough
- Sonnet for larger PRs: more thorough analysis
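The selection itself is a one-line threshold check; a sketch using the model IDs and threshold from the config above:

```python
def select_model(lines_changed: int, threshold: int = 50) -> str:
    """Small diffs go to Haiku; anything bigger gets Sonnet."""
    if lines_changed < threshold:
        return "claude-haiku-4-5-20251001"  # fast and cheap
    return "claude-sonnet-4-20250514"       # more thorough
```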
**LLM review.** Claude analyzes the diff (see the call sketch below), looking for:
- Logic errors and bugs
- Security vulnerabilities
- Missing test coverage
- Code pattern issues
- Naming convention violations
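A minimal sketch of the call using the `anthropic` Python SDK; the prompt wording is illustrative, not the repo's actual prompt in llm_reviewer.py:

```python
from anthropic import Anthropic

FOCUS = ["logic_errors", "security_issues", "missing_tests",
         "code_patterns", "naming_conventions"]

def llm_review(diff: str, model: str, max_tokens: int = 4096) -> str:
    """Send the diff to Claude with the configured review focus."""
    client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    prompt = (
        "Review this pull request diff. Focus on: "
        + ", ".join(FOCUS)
        + ". Report concrete issues with file and line references.\n\n"
        + diff
    )
    message = client.messages.create(
        model=model,
        max_tokens=max_tokens,
        messages=[{"role": "user", "content": prompt}],
    )
    return message.content[0].text
```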
**Confidence scoring.** Each review gets a confidence score (0.0-1.0), routed as sketched below:
- High (>0.8): AI is confident in its assessment
- Medium (0.5-0.8): some uncertainty, human review recommended
- Low (<0.5): complex code, definitely needs human review
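The routing is a straight mapping from the thresholds in `.ai-review.yaml`; a sketch:

```python
def route_review(confidence: float, high: float = 0.8, low: float = 0.5) -> str:
    """Map a review's confidence score to a follow-up action."""
    if confidence > high:
        return "auto-approve eligible"
    if confidence < low:
        return "escalate to human reviewer"
    return "human review recommended"
```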
**Outputs.** Results go to three places:
- Console: formatted results for local runs
- GitHub comment: a Markdown comment on the PR (sketch below)
- Supabase: metrics logged for the dashboard
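For the GitHub path, PR conversation comments are created through GitHub's issues API. A sketch using `requests` directly; the repo's github_client.py may wrap this differently:

```python
import os
import requests

def post_pr_comment(repo: str, pr_number: int, body: str) -> None:
    """Post a Markdown comment to a PR ('owner/name' repo format)."""
    resp = requests.post(
        f"https://api.github.com/repos/{repo}/issues/{pr_number}/comments",
        headers={
            "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
            "Accept": "application/vnd.github+json",
        },
        json={"body": body},
        timeout=30,
    )
    resp.raise_for_status()
```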
The dashboard shows:
- Total reviews and cost over time
- Gate effectiveness (how many tokens saved)
- Average confidence scores
- Recent review history
To deploy the dashboard:
- Go to vercel.com and import the repo from GitHub
- Set the root directory to `dashboard`
- Add environment variables: `NEXT_PUBLIC_SUPABASE_URL` and `NEXT_PUBLIC_SUPABASE_ANON_KEY`
- Deploy
Copy `.github/workflows/pr-review.yml` to your repo:

```yaml
name: AI PR Review

on:
  pull_request:
    types: [opened, synchronize, reopened]

permissions:
  contents: read
  pull-requests: write

jobs:
  ai-review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - uses: astral-sh/setup-uv@v4
      - run: uv sync
      - name: Run PR Review Agent
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          SUPABASE_URL: ${{ secrets.SUPABASE_URL }}
          SUPABASE_KEY: ${{ secrets.SUPABASE_KEY }}
        run: |
          uv run pr-review-agent \
            --repo ${{ github.repository }} \
            --pr ${{ github.event.pull_request.number }} \
            --post-comment
```

In your repo settings (Settings > Secrets and variables > Actions), add:
| Secret | Description |
|---|---|
| `ANTHROPIC_API_KEY` | Your Anthropic API key |
| `SUPABASE_URL` | Supabase project URL (optional) |
| `SUPABASE_KEY` | Supabase anon key (optional) |
Note: `GITHUB_TOKEN` is automatically provided by GitHub Actions.
```bash
# Install dev dependencies
uv sync

# Run tests
uv run pytest

# Run linting
uv run ruff check .

# Run an extended lint rule set
uv run ruff check . --select=E,F,I,UP,B,SIM
```

```
.
├── src/pr_review_agent/
│   ├── main.py                # CLI entrypoint
│   ├── config.py              # Configuration loading
│   ├── github_client.py       # GitHub API client
│   ├── gates/
│   │   ├── size_gate.py       # Size limit checking
│   │   └── lint_gate.py       # Ruff linting
│   ├── review/
│   │   ├── llm_reviewer.py    # Claude API integration
│   │   ├── model_selector.py  # Haiku vs Sonnet
│   │   └── confidence.py      # Confidence scoring
│   ├── output/
│   │   ├── console.py         # Terminal output
│   │   └── github_comment.py  # PR comment formatting
│   └── metrics/
│       └── supabase_logger.py # Metrics logging
├── tests/                     # Test suite
├── dashboard/                 # Next.js metrics dashboard
├── database/
│   └── schema.sql             # Supabase schema
└── .github/workflows/
    └── pr-review.yml          # GitHub Action
```
```bash
# All tests
uv run pytest

# With coverage
uv run pytest --cov=pr_review_agent

# Specific test file
uv run pytest tests/test_size_gate.py -v
```

## License

MIT