# MCProbe

A conversational testing framework for MCP (Model Context Protocol) servers. MCProbe validates that MCP servers provide sufficient information for LLM agents to answer real-world questions correctly, using synthetic users and LLM judges.

## Features
- Scenario-Based Testing: Define test scenarios in YAML with synthetic user personas and evaluation criteria
- Synthetic Users: Configurable user personas with different patience levels, expertise, and communication styles
- Automated Evaluation: LLM-based judge evaluates correctness, tool usage, and efficiency
- Multiple Agent Types: Test simple LLM agents or Gemini ADK agents with MCP tools
- Prompt & Schema Tracking: Track agent prompts and MCP tool schemas across runs with automatic change detection
- Trend Analysis: Track test performance over time and detect regressions
- Flaky Detection: Identify inconsistent tests automatically
- CI/CD Integration: pytest plugin, JUnit XML reports, GitHub Actions support
- Scenario Generation: Auto-generate test scenarios from MCP server tool schemas
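Since the feature list mentions GitHub Actions support, CI can be as simple as installing the package and invoking the CLI commands shown later in this README. A minimal sketch of a workflow — the scenario directory, Python version, and secret name are assumptions for illustration, not part of MCProbe:

```yaml
# Hypothetical CI workflow; adjust paths, provider, and secrets to your setup.
name: mcprobe
on: [push, pull_request]
jobs:
  scenarios:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install mcprobe
      # Run every scenario in the repository's scenarios/ directory
      - run: mcprobe run scenarios/ --provider openai --model gpt-4
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```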
## Installation

```bash
pip install mcprobe
```

For Gemini ADK agent support:

```bash
pip install mcprobe[adk]
```

MCProbe requires an LLM provider for the synthetic user and judge. Choose one:
### Option 1: Ollama (Local, Free)

```bash
# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Start Ollama and pull a model
ollama serve
ollama pull llama3.2
```

### Option 2: OpenAI (Cloud)

```bash
# Set your API key
export OPENAI_API_KEY="sk-your-key-here"
```

The OpenAI provider also works with Azure OpenAI, vLLM, LiteLLM, and other OpenAI-compatible services.
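Because the OpenAI provider works with any OpenAI-compatible endpoint, you can point it at a self-hosted server. A sketch using the same `llm` configuration keys shown in the configuration-file section below — the port and model name are placeholders for whatever your server actually hosts:

```yaml
llm:
  provider: openai
  model: meta-llama/Llama-3.1-8B-Instruct  # example; use the model your server serves
  base_url: http://localhost:8000/v1       # e.g. a local vLLM endpoint
```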
## Quick Start

Create `test-scenario.yaml`:

```yaml
name: Weather Query Test
description: Test that the agent can answer weather questions

synthetic_user:
  persona: A user planning a weekend trip
  initial_query: What's the weather like in San Francisco this weekend?
  max_turns: 5

evaluation:
  correctness_criteria:
    - Agent provides weather information for San Francisco
    - Response includes temperature or conditions
```

Run it with Ollama:

```bash
mcprobe run test-scenario.yaml
```

Or with OpenAI:

```bash
mcprobe run test-scenario.yaml --provider openai --model gpt-4
```

### Using a Configuration File
Create `mcprobe.yaml`:

```yaml
llm:
  provider: ollama
  model: llama3.2
  base_url: http://localhost:11434

# Optional: Track MCP schema changes
mcp_server:
  command: "npx @modelcontextprotocol/server-weather"
  # OR for HTTP-based MCP server:
  # url: "http://localhost:8080/mcp"
  # OR with authentication:
  # url: "http://localhost:8080/mcp"
  # headers:
  #   Authorization: "Bearer ${API_TOKEN:-dev}"
```

Then run:

```bash
mcprobe run test-scenario.yaml
```

Example output:

```
Result: PASSED (score: 0.85)
Reasoning: The agent successfully provided weather information...
```
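The `Authorization: "Bearer ${API_TOKEN:-dev}"` header in the configuration above uses default-value substitution. Assuming MCProbe follows the usual POSIX `${VAR:-default}` semantics (an assumption; check MCProbe's docs for its exact expansion rules), the behavior matches what a shell does:

```shell
# POSIX ${VAR:-default}: use $API_TOKEN if set and non-empty, otherwise "dev"
unset API_TOKEN
echo "Bearer ${API_TOKEN:-dev}"    # prints: Bearer dev

API_TOKEN="secret123"
echo "Bearer ${API_TOKEN:-dev}"    # prints: Bearer secret123
```

This lets the same config file fall back to a development token locally while CI injects the real `API_TOKEN` from its secret store.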
## Documentation

- Getting Started
- Quickstart Guide
- Writing Scenarios
- CLI Reference
- pytest Integration
- Analysis & Reporting
## CLI Reference

```bash
# Run scenarios
mcprobe run scenarios/           # Run all scenarios in directory
mcprobe run scenario.yaml -v     # Run with verbose output

# Use different providers
mcprobe run scenario.yaml --provider ollama --model llama3.2
mcprobe run scenario.yaml --provider openai --model gpt-4

# Use configuration file
mcprobe run scenario.yaml --config mcprobe.yaml
mcprobe run scenario.yaml -c mcprobe.yaml

# Generate scenarios from MCP server
mcprobe generate-scenarios --server "npx @example/weather-mcp" -o ./scenarios

# Generate reports
mcprobe report --format html --output report.html

# Analyze trends
mcprobe trends --window 20
mcprobe flaky --fail-on-flaky

# Validate scenarios
mcprobe validate scenarios/

# List available providers
mcprobe providers
```

## pytest Integration

MCProbe includes a pytest plugin for seamless test integration:
```bash
# Run scenarios as pytest tests
pytest scenarios/ -v

# Use config file
pytest scenarios/ --mcprobe-config mcprobe.yaml

# Override config with CLI options
pytest scenarios/ --mcprobe-provider openai --mcprobe-model gpt-4

# Save results for analysis
pytest scenarios/ --mcprobe-save-results

# Filter by tags
pytest scenarios/ -m smoke
```

## Advanced Scenario Example

```yaml
name: Multi-City Weather Comparison
description: Test comparing weather across multiple cities

synthetic_user:
  persona: A business traveler deciding between meeting locations
  initial_query: Compare the weather in New York, Chicago, and Miami for next Tuesday
  max_turns: 8
  clarification_behavior:
    known_facts:
      - Meeting is scheduled for next Tuesday
      - Prefer outdoor lunch if weather permits
  traits:
    patience: medium
    verbosity: concise
    expertise: intermediate

evaluation:
  correctness_criteria:
    - Provides weather for all three cities
    - Includes temperature information
    - Makes a recommendation based on weather
  failure_criteria:
    - Provides weather for wrong cities
    - Gives conflicting information
  tool_usage:
    required_tools:
      - get_weather
  efficiency:
    max_tool_calls: 5
```

## Development

```bash
# Clone and install
git clone https://github.com/Liquescent-Development/mcprobe.git
cd mcprobe
uv venv
source .venv/bin/activate
uv sync --all-extras

# Run tests
pytest tests/unit/ -v

# Lint and type check
ruff check src/
mypy src/
```

## License

AGPL-3.0 License - see LICENSE for details.