# MCProbe

A conversational testing framework for MCP (Model Context Protocol) servers. MCProbe validates that MCP servers provide sufficient information for LLM agents to answer real-world questions correctly, using synthetic users and LLM judges.

## Features
- Scenario-Based Testing: Define test scenarios in YAML with synthetic user personas and evaluation criteria
- Synthetic Users: Configurable user personas with different patience levels, expertise, and communication styles
- Automated Evaluation: LLM-based judge evaluates correctness, tool usage, and efficiency
- Multiple Agent Types: Test simple LLM agents or Gemini ADK agents with MCP tools
- Prompt & Schema Tracking: Track agent prompts and MCP tool schemas across runs with automatic change detection
- Trend Analysis: Track test performance over time and detect regressions
- Flaky Detection: Identify inconsistent tests automatically
- CI/CD Integration: pytest plugin, JUnit XML reports, GitHub Actions support
- Scenario Generation: Auto-generate test scenarios from MCP server tool schemas
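Since the feature list mentions GitHub Actions support, CI can be as simple as installing the package and invoking the CLI commands shown later in this README. A minimal sketch of a workflow — the scenario directory, Python version, and secret name are assumptions for illustration, not part of MCProbe:

```yaml
# Hypothetical CI workflow; adjust paths, provider, and secrets to your setup.
name: mcprobe
on: [push, pull_request]
jobs:
  scenarios:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install mcprobe
      # Run every scenario in the repository's scenarios/ directory
      - run: mcprobe run scenarios/ --provider openai --model gpt-4
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```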
## Installation

```bash
pip install mcprobe
```

For Gemini ADK agent support:

```bash
pip install mcprobe[adk]
```

MCProbe requires an LLM provider for the synthetic user and judge. Choose one:
### Option 1: Ollama (Local, Free)

```bash
# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Start Ollama and pull a model
ollama serve
ollama pull llama3.2
```

### Option 2: OpenAI (Cloud)

```bash
# Set your API key
export OPENAI_API_KEY="sk-your-key-here"
```

The OpenAI provider also works with Azure OpenAI, vLLM, LiteLLM, and other OpenAI-compatible services.
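Because the OpenAI provider works with any OpenAI-compatible endpoint, you can point it at a self-hosted server. A sketch using the same `llm` configuration keys shown in the configuration-file section below — the port and model name are placeholders for whatever your server actually hosts:

```yaml
llm:
  provider: openai
  model: meta-llama/Llama-3.1-8B-Instruct  # example; use the model your server serves
  base_url: http://localhost:8000/v1       # e.g. a local vLLM endpoint
```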
## Quick Start

Create `test-scenario.yaml`:

```yaml
name: Weather Query Test
description: Test that the agent can answer weather questions

synthetic_user:
  persona: A user planning a weekend trip
  initial_query: What's the weather like in San Francisco this weekend?
  max_turns: 5

evaluation:
  correctness_criteria:
    - Agent provides weather information for San Francisco
    - Response includes temperature or conditions
```

Run it with Ollama:

```bash
mcprobe run test-scenario.yaml
```

Or with OpenAI:

```bash
mcprobe run test-scenario.yaml --provider openai --model gpt-4
```

### Using a Configuration File
Create `mcprobe.yaml`:

```yaml
llm:
  provider: ollama
  model: llama3.2
  base_url: http://localhost:11434

# Optional: Track MCP schema changes
mcp_server:
  command: "npx @modelcontextprotocol/server-weather"
  # OR for HTTP-based MCP server:
  # url: "http://localhost:8080/mcp"
  # OR with authentication:
  # url: "http://localhost:8080/mcp"
  # headers:
  #   Authorization: "Bearer ${API_TOKEN:-dev}"
```

Then run:

```bash
mcprobe run test-scenario.yaml
```

Example output:

```
Result: PASSED (score: 0.85)
Reasoning: The agent successfully provided weather information...
```
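The `Authorization: "Bearer ${API_TOKEN:-dev}"` header in the configuration above uses default-value substitution. Assuming MCProbe follows the usual POSIX `${VAR:-default}` semantics (an assumption; check MCProbe's docs for its exact expansion rules), the behavior matches what a shell does:

```shell
# POSIX ${VAR:-default}: use $API_TOKEN if set and non-empty, otherwise "dev"
unset API_TOKEN
echo "Bearer ${API_TOKEN:-dev}"    # prints: Bearer dev

API_TOKEN="secret123"
echo "Bearer ${API_TOKEN:-dev}"    # prints: Bearer secret123
```

This lets the same config file fall back to a development token locally while CI injects the real `API_TOKEN` from its secret store.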
## Documentation

- Getting Started
- Quickstart Guide
- Writing Scenarios
- CLI Reference
- pytest Integration
- Analysis & Reporting
## CLI Reference

```bash
# Run scenarios
mcprobe run scenarios/           # Run all scenarios in directory
mcprobe run scenario.yaml -v     # Run with verbose output

# Use different providers
mcprobe run scenario.yaml --provider ollama --model llama3.2
mcprobe run scenario.yaml --provider openai --model gpt-4

# Use configuration file
mcprobe run scenario.yaml --config mcprobe.yaml
mcprobe run scenario.yaml -c mcprobe.yaml

# Generate scenarios from MCP server
mcprobe generate-scenarios --server "npx @example/weather-mcp" -o ./scenarios

# Generate reports
mcprobe report --format html --output report.html

# Analyze trends
mcprobe trends --window 20
mcprobe flaky --fail-on-flaky

# Validate scenarios
mcprobe validate scenarios/

# List available providers
mcprobe providers
```

## pytest Integration

MCProbe includes a pytest plugin for seamless test integration:
```bash
# Run scenarios as pytest tests
pytest scenarios/ -v

# Use config file
pytest scenarios/ --mcprobe-config mcprobe.yaml

# Override config with CLI options
pytest scenarios/ --mcprobe-provider openai --mcprobe-model gpt-4

# Save results for analysis
pytest scenarios/ --mcprobe-save-results

# Filter by tags
pytest scenarios/ -m smoke
```

## Advanced Scenario Example

```yaml
name: Multi-City Weather Comparison
description: Test comparing weather across multiple cities

synthetic_user:
  persona: A business traveler deciding between meeting locations
  initial_query: Compare the weather in New York, Chicago, and Miami for next Tuesday
  max_turns: 8
  clarification_behavior:
    known_facts:
      - Meeting is scheduled for next Tuesday
      - Prefer outdoor lunch if weather permits
  traits:
    patience: medium
    verbosity: concise
    expertise: intermediate

evaluation:
  correctness_criteria:
    - Provides weather for all three cities
    - Includes temperature information
    - Makes a recommendation based on weather
  failure_criteria:
    - Provides weather for wrong cities
    - Gives conflicting information
  tool_usage:
    required_tools:
      - get_weather
  efficiency:
    max_tool_calls: 5
```

## Development

```bash
# Clone and install
git clone https://github.com/Liquescent-Development/mcprobe.git
cd mcprobe
uv venv
source .venv/bin/activate
uv sync --all-extras

# Run tests
pytest tests/unit/ -v

# Lint and type check
ruff check src/
mypy src/
```

## License

AGPL-3.0 License - see LICENSE for details.