MCProbe

A conversational testing framework for MCP (Model Context Protocol) servers. MCProbe validates that MCP servers provide sufficient information for LLM agents to answer real-world questions correctly, using synthetic users and LLM judges.

Features

  • Scenario-Based Testing: Define test scenarios in YAML with synthetic user personas and evaluation criteria
  • Synthetic Users: Configurable user personas with different patience levels, expertise, and communication styles
  • Automated Evaluation: LLM-based judge evaluates correctness, tool usage, and efficiency
  • Multiple Agent Types: Test simple LLM agents or Gemini ADK agents with MCP tools
  • Prompt & Schema Tracking: Track agent prompts and MCP tool schemas across runs with automatic change detection
  • Trend Analysis: Track test performance over time and detect regressions
  • Flaky Detection: Identify inconsistent tests automatically
  • CI/CD Integration: pytest plugin, JUnit XML reports, GitHub Actions support
  • Scenario Generation: Auto-generate test scenarios from MCP server tool schemas

Quick Start

Installation

pip install mcprobe

For Gemini ADK agent support:

pip install mcprobe[adk]
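
On shells that expand square brackets (zsh, for example), quote the package spec:

pip install "mcprobe[adk]"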

Prerequisites

MCProbe requires an LLM provider for the synthetic user and judge. Choose one:

Option 1: Ollama (Local, Free)

# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Start Ollama and pull a model
ollama serve
ollama pull llama3.2
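
To confirm Ollama is reachable before running tests, its local API lists the models you have pulled:

curl http://localhost:11434/api/tags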

Option 2: OpenAI (Cloud)

# Set your API key
export OPENAI_API_KEY="sk-your-key-here"

The OpenAI provider also works with Azure OpenAI, vLLM, LiteLLM, and other OpenAI-compatible services.
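
As a rough sketch, pointing the OpenAI provider at a self-hosted OpenAI-compatible endpoint (vLLM in this example) could look like the following in mcprobe.yaml; whether the openai provider honors base_url the same way the ollama provider does, and the model name, are assumptions here:

llm:
  provider: openai
  model: llama-3.1-8b-instruct   # placeholder: whatever model your endpoint serves
  base_url: http://localhost:8000/v1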

Create a Scenario

Create test-scenario.yaml:

name: Weather Query Test
description: Test that the agent can answer weather questions

synthetic_user:
  persona: A user planning a weekend trip
  initial_query: What's the weather like in San Francisco this weekend?
  max_turns: 5

evaluation:
  correctness_criteria:
    - Agent provides weather information for San Francisco
    - Response includes temperature or conditions

Run the Test

With Ollama:

mcprobe run test-scenario.yaml

With OpenAI:

mcprobe run test-scenario.yaml --provider openai --model gpt-4

Using a Configuration File:

Create mcprobe.yaml:

llm:
  provider: ollama
  model: llama3.2
  base_url: http://localhost:11434

# Optional: Track MCP schema changes
mcp_server:
  command: "npx @modelcontextprotocol/server-weather"
  # OR for HTTP-based MCP server:
  # url: "http://localhost:8080/mcp"
  # OR with authentication:
  # url: "http://localhost:8080/mcp"
  # headers:
  #   Authorization: "Bearer ${API_TOKEN:-dev}"

Then run:

mcprobe run test-scenario.yaml

View Results

Each run ends with the judge's verdict, score, and reasoning:

Result: PASSED (score: 0.85)
Reasoning: The agent successfully provided weather information...

Documentation

CLI Commands

# Run scenarios
mcprobe run scenarios/              # Run all scenarios in directory
mcprobe run scenario.yaml -v        # Run with verbose output

# Use different providers
mcprobe run scenario.yaml --provider ollama --model llama3.2
mcprobe run scenario.yaml --provider openai --model gpt-4

# Use configuration file
mcprobe run scenario.yaml --config mcprobe.yaml
mcprobe run scenario.yaml -c mcprobe.yaml

# Generate scenarios from MCP server
mcprobe generate-scenarios --server "npx @example/weather-mcp" -o ./scenarios

# Generate reports
mcprobe report --format html --output report.html

# Analyze trends
mcprobe trends --window 20
mcprobe flaky --fail-on-flaky

# Validate scenarios
mcprobe validate scenarios/

# List available providers
mcprobe providers
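
For the GitHub Actions support mentioned in the features above, a minimal workflow sketch that reuses only the commands shown here might look like this (the Python version, scenario path, and secret name are placeholders):

# .github/workflows/mcprobe.yml
name: MCProbe
on: [push, pull_request]

jobs:
  mcprobe:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install mcprobe
      - run: mcprobe run scenarios/ --provider openai --model gpt-4
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}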

pytest Integration

MCProbe ships a pytest plugin, so scenario files can be collected and run as regular tests:

# Run scenarios as pytest tests
pytest scenarios/ -v

# Use config file
pytest scenarios/ --mcprobe-config mcprobe.yaml

# Override config with CLI options
pytest scenarios/ --mcprobe-provider openai --mcprobe-model gpt-4

# Save results for analysis
pytest scenarios/ --mcprobe-save-results

# Filter by tags
pytest scenarios/ -m smoke
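
Because scenarios run as ordinary pytest tests, pytest's own JUnit XML output also works for CI reporting:

pytest scenarios/ --junitxml=mcprobe-results.xml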

Example Scenario

name: Multi-City Weather Comparison
description: Test comparing weather across multiple cities

synthetic_user:
  persona: A business traveler deciding between meeting locations
  initial_query: Compare the weather in New York, Chicago, and Miami for next Tuesday
  max_turns: 8
  clarification_behavior:
    known_facts:
      - Meeting is scheduled for next Tuesday
      - Prefer outdoor lunch if weather permits
    traits:
      patience: medium
      verbosity: concise
      expertise: intermediate

evaluation:
  correctness_criteria:
    - Provides weather for all three cities
    - Includes temperature information
    - Makes a recommendation based on weather
  failure_criteria:
    - Provides weather for wrong cities
    - Gives conflicting information
  tool_usage:
    required_tools:
      - get_weather
  efficiency:
    max_tool_calls: 5

Development

# Clone and install
git clone https://github.com/Liquescent-Development/mcprobe.git
cd mcprobe
uv venv
source .venv/bin/activate
uv sync --all-extras

# Run tests
pytest tests/unit/ -v

# Lint and type check
ruff check src/
mypy src/

License

AGPL-3.0 License - see LICENSE for details.
