Skip to content

synechron-code/text2sql-mcp

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 

Repository files navigation

Synechron Text2SQL MCP Server

Synechron's Text2SQL MCP Server is a Model Context Protocol (MCP) server. It provides MCP clients like Cursor, VSCode, and Claude Desktop, with natural language access to relational databases of various types – enabling users to ask questions of their data using natural language capabilities.

The server optimizes the quality of the results it generates by using a combination of schema and row-aware semantic search. Schemas and a sample of rows are indexed, enabling the server to more effectively generate suitable queries.

Features and Capabilities

The Text2SQL MCP server provides the following capabilities:

  • Natural Language to SQL: Natural language questions are converted into optimized SQL queries using advanced language models.
  • Multi-Database Support: Compatibility with PostgreSQL, MySQL, SQLite and Microsoft Fabric databases.
  • Retrieval Augmented Generation (RAG): Enhanced query generation by combining schema metadata and sampled row content.
  • Table Indexing: Enables embedding for database schemas and sample data, improving query relevance.
  • Response Synthesis: Generates natural language summaries of query results.
  • Markdown Formatting: Creates more readable responses.
  • Multiple Model Support: Works with OpenAI, Azure OpenAI, AWS Bedrock, and Ollama models.

Sample Data

The server ships with two example datasets.

  • Cyber Threats
  • ESG Ratings

Sample Client

A sample REPL is also included.


Quick Start with Docker

Note: To use your own databases, further configuration is required, see the Configuration section below.

Running the Server with Docker

docker run -it -p8000:8000 \
  -v ~/.aws:/home/appuser/.aws:ro \
  -e MODEL_API_TYPE=Bedrock \
  -e AWS_PROFILE=AWSAdministratorAccess-880502554482 \
  -e FASTMCP_HOST=0.0.0.0 \
  709825985650.dkr.ecr.us-east-1.amazonaws.com/synechron/mcp-text2sql:0.0.10-2025.07.04-rc3 server

Running the Sample Client with Docker

docker run -it \
    -v ~/.aws:/home/appuser/.aws:ro \
    -e AWS_PROFILE=AWSAdministratorAccess-880502554482 \
    -e BEDROCK_MODEL_ID=us.anthropic.claude-3-7-sonnet-20250219-v1:0 \
    -e BEDROCK_API_VERSION=2025-01-01-preview \
    -e MCP_SERVER_HOST=host.docker.internal \
     709825985650.dkr.ecr.us-east-1.amazonaws.com/synechron/mcp-text2sql:0.0.10-2025.07.04-rc3 client

Example Client Usage

Connected to server with tools: ['Text2Sql_Cyber_Threats', 'Text2Sql_ESG_Ratings']

MCP Client Started!
Type your queries or 'quit' to exit.

Query: What kind of data is in the Cyber Threat datasource?

The Cyber Threat datasource contains comprehensive information about cybersecurity incidents with the following data fields:

1. country - The country where the cyber attack took place
2. year - The year when the cyber attack occurred
3. attack_type - The type of cyber attack (e.g., Phishing, Ransomware, DDoS, Man-in-the-Middle, SQL Injection)
4. target_industry - The industry that was targeted (e.g., Education, Retail, IT, Telecommunications, Healthcare, Government, Banking)
5. financial_loss_(in_million_$) - The financial impact of the attack in millions of dollars
6. number_of_affected_users - How many users were affected by the attack
7. attack_source - Where the attack originated from (e.g., Hacker Group, Nation-state, Insider, Unknown)
8. security_vulnerability_type - The type of security vulnerability exploited (e.g., Unpatched Software, Weak Passwords, Social Engineering)
9. defense_mechanism_used - What defense was in place (e.g., VPN, Firewall, Antivirus, AI-based Detection)
10. incident_resolution_time_(in_hours) - How long it took to resolve the incident in hours

The database tracks cybersecurity incidents across different countries, industries, and years, including details about attack methods, financial impact, affected users, vulnerabilities exploited, and resolution times.

Using the Server with an Existing MCP Client

The server should be run with Docker. Use either of the following configurations to add it to a MCP client:

{
  "servers": {
    "text2sql": {
      "type": "streamable-http",
      "url": "http://localhost:8000/mcp"
    }
  }
}
{
  "mcpServers": {
    "text2sql": {
      "command": "docker",
      "args": [
        "run",
        "-i",
        "--rm",
        "--env-file",
        ".env",
        "709825985650.dkr.ecr.us-east-1.amazonaws.com/synechron/mcp-text2sql:0.0.10-2025.07.04-rc3",
        "server"
      ]
    }
  }
}

Configuration

The Text2SQL MCP server can be configured with a combination of environment variables and JSON configuration.

Server Configuration

Valves Config Environment Variable Default Description
ENABLED False Enable tool
ENABLE_RAG True Use AI to synthesize a response using data collected.
ENABLE_MARKDOWN False Format response in Markdown
ENABLE_PANDAS True Use pandas to execute and parse SQL query, otherwise use llama_index
ENABLE_TABLE_INDEXING True Use table retriever to create embeddings for database schema and sample data to optimize context for LLM
FORCE_REINDEXING False Force tool to regenerate vector indexes for tables and sample data
DATABASE_CONFIG DB_CONFIG Default db config JSON formatted tool configuration with database connection string, tables, context, and prompts
IGNORE_SCHEMA Default system tables Comma separated list of schemas to ignore
CREDENTIAL_SERVICE CREDENTIAL_SERVICE Select authentication type (Azure, AWS, Google, None)
MODEL_API_TYPE / MODEL_TYPE MODEL_TYPE OpenAI Select model API type (OpenAI, Azure OpenAI, Bedrock, Ollama)
API_ENDPOINT OPENAI_ENDPOINT http://litellm-proxy:4000 OpenAI API endpoint
API_ID API_ACCESS_ID Access Key ID
API_KEY OPENAI_API_KEY OpenAI API key
API_REGION AWS_REGION us-east-1 Region for cloud-hosted models
API_VERSION 2025-01-01-preview OpenAI API version
LLM_MODEL_NAME gpt-4.1-mini Text-to-SQL model
EMBED_MODEL_NAME text-embedding-3-small Embedding model
MAX_RETRY 3 Maximum number of retries for failing queries
MAX_RESULTS 200 Maximum number of rows in a response
MAX_DATAFRAME 1000 Maximum number of rows in a dataframe sent to LLM to synthesize a response
SAMPLE_RATIO 0.0001 Percentage of dataset to index. Higher values will take longer to index but give better results.
SAMPLE_MAX_SIZE 50 Maximum number of rows to sample. Supersedes SAMPLE_RATIO
DEBUG False Debug mode
MODE MODE stdio Mode in which the MCP server should operate. Use shttp for Streamable HTTP.
AUTH_ENABLED AUTH_ENABLED false Whether or not to require MCP clients to be authenticated.
AUTH_ISSUER AUTH_ISSUER Required if AUTH_ENABLED is true.
AUTH_JWKS_URI AUTH_JWKS_URI Required if AUTH_ENABLED is true.
AWS_ACCESS_KEY_ID AWS_ACCESS_KEY_ID Required if using AWS Bedrock for embeddings
AWS_SECRET_ACCESS_KEY AWS_SECRET_ACCESS_KEY Required if using AWS Bedrock for embeddings
LOG_LEVEL LOG_LEVEL INFO Logging level (DEBUG, INFO, WARNING, ERROR).
MCP_SERVER_API_KEY MCP_SERVER_API_KEY API Key for access to LLM / Embeddings. Configure with AWS_SECRET_ACCESS_KEY, LiteLLM key, or OpenAI API key.
MCP_SERVER_DATA MCP_SERVER_DATA data Directory for temporary data files
TEXT2SQL_VALVES_JSON TEXT2SQL_VALVES_JSON valves.json Full path to configuration JSON, see above for example.

Example .env File

LOG_LEVEL=INFO
PYTHONUNBUFFERED=1
TEXT2SQL_VALVES_JSON=/home/appuser/config/config.json

MODE=shttp
FASTMCP_HOST=0.0.0.0
FASTMCP_PORT=8000

AUTH_ENABLED=true
AUTH_JWKS_URI=https://cognito-idp.us-east-1.amazonaws.com/us-east-1_/.well-known/jwks.json
AUTH_ISSUER=https://cognito-idp.us-east-1.amazonaws.com/us-east-1_

AWS_PROFILE=default

Example TEXT2SQL_VALVES_JSON (valves.json)

The Text2SQL MCP server must be configured using a JSON file that specifies database connections, model settings, and other parameters. Here's an example configuration:

{
  "ENABLED": true,
  "ENABLE_RAG": false,
  "ENABLE_MARKDOWN": true,
  "ENABLE_PANDAS": true,
  "ENABLE_TABLE_INDEXING": true,
  "FORCE_REINDEXING": false,
  "DATABASE_CONFIG": {
    "Cyber_Threats": {
      "description": "A comprehensive dataset tracking cybersecurity incidents, attack vectors, threat types, and affected countries.",
      "topics": [
        "Cyber Threats",
        "Attacks",
        "Targets"
      ],
      "url": "sqlite:///file:/home/appuser/config/cyber_threats.db?uri=true",
      "tables": {
        "cyber_threats": "The Global Cybersecurity Threats Dataset (2015-2024) provides extensive data on cyberattacks, malware types, targeted industries, and affected countries."
      },
      "prompts": [
        {
          "name": "biggest_cyber_threat",
          "description": "Which type of cyber threat has caused the biggest financial loss?"
        }
      ]
    },
    "ESG_Ratings": {
      "description": "S&P 500 Companies ESG Insights & Risk Scores for Informed Decisions.",
      "topics": [
        "ESG Ratings",
        "ESG Risk",
        "Sustainability"
      ],
      "url": "sqlite:///file:/home/appuser/config/esg_ratings.db?uri=true",
      "tables": {
        "esg_ratings": "A comprehensive dataset tracking ESG ratings for S&P 500 companies."
      }
    }
  },
  "IGNORE_SCHEMA": "information_schema, INFORMATION_SCHEMA, _rsc, db_accessadmin, db_backupoperator, db_datareader, db_datawriter, db_ddladmin, db_denydatareader, db_denydatawriter, db_owner, db_securityadmin, guest, queryinsights, sys, pg_catalog",
  "MODEL_API_TYPE": "Bedrock",
  "AWS_PROFILE": "default",
  "API_VERSION": "2025-01-01-preview",
  "LLM_MODEL_NAME": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
  "EMBED_MODEL_NAME": "amazon.titan-embed-text-v2:0",
  "MAX_RETRY": 3,
  "MAX_RESULTS": 100,
  "MAX_DATAFRAME": 2000,
  "LOG_LEVEL": "INFO"
}

Client Configuration

The following environment variables can be provided to the sample MCP client:

Option Default Description
JWT_ACCESS_TOKEN false Token used to authenticate client with server if the server has AUTH_ENABLED true
MCP_SERVER_HOST Host for MCP server. For a server running in Docker on the same host use host.docker.internal
MCP_SERVER_PORT 8000 Port for MCP server.
BEDROCK_MODEL_ID Required if using AWS Bedrock, e.g. us.anthropic.claude-3-7-sonnet-20250219-v1:0
BEDROCK_API_VERSION Required if using AWS Bedrock, e.g. 2023-06-01-preview
AWS_PROFILE Required if using AWS Bedrock, e.g. default
AZURE_DEPLOYMENT_MODEL Required if using Azure OpenAI. Ignored if BEDROCK_MODEL_ID is set. For Azure authentication see DefaultAzureCredential.
AZURE_API_VERSION Required If using Azure OpenAI. Ignored if BEDROCK_MODEL_ID is set.
OTEL_SDK_DISABLED false Enable telemetry for Crew.AI client.
CREWAI_DISABLE_TELEMETRY false Enable telemetry for Crew.AI client.

Tools

The Text2SQL MCP server dynamically creates MCP Tools and Prompts for the configured databases in the following format:

  • Text2Sql_DatabaseName: Query a specific database using natural language
    • Parameters:
      • query: Natural language query to execute against the database

License

This project is licensed under the MIT License.

About

synechron-text2sql-mcp-server

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published