Unified API for Multiple LLM Providers
Build once, run anywhere: One API for all your LLMs, cloud or local
[GitHub Repository](https://github.com/ngstcf/llmbase)
Features • Installation • Quick Start • Documentation • Examples
- Multi-Provider Support: OpenAI, Azure OpenAI, Anthropic, Google Gemini, DeepSeek, xAI/Grok, Perplexity, Ollama
- Structured Output: Built-in `json_mode` ensures valid JSON responses across all providers
- Dual Mode: Use as a Python library (no Flask) or as an HTTP API server (optional Flask)
- Resilience: Automatic retries with exponential backoff and circuit breakers
- Advanced Features: Support for reasoning models, streaming, and extended thinking
- Configuration-Driven: Hot-reload model configs without code changes
- CORS-Ready: Built-in CORS support for web applications
- Debugging & Transparency: Built-in logging, request tracking, performance metrics, and configuration status endpoints
```bash
# Clone the repository
git clone https://github.com/yourusername/llmbase.git
cd llmbase

# Install dependencies
pip install -r requirements.txt
```

Or install directly from GitHub:

```bash
pip install git+https://github.com/yourusername/llmbase.git
```

Core (Library Mode - no Flask required):
- python-dotenv
- openai
- anthropic
- google-genai
- requests
- urllib3
Optional (API Server Mode):
- flask
- flask-session
Enable API server mode by setting `LLM_API_MODE=true` in your `.env` file.
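For reference, here is a minimal sketch of picking a mode at startup in your own code. It assumes `run_api_server()` (mentioned in the changelog below) takes no required arguments and that `python-dotenv` loads the `.env` file; adapt it to the actual entry point.

```python
# Minimal sketch: choose library mode or API-server mode at startup.
# Assumes run_api_server() needs no arguments (not confirmed here) and that
# python-dotenv loads the .env file described in the Configuration section.
import os
from dotenv import load_dotenv

load_dotenv()

if os.getenv("LLM_API_MODE", "false").lower() == "true":
    from llmservices import run_api_server
    run_api_server()  # starts the optional Flask server
else:
    from llmservices import LLMRequest, LLMService
    req = LLMRequest(provider="openai", model="gpt-4o", prompt="ping")
    print(LLMService.call(req).content)
```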
```python
from llmservices import LLMService, LLMRequest

# Simple call
req = LLMRequest(
    provider="openai",
    model="gpt-4o",
    prompt="Write a haiku about AI"
)

response = LLMService.call(req)
print(response.content)
```

To run as an HTTP API server instead:

```bash
# Enable API mode
export LLM_API_MODE=true

# Run the server
python llmservices.py
```

Then make HTTP requests:
```bash
curl -X POST http://localhost:8888/api/llm/call \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "model": "gpt-4o",
    "prompt": "Hello, world!"
  }'
```

Full Documentation: https://c3.unu.edu/projects/ai/llmbase/

Blog Post: One API, Many AI Models
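If you prefer to call the server from Python rather than curl, a sketch with `requests` could look like the following. The response JSON shape is an assumption (a top-level `content` field mirroring `LLMResponse`); check the actual payload returned by `/api/llm/call`.

```python
# Sketch: calling the API server from Python.
# Assumes the response JSON exposes a "content" field; adjust to the real payload.
import requests

resp = requests.post(
    "http://localhost:8888/api/llm/call",
    json={"provider": "openai", "model": "gpt-4o", "prompt": "Hello, world!"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json().get("content"))
```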
The complete HTML documentation is hosted online with:
- Interactive navigation
- Code examples
- API reference
- Best practices guide
Create a `.env` file:

```bash
# Provider API Keys
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GEMINI_API_KEY=...
DEEPSEEK_API_KEY=sk-...

# Optional Providers
PERPLEXITY_API_KEY=pplx-...
AZURE_OAI_ENDPOINT=https://...
AZURE_OAI_KEY=...
AZURE_OAI_DEPLOYMENT_NAME=gpt-4

# Ollama (Local)
OLLAMA_CHAT_ENDPOINT=http://localhost:11434/api/chat
OLLAMA_MODELS_ENDPOINT=http://localhost:11434/api/models

# Service Config
LLM_CONFIG_FILE=llm_config.json
LLM_API_MODE=false        # Set to true for API server mode
FLASK_SECRET_KEY=your-secret-key

# Debugging & Logging (Optional)
LLM_LOG_LEVEL=INFO        # DEBUG, INFO, WARNING, ERROR, CRITICAL
LLM_DEBUG=false           # Enable verbose logging for debugging
```

Create `llm_config.json`:
```json
{
  "resilience": {
    "max_retries": 3,
    "backoff_factor": 1.5,
    "retry_jitter": 0.5,
    "circuit_breaker_failure_threshold": 5,
    "circuit_breaker_recovery_timeout": 60
  },
  "openai": {
    "api_base": "https://api.openai.com/v1",
    "default_model": "gpt-4o",
    "models": {
      "gpt-4o": {
        "max_tokens": 16384,
        "supports_streaming": true,
        "temperature_default": 0.3,
        "uses_completion_tokens": false
      },
      "o1": {
        "max_tokens": 100000,
        "supports_streaming": false,
        "supports_reasoning": true,
        "uses_completion_tokens": true
      }
    }
  },
  "anthropic": {
    "default_model": "claude-sonnet-4-5-20250929",
    "models": {
      "claude-sonnet-4-5-20250929": {
        "max_tokens": 8192,
        "supports_streaming": true,
        "supports_extended_thinking": true,
        "uses_completion_tokens": false
      }
    }
  }
}
```

Note: The `uses_completion_tokens` field indicates whether the model uses `max_completion_tokens` instead of `max_tokens` in the API request (e.g., the OpenAI o1 series). Set it to `true` for reasoning models that require this parameter.
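As a quick sanity check before pointing the service at an edited config, you can load the file yourself and list the configured models. This sketch relies only on the JSON structure shown above.

```python
# Sketch: validate llm_config.json and list configured models per provider.
import json

with open("llm_config.json") as fh:
    config = json.load(fh)

for provider, settings in config.items():
    if provider == "resilience":
        continue  # resilience holds retry/circuit-breaker settings, not models
    models = settings.get("models", {})
    print(f"{provider}: default={settings.get('default_model')}, models={list(models)}")
```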
| Provider | Description |
|---|---|
| OpenAI | GPT models |
| Azure OpenAI | GPT models via Azure |
| Anthropic | Claude models |
| Google Gemini | Gemini models |
| DeepSeek | Chat and reasoning models |
| xAI / Grok | Grok models |
| Perplexity | Sonar models |
| Ollama | Local models |
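Because every provider goes through the same `LLMRequest`/`LLMService.call` interface, switching providers only means changing two fields. The `"ollama"` provider id and the local model name below are assumptions; use the ids and models configured in your `llm_config.json`.

```python
# Sketch: the same prompt against a cloud provider and a local Ollama model.
# The "ollama" provider id and "llama3" model name are assumptions.
from llmservices import LLMRequest, LLMService

prompt = "Summarize the benefits of a unified LLM API in one sentence."

for provider, model in [("openai", "gpt-4o"), ("ollama", "llama3")]:
    req = LLMRequest(provider=provider, model=model, prompt=prompt)
    resp = LLMService.call(req)
    print(f"[{provider}/{model}] {resp.content}")
```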
Structured output with `json_mode`:

```python
req = LLMRequest(
    provider="openai",
    model="gpt-4o",
    prompt="Extract names from: Meeting with Sarah on 2025-05-12",
    json_mode=True
)

response = LLMService.call(req)

import json
data = json.loads(response.content)
print(data)  # {"names": ["Sarah"]}
```

Streaming:

```python
req = LLMRequest(
    provider="anthropic",
    model="claude-sonnet-4-5-20250929",
    prompt="Write a short story",
    stream=True
)

for chunk in LLMService.stream(req):
    print(chunk, end='', flush=True)
```
Error handling:

```python
from llmservices import CircuitBreakerOpenException, LLMError

try:
    response = LLMService.call(req)
except LLMError as e:
    # Enhanced error with context
    print(f"Error: {e.message}")
    print(f"Provider: {e.provider}")
    print(f"Status: {e.status_code}")
    print(f"Request ID: {e.request_id}")
except CircuitBreakerOpenException as e:
    print(f"Service unavailable: {e}")
except Exception as e:
    print(f"Error: {e}")
```
Debugging and request tracking:

```python
from llmservices import LLMService, LLMRequest, LLMConfig

# Check configuration status
status = LLMConfig.get_status()
print(f"Version: {status['version']}")
print(f"Providers: {status['providers_configured']}")

# Request with tracking
req = LLMRequest(
    provider="openai",
    model="gpt-4o",
    prompt="Hello"
)
print(f"Request ID: {req.request_id}")

response = LLMService.call(req)

# Access debugging information
print(f"Request ID: {response.request_id}")
print(f"Usage: {response.usage}")
print(f"Finish Reason: {response.finish_reason}")
if response.timing:
    print(f"Duration: {response.timing.total_duration_ms}ms")
```
The `LLMRequest` dataclass:

```python
@dataclass
class LLMRequest:
    provider: str                           # Required: Provider name
    model: str                              # Required: Model name
    prompt: str                             # Required: User prompt
    stream: bool = False                    # Enable streaming
    temperature: Optional[float] = None     # 0.0-1.0
    max_tokens: Optional[int] = None        # Max response tokens
    system_prompt: Optional[str] = None     # System message
    messages: Optional[List[Dict]] = None   # Chat messages
    reasoning_effort: Optional[str] = None  # "low", "medium", "high"
    enable_thinking: bool = True            # Enable extended thinking
    json_mode: bool = False                 # Force JSON output
```
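The `messages`, `system_prompt`, and `reasoning_effort` fields suggest multi-turn chat and reasoning-model support; a sketch is below. The `{"role": ..., "content": ...}` message shape is an assumption based on the common chat format and is not confirmed by this README.

```python
# Sketch: multi-turn chat plus reasoning controls via LLMRequest fields.
# The {"role": ..., "content": ...} message shape is an assumption.
from llmservices import LLMRequest, LLMService

req = LLMRequest(
    provider="openai",
    model="o1",
    prompt="Given the conversation, what should we do next?",
    system_prompt="You are a terse project assistant.",
    messages=[
        {"role": "user", "content": "We missed the Friday deadline."},
        {"role": "assistant", "content": "Noted. Scope or staffing issue?"},
        {"role": "user", "content": "Scope creep, mostly."},
    ],
    reasoning_effort="medium",  # "low", "medium", or "high"
)

resp = LLMService.call(req)
print(resp.content)
if resp.reasoning_content:
    print("--- reasoning ---")
    print(resp.reasoning_content)
```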
The `LLMResponse` dataclass:

```python
@dataclass
class LLMResponse:
    content: str                                # Response text
    model: str                                  # Model used
    provider: str                               # Provider used
    usage: Optional[Dict[str, int]]             # Token usage
    reasoning_content: Optional[str]            # Thinking content
    finish_reason: Optional[str]                # Stop reason

    # Debugging fields
    request_id: str                             # Unique request identifier
    response_headers: Optional[Dict[str, str]]  # HTTP response headers
    rate_limit_remaining: Optional[int]         # Rate limit info
    timing: Optional[LLMTiming]                 # Performance metrics
    metadata: Optional[LLMMetadata]             # Request metadata
```
The `LLMService` interface:

```python
class LLMService:
    @staticmethod
    def call(req: LLMRequest) -> LLMResponse:
        """Make a non-streaming LLM call"""

    @staticmethod
    def stream(req: LLMRequest) -> Generator[str, None, None]:
        """Make a streaming LLM call"""
```

Retry behavior is configured in `llm_config.json`:
```json
{
  "resilience": {
    "max_retries": 3,
    "backoff_factor": 1.5,
    "retry_jitter": 0.5
  }
}
```

The circuit breaker automatically blocks failing providers:
- CLOSED: Normal operation
- OPEN: Blocking requests (after threshold failures)
- HALF_OPEN: Testing recovery (after timeout)
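While a provider's breaker is OPEN, calls fail fast with `CircuitBreakerOpenException`; one way to handle that is to fall back to another configured provider, as in this sketch. The primary/fallback pairing is illustrative, not part of the library.

```python
# Sketch: fall back to a second provider while the first one's breaker is open.
from llmservices import CircuitBreakerOpenException, LLMRequest, LLMService

def call_with_fallback(prompt: str) -> str:
    # The provider/model pairs below are illustrative choices.
    for provider, model in [("openai", "gpt-4o"), ("anthropic", "claude-sonnet-4-5-20250929")]:
        try:
            req = LLMRequest(provider=provider, model=model, prompt=prompt)
            return LLMService.call(req).content
        except CircuitBreakerOpenException:
            continue  # breaker is OPEN for this provider; try the next one
    raise RuntimeError("All configured providers are currently unavailable")

print(call_with_fallback("Give me one sentence on circuit breakers."))
```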
| Endpoint | Method | Description |
|---|---|---|
| `/api/llm/call` | POST | Make LLM call |
| `/api/providers` | GET | List providers |
| `/api/providers/<provider>/models` | GET | List models |
| `/api/config/reload` | POST | Reload config |
| `/api/config/status` | GET | Get configuration status |
| `/health` | GET | Health check with detailed status |
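In API-server mode, the discovery and health endpoints can be scripted as well. This sketch assumes they return JSON, which is not spelled out above; adjust to the actual payloads.

```python
# Sketch: query the discovery and health endpoints of a running server.
# Assumes JSON responses; the exact payload shapes are not documented here.
import requests

BASE = "http://localhost:8888"

health = requests.get(f"{BASE}/health", timeout=10)
print("health:", health.status_code, health.json())

providers = requests.get(f"{BASE}/api/providers", timeout=10)
print("providers:", providers.json())
```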
```
llmbase/
├── llmservices.py      # Main library
├── llm_config.json     # Model configuration
├── .env                # Environment secrets
├── .env.example        # Environment template
├── requirements.txt    # Dependencies
├── README.md           # This file
├── LICENSE             # MIT License
├── .gitignore          # Git ignore rules
└── examples/           # Usage examples
```
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Ng Chong
- GitHub: @ngstcf
- Added structured logging system with configurable log levels
- Added `LLMMetadata` class for request/response tracking
- Added `LLMTiming` class for performance metrics
- Added `LLMError` class with enhanced error context
- Added `LLMConfig.get_status()` for configuration transparency
- Added debug mode (`LLM_DEBUG`) for verbose logging
- Enhanced health check endpoint with detailed status
- Added `/api/config/status` endpoint for debugging
- Added `request_id` tracking for all requests
- Updated `LLMResponse` with debugging fields
- Added xAI/Grok provider support
- Added Grok 4 reasoning model support
- Fixed Ollama API key to be optional
- Added DeepSeek provider support
- Added DeepSeek reasoning model (R1) support
- Added thinking tokens support via `extra_body`
- Added conditional Flask imports
- Library mode now works without Flask
- Added `LLM_API_MODE` environment variable
- Added `run_api_server()` function
- Added `json_mode` support for all providers
- Improved JSON output handling
Built for the AI community