LLM Provider Abstraction Layer¶
Nexus v0.4.0 introduces a unified LLM provider abstraction layer that supports multiple AI providers with a consistent interface.
Overview¶
The LLM provider abstraction layer provides:
- Multi-provider support: Anthropic Claude, OpenAI GPT, Google Gemini, and more via litellm
- Unified interface: Consistent API across all providers
- Function calling: Tool/function calling support across providers
- Vision support: Image input handling for vision-capable models
- Token counting: Accurate token counting with caching
- Cost tracking: Automatic cost calculation and tracking
- Metrics storage: Integration with Nexus metadata database
- Response caching: CAS-based caching for LLM responses (planned)
- Retry logic: Exponential backoff with configurable retry parameters
Installation¶
The LLM provider dependencies are included in the base Nexus installation:
Dependencies installed:
- litellm>=1.0 - Multi-provider LLM support
- tiktoken>=0.5 - Token counting for OpenAI models
- anthropic>=0.40 - Native Anthropic SDK
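A quick sanity check that the bundled dependencies resolve (a minimal sketch using only the standard library; not part of the Nexus API):
from importlib.metadata import version
# Confirm the three LLM dependencies are installed and report their versions.
for pkg in ("litellm", "tiktoken", "anthropic"):
    print(pkg, version(pkg))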
Quick Start¶
Recommended: OpenRouter (One Key, All Models)¶
The simplest way to use multiple LLM providers is through OpenRouter, which gives you access to Claude, GPT-4, Gemini, and 200+ other models with a single API key.
# Get your OpenRouter API key from https://openrouter.ai/keys
export OPENROUTER_API_KEY="sk-or-v1-..."
import os
from pydantic import SecretStr
from nexus.llm import LLMConfig, LLMProvider, Message, MessageRole
# One API key for all models!
api_key = SecretStr(os.getenv("OPENROUTER_API_KEY"))
# Use Claude (prefix with "openrouter/" for OpenRouter routing)
config = LLMConfig(
model="openrouter/anthropic/claude-3.5-sonnet",
api_key=api_key,
temperature=0.7,
)
# Or use GPT-4
# config = LLMConfig(
# model="openrouter/openai/gpt-4o",
# api_key=api_key,
# )
# Or use Gemini
# config = LLMConfig(
# model="openrouter/google/gemini-pro",
# api_key=api_key,
# )
# Note: OpenRouter model IDs may differ from direct provider IDs.
# Check available models at https://openrouter.ai/models
provider = LLMProvider.from_config(config)
# Create messages
messages = [
Message(role=MessageRole.SYSTEM, content="You are a helpful assistant."),
Message(role=MessageRole.USER, content="What is the capital of France?"),
]
# Send request
response = provider.complete(messages)
print(response.content)
print(f"Cost: ${response.cost:.6f}")
Alternative: Direct Provider Keys¶
You can also use provider-specific API keys directly:
# Anthropic direct
config = LLMConfig(
model="claude-sonnet-4-20250514",
api_key=SecretStr(os.getenv("ANTHROPIC_API_KEY")),
)
# OpenAI direct
config = LLMConfig(
model="gpt-4o",
api_key=SecretStr(os.getenv("OPENAI_API_KEY")),
)
Function Calling¶
# Define tools
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City and state, e.g. San Francisco, CA",
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
},
},
"required": ["location"],
},
},
}
]
# Request with tools
response = provider.complete(messages, tools=tools)
# Handle tool calls
if response.tool_calls:
for tool_call in response.tool_calls:
function_name = tool_call["function"]["name"]
arguments = tool_call["function"]["arguments"]
print(f"Function: {function_name}")
print(f"Arguments: {arguments}")
Streaming¶
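A hypothetical sketch of incremental output, assuming the provider exposes a stream() method that yields chunks carrying a content delta (the method and chunk shape are assumptions, not documented API):
# Hypothetical API: provider.stream() is an assumption, not a documented method.
for chunk in provider.stream(messages):
    if chunk.content:
        print(chunk.content, end="", flush=True)
print()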
Token Counting¶
# Count tokens before sending
token_count = provider.count_tokens(messages)
print(f"Estimated tokens: {token_count}")
Configuration¶
LLMConfig Options¶
config = LLMConfig(
# Model configuration
model="claude-sonnet-4-20250514", # Model name
api_key=SecretStr("your-key"), # API key
base_url=None, # Custom API endpoint
api_version=None, # API version (Azure)
custom_llm_provider=None, # Custom provider name
# Generation parameters
temperature=0.7, # 0.0-2.0
max_output_tokens=4096, # Max tokens to generate
max_input_tokens=None, # Max input tokens (auto-detected)
top_p=1.0, # Nucleus sampling
seed=None, # Random seed
# Timeout and retry
timeout=120.0, # Timeout in seconds
num_retries=3, # Number of retries
retry_min_wait=4.0, # Min wait between retries
retry_max_wait=10.0, # Max wait between retries
retry_multiplier=2.0, # Exponential backoff multiplier
# Features
native_tool_calling=None, # Enable function calling (auto-detect)
caching_prompt=False, # Enable prompt caching (Claude)
disable_vision=False, # Disable vision capabilities
# Advanced
custom_tokenizer=None, # Custom tokenizer name
reasoning_effort=None, # "low"/"medium"/"high" (o1/o3 models)
input_cost_per_token=None, # Custom input cost
output_cost_per_token=None, # Custom output cost
)
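As a worked example of the retry parameters, assuming a standard capped exponential backoff of the form wait = min(retry_max_wait, retry_min_wait * retry_multiplier ** attempt):
# Waits for the defaults above: 4s, 8s, then capped at 10s.
# The exact formula is an assumption, not a documented guarantee.
min_wait, max_wait, multiplier, retries = 4.0, 10.0, 2.0, 3
waits = [min(max_wait, min_wait * multiplier**attempt) for attempt in range(retries)]
print(waits)  # [4.0, 8.0, 10.0]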
Why OpenRouter?¶
| Approach | API Keys Needed | Supported Models | Key Management |
|---|---|---|---|
| OpenRouter (Recommended) | 1 key | 200+ models (Claude, GPT, Gemini, Llama, etc.) | Single key to manage |
| Direct providers | 3+ keys | Only provider's own models | Multiple keys, multiple accounts |
Benefits of OpenRouter:
- ✅ One API key for all models
- ✅ Easy model switching (just change the model name)
- ✅ Unified billing and analytics
- ✅ No need for multiple provider accounts
- ✅ Access to 200+ models including the latest releases
- ✅ Free tier available for testing
Get your OpenRouter key: https://openrouter.ai/keys
Supported Providers¶
The abstraction layer supports all providers available through litellm:
Anthropic Claude¶
config = LLMConfig(
model="claude-sonnet-4-20250514",
api_key=SecretStr(os.getenv("ANTHROPIC_API_KEY")),
caching_prompt=True, # Enable prompt caching
)
Models:
- claude-opus-4-20250514
- claude-sonnet-4-20250514
- claude-3-7-sonnet-20250219
- claude-3-5-sonnet-20241022
- claude-3-5-haiku-20241022
- claude-3-opus-20240229
OpenAI¶
Models:
- gpt-4o
- gpt-4o-mini
- o1-2024-12-17
- o3-mini
- o3
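For the o-series reasoning models, the reasoning_effort option from LLMConfig applies. A sketch using only fields documented on this page:
config = LLMConfig(
    model="o3-mini",
    api_key=SecretStr(os.getenv("OPENAI_API_KEY")),
    reasoning_effort="medium",  # "low" / "medium" / "high", per LLMConfig above
)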
Google Gemini¶
Models:
- gemini-2.5-pro
- gemini-pro
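For direct access, a sketch assuming litellm's gemini/ prefix and a Google AI Studio key in GEMINI_API_KEY (both are assumptions; OpenRouter routing is shown below):
config = LLMConfig(
    model="gemini/gemini-2.5-pro",  # gemini/ prefix assumed from litellm routing conventions
    api_key=SecretStr(os.getenv("GEMINI_API_KEY")),
)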
OpenRouter (Recommended)¶
Access 200+ models with one API key. Get your key at https://openrouter.ai/keys
config = LLMConfig(
model="openrouter/anthropic/claude-3.5-sonnet",
api_key=SecretStr(os.getenv("OPENROUTER_API_KEY")),
)
# Other popular models via OpenRouter:
# model="openrouter/openai/gpt-4o"
# model="openrouter/google/gemini-pro"
# model="openrouter/meta-llama/llama-3.3-70b-instruct"
# model="openrouter/qwen/qwen-2.5-72b-instruct"
Notes:
- The openrouter/ prefix tells litellm to route through OpenRouter
- OpenRouter model IDs may differ from direct provider IDs
- Check available models at https://openrouter.ai/models
Azure OpenAI¶
config = LLMConfig(
model="azure/gpt-4o",
api_key=SecretStr(os.getenv("AZURE_API_KEY")),
base_url="https://your-resource.openai.azure.com",
api_version="2024-02-15-preview",
)
Metrics Tracking¶
The provider automatically tracks metrics:
# Access metrics
print(f"Total cost: ${provider.metrics.accumulated_cost:.6f}")
print(f"Total requests: {provider.metrics.total_requests}")
print(f"Average latency: {provider.metrics.average_latency:.2f}s")
# Token usage
usage = provider.metrics.accumulated_token_usage
print(f"Prompt tokens: {usage.prompt_tokens}")
print(f"Completion tokens: {usage.completion_tokens}")
print(f"Cache read tokens: {usage.cache_read_tokens}")
print(f"Cache write tokens: {usage.cache_write_tokens}")
# Reset metrics
provider.reset_metrics()
Persistent Metrics¶
Metrics can be saved to the Nexus metadata database (persistent storage is planned for v0.4.1; see Roadmap):
from nexus.llm import MetricsStore
store = MetricsStore(metadata_path="/path/to/nexus/metadata.db")
# Save metrics
store.save_metrics(provider.metrics, session_id="session-123")
# Load metrics
metrics = store.load_metrics(session_id="session-123")
Response Caching¶
Response caching using the Nexus CAS is planned for a future release; the intended API:
from nexus.llm import CachedLLMProvider
provider = CachedLLMProvider.from_config(
config,
cache_backend=nexus_cas, # Nexus CAS instance
cache_ttl=3600, # Cache TTL in seconds
)
# Responses are automatically cached by message hash
response = provider.complete(messages) # Cache miss
response = provider.complete(messages) # Cache hit (faster, free)
Error Handling¶
from nexus.llm import (
LLMRateLimitError,
LLMTimeoutError,
LLMAuthenticationError,
LLMNoResponseError,
)
try:
response = provider.complete(messages)
except LLMRateLimitError as e:
print("Rate limit exceeded, retrying...")
except LLMTimeoutError as e:
print("Request timed out")
except LLMAuthenticationError as e:
print("Invalid API key")
except LLMNoResponseError as e:
print("No response from provider")
Best Practices¶
1. Use Environment Variables for API Keys¶
import os
from pydantic import SecretStr
config = LLMConfig(
model="claude-sonnet-4",
api_key=SecretStr(os.getenv("ANTHROPIC_API_KEY")),
)
2. Enable Prompt Caching for Claude¶
config = LLMConfig(
model="claude-sonnet-4-20250514",
api_key=SecretStr(api_key),
caching_prompt=True, # Reduces costs for repeated context
)
3. Monitor Token Usage¶
# Check tokens before sending
tokens = provider.count_tokens(messages)
limit = provider.config.max_input_tokens
if limit is not None and tokens > limit:  # max_input_tokens may be None (auto-detected)
    print("Warning: message exceeds the input limit; truncate before sending")
4. Use Appropriate Timeouts¶
config = LLMConfig(
model="claude-sonnet-4",
api_key=SecretStr(api_key),
timeout=60.0, # Shorter timeout for simple requests
num_retries=5, # More retries for important requests
)
5. Track Costs¶
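Per-response cost is exposed on the response object, and accumulated cost on the provider metrics, both shown earlier on this page:
response = provider.complete(messages)
print(f"This request: ${response.cost:.6f}")
print(f"Session total: ${provider.metrics.accumulated_cost:.6f}")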
Examples¶
See examples/llm_demo.py for complete examples.
Testing¶
Run tests:
# Unit tests (no API calls)
pytest tests/test_llm.py
# Integration tests (requires API keys)
export ANTHROPIC_API_KEY="your-key"
pytest tests/test_llm.py --run-integration
Roadmap¶
v0.4.0 (Current)¶
- ✅ Multi-provider support (Anthropic, OpenAI, Google)
- ✅ Function calling
- ✅ Token counting
- ✅ Cost tracking
- ✅ Metrics collection
- ✅ Retry logic
v0.4.1 (Planned)¶
- ⏳ Response caching with Nexus CAS
- ⏳ Persistent metrics storage
- ⏳ Async support improvements
- ⏳ Better streaming error handling
v0.5.0 (Planned)¶
- ⏳ LLM-powered document analysis
- ⏳ Semantic search integration
- ⏳ RAG (Retrieval Augmented Generation)
- ⏳ Agent workflows
Architecture¶
The LLM provider abstraction is built on top of litellm with Nexus-specific enhancements:
┌─────────────────────────────────────────┐
│ Nexus Application Code │
└──────────────────┬──────────────────────┘
│
┌──────────────────▼──────────────────────┐
│ LLMProvider (nexus.llm) │
│ - Unified interface │
│ - Metrics tracking │
│ - Response caching (CAS) │
└──────────────────┬──────────────────────┘
│
┌──────────────────▼──────────────────────┐
│ LiteLLM Library │
│ - Multi-provider routing │
│ - Token counting │
│ - Cost calculation │
└──────────────────┬──────────────────────┘
│
┌──────────┼──────────┐
│ │ │
┌───────▼────┐ ┌──▼───────┐ ┌▼─────────┐
│ Anthropic │ │ OpenAI │ │ Google │
│ Claude │ │ GPT │ │ Gemini │
└────────────┘ └──────────┘ └──────────┘
Contributing¶
Contributions are welcome! Areas for improvement:
- Provider-specific optimizations: Better handling of provider-specific features
- Enhanced caching: More sophisticated cache invalidation strategies
- Metrics visualization: Dashboard for tracking LLM usage
- Cost optimization: Automatic model selection based on cost/quality tradeoffs
- Testing: More comprehensive test coverage
See CONTRIBUTING.md for guidelines.
License¶
Apache 2.0 - See LICENSE for details.