Sandbox Management¶
The Big Picture¶
Nexus Sandbox Management provides secure, isolated code execution environments for AI agents. Think of it as giving your agents a safe playground where they can run code, analyze data, and perform computations without affecting your main system.
Why Sandboxes?¶
AI agents often need to:
- ✅ Execute Python/JavaScript/Bash code dynamically
- ✅ Analyze data files (CSV, JSON, logs)
- ✅ Run calculations and transformations
- ✅ Test code snippets before deployment
- ✅ Generate reports or visualizations
The problem: Running untrusted code is dangerous. You need isolation.
The solution: Nexus sandboxes provide ephemeral, cloud-based execution environments that automatically clean up.
Architecture¶
graph TB
subgraph "🤖 AI Agent Layer"
A1[Your Agent Code]
A2[CrewAI Agent]
A3[LangGraph Node]
end
subgraph "🔧 Nexus Sandbox API"
B1[sandbox_create<br/>Create ephemeral env]
B2[sandbox_run<br/>Execute code]
B3[sandbox_pause<br/>Save costs]
B4[sandbox_stop<br/>Cleanup]
B5[sandbox_list<br/>View all]
end
subgraph "☁️ Sandbox Providers"
C1[E2B<br/>Fast cloud sandboxes]
C2[Docker<br/>Local containers]
C3[Modal<br/>Serverless compute]
end
A1 & A2 & A3 --> B1 & B2 & B3 & B4 & B5
B1 & B2 & B3 & B4 & B5 --> C1 & C2 & C3
Key Components¶
- Sandbox Manager: Coordinates lifecycle (create, run, cleanup) across providers
- Provider Interface: Pluggable backends (E2B, Docker, Modal)
- TTL Management: Automatic cleanup of idle sandboxes
- Metadata Storage: Track sandbox state, expiry, usage
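To make the manager/provider split concrete, here is a minimal sketch of what a pluggable backend could look like. The names `SandboxProvider`, `EchoProvider`, `SandboxManager`, and `run_once` are hypothetical and chosen for illustration; they are not the actual Nexus classes.

```python
from abc import ABC, abstractmethod
import itertools


class SandboxProvider(ABC):
    """Hypothetical provider interface; not the actual Nexus class."""

    @abstractmethod
    def create(self, name: str) -> str:
        """Start an environment and return its sandbox ID."""

    @abstractmethod
    def run(self, sandbox_id: str, language: str, code: str) -> dict:
        """Execute code and return stdout, stderr, and exit_code."""

    @abstractmethod
    def stop(self, sandbox_id: str) -> None:
        """Destroy the environment."""


class EchoProvider(SandboxProvider):
    """Toy backend that echoes the code back instead of executing it."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._live: set[str] = set()

    def create(self, name: str) -> str:
        sandbox_id = f"sb_{next(self._ids)}"
        self._live.add(sandbox_id)
        return sandbox_id

    def run(self, sandbox_id: str, language: str, code: str) -> dict:
        if sandbox_id not in self._live:
            raise KeyError(f"unknown sandbox: {sandbox_id}")
        return {"stdout": code, "stderr": "", "exit_code": 0}

    def stop(self, sandbox_id: str) -> None:
        self._live.discard(sandbox_id)


class SandboxManager:
    """Coordinates the lifecycle by delegating to whichever provider is plugged in."""

    def __init__(self, provider: SandboxProvider) -> None:
        self.provider = provider

    def run_once(self, name: str, language: str, code: str) -> dict:
        sandbox_id = self.provider.create(name)
        try:
            return self.provider.run(sandbox_id, language, code)
        finally:
            # Cleanup happens even if the provider raises during run().
            self.provider.stop(sandbox_id)


manager = SandboxManager(EchoProvider())
result = manager.run_once("demo", "python", "print('hi')")
```

Because the manager only talks to the abstract interface, swapping `EchoProvider` for a real backend would not change the calling code.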
Core Concepts¶
1. Ephemeral Environments¶
Sandboxes are temporary by design:
- Created on demand
- Automatically expire after their TTL (default: 10 minutes)
- TTL is reset on each code execution
- Destroyed when stopped or expired
# Create sandbox with 30-minute TTL
sandbox = nx.sandbox_create(name="data-analysis", ttl_minutes=30)
sandbox_id = sandbox["sandbox_id"]
# TTL refreshes on each code execution
nx.sandbox_run(sandbox_id, "python", code="print('Hello')")
# ↑ Sandbox now expires 30 minutes from this call
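The sliding-expiry behaviour can be illustrated with plain datetime arithmetic. This is a sketch of the idea only, assuming the expiry is recomputed as "now plus TTL" on every execution; `next_expiry` is a hypothetical helper, not a Nexus function.

```python
from datetime import datetime, timedelta, timezone


def next_expiry(ttl_minutes: int, now: datetime) -> datetime:
    """Return the expiry time for a sandbox touched at `now`."""
    return now + timedelta(minutes=ttl_minutes)


created_at = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
expires_at = next_expiry(30, created_at)   # 12:30 UTC at creation

# Code runs 20 minutes later, so the window slides forward.
ran_at = created_at + timedelta(minutes=20)
expires_at = next_expiry(30, ran_at)       # 12:50 UTC after the run
print(expires_at.isoformat())
```

A sandbox that keeps receiving work therefore stays alive, while an idle one expires one TTL after its last execution.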
2. Multi-Language Support¶
Run code in multiple languages:
- Python: Data analysis, ML inference, scripting
- JavaScript/Node.js: Async operations, JSON processing
- Bash: System commands, file operations
# Python
nx.sandbox_run(sandbox_id, "python", code="""
import pandas as pd
df = pd.read_csv('data.csv')
print(df.describe())
""")
# JavaScript
nx.sandbox_run(sandbox_id, "javascript", code="""
const data = [1, 2, 3, 4, 5];
console.log(data.map(x => x * 2));
""")
# Bash
nx.sandbox_run(sandbox_id, "bash", code="""
ls -la /home/user
df -h
""")
3. Provider Abstraction¶
Switch providers without changing agent code:
# Development: Local Docker
nx = NexusFS(
backend=LocalBackend(),
sandbox_provider="docker"
)
# Production: E2B cloud sandboxes
nx = NexusFS(
backend=LocalBackend(),
sandbox_provider="e2b",
e2b_api_key=os.getenv("E2B_API_KEY")
)
# Same API works for both
sandbox = nx.sandbox_create("my-sandbox")
result = nx.sandbox_run(sandbox["sandbox_id"], "python", "print('Hello')")
4. Lifecycle States¶
stateDiagram-v2
[*] --> Creating
Creating --> Active: ✓ Created
Active --> Active: Run code (TTL reset)
Active --> Paused: sandbox_pause()
Paused --> Active: sandbox_resume()
Active --> Stopped: sandbox_stop()
Active --> Stopped: TTL expired
Stopped --> [*]
- Creating: Provider is spinning up the environment
- Active: Ready to execute code
- Paused: Saved state, no resource consumption (provider-dependent)
- Stopped: Destroyed, cannot be resumed
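The lifecycle above can be expressed as a small transition table. The sketch below mirrors only the arrows in the diagram; `SandboxState`, `ALLOWED_TRANSITIONS`, and `transition` are illustrative names, and the real manager may enforce states differently.

```python
from enum import Enum


class SandboxState(Enum):
    CREATING = "creating"
    ACTIVE = "active"
    PAUSED = "paused"
    STOPPED = "stopped"


# Allowed transitions, copied from the lifecycle diagram.
ALLOWED_TRANSITIONS = {
    SandboxState.CREATING: {SandboxState.ACTIVE},
    SandboxState.ACTIVE: {
        SandboxState.ACTIVE,   # running code keeps the sandbox active
        SandboxState.PAUSED,
        SandboxState.STOPPED,  # explicit stop or TTL expiry
    },
    SandboxState.PAUSED: {SandboxState.ACTIVE},
    SandboxState.STOPPED: set(),  # terminal: a stopped sandbox cannot be resumed
}


def transition(current: SandboxState, target: SandboxState) -> SandboxState:
    """Return the new state, or raise if the diagram has no such arrow."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


state = transition(SandboxState.CREATING, SandboxState.ACTIVE)
state = transition(state, SandboxState.PAUSED)
state = transition(state, SandboxState.ACTIVE)
```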
Quick Start: Python API¶
1. Create and Use a Sandbox¶
from nexus import NexusFS, LocalBackend
# Initialize with E2B provider
nx = NexusFS(backend=LocalBackend(), is_admin=True)
# Create sandbox
sandbox = nx.sandbox_create(
name="my-python-sandbox",
ttl_minutes=20
)
sandbox_id = sandbox["sandbox_id"]
print(f"Created: {sandbox_id}")
# Run Python code
result = nx.sandbox_run(
sandbox_id=sandbox_id,
language="python",
code="""
import sys
print(f"Python {sys.version}")
print("2 + 2 =", 2 + 2)
""",
timeout=30 # seconds
)
print("STDOUT:", result["stdout"])
print("Exit Code:", result["exit_code"])
print("Time:", result["execution_time"], "seconds")
# Cleanup
nx.sandbox_stop(sandbox_id=sandbox_id)
2. List and Monitor Sandboxes¶
# List all sandboxes for current user
sandboxes = nx.sandbox_list()
for sb in sandboxes["sandboxes"]:
    print(f"{sb['name']}: {sb['status']} (expires: {sb['expires_at']})")
# Get detailed status
status = nx.sandbox_status(sandbox_id=sandbox_id)
print(f"Uptime: {status['uptime_seconds']} seconds")
print(f"Last active: {status['last_active_at']}")
Quick Start: CLI¶
Basic Commands¶
# Set E2B credentials
export E2B_API_KEY="your-e2b-key"
export E2B_TEMPLATE_ID="your-template-id" # Optional
# Create sandbox
nexus sandbox create my-sandbox --ttl 30
# Output: Created sandbox: sb_abc123...
# Run Python code
nexus sandbox run sb_abc123 -c "print('Hello from sandbox!')"
# Run from file
nexus sandbox run sb_abc123 -f script.py
# Run from stdin
echo "console.log('test')" | nexus sandbox run sb_abc123 -l javascript -c -
# List all sandboxes
nexus sandbox list
# Get status
nexus sandbox status sb_abc123
# Stop sandbox
nexus sandbox stop sb_abc123
JSON Output for Scripting¶
# Create and capture sandbox_id
SANDBOX_ID=$(nexus sandbox create test --json | jq -r '.sandbox_id')
# Run code and parse results
nexus sandbox run $SANDBOX_ID -c "print(42)" --json | jq -r '.stdout'
# Output: 42
# Check status
nexus sandbox status $SANDBOX_ID --json | jq '.status'
# Output: "active"
Advanced Features¶
1. Timeout Control¶
Prevent runaway code with execution timeouts:
# Quick timeout (5 seconds)
try:
    result = nx.sandbox_run(
        sandbox_id,
        "python",
        code="import time; time.sleep(10)",
        timeout=5  # Will timeout
    )
except Exception as e:
    print(f"Timeout: {e}")  # Code execution exceeded 5 second timeout
2. Code Input Methods¶
Multiple ways to provide code:
# Inline code
nx.sandbox_run(sandbox_id, "python", code="print('inline')")
# From file (CLI)
# nexus sandbox run sb_123 -f data_analysis.py
# From stdin (CLI)
# cat script.py | nexus sandbox run sb_123 -c -
3. Error Handling¶
Capture and handle execution errors:
result = nx.sandbox_run(
sandbox_id,
"python",
code="print(1/0)" # Division by zero
)
if result["exit_code"] != 0:
    print("Error occurred:")
    print(result["stderr"])
    # ZeroDivisionError: division by zero
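Because a failing script is reported through `exit_code` rather than a Python exception, callers may want to convert the result into one. The following is a sketch under that assumption; `SandboxExecutionError` and `stdout_or_raise` are hypothetical helpers, and the result dictionaries are hand-written stand-ins shaped like the ones shown in this guide.

```python
class SandboxExecutionError(RuntimeError):
    """Raised when sandboxed code exits with a non-zero status."""

    def __init__(self, exit_code: int, stderr: str) -> None:
        super().__init__(f"sandbox code exited with {exit_code}: {stderr.strip()}")
        self.exit_code = exit_code
        self.stderr = stderr


def stdout_or_raise(result: dict) -> str:
    """Return stdout on success; raise with the captured stderr otherwise."""
    if result["exit_code"] != 0:
        raise SandboxExecutionError(result["exit_code"], result["stderr"])
    return result["stdout"]


# Stand-in results for illustration (not produced by a real sandbox).
ok = {"stdout": "4\n", "stderr": "", "exit_code": 0}
failed = {"stdout": "", "stderr": "ZeroDivisionError: division by zero\n", "exit_code": 1}

print(stdout_or_raise(ok))
try:
    stdout_or_raise(failed)
except SandboxExecutionError as exc:
    print(exc)
```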
4. Pause/Resume (Provider-Dependent)¶
Save costs by pausing idle sandboxes:
# Pause sandbox (stops billing)
nx.sandbox_pause(sandbox_id)
# Resume later
nx.sandbox_resume(sandbox_id)
# Note: E2B doesn't support pause/resume
# Use stop instead for E2B
Integration Examples¶
Example 1: Data Analysis Agent¶
from nexus import NexusFS, LocalBackend
nx = NexusFS(backend=LocalBackend(), is_admin=True)
# Agent analyzes CSV data
def analyze_csv_data(csv_path: str) -> dict:
    # Create temporary sandbox
    sandbox = nx.sandbox_create(name="csv-analyzer", ttl_minutes=15)
    sandbox_id = sandbox["sandbox_id"]
    try:
        # Read CSV from Nexus
        csv_content = nx.read(csv_path)
        # Run analysis code (kept flush-left so it is valid Python inside the sandbox)
        analysis_code = f"""
import pandas as pd
import io
# Load data
data = {repr(csv_content.decode())}
df = pd.read_csv(io.StringIO(data))
# Analyze
print("Rows:", len(df))
print("Columns:", list(df.columns))
print("\\nSummary:")
print(df.describe())
"""
        result = nx.sandbox_run(sandbox_id, "python", analysis_code)
        return {
            "success": result["exit_code"] == 0,
            "output": result["stdout"],
            "time": result["execution_time"]
        }
    finally:
        # Always cleanup
        nx.sandbox_stop(sandbox_id)
# Use it
report = analyze_csv_data("/workspace/data/sales.csv")
print(report["output"])
Example 2: Multi-Language Code Execution¶
def execute_multi_language(sandbox_id: str, tasks: list) -> list:
    """Execute tasks in different languages"""
    results = []
    for task in tasks:
        result = nx.sandbox_run(
            sandbox_id=sandbox_id,
            language=task["language"],
            code=task["code"],
            timeout=task.get("timeout", 30)
        )
        results.append({
            "task": task["name"],
            "language": task["language"],
            "success": result["exit_code"] == 0,
            "output": result["stdout"],
            "time": result["execution_time"]
        })
    return results
# Run mixed workload
sandbox = nx.sandbox_create("multi-lang", ttl_minutes=30)
tasks = [
{
"name": "process-json",
"language": "javascript",
"code": "console.log(JSON.stringify({result: 42}))"
},
{
"name": "analyze-data",
"language": "python",
"code": "print(sum([1, 2, 3, 4, 5]))"
},
{
"name": "check-env",
"language": "bash",
"code": "echo $USER && pwd"
}
]
results = execute_multi_language(sandbox["sandbox_id"], tasks)
Example 3: CrewAI Integration¶
from crewai import Agent, Task, Crew
from nexus import NexusFS, LocalBackend
nx = NexusFS(backend=LocalBackend(), is_admin=True)
# Create long-lived sandbox for agent
sandbox = nx.sandbox_create("crewai-coder", ttl_minutes=60)
sandbox_id = sandbox["sandbox_id"]
# Python code executor agent
code_executor = Agent(
role="Code Executor",
goal="Execute Python code safely in sandboxes",
backstory="I run code in isolated environments",
tools=[], # Add custom tools that use nx.sandbox_run()
allow_delegation=False
)
# Task: Execute data analysis
analyze_task = Task(
description="""
Execute this Python code in sandbox {sandbox_id}:
```python
import statistics
data = [1, 2, 3, 4, 5, 100]
print(f"Mean: {{statistics.mean(data)}}")
print(f"Median: {{statistics.median(data)}}")
```
""".format(sandbox_id=sandbox_id),
agent=code_executor,
expected_output="Analysis results"
)
# Run crew
crew = Crew(agents=[code_executor], tasks=[analyze_task])
result = crew.kickoff()
# Cleanup
nx.sandbox_stop(sandbox_id)
Production Considerations¶
1. TTL Management¶
Choose appropriate TTLs based on use case:
# Short-lived: One-off analysis
nx.sandbox_create("quick-task", ttl_minutes=5)
# Medium: Interactive sessions
nx.sandbox_create("user-session", ttl_minutes=30)
# Long: Background processing
nx.sandbox_create("batch-job", ttl_minutes=120)
2. Automatic Cleanup¶
Nexus automatically cleans up expired sandboxes:
# Background task runs every 5 minutes
# Checks for sandboxes where: expires_at < now()
# Calls sandbox_stop() on expired sandboxes
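The expiry check described above amounts to a filter over sandbox metadata. Here is a minimal sketch, assuming each record carries a timezone-aware `expires_at` datetime and a `status` string; `find_expired` is a hypothetical helper, not the actual background task.

```python
from datetime import datetime, timedelta, timezone


def find_expired(sandboxes: list[dict], now: datetime) -> list[str]:
    """Return IDs of sandboxes that are past their expiry and not yet stopped."""
    return [
        sb["sandbox_id"]
        for sb in sandboxes
        if sb["status"] != "stopped" and sb["expires_at"] < now
    ]


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
sandboxes = [
    {"sandbox_id": "sb_old", "status": "active", "expires_at": now - timedelta(minutes=1)},
    {"sandbox_id": "sb_new", "status": "active", "expires_at": now + timedelta(minutes=9)},
    {"sandbox_id": "sb_done", "status": "stopped", "expires_at": now - timedelta(hours=1)},
]

expired = find_expired(sandboxes, now)
print(expired)  # ['sb_old']
```

A periodic job would then call `sandbox_stop()` for each returned ID. Since the sweep runs on an interval, a sandbox can outlive its `expires_at` by up to one interval.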
3. Cost Optimization¶
E2B Pricing: Charged per second of sandbox runtime
- Use shorter TTLs for infrequent use
- Reuse sandboxes for multiple operations
- Stop sandboxes explicitly when done
# GOOD: Reuse sandbox for batch
sandbox = nx.sandbox_create("batch-processor")
for task in tasks:
    nx.sandbox_run(sandbox["sandbox_id"], "python", task["code"])
nx.sandbox_stop(sandbox["sandbox_id"])
# BAD: Create new sandbox per task
for task in tasks:
    sandbox = nx.sandbox_create(f"task-{task['id']}")
    nx.sandbox_run(sandbox["sandbox_id"], "python", task["code"])
    nx.sandbox_stop(sandbox["sandbox_id"])
4. Error Recovery¶
Always wrap sandbox operations in try/finally:
# Wrapped in a function so that `return` is valid
def run_critical_task(dangerous_code: str) -> dict:
    sandbox = nx.sandbox_create("critical-task")
    try:
        result = nx.sandbox_run(
            sandbox["sandbox_id"],
            "python",
            code=dangerous_code
        )
        return result
    except Exception as e:
        logger.error(f"Sandbox execution failed: {e}")
        raise
    finally:
        # Always cleanup, even on error
        nx.sandbox_stop(sandbox["sandbox_id"])
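The same create/stop pairing can be packaged as a context manager so that cleanup cannot be forgotten. This is a sketch, not part of the Nexus API: `temporary_sandbox` is a hypothetical helper, and `RecordingClient` is a stand-in that only mimics the `sandbox_create` and `sandbox_stop` calls used in this guide so the example runs without a provider.

```python
from contextlib import contextmanager


@contextmanager
def temporary_sandbox(client, name: str, ttl_minutes: int = 10):
    """Create a sandbox, yield its ID, and always stop it afterwards."""
    sandbox = client.sandbox_create(name=name, ttl_minutes=ttl_minutes)
    sandbox_id = sandbox["sandbox_id"]
    try:
        yield sandbox_id
    finally:
        client.sandbox_stop(sandbox_id=sandbox_id)


class RecordingClient:
    """Stand-in client that records calls instead of contacting a provider."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def sandbox_create(self, name: str, ttl_minutes: int = 10) -> dict:
        self.calls.append("create")
        return {"sandbox_id": "sb_demo"}

    def sandbox_stop(self, sandbox_id: str) -> dict:
        self.calls.append("stop")
        return {}


client = RecordingClient()
try:
    with temporary_sandbox(client, "critical-task") as sandbox_id:
        raise RuntimeError("simulated failure inside the sandbox block")
except RuntimeError:
    pass

print(client.calls)  # ['create', 'stop']
```

With a real client, the body of the `with` block would call `sandbox_run` using the yielded ID.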
5. Security Best Practices¶
- ✅ Never pass user credentials to sandbox code
- ✅ Validate and sanitize code before execution
- ✅ Use timeouts to prevent resource exhaustion
- ✅ Monitor sandbox usage for anomalies
- ✅ Use separate sandboxes for different tenants
Provider Comparison¶
| Feature | E2B | Docker | Modal |
|---|---|---|---|
| Speed | Very fast (2-5s) | Medium (10-30s) | Fast (5-10s) |
| Cost | Pay-per-second | Free (local) | Pay-per-invocation |
| Scalability | High | Low (local only) | High |
| Pause/Resume | ❌ No | ✅ Yes | ❌ No |
| Languages | Python, Node, Bash | All | Python, custom |
| Use Case | Production, AI agents | Development, testing | Batch processing |
Choosing a Provider¶
Use E2B when:
- Building production AI agents
- Need fast startup times (< 5 seconds)
- Want zero infrastructure management
- Okay with cloud-only deployment
Use Docker when:
- Developing locally
- Need full container customization
- Want pause/resume capability
- Have on-premise requirements
Use Modal when:
- Running batch jobs at scale
- Need GPU support
- Prefer serverless billing model
Troubleshooting¶
Problem: Sandbox creation fails¶
# Error: E2B API key required
# Solution: Set environment variable
import os
os.environ["E2B_API_KEY"] = "your-key"
Problem: Code execution times out¶
# Error: Code execution exceeded 30 second timeout
# Solution: Increase timeout
result = nx.sandbox_run(sandbox_id, "python", code, timeout=60)
Problem: Sandbox list is empty¶
# Sandboxes are user-scoped
# Check you're authenticated as correct user
sandboxes = nx.sandbox_list()
print(f"User: {nx.current_user}")
Problem: Cannot subtract offset-naive and offset-aware datetimes¶
# This is fixed in Nexus (see sandbox_manager.py:377-407)
# If you encounter this, upgrade to latest version
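For context, this error is raised by Python itself whenever a timezone-aware datetime is combined with a naive one. The sketch below reproduces it and shows one common remedy, normalizing to UTC before doing arithmetic. It assumes naive timestamps represent UTC and is not necessarily how Nexus resolved the issue; `ensure_aware` is a hypothetical helper.

```python
from datetime import datetime, timezone


def ensure_aware(dt: datetime) -> datetime:
    """Attach UTC to naive datetimes; leave aware ones untouched."""
    if dt.tzinfo is None or dt.tzinfo.utcoffset(dt) is None:
        return dt.replace(tzinfo=timezone.utc)
    return dt


naive = datetime(2025, 1, 1, 12, 0)                        # no tzinfo
aware = datetime(2025, 1, 1, 12, 30, tzinfo=timezone.utc)  # has tzinfo

try:
    aware - naive
except TypeError as exc:
    print(f"TypeError: {exc}")

remaining = aware - ensure_aware(naive)
print(remaining.total_seconds())  # 1800.0
```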
Monitoring and Observability¶
Track Sandbox Usage¶
# Get all active sandboxes
sandboxes = nx.sandbox_list()
active = [s for s in sandboxes["sandboxes"] if s["status"] == "active"]
print(f"Active sandboxes: {len(active)}")
for sb in active:
    print(f"  {sb['name']}: {sb['uptime_seconds']}s uptime")
Log Execution Results¶
import logging
logger = logging.getLogger(__name__)
def run_with_logging(sandbox_id: str, code: str):
    result = nx.sandbox_run(sandbox_id, "python", code)
    # Structured fields go in `extra`; the message itself needs no f-string
    logger.info(
        "Sandbox execution",
        extra={
            "sandbox_id": sandbox_id,
            "exit_code": result["exit_code"],
            "execution_time": result["execution_time"],
            "stdout_length": len(result["stdout"])
        }
    )
    return result
Next Steps¶
Learn More¶
- E2B Documentation - E2B provider details
- Docker Integration - Docker provider setup
- Agent Permissions - Control sandbox access
Examples¶
- Sandbox Comprehensive Demo - Full CLI demo
- CrewAI + Sandboxes - Agent integration
- LangGraph Code Executor - LangGraph example
Related Concepts¶
- Agent Permissions - Multi-agent sandbox access
- Workflows - Trigger sandboxes on events
- Multi-Tenancy - Isolated sandboxes per tenant
API Reference¶
Python API¶
# Create sandbox
sandbox_create(name: str, ttl_minutes: int = 10, template_id: str | None = None) -> dict
# Run code
sandbox_run(sandbox_id: str, language: str, code: str, timeout: int = 30) -> dict
# Pause/resume
sandbox_pause(sandbox_id: str) -> dict
sandbox_resume(sandbox_id: str) -> dict
# Stop
sandbox_stop(sandbox_id: str) -> dict
# List/status
sandbox_list() -> dict
sandbox_status(sandbox_id: str) -> dict
CLI Reference¶
nexus sandbox create <name> [--ttl MINUTES] [--template ID] [--json]
nexus sandbox run <sandbox_id> (-c <code> | -f <file>) [-l LANGUAGE] [--timeout SECONDS] [--json]
nexus sandbox pause <sandbox_id> [--json]
nexus sandbox resume <sandbox_id> [--json]
nexus sandbox stop <sandbox_id> [--json]
nexus sandbox list [--json]
nexus sandbox status <sandbox_id> [--json]
Frequently Asked Questions¶
Q: How long do sandboxes live?¶
A: Default TTL is 10 minutes. TTL resets on each code execution. Manually stopped or expired sandboxes are destroyed.
Q: Can multiple agents share one sandbox?¶
A: Yes, but consider isolation. For multi-tenant systems, use separate sandboxes per tenant/user.
Q: What happens if I don't stop a sandbox?¶
A: It auto-expires once its TTL elapses. A background cleanup task runs every 5 minutes and destroys expired sandboxes.
Q: Can I install custom packages?¶
A: Yes, with custom E2B templates. See E2B Custom Templates.
Q: Is there a limit on execution time?¶
A: Yes, configurable via timeout parameter (default: 30 seconds, max depends on provider).
Q: Can I access the network from sandboxes?¶
A: Yes, sandboxes have full network access (for API calls, downloads, etc.).
Q: How do I debug sandbox code?¶
A: Check stdout, stderr, and exit_code in execution results. Use print statements liberally.
Summary¶
Nexus Sandbox Management provides:
- ✅ Secure, isolated code execution
- ✅ Multi-language support (Python, JavaScript, Bash)
- ✅ Automatic TTL-based cleanup
- ✅ Provider abstraction (E2B, Docker, Modal)
- ✅ Simple API for AI agents
Start building: Quick Start Guide