Multi-Backend Storage¶
⏱️ Time: 20 minutes | 📊 Difficulty: Intermediate | 🎯 Goal: Master hybrid storage with local + cloud backends
What You'll Learn¶
By the end of this guide, you'll understand how to:
- ✅ Start Nexus server with local backend
- ✅ Add GCS backend dynamically at runtime
- ✅ Route different paths to different backends
- ✅ Implement hybrid storage patterns (hot/cold data)
- ✅ Optimize costs with tiered storage
- ✅ Build production-ready multi-cloud architectures
Prerequisites¶
- Nexus installed (pip install nexus-ai-fs)
- GCS bucket and service account credentials (optional but recommended)
- Basic understanding of Nexus file operations
Why Multi-Backend Storage?¶
Traditional approaches force you to choose ONE storage system:
❌ Traditional Problems:
- Store everything in expensive cloud storage OR slow local disk
- Rewrite code to switch between storage providers
- Manual migration when requirements change
- Vendor lock-in
✅ Nexus Multi-Backend Solution:
- Mix-and-match: Local for hot data, GCS for archives
- Transparent routing: Same API, different backends
- Cost optimization: Right storage tier for each use case
- Zero vendor lock-in: Switch backends without code changes
Real-World Use Cases¶
| Use Case | Storage Strategy |
|---|---|
| AI Training | /datasets/raw → GCS (durable); /cache/processed → Local (fast) |
| Document Processing | /inbox → Local (temporary); /archive → GCS (long-term) |
| Multi-Region Apps | /us-west → GCS us-west1; /eu → GCS europe-west1 |
| Cost Optimization | /hot → Local SSD; /warm → Local HDD; /cold → GCS Coldline |
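For instance, the AI Training row could be expressed as two mounts once you have a connected client nx (set up in Part 2); the bucket, project, and directory names below are placeholders:
# Sketch: map the "AI Training" row to two mounts (placeholder names throughout)
nx.mount_manager.add_mount(
    mount_point="/datasets/raw",
    backend_type="gcs",
    backend_config={"bucket": "training-datasets", "project_id": "my-gcp-project"},
    priority=20,
    readonly=True  # shared raw data stays immutable
)
nx.mount_manager.add_mount(
    mount_point="/cache/processed",
    backend_type="local",
    backend_config={"data_dir": "/var/nexus/cache"},
    priority=10
)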
Part 1: Server Setup¶
Step 1: Start Nexus Server¶
Start Nexus server with database authentication:
# First-time setup: Initialize server with admin user
nexus serve --auth-type database --init --port 8765
# Output:
# ✓ Admin user created: admin
# ✓ API key: nxk_abc123def456...
# Save this API key - you'll need it for client connections!
Save the API key - this is the only time it will be displayed!
For subsequent starts (after initialization), run the same command without the --init flag:
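# Subsequent starts: same flags as the first run, minus --init
nexus serve --auth-type database --port 8765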
Step 2: Create Admin User and API Key¶
# Create admin user (if needed)
nexus admin create-user admin \
--name "Admin User" \
--email admin@company.com
# Generate API key for admin
nexus admin create-user-key admin
# Output: nxk_abc123def456...
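With the key in hand, export the connection settings that the client in Part 2 will pick up (values below are placeholders):
# Point clients at the server (placeholder values)
export NEXUS_URL=http://localhost:8765
export NEXUS_API_KEY=nxk_abc123def456...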
Part 2: Working with Multiple Backends¶
Step 1: Connect Client to Server¶
Use the recommended nexus.connect() pattern which auto-detects remote mode:
import nexus
# Connect to server (auto-detects remote mode from NEXUS_URL)
# export NEXUS_URL=http://localhost:8765
# export NEXUS_API_KEY=nxk_abc123...
nx = nexus.connect()
# Or connect explicitly
nx = nexus.connect(config={
"url": "http://localhost:8765",
"api_key": "nxk_abc123..."
})
Alternatively, use RemoteNexusFS directly:
from nexus import RemoteNexusFS
# Direct remote connection
nx = RemoteNexusFS(
server_url="http://localhost:8765",
api_key="nxk_abc123..."
)
Step 2: Add GCS Backend Dynamically¶
Now add a GCS backend for archival storage:
# Add GCS mount for archives
gcs_mount_id = nx.mount_manager.add_mount(
mount_point="/archives",
backend_type="gcs",
backend_config={
"bucket": "my-nexus-archives",
"project_id": "my-gcp-project"
# credentials_path: "/path/to/service-account.json" # Optional
},
priority=10,
readonly=False
)
print(f"✓ GCS backend mounted at /archives (ID: {gcs_mount_id})")
# Add second GCS mount for shared datasets (read-only)
dataset_mount_id = nx.mount_manager.add_mount(
mount_point="/datasets",
backend_type="gcs",
backend_config={
"bucket": "company-ml-datasets",
"project_id": "my-gcp-project"
},
priority=20,
readonly=True # Prevent accidental modifications
)
print(f"✓ Dataset backend mounted at /datasets (ID: {dataset_mount_id})")
Step 3: List Available Mounts¶
# List all mounts
mounts = nx.list_mounts()
for mount in mounts:
print(f"📂 {mount.mount_point}: {mount.backend_type} "
f"(priority={mount.priority}, readonly={mount.readonly})")
# Output:
# 📂 /workspace: local (priority=0, readonly=False)
# 📂 /archives: gcs (priority=10, readonly=False)
# 📂 /datasets: gcs (priority=20, readonly=True)
Step 4: Write to Different Backends¶
The same API works across all backends - Nexus routes automatically:
# Write to default local backend
nx.write("/workspace/active-task.txt", b"Processing...")
print("✓ Written to: Local backend")
# Write to GCS archive (durable, cost-effective)
nx.write("/archives/2024/report.pdf", b"PDF content...")
print("✓ Written to: GCS (gs://my-nexus-archives/...)")
# Try to write to read-only dataset (will fail)
try:
nx.write("/datasets/new-data.csv", b"data")
except Exception as e:
print(f"✗ Cannot write to read-only mount: {e}")
# Read works from any backend
data = nx.read("/datasets/reference/model-weights.bin")
print(f"✓ Read {len(data)} bytes from GCS dataset")
Key Insight: Your code doesn't know or care where data is stored. Nexus handles routing automatically based on path.
Step 5: Path Routing and Priority¶
Understanding how Nexus routes paths to backends:
# Example: Add overlapping mount with higher priority
nx.mount_manager.add_mount(
mount_point="/workspace/shared",
backend_type="gcs",
backend_config={"bucket": "team-shared", "project_id": "my-project"},
priority=10 # Higher than default /workspace (priority=0)
)
# Routing examples:
nx.write("/workspace/file.txt", b"data")
# → Matches: /workspace (local, priority=0)
# ✓ Routed to: local backend
nx.write("/workspace/shared/team-doc.txt", b"data")
# → Matches: /workspace (local, priority=0)
# /workspace/shared (gcs, priority=10)
# ✓ Routed to: gcs backend (longer prefix + higher priority)
# Get routing information for a path
mount = nx.get_mount_info("/workspace/shared")
if mount:
print(f"Path /workspace/shared routes to: {mount.backend_type}")
Routing Algorithm:
1. Find all mounts with a matching path prefix
2. Sort by: priority (DESC) → prefix length (DESC)
3. Return the first match
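As a rough illustration of that algorithm (not the actual Nexus internals), the resolution step can be sketched in plain Python:
# Illustrative sketch of prefix routing - not the real Nexus implementation
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Mount:
    mount_point: str
    backend_type: str
    priority: int = 0

def resolve_mount(path: str, mounts: List[Mount]) -> Optional[Mount]:
    """Prefer higher priority, then the longest (most specific) prefix."""
    candidates = [
        m for m in mounts
        if path == m.mount_point or path.startswith(m.mount_point.rstrip("/") + "/")
    ]
    if not candidates:
        return None
    candidates.sort(key=lambda m: (m.priority, len(m.mount_point)), reverse=True)
    return candidates[0]

mounts = [Mount("/workspace", "local", 0), Mount("/workspace/shared", "gcs", 10)]
print(resolve_mount("/workspace/shared/team-doc.txt", mounts).backend_type)  # gcs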
Part 3: Production Patterns¶
Pattern 1: Hot/Cold Data Tiering¶
Optimize costs by keeping frequently accessed data on local storage and archiving older data to the cloud:
from datetime import datetime, timedelta
def tier_old_files(nx, hot_path: str, cold_path: str, days_threshold: int = 30):
"""Move old files from hot (local) to cold (GCS) storage"""
cutoff_date = datetime.now() - timedelta(days=days_threshold)
files_moved = 0
# List files in hot storage
for entry in nx.list(hot_path, recursive=True):
if not entry['is_directory']:
file_path = entry['path']
# Get file metadata
metadata = nx.stat(file_path)
created_at = datetime.fromisoformat(metadata['created_at'])
# Move to cold storage if older than threshold
if created_at < cutoff_date:
# Construct cold storage path
rel_path = file_path.replace(hot_path, "")
cold_file = f"{cold_path}{rel_path}"
# Copy to cold storage
content = nx.read(file_path)
nx.write(cold_file, content)
# Delete from hot storage
nx.remove(file_path)
files_moved += 1
print(f"📦 Archived: {file_path} → {cold_file}")
print(f"✓ Moved {files_moved} files to cold storage")
return files_moved
# Example: Archive old workspace files to GCS
tier_old_files(nx,
hot_path="/workspace/processed",
cold_path="/archives/workspace-archive",
days_threshold=7 # Move files older than 7 days
)
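In production this would run as a scheduled job rather than inline; for example, a crontab entry that invokes a small wrapper script nightly (script path and log location are placeholders):
# Run the tiering job every night at 02:00 (placeholder paths)
0 2 * * * /usr/bin/python3 /opt/nexus-jobs/tier_old_files.py >> /var/log/nexus-tiering.log 2>&1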
Pattern 2: Content Deduplication Across Backends¶
Nexus automatically deduplicates content within each backend using SHA-256 hashing:
# Same content written to different paths
content = b"Shared configuration data"
# Write to local backend
nx.write("/workspace/config.yaml", content)
nx.write("/workspace/backup/config.yaml", content) # Deduplicated!
# Write to GCS backend
nx.write("/archives/configs/v1.yaml", content)
nx.write("/archives/configs/v2.yaml", content) # Deduplicated!
# Each backend stores content ONCE, both paths reference same content hash
print("✓ 4 paths, 2 backends, only 2 physical copies (one per backend)")
# Note: Deduplication is per-backend (local + GCS each store one copy)
# This is by design - each backend is independent
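To see why identical bytes collapse to a single stored object per backend, here is a minimal, generic sketch of SHA-256 content addressing (illustrative only, not the actual Nexus storage layout):
# Minimal content-addressed store: identical bytes hash to the same key,
# so they are stored once no matter how many paths reference them.
import hashlib

store = {}   # content hash -> bytes (one "backend")
paths = {}   # logical path -> content hash

def cas_write(path, content):
    digest = hashlib.sha256(content).hexdigest()
    store.setdefault(digest, content)  # only stored if not already present
    paths[path] = digest

content = b"Shared configuration data"
cas_write("/workspace/config.yaml", content)
cas_write("/workspace/backup/config.yaml", content)
print(f"{len(paths)} paths, {len(store)} physical copy")  # 2 paths, 1 physical copy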
Pattern 3: Multi-Tenant Storage Isolation¶
Each tenant gets their own backend mount:
def provision_tenant_storage(nx, tenant_id: str, gcs_bucket: str):
"""Provision isolated storage for a new tenant"""
# Create tenant-specific mount
mount_id = nx.mount_manager.add_mount(
mount_point=f"/tenants/{tenant_id}",
backend_type="gcs",
backend_config={
"bucket": gcs_bucket,
"prefix": f"tenant-{tenant_id}/", # Bucket prefix isolation
"project_id": "my-gcp-project"
},
priority=20
)
# Create tenant workspace structure
nx.mkdir(f"/tenants/{tenant_id}/documents")
nx.mkdir(f"/tenants/{tenant_id}/uploads")
nx.mkdir(f"/tenants/{tenant_id}/exports")
print(f"✓ Tenant {tenant_id} provisioned with mount {mount_id}")
return mount_id
# Provision storage for new tenants
provision_tenant_storage(nx, "acme-corp", "nexus-prod-storage")
provision_tenant_storage(nx, "globex", "nexus-prod-storage")
# Each tenant's data is isolated
nx.write("/tenants/acme-corp/documents/contract.pdf", b"...")
nx.write("/tenants/globex/documents/proposal.pdf", b"...")
# Tenants CANNOT access each other's data (enforced by path isolation + ReBAC)
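If you want an application-level guard on top of the mount isolation, a simple prefix check is enough; assert_tenant_path below is a hypothetical helper, not a Nexus API:
import posixpath

def assert_tenant_path(tenant_id: str, path: str) -> str:
    """Reject paths that escape the tenant's mount point (hypothetical helper)."""
    root = f"/tenants/{tenant_id}"
    normalized = posixpath.normpath(path)  # collapses ".." segments
    if normalized != root and not normalized.startswith(root + "/"):
        raise PermissionError(f"{path} is outside {root}")
    return normalized

assert_tenant_path("acme-corp", "/tenants/acme-corp/documents/contract.pdf")  # ok
# assert_tenant_path("acme-corp", "/tenants/globex/documents/proposal.pdf")   # raises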
Pattern 4: Hybrid Cloud Bursting¶
Start with local storage, burst to cloud when capacity is reached:
def smart_write(nx, path: str, content: bytes, local_limit_gb: float = 100.0):
"""Write to local if space available, otherwise use cloud"""
# Check local storage usage
local_usage = get_local_storage_usage() # Custom function
if local_usage < local_limit_gb:
# Write to fast local storage
local_path = f"/workspace{path}"
nx.write(local_path, content)
print(f"✓ Written to local: {local_path}")
else:
# Burst to cloud storage
cloud_path = f"/archives{path}"
nx.write(cloud_path, content)
print(f"☁️ Burst to cloud: {cloud_path}")
def get_local_storage_usage() -> float:
"""Get local storage usage in GB"""
import shutil
stat = shutil.disk_usage("/var/nexus/data")
return (stat.total - stat.free) / (1024**3)
# Usage
for i in range(1000):
smart_write(nx, f"/data/file-{i}.bin", b"x" * 1024 * 1024) # 1MB files
Part 4: GCS Backend Setup¶
Prerequisites¶
- GCS Bucket: Create a bucket in Google Cloud Console
- Service Account: Create a service account with the Storage Object Admin role
- Credentials: Download the JSON key file
Step 1: Create GCS Bucket¶
# Using gcloud CLI (or via console)
gsutil mb -l us-west1 gs://my-nexus-storage
gsutil mb -l us-west1 gs://my-nexus-archives
# Set lifecycle policy for cost optimization (optional)
gsutil lifecycle set lifecycle-policy.json gs://my-nexus-archives
Example lifecycle-policy.json:
{
"lifecycle": {
"rule": [
{
"action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
"condition": {"age": 30, "matchesStorageClass": ["STANDARD"]}
},
{
"action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
"condition": {"age": 90, "matchesStorageClass": ["NEARLINE"]}
}
]
}
}
Step 2: Configure Service Account¶
# Set credentials via environment variable (recommended for server)
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
# Then add mount via Python API (shown in Part 2)
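If you prefer not to rely on the environment variable, the optional credentials_path field hinted at in Part 2 can be passed directly in backend_config (assuming your Nexus version supports it):
# Explicit credentials (assumes the GCS backend honors "credentials_path")
nx.mount_manager.add_mount(
    mount_point="/archives",
    backend_type="gcs",
    backend_config={
        "bucket": "my-nexus-archives",
        "project_id": "my-gcp-project",
        "credentials_path": "/path/to/service-account.json",
    },
    priority=10
)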
Step 3: Test GCS Connection¶
import nexus
nx = nexus.connect(config={
"url": "http://localhost:8765",
"api_key": "nxk_abc123..."
})
# Add GCS mount
nx.mount_manager.add_mount(
mount_point="/archives",
backend_type="gcs",
backend_config={
"bucket": "my-nexus-archives",
"project_id": "my-gcp-project"
},
priority=10
)
# Test write to GCS
test_path = "/archives/test/hello.txt"
nx.write(test_path, b"Hello from Nexus!")
print(f"✓ Written to GCS: {test_path}")
# Test read from GCS
content = nx.read(test_path)
print(f"✓ Read from GCS: {content.decode()}")
# Verify in GCS bucket
# gsutil ls gs://my-nexus-archives/cas/
# Should see content-addressed files
Part 5: Local-Only Multi-Backend (No GCS)¶
If you don't have GCS set up, you can still learn multi-backend concepts with multiple local backends:
import nexus
# Start server with default local backend
# nexus serve --auth-type database --init --port 8765
nx = nexus.connect(config={
"url": "http://localhost:8765",
"api_key": "nxk_abc123..."
})
# Add multiple local backends (simulating different storage tiers)
# Fast SSD storage
nx.mount_manager.add_mount(
mount_point="/fast",
backend_type="local",
backend_config={"data_dir": "/tmp/nexus/fast-ssd"},
priority=20
)
# Slow but large storage (simulating archive)
nx.mount_manager.add_mount(
mount_point="/archive",
backend_type="local",
backend_config={"data_dir": "/tmp/nexus/archive-hdd"},
priority=10
)
# Temporary scratch space
nx.mount_manager.add_mount(
mount_point="/scratch",
backend_type="local",
backend_config={"data_dir": "/tmp/nexus/scratch"},
priority=5
)
# Use exactly the same API with multiple local backends
nx.write("/fast/active-model.bin", model_data)
nx.write("/archive/2024-q1-data.tar.gz", archive_data)
nx.write("/scratch/temp-results.json", temp_data)
# Same multi-backend patterns work with all local storage!
Common Patterns Summary¶
| Pattern | Use Case | Implementation |
|---|---|---|
| Hot/Cold Tiering | Cost optimization | Local for recent files, GCS for archives |
| Multi-Region | Low latency | Regional GCS buckets, route by user location |
| Hybrid Burst | Capacity management | Local until full, then cloud |
| Tenant Isolation | Multi-tenancy | Separate GCS bucket prefix per tenant |
| Read-Only Datasets | Shared resources | Mount GCS bucket as readonly |
| Temporary Mounts | Job-specific storage | Dynamic mount + cleanup after job |
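The Temporary Mounts row can be wrapped in a context manager; note that the cleanup call (remove_mount) below is a hypothetical name - check your Nexus version for the actual unmount API:
from contextlib import contextmanager

@contextmanager
def temporary_mount(nx, mount_point, backend_type, backend_config, priority=50):
    """Mount a backend for the duration of a job, then clean it up."""
    mount_id = nx.mount_manager.add_mount(
        mount_point=mount_point,
        backend_type=backend_type,
        backend_config=backend_config,
        priority=priority,
    )
    try:
        yield mount_id
    finally:
        nx.mount_manager.remove_mount(mount_id)  # hypothetical cleanup call

# Example: job-scoped scratch space
# with temporary_mount(nx, "/jobs/job-42", "gcs",
#                      {"bucket": "job-scratch", "project_id": "my-gcp-project"}):
#     nx.write("/jobs/job-42/intermediate.parquet", b"...")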
Troubleshooting¶
Issue: "Backend not found for path /archives/file.txt"¶
Cause: Mount not configured.
Solution:
# Verify mounts are configured
mounts = nx.list_mounts()
for m in mounts:
print(f"{m.mount_point}: {m.backend_type}")
# Add missing mount
nx.mount_manager.add_mount(
mount_point="/archives",
backend_type="gcs",
backend_config={"bucket": "my-bucket", "project_id": "my-project"},
priority=10
)
Issue: GCS authentication errors¶
Cause: Invalid or missing credentials.
Solution:
# Verify credentials file exists and is valid
cat /path/to/service-account.json
# Test gcloud authentication
gcloud auth application-default login
# Or set explicit path
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
Issue: Write fails to GCS mount¶
Cause: Insufficient permissions on service account.
Solution:
# Grant Storage Object Admin role
gcloud projects add-iam-policy-binding PROJECT_ID \
--member="serviceAccount:SERVICE_ACCOUNT_EMAIL" \
--role="roles/storage.objectAdmin"
Issue: Slow GCS operations¶
Cause: Network latency or bucket location mismatch.
Solution:
- Create the bucket in the same region as your server
- Use regional buckets, not multi-regional
- Consider adding a local caching layer (see the sketch below)
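For the caching suggestion, a very small read-through cache in front of the client could look like this (illustrative only; the cache directory is a placeholder, and there is no invalidation, so it suits immutable or archival paths):
# Minimal read-through cache: serve repeated reads from local disk instead of GCS.
import hashlib
from pathlib import Path

CACHE_DIR = Path("/var/cache/nexus-reads")  # placeholder location

def cached_read(nx, path: str) -> bytes:
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    cache_file = CACHE_DIR / hashlib.sha256(path.encode()).hexdigest()
    if cache_file.exists():
        return cache_file.read_bytes()  # cache hit - no backend round trip
    content = nx.read(path)             # cache miss - fetch from the backend
    cache_file.write_bytes(content)
    return content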
What's Next?¶
You've mastered multi-backend storage! Here's how to continue:
Immediate Next Steps¶
- Multi-Tenant SaaS - Build production multi-tenant apps with backend isolation
- Team Collaboration - Multi-user permissions with shared storage
- Workflow Automation - Automate data tiering with workflows
Production Checklist¶
Before deploying multi-backend setup to production:
- Configure authentication and API keys
- Set up GCS service account with minimal required permissions
- Test failover scenarios (backend unavailable, network issues)
- Implement monitoring for backend health and latency
- Configure lifecycle policies for cost optimization
- Document backend topology and routing rules
- Set up backup strategy for each backend
- Test disaster recovery procedures
Advanced Topics¶
- Custom Backends: Implement the Backend interface for S3, Azure, PostgreSQL
- Backend Replication: Sync data across multiple backends for redundancy
- Caching Strategies: Multi-layer caching (memory → local → cloud)
- Backend Migration: Move data between backends without downtime (a rough sketch follows)
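A rough migration sketch using only the read/write/list calls shown earlier (no dedicated migration API assumed); a real migration would add verification, retries, and an explicit cut-over step:
def migrate_prefix(nx, src_prefix: str, dst_prefix: str) -> int:
    """Copy every file under src_prefix to dst_prefix; deleting the source
    is left to a separate, explicit step."""
    copied = 0
    for entry in nx.list(src_prefix, recursive=True):
        if entry['is_directory']:
            continue
        src_path = entry['path']
        dst_path = dst_prefix + src_path[len(src_prefix):]
        content = nx.read(src_path)
        nx.write(dst_path, content)
        assert len(nx.read(dst_path)) == len(content)  # cheap sanity check
        copied += 1
    return copied

# Example: move archived data into a new backend mounted at /archives-v2
# migrate_prefix(nx, "/archives", "/archives-v2")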
Complete Example: Production Setup¶
"""
Production multi-backend setup with:
- Local storage for hot data
- GCS for durable archival
- Automatic tiering
- Multi-tenant isolation
"""
import nexus
from datetime import datetime, timedelta
# Connect to server
nx = nexus.connect(config={
"url": "https://nexus.company.com",
"api_key": "nxk_prod_key_..."
})
# Add GCS archive backend
nx.mount_manager.add_mount(
mount_point="/archives",
backend_type="gcs",
backend_config={
"bucket": "company-archives",
"project_id": "prod-project"
},
priority=10
)
# Add GCS cold storage
nx.mount_manager.add_mount(
mount_point="/cold",
backend_type="gcs",
backend_config={
"bucket": "company-coldline",
"project_id": "prod-project"
},
priority=5
)
class StorageManager:
def __init__(self, nx):
self.nx = nx
self.hot_path = "/workspace"
self.warm_path = "/archives"
self.cold_path = "/cold"
def write_with_tiering(self, path: str, content: bytes):
"""Write to hot storage, automatically tier later"""
hot_file = f"{self.hot_path}{path}"
self.nx.write(hot_file, content)
# Tag for future tiering
self.nx.set_metadata(hot_file, {
"tier": "hot",
"accessed_at": datetime.now().isoformat()
})
return hot_file
def tier_data(self):
"""Background job: Tier old data from hot → warm → cold"""
now = datetime.now()
# Hot → Warm (after 7 days)
for entry in self.nx.list(self.hot_path, recursive=True):
if entry['is_directory']:
continue
meta = self.nx.get_metadata(entry['path'])
accessed = datetime.fromisoformat(meta.get('accessed_at', now.isoformat()))
if now - accessed > timedelta(days=7):
self._move_to_tier(entry['path'], self.hot_path, self.warm_path, "warm")
# Warm → Cold (after 90 days)
for entry in self.nx.list(self.warm_path, recursive=True):
if entry['is_directory']:
continue
meta = self.nx.get_metadata(entry['path'])
accessed = datetime.fromisoformat(meta.get('accessed_at', now.isoformat()))
if now - accessed > timedelta(days=90):
self._move_to_tier(entry['path'], self.warm_path, self.cold_path, "cold")
def _move_to_tier(self, path: str, from_tier: str, to_tier: str, tier_name: str):
"""Move file between tiers"""
rel_path = path.replace(from_tier, "")
new_path = f"{to_tier}{rel_path}"
# Copy to new tier
content = self.nx.read(path)
self.nx.write(new_path, content)
# Update metadata
self.nx.set_metadata(new_path, {
"tier": tier_name,
"accessed_at": datetime.now().isoformat(),
"original_path": path
})
# Remove from old tier
self.nx.remove(path)
print(f"✓ Tiered: {path} → {new_path}")
# Usage
storage = StorageManager(nx)
# Application writes always go to hot storage
storage.write_with_tiering("/data/model-output.bin", model_data)
storage.write_with_tiering("/data/user-upload.pdf", pdf_content)
# Run tiering job (would be scheduled via cron/workflow)
storage.tier_data()
print("✓ Production storage manager running")
print(" - Hot tier: Local (/workspace)")
print(" - Warm tier: GCS Standard (/archives)")
print(" - Cold tier: GCS Coldline (/cold)")
Key Takeaways¶
- Dynamic Mount Management: Add backends at runtime via mount_manager.add_mount()
- Transparent Routing: Path-based routing is handled automatically
- Cost Optimization: Right storage tier for each use case
- No Vendor Lock-in: Switch backends without code changes
- Production-Ready: Multi-region, multi-tenant, automatic tiering
Next: Multi-Tenant SaaS →