Multi-Backend Storage¶
⏱️ Time: 20 minutes | 📊 Difficulty: Intermediate | 🎯 Goal: Master hybrid storage with local + cloud backends
What You'll Learn¶
By the end of this guide, you'll understand how to:
- ✅ Start Nexus server with local backend
- ✅ Add GCS backend dynamically at runtime
- ✅ Route different paths to different backends
- ✅ Implement hybrid storage patterns (hot/cold data)
- ✅ Optimize costs with tiered storage
- ✅ Build production-ready multi-cloud architectures
Prerequisites¶
- Nexus installed (pip install nexus-ai-fs)
- GCS bucket and service account credentials (optional but recommended)
- Basic understanding of Nexus file operations
Why Multi-Backend Storage?¶
Traditional approaches force you to choose ONE storage system:
❌ Traditional Problems:
- Store everything in expensive cloud storage OR slow local disk
- Rewrite code to switch between storage providers
- Manual migration when requirements change
- Vendor lock-in
✅ Nexus Multi-Backend Solution:
- Mix-and-match: Local for hot data, GCS for archives
- Transparent routing: Same API, different backends
- Cost optimization: Right storage tier for each use case
- Zero vendor lock-in: Switch backends without code changes
Real-World Use Cases¶
| Use Case | Storage Strategy |
|---|---|
| AI Training | /datasets/raw → GCS (durable); /cache/processed → Local (fast) |
| Document Processing | /inbox → Local (temporary); /archive → GCS (long-term) |
| Multi-Region Apps | /us-west → GCS us-west1; /eu → GCS europe-west1 |
| Cost Optimization | /hot → Local SSD; /warm → Local HDD; /cold → GCS Coldline |
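For instance, the AI Training row could be expressed as two mounts once you have a connected client nx (set up in Part 2); the bucket, project, and directory names below are placeholders:
# Sketch: map the "AI Training" row to two mounts (placeholder names throughout)
nx.mount_manager.add_mount(
    mount_point="/datasets/raw",
    backend_type="gcs",
    backend_config={"bucket": "training-datasets", "project_id": "my-gcp-project"},
    priority=20,
    readonly=True  # shared raw data stays immutable
)
nx.mount_manager.add_mount(
    mount_point="/cache/processed",
    backend_type="local",
    backend_config={"data_dir": "/var/nexus/cache"},
    priority=10
)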
Part 1: Server Setup¶
Step 1: Start Nexus Server¶
Start Nexus server with database authentication:
# First-time setup: Initialize server with admin user
nexus serve --auth-type database --init --port 8765
# Output:
# ✓ Admin user created: admin
# ✓ API key: nxk_abc123def456...
# Save this API key - you'll need it for client connections!
Save the API key - this is the only time it will be displayed!
For subsequent starts (after initialization), run the same command without the --init flag:
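# Subsequent starts: same flags as the first run, minus --init
nexus serve --auth-type database --port 8765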
Step 2: Create Admin User and API Key¶
# Create admin user (if needed)
nexus admin create-user admin \
--name "Admin User" \
--email admin@company.com
# Generate API key for admin
nexus admin create-user-key admin
# Output: nxk_abc123def456...
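With the key in hand, export the connection settings that the client in Part 2 will pick up (values below are placeholders):
# Point clients at the server (placeholder values)
export NEXUS_URL=http://localhost:8765
export NEXUS_API_KEY=nxk_abc123def456...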
Part 2: Working with Multiple Backends¶
Step 1: Connect Client to Server¶
Use the recommended nexus.connect() pattern which auto-detects remote mode:
import nexus
# Connect to server (auto-detects remote mode from NEXUS_URL)
# export NEXUS_URL=http://localhost:8765
# export NEXUS_API_KEY=nxk_abc123...
nx = nexus.connect()
# Or connect explicitly
nx = nexus.connect(config={
"url": "http://localhost:8765",
"api_key": "nxk_abc123..."
})
Alternatively, use RemoteNexusFS directly:
from nexus import RemoteNexusFS
# Direct remote connection
nx = RemoteNexusFS(
server_url="http://localhost:8765",
api_key="nxk_abc123..."
)
Step 2: Add GCS Backend Dynamically¶
Now add a GCS backend for archival storage:
# Add GCS mount for archives
gcs_mount_id = nx.mount_manager.add_mount(
mount_point="/archives",
backend_type="gcs",
backend_config={
"bucket": "my-nexus-archives",
"project_id": "my-gcp-project"
# credentials_path: "/path/to/service-account.json" # Optional
},
priority=10,
readonly=False
)
print(f"✓ GCS backend mounted at /archives (ID: {gcs_mount_id})")
# Add second GCS mount for shared datasets (read-only)
dataset_mount_id = nx.mount_manager.add_mount(
mount_point="/datasets",
backend_type="gcs",
backend_config={
"bucket": "company-ml-datasets",
"project_id": "my-gcp-project"
},
priority=20,
readonly=True # Prevent accidental modifications
)
print(f"✓ Dataset backend mounted at /datasets (ID: {dataset_mount_id})")
Step 3: List Available Mounts¶
# List all mounts
mounts = nx.list_mounts()
for mount in mounts:
print(f"📂 {mount.mount_point}: {mount.backend_type} "
f"(priority={mount.priority}, readonly={mount.readonly})")
# Output:
# 📂 /workspace: local (priority=0, readonly=False)
# 📂 /archives: gcs (priority=10, readonly=False)
# 📂 /datasets: gcs (priority=20, readonly=True)
Step 4: Write to Different Backends¶
The same API works across all backends - Nexus routes automatically:
# Write to default local backend
nx.write("/workspace/active-task.txt", b"Processing...")
print("✓ Written to: Local backend")
# Write to GCS archive (durable, cost-effective)
nx.write("/archives/2024/report.pdf", b"PDF content...")
print("✓ Written to: GCS (gs://my-nexus-archives/...)")
# Try to write to read-only dataset (will fail)
try:
nx.write("/datasets/new-data.csv", b"data")
except Exception as e:
print(f"✗ Cannot write to read-only mount: {e}")
# Read works from any backend
data = nx.read("/datasets/reference/model-weights.bin")
print(f"✓ Read {len(data)} bytes from GCS dataset")
Key Insight: Your code doesn't know or care where data is stored. Nexus handles routing automatically based on path.
Step 5: Path Routing and Priority¶
Understanding how Nexus routes paths to backends:
# Example: Add overlapping mount with higher priority
nx.mount_manager.add_mount(
mount_point="/workspace/shared",
backend_type="gcs",
backend_config={"bucket": "team-shared", "project_id": "my-project"},
priority=10 # Higher than default /workspace (priority=0)
)
# Routing examples:
nx.write("/workspace/file.txt", b"data")
# → Matches: /workspace (local, priority=0)
# ✓ Routed to: local backend
nx.write("/workspace/shared/team-doc.txt", b"data")
# → Matches: /workspace (local, priority=0)
# /workspace/shared (gcs, priority=10)
# ✓ Routed to: gcs backend (longer prefix + higher priority)
# Get routing information for a path
mount = nx.get_mount_info("/workspace/shared")
if mount:
print(f"Path /workspace/shared routes to: {mount.backend_type}")
Routing Algorithm:
1. Find all mounts with a matching path prefix
2. Sort by: priority (DESC) → prefix length (DESC)
3. Return the first match
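As a rough illustration of that algorithm (not the actual Nexus internals), the resolution step can be sketched in plain Python:
# Illustrative sketch of prefix routing - not the real Nexus implementation
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Mount:
    mount_point: str
    backend_type: str
    priority: int = 0

def resolve_mount(path: str, mounts: List[Mount]) -> Optional[Mount]:
    """Prefer higher priority, then the longest (most specific) prefix."""
    candidates = [
        m for m in mounts
        if path == m.mount_point or path.startswith(m.mount_point.rstrip("/") + "/")
    ]
    if not candidates:
        return None
    candidates.sort(key=lambda m: (m.priority, len(m.mount_point)), reverse=True)
    return candidates[0]

mounts = [Mount("/workspace", "local", 0), Mount("/workspace/shared", "gcs", 10)]
print(resolve_mount("/workspace/shared/team-doc.txt", mounts).backend_type)  # gcs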
Part 3: Production Patterns¶
Pattern 1: Hot/Cold Data Tiering¶
Optimize costs by keeping frequently accessed data on local storage and archiving older data to the cloud:
from datetime import datetime, timedelta
def tier_old_files(nx, hot_path: str, cold_path: str, days_threshold: int = 30):
"""Move old files from hot (local) to cold (GCS) storage"""
cutoff_date = datetime.now() - timedelta(days=days_threshold)
files_moved = 0
# List files in hot storage
for entry in nx.list(hot_path, recursive=True):
if not entry['is_directory']:
file_path = entry['path']
# Get file metadata
metadata = nx.stat(file_path)
created_at = datetime.fromisoformat(metadata['created_at'])
# Move to cold storage if older than threshold
if created_at < cutoff_date:
# Construct cold storage path
rel_path = file_path.replace(hot_path, "")
cold_file = f"{cold_path}{rel_path}"
# Copy to cold storage
content = nx.read(file_path)
nx.write(cold_file, content)
# Delete from hot storage
nx.remove(file_path)
files_moved += 1
print(f"📦 Archived: {file_path} → {cold_file}")
print(f"✓ Moved {files_moved} files to cold storage")
return files_moved
# Example: Archive old workspace files to GCS
tier_old_files(nx,
hot_path="/workspace/processed",
cold_path="/archives/workspace-archive",
days_threshold=7 # Move files older than 7 days
)
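In production this would run as a scheduled job rather than inline; for example, a crontab entry that invokes a small wrapper script nightly (script path and log location are placeholders):
# Run the tiering job every night at 02:00 (placeholder paths)
0 2 * * * /usr/bin/python3 /opt/nexus-jobs/tier_old_files.py >> /var/log/nexus-tiering.log 2>&1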
Pattern 2: Content Deduplication Across Backends¶
Nexus automatically deduplicates content within each backend using SHA-256 hashing:
# Same content written to different paths
content = b"Shared configuration data"
# Write to local backend
nx.write("/workspace/config.yaml", content)
nx.write("/workspace/backup/config.yaml", content) # Deduplicated!
# Write to GCS backend
nx.write("/archives/configs/v1.yaml", content)
nx.write("/archives/configs/v2.yaml", content) # Deduplicated!
# Each backend stores content ONCE, both paths reference same content hash
print("✓ 4 paths, 2 backends, only 2 physical copies (one per backend)")
# Note: Deduplication is per-backend (local + GCS each store one copy)
# This is by design - each backend is independent
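To see why identical bytes collapse to a single stored object per backend, here is a minimal, generic sketch of SHA-256 content addressing (illustrative only, not the actual Nexus storage layout):
# Minimal content-addressed store: identical bytes hash to the same key,
# so they are stored once no matter how many paths reference them.
import hashlib

store = {}   # content hash -> bytes (one "backend")
paths = {}   # logical path -> content hash

def cas_write(path, content):
    digest = hashlib.sha256(content).hexdigest()
    store.setdefault(digest, content)  # only stored if not already present
    paths[path] = digest

content = b"Shared configuration data"
cas_write("/workspace/config.yaml", content)
cas_write("/workspace/backup/config.yaml", content)
print(f"{len(paths)} paths, {len(store)} physical copy")  # 2 paths, 1 physical copy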
Pattern 3: Multi-Tenant Storage Isolation¶
Each tenant gets their own backend mount:
def provision_tenant_storage(nx, tenant_id: str, gcs_bucket: str):
"""Provision isolated storage for a new tenant"""
# Create tenant-specific mount
mount_id = nx.mount_manager.add_mount(
mount_point=f"/tenants/{tenant_id}",
backend_type="gcs",
backend_config={
"bucket": gcs_bucket,
"prefix": f"tenant-{tenant_id}/", # Bucket prefix isolation
"project_id": "my-gcp-project"
},
priority=20
)
# Create tenant workspace structure
nx.mkdir(f"/tenants/{tenant_id}/documents")
nx.mkdir(f"/tenants/{tenant_id}/uploads")
nx.mkdir(f"/tenants/{tenant_id}/exports")
print(f"✓ Tenant {tenant_id} provisioned with mount {mount_id}")
return mount_id
# Provision storage for new tenants
provision_tenant_storage(nx, "acme-corp", "nexus-prod-storage")
provision_tenant_storage(nx, "globex", "nexus-prod-storage")
# Each tenant's data is isolated
nx.write("/tenants/acme-corp/documents/contract.pdf", b"...")
nx.write("/tenants/globex/documents/proposal.pdf", b"...")
# Tenants CANNOT access each other's data (enforced by path isolation + ReBAC)
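If you want an application-level guard on top of the mount isolation, a simple prefix check is enough; assert_tenant_path below is a hypothetical helper, not a Nexus API:
import posixpath

def assert_tenant_path(tenant_id: str, path: str) -> str:
    """Reject paths that escape the tenant's mount point (hypothetical helper)."""
    root = f"/tenants/{tenant_id}"
    normalized = posixpath.normpath(path)  # collapses ".." segments
    if normalized != root and not normalized.startswith(root + "/"):
        raise PermissionError(f"{path} is outside {root}")
    return normalized

assert_tenant_path("acme-corp", "/tenants/acme-corp/documents/contract.pdf")  # ok
# assert_tenant_path("acme-corp", "/tenants/globex/documents/proposal.pdf")   # raises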
Pattern 4: Hybrid Cloud Bursting¶
Start with local storage, burst to cloud when capacity is reached:
def smart_write(nx, path: str, content: bytes, local_limit_gb: float = 100.0):
"""Write to local if space available, otherwise use cloud"""
# Check local storage usage
local_usage = get_local_storage_usage() # Custom function
if local_usage < local_limit_gb:
# Write to fast local storage
local_path = f"/workspace{path}"
nx.write(local_path, content)
print(f"✓ Written to local: {local_path}")
else:
# Burst to cloud storage
cloud_path = f"/archives{path}"
nx.write(cloud_path, content)
print(f"☁️ Burst to cloud: {cloud_path}")
def get_local_storage_usage() -> float:
"""Get local storage usage in GB"""
import shutil
stat = shutil.disk_usage("/var/nexus/data")
return (stat.total - stat.free) / (1024**3)
# Usage
for i in range(1000):
smart_write(nx, f"/data/file-{i}.bin", b"x" * 1024 * 1024) # 1MB files
Part 4: GCS Backend Setup¶
Prerequisites¶
- GCS Bucket: Create a bucket in Google Cloud Console
- Service Account: Create a service account with the Storage Object Admin role
- Credentials: Download the JSON key file
Step 1: Create GCS Bucket¶
# Using gcloud CLI (or via console)
gsutil mb -l us-west1 gs://my-nexus-storage
gsutil mb -l us-west1 gs://my-nexus-archives
# Set lifecycle policy for cost optimization (optional)
gsutil lifecycle set lifecycle-policy.json gs://my-nexus-archives
Example lifecycle-policy.json:
{
"lifecycle": {
"rule": [
{
"action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
"condition": {"age": 30, "matchesStorageClass": ["STANDARD"]}
},
{
"action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
"condition": {"age": 90, "matchesStorageClass": ["NEARLINE"]}
}
]
}
}
Step 2: Configure Service Account¶
# Set credentials via environment variable (recommended for server)
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
# Then add mount via Python API (shown in Part 2)
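If you prefer not to rely on the environment variable, the optional credentials_path field hinted at in Part 2 can be passed directly in backend_config (assuming your Nexus version supports it):
# Explicit credentials (assumes the GCS backend honors "credentials_path")
nx.mount_manager.add_mount(
    mount_point="/archives",
    backend_type="gcs",
    backend_config={
        "bucket": "my-nexus-archives",
        "project_id": "my-gcp-project",
        "credentials_path": "/path/to/service-account.json",
    },
    priority=10
)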
Step 3: Test GCS Connection¶
import nexus
nx = nexus.connect(config={
"url": "http://localhost:8765",
"api_key": "nxk_abc123..."
})
# Add GCS mount
nx.mount_manager.add_mount(
mount_point="/archives",
backend_type="gcs",
backend_config={
"bucket": "my-nexus-archives",
"project_id": "my-gcp-project"
},
priority=10
)
# Test write to GCS
test_path = "/archives/test/hello.txt"
nx.write(test_path, b"Hello from Nexus!")
print(f"✓ Written to GCS: {test_path}")
# Test read from GCS
content = nx.read(test_path)
print(f"✓ Read from GCS: {content.decode()}")
# Verify in GCS bucket
# gsutil ls gs://my-nexus-archives/cas/
# Should see content-addressed files
Part 5: Local-Only Multi-Backend (No GCS)¶
If you don't have GCS set up, you can still learn multi-backend concepts with multiple local backends:
import nexus
# Start server with default local backend
# nexus serve --auth-type database --init --port 8765
nx = nexus.connect(config={
"url": "http://localhost:8765",
"api_key": "nxk_abc123..."
})
# Add multiple local backends (simulating different storage tiers)
# Fast SSD storage
nx.mount_manager.add_mount(
mount_point="/fast",
backend_type="local",
backend_config={"data_dir": "/tmp/nexus/fast-ssd"},
priority=20
)
# Slow but large storage (simulating archive)
nx.mount_manager.add_mount(
mount_point="/archive",
backend_type="local",
backend_config={"data_dir": "/tmp/nexus/archive-hdd"},
priority=10
)
# Temporary scratch space
nx.mount_manager.add_mount(
mount_point="/scratch",
backend_type="local",
backend_config={"data_dir": "/tmp/nexus/scratch"},
priority=5
)
# Use exactly the same API with multiple local backends
nx.write("/fast/active-model.bin", model_data)
nx.write("/archive/2024-q1-data.tar.gz", archive_data)
nx.write("/scratch/temp-results.json", temp_data)
# Same multi-backend patterns work with all local storage!
Common Patterns Summary¶
| Pattern | Use Case | Implementation |
|---|---|---|
| Hot/Cold Tiering | Cost optimization | Local for recent files, GCS for archives |
| Multi-Region | Low latency | Regional GCS buckets, route by user location |
| Hybrid Burst | Capacity management | Local until full, then cloud |
| Tenant Isolation | Multi-tenancy | Separate GCS bucket prefix per tenant |
| Read-Only Datasets | Shared resources | Mount GCS bucket as readonly |
| Temporary Mounts | Job-specific storage | Dynamic mount + cleanup after job |
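The Temporary Mounts row can be wrapped in a context manager; note that the cleanup call (remove_mount) below is a hypothetical name - check your Nexus version for the actual unmount API:
from contextlib import contextmanager

@contextmanager
def temporary_mount(nx, mount_point, backend_type, backend_config, priority=50):
    """Mount a backend for the duration of a job, then clean it up."""
    mount_id = nx.mount_manager.add_mount(
        mount_point=mount_point,
        backend_type=backend_type,
        backend_config=backend_config,
        priority=priority,
    )
    try:
        yield mount_id
    finally:
        nx.mount_manager.remove_mount(mount_id)  # hypothetical cleanup call

# Example: job-scoped scratch space
# with temporary_mount(nx, "/jobs/job-42", "gcs",
#                      {"bucket": "job-scratch", "project_id": "my-gcp-project"}):
#     nx.write("/jobs/job-42/intermediate.parquet", b"...")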
Troubleshooting¶
Issue: "Backend not found for path /archives/file.txt"¶
Cause: Mount not configured.
Solution:
# Verify mounts are configured
mounts = nx.list_mounts()
for m in mounts:
print(f"{m.mount_point}: {m.backend_type}")
# Add missing mount
nx.mount_manager.add_mount(
mount_point="/archives",
backend_type="gcs",
backend_config={"bucket": "my-bucket", "project_id": "my-project"},
priority=10
)
Issue: GCS authentication errors¶
Cause: Invalid or missing credentials.
Solution:
# Verify credentials file exists and is valid
cat /path/to/service-account.json
# Test gcloud authentication
gcloud auth application-default login
# Or set explicit path
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
Issue: Write fails to GCS mount¶
Cause: Insufficient permissions on service account.
Solution:
# Grant Storage Object Admin role
gcloud projects add-iam-policy-binding PROJECT_ID \
--member="serviceAccount:SERVICE_ACCOUNT_EMAIL" \
--role="roles/storage.objectAdmin"
Issue: Slow GCS operations¶
Cause: Network latency or bucket location mismatch.
Solution:
- Create the bucket in the same region as your server
- Use regional buckets, not multi-regional
- Consider adding a local caching layer (see the sketch below)
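For the caching suggestion, a very small read-through cache in front of the client could look like this (illustrative only; the cache directory is a placeholder, and there is no invalidation, so it suits immutable or archival paths):
# Minimal read-through cache: serve repeated reads from local disk instead of GCS.
import hashlib
from pathlib import Path

CACHE_DIR = Path("/var/cache/nexus-reads")  # placeholder location

def cached_read(nx, path: str) -> bytes:
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    cache_file = CACHE_DIR / hashlib.sha256(path.encode()).hexdigest()
    if cache_file.exists():
        return cache_file.read_bytes()  # cache hit - no backend round trip
    content = nx.read(path)             # cache miss - fetch from the backend
    cache_file.write_bytes(content)
    return content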
What's Next?¶
You've mastered multi-backend storage! Here's how to continue:
Immediate Next Steps¶
- Multi-Tenant SaaS - Build production multi-tenant apps with backend isolation
- Team Collaboration - Multi-user permissions with shared storage
- Workflow Automation - Automate data tiering with workflows
Production Checklist¶
Before deploying multi-backend setup to production:
- Configure authentication and API keys
- Set up GCS service account with minimal required permissions
- Test failover scenarios (backend unavailable, network issues)
- Implement monitoring for backend health and latency
- Configure lifecycle policies for cost optimization
- Document backend topology and routing rules
- Set up backup strategy for each backend
- Test disaster recovery procedures
Advanced Topics¶
- Custom Backends: Implement the Backend interface for S3, Azure, PostgreSQL
- Backend Replication: Sync data across multiple backends for redundancy
- Caching Strategies: Multi-layer caching (memory → local → cloud)
- Backend Migration: Move data between backends without downtime (a rough sketch follows)
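A rough migration sketch using only the read/write/list calls shown earlier (no dedicated migration API assumed); a real migration would add verification, retries, and an explicit cut-over step:
def migrate_prefix(nx, src_prefix: str, dst_prefix: str) -> int:
    """Copy every file under src_prefix to dst_prefix; deleting the source
    is left to a separate, explicit step."""
    copied = 0
    for entry in nx.list(src_prefix, recursive=True):
        if entry['is_directory']:
            continue
        src_path = entry['path']
        dst_path = dst_prefix + src_path[len(src_prefix):]
        content = nx.read(src_path)
        nx.write(dst_path, content)
        assert len(nx.read(dst_path)) == len(content)  # cheap sanity check
        copied += 1
    return copied

# Example: move archived data into a new backend mounted at /archives-v2
# migrate_prefix(nx, "/archives", "/archives-v2")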
Complete Example: Production Setup¶
"""
Production multi-backend setup with:
- Local storage for hot data
- GCS for durable archival
- Automatic tiering
- Multi-tenant isolation
"""
import nexus
from datetime import datetime, timedelta
# Connect to server
nx = nexus.connect(config={
"url": "https://nexus.company.com",
"api_key": "nxk_prod_key_..."
})
# Add GCS archive backend
nx.mount_manager.add_mount(
mount_point="/archives",
backend_type="gcs",
backend_config={
"bucket": "company-archives",
"project_id": "prod-project"
},
priority=10
)
# Add GCS cold storage
nx.mount_manager.add_mount(
mount_point="/cold",
backend_type="gcs",
backend_config={
"bucket": "company-coldline",
"project_id": "prod-project"
},
priority=5
)
class StorageManager:
def __init__(self, nx):
self.nx = nx
self.hot_path = "/workspace"
self.warm_path = "/archives"
self.cold_path = "/cold"
def write_with_tiering(self, path: str, content: bytes):
"""Write to hot storage, automatically tier later"""
hot_file = f"{self.hot_path}{path}"
self.nx.write(hot_file, content)
# Tag for future tiering
self.nx.set_metadata(hot_file, {
"tier": "hot",
"accessed_at": datetime.now().isoformat()
})
return hot_file
def tier_data(self):
"""Background job: Tier old data from hot → warm → cold"""
now = datetime.now()
# Hot → Warm (after 7 days)
for entry in self.nx.list(self.hot_path, recursive=True):
if entry['is_directory']:
continue
meta = self.nx.get_metadata(entry['path'])
accessed = datetime.fromisoformat(meta.get('accessed_at', now.isoformat()))
if now - accessed > timedelta(days=7):
self._move_to_tier(entry['path'], self.hot_path, self.warm_path, "warm")
# Warm → Cold (after 90 days)
for entry in self.nx.list(self.warm_path, recursive=True):
if entry['is_directory']:
continue
meta = self.nx.get_metadata(entry['path'])
accessed = datetime.fromisoformat(meta.get('accessed_at', now.isoformat()))
if now - accessed > timedelta(days=90):
self._move_to_tier(entry['path'], self.warm_path, self.cold_path, "cold")
def _move_to_tier(self, path: str, from_tier: str, to_tier: str, tier_name: str):
"""Move file between tiers"""
rel_path = path.replace(from_tier, "")
new_path = f"{to_tier}{rel_path}"
# Copy to new tier
content = self.nx.read(path)
self.nx.write(new_path, content)
# Update metadata
self.nx.set_metadata(new_path, {
"tier": tier_name,
"accessed_at": datetime.now().isoformat(),
"original_path": path
})
# Remove from old tier
self.nx.remove(path)
print(f"✓ Tiered: {path} → {new_path}")
# Usage
storage = StorageManager(nx)
# Application writes always go to hot storage
storage.write_with_tiering("/data/model-output.bin", model_data)
storage.write_with_tiering("/data/user-upload.pdf", pdf_content)
# Run tiering job (would be scheduled via cron/workflow)
storage.tier_data()
print("✓ Production storage manager running")
print(" - Hot tier: Local (/workspace)")
print(" - Warm tier: GCS Standard (/archives)")
print(" - Cold tier: GCS Coldline (/cold)")
Key Takeaways¶
- Dynamic Mount Management: Add backends at runtime via mount_manager.add_mount()
- Transparent Routing: Path-based routing is handled automatically
- Cost Optimization: Right storage tier for each use case
- No Vendor Lock-in: Switch backends without code changes
- Production-Ready: Multi-region, multi-tenant, automatic tiering
Next: Multi-Tenant SaaS →