Nexus Architecture

Version: 0.6.0 | Last Updated: 2025-10-26

Purpose: High-level architecture overview of Nexus, an AI-native distributed filesystem with advanced features for AI agent workflows.

Table of Contents

Overview
System Architecture
Core Components
NexusFS Core
LLM Provider
Plugin System
Work Queue
Workflow Engine
Skills System
Permission System
Memory System
Storage Layer
Namespace System
Data Flow
Key Design Decisions
Performance
Security
Deployment

Overview

Nexus is an AI-native distributed filesystem providing a unified API across multiple storage backends with advanced features for AI agent workflows:

Unified Interface: Single API for local storage, GCS, S3, and other cloud storage
Content-Addressable Storage: Automatic deduplication (30-50% savings)
ReBAC Permissions: Pure Zanzibar-style relationship-based access control
Identity-Based Memory: Order-neutral paths for multi-agent collaboration
Time-Travel: Full operation history with undo capability
AI-Native Features: Semantic search, LLM integration, workflow automation

System Architecture

┌─────────────────────────────────────────────────────┐
│              User-Facing APIs                       │
│   CLI  │  Python SDK  │  MCP Server  │  HTTP API   │
├─────────────────────────────────────────────────────┤
│              Core Components                        │
│   NexusFS  │  Plugins  │  Workflows  │  LLM        │
│   Permissions (ReBAC)  │  Memory System             │
├─────────────────────────────────────────────────────┤
│              Storage Layer                          │
│   Metadata Store  │  CAS  │  Cache  │  Op Log      │
├─────────────────────────────────────────────────────┤
│              Backend Adapters                       │
│   Local  │  GCS  │  S3  │  GDrive  │  Workspace   │
└─────────────────────────────────────────────────────┘

Core Components

1. NexusFS Core

Purpose: Central filesystem abstraction providing unified file operations across all backends.

Location: src/nexus/core/nexus_fs.py

Key Capabilities:
- Multi-Backend Routing: Automatic path routing to the appropriate storage backend
- Permission Enforcement: Integrated ReBAC permission system
- Operation Logging: Complete audit trail for time-travel and undo
- CAS Integration: Automatic content deduplication via SHA-256 hashing
- Batch Operations: 4x faster bulk writes via write_batch()
- Async-First Design: Non-blocking I/O for scalability

Implementation: Mixin-based architecture separating concerns:
- NexusFSCoreMixin: Core read/write/delete operations
- NexusFSReBACMixin: Relationship-based access control (fully remote-capable via RPC)
- NexusFSSearchMixin: Semantic and keyword search
- NexusFSVersionsMixin: Workspace snapshots and versioning
- NexusFSMountsMixin: Mount management for virtual filesystem views

RPC Exposure: Public methods carry the @rpc_expose decorator, which makes them available remotely over the HTTP/RPC protocol. CI enforces RPC parity so that no public method is accidentally local-only.

2. LLM Provider Abstraction (v0.4.0)

Purpose: Unified interface for multiple LLM providers with automatic KV cache management.

Location: src/nexus/llm/

Key Features:
- Multi-provider support via LiteLLM (Anthropic, OpenAI, Google, Ollama)
- Automatic KV cache management (50-90% cost savings on repeated queries)
- Token counting and cost tracking
- Streaming response support

Example: See examples/py_demo/llm_provider_demo.py

3. Plugin System

Purpose: Extensible architecture for vendor integrations without forking core.

Location: src/nexus/plugins/

Key Components:
- Plugin registry with auto-discovery
- Lifecycle hooks (before/after read, write, delete, mkdir, copy)
- CLI command integration
- Configuration management

Plugin Interface: Base class NexusPlugin with metadata, commands, hooks, and lifecycle methods.
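
The lifecycle hooks can be illustrated with a small sketch. `SketchPlugin`, `PluginRegistry`, and `run_hook` are hypothetical names chosen for this example, and the hook signature (path and payload in, payload out) is an assumption; the actual NexusPlugin interface is covered in the development guide.

```python
from typing import Callable, Dict, List

Hook = Callable[[str, bytes], bytes]


class SketchPlugin:
    """Hypothetical base class: subclasses return the hooks they implement."""

    name = "base"

    def hooks(self) -> Dict[str, Hook]:
        return {}


class UppercasePlugin(SketchPlugin):
    """Toy plugin that rewrites content before it is written."""

    name = "uppercase"

    def hooks(self) -> Dict[str, Hook]:
        return {"before_write": self.before_write}

    def before_write(self, path: str, data: bytes) -> bytes:
        return data.upper()


class PluginRegistry:
    """Keeps plugins in registration order and dispatches hooks to them."""

    def __init__(self) -> None:
        self._plugins: List[SketchPlugin] = []

    def register(self, plugin: SketchPlugin) -> None:
        self._plugins.append(plugin)

    def run_hook(self, hook: str, path: str, data: bytes) -> bytes:
        # Each plugin may transform the payload; plugins without the hook are skipped.
        for plugin in self._plugins:
            fn = plugin.hooks().get(hook)
            if fn is not None:
                data = fn(path, data)
        return data


registry = PluginRegistry()
registry.register(UppercasePlugin())
result = registry.run_hook("before_write", "/workspace/a.txt", b"abc")
print(result)
```

Passing the payload through every hook keeps plugins composable: core code calls `run_hook` once per event and does not need to know which plugins are installed.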

Available Plugins:
- nexus-plugin-anthropic: Claude Skills API integration
- nexus-plugin-skill-seekers: Generate skills from documentation
- nexus-plugin-firecrawl: Web scraping and content extraction

Development Guide: See docs/development/PLUGIN_DEVELOPMENT.md

4. Work Queue System

Purpose: File-based job queue with SQL views for efficient querying.

Location: src/nexus/storage/views.py

Core Concept: Jobs are regular files with metadata - no separate job system needed ("Everything as a File" principle).

Status States: ready, pending, blocked, in_progress, completed, failed

Key Features:
- Priority-based scheduling
- Dependency resolution (blocked jobs wait on dependencies)
- Worker assignment tracking
- SQL views for O(1) queue queries

CLI: nexus work ready, nexus work status, nexus work blocked

Note: Provides job state management. Users implement execution logic.
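
To make the "SQL views" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and the `ready_jobs` view are assumptions made for illustration; the real definitions live in src/nexus/storage/views.py and may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE jobs (path TEXT PRIMARY KEY, status TEXT, priority INTEGER);
    CREATE TABLE deps (job TEXT, depends_on TEXT);

    -- A job is ready when it is pending and every dependency has completed.
    CREATE VIEW ready_jobs AS
    SELECT j.path, j.priority
    FROM jobs j
    WHERE j.status = 'pending'
      AND NOT EXISTS (
        SELECT 1
        FROM deps d
        JOIN jobs dj ON dj.path = d.depends_on
        WHERE d.job = j.path AND dj.status != 'completed'
      );
    """
)
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?)",
    [
        ("/jobs/extract", "completed", 1),
        ("/jobs/parse", "pending", 5),
        ("/jobs/summarize", "pending", 9),
    ],
)
conn.executemany(
    "INSERT INTO deps VALUES (?, ?)",
    [
        ("/jobs/parse", "/jobs/extract"),
        ("/jobs/summarize", "/jobs/parse"),
    ],
)

# Highest priority first; /jobs/summarize is excluded because /jobs/parse is unfinished.
ready = [row[0] for row in conn.execute(
    "SELECT path FROM ready_jobs ORDER BY priority DESC"
)]
print(ready)
```

Because the view is derived from job metadata, a worker only needs a single SELECT to find its next job; no separate scheduler state has to be kept in sync.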

5. Workflow Engine (v0.4.0)

Purpose: Event-driven automation for document processing and multi-step operations.

Location: src/nexus/workflows/

Components:
- Triggers: File events, schedules, manual invocation
- Actions: Built-in + plugin actions (parse, LLM query, file ops)
- Engine: DAG execution with dependency resolution
- Storage: Workflow definitions stored as YAML files in .nexus/workflows/

Workflow Format: YAML with triggers, actions, and config

Example: See examples/workflows/invoice_processing.yaml
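
DAG execution with dependency resolution can be sketched with the standard library's graphlib module (Python 3.9+). The action names and the `run_workflow` helper are hypothetical; this shows the general technique, not the engine's actual scheduler.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: action name -> set of actions it depends on.
actions = {
    "parse": set(),
    "llm_query": {"parse"},
    "write_summary": {"llm_query"},
    "notify": {"parse"},
}


def run_workflow(graph, run):
    """Run each action only after all of its dependencies have finished."""
    order = []
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    while sorter.is_active():
        # Everything returned by get_ready() could run in parallel;
        # sorting just makes this sequential sketch deterministic.
        for name in sorted(sorter.get_ready()):
            run(name)
            order.append(name)
            sorter.done(name)
    return order


executed = run_workflow(actions, run=lambda name: None)
print(executed)
```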

6. Skills System

Purpose: Vendor-neutral skill management with three-tier hierarchy and governance.

Location: src/nexus/skills/

Hierarchy:
- /system/skills/: System-wide, read-only
- /shared/skills/: Tenant-wide, shared
- /workspace/.nexus/skills/: Agent-specific

Key Features:
- Dependency resolution with cycle detection
- Skill versioning and lineage tracking
- Approval governance for shared skills
- Export/import workflows

Format: SKILL.md files with YAML frontmatter (name, version, dependencies, tier)
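
Dependency resolution with cycle detection is commonly done with a depth-first search that tracks which nodes are currently on the stack. The sketch below assumes the `dependencies` frontmatter has already been parsed into a plain dict; the `resolve` function and the skill names are hypothetical.

```python
from typing import Dict, List


def resolve(skills: Dict[str, List[str]], root: str) -> List[str]:
    """Return root and its dependencies in load order; raise ValueError on a cycle."""
    order: List[str] = []
    state: Dict[str, str] = {}  # name -> "visiting" or "done"

    def visit(name: str, trail: List[str]) -> None:
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            # We reached a skill that is still on the current DFS path.
            raise ValueError("dependency cycle: " + " -> ".join(trail + [name]))
        state[name] = "visiting"
        for dep in skills.get(name, []):
            visit(dep, trail + [name])
        state[name] = "done"
        order.append(name)  # dependencies are appended before their dependents

    visit(root, [])
    return order


skills = {"report": ["summarize", "pdf"], "summarize": ["pdf"], "pdf": []}
load_order = resolve(skills, "report")
print(load_order)
```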

7. ReBAC Permission System (v0.6.0+)

Purpose: Pure relationship-based access control using Google Zanzibar principles for scalable, flexible authorization.

Location: src/nexus/core/permissions.py, nexus_fs_rebac.py, rebac_manager.py

Architecture: Pure ReBAC (Relationship-Based Access Control) - all UNIX-style permissions and ACLs removed in v0.6.0.

Permission Model:

- Subject-Based Identity: Identity specified per-operation, not per-instance
  - Types: user, agent, service, group, custom entity types
  - Examples: ("user", "alice"), ("agent", "claude_001"), ("service", "bootstrap")
- Relationship Tuples: All permissions expressed as (subject, relation, object) tuples
  - Direct Relations: direct_owner, direct_editor, direct_viewer
  - Computed Relations: owner, editor, viewer (unions of direct + inherited)
  - Permissions: read, write, execute (map to relations via namespace config)
- Object Types: file, memory, workspace, custom resource types
  - Examples: ("file", "/workspace/doc.txt"), ("memory", "mem_123")

Key Capabilities:
- Complete CLI + Python SDK for ReBAC operations (nexus rebac create/check/list/delete)
- Full Remote Support: All permission operations work via RPC (local/remote parity)
- Automatic permission inheritance via parent relationships
- Time-limited access with expiration timestamps
- Multi-level organization hierarchies (tenant → workspace → user → agent)
- Multi-tenant isolation with tenant-aware permission checks
- Centralized permission management in client-server deployments
- Graph-based permission checking with caching for performance

Permission Check Order: Admin bypass → ReBAC relation check → Deny (default)

Permission Hierarchy:

owner (full access)
  └── write (includes read)
       └── read (view only)

Relations:
- owner = direct_owner ∪ parent_owner
- editor = direct_editor ∪ owner
- viewer = direct_viewer ∪ editor
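
The relation unions above, together with the check order (admin bypass, then relation check, then deny), can be expressed as a short recursive sketch. The in-memory `tuples` and `parents` stores and the read/write mapping are assumptions for illustration; the source states that the real mapping comes from namespace config, and `execute` is left out here.

```python
Entity = tuple[str, str]  # (type, id), e.g. ("user", "alice")

# Assumed in-memory stand-ins for the relationship store and parent links.
tuples: set[tuple[Entity, str, Entity]] = set()
parents: dict[Entity, Entity] = {}

# Assumed permission-to-relation mapping (normally namespace config).
PERMISSION_TO_RELATION = {"read": "viewer", "write": "editor"}


def has_relation(subject: Entity, relation: str, obj: Entity) -> bool:
    direct = (subject, f"direct_{relation}", obj) in tuples
    if relation == "owner":
        # owner = direct_owner ∪ parent_owner
        parent = parents.get(obj)
        return direct or (parent is not None and has_relation(subject, "owner", parent))
    if relation == "editor":
        # editor = direct_editor ∪ owner
        return direct or has_relation(subject, "owner", obj)
    if relation == "viewer":
        # viewer = direct_viewer ∪ editor
        return direct or has_relation(subject, "editor", obj)
    return False


def check(subject: Entity, permission: str, obj: Entity, is_admin: bool = False) -> bool:
    """Admin bypass, then relation check, then deny by default."""
    if is_admin:
        return True
    relation = PERMISSION_TO_RELATION.get(permission)
    return relation is not None and has_relation(subject, relation, obj)


alice, bob = ("user", "alice"), ("user", "bob")
folder, doc = ("file", "/workspace"), ("file", "/workspace/doc.txt")
parents[doc] = folder
tuples.add((alice, "direct_owner", folder))
tuples.add((bob, "direct_viewer", doc))

print(check(alice, "write", doc))  # alice owns the parent folder
print(check(bob, "write", doc))    # bob is only a viewer
```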

Examples: See examples/py_demo/rebac_demo.py, rebac_comprehensive_demo.py, rebac_advanced_demo.py

Detailed Documentation: See PERMISSIONS.md for comprehensive guide

8. Identity-Based Memory System (v0.4.0)

Purpose: Order-neutral virtual paths with identity-based storage for AI agent memory.

Location: src/nexus/core/entity_registry.py, src/nexus/core/memory_router.py, src/nexus/core/memory_api.py

Core Concept: Separates identity from location. Canonical storage by ID with multiple virtual path views. Memory location ≠ identity; relationships determine access, paths determine browsing.

Key Features:
- Order-Neutral Paths: /workspace/alice/agent1 and /workspace/agent1/alice resolve to the same memory
- Zero Duplication: Memory sharing across agents without file copies
- Dual API Access: Use Memory API (nx.memory.*) or File API (nx.read/write) interchangeably
- Multi-View Browsing: Access by user, agent, or tenant perspective
- Permission Integration: Full ReBAC permission system support

Storage Structure:
- Entity Registry: Tracks tenant/user/agent relationships and hierarchies
- Memories Table: Stores memory content with identity metadata (tenant_id, user_id, agent_id, scope, visibility)
- Virtual Router: Maps flexible paths to canonical memory IDs

Memory Path Patterns (all equivalent):

/objs/memory/{id}                     # Canonical storage
/workspace/alice/agent1/memory/...    # Workspace view (order-neutral)
/memory/by-user/alice/...             # User-centric view
/memory/by-agent/agent1/...           # Agent-centric view

Example Use Case: Alice's two agents share user-scoped memories. Agent1 creates memory → Agent2 can access via user ownership relationship → no file duplication required.
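
One simple way to get order-neutral paths is to treat the identity segments as an unordered set. The sketch below is a deliberate simplification: it treats every segment after /workspace as an identity segment, whereas the real router presumably consults the entity registry. `identity_key`, `register`, and `resolve` are hypothetical names.

```python
from typing import Optional

# Hypothetical router table: unordered identity set -> canonical memory ID.
_routes: dict[frozenset, str] = {}


def identity_key(path: str) -> frozenset:
    """Reduce a /workspace/... path to an order-independent set of segments."""
    segments = [s for s in path.split("/") if s]
    if len(segments) < 2 or segments[0] != "workspace":
        raise ValueError(f"not a workspace path: {path!r}")
    return frozenset(segments[1:])


def register(path: str, memory_id: str) -> None:
    _routes[identity_key(path)] = memory_id


def resolve(path: str) -> Optional[str]:
    """Return the canonical storage path, or None if no memory matches."""
    memory_id = _routes.get(identity_key(path))
    return None if memory_id is None else f"/objs/memory/{memory_id}"


register("/workspace/alice/agent1", "mem_123")
print(resolve("/workspace/agent1/alice"))
```

Both orderings map to the same key, so the content is stored once under its canonical ID and every virtual path is only a lookup.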

Examples: See examples/py_demo/memory_file_api_demo.py

9. RPC Parity Enforcement System (v0.4.0+)

Purpose: Automated verification that all NexusFS methods work identically in local and remote modes.

Location: src/nexus/core/rpc_decorator.py, tests/unit/test_rpc_parity.py

Problem Solved: Previously, adding methods to NexusFS without exposing them via RPC created inconsistencies between local and remote modes. This led to features that only worked locally.

Solution: Automated enforcement at two levels:

@rpc_expose Decorator: All public NexusFS methods must be decorated to auto-register with RPC server
CI Enforcement: Automated test blocks PRs if new public methods lack @rpc_expose or explicit exclusion

Key Features:
- Automatic Registration: Decorated methods auto-register with RPC protocol
- Zero Manual Dispatch: Server automatically routes RPC calls to decorated methods
- CI Blocking: PRs fail if parity is broken
- Clear Error Messages: Test output shows exactly which methods need attention

Method Exposure Options:

Expose via RPC (default): Add @rpc_expose decorator + implement in RemoteNexusFS
Mark Internal-Only (rare): Add to INTERNAL_ONLY_METHODS exclusion list with justification

Example:

from nexus.core.rpc_decorator import rpc_expose

@rpc_expose(description="Create ReBAC relationship")
def rebac_create(self, subject, relation, object, tenant_id=None) -> bool:
    """Create a ReBAC relationship tuple."""
    # Implementation

CI Integration: Separate rpc-parity job runs before main tests, ensuring all methods are properly exposed.

Benefits:
- ✅ Guaranteed Parity: Local and remote modes always have same capabilities
- ✅ No Manual Tracking: Automated detection of missing RPC exposure
- ✅ Early Detection: Catches issues at PR time, not in production
- ✅ Documentation: @rpc_expose serves as self-documenting API contract

Detailed Guide: See docs/RPC_PARITY_GUIDE.md
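
A parity check of this kind can be approximated in a few lines. The decorator below is a stand-in written for this sketch, not the project's `rpc_expose` implementation, and `MiniFS`, `missing_rpc_exposure`, and the contents of `INTERNAL_ONLY_METHODS` are hypothetical.

```python
import inspect


def rpc_expose(description: str = ""):
    """Stand-in decorator: tags a function so a checker can find it."""
    def decorator(fn):
        fn._rpc_exposed = True
        fn._rpc_description = description
        return fn
    return decorator


# Methods deliberately kept local; each entry should carry a justification.
INTERNAL_ONLY_METHODS = {"close"}


class MiniFS:
    @rpc_expose(description="Read a file")
    def read(self, path):
        return b""

    def write(self, path, data):  # decorator forgotten on purpose
        return None

    def close(self):  # excluded via INTERNAL_ONLY_METHODS
        return None

    def _helper(self):  # private, ignored by the check
        return None


def missing_rpc_exposure(cls) -> list:
    """List public methods that are neither exposed nor explicitly excluded."""
    missing = []
    for name, fn in inspect.getmembers(cls, inspect.isfunction):
        if name.startswith("_") or name in INTERNAL_ONLY_METHODS:
            continue
        if not getattr(fn, "_rpc_exposed", False):
            missing.append(name)
    return missing


print(missing_rpc_exposure(MiniFS))
```

A CI test would then assert that this list is empty, which turns a forgotten decorator into a failing build with the offending method names in the output.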

Storage Layer

Content-Addressable Storage (CAS)

Purpose: Automatic deduplication using SHA-256 content hashing.

Location: src/nexus/backends/local.py, src/nexus/storage/

How It Works: Content is stored by hash (e.g., cas/ab/abcd123...). Identical content stored once, referenced many times.

Benefits:
- 30-50% storage savings via deduplication
- Immutable content enables efficient caching
- Lineage tracking across file copies
- Efficient time-travel without storing full copies
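
The storage scheme can be sketched as follows, using a two-character shard directory as in the cas/ab/abcd123... example. `ContentStore` is a hypothetical class written for this illustration; the actual backend in src/nexus/backends/local.py may be organized differently.

```python
import hashlib
import tempfile
from pathlib import Path


class ContentStore:
    """Stores each blob once under its SHA-256 digest."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def _path(self, digest: str) -> Path:
        # Shard by the first two hex characters to keep directories small.
        return self.root / "cas" / digest[:2] / digest

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        target = self._path(digest)
        if not target.exists():  # identical content is written only once
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(data)
        return digest

    def get(self, digest: str) -> bytes:
        return self._path(digest).read_bytes()


store = ContentStore(Path(tempfile.mkdtemp()))
h1 = store.put(b"same bytes")
h2 = store.put(b"same bytes")  # deduplicated: no second file is created
blob_count = sum(1 for p in (store.root / "cas").rglob("*") if p.is_file())
print(h1 == h2, blob_count)
```

File metadata then only needs to record the digest, so copies and historical versions of a file cost one metadata row each rather than a second copy of the content.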

Operation Log & Time-Travel

Purpose: Complete audit trail with undo capability.

Location: src/nexus/storage/operations.py

Key Features:
- All filesystem operations logged to database
- Undo capability for reversible operations (write, delete, move, copy)
- Time-travel: read files at any historical point
- Content diffing between versions
- Multi-agent safe with per-agent tracking

CLI: nexus ops log, nexus ops undo, nexus time-travel
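
The interaction between an append-only log and undo can be shown with a toy in-memory filesystem. `Operation`, `LoggedFS`, and their fields are hypothetical; the sketch stores previous content inline, whereas a CAS-backed log would more plausibly store content hashes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Operation:
    kind: str                     # "write", "delete", or "undo"
    path: str
    previous: Optional[bytes]     # content before the op; None if the file was absent
    target: Optional[int] = None  # for "undo": index of the reversed operation


class LoggedFS:
    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}
        self.log: List[Operation] = []  # append-only: entries are never removed

    def write(self, path: str, data: bytes) -> None:
        self.log.append(Operation("write", path, self.files.get(path)))
        self.files[path] = data

    def delete(self, path: str) -> None:
        self.log.append(Operation("delete", path, self.files[path]))
        del self.files[path]

    def undo(self) -> None:
        """Reverse the latest operation not yet undone, then log the undo itself."""
        undone = {op.target for op in self.log if op.kind == "undo"}
        for index in range(len(self.log) - 1, -1, -1):
            op = self.log[index]
            if op.kind != "undo" and index not in undone:
                break
        else:
            raise LookupError("nothing to undo")
        if op.previous is None:
            self.files.pop(op.path, None)
        else:
            self.files[op.path] = op.previous
        self.log.append(Operation("undo", op.path, None, target=index))


fs = LoggedFS()
fs.write("/workspace/a.txt", b"v1")
fs.write("/workspace/a.txt", b"v2")
fs.undo()
print(fs.files["/workspace/a.txt"])
```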

Caching System (v0.4.0)

Purpose: Multi-tier caching for performance optimization.

Location: src/nexus/storage/cache.py, src/nexus/storage/content_cache.py

Cache Tiers:
1. Metadata Cache: File metadata, path lookups, existence checks
2. Content Cache: LRU cache for file content (256MB default)
3. Permission Cache: Permission check results with TTL

Performance Impact:
- Cached reads: 10-50x faster
- Metadata operations: 5x faster
- Configurable sizes and TTLs
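
A size-bounded LRU content cache can be built on collections.OrderedDict. The `ContentCache` class below is a hypothetical sketch that only borrows the 256MB default from the text; eviction details in src/nexus/storage/content_cache.py may differ.

```python
from collections import OrderedDict
from typing import Optional


class ContentCache:
    """LRU cache bounded by total bytes rather than by entry count."""

    def __init__(self, max_bytes: int = 256 * 1024 * 1024) -> None:
        self.max_bytes = max_bytes
        self.used = 0
        self._items = OrderedDict()  # key -> bytes, oldest first

    def get(self, key: str) -> Optional[bytes]:
        data = self._items.get(key)
        if data is not None:
            self._items.move_to_end(key)  # mark as most recently used
        return data

    def put(self, key: str, data: bytes) -> None:
        if len(data) > self.max_bytes:
            return  # larger than the whole budget: do not cache
        if key in self._items:
            self.used -= len(self._items.pop(key))
        self._items[key] = data
        self.used += len(data)
        while self.used > self.max_bytes:
            _, evicted = self._items.popitem(last=False)  # drop least recently used
            self.used -= len(evicted)


cache = ContentCache(max_bytes=10)
cache.put("a", b"aaaa")
cache.put("b", b"bbbb")
cache.get("a")           # touch "a" so that "b" becomes the eviction candidate
cache.put("c", b"cccc")  # 12 bytes would exceed the budget, so "b" is evicted
print(cache.get("b"), cache.used)
```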

Namespace System

Purpose: Organize files into namespaces with different access control and visibility rules.

Location: src/nexus/core/router.py

Built-in Namespaces

| Namespace | Purpose | Readonly | Admin-Only | Tenant Required |
|---|---|---|---|---|
| `/workspace` | Agent-specific workspace | No | No | Yes |
| `/shared` | Tenant-wide shared files | No | No | Yes |
| `/archives` | Long-term storage | Yes | No | Yes |
| `/external` | External integrations | No | No | No |
| `/system` | System configuration | Yes | Yes | No |

Visibility: Namespaces are automatically filtered based on user context (tenant_id, is_admin).
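
The visibility rule can be sketched directly from the table. The filter below is an assumption about how tenant_id and is_admin interact (admin-only namespaces need an admin, tenant-required namespaces need a tenant); `Namespace` and `visible_namespaces` are hypothetical names, and the real logic lives in src/nexus/core/router.py.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Namespace:
    name: str
    readonly: bool
    admin_only: bool
    requires_tenant: bool


# Values copied from the built-in namespaces table.
NAMESPACES = [
    Namespace("/workspace", readonly=False, admin_only=False, requires_tenant=True),
    Namespace("/shared", readonly=False, admin_only=False, requires_tenant=True),
    Namespace("/archives", readonly=True, admin_only=False, requires_tenant=True),
    Namespace("/external", readonly=False, admin_only=False, requires_tenant=False),
    Namespace("/system", readonly=True, admin_only=True, requires_tenant=False),
]


def visible_namespaces(tenant_id: Optional[str], is_admin: bool) -> List[str]:
    """Assumed rule: hide admin-only namespaces from non-admins and
    tenant-scoped namespaces from callers without a tenant."""
    return [
        ns.name
        for ns in NAMESPACES
        if (is_admin or not ns.admin_only)
        and (tenant_id is not None or not ns.requires_tenant)
    ]


print(visible_namespaces(tenant_id="acme", is_admin=False))
```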

FUSE Integration: When mounting via FUSE, namespace directories appear at root level dynamically based on access rights.

Data Flow

Read Flow

User API → NexusFS → Cache Check → (if miss) → Metadata Lookup → CAS Fetch → Return Content

Write Flow

User API → Hooks (before_write) → Hash Content → CAS Store →
Metadata Update → Operation Log → Hooks (after_write) → Cache Invalidation
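
The read and write flows can be traced end to end with three dictionaries standing in for the cache, the metadata store, and the CAS. This is a simplified sketch: hooks and the operation log are omitted, and the `read` and `write` functions here are hypothetical stand-ins, not the NexusFS API.

```python
import hashlib
from typing import Dict, Tuple

cas: Dict[str, bytes] = {}      # digest -> content
metadata: Dict[str, str] = {}   # path -> digest
cache: Dict[str, bytes] = {}    # path -> content


def write(path: str, data: bytes) -> str:
    """Hash content, store it in the CAS, update metadata, invalidate the cache."""
    digest = hashlib.sha256(data).hexdigest()
    cas.setdefault(digest, data)  # identical content is stored once
    metadata[path] = digest
    cache.pop(path, None)         # a stale cached copy must not survive a write
    return digest


def read(path: str) -> Tuple[bytes, str]:
    """Return (content, source), where source shows which tier served the read."""
    if path in cache:
        return cache[path], "cache"
    data = cas[metadata[path]]    # cache miss: metadata lookup, then CAS fetch
    cache[path] = data
    return data, "cas"


write("/workspace/doc.txt", b"v1")
first = read("/workspace/doc.txt")
second = read("/workspace/doc.txt")
print(first[1], second[1])
```

Invalidating on write rather than updating the cache in place keeps the write path short and guarantees that the next read sees the digest recorded in metadata.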

Undo Flow

User Undo → Load Operation → Extract Undo State → Reverse Operation → Log Undo

Backend Adapters

Purpose: Abstract storage backends behind unified interface.

Interface: Backend base class with read, write, delete, list, exists, stat methods.

Implementations:
- LocalFSBackend: Local filesystem with CAS support
- GCSBackend: Google Cloud Storage
- S3Backend: AWS S3 (partial)
- GDriveBackend: Google Drive (partial)
- WorkspaceBackend: Agent workspace abstraction

Location: src/nexus/backends/
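
An adapter interface of this shape is naturally expressed as an abstract base class. The method names below follow the list in this section, but every signature and return type is an assumption, and `InMemoryBackend` is a hypothetical adapter added only to show what an implementation must provide.

```python
from abc import ABC, abstractmethod
from collections.abc import Sequence


class Backend(ABC):
    """Assumed signatures; only the method names come from the text."""

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def delete(self, path: str) -> None: ...

    @abstractmethod
    def list(self, prefix: str) -> Sequence[str]: ...

    @abstractmethod
    def exists(self, path: str) -> bool: ...

    @abstractmethod
    def stat(self, path: str) -> dict: ...


class InMemoryBackend(Backend):
    """Toy adapter backed by a dict, useful for tests."""

    def __init__(self) -> None:
        self._blobs = {}

    def read(self, path: str) -> bytes:
        return self._blobs[path]

    def write(self, path: str, data: bytes) -> None:
        self._blobs[path] = data

    def delete(self, path: str) -> None:
        del self._blobs[path]

    def list(self, prefix: str) -> Sequence[str]:
        return sorted(p for p in self._blobs if p.startswith(prefix))

    def exists(self, path: str) -> bool:
        return path in self._blobs

    def stat(self, path: str) -> dict:
        return {"size": len(self._blobs[path])}


backend = InMemoryBackend()
backend.write("/workspace/a.txt", b"hello")
print(backend.list("/workspace"), backend.stat("/workspace/a.txt"))
```

Because callers depend only on `Backend`, routing a path to a different adapter does not change any calling code.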

Key Design Decisions

Why Content-Addressable Storage?

Benefits: 30-50% storage savings, immutable content that enables caching, lineage tracking, time-travel without full copies

Tradeoff: Hash computation overhead

Why SQLite for Local Mode?

Benefits: Zero-deployment, ACID guarantees, easy backup

Tradeoff: Single-writer limitation (solved by PostgreSQL in hosted mode)

Why Plugin System?

Benefits: Vendor neutrality, extensibility without forking, community contributions, composable tools

Philosophy: Unix philosophy of composable tools

Why YAML for Workflows?

Benefits: Human-readable, Git-friendly, standard format (no custom DSL), everything-as-a-file principle

Performance Characteristics

Latency Targets (Local Mode)

Read: < 5ms (cached), < 50ms (uncached)
Write: < 100ms (including hash + CAS + metadata)
List: < 50ms for 1000 files
Undo: < 200ms

Throughput Targets

Sequential reads: 100+ MB/s
Sequential writes: 50+ MB/s
Batch writes: 4x faster than individual writes
Concurrent operations: 100+ ops/sec

Scaling Limits (Local Mode)

Files: 1M+ per tenant
Storage: 10GB - 1TB typical
Operations log: 10M+ operations

Security

Multi-Tenancy

Tenant isolation at database level
Path namespace isolation
Per-tenant operation logs and metadata

Permission Model

Pure ReBAC: Zanzibar-style relationship-based access control
Permissions: read, write, execute (mapped from relations)
Relations: owner, editor, viewer (with direct_ variants)
Inheritance: Directory → file inheritance via parent relationships
Multi-tenant: Complete tenant isolation in permission checks

Data Security

SHA-256 content hashing for integrity
Optional encryption at rest (backend-dependent)
Append-only operation log
Complete audit trail for compliance

Deployment Modes

Local Mode

Single Python process with SQLite and local filesystem. Ideal for development and CLI tools.

Hosted Mode (Auto-Scaling)

API layer (FastAPI) → NexusFS Core → PostgreSQL + Cloud Storage (GCS/S3). Auto-scales based on usage.

See: Deployment Guide

References

Core Tenets - Design principles and philosophy
Plugin Development - Building extensions
Permission System - Comprehensive permission guide
Database Compatibility - SQLite vs PostgreSQL
Deployment Guide - Production deployment

Document Status: Living document, updated with each major release

Next Review: v0.5.0 release