Nexus Architecture¶
Version: 0.6.0 | Last Updated: 2025-10-26
Purpose: High-level architecture overview of Nexus, an AI-native distributed filesystem with advanced features for AI agent workflows.
Table of Contents¶
- Overview
- System Architecture
- Core Components
- NexusFS Core
- LLM Provider
- Plugin System
- Work Queue
- Workflow Engine
- Skills System
- Permission System
- Memory System
- Storage Layer
- Namespace System
- Data Flow
- Key Design Decisions
- Performance
- Security
- Deployment
Overview¶
Nexus is an AI-native distributed filesystem providing a unified API across multiple storage backends with advanced features for AI agent workflows:
- Unified Interface: Single API for local, GCS, S3, and cloud storage
- Content-Addressable Storage: Automatic deduplication (30-50% savings)
- ReBAC Permissions: Pure Zanzibar-style relationship-based access control
- Identity-Based Memory: Order-neutral paths for multi-agent collaboration
- Time-Travel: Full operation history with undo capability
- AI-Native Features: Semantic search, LLM integration, workflow automation
System Architecture¶
┌─────────────────────────────────────────────────────┐
│ User-Facing APIs │
│ CLI │ Python SDK │ MCP Server │ HTTP API │
├─────────────────────────────────────────────────────┤
│ Core Components │
│ NexusFS │ Plugins │ Workflows │ LLM │
│ Permissions (ReBAC) │ Memory System │
├─────────────────────────────────────────────────────┤
│ Storage Layer │
│ Metadata Store │ CAS │ Cache │ Op Log │
├─────────────────────────────────────────────────────┤
│ Backend Adapters │
│ Local │ GCS │ S3 │ GDrive │ Workspace │
└─────────────────────────────────────────────────────┘
Core Components¶
1. NexusFS Core¶
Purpose: Central filesystem abstraction providing unified file operations across all backends.
Location: src/nexus/core/nexus_fs.py
Key Capabilities:
- Multi-Backend Routing: Automatic path routing to appropriate storage backend
- Permission Enforcement: Integrated ReBAC permission system
- Operation Logging: Complete audit trail for time-travel and undo
- CAS Integration: Automatic content deduplication via SHA-256 hashing
- Batch Operations: 4x faster bulk writes via write_batch()
- Async-First Design: Non-blocking I/O for scalability
Implementation: Mixin-based architecture separating concerns:
- NexusFSCoreMixin: Core read/write/delete operations
- NexusFSReBACMixin: Relationship-based access control (fully remote-capable via RPC)
- NexusFSSearchMixin: Semantic and keyword search
- NexusFSVersionsMixin: Workspace snapshots and versioning
- NexusFSMountsMixin: Mount management for virtual filesystem views
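The mixin layout can be pictured with a minimal sketch. The two mixin class names come from the list above; the method bodies, the in-memory dictionary, and the composed class `SketchFS` are illustrative assumptions and not the actual implementation.

```python
class NexusFSCoreMixin:
    """Core read/write operations (placeholder bodies, not the real logic)."""

    def write(self, path: str, data: bytes) -> None:
        self._files[path] = data

    def read(self, path: str) -> bytes:
        return self._files[path]


class NexusFSSearchMixin:
    """Keyword search only; semantic search is omitted in this sketch."""

    def search(self, keyword: str) -> list[str]:
        needle = keyword.encode()
        return sorted(p for p, d in self._files.items() if needle in d)


class SketchFS(NexusFSCoreMixin, NexusFSSearchMixin):
    """Hypothetical composed class: each mixin contributes one concern."""

    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}


fs = SketchFS()
fs.write("/workspace/a.txt", b"hello nexus")
fs.write("/workspace/b.txt", b"other")
```

Each concern stays in its own class, and the composed class only supplies shared state.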
RPC Exposure: All public methods use the @rpc_expose decorator for automatic remote access over the HTTP/RPC protocol. CI enforces RPC parity automatically so that no public method ends up local-only.
2. LLM Provider Abstraction (v0.4.0)¶
Purpose: Unified interface for multiple LLM providers with automatic KV cache management.
Location: src/nexus/llm/
Key Features:
- Multi-provider support via LiteLLM (Anthropic, OpenAI, Google, Ollama)
- Automatic KV cache management (50-90% cost savings on repeated queries)
- Token counting and cost tracking
- Streaming response support
Example: See examples/py_demo/llm_provider_demo.py
3. Plugin System¶
Purpose: Extensible architecture for vendor integrations without forking core.
Location: src/nexus/plugins/
Key Components:
- Plugin registry with auto-discovery
- Lifecycle hooks (before/after read, write, delete, mkdir, copy)
- CLI command integration
- Configuration management
Plugin Interface: Base class NexusPlugin with metadata, commands, hooks, and lifecycle methods.
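As a rough illustration of lifecycle hooks, the sketch below registers a plugin whose before_write hook transforms content. The `NexusPlugin` class here is a hypothetical stand-in for the real base class; `hooks()`, `PluginRegistry`, `run_hook`, and `UppercasePlugin` are assumed names chosen for the example.

```python
from typing import Callable


class NexusPlugin:
    """Hypothetical stand-in for the base class; real attributes may differ."""

    name = "base"

    def hooks(self) -> dict[str, Callable[[str, bytes], bytes]]:
        return {}


class UppercasePlugin(NexusPlugin):
    name = "uppercase"

    def hooks(self):
        # Register a before_write hook that transforms the content.
        return {"before_write": lambda path, data: data.upper()}


class PluginRegistry:
    def __init__(self) -> None:
        self._plugins: list[NexusPlugin] = []

    def register(self, plugin: NexusPlugin) -> None:
        self._plugins.append(plugin)

    def run_hook(self, hook: str, path: str, data: bytes) -> bytes:
        # Apply each plugin's hook in registration order.
        for plugin in self._plugins:
            fn = plugin.hooks().get(hook)
            if fn is not None:
                data = fn(path, data)
        return data


registry = PluginRegistry()
registry.register(UppercasePlugin())
result = registry.run_hook("before_write", "/workspace/a.txt", b"abc")
```

Hooks that a plugin does not define are skipped, so content passes through unchanged.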
Available Plugins:
- nexus-plugin-anthropic: Claude Skills API integration
- nexus-plugin-skill-seekers: Generate skills from documentation
- nexus-plugin-firecrawl: Web scraping and content extraction
Development Guide: See docs/development/PLUGIN_DEVELOPMENT.md
4. Work Queue System¶
Purpose: File-based job queue with SQL views for efficient querying.
Location: src/nexus/storage/views.py
Core Concept: Jobs are regular files with metadata - no separate job system needed ("Everything as a File" principle).
Status States: ready, pending, blocked, in_progress, completed, failed
Key Features:
- Priority-based scheduling
- Dependency resolution (blocked jobs wait on dependencies)
- Worker assignment tracking
- SQL views for O(1) queue queries
CLI: nexus work ready, nexus work status, nexus work blocked
Note: Provides job state management. Users implement execution logic.
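The status states and dependency rule can be sketched in plain Python. This is an assumption-laden illustration: the `Job` fields, the rule that a pending job becomes ready once all its dependencies are completed, and the highest-priority-first ordering are hypothetical, and the real system answers these queries with SQL views.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    path: str
    priority: int = 0
    status: str = "pending"
    depends_on: list[str] = field(default_factory=list)


def effective_status(job: Job, jobs: dict[str, Job]) -> str:
    """Derive 'ready' or 'blocked' for pending jobs; keep other states."""
    if job.status != "pending":
        return job.status
    done = all(jobs[d].status == "completed" for d in job.depends_on)
    return "ready" if done else "blocked"


def ready_queue(jobs: dict[str, Job]) -> list[str]:
    """Ready job paths, highest priority first (ordering is an assumption)."""
    ready = [j for j in jobs.values() if effective_status(j, jobs) == "ready"]
    return [j.path for j in sorted(ready, key=lambda j: (-j.priority, j.path))]


jobs = {
    "/jobs/a": Job("/jobs/a", priority=1, status="completed"),
    "/jobs/b": Job("/jobs/b", priority=5, depends_on=["/jobs/a"]),
    "/jobs/c": Job("/jobs/c", priority=9, depends_on=["/jobs/b"]),
    "/jobs/d": Job("/jobs/d", priority=2),
}
```

Here `/jobs/c` has the highest priority but stays blocked until `/jobs/b` completes.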
5. Workflow Engine (v0.4.0)¶
Purpose: Event-driven automation for document processing and multi-step operations.
Location: src/nexus/workflows/
Components:
- Triggers: File events, schedules, manual invocation
- Actions: Built-in + plugin actions (parse, LLM query, file ops)
- Engine: DAG execution with dependency resolution
- Storage: Workflow definitions stored as YAML files in .nexus/workflows/
Workflow Format: YAML with triggers, actions, and config
Example: See examples/workflows/invoice_processing.yaml
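DAG execution with dependency resolution amounts to running actions in a topological order. The sketch below uses the standard-library `graphlib` module. The action names and the dictionary layout are hypothetical and do not reflect the actual workflow schema.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each action maps to the actions it depends on.
actions = {
    "parse": [],
    "extract_fields": ["parse"],
    "llm_summary": ["parse"],
    "save_report": ["extract_fields", "llm_summary"],
}


def execution_order(deps: dict[str, list[str]]) -> list[str]:
    """Return an order in which every action runs after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(actions)
```

The two middle actions have no ordering constraint between them, so an engine could run them concurrently.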
6. Skills System¶
Purpose: Vendor-neutral skill management with three-tier hierarchy and governance.
Location: src/nexus/skills/
Hierarchy:
- /system/skills/: System-wide, read-only
- /shared/skills/: Tenant-wide, shared
- /workspace/.nexus/skills/: Agent-specific
Key Features:
- Dependency resolution with cycle detection
- Skill versioning and lineage tracking
- Approval governance for shared skills
- Export/import workflows
Format: SKILL.md files with YAML frontmatter (name, version, dependencies, tier)
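Dependency resolution with cycle detection can be illustrated with a depth-first traversal that tracks the current path. The function name `resolve_skill` and the skill names are hypothetical. In practice the dependency lists would come from the SKILL.md frontmatter.

```python
def resolve_skill(skill: str, deps: dict[str, list[str]]) -> list[str]:
    """Return skills in load order; raise ValueError on a dependency cycle."""
    order: list[str] = []
    visiting: list[str] = []  # current DFS path, used to report cycles

    def visit(name: str) -> None:
        if name in order:
            return
        if name in visiting:
            cycle = visiting[visiting.index(name):] + [name]
            raise ValueError("dependency cycle: " + " -> ".join(cycle))
        visiting.append(name)
        for dep in deps.get(name, []):
            visit(dep)
        visiting.pop()
        order.append(name)

    visit(skill)
    return order


# Hypothetical skill names and dependencies.
deps = {"report": ["summarize", "charts"], "summarize": ["tokenize"], "charts": []}
load_order = resolve_skill("report", deps)
```

Dependencies always appear before the skills that need them, and a cycle is reported with the full path that closes it.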
7. ReBAC Permission System (v0.6.0+)¶
Purpose: Pure relationship-based access control using Google Zanzibar principles for scalable, flexible authorization.
Location: src/nexus/core/permissions.py, nexus_fs_rebac.py, rebac_manager.py
Architecture: Pure ReBAC (Relationship-Based Access Control) - all UNIX-style permissions and ACLs removed in v0.6.0.
Permission Model:
- Subject-Based Identity: Identity specified per-operation, not per-instance
  - Types: user, agent, service, group, custom entity types
  - Examples: ("user", "alice"), ("agent", "claude_001"), ("service", "bootstrap")
- Relationship Tuples: All permissions expressed as (subject, relation, object) tuples
  - Direct Relations: direct_owner, direct_editor, direct_viewer
  - Computed Relations: owner, editor, viewer (unions of direct + inherited)
- Permissions: read, write, execute (map to relations via namespace config)
- Object Types: file, memory, workspace, custom resource types
  - Examples: ("file", "/workspace/doc.txt"), ("memory", "mem_123")
Key Capabilities:
- Complete CLI + Python SDK for ReBAC operations (nexus rebac create/check/list/delete)
- Full Remote Support: All permission operations work via RPC (local/remote parity)
- Automatic permission inheritance via parent relationships
- Time-limited access with expiration timestamps
- Multi-level organization hierarchies (tenant → workspace → user → agent)
- Multi-tenant isolation with tenant-aware permission checks
- Centralized permission management in client-server deployments
- Graph-based permission checking with caching for performance
Permission Check Order: Admin bypass → ReBAC relation check → Deny (default)
Permission Hierarchy:
owner (full access)
└── write (includes read)
    └── read (view only)
Relations:
- owner = direct_owner ∪ parent_owner
- editor = direct_editor ∪ owner
- viewer = direct_viewer ∪ editor
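The union rules and the check order (admin bypass, then relation check, then deny) can be sketched over an in-memory tuple set. This is a simplified illustration under stated assumptions: objects are identified by path only, the read-to-viewer and write-to-editor mapping is assumed (the real mapping comes from the namespace config), and `has_relation` and `check` are hypothetical names, not the SDK API.

```python
import posixpath

Subject = tuple[str, str]

# (subject, direct relation, object path); sample tuples are hypothetical.
TUPLES: set[tuple[Subject, str, str]] = {
    (("user", "alice"), "direct_owner", "/workspace"),
    (("agent", "claude_001"), "direct_viewer", "/workspace/doc.txt"),
}

PERMISSION_TO_RELATION = {"read": "viewer", "write": "editor"}  # assumed mapping


def has_relation(subject: Subject, relation: str, path: str) -> bool:
    """Evaluate owner/editor/viewer using the union rules listed above."""
    if relation == "owner":
        # owner = direct_owner ∪ parent_owner
        if (subject, "direct_owner", path) in TUPLES:
            return True
        parent = posixpath.dirname(path)
        return parent != path and has_relation(subject, "owner", parent)
    if relation == "editor":
        # editor = direct_editor ∪ owner
        return (subject, "direct_editor", path) in TUPLES or has_relation(subject, "owner", path)
    if relation == "viewer":
        # viewer = direct_viewer ∪ editor
        return (subject, "direct_viewer", path) in TUPLES or has_relation(subject, "editor", path)
    raise ValueError(f"unknown relation: {relation}")


def check(subject: Subject, permission: str, path: str, is_admin: bool = False) -> bool:
    """Admin bypass, then ReBAC relation check, then deny by default."""
    if is_admin:
        return True
    relation = PERMISSION_TO_RELATION.get(permission)
    return relation is not None and has_relation(subject, relation, path)
```

With these tuples, alice can write `/workspace/doc.txt` because ownership of `/workspace` is inherited, while claude_001 can only read it.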
Examples: See examples/py_demo/rebac_demo.py, rebac_comprehensive_demo.py, rebac_advanced_demo.py
Detailed Documentation: See PERMISSIONS.md for comprehensive guide
8. Identity-Based Memory System (v0.4.0)¶
Purpose: Order-neutral virtual paths with identity-based storage for AI agent memory.
Location: src/nexus/core/entity_registry.py, src/nexus/core/memory_router.py, src/nexus/core/memory_api.py
Core Concept: Separates identity from location. Canonical storage by ID with multiple virtual path views. Memory location ≠ identity; relationships determine access, paths determine browsing.
Key Features:
- Order-Neutral Paths: /workspace/alice/agent1 and /workspace/agent1/alice resolve to same memory
- Zero Duplication: Memory sharing across agents without file copies
- Dual API Access: Use Memory API (nx.memory.*) or File API (nx.read/write) interchangeably
- Multi-View Browsing: Access by user, agent, or tenant perspective
- Permission Integration: Full ReBAC permission system support
Storage Structure:
- Entity Registry: Tracks tenant/user/agent relationships and hierarchies
- Memories Table: Stores memory content with identity metadata (tenant_id, user_id, agent_id, scope, visibility)
- Virtual Router: Maps flexible paths to canonical memory IDs
Memory Path Patterns (all equivalent):
/objs/memory/{id} # Canonical storage
/workspace/alice/agent1/memory/... # Workspace view (order-neutral)
/memory/by-user/alice/... # User-centric view
/memory/by-agent/agent1/... # Agent-centric view
Example Use Case: Alice's two agents share user-scoped memories. Agent1 creates memory → Agent2 can access via user ownership relationship → no file duplication required.
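One way to picture order-neutral resolution is to key memories by the set of identity segments rather than by their sequence. The sketch below assumes that approach. The registry dictionary and the functions `register_memory` and `resolve_memory` are hypothetical, and the real router consults the entity registry instead.

```python
# Hypothetical registry: an order-independent key maps to a canonical memory ID.
_REGISTRY: dict[frozenset[str], str] = {}


def _key(path: str) -> frozenset[str]:
    """Drop the /workspace prefix and ignore the order of identity segments."""
    parts = [p for p in path.strip("/").split("/") if p]
    if parts and parts[0] == "workspace":
        parts = parts[1:]
    return frozenset(parts)


def register_memory(path: str, memory_id: str) -> None:
    _REGISTRY[_key(path)] = memory_id


def resolve_memory(path: str) -> str:
    """Return the canonical path for any ordering of the same identities."""
    return f"/objs/memory/{_REGISTRY[_key(path)]}"


register_memory("/workspace/alice/agent1", "mem_123")
```

Both `/workspace/alice/agent1` and `/workspace/agent1/alice` then map to the same canonical `/objs/memory/mem_123`.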
Examples: See examples/py_demo/memory_file_api_demo.py
9. RPC Parity Enforcement System (v0.4.0+)¶
Purpose: Automated verification that all NexusFS methods work identically in local and remote modes.
Location: src/nexus/core/rpc_decorator.py, tests/unit/test_rpc_parity.py
Problem Solved: Previously, adding methods to NexusFS without exposing them via RPC created inconsistencies between local and remote modes. This led to features that only worked locally.
Solution: Automated enforcement at two levels:
- @rpc_expose Decorator: All public NexusFS methods must be decorated to auto-register with the RPC server
- CI Enforcement: Automated test blocks PRs if new public methods lack @rpc_expose or an explicit exclusion
Key Features:
- Automatic Registration: Decorated methods auto-register with RPC protocol
- Zero Manual Dispatch: Server automatically routes RPC calls to decorated methods
- CI Blocking: PRs fail if parity is broken
- Clear Error Messages: Test output shows exactly which methods need attention
Method Exposure Options:
- Expose via RPC (default): Add @rpc_expose decorator + implement in RemoteNexusFS
- Mark Internal-Only (rare): Add to INTERNAL_ONLY_METHODS exclusion list with justification
Example:
from nexus.core.rpc_decorator import rpc_expose

@rpc_expose(description="Create ReBAC relationship")
def rebac_create(self, subject, relation, object, tenant_id=None) -> bool:
    """Create a ReBAC relationship tuple."""
    # Implementation
CI Integration: Separate rpc-parity job runs before main tests, ensuring all methods are properly exposed.
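A parity test of this kind can be approximated by introspecting the class for public methods that are neither decorated nor excluded. The sketch below is self-contained and makes several assumptions: the `rpc_expose` defined here is a simplified stand-in for the one in nexus.core.rpc_decorator, and the marker attribute `_rpc_exposed`, the class `DemoFS`, and the `close` exclusion are hypothetical.

```python
import inspect


def rpc_expose(description: str = ""):
    """Simplified stand-in: mark a method as remotely callable."""
    def decorator(fn):
        fn._rpc_exposed = True
        fn._rpc_description = description
        return fn
    return decorator


INTERNAL_ONLY_METHODS = {"close"}  # explicit exclusions (hypothetical entry)


class DemoFS:
    @rpc_expose(description="Read a file")
    def read(self, path): ...

    def glob(self, pattern): ...  # public but not exposed: a parity violation

    def close(self): ...          # excluded on purpose

    def _helper(self): ...        # private, ignored


def missing_rpc_exposure(cls) -> list[str]:
    """Public methods that are neither exposed nor explicitly excluded."""
    return sorted(
        name
        for name, fn in inspect.getmembers(cls, inspect.isfunction)
        if not name.startswith("_")
        and name not in INTERNAL_ONLY_METHODS
        and not getattr(fn, "_rpc_exposed", False)
    )
```

A CI test would assert that this list is empty and print the offending names otherwise.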
Benefits:
- ✅ Guaranteed Parity: Local and remote modes always have same capabilities
- ✅ No Manual Tracking: Automated detection of missing RPC exposure
- ✅ Early Detection: Catches issues at PR time, not in production
- ✅ Documentation: @rpc_expose serves as self-documenting API contract
Detailed Guide: See docs/RPC_PARITY_GUIDE.md
Storage Layer¶
Content-Addressable Storage (CAS)¶
Purpose: Automatic deduplication using SHA-256 content hashing.
Location: src/nexus/backends/local.py, src/nexus/storage/
How It Works: Content is stored by hash (e.g., cas/ab/abcd123...). Identical content stored once, referenced many times.
Benefits:
- 30-50% storage savings via deduplication
- Immutable content enables efficient caching
- Lineage tracking across file copies
- Efficient time-travel without storing full copies
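The storage-by-hash idea can be shown in a few lines. This sketch assumes a two-character prefix directory, as the cas/ab/abcd123... example suggests. The function names `cas_put` and `cas_get` are hypothetical and do not reflect the backend's actual API.

```python
import hashlib
import tempfile
from pathlib import Path


def cas_put(root: Path, content: bytes) -> str:
    """Store content under cas/<first 2 hex chars>/<full hash>; return the hash."""
    digest = hashlib.sha256(content).hexdigest()
    target = root / "cas" / digest[:2] / digest
    if not target.exists():  # identical content is written only once
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
    return digest


def cas_get(root: Path, digest: str) -> bytes:
    """Read content back by its SHA-256 hex digest."""
    return (root / "cas" / digest[:2] / digest).read_bytes()


root = Path(tempfile.mkdtemp())
h1 = cas_put(root, b"same bytes")
h2 = cas_put(root, b"same bytes")  # deduplicated: no second blob is created
stored = [p for p in (root / "cas").rglob("*") if p.is_file()]
```

Two files with identical content share one blob, and file metadata only needs to record the hash.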
Operation Log & Time-Travel¶
Purpose: Complete audit trail with undo capability.
Location: src/nexus/storage/operations.py
Key Features:
- All filesystem operations logged to database
- Undo capability for reversible operations (write, delete, move, copy)
- Time-travel: read files at any historical point
- Content diffing between versions
- Multi-agent safe with per-agent tracking
CLI: nexus ops log, nexus ops undo, nexus time-travel
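Undo works because each log entry records enough prior state to reverse the operation. The in-memory sketch below illustrates that idea for write and delete only. `LoggedFS`, `Op`, and `undo_last` are hypothetical names, and the sketch stores raw bytes where a real log would more plausibly store content hashes in the database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Op:
    kind: str                # "write", "delete", or "undo"
    path: str
    before: Optional[bytes]  # content before the operation (None = absent)


class LoggedFS:
    """In-memory sketch of an operation log with single-step undo."""

    def __init__(self) -> None:
        self.files: dict[str, bytes] = {}
        self.log: list[Op] = []

    def write(self, path: str, data: bytes) -> None:
        self.log.append(Op("write", path, self.files.get(path)))
        self.files[path] = data

    def delete(self, path: str) -> None:
        self.log.append(Op("delete", path, self.files.pop(path)))

    def undo_last(self) -> None:
        """Revert the latest operation and log the reversal as a new entry."""
        op = self.log[-1]
        current = self.files.get(op.path)
        if op.before is None:
            self.files.pop(op.path, None)
        else:
            self.files[op.path] = op.before
        self.log.append(Op("undo", op.path, current))
```

Logging the reversal as its own entry keeps the log append-only, which matches the audit-trail goal.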
Caching System (v0.4.0)¶
Purpose: Multi-tier caching for performance optimization.
Location: src/nexus/storage/cache.py, src/nexus/storage/content_cache.py
Cache Tiers:
1. Metadata Cache: File metadata, path lookups, existence checks
2. Content Cache: LRU cache for file content (256MB default)
3. Permission Cache: Permission check results with TTL
Performance Impact:
- Cached reads: 10-50x faster
- Metadata operations: 5x faster
- Configurable sizes and TTLs
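The content tier can be pictured as an LRU cache bounded by total bytes rather than entry count. The sketch below is a minimal version built on `collections.OrderedDict`. The class name `ContentCache` and its methods are assumptions, and only the 256 MB default comes from the tier list.

```python
from collections import OrderedDict
from typing import Optional


class ContentCache:
    """Byte-bounded LRU cache; the 256 MB default mirrors the tier list."""

    def __init__(self, max_bytes: int = 256 * 1024 * 1024) -> None:
        self.max_bytes = max_bytes
        self.used = 0
        self._items = OrderedDict()  # key -> bytes, oldest first

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, data: bytes) -> None:
        if key in self._items:
            self.used -= len(self._items.pop(key))
        self._items[key] = data
        self.used += len(data)
        while self.used > self.max_bytes and self._items:
            _, evicted = self._items.popitem(last=False)  # least recently used
            self.used -= len(evicted)
```

Because CAS content is immutable, an entry keyed by content hash never needs invalidation; only path-to-hash metadata does.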
Namespace System¶
Purpose: Organize files into namespaces with different access control and visibility rules.
Location: src/nexus/core/router.py
Built-in Namespaces¶
| Namespace | Purpose | Readonly | Admin-Only | Tenant Required |
|---|---|---|---|---|
| /workspace | Agent-specific workspace | No | No | Yes |
| /shared | Tenant-wide shared files | No | No | Yes |
| /archives | Long-term storage | Yes | No | Yes |
| /external | External integrations | No | No | No |
| /system | System configuration | Yes | Yes | No |
Visibility: Namespaces are automatically filtered based on user context (tenant_id, is_admin).
FUSE Integration: When mounting via FUSE, namespace directories appear at root level dynamically based on access rights.
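Visibility filtering can be sketched directly from the namespace table. The flag values below are copied from the table. The exact filtering rules, the `Namespace` dataclass, and the function `visible_namespaces` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Namespace:
    name: str
    readonly: bool
    admin_only: bool
    requires_tenant: bool


# Values copied from the built-in namespace table.
NAMESPACES = [
    Namespace("/workspace", False, False, True),
    Namespace("/shared", False, False, True),
    Namespace("/archives", True, False, True),
    Namespace("/external", False, False, False),
    Namespace("/system", True, True, False),
]


def visible_namespaces(tenant_id: Optional[str], is_admin: bool) -> list[str]:
    """Hide admin-only namespaces from non-admins and tenant namespaces
    from callers without a tenant (the exact rules are an assumption)."""
    return [
        ns.name
        for ns in NAMESPACES
        if (is_admin or not ns.admin_only)
        and (tenant_id is not None or not ns.requires_tenant)
    ]
```

A FUSE mount could call a function like this to decide which directories to show at the root.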
Data Flow¶
Read Flow¶
Write Flow¶
User API → Hooks (before_write) → Hash Content → CAS Store →
Metadata Update → Operation Log → Hooks (after_write) → Cache Invalidation
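The write steps can be traced in order with a small sketch. Plain dictionaries and a list stand in for the CAS store, metadata store, cache, and operation log. The function `write_flow` and its parameters are hypothetical, not the actual NexusFS signature.

```python
import hashlib


def write_flow(path, data, *, cas, metadata, op_log, cache, hooks):
    """Run the write steps in the order shown above; return the content hash."""
    for hook in hooks.get("before_write", []):
        data = hook(path, data)                        # Hooks (before_write)
    digest = hashlib.sha256(data).hexdigest()          # Hash Content
    cas.setdefault(digest, data)                       # CAS Store (dedup by hash)
    previous = metadata.get(path)
    metadata[path] = digest                            # Metadata Update
    op_log.append(("write", path, previous, digest))   # Operation Log
    for hook in hooks.get("after_write", []):
        hook(path, data)                               # Hooks (after_write)
    cache.pop(path, None)                              # Cache Invalidation
    return digest


cas, metadata, op_log = {}, {}, []
cache = {"/workspace/a.txt": b"stale"}
digest = write_flow("/workspace/a.txt", b"fresh", cas=cas, metadata=metadata,
                    op_log=op_log, cache=cache, hooks={})
```

Recording the previous hash in the log entry is what later makes the write reversible.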
Undo Flow¶
Backend Adapters¶
Purpose: Abstract storage backends behind unified interface.
Interface: Backend base class with read, write, delete, list, exists, stat methods.
Implementations:
- LocalFSBackend: Local filesystem with CAS support
- GCSBackend: Google Cloud Storage
- S3Backend: AWS S3 (partial)
- GDriveBackend: Google Drive (partial)
- WorkspaceBackend: Agent workspace abstraction
Location: src/nexus/backends/
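The adapter interface can be sketched as an abstract base class. The six method names come from the interface description above. The signatures, the key-based addressing, and `InMemoryBackend` are assumptions made for illustration, and `InMemoryBackend` is not one of the listed implementations.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Backend(ABC):
    """Method names follow the interface description; signatures are assumed."""

    @abstractmethod
    def read(self, key: str) -> bytes: ...

    @abstractmethod
    def write(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def delete(self, key: str) -> None: ...

    @abstractmethod
    def list(self, prefix: str = "") -> List[str]: ...

    @abstractmethod
    def exists(self, key: str) -> bool: ...

    @abstractmethod
    def stat(self, key: str) -> dict: ...


class InMemoryBackend(Backend):
    """Hypothetical backend that keeps blobs in a dictionary."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def read(self, key): return self._blobs[key]
    def write(self, key, data): self._blobs[key] = data
    def delete(self, key): del self._blobs[key]
    def list(self, prefix=""): return sorted(k for k in self._blobs if k.startswith(prefix))
    def exists(self, key): return key in self._blobs
    def stat(self, key): return {"size": len(self._blobs[key])}
```

Code written against `Backend` does not need to know which storage system sits underneath.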
Key Design Decisions¶
Why Content-Addressable Storage?¶
Benefits: 30-50% storage savings, immutable content enables caching, lineage tracking, time-travel without full copies
Tradeoff: Hash computation overhead
Why SQLite for Local Mode?¶
Benefits: Zero-deployment, ACID guarantees, easy backup
Tradeoff: Single-writer limitation (solved by PostgreSQL in hosted mode)
Why Plugin System?¶
Benefits: Vendor neutrality, extensibility without forking, community contributions, composable tools
Philosophy: Unix philosophy of composable tools
Why YAML for Workflows?¶
Benefits: Human-readable, Git-friendly, standard format (no custom DSL), everything-as-a-file principle
Performance Characteristics¶
Latency Targets (Local Mode)¶
- Read: < 5ms (cached), < 50ms (uncached)
- Write: < 100ms (including hash + CAS + metadata)
- List: < 50ms for 1000 files
- Undo: < 200ms
Throughput Targets¶
- Sequential reads: 100+ MB/s
- Sequential writes: 50+ MB/s
- Batch writes: 4x faster than individual writes
- Concurrent operations: 100+ ops/sec
Scaling Limits (Local Mode)¶
- Files: 1M+ per tenant
- Storage: 10GB - 1TB typical
- Operations log: 10M+ operations
Security¶
Multi-Tenancy¶
- Tenant isolation at database level
- Path namespace isolation
- Per-tenant operation logs and metadata
Permission Model¶
- Pure ReBAC: Zanzibar-style relationship-based access control
- Permissions: read, write, execute (mapped from relations)
- Relations: owner, editor, viewer (with direct_ variants)
- Inheritance: Directory → file inheritance via parent relationships
- Multi-tenant: Complete tenant isolation in permission checks
Data Security¶
- SHA-256 content hashing for integrity
- Optional encryption at rest (backend-dependent)
- Append-only operation log
- Complete audit trail for compliance
Deployment Modes¶
Local Mode¶
Single Python process with SQLite and local filesystem. Ideal for development and CLI tools.
Hosted Mode (Auto-Scaling)¶
API layer (FastAPI) → NexusFS Core → PostgreSQL + Cloud Storage (GCS/S3). Auto-scales based on usage.
See: Deployment Guide
References¶
- Core Tenets - Design principles and philosophy
- Plugin Development - Building extensions
- Permission System - Comprehensive permission guide
- Database Compatibility - SQLite vs PostgreSQL
- Deployment Guide - Production deployment
Document Status: Living document, updated with each major release
Next Review: v0.5.0 release