Skill Seekers - Auto-Generate Skills from Documentation¶

Learn how to automatically generate Claude skills from any documentation URL using Nexus Skill Seekers - a plugin that transforms online docs into AI-readable skills with optional AI enhancement.

🎯 What is Skill Seekers?¶

Skill Seekers is a Nexus plugin that automatically generates skills from documentation URLs:

Scrape any docs URL into structured SKILL.md format
AI-enhanced generation with Claude, GPT, or OpenRouter
llms.txt detection for 10x faster scraping
Firecrawl integration for multi-page documentation
Three-tier system (agent/tenant/system) with ReBAC permissions
Skills CLI for easy management and discovery

Nexus Skill Seekers provides:

Tiered scraping - llms.txt → Firecrawl → Basic scraping
AI enhancement - Professional structure with examples and best practices
Permission system - ReBAC integration with approval workflow
Discovery - Search and manage skills via CLI or Python SDK

📊 Demo: Generate Skills from Python Docs¶

The Skill Seekers demo shows how to generate skills from official Python documentation, transforming web pages into structured skills that Claude can use.

What the Demo Shows¶

Automatic skill generation that:

Fetches documentation from any URL
Detects llms.txt for optimized scraping (10x faster)
Falls back to Firecrawl for comprehensive extraction
AI-enhances content with Claude/GPT (optional)
Creates structured SKILL.md with metadata

Quick Start¶

# Install the plugin
pip install nexus-ai-fs nexus-plugin-skill-seekers

# Start Nexus server
./scripts/init-nexus-with-auth.sh

# Load credentials
source .nexus-admin-env

# Optional: Enable AI enhancement
export OPENROUTER_API_KEY="your-key"

# Run the demo
./examples/cli/skill_seekers_demo.sh

AI Enhancement

Set OPENROUTER_API_KEY, ANTHROPIC_API_KEY, or OPENAI_API_KEY to enable AI-enhanced generation with professional structure and code examples.

🔬 How It Works¶

Tiered Scraping Strategy¶

graph TB
    Start([📄 Documentation URL]) --> Check1{llms.txt<br/>available?}

    Check1 -->|Yes| LLMS[⚡ llms.txt<br/>10x faster<br/>411KB in <1s]
    Check1 -->|No| Check2{Firecrawl<br/>API key?}

    Check2 -->|Yes| Fire[🔥 Firecrawl<br/>Multi-page<br/>Comprehensive]
    Check2 -->|No| Basic[📋 Basic<br/>Single page<br/>BeautifulSoup]

    LLMS --> AI{AI<br/>enhancement?}
    Fire --> AI
    Basic --> AI

    AI -->|Yes| Enhanced[🤖 AI-Enhanced<br/>Professional structure<br/>Code examples]
    AI -->|No| Simple[📝 Basic<br/>Simple formatting]

    Enhanced --> Output([✅ SKILL.md<br/>Ready to use])
    Simple --> Output

    style Start fill:#e1f5ff,stroke:#0288d1,stroke-width:3px
    style LLMS fill:#c8e6c9,stroke:#388e3c,stroke-width:3px
    style Fire fill:#ffecb3,stroke:#f57c00,stroke-width:2px
    style Basic fill:#e0e0e0,stroke:#616161,stroke-width:2px
    style Enhanced fill:#ce93d8,stroke:#7b1fa2,stroke-width:3px
    style Output fill:#a5d6a7,stroke:#2e7d32,stroke-width:3px

Core APIs¶

1. Generate Skill from URL¶

Create a skill from documentation:

import asyncio
from nexus.remote import RemoteNexusFS
from nexus_skill_seekers.plugin import SkillSeekersPlugin

async def main():
    # Connect to Nexus
    nx = RemoteNexusFS("http://localhost:2026", api_key="your-key")

    # Initialize plugin
    plugin = SkillSeekersPlugin(nx)

    # Generate skill
    skill_path = await plugin.generate_skill(
        url="https://docs.python.org/3/library/json.html",
        name="json-module",
        tier="agent",
        use_ai=True  # Enable AI enhancement
    )

    print(f"✓ Skill created: {skill_path}")
    # /workspace/.nexus/skills/json-module/SKILL.md

asyncio.run(main())

2. Discover and List Skills¶

Find all available skills:

from nexus.skills.registry import SkillRegistry

# Create registry
registry = SkillRegistry(filesystem=nx)

# Discover skills from all tiers
count = await registry.discover(tiers=["agent", "tenant", "system"])
print(f"Found {count} skills")

# List skills by tier
agent_skills = registry.list_skills(tier="agent")
for skill_name in agent_skills:
    print(f"  • {skill_name}")

3. Read Skill Content¶

Access skill metadata and content:

# Get skill with full content
skill = await registry.get_skill("json-module")

print(f"Name: {skill.metadata.name}")
print(f"Version: {skill.metadata.version}")
print(f"Description: {skill.metadata.description}")
print(f"Source: {skill.metadata.source_url}")

# Read the markdown content
print(skill.content)

4. Search Skills¶

Find skills by keyword:

# Search via CLI
$ nexus skills search json

# Or via Python
from nexus.skills.search import search_skills

results = search_skills(registry, query="json", limit=5)
for result in results:
    print(f"{result.name}: {result.score:.2f}")

📈 Expected Results¶

Without AI Enhancement¶

Basic generation creates simple, functional skills:

---
name: json-module
version: 1.0.0
description: Skill generated from https://docs.python.org/3/library/json.html
tier: agent
---

# Json Module

## Overview

This skill was automatically generated from documentation.

## Description

[Raw documentation content...]

Scraping Speed: 2-5 seconds (basic) | <1 second (llms.txt)

With AI Enhancement¶

AI-enhanced generation creates professional, structured skills:

---
name: json-module
version: 1.0.0
description: Python's json module for encoding/decoding JSON data
tier: agent
---

# JSON Processing in Python

## Overview
Python's json module provides functionality to encode Python objects...

## Key Concepts
- JSON data types map to Python types (dict → object, list → array)
- Encoding (serialization): Python objects → JSON string
- Decoding (deserialization): JSON string → Python objects

## Usage Examples

### Basic Encoding
\```python
import json
data = ['foo', {'bar': ('baz', None, 1.0, 2)}]
json_string = json.dumps(data)
\```

### Basic Decoding
\```python
data = json.loads('["foo", {"bar":["baz", null, 1.0, 2]}]')
\```

## API Reference
- `json.dumps()` - Serialize object to JSON string
- `json.loads()` - Deserialize JSON string to Python object
- `json.dump()` - Serialize to file
- `json.load()` - Deserialize from file

Enhancement Time: +3-5 seconds with OpenRouter/Anthropic

🛠️ Customization¶

Custom Skill Names¶

# Auto-generate name from URL
await plugin.generate_skill(
    url="https://docs.python.org/3/library/json.html"
    # name auto-generated: "json-html"
)

# Or specify custom name
await plugin.generate_skill(
    url="https://docs.python.org/3/library/json.html",
    name="python-json-complete-guide"
)

Multi-Tier System¶

# Agent tier - personal skills
await plugin.generate_skill(
    url="...",
    tier="agent"  # /workspace/.nexus/skills/
)

# Tenant tier - team skills (requires approval)
await plugin.generate_skill(
    url="...",
    tier="tenant",  # /shared/skills/
    creator_id="alice",
    zone_id="acme-corp"
)

# System tier - global skills (admin only)
await plugin.generate_skill(
    url="...",
    tier="system"  # /system/skills/
)

AI Model Selection¶

import os

# Use OpenRouter (default: Claude 3.5 Sonnet)
os.environ["OPENROUTER_API_KEY"] = "sk-or-v1-..."
os.environ["OPENROUTER_MODEL"] = "anthropic/claude-3.5-sonnet"

# Or use Anthropic directly
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-..."

# Or use OpenAI
os.environ["OPENAI_API_KEY"] = "sk-..."

Firecrawl Integration¶

# Enable multi-page crawling with Firecrawl
os.environ["FIRECRAWL_API_KEY"] = "fc-..."

# Plugin automatically uses Firecrawl when:
# 1. llms.txt not found
# 2. FIRECRAWL_API_KEY is set
# Falls back to basic scraping otherwise

💡 Real-World Applications¶

API Documentation Skills¶

Generate skills for your team's internal APIs:

# Generate skills from internal API docs
await plugin.generate_skill(
    url="https://api.company.com/docs/v2/users",
    name="api-users-v2",
    tier="tenant",
    creator_id="engineering",
    zone_id="company"
)

# Claude can now understand your API
# "How do I create a new user via API?"
# Claude references the skill and provides accurate code

Library Documentation¶

Create skills for popular libraries:

libraries = [
    "https://docs.python.org/3/library/json.html",
    "https://requests.readthedocs.io/en/latest/",
    "https://pandas.pydata.org/docs/",
    "https://numpy.org/doc/stable/",
]

for url in libraries:
    await plugin.generate_skill(url=url, tier="system")

Team Knowledge Base¶

Transform company docs into searchable skills:

# Knowledge base URLs
docs = {
    "deployment": "https://wiki.company.com/deployment",
    "security": "https://wiki.company.com/security",
    "architecture": "https://wiki.company.com/architecture",
}

for name, url in docs.items():
    await plugin.generate_skill(
        url=url,
        name=f"kb-{name}",
        tier="tenant",
        zone_id="engineering"
    )

📚 Skills CLI Reference¶

List Skills¶

# List all skills
nexus skills list

# List by tier
nexus skills list --tier agent

# With metadata
nexus skills list --verbose

Skill Info¶

# Get detailed information
nexus skills info json-module

# Output:
# Name: json-module
# Version: 1.0.0
# Description: Python's json module...
# Tier: agent
# Source: https://docs.python.org/3/library/json.html

Search Skills¶

# Search by keyword
nexus skills search json

# Search with limit
nexus skills search api --limit 10

# Search in specific tier
nexus skills search --tier system validation

Export Skills¶

# Export single skill
nexus skills export json-module --output json-module.zip

# Export all agent skills
nexus skills export-all --tier agent --output my-skills.zip

🐛 Troubleshooting¶

llms.txt Not Found¶

Most sites don't have llms.txt yet. Plugin automatically falls back:

# Expected output when llms.txt not available:
→ Checking for llms.txt...
⚠ llms.txt not found
→ Using Firecrawl for multi-page crawl...

Firecrawl 400 Error¶

Free tier limitations - plugin gracefully falls back:

# Expected output:
→ Scraping with Firecrawl...
⚠ Firecrawl error: 400 Bad Request
→ Falling back to basic scraping (limited)

CAS Content Not Found¶

If skills CLI fails after server restart:

# Clean restart to clear stale CAS references
pkill -f "nexus.cli serve"
rm -rf nexus-data/
./scripts/init-nexus-with-auth.sh

See KNOWN_ISSUES.md for permanent fixes.

AI Enhancement Fails¶

Check API key and model availability:

# Debug AI enhancement
import os

print("API Keys:")
print(f"  OpenRouter: {bool(os.getenv('OPENROUTER_API_KEY'))}")
print(f"  Anthropic: {bool(os.getenv('ANTHROPIC_API_KEY'))}")
print(f"  OpenAI: {bool(os.getenv('OPENAI_API_KEY'))}")

# Plugin auto-falls back to basic generation if AI fails

🎓 Understanding Skill Seekers¶

Why Tiered Scraping?¶

llms.txt (fastest) - 411KB in <1 second
Industry standard for AI-optimized docs
Perfect for sites that support it (anthropic.com, etc.)
Firecrawl (comprehensive) - Multi-page crawling
Handles complex documentation sites
Requires API key (paid service)
Basic (fallback) - Single page scraping
Works on any site
Limited to current page

Key Principles¶

Smart scraping - Try fastest methods first, fallback gracefully
AI-optional - Works with or without AI enhancement
Permission-aware - ReBAC integration for team skills
Tier-based - Agent (personal), Tenant (team), System (global)

llms.txt Standard¶

llms.txt is an emerging standard for AI-readable documentation:

# Example llms.txt structure
$ curl https://docs.anthropic.com/llms.txt

# Claude API Documentation
# Version: 2024-06

## Overview
Claude is an AI assistant created by Anthropic...

## API Reference
...

Benefits: 10x faster, AI-optimized, always up-to-date

🚀 Next Steps¶

Generate your first skill - Try with Python or Requests docs
Enable AI enhancement - Set API key for better quality
Build team library - Create tenant-tier skills for your team
Integrate with Claude - Use skills in your AI workflows

Example Workflow¶

# 1. Generate skills for your project's dependencies
python << 'EOF'
import asyncio
from nexus.remote import RemoteNexusFS
from nexus_skill_seekers.plugin import SkillSeekersPlugin

async def main():
    nx = RemoteNexusFS("http://localhost:2026", api_key="...")
    plugin = SkillSeekersPlugin(nx)

    # Your project's dependencies
    libraries = [
        "https://fastapi.tiangolo.com/",
        "https://docs.pydantic.dev/",
        "https://www.sqlalchemy.org/docs/",
    ]

    for url in libraries:
        print(f"Generating skill for {url}...")
        await plugin.generate_skill(url=url, use_ai=True)

asyncio.run(main())
EOF

# 2. Verify skills created
nexus skills list

# 3. Use in Claude conversations
# "How do I create a FastAPI endpoint with Pydantic validation?"
# Claude can now reference your generated skills!

📖 Learn More¶

Plugin Repository: nexus-plugin-skill-seekers
llms.txt Standard: llmstxt.org
Firecrawl: firecrawl.dev
Skills System: See docs/api/skills.md

Powered by Nexus Skill Seekers 🔍 - Transform any documentation into Claude skills