fetch_ml/docs/_pages/architecture.md
Jeremie Fraeys 385d2cf386 docs: add comprehensive documentation with MkDocs site
2025-12-04 16:54:57 -05:00


layout: page
title: Homelab Architecture
permalink: /architecture/
nav_order: 1

Homelab Architecture

Simple, secure architecture for ML experiments in your homelab.

Components Overview

graph TB
    subgraph "Homelab Stack"
        CLI[Zig CLI]
        API[HTTPS API]
        REDIS[Redis Cache]
        FS[Local Storage]
    end
    
    CLI --> API
    API --> REDIS
    API --> FS

Core Services

API Server

  • Purpose: Secure HTTPS API for ML experiments
  • Port: 9101 (HTTPS only)
  • Auth: API key authentication
  • Security: Rate limiting, IP whitelisting
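
The IP whitelisting mentioned above can be sketched as a single address check. This is a minimal illustration, assuming that "local networks" means loopback plus the private ranges; `isLocalClient` is a hypothetical helper, not FetchML's actual code.

```go
package main

import (
	"fmt"
	"net/netip"
)

// isLocalClient reports whether a remote address in host:port form (the shape
// of http.Request.RemoteAddr) belongs to a loopback or private network.
// Hypothetical helper for illustration only.
func isLocalClient(remoteAddr string) bool {
	ap, err := netip.ParseAddrPort(remoteAddr)
	if err != nil {
		return false // reject anything that does not parse
	}
	addr := ap.Addr().Unmap() // treat IPv4-mapped IPv6 as plain IPv4
	return addr.IsLoopback() || addr.IsPrivate()
}

func main() {
	for _, a := range []string{"192.168.1.20:51000", "127.0.0.1:9101", "8.8.8.8:443"} {
		fmt.Println(a, isLocalClient(a))
	}
}
```

A check like this would typically run before authentication, so that requests from outside the allowed networks are dropped early.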

Redis

  • Purpose: Caching and job queuing
  • Port: 6379 (localhost only)
  • Storage: Temporary data only
  • Persistence: Local volume

Zig CLI

  • Purpose: High-performance experiment management
  • Language: Zig, chosen for low runtime overhead and explicit memory control
  • Features:
    • Content-addressed storage with deduplication
    • SHA256-based commit ID generation
    • WebSocket communication for real-time updates
    • Rsync-based incremental file transfers
    • Multi-threaded operations
    • Secure API key authentication
    • Auto-sync monitoring with file system watching
    • Priority-based job queuing
    • Memory-efficient operations with arena allocators

Security Architecture

graph LR
    USER[User] --> AUTH[API Key Auth]
    AUTH --> RATE[Rate Limiting]
    RATE --> WHITELIST[IP Whitelist]
    WHITELIST --> API[Secure API]
    API --> AUDIT[Audit Logging]

Security Layers

  1. API Key Authentication - Hashed keys with roles
  2. Rate Limiting - 30 requests/minute
  3. IP Whitelisting - Local networks only
  4. Fail2Ban - Automatic IP blocking
  5. HTTPS/TLS - Encrypted communication
  6. Audit Logging - Complete action tracking
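
To make the 30 requests/minute limit concrete, here is a minimal fixed-window limiter. It is a sketch under assumptions: the source does not say which algorithm FetchML uses or how clients are keyed, so the `RateLimiter` type and the per-client keying are illustrative.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// window counts the requests one client has made since start.
type window struct {
	start time.Time
	count int
}

// RateLimiter is a fixed-window limiter keyed by client ID (for example an
// API key hash or an IP address). The algorithm is an assumption.
type RateLimiter struct {
	mu      sync.Mutex
	limit   int
	period  time.Duration
	clients map[string]*window
}

func NewRateLimiter(limit int, period time.Duration) *RateLimiter {
	return &RateLimiter{limit: limit, period: period, clients: make(map[string]*window)}
}

// Allow records one request made at time now and reports whether it fits
// within the client's current window.
func (r *RateLimiter) Allow(client string, now time.Time) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	w, ok := r.clients[client]
	if !ok || now.Sub(w.start) >= r.period {
		w = &window{start: now} // first request, or the old window expired
		r.clients[client] = w
	}
	if w.count >= r.limit {
		return false
	}
	w.count++
	return true
}

// demo sends 35 requests at one instant, then 5 more a minute later, and
// returns how many were admitted in each burst.
func demo() (first, second int) {
	rl := NewRateLimiter(30, time.Minute)
	start := time.Now()
	for i := 0; i < 35; i++ {
		if rl.Allow("client-a", start) {
			first++
		}
	}
	for i := 0; i < 5; i++ {
		if rl.Allow("client-a", start.Add(time.Minute)) {
			second++
		}
	}
	return first, second
}

func main() {
	first, second := demo()
	fmt.Println(first, second) // prints "30 5"
}
```

Passing the clock in as an argument keeps the limiter easy to test; a sliding window or token bucket would smooth bursts at window boundaries at the cost of a little more state.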

Data Flow

sequenceDiagram
    participant CLI
    participant API
    participant Redis
    participant Storage
    
    CLI->>API: HTTPS Request
    API->>API: Validate Auth
    API->>Redis: Cache/Queue
    API->>Storage: Experiment Data
    Storage->>API: Results
    API->>CLI: Response

Deployment Options

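services:
  redis:
    image: redis:7-alpine
    # Bind to loopback only so Redis is not reachable from other hosts.
    ports: ["127.0.0.1:6379:6379"]
    volumes: [redis_data:/data]

  api-server:
    build: .
    ports: ["9101:9101"]
    depends_on: [redis]

# Named volumes must be declared at the top level.
volumes:
  redis_data: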

Local Setup

./setup.sh && ./manage.sh start

Network Architecture

  • Private Network: Docker internal network
  • Localhost Access: Redis is reachable from localhost only
  • HTTPS API: Port 9101, TLS encrypted
  • No External Dependencies: Everything runs locally

Storage Architecture

data/
├── experiments/     # ML experiment results
├── cache/          # Temporary cache files
└── backups/        # Local backups

logs/
├── app.log         # Application logs
├── audit.log       # Security events
└── access.log      # API access logs

Monitoring Architecture

Simple, lightweight monitoring:

  • Health Checks: Service availability
  • Log Files: Structured logging
  • Basic Metrics: Request counts, error rates
  • Security Events: Failed auth, rate limits
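
A health check combined with basic counters might look like the sketch below. The `/health` path, the JSON field names, and the `Metrics` type are assumptions made for illustration; the source does not specify them.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// Metrics holds the basic counters listed above. Hypothetical type.
type Metrics struct {
	Requests atomic.Int64
	Errors   atomic.Int64
}

// healthHandler reports availability plus the current counters as JSON.
// In this sketch every call, including health probes, counts as a request.
func healthHandler(m *Metrics) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		m.Requests.Add(1)
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(map[string]any{
			"status":   "ok",
			"requests": m.Requests.Load(),
			"errors":   m.Errors.Load(),
		})
	}
}

// probe calls the handler in-process and returns the status code and body.
func probe(m *Metrics) (int, string) {
	rec := httptest.NewRecorder()
	healthHandler(m)(rec, httptest.NewRequest(http.MethodGet, "/health", nil))
	return rec.Code, rec.Body.String()
}

func main() {
	m := &Metrics{}
	code, body := probe(m)
	fmt.Print(code, " ", body)
}
```

Atomic counters avoid a mutex on the request path, which is usually enough for the request and error counts described here.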

Homelab Benefits

  • Simple Setup: One-command installation
  • Local Only: No external dependencies
  • Secure by Default: HTTPS, auth, rate limiting
  • Low Resource: Minimal CPU/memory usage
  • Easy Backup: Local file system
  • Privacy: Everything stays on your network

High-Level Architecture

graph TB
    subgraph "Client Layer"
        CLI[CLI Tools]
        TUI[Terminal UI]
        API[REST API]
    end
    
    subgraph "Authentication Layer"
        Auth[Authentication Service]
        RBAC[Role-Based Access Control]
        Perm[Permission Manager]
    end
    
    subgraph "Core Services"
        Worker[ML Worker Service]
        DataMgr[Data Manager Service]
        Queue[Job Queue]
    end
    
    subgraph "Storage Layer"
        Redis[(Redis Cache)]
        DB[(SQLite/PostgreSQL)]
        Files[File Storage]
    end
    
    subgraph "Container Runtime"
        Podman[Podman/Docker]
        Containers[ML Containers]
    end
    
    CLI --> Auth
    TUI --> Auth
    API --> Auth
    
    Auth --> RBAC
    RBAC --> Perm
    
    Worker --> Queue
    Worker --> DataMgr
    Worker --> Podman
    
    DataMgr --> DB
    DataMgr --> Files
    
    Queue --> Redis
    
    Podman --> Containers

Zig CLI Architecture

Component Structure

graph TB
    subgraph "Zig CLI Components"
        Main[main.zig] --> Commands[commands/]
        Commands --> Config[config.zig]
        Commands --> Utils[utils/]
        Commands --> Net[net/]
        Commands --> Errors[errors.zig]
        
        subgraph "Commands"
            Init[init.zig]
            Sync[sync.zig]
            Queue[queue.zig]
            Watch[watch.zig]
            Status[status.zig]
            Monitor[monitor.zig]
            Cancel[cancel.zig]
            Prune[prune.zig]
        end
        
        subgraph "Utils"
            Crypto[crypto.zig]
            Storage[storage.zig]
            Rsync[rsync.zig]
        end
        
        subgraph "Network"
            WS[ws.zig]
        end
    end

Performance Optimizations

Content-Addressed Storage

  • Deduplication: Files stored by SHA256 hash
  • Space Efficiency: Shared files across experiments
  • Fast Lookup: Hash-based file retrieval
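
The idea behind content-addressed storage can be shown in a few lines. The real implementation lives in the Zig CLI; this Go sketch only illustrates the technique, and the two-character shard directory layout and the `Store` type are assumptions.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
)

// Store keeps each blob under a path derived from its SHA-256 digest.
type Store struct{ root string }

// Put writes data only if no blob with the same digest exists. It returns the
// digest and whether a new file was written.
func (s Store) Put(data []byte) (string, bool, error) {
	sum := sha256.Sum256(data)
	id := hex.EncodeToString(sum[:])
	path := filepath.Join(s.root, id[:2], id) // shard layout is an assumption
	if _, err := os.Stat(path); err == nil {
		return id, false, nil // identical content already stored
	}
	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
		return "", false, err
	}
	return id, true, os.WriteFile(path, data, 0o644)
}

// Get retrieves a blob by its digest.
func (s Store) Get(id string) ([]byte, error) {
	if len(id) < 2 {
		return nil, fmt.Errorf("invalid digest %q", id)
	}
	return os.ReadFile(filepath.Join(s.root, id[:2], id))
}

// newTempStore creates a store in a temporary directory and returns a
// cleanup function that removes it.
func newTempStore() (Store, func(), error) {
	dir, err := os.MkdirTemp("", "cas-*")
	if err != nil {
		return Store{}, nil, err
	}
	return Store{root: dir}, func() { os.RemoveAll(dir) }, nil
}

func main() {
	s, cleanup, err := newTempStore()
	if err != nil {
		fmt.Println(err)
		return
	}
	defer cleanup()
	id1, wrote1, _ := s.Put([]byte("model weights"))
	id2, wrote2, _ := s.Put([]byte("model weights"))
	fmt.Println(id1 == id2, wrote1, wrote2) // prints "true true false"
}
```

Because the path is a pure function of the content, two experiments that share a file store it once, and lookup is a single path computation.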

Memory Management

  • Arena Allocators: Efficient bulk allocation
  • Zero-Copy Operations: Minimized memory copying
  • Automatic Cleanup: Resource deallocation

Network Communication

  • WebSocket Protocol: Real-time bidirectional communication
  • Connection Pooling: Reused connections
  • Binary Messaging: Efficient data transfer

Security Implementation

graph LR
    subgraph "CLI Security"
        Config[Config File] --> Hash[SHA256 Hashing]
        Hash --> Auth[API Authentication]
        Auth --> SSH[SSH Transfer]
        SSH --> WS[WebSocket Security]
    end

Core Components

1. Authentication & Authorization

graph LR
    subgraph "Auth Flow"
        Client[Client] --> APIKey[API Key]
        APIKey --> Hash[Hash Validation]
        Hash --> Roles[Role Resolution]
        Roles --> Perms[Permission Check]
        Perms --> Access[Grant/Deny Access]
    end
    
    subgraph "Permission Sources"
        YAML[YAML Config]
        Inline[Inline Fallback]
        Roles --> YAML
        Roles --> Inline
    end

Features:

  • API key-based authentication
  • Role-based access control (RBAC)
  • YAML-based permission configuration
  • Fallback to inline permissions
  • Admin wildcard permissions

2. Worker Service

graph TB
    subgraph "Worker Architecture"
        API[HTTP API] --> Router[Request Router]
        Router --> Auth[Auth Middleware]
        Auth --> Queue[Job Queue]
        Queue --> Processor[Job Processor]
        Processor --> Runtime[Container Runtime]
        Runtime --> Storage[Result Storage]
        
        subgraph "Job Lifecycle"
            Submit[Submit Job] --> Queue
            Queue --> Execute[Execute]
            Execute --> Monitor[Monitor]
            Monitor --> Complete[Complete]
            Complete --> Store[Store Results]
        end
    end

Responsibilities:

  • HTTP API for job submission
  • Job queue management
  • Container orchestration
  • Result collection and storage
  • Metrics and monitoring
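
Job queue management with priorities can be sketched in memory using the standard `container/heap` package. The production queue is backed by Redis; this example only illustrates the ordering rule, and the "higher number runs first, ties in submission order" policy is an assumption.

```go
package main

import (
	"container/heap"
	"fmt"
)

// Job is a queued unit of work. Field names are illustrative.
type Job struct {
	ID       string
	Priority int // higher runs first (assumed policy)
	seq      int // submission order, used to break ties first-in first-out
}

// jobHeap implements heap.Interface over jobs.
type jobHeap []*Job

func (h jobHeap) Len() int { return len(h) }
func (h jobHeap) Less(i, j int) bool {
	if h[i].Priority != h[j].Priority {
		return h[i].Priority > h[j].Priority
	}
	return h[i].seq < h[j].seq
}
func (h jobHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h *jobHeap) Push(x any)   { *h = append(*h, x.(*Job)) }
func (h *jobHeap) Pop() any {
	old := *h
	n := len(old)
	job := old[n-1]
	*h = old[:n-1]
	return job
}

// Queue hands out jobs in priority order.
type Queue struct {
	jobs jobHeap
	next int
}

func (q *Queue) Submit(id string, priority int) {
	heap.Push(&q.jobs, &Job{ID: id, Priority: priority, seq: q.next})
	q.next++
}

// Next removes and returns the highest-priority job, if any.
func (q *Queue) Next() (*Job, bool) {
	if q.jobs.Len() == 0 {
		return nil, false
	}
	return heap.Pop(&q.jobs).(*Job), true
}

func main() {
	q := &Queue{}
	q.Submit("nightly-sweep", 1)
	q.Submit("urgent-retrain", 5)
	q.Submit("urgent-eval", 5)
	for {
		job, ok := q.Next()
		if !ok {
			break
		}
		fmt.Println(job.ID, job.Priority)
	}
}
```

With Redis, the same ordering is commonly achieved with a sorted set scored by priority, which also lets several workers share one queue.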

3. Data Manager Service

graph TB
    subgraph "Data Management"
        API[Data API] --> Storage[Storage Layer]
        Storage --> Metadata[Metadata DB]
        Storage --> Files[File System]
        Storage --> Cache[Redis Cache]
        
        subgraph "Data Operations"
            Upload[Upload Data] --> Validate[Validate]
            Validate --> Store[Store]
            Store --> Index[Index]
            Index --> Catalog[Catalog]
        end
    end

Features:

  • Data upload and validation
  • Metadata management
  • File system abstraction
  • Caching layer
  • Data catalog

4. Terminal UI (TUI)

graph TB
    subgraph "TUI Architecture"
        UI[UI Components] --> Model[Data Model]
        Model --> Update[Update Loop]
        Update --> Render[Render]
        
        subgraph "UI Panels"
            Jobs[Job List]
            Details[Job Details]
            Logs[Log Viewer]
            Status[Status Bar]
        end
        
        UI --> Jobs
        UI --> Details
        UI --> Logs
        UI --> Status
    end

Components:

  • Bubble Tea framework
  • Component-based architecture
  • Real-time updates
  • Keyboard navigation
  • Theme support

Data Flow

Job Execution Flow

sequenceDiagram
    participant Client
    participant Auth
    participant Worker
    participant Queue
    participant Container
    participant Storage
    
    Client->>Auth: Submit job with API key
    Auth->>Client: Validate and return job ID
    
    Client->>Worker: Execute job request
    Worker->>Queue: Queue job
    Queue->>Worker: Job ready
    Worker->>Container: Start ML container
    Container->>Worker: Return experiment output
    Worker->>Storage: Store results
    Worker->>Client: Return results

Authentication Flow

sequenceDiagram
    participant Client
    participant Auth
    participant PermMgr
    participant Config
    
    Client->>Auth: Request with API key
    Auth->>Auth: Validate key hash
    Auth->>PermMgr: Get user permissions
    PermMgr->>Config: Load YAML permissions
    Config->>PermMgr: Return permissions
    PermMgr->>Auth: Return resolved permissions
    Auth->>Client: Grant/deny access

Security Architecture

Defense in Depth

graph TB
    subgraph "Security Layers"
        Network[Network Security]
        Auth[Authentication]
        AuthZ[Authorization]
        Container[Container Security]
        Data[Data Protection]
        Audit[Audit Logging]
    end
    
    Network --> Auth
    Auth --> AuthZ
    AuthZ --> Container
    Container --> Data
    Data --> Audit

Security Features:

  • API key authentication
  • Role-based permissions
  • Container isolation
  • File system sandboxing
  • Comprehensive audit logs
  • Input validation and sanitization

Container Security

graph TB
    subgraph "Container Isolation"
        Host[Host System]
        Podman[Podman Runtime]
        Network[Network Isolation]
        FS[File System Isolation]
        User[User Namespaces]
        ML[ML Container]
        
        Host --> Podman
        Podman --> Network
        Podman --> FS
        Podman --> User
        User --> ML
    end

Isolation Features:

  • Rootless containers
  • Network isolation
  • File system sandboxing
  • User namespace mapping
  • Resource limits
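
The isolation features above map onto `podman run` options. The sketch below builds such a command line without executing it; the specific values, the `/workspace` mount point, and the `podmanArgs` helper are assumptions, and the flags FetchML actually passes are not stated in the source.

```go
package main

import (
	"fmt"
	"strconv"
)

// Limits describes per-job resource limits. Hypothetical type.
type Limits struct {
	MemoryMB int
	CPUs     float64
	PIDs     int
}

// podmanArgs returns arguments for "podman run" that apply network
// isolation, a read-only root file system, user namespace mapping, and
// resource limits. Run as an unprivileged user, this gives a rootless container.
func podmanArgs(image, workdir string, lim Limits) []string {
	return []string{
		"run", "--rm",
		"--network=none",      // no network access from the job
		"--read-only",         // root file system is read-only
		"--userns=keep-id",    // map the invoking user into the container
		"--memory=" + strconv.Itoa(lim.MemoryMB) + "m",
		"--cpus=" + strconv.FormatFloat(lim.CPUs, 'f', -1, 64),
		"--pids-limit=" + strconv.Itoa(lim.PIDs),
		"--volume=" + workdir + ":/workspace", // the only writable path
		image,
	}
}

// contains reports whether args includes want.
func contains(args []string, want string) bool {
	for _, a := range args {
		if a == want {
			return true
		}
	}
	return false
}

func main() {
	args := podmanArgs("example/trainer:latest", "/data/experiments/run1",
		Limits{MemoryMB: 2048, CPUs: 1.5, PIDs: 256})
	fmt.Println("podman", args)
}
```

Building the argument list in one function keeps the security-relevant flags in a single reviewable place before they are handed to `os/exec`.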

Configuration Architecture

Configuration Hierarchy

graph TB
    subgraph "Config Sources"
        Env[Environment Variables]
        File[Config Files]
        CLI[CLI Flags]
        Defaults[Default Values]
    end
    
    subgraph "Config Processing"
        Merge[Config Merger]
        Validate[Schema Validator]
        Apply[Config Applier]
    end
    
    Env --> Merge
    File --> Merge
    CLI --> Merge
    Defaults --> Merge
    
    Merge --> Validate
    Validate --> Apply

Configuration Priority:

  1. CLI flags (highest)
  2. Environment variables
  3. Configuration files
  4. Default values (lowest)
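
The priority order above amounts to merging layers from lowest to highest precedence, so that later layers overwrite earlier ones. A minimal sketch, with hypothetical key names and values:

```go
package main

import "fmt"

// merge combines configuration layers given from lowest to highest
// precedence; a key set in a later layer overrides earlier ones.
func merge(layers ...map[string]string) map[string]string {
	out := make(map[string]string)
	for _, layer := range layers {
		for k, v := range layer {
			out[k] = v
		}
	}
	return out
}

func main() {
	// Key names and values are illustrative only.
	defaults := map[string]string{"port": "9101", "log_level": "info", "redis_addr": "localhost:6379"}
	file := map[string]string{"log_level": "warn"}
	env := map[string]string{"log_level": "debug", "port": "9443"}
	flags := map[string]string{"port": "8443"}

	cfg := merge(defaults, file, env, flags)
	fmt.Println(cfg["port"], cfg["log_level"], cfg["redis_addr"]) // prints "8443 debug localhost:6379"
}
```

Validation then runs once on the merged result, which matches the merge, validate, apply order in the diagram.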

Scalability Architecture

Horizontal Scaling

graph TB
    subgraph "Scaled Architecture"
        LB[Load Balancer]
        W1[Worker 1]
        W2[Worker 2]
        W3[Worker N]
        Redis[Redis Cluster]
        Storage[Shared Storage]
        
        LB --> W1
        LB --> W2
        LB --> W3
        
        W1 --> Redis
        W2 --> Redis
        W3 --> Redis
        
        W1 --> Storage
        W2 --> Storage
        W3 --> Storage
    end

Scaling Features:

  • Stateless worker services
  • Shared job queue (Redis)
  • Distributed storage
  • Load balancer ready
  • Health checks and monitoring

Technology Stack

Backend Technologies

| Component | Technology | Purpose |
|---|---|---|
| Language | Go 1.25+ | Core application |
| Web Framework | Standard library | HTTP server |
| Authentication | Custom | API key + RBAC |
| Database | SQLite/PostgreSQL | Metadata storage |
| Cache | Redis | Job queue & caching |
| Containers | Podman/Docker | Job isolation |
| UI Framework | Bubble Tea | Terminal UI |

Dependencies

// Core dependencies
require (
    github.com/charmbracelet/bubbletea v1.3.10  // TUI framework
    github.com/go-redis/redis/v8 v8.11.5        // Redis client
    github.com/google/uuid v1.6.0               // UUID generation
    github.com/mattn/go-sqlite3 v1.14.32        // SQLite driver
    golang.org/x/crypto v0.45.0                 // Crypto utilities
    gopkg.in/yaml.v3 v3.0.1                     // YAML parsing
)

Development Architecture

Project Structure

fetch_ml/
├── cmd/                    # CLI applications
│   ├── worker/            # ML worker service
│   ├── tui/               # Terminal UI
│   ├── data_manager/      # Data management
│   └── user_manager/      # User management
├── internal/              # Internal packages
│   ├── auth/              # Authentication system
│   ├── config/            # Configuration management
│   ├── container/         # Container operations
│   ├── database/          # Database operations
│   ├── logging/           # Logging utilities
│   ├── metrics/           # Metrics collection
│   └── network/           # Network utilities
├── configs/               # Configuration files
├── scripts/               # Setup and utility scripts
├── tests/                 # Test suites
└── docs/                  # Documentation

Package Dependencies

graph TB
    subgraph "Application Layer"
        Worker[cmd/worker]
        TUI[cmd/tui]
        DataMgr[cmd/data_manager]
        UserMgr[cmd/user_manager]
    end
    
    subgraph "Service Layer"
        Auth[internal/auth]
        Config[internal/config]
        Container[internal/container]
        Database[internal/database]
    end
    
    subgraph "Utility Layer"
        Logging[internal/logging]
        Metrics[internal/metrics]
        Network[internal/network]
    end
    
    Worker --> Auth
    Worker --> Config
    Worker --> Container
    TUI --> Auth
    DataMgr --> Database
    UserMgr --> Auth
    
    Auth --> Logging
    Container --> Network
    Database --> Metrics

Monitoring & Observability

Metrics Collection

graph TB
    subgraph "Metrics Pipeline"
        App[Application] --> Metrics[Metrics Collector]
        Metrics --> Export[Prometheus Exporter]
        Export --> Prometheus[Prometheus Server]
        Prometheus --> Grafana[Grafana Dashboard]
        
        subgraph "Metric Types"
            Counter[Counters]
            Gauge[Gauges]
            Histogram[Histograms]
            Timer[Timers]
        end
        
        App --> Counter
        App --> Gauge
        App --> Histogram
        App --> Timer
    end

Logging Architecture

graph TB
    subgraph "Logging Pipeline"
        App[Application] --> Logger[Structured Logger]
        Logger --> File[File Output]
        Logger --> Console[Console Output]
        Logger --> Syslog[Syslog Forwarder]
        Syslog --> Aggregator[Log Aggregator]
        Aggregator --> Storage[Log Storage]
        Storage --> Viewer[Log Viewer]
    end
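
The first stage of this pipeline, one structured logger writing to several outputs, can be sketched with the standard `log/slog` package. The source does not say which logging library FetchML uses, so this is an assumption, as are the field names and the in-memory buffer standing in for a log file.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log/slog"
	"os"
)

// newLogger builds a JSON logger that writes each record to every output.
func newLogger(outputs ...io.Writer) *slog.Logger {
	return slog.New(slog.NewJSONHandler(io.MultiWriter(outputs...), nil))
}

// demoRecord logs one security event to a buffer (standing in for a log
// file) and to the console, then returns the buffered record decoded.
func demoRecord() (map[string]any, error) {
	var file bytes.Buffer
	logger := newLogger(&file, os.Stdout)
	logger.Warn("auth failed", "client_ip", "192.168.1.20", "reason", "unknown api key")

	var rec map[string]any
	err := json.Unmarshal(file.Bytes(), &rec)
	return rec, err
}

func main() {
	rec, err := demoRecord()
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(rec["level"], rec["msg"], rec["client_ip"])
}
```

Because every record is a single JSON object per line, a syslog forwarder or log aggregator can parse it without custom patterns.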

Deployment Architecture

Container Deployment

graph TB
    subgraph "Deployment Stack"
        Image[Container Image]
        Registry[Container Registry]
        Orchestrator[Docker Compose]
        Config[Configs/Secrets]
        Storage[Persistent Storage]
        
        Image --> Registry
        Registry --> Orchestrator
        Config --> Orchestrator
        Storage --> Orchestrator
    end

Service Discovery

graph TB
    subgraph "Service Mesh"
        Gateway[API Gateway]
        Discovery[Service Discovery]
        Worker[Worker Service]
        Data[Data Service]
        Redis[Redis Cluster]
        
        Gateway --> Discovery
        Discovery --> Worker
        Discovery --> Data
        Discovery --> Redis
    end

Future Architecture Considerations

Microservices Evolution

  • API Gateway: Centralized routing and authentication
  • Service Mesh: Inter-service communication
  • Event Streaming: Kafka for job events
  • Distributed Tracing: OpenTelemetry integration
  • Multi-tenant: Tenant isolation and quotas

Homelab Features

  • Docker Compose: Simple container orchestration
  • Local Development: Easy setup and testing
  • Security: Built-in authentication and encryption
  • Monitoring: Basic health checks and logging

This architecture keeps the homelab deployment simple (a single host, Docker Compose, and local storage) while leaving room to scale out through stateless workers and a shared Redis job queue.