---
layout: page
title: "Homelab Architecture"
permalink: /architecture/
nav_order: 1
---

# Homelab Architecture

Simple, secure architecture for ML experiments in your homelab.

## Components Overview

```mermaid
graph TB
    subgraph "Homelab Stack"
        CLI[Zig CLI]
        API[HTTPS API]
        REDIS[Redis Cache]
        FS[Local Storage]
    end

    CLI --> API
    API --> REDIS
    API --> FS
```

## Core Services

### API Server
- **Purpose**: Secure HTTPS API for ML experiments
- **Port**: 9101 (HTTPS only)
- **Auth**: API key authentication
- **Security**: Rate limiting, IP whitelisting

### Redis
- **Purpose**: Caching and job queuing
- **Port**: 6379 (localhost only)
- **Storage**: Temporary data only
- **Persistence**: Local volume

### Zig CLI
- **Purpose**: High-performance experiment management
- **Language**: Zig for maximum speed and efficiency
- **Features**:
  - Content-addressed storage with deduplication
  - SHA256-based commit ID generation
  - WebSocket communication for real-time updates
  - Rsync-based incremental file transfers
  - Multi-threaded operations
  - Secure API key authentication
  - Auto-sync monitoring with file system watching
  - Priority-based job queuing
  - Memory-efficient operations with arena allocators

## Security Architecture

```mermaid
graph LR
    USER[User] --> AUTH[API Key Auth]
    AUTH --> RATE[Rate Limiting]
    RATE --> WHITELIST[IP Whitelist]
    WHITELIST --> API[Secure API]
    API --> AUDIT[Audit Logging]
```

### Security Layers

1. **API Key Authentication** - Hashed keys with roles
2. **Rate Limiting** - 30 requests/minute
3. **IP Whitelisting** - Local networks only
4. **Fail2Ban** - Automatic IP blocking
5. **HTTPS/TLS** - Encrypted communication
6. **Audit Logging** - Complete action tracking

## Data Flow

```mermaid
sequenceDiagram
    participant CLI
    participant API
    participant Redis
    participant Storage

    CLI->>API: HTTPS Request
    API->>API: Validate Auth
    API->>Redis: Cache/Queue
    API->>Storage: Experiment Data
    Storage->>API: Results
    API->>CLI: Response
```

## Deployment Options

### Docker Compose (Recommended)

```yaml
services:
  redis:
    image: redis:7-alpine
    ports: ["127.0.0.1:6379:6379"]  # publish on localhost only
    volumes: [redis_data:/data]

  api-server:
    build: .
    ports: ["9101:9101"]
    depends_on: [redis]

volumes:
  redis_data:  # named volumes must be declared at the top level
```

### Local Setup

```bash
./setup.sh && ./manage.sh start
```

## Network Architecture

- **Private Network**: Docker internal network
- **Localhost Access**: Redis only on localhost
- **HTTPS API**: Port 9101, TLS encrypted
- **No External Dependencies**: Everything runs locally

## Storage Architecture

```
data/
├── experiments/     # ML experiment results
├── cache/           # Temporary cache files
└── backups/         # Local backups

logs/
├── app.log          # Application logs
├── audit.log        # Security events
└── access.log       # API access logs
```

## Monitoring Architecture

Simple, lightweight monitoring:
- **Health Checks**: Service availability
- **Log Files**: Structured logging
- **Basic Metrics**: Request counts, error rates
- **Security Events**: Failed auth, rate limits

## Homelab Benefits

- ✅ **Simple Setup**: One-command installation
- ✅ **Local Only**: No external dependencies
- ✅ **Secure by Default**: HTTPS, auth, rate limiting
- ✅ **Low Resource**: Minimal CPU/memory usage
- ✅ **Easy Backup**: Local file system
- ✅ **Privacy**: Everything stays on your network

## High-Level Architecture

```mermaid
graph TB
    subgraph "Client Layer"
        CLI[CLI Tools]
        TUI[Terminal UI]
        API[REST API]
    end

    subgraph "Authentication Layer"
        Auth[Authentication Service]
        RBAC[Role-Based Access Control]
        Perm[Permission Manager]
    end

    subgraph "Core Services"
        Worker[ML Worker Service]
        DataMgr[Data Manager Service]
        Queue[Job Queue]
    end

    subgraph "Storage Layer"
        Redis[(Redis Cache)]
        DB[(SQLite/PostgreSQL)]
        Files[File Storage]
    end

    subgraph "Container Runtime"
        Podman[Podman/Docker]
        Containers[ML Containers]
    end

    CLI --> Auth
    TUI --> Auth
    API --> Auth
    Auth --> RBAC
    RBAC --> Perm
    Worker --> Queue
    Worker --> DataMgr
    Worker --> Podman
    DataMgr --> DB
    DataMgr --> Files
    Queue --> Redis
    Podman --> Containers
```

## Zig CLI Architecture

### Component Structure

```mermaid
graph TB
    subgraph "Zig CLI Components"
        Main[main.zig] --> Commands[commands/]
        Commands --> Config[config.zig]
        Commands --> Utils[utils/]
        Commands --> Net[net/]
        Commands --> Errors[errors.zig]

        subgraph "Commands"
            Init[init.zig]
            Sync[sync.zig]
            Queue[queue.zig]
            Watch[watch.zig]
            Status[status.zig]
            Monitor[monitor.zig]
            Cancel[cancel.zig]
            Prune[prune.zig]
        end

        subgraph "Utils"
            Crypto[crypto.zig]
            Storage[storage.zig]
            Rsync[rsync.zig]
        end

        subgraph "Network"
            WS[ws.zig]
        end
    end
```

### Performance Optimizations

#### Content-Addressed Storage
- **Deduplication**: Files stored by SHA256 hash
- **Space Efficiency**: Shared files across experiments
- **Fast Lookup**: Hash-based file retrieval

#### Memory Management
- **Arena Allocators**: Efficient bulk allocation
- **Zero-Copy Operations**: Minimized memory copying
- **Automatic Cleanup**: Resource deallocation

#### Network Communication
- **WebSocket Protocol**: Real-time bidirectional communication
- **Connection Pooling**: Reused connections
- **Binary Messaging**: Efficient data transfer

### Security Implementation

```mermaid
graph LR
    subgraph "CLI Security"
        Config[Config File] --> Hash[SHA256 Hashing]
        Hash --> Auth[API Authentication]
        Auth --> SSH[SSH Transfer]
        SSH --> WS[WebSocket Security]
    end
```

## Core Components

### 1. Authentication & Authorization

```mermaid
graph LR
    subgraph "Auth Flow"
        Client[Client] --> APIKey[API Key]
        APIKey --> Hash[Hash Validation]
        Hash --> Roles[Role Resolution]
        Roles --> Perms[Permission Check]
        Perms --> Access[Grant/Deny Access]
    end

    subgraph "Permission Sources"
        YAML[YAML Config]
        Inline[Inline Fallback]
        Roles --> YAML
        Roles --> Inline
    end
```

**Features:**
- API key-based authentication
- Role-based access control (RBAC)
- YAML-based permission configuration
- Fallback to inline permissions
- Admin wildcard permissions

### 2. Worker Service

```mermaid
graph TB
    subgraph "Worker Architecture"
        API[HTTP API] --> Router[Request Router]
        Router --> Auth[Auth Middleware]
        Auth --> Queue[Job Queue]
        Queue --> Processor[Job Processor]
        Processor --> Runtime[Container Runtime]
        Runtime --> Storage[Result Storage]

        subgraph "Job Lifecycle"
            Submit[Submit Job] --> Queue
            Queue --> Execute[Execute]
            Execute --> Monitor[Monitor]
            Monitor --> Complete[Complete]
            Complete --> Store[Store Results]
        end
    end
```

**Responsibilities:**
- HTTP API for job submission
- Job queue management
- Container orchestration
- Result collection and storage
- Metrics and monitoring

### 3. Data Manager Service

```mermaid
graph TB
    subgraph "Data Management"
        API[Data API] --> Storage[Storage Layer]
        Storage --> Metadata[Metadata DB]
        Storage --> Files[File System]
        Storage --> Cache[Redis Cache]

        subgraph "Data Operations"
            Upload[Upload Data] --> Validate[Validate]
            Validate --> Store[Store]
            Store --> Index[Index]
            Index --> Catalog[Catalog]
        end
    end
```

**Features:**
- Data upload and validation
- Metadata management
- File system abstraction
- Caching layer
- Data catalog

### 4. Terminal UI (TUI)

```mermaid
graph TB
    subgraph "TUI Architecture"
        UI[UI Components] --> Model[Data Model]
        Model --> Update[Update Loop]
        Update --> Render[Render]

        subgraph "UI Panels"
            Jobs[Job List]
            Details[Job Details]
            Logs[Log Viewer]
            Status[Status Bar]
        end

        UI --> Jobs
        UI --> Details
        UI --> Logs
        UI --> Status
    end
```

**Components:**
- Bubble Tea framework
- Component-based architecture
- Real-time updates
- Keyboard navigation
- Theme support

## Data Flow

### Job Execution Flow

```mermaid
sequenceDiagram
    participant Client
    participant Auth
    participant Worker
    participant Queue
    participant Container
    participant Storage

    Client->>Auth: Submit job with API key
    Auth->>Client: Validate and return job ID
    Client->>Worker: Execute job request
    Worker->>Queue: Queue job
    Queue->>Worker: Job ready
    Worker->>Container: Start ML container
    Container->>Worker: Execute experiment
    Worker->>Storage: Store results
    Worker->>Client: Return results
```

### Authentication Flow

```mermaid
sequenceDiagram
    participant Client
    participant Auth
    participant PermMgr
    participant Config

    Client->>Auth: Request with API key
    Auth->>Auth: Validate key hash
    Auth->>PermMgr: Get user permissions
    PermMgr->>Config: Load YAML permissions
    Config->>PermMgr: Return permissions
    PermMgr->>Auth: Return resolved permissions
    Auth->>Client: Grant/deny access
```

## Security Architecture

### Defense in Depth

```mermaid
graph TB
    subgraph "Security Layers"
        Network[Network Security]
        Auth[Authentication]
        AuthZ[Authorization]
        Container[Container Security]
        Data[Data Protection]
        Audit[Audit Logging]
    end

    Network --> Auth
    Auth --> AuthZ
    AuthZ --> Container
    Container --> Data
    Data --> Audit
```

**Security Features:**
- API key authentication
- Role-based permissions
- Container isolation
- File system sandboxing
- Comprehensive audit logs
- Input validation and sanitization

### Container Security

```mermaid
graph TB
    subgraph "Container Isolation"
        Host[Host System]
        Podman[Podman Runtime]
        Network[Network Isolation]
        FS[File System Isolation]
        User[User Namespaces]
        ML[ML Container]

        Host --> Podman
        Podman --> Network
        Podman --> FS
        Podman --> User
        User --> ML
    end
```

**Isolation Features:**
- Rootless containers
- Network isolation
- File system sandboxing
- User namespace mapping
- Resource limits

## Configuration Architecture

### Configuration Hierarchy

```mermaid
graph TB
    subgraph "Config Sources"
        Env[Environment Variables]
        File[Config Files]
        CLI[CLI Flags]
        Defaults[Default Values]
    end

    subgraph "Config Processing"
        Merge[Config Merger]
        Validate[Schema Validator]
        Apply[Config Applier]
    end

    Env --> Merge
    File --> Merge
    CLI --> Merge
    Defaults --> Merge
    Merge --> Validate
    Validate --> Apply
```

**Configuration Priority:**
1. CLI flags (highest)
2. Environment variables
3. Configuration files
4. Default values (lowest)

## Scalability Architecture

### Horizontal Scaling

```mermaid
graph TB
    subgraph "Scaled Architecture"
        LB[Load Balancer]
        W1[Worker 1]
        W2[Worker 2]
        W3[Worker N]
        Redis[Redis Cluster]
        Storage[Shared Storage]

        LB --> W1
        LB --> W2
        LB --> W3
        W1 --> Redis
        W2 --> Redis
        W3 --> Redis
        W1 --> Storage
        W2 --> Storage
        W3 --> Storage
    end
```

**Scaling Features:**
- Stateless worker services
- Shared job queue (Redis)
- Distributed storage
- Load balancer ready
- Health checks and monitoring

## Technology Stack

### Backend Technologies

| Component | Technology | Purpose |
|-----------|------------|---------|
| **Language** | Go 1.25+ | Core application |
| **Web Framework** | Standard library | HTTP server |
| **Authentication** | Custom | API key + RBAC |
| **Database** | SQLite/PostgreSQL | Metadata storage |
| **Cache** | Redis | Job queue & caching |
| **Containers** | Podman/Docker | Job isolation |
| **UI Framework** | Bubble Tea | Terminal UI |

### Dependencies

```go
// Core dependencies
require (
    github.com/charmbracelet/bubbletea v1.3.10 // TUI framework
    github.com/go-redis/redis/v8 v8.11.5       // Redis client
    github.com/google/uuid v1.6.0              // UUID generation
    github.com/mattn/go-sqlite3 v1.14.32       // SQLite driver
    golang.org/x/crypto v0.45.0                // Crypto utilities
    gopkg.in/yaml.v3 v3.0.1                    // YAML parsing
)
```

## Development Architecture

### Project Structure

```
fetch_ml/
├── cmd/                    # CLI applications
│   ├── worker/             # ML worker service
│   ├── tui/                # Terminal UI
│   ├── data_manager/       # Data management
│   └── user_manager/       # User management
├── internal/               # Internal packages
│   ├── auth/               # Authentication system
│   ├── config/             # Configuration management
│   ├── container/          # Container operations
│   ├── database/           # Database operations
│   ├── logging/            # Logging utilities
│   ├── metrics/            # Metrics collection
│   └── network/            # Network utilities
├── configs/                # Configuration files
├── scripts/                # Setup and utility scripts
├── tests/                  # Test suites
└── docs/                   # Documentation
```

### Package Dependencies

```mermaid
graph TB
    subgraph "Application Layer"
        Worker[cmd/worker]
        TUI[cmd/tui]
        DataMgr[cmd/data_manager]
        UserMgr[cmd/user_manager]
    end

    subgraph "Service Layer"
        Auth[internal/auth]
        Config[internal/config]
        Container[internal/container]
        Database[internal/database]
    end

    subgraph "Utility Layer"
        Logging[internal/logging]
        Metrics[internal/metrics]
        Network[internal/network]
    end

    Worker --> Auth
    Worker --> Config
    Worker --> Container
    TUI --> Auth
    DataMgr --> Database
    UserMgr --> Auth
    Auth --> Logging
    Container --> Network
    Database --> Metrics
```

## Monitoring & Observability

### Metrics Collection

```mermaid
graph TB
    subgraph "Metrics Pipeline"
        App[Application] --> Metrics[Metrics Collector]
        Metrics --> Export[Prometheus Exporter]
        Export --> Prometheus[Prometheus Server]
        Prometheus --> Grafana[Grafana Dashboard]

        subgraph "Metric Types"
            Counter[Counters]
            Gauge[Gauges]
            Histogram[Histograms]
            Timer[Timers]
        end

        App --> Counter
        App --> Gauge
        App --> Histogram
        App --> Timer
    end
```

### Logging Architecture

```mermaid
graph TB
    subgraph "Logging Pipeline"
        App[Application] --> Logger[Structured Logger]
        Logger --> File[File Output]
        Logger --> Console[Console Output]
        Logger --> Syslog[Syslog Forwarder]
        Syslog --> Aggregator[Log Aggregator]
        Aggregator --> Storage[Log Storage]
        Storage --> Viewer[Log Viewer]
    end
```

## Deployment Architecture

### Container Deployment

```mermaid
graph TB
    subgraph "Deployment Stack"
        Image[Container Image]
        Registry[Container Registry]
        Orchestrator[Docker Compose]
        Config[ConfigMaps/Secrets]
        Storage[Persistent Storage]

        Image --> Registry
        Registry --> Orchestrator
        Config --> Orchestrator
        Storage --> Orchestrator
    end
```

### Service Discovery

```mermaid
graph TB
    subgraph "Service Mesh"
        Gateway[API Gateway]
        Discovery[Service Discovery]
        Worker[Worker Service]
        Data[Data Service]
        Redis[Redis Cluster]

        Gateway --> Discovery
        Discovery --> Worker
        Discovery --> Data
        Discovery --> Redis
    end
```

## Future Architecture Considerations

### Microservices Evolution
- **API Gateway**: Centralized routing and authentication
- **Service Mesh**: Inter-service communication
- **Event Streaming**: Kafka for job events
- **Distributed Tracing**: OpenTelemetry integration
- **Multi-tenant**: Tenant isolation and quotas

### Homelab Features
- **Docker Compose**: Simple container orchestration
- **Local Development**: Easy setup and testing
- **Security**: Built-in authentication and encryption
- **Monitoring**: Basic health checks and logging

---

This architecture provides a solid foundation for secure, scalable machine learning experiments while maintaining simplicity and developer productivity.
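The content-addressed storage described under the Zig CLI's Performance Optimizations (files stored by SHA256 hash, so identical files shared across experiments are kept once) comes down to using the digest of the bytes as the key. The minimal sketch below keeps blobs in memory. It is written in Go to match the backend rather than in Zig, and the `Store`, `Put`, and `Get` names are hypothetical. It is not the CLI's actual implementation.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Store is a toy in-memory content-addressed store: each blob is keyed by the
// hex-encoded SHA-256 of its bytes, so identical content is stored only once.
type Store struct {
	blobs map[string][]byte
}

func NewStore() *Store { return &Store{blobs: make(map[string][]byte)} }

// Put stores data if it is not already present and returns its content address.
func (s *Store) Put(data []byte) string {
	sum := sha256.Sum256(data)
	id := hex.EncodeToString(sum[:])
	if _, exists := s.blobs[id]; !exists {
		s.blobs[id] = append([]byte(nil), data...) // copy so the caller may reuse its buffer
	}
	return id
}

// Get retrieves a blob by its content address.
func (s *Store) Get(id string) ([]byte, bool) {
	b, ok := s.blobs[id]
	return b, ok
}

func main() {
	s := NewStore()
	a := s.Put([]byte("train.py contents"))
	b := s.Put([]byte("train.py contents")) // same bytes from a second experiment
	c := s.Put([]byte("eval.py contents"))
	fmt.Println(a == b, a == c, len(s.blobs)) // true false 2
}
```

Because the address is derived from the content, lookup by hash is a single map access, and a second experiment that reuses a file adds no new storage.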
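The Authentication & Authorization flow (hash the presented API key, match it to a user, resolve roles, then check permissions with an admin wildcard) can be sketched in Go as follows. This is a hedged illustration, not the `internal/auth` package. The `User` type, the `rolePerms` table, permission strings such as `jobs:create`, and the use of plain SHA-256 for key hashing are all assumptions. The real system loads role permissions from YAML with an inline fallback rather than a hard-coded map.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// User is a hypothetical record; only the hash of the API key is stored.
type User struct {
	Name    string
	KeyHash string // hex-encoded SHA-256 of the raw API key
	Roles   []string
}

// rolePerms maps roles to permissions; "*" is the admin wildcard.
// Hard-coded here; assumed to come from YAML config in practice.
var rolePerms = map[string][]string{
	"admin":      {"*"},
	"researcher": {"jobs:create", "jobs:read"},
	"viewer":     {"jobs:read"},
}

func hashKey(raw string) string {
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}

// Authenticate returns the user whose stored hash matches the presented key.
func Authenticate(users []User, rawKey string) (*User, bool) {
	h := hashKey(rawKey)
	for i := range users {
		// Constant-time comparison avoids leaking how many characters matched.
		if subtle.ConstantTimeCompare([]byte(users[i].KeyHash), []byte(h)) == 1 {
			return &users[i], true
		}
	}
	return nil, false
}

// Authorize reports whether any of the user's roles grants perm.
func Authorize(u *User, perm string) bool {
	for _, role := range u.Roles {
		for _, p := range rolePerms[role] {
			if p == "*" || p == perm {
				return true
			}
		}
	}
	return false
}

func main() {
	users := []User{
		{Name: "alice", KeyHash: hashKey("alice-secret"), Roles: []string{"researcher"}},
		{Name: "root", KeyHash: hashKey("root-secret"), Roles: []string{"admin"}},
	}
	u, ok := Authenticate(users, "alice-secret")
	fmt.Println(ok, u.Name, Authorize(u, "jobs:create"), Authorize(u, "users:delete")) // true alice true false
	_, ok = Authenticate(users, "wrong-key")
	fmt.Println(ok) // false
}
```

A fast hash such as SHA-256 is generally acceptable for long, randomly generated API keys. It would not be appropriate for user-chosen passwords, which need a slow key-derivation function.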
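The Configuration Priority order (CLI flags, then environment variables, then configuration files, then default values) is a layered merge in which the highest-priority source that sets a key wins. A minimal sketch, assuming each source has already been parsed into a flat `map[string]string`. The `Source` type and `Merge` function are hypothetical, and the schema-validation step from the Configuration Hierarchy diagram is omitted.

```go
package main

import "fmt"

// Source is one already-parsed layer of configuration (hypothetical type).
type Source map[string]string

// Merge resolves every key from the highest-priority source that sets it.
// Sources must be passed highest priority first: CLI flags, environment,
// config file, defaults.
func Merge(sources ...Source) map[string]string {
	merged := make(map[string]string)
	// Apply from lowest to highest priority so higher layers overwrite lower ones.
	for i := len(sources) - 1; i >= 0; i-- {
		for k, v := range sources[i] {
			merged[k] = v
		}
	}
	return merged
}

func main() {
	cli := Source{"port": "9200"}
	env := Source{"port": "9101", "redis_addr": "redis:6379"}
	file := Source{"redis_addr": "localhost:6379", "log_level": "info"}
	defaults := Source{"port": "9101", "redis_addr": "localhost:6379", "log_level": "warn", "rate_limit": "30"}

	cfg := Merge(cli, env, file, defaults)
	fmt.Println(cfg["port"], cfg["redis_addr"], cfg["log_level"], cfg["rate_limit"]) // 9200 redis:6379 info 30
}
```

Each key falls through to the next layer only when a higher layer leaves it unset. That is why `rate_limit` still comes from the defaults while `port` comes from the CLI flag.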