| layout | title | permalink | nav_order |
|---|---|---|---|
| page | Homelab Architecture | /architecture/ | 1 |
Homelab Architecture
Simple, secure architecture for ML experiments in your homelab.
Components Overview
graph TB
subgraph "Homelab Stack"
CLI[Zig CLI]
API[HTTPS API]
REDIS[Redis Cache]
FS[Local Storage]
end
CLI --> API
API --> REDIS
API --> FS
Core Services
API Server
- Purpose: Secure HTTPS API for ML experiments
- Port: 9101 (HTTPS only)
- Auth: API key authentication
- Security: Rate limiting, IP whitelisting
Redis
- Purpose: Caching and job queuing
- Port: 6379 (localhost only)
- Storage: Temporary data only
- Persistence: Local volume
Zig CLI
- Purpose: High-performance experiment management
- Language: Zig for maximum speed and efficiency
- Features:
- Content-addressed storage with deduplication
- SHA256-based commit ID generation
- WebSocket communication for real-time updates
- Rsync-based incremental file transfers
- Multi-threaded operations
- Secure API key authentication
- Auto-sync monitoring with file system watching
- Priority-based job queuing
- Memory-efficient operations with arena allocators
Security Architecture
graph LR
USER[User] --> AUTH[API Key Auth]
AUTH --> RATE[Rate Limiting]
RATE --> WHITELIST[IP Whitelist]
WHITELIST --> API[Secure API]
API --> AUDIT[Audit Logging]
Security Layers
- API Key Authentication - Hashed keys with roles
- Rate Limiting - 30 requests/minute
- IP Whitelisting - Local networks only
- Fail2Ban - Automatic IP blocking
- HTTPS/TLS - Encrypted communication
- Audit Logging - Complete action tracking
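The 30 requests/minute limit above could be enforced in several ways. The following is a minimal sketch of one of them, a fixed-window counter per client, and is not taken from the FetchML source. `RateLimiter`, `Allow`, and `demoStart` are hypothetical names, and keying the limit by client ID is an assumption (the text does not say whether the limit applies per key or per IP).

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// demoStart is a fixed instant so the example is deterministic.
var demoStart = time.Date(2025, 1, 1, 0, 0, 0, 0, time.UTC)

// window counts the requests one client has made since start.
type window struct {
	start time.Time
	count int
}

// RateLimiter is a hypothetical fixed-window limiter keyed by client ID.
type RateLimiter struct {
	mu      sync.Mutex
	limit   int
	period  time.Duration
	clients map[string]*window
}

func NewRateLimiter(limit int, period time.Duration) *RateLimiter {
	return &RateLimiter{limit: limit, period: period, clients: make(map[string]*window)}
}

// Allow reports whether client may make a request at time now.
// The clock is passed in so callers (and tests) control time.
func (r *RateLimiter) Allow(client string, now time.Time) bool {
	r.mu.Lock()
	defer r.mu.Unlock()

	w, ok := r.clients[client]
	if !ok || now.Sub(w.start) >= r.period {
		// First request, or the previous window has expired: start a new one.
		r.clients[client] = &window{start: now, count: 1}
		return true
	}
	if w.count >= r.limit {
		return false
	}
	w.count++
	return true
}

func main() {
	rl := NewRateLimiter(30, time.Minute)

	allowed := 0
	for i := 0; i < 35; i++ { // 35 requests, one per second
		if rl.Allow("key-a", demoStart.Add(time.Duration(i)*time.Second)) {
			allowed++
		}
	}
	fmt.Println("allowed in first minute:", allowed) // 30
	fmt.Println("allowed after reset:", rl.Allow("key-a", demoStart.Add(time.Minute)))
}
```

A fixed window is the simplest option but allows short bursts at window boundaries; a token bucket or sliding window would smooth that out at the cost of slightly more state.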
Data Flow
sequenceDiagram
participant CLI
participant API
participant Redis
participant Storage
CLI->>API: HTTPS Request
API->>API: Validate Auth
API->>Redis: Cache/Queue
API->>Storage: Experiment Data
Storage->>API: Results
API->>CLI: Response
Deployment Options
Docker Compose (Recommended)
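services:
  redis:
    image: redis:7-alpine
    # Publish on loopback only, matching "Redis only on localhost"
    ports: ["127.0.0.1:6379:6379"]
    volumes: [redis_data:/data]
  api-server:
    build: .
    ports: ["9101:9101"]
    depends_on: [redis]

# Named volume referenced by the redis service
volumes:
  redis_data: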
Local Setup
./setup.sh && ./manage.sh start
Network Architecture
- Private Network: Docker internal network
- Localhost Access: Redis only on localhost
- HTTPS API: Port 9101, TLS encrypted
- No External Dependencies: Everything runs locally
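Restricting access to local networks, as the IP whitelisting layer describes, amounts to a prefix check on the caller's address. The sketch below uses Go's standard `net/netip` package. The prefix list (loopback plus the RFC 1918 private ranges) and the `allowed` function are assumptions for illustration, not the project's actual whitelist.

```go
package main

import (
	"fmt"
	"net/netip"
)

// localNetworks is an assumed allowlist: loopback and RFC 1918 ranges.
var localNetworks = []netip.Prefix{
	netip.MustParsePrefix("127.0.0.0/8"),
	netip.MustParsePrefix("10.0.0.0/8"),
	netip.MustParsePrefix("172.16.0.0/12"),
	netip.MustParsePrefix("192.168.0.0/16"),
	netip.MustParsePrefix("::1/128"),
}

// allowed reports whether remote (an IP address without port) falls
// inside one of the allowlisted prefixes. Unparseable input is rejected.
func allowed(remote string) bool {
	addr, err := netip.ParseAddr(remote)
	if err != nil {
		return false
	}
	// Treat IPv4-mapped IPv6 addresses (::ffff:a.b.c.d) as plain IPv4,
	// since an IPv4 prefix does not match a mapped address.
	addr = addr.Unmap()
	for _, p := range localNetworks {
		if p.Contains(addr) {
			return true
		}
	}
	return false
}

func main() {
	for _, ip := range []string{"192.168.1.20", "127.0.0.1", "8.8.8.8", "not-an-ip"} {
		fmt.Printf("%-14s allowed=%v\n", ip, allowed(ip))
	}
}
```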
Storage Architecture
data/
├── experiments/ # ML experiment results
├── cache/ # Temporary cache files
└── backups/ # Local backups
logs/
├── app.log # Application logs
├── audit.log # Security events
└── access.log # API access logs
Monitoring Architecture
Simple, lightweight monitoring:
- Health Checks: Service availability
- Log Files: Structured logging
- Basic Metrics: Request counts, error rates
- Security Events: Failed auth, rate limits
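A health check of the kind listed above can be as small as one HTTP handler that runs a probe per dependency. This is a sketch under stated assumptions: the `/health` path, the JSON shape, and the probe names are hypothetical, and the probes are stubs rather than real Redis or file system checks.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// errNotWritable simulates a failing storage probe in the demo.
var errNotWritable = errors.New("data directory not writable")

// health is the (assumed) response body.
type health struct {
	Status string            `json:"status"`
	Checks map[string]string `json:"checks"`
}

// healthHandler runs every probe and answers 200 if all pass, 503 otherwise.
func healthHandler(checks map[string]func() error) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		res := health{Status: "ok", Checks: map[string]string{}}
		code := http.StatusOK
		for name, check := range checks {
			if err := check(); err != nil {
				res.Status = "degraded"
				res.Checks[name] = err.Error()
				code = http.StatusServiceUnavailable
				continue
			}
			res.Checks[name] = "ok"
		}
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(code)
		_ = json.NewEncoder(w).Encode(res)
	}
}

// statusFor calls the handler in-process and returns status code and body.
func statusFor(checks map[string]func() error) (int, string) {
	rec := httptest.NewRecorder()
	healthHandler(checks)(rec, httptest.NewRequest(http.MethodGet, "/health", nil))
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := statusFor(map[string]func() error{
		"redis":   func() error { return nil },
		"storage": func() error { return errNotWritable },
	})
	fmt.Println(code)
	fmt.Print(body)
}
```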
Homelab Benefits
- ✅ Simple Setup: One-command installation
- ✅ Local Only: No external dependencies
- ✅ Secure by Default: HTTPS, auth, rate limiting
- ✅ Low Resource: Minimal CPU/memory usage
- ✅ Easy Backup: Local file system
- ✅ Privacy: Everything stays on your network
High-Level Architecture
graph TB
subgraph "Client Layer"
CLI[CLI Tools]
TUI[Terminal UI]
API[REST API]
end
subgraph "Authentication Layer"
Auth[Authentication Service]
RBAC[Role-Based Access Control]
Perm[Permission Manager]
end
subgraph "Core Services"
Worker[ML Worker Service]
DataMgr[Data Manager Service]
Queue[Job Queue]
end
subgraph "Storage Layer"
Redis[(Redis Cache)]
DB[(SQLite/PostgreSQL)]
Files[File Storage]
end
subgraph "Container Runtime"
Podman[Podman/Docker]
Containers[ML Containers]
end
CLI --> Auth
TUI --> Auth
API --> Auth
Auth --> RBAC
RBAC --> Perm
Worker --> Queue
Worker --> DataMgr
Worker --> Podman
DataMgr --> DB
DataMgr --> Files
Queue --> Redis
Podman --> Containers
Zig CLI Architecture
Component Structure
graph TB
subgraph "Zig CLI Components"
Main[main.zig] --> Commands[commands/]
Commands --> Config[config.zig]
Commands --> Utils[utils/]
Commands --> Net[net/]
Commands --> Errors[errors.zig]
subgraph "Commands"
Init[init.zig]
Sync[sync.zig]
Queue[queue.zig]
Watch[watch.zig]
Status[status.zig]
Monitor[monitor.zig]
Cancel[cancel.zig]
Prune[prune.zig]
end
subgraph "Utils"
Crypto[crypto.zig]
Storage[storage.zig]
Rsync[rsync.zig]
end
subgraph "Network"
WS[ws.zig]
end
end
Performance Optimizations
Content-Addressed Storage
- Deduplication: Files stored by SHA256 hash
- Space Efficiency: Shared files across experiments
- Fast Lookup: Hash-based file retrieval
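To make the deduplication idea concrete, here is a minimal content-addressed store. The actual CLI is written in Zig and stores files on disk; this Go sketch keeps blobs in memory and only illustrates the principle that identical content maps to the same SHA256 key. `Store`, `Put`, and `Get` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Store is an in-memory stand-in for a content-addressed directory.
type Store struct {
	blobs map[string][]byte
}

func NewStore() *Store { return &Store{blobs: make(map[string][]byte)} }

// Put stores data under its SHA-256 hex digest. It reports whether the
// blob was newly added; identical content is stored only once.
func (s *Store) Put(data []byte) (id string, added bool) {
	sum := sha256.Sum256(data)
	id = hex.EncodeToString(sum[:])
	if _, exists := s.blobs[id]; exists {
		return id, false
	}
	s.blobs[id] = append([]byte(nil), data...) // copy so callers cannot mutate it
	return id, true
}

// Get looks a blob up by its digest.
func (s *Store) Get(id string) ([]byte, bool) {
	b, ok := s.blobs[id]
	return b, ok
}

func main() {
	s := NewStore()
	id1, added1 := s.Put([]byte("train.py contents"))
	id2, added2 := s.Put([]byte("train.py contents")) // same file in a second experiment

	fmt.Println("same id:", id1 == id2)           // true
	fmt.Println("added:", added1, added2)         // true false
	fmt.Println("blobs stored:", len(s.blobs))    // 1
}
```

Because the key is derived from the content, two experiments that share a file share one stored copy, and a lookup is a single hash-keyed read.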
Memory Management
- Arena Allocators: Efficient bulk allocation
- Zero-Copy Operations: Minimized memory copying
- Automatic Cleanup: Resource deallocation
Network Communication
- WebSocket Protocol: Real-time bidirectional communication
- Connection Pooling: Reused connections
- Binary Messaging: Efficient data transfer
Security Implementation
graph LR
subgraph "CLI Security"
Config[Config File] --> Hash[SHA256 Hashing]
Hash --> Auth[API Authentication]
Auth --> SSH[SSH Transfer]
SSH --> WS[WebSocket Security]
end
Core Components
1. Authentication & Authorization
graph LR
subgraph "Auth Flow"
Client[Client] --> APIKey[API Key]
APIKey --> Hash[Hash Validation]
Hash --> Roles[Role Resolution]
Roles --> Perms[Permission Check]
Perms --> Access[Grant/Deny Access]
end
subgraph "Permission Sources"
YAML[YAML Config]
Inline[Inline Fallback]
Roles --> YAML
Roles --> Inline
end
Features:
- API key-based authentication
- Role-based access control (RBAC)
- YAML-based permission configuration
- Fallback to inline permissions
- Admin wildcard permissions
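The auth flow above (hash validation, role resolution, permission check, admin wildcard) can be sketched as follows. This is an illustration, not the project's `internal/auth` code: the `User` type, the permission strings such as `jobs:read`, and the use of SHA-256 for key hashing are assumptions, and the `rolePermissions` map stands in for what would be loaded from the YAML config.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// User is a hypothetical record: only the hash of the API key is stored.
type User struct {
	Name    string
	KeyHash string
	Roles   []string
}

// rolePermissions stands in for permissions loaded from YAML.
// "*" is the admin wildcard.
var rolePermissions = map[string][]string{
	"admin":  {"*"},
	"viewer": {"jobs:read"},
}

// hashKey returns the hex SHA-256 digest of an API key (assumed scheme).
func hashKey(key string) string {
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])
}

// authenticate finds the user whose stored hash matches the presented key.
// The comparison is constant-time to avoid leaking hash prefixes.
func authenticate(users []User, apiKey string) (*User, bool) {
	h := []byte(hashKey(apiKey))
	for i := range users {
		if subtle.ConstantTimeCompare([]byte(users[i].KeyHash), h) == 1 {
			return &users[i], true
		}
	}
	return nil, false
}

// hasPermission resolves the user's roles and checks for perm or "*".
func hasPermission(u *User, perm string) bool {
	for _, role := range u.Roles {
		for _, p := range rolePermissions[role] {
			if p == "*" || p == perm {
				return true
			}
		}
	}
	return false
}

func main() {
	users := []User{
		{Name: "alice", KeyHash: hashKey("alice-secret"), Roles: []string{"admin"}},
		{Name: "bob", KeyHash: hashKey("bob-secret"), Roles: []string{"viewer"}},
	}

	if u, ok := authenticate(users, "bob-secret"); ok {
		fmt.Println(u.Name, "jobs:read ->", hasPermission(u, "jobs:read"))     // true
		fmt.Println(u.Name, "jobs:create ->", hasPermission(u, "jobs:create")) // false
	}
	_, ok := authenticate(users, "wrong-key")
	fmt.Println("wrong key accepted:", ok) // false
}
```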
2. Worker Service
graph TB
subgraph "Worker Architecture"
API[HTTP API] --> Router[Request Router]
Router --> Auth[Auth Middleware]
Auth --> Queue[Job Queue]
Queue --> Processor[Job Processor]
Processor --> Runtime[Container Runtime]
Runtime --> Storage[Result Storage]
subgraph "Job Lifecycle"
Submit[Submit Job] --> Queue
Queue --> Execute[Execute]
Execute --> Monitor[Monitor]
Monitor --> Complete[Complete]
Complete --> Store[Store Results]
end
end
Responsibilities:
- HTTP API for job submission
- Job queue management
- Container orchestration
- Result collection and storage
- Metrics and monitoring
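Job queue management with priorities can be illustrated with Go's standard `container/heap`. The document states that the shared queue lives in Redis; this in-memory sketch only shows the ordering logic. The `Job` and `Queue` types, and the choice that a higher number means higher priority with FIFO order among equals, are assumptions.

```go
package main

import (
	"container/heap"
	"fmt"
)

// Job is a hypothetical queue entry.
type Job struct {
	ID       string
	Priority int // higher runs first (assumption)
	seq      int // insertion order, used to break ties FIFO
}

// jobHeap implements heap.Interface.
type jobHeap []*Job

func (h jobHeap) Len() int { return len(h) }
func (h jobHeap) Less(i, j int) bool {
	if h[i].Priority != h[j].Priority {
		return h[i].Priority > h[j].Priority
	}
	return h[i].seq < h[j].seq
}
func (h jobHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h *jobHeap) Push(x any)   { *h = append(*h, x.(*Job)) }
func (h *jobHeap) Pop() any {
	old := *h
	n := len(old)
	j := old[n-1]
	*h = old[:n-1]
	return j
}

// Queue wraps the heap and assigns sequence numbers.
type Queue struct {
	h    jobHeap
	next int
}

// Submit enqueues a job with the given priority.
func (q *Queue) Submit(id string, priority int) {
	heap.Push(&q.h, &Job{ID: id, Priority: priority, seq: q.next})
	q.next++
}

// Next removes and returns the highest-priority job, if any.
func (q *Queue) Next() (*Job, bool) {
	if q.h.Len() == 0 {
		return nil, false
	}
	return heap.Pop(&q.h).(*Job), true
}

func main() {
	q := &Queue{}
	q.Submit("baseline", 1)
	q.Submit("urgent-fix", 9)
	q.Submit("sweep-a", 5)
	q.Submit("sweep-b", 5)

	// Drains in order: urgent-fix, sweep-a, sweep-b, baseline.
	for {
		j, ok := q.Next()
		if !ok {
			break
		}
		fmt.Println(j.ID)
	}
}
```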
3. Data Manager Service
graph TB
subgraph "Data Management"
API[Data API] --> Storage[Storage Layer]
Storage --> Metadata[Metadata DB]
Storage --> Files[File System]
Storage --> Cache[Redis Cache]
subgraph "Data Operations"
Upload[Upload Data] --> Validate[Validate]
Validate --> Store[Store]
Store --> Index[Index]
Index --> Catalog[Catalog]
end
end
Features:
- Data upload and validation
- Metadata management
- File system abstraction
- Caching layer
- Data catalog
4. Terminal UI (TUI)
graph TB
subgraph "TUI Architecture"
UI[UI Components] --> Model[Data Model]
Model --> Update[Update Loop]
Update --> Render[Render]
subgraph "UI Panels"
Jobs[Job List]
Details[Job Details]
Logs[Log Viewer]
Status[Status Bar]
end
UI --> Jobs
UI --> Details
UI --> Logs
UI --> Status
end
Components:
- Bubble Tea framework
- Component-based architecture
- Real-time updates
- Keyboard navigation
- Theme support
Data Flow
Job Execution Flow
sequenceDiagram
participant Client
participant Auth
participant Worker
participant Queue
participant Container
participant Storage
Client->>Auth: Submit job with API key
Auth->>Client: Validate and return job ID
Client->>Worker: Execute job request
Worker->>Queue: Queue job
Queue->>Worker: Job ready
Worker->>Container: Start ML container
Container->>Worker: Return experiment output
Worker->>Storage: Store results
Worker->>Client: Return results
Authentication Flow
sequenceDiagram
participant Client
participant Auth
participant PermMgr
participant Config
Client->>Auth: Request with API key
Auth->>Auth: Validate key hash
Auth->>PermMgr: Get user permissions
PermMgr->>Config: Load YAML permissions
Config->>PermMgr: Return permissions
PermMgr->>Auth: Return resolved permissions
Auth->>Client: Grant/deny access
Security Architecture
Defense in Depth
graph TB
subgraph "Security Layers"
Network[Network Security]
Auth[Authentication]
AuthZ[Authorization]
Container[Container Security]
Data[Data Protection]
Audit[Audit Logging]
end
Network --> Auth
Auth --> AuthZ
AuthZ --> Container
Container --> Data
Data --> Audit
Security Features:
- API key authentication
- Role-based permissions
- Container isolation
- File system sandboxing
- Comprehensive audit logs
- Input validation and sanitization
Container Security
graph TB
subgraph "Container Isolation"
Host[Host System]
Podman[Podman Runtime]
Network[Network Isolation]
FS[File System Isolation]
User[User Namespaces]
ML[ML Container]
Host --> Podman
Podman --> Network
Podman --> FS
Podman --> User
User --> ML
end
Isolation Features:
- Rootless containers
- Network isolation
- File system sandboxing
- User namespace mapping
- Resource limits
Configuration Architecture
Configuration Hierarchy
graph TB
subgraph "Config Sources"
Env[Environment Variables]
File[Config Files]
CLI[CLI Flags]
Defaults[Default Values]
end
subgraph "Config Processing"
Merge[Config Merger]
Validate[Schema Validator]
Apply[Config Applier]
end
Env --> Merge
File --> Merge
CLI --> Merge
Defaults --> Merge
Merge --> Validate
Validate --> Apply
Configuration Priority:
- CLI flags (highest)
- Environment variables
- Configuration files
- Default values (lowest)
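The priority order above reduces to applying each source on top of the previous one, lowest priority first. A minimal sketch, assuming configuration is flattened to string key/value pairs; the `layer` type and the key names (`port`, `log_level`, `redis_addr`) are hypothetical and not taken from the FetchML schema.

```go
package main

import "fmt"

// layer is one configuration source as flat key/value pairs (hypothetical).
type layer map[string]string

// merge applies layers from lowest to highest priority; later layers win.
func merge(layers ...layer) layer {
	out := layer{}
	for _, l := range layers {
		for k, v := range l {
			out[k] = v
		}
	}
	return out
}

func main() {
	defaults := layer{"port": "9101", "log_level": "info", "redis_addr": "localhost:6379"}
	file := layer{"log_level": "debug"}
	env := layer{"log_level": "warn", "redis_addr": "redis:6379"}
	flags := layer{"port": "9443"}

	// Order matters: defaults < config file < environment < CLI flags.
	cfg := merge(defaults, file, env, flags)

	fmt.Println("port:", cfg["port"])             // 9443 (CLI flag)
	fmt.Println("log_level:", cfg["log_level"])   // warn (env beats file)
	fmt.Println("redis_addr:", cfg["redis_addr"]) // redis:6379 (env)
}
```

In the pipeline described by the diagram, the merged result would then pass through schema validation before being applied.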
Scalability Architecture
Horizontal Scaling
graph TB
subgraph "Scaled Architecture"
LB[Load Balancer]
W1[Worker 1]
W2[Worker 2]
W3[Worker N]
Redis[Redis Cluster]
Storage[Shared Storage]
LB --> W1
LB --> W2
LB --> W3
W1 --> Redis
W2 --> Redis
W3 --> Redis
W1 --> Storage
W2 --> Storage
W3 --> Storage
end
Scaling Features:
- Stateless worker services
- Shared job queue (Redis)
- Distributed storage
- Load balancer ready
- Health checks and monitoring
Technology Stack
Backend Technologies
| Component | Technology | Purpose |
|---|---|---|
| Language | Go 1.25+ | Core application |
| Web Framework | Standard library | HTTP server |
| Authentication | Custom | API key + RBAC |
| Database | SQLite/PostgreSQL | Metadata storage |
| Cache | Redis | Job queue & caching |
| Containers | Podman/Docker | Job isolation |
| UI Framework | Bubble Tea | Terminal UI |
Dependencies
// Core dependencies
require (
	github.com/charmbracelet/bubbletea v1.3.10 // TUI framework
	github.com/go-redis/redis/v8 v8.11.5       // Redis client
	github.com/google/uuid v1.6.0              // UUID generation
	github.com/mattn/go-sqlite3 v1.14.32       // SQLite driver
	golang.org/x/crypto v0.45.0                // Crypto utilities
	gopkg.in/yaml.v3 v3.0.1                    // YAML parsing
)
Development Architecture
Project Structure
fetch_ml/
├── cmd/ # CLI applications
│ ├── worker/ # ML worker service
│ ├── tui/ # Terminal UI
│ ├── data_manager/ # Data management
│ └── user_manager/ # User management
├── internal/ # Internal packages
│ ├── auth/ # Authentication system
│ ├── config/ # Configuration management
│ ├── container/ # Container operations
│ ├── database/ # Database operations
│ ├── logging/ # Logging utilities
│ ├── metrics/ # Metrics collection
│ └── network/ # Network utilities
├── configs/ # Configuration files
├── scripts/ # Setup and utility scripts
├── tests/ # Test suites
└── docs/ # Documentation
Package Dependencies
graph TB
subgraph "Application Layer"
Worker[cmd/worker]
TUI[cmd/tui]
DataMgr[cmd/data_manager]
UserMgr[cmd/user_manager]
end
subgraph "Service Layer"
Auth[internal/auth]
Config[internal/config]
Container[internal/container]
Database[internal/database]
end
subgraph "Utility Layer"
Logging[internal/logging]
Metrics[internal/metrics]
Network[internal/network]
end
Worker --> Auth
Worker --> Config
Worker --> Container
TUI --> Auth
DataMgr --> Database
UserMgr --> Auth
Auth --> Logging
Container --> Network
Database --> Metrics
Monitoring & Observability
Metrics Collection
graph TB
subgraph "Metrics Pipeline"
App[Application] --> Metrics[Metrics Collector]
Metrics --> Export[Prometheus Exporter]
Export --> Prometheus[Prometheus Server]
Prometheus --> Grafana[Grafana Dashboard]
subgraph "Metric Types"
Counter[Counters]
Gauge[Gauges]
Histogram[Histograms]
Timer[Timers]
end
App --> Counter
App --> Gauge
App --> Histogram
App --> Timer
end
Logging Architecture
graph TB
subgraph "Logging Pipeline"
App[Application] --> Logger[Structured Logger]
Logger --> File[File Output]
Logger --> Console[Console Output]
Logger --> Syslog[Syslog Forwarder]
Syslog --> Aggregator[Log Aggregator]
Aggregator --> Storage[Log Storage]
Storage --> Viewer[Log Viewer]
end
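The structured logger at the head of this pipeline could be built on Go's standard `log/slog` package (Go 1.21+). The sketch below writes one JSON line per security event to a buffer; in the pipeline above the writer would be a file such as `logs/audit.log`. The event name, field names, and `auditLine` helper are hypothetical, and the timestamp is dropped here only to keep the demo output deterministic.

```go
package main

import (
	"bytes"
	"fmt"
	"log/slog"
)

// newAuditLogger returns a JSON logger writing to buf.
// The time attribute is removed for reproducible demo output;
// a real audit log would keep it.
func newAuditLogger(buf *bytes.Buffer) *slog.Logger {
	opts := &slog.HandlerOptions{
		ReplaceAttr: func(groups []string, a slog.Attr) slog.Attr {
			if len(groups) == 0 && a.Key == slog.TimeKey {
				return slog.Attr{} // a zero Attr is discarded
			}
			return a
		},
	}
	return slog.New(slog.NewJSONHandler(buf, opts))
}

// auditLine logs one security event and returns the emitted JSON line.
func auditLine(event, client, reason string) string {
	var buf bytes.Buffer
	newAuditLogger(&buf).Warn(event, "client", client, "reason", reason)
	return buf.String()
}

func main() {
	fmt.Print(auditLine("auth_failed", "203.0.113.7", "invalid api key"))
}
```

One JSON object per line keeps the file greppable locally and lets a syslog forwarder or log aggregator parse it without custom patterns.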
Deployment Architecture
Container Deployment
graph TB
subgraph "Deployment Stack"
Image[Container Image]
Registry[Container Registry]
Orchestrator[Docker Compose]
Config[ConfigMaps/Secrets]
Storage[Persistent Storage]
Image --> Registry
Registry --> Orchestrator
Config --> Orchestrator
Storage --> Orchestrator
end
Service Discovery
graph TB
subgraph "Service Mesh"
Gateway[API Gateway]
Discovery[Service Discovery]
Worker[Worker Service]
Data[Data Service]
Redis[Redis Cluster]
Gateway --> Discovery
Discovery --> Worker
Discovery --> Data
Discovery --> Redis
end
Future Architecture Considerations
Microservices Evolution
- API Gateway: Centralized routing and authentication
- Service Mesh: Inter-service communication
- Event Streaming: Kafka for job events
- Distributed Tracing: OpenTelemetry integration
- Multi-tenant: Tenant isolation and quotas
Homelab Features
- Docker Compose: Simple container orchestration
- Local Development: Easy setup and testing
- Security: Built-in authentication and encryption
- Monitoring: Basic health checks and logging
Together, these layers keep the homelab deployment simple (Docker Compose, local storage, API key authentication) while leaving room to scale out stateless workers behind a shared Redis queue.