---
layout: page
title: "Homelab Architecture"
permalink: /architecture/
nav_order: 1
---

# Homelab Architecture

Simple, secure architecture for ML experiments in your homelab.

## Components Overview

```mermaid
graph TB
    subgraph "Homelab Stack"
        CLI[Zig CLI]
        API[HTTPS API]
        REDIS[Redis Cache]
        FS[Local Storage]
    end

    CLI --> API
    API --> REDIS
    API --> FS
```

## Core Services

### API Server
- **Purpose**: Secure HTTPS API for ML experiments
- **Port**: 9101 (HTTPS only)
- **Auth**: API key authentication
- **Security**: Rate limiting, IP whitelisting

### Redis
- **Purpose**: Caching and job queuing
- **Port**: 6379 (localhost only)
- **Storage**: Temporary data only
- **Persistence**: Local volume

### Zig CLI
- **Purpose**: High-performance experiment management
- **Language**: Zig for maximum speed and efficiency
- **Features**:
  - Content-addressed storage with deduplication
  - SHA256-based commit ID generation
  - WebSocket communication for real-time updates
  - Rsync-based incremental file transfers
  - Multi-threaded operations
  - Secure API key authentication
  - Auto-sync monitoring with file system watching
  - Priority-based job queuing
  - Memory-efficient operations with arena allocators

## Security Architecture

```mermaid
graph LR
    USER[User] --> AUTH[API Key Auth]
    AUTH --> RATE[Rate Limiting]
    RATE --> WHITELIST[IP Whitelist]
    WHITELIST --> API[Secure API]
    API --> AUDIT[Audit Logging]
```

### Security Layers
1. **API Key Authentication** - Hashed keys with roles
2. **Rate Limiting** - 30 requests/minute
3. **IP Whitelisting** - Local networks only
4. **Fail2Ban** - Automatic IP blocking
5. **HTTPS/TLS** - Encrypted communication
6. **Audit Logging** - Complete action tracking

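The 30 requests/minute limit listed above could be enforced with a per-client counter. The Go sketch below is illustrative only: it assumes a fixed one-minute window keyed by a client identifier such as the API key, and the `RateLimiter` type and its methods are hypothetical names, not the project's actual implementation.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// window tracks how many requests one client has made since start.
type window struct {
	start time.Time
	count int
}

// RateLimiter is a hypothetical fixed-window limiter: each client key may
// make at most perMinute requests within a one-minute window.
type RateLimiter struct {
	mu        sync.Mutex
	perMinute int
	clients   map[string]*window
}

// NewRateLimiter returns a limiter that allows perMinute requests per client.
func NewRateLimiter(perMinute int) *RateLimiter {
	return &RateLimiter{perMinute: perMinute, clients: make(map[string]*window)}
}

// Allow reports whether the client identified by key may make a request now.
func (r *RateLimiter) Allow(key string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()

	now := time.Now()
	w, ok := r.clients[key]
	if !ok || now.Sub(w.start) >= time.Minute {
		// First request from this client, or its previous window has expired.
		w = &window{start: now}
		r.clients[key] = w
	}
	if w.count >= r.perMinute {
		return false
	}
	w.count++
	return true
}

func main() {
	limiter := NewRateLimiter(30)
	allowed := 0
	for i := 0; i < 35; i++ {
		if limiter.Allow("client-a") {
			allowed++
		}
	}
	fmt.Println("allowed:", allowed)
}
```

A fixed window is the simplest choice, but it can admit up to twice the limit across a window boundary. A sliding window or token bucket would smooth that out at the cost of a little more state.
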
## Data Flow

```mermaid
sequenceDiagram
    participant CLI
    participant API
    participant Redis
    participant Storage

    CLI->>API: HTTPS Request
    API->>API: Validate Auth
    API->>Redis: Cache/Queue
    API->>Storage: Experiment Data
    Storage->>API: Results
    API->>CLI: Response
```

## Deployment Options

### Docker Compose (Recommended)
```yaml
services:
  redis:
    image: redis:7-alpine
    ports: ["127.0.0.1:6379:6379"]  # publish on localhost only, matching the Redis service description
    volumes: [redis_data:/data]

  api-server:
    build: .
    ports: ["9101:9101"]
    depends_on: [redis]

volumes:
  redis_data:  # named volume referenced by the redis service
```

### Local Setup
```bash
./setup.sh && ./manage.sh start
```

## Network Architecture

- **Private Network**: Docker internal network
- **Localhost Access**: Redis only on localhost
- **HTTPS API**: Port 9101, TLS encrypted
- **No External Dependencies**: Everything runs locally

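The "local networks only" rule for API access could be checked along the following lines. This is a minimal Go sketch, assuming that "local" means loopback plus the private-use ranges; the helper name `IsLocalAddress` is hypothetical and the project's real allowlist may be configured differently.

```go
package main

import (
	"fmt"
	"net/netip"
)

// IsLocalAddress reports whether ip is a loopback address or belongs to a
// private-use range (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, fc00::/7).
// Unparseable input is treated as not local.
func IsLocalAddress(ip string) bool {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return false
	}
	// Normalize IPv4-mapped IPv6 addresses such as ::ffff:192.168.1.10.
	addr = addr.Unmap()
	return addr.IsLoopback() || addr.IsPrivate()
}

func main() {
	for _, ip := range []string{"127.0.0.1", "192.168.1.10", "8.8.8.8", "not-an-ip"} {
		fmt.Printf("%-14s local=%v\n", ip, IsLocalAddress(ip))
	}
}
```
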
## Storage Architecture

```
data/
├── experiments/        # ML experiment results
├── cache/              # Temporary cache files
└── backups/            # Local backups

logs/
├── app.log             # Application logs
├── audit.log           # Security events
└── access.log          # API access logs
```

## Monitoring Architecture

Simple, lightweight monitoring:
- **Health Checks**: Service availability
- **Log Files**: Structured logging
- **Basic Metrics**: Request counts, error rates
- **Security Events**: Failed auth, rate limits

## Homelab Benefits

- ✅ **Simple Setup**: One-command installation
- ✅ **Local Only**: No external dependencies
- ✅ **Secure by Default**: HTTPS, auth, rate limiting
- ✅ **Low Resource**: Minimal CPU/memory usage
- ✅ **Easy Backup**: Local file system
- ✅ **Privacy**: Everything stays on your network

## High-Level Architecture

```mermaid
graph TB
    subgraph "Client Layer"
        CLI[CLI Tools]
        TUI[Terminal UI]
        API[REST API]
    end

    subgraph "Authentication Layer"
        Auth[Authentication Service]
        RBAC[Role-Based Access Control]
        Perm[Permission Manager]
    end

    subgraph "Core Services"
        Worker[ML Worker Service]
        DataMgr[Data Manager Service]
        Queue[Job Queue]
    end

    subgraph "Storage Layer"
        Redis[(Redis Cache)]
        DB[(SQLite/PostgreSQL)]
        Files[File Storage]
    end

    subgraph "Container Runtime"
        Podman[Podman/Docker]
        Containers[ML Containers]
    end

    CLI --> Auth
    TUI --> Auth
    API --> Auth

    Auth --> RBAC
    RBAC --> Perm

    Worker --> Queue
    Worker --> DataMgr
    Worker --> Podman

    DataMgr --> DB
    DataMgr --> Files

    Queue --> Redis

    Podman --> Containers
```

## Zig CLI Architecture

### Component Structure

```mermaid
graph TB
    subgraph "Zig CLI Components"
        Main[main.zig] --> Commands[commands/]
        Commands --> Config[config.zig]
        Commands --> Utils[utils/]
        Commands --> Net[net/]
        Commands --> Errors[errors.zig]

        subgraph "Commands"
            Init[init.zig]
            Sync[sync.zig]
            Queue[queue.zig]
            Watch[watch.zig]
            Status[status.zig]
            Monitor[monitor.zig]
            Cancel[cancel.zig]
            Prune[prune.zig]
        end

        subgraph "Utils"
            Crypto[crypto.zig]
            Storage[storage.zig]
            Rsync[rsync.zig]
        end

        subgraph "Network"
            WS[ws.zig]
        end
    end
```

### Performance Optimizations

#### Content-Addressed Storage
- **Deduplication**: Files stored by SHA256 hash
- **Space Efficiency**: Shared files across experiments
- **Fast Lookup**: Hash-based file retrieval

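To make the deduplication idea concrete, here is a small sketch of a content-addressed store. It is written in Go for brevity even though the CLI itself is written in Zig, it keeps blobs in memory rather than on disk, and the `Store` type and its methods are hypothetical names used only for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Store is an in-memory content-addressed store: each blob is keyed by the
// hex-encoded SHA-256 digest of its contents.
type Store struct {
	blobs map[string][]byte
}

// NewStore returns an empty store.
func NewStore() *Store {
	return &Store{blobs: make(map[string][]byte)}
}

// Put stores data under its digest and returns that digest. The second
// result is false when identical content was already present (deduplicated).
func (s *Store) Put(data []byte) (id string, added bool) {
	sum := sha256.Sum256(data)
	id = hex.EncodeToString(sum[:])
	if _, exists := s.blobs[id]; exists {
		return id, false
	}
	// Copy so later changes to the caller's slice cannot alter stored content.
	s.blobs[id] = append([]byte(nil), data...)
	return id, true
}

// Get looks a blob up by digest.
func (s *Store) Get(id string) ([]byte, bool) {
	b, ok := s.blobs[id]
	return b, ok
}

// Len returns the number of distinct blobs held.
func (s *Store) Len() int { return len(s.blobs) }

func main() {
	store := NewStore()
	script := []byte("print('train')\n")

	id1, added1 := store.Put(script) // first experiment uploads the file
	id2, added2 := store.Put(script) // second experiment reuses the same bytes

	fmt.Println(id1 == id2, added1, added2, store.Len())
}
```

Because the key is derived from the content, two experiments that share a file resolve to the same entry, and a lookup is a single map (or path) access by hash.
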
#### Memory Management
- **Arena Allocators**: Efficient bulk allocation
- **Zero-Copy Operations**: Minimized memory copying
- **Automatic Cleanup**: Resource deallocation

#### Network Communication
- **WebSocket Protocol**: Real-time bidirectional communication
- **Connection Pooling**: Reused connections
- **Binary Messaging**: Efficient data transfer

### Security Implementation

```mermaid
graph LR
    subgraph "CLI Security"
        Config[Config File] --> Hash[SHA256 Hashing]
        Hash --> Auth[API Authentication]
        Auth --> SSH[SSH Transfer]
        SSH --> WS[WebSocket Security]
    end
```

## Core Components

### 1. Authentication & Authorization

```mermaid
graph LR
    subgraph "Auth Flow"
        Client[Client] --> APIKey[API Key]
        APIKey --> Hash[Hash Validation]
        Hash --> Roles[Role Resolution]
        Roles --> Perms[Permission Check]
        Perms --> Access[Grant/Deny Access]
    end

    subgraph "Permission Sources"
        YAML[YAML Config]
        Inline[Inline Fallback]
        Roles --> YAML
        Roles --> Inline
    end
```

**Features:**
- API key-based authentication
- Role-based access control (RBAC)
- YAML-based permission configuration
- Fallback to inline permissions
- Admin wildcard permissions

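The flow above (hash validation, role resolution, permission check) can be sketched in Go as follows. This is a hedged illustration, not the project's `internal/auth` code: the `User` type, the `rolePermissions` table, the permission strings, and the use of unsalted SHA-256 for key hashing are all assumptions. The inline table stands in for the "inline fallback"; loading the same data from YAML is omitted.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// User is a hypothetical account record. Only the hash of the API key is kept.
type User struct {
	Name    string
	KeyHash string // hex-encoded SHA-256 of the API key (assumed scheme)
	Roles   []string
}

// rolePermissions is an assumed inline permission table; "*" is the admin
// wildcard that grants every permission.
var rolePermissions = map[string][]string{
	"admin":  {"*"},
	"viewer": {"jobs:read"},
}

// HashKey returns the hex-encoded SHA-256 digest of an API key.
func HashKey(key string) string {
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])
}

// Authenticate returns the user whose stored hash matches the presented key.
func Authenticate(users []User, key string) (User, bool) {
	h := []byte(HashKey(key))
	for _, u := range users {
		// Constant-time comparison avoids leaking how many characters matched.
		if subtle.ConstantTimeCompare(h, []byte(u.KeyHash)) == 1 {
			return u, true
		}
	}
	return User{}, false
}

// Allowed reports whether any of the user's roles grants perm.
func Allowed(u User, perm string) bool {
	for _, role := range u.Roles {
		for _, p := range rolePermissions[role] {
			if p == "*" || p == perm {
				return true
			}
		}
	}
	return false
}

func main() {
	users := []User{
		{Name: "alice", KeyHash: HashKey("alice-key"), Roles: []string{"admin"}},
		{Name: "bob", KeyHash: HashKey("bob-key"), Roles: []string{"viewer"}},
	}
	if u, ok := Authenticate(users, "bob-key"); ok {
		fmt.Println(u.Name, Allowed(u, "jobs:read"), Allowed(u, "jobs:create"))
	}
}
```
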
### 2. Worker Service

```mermaid
graph TB
    subgraph "Worker Architecture"
        API[HTTP API] --> Router[Request Router]
        Router --> Auth[Auth Middleware]
        Auth --> Queue[Job Queue]
        Queue --> Processor[Job Processor]
        Processor --> Runtime[Container Runtime]
        Runtime --> Storage[Result Storage]

        subgraph "Job Lifecycle"
            Submit[Submit Job] --> Queue
            Queue --> Execute[Execute]
            Execute --> Monitor[Monitor]
            Monitor --> Complete[Complete]
            Complete --> Store[Store Results]
        end
    end
```

**Responsibilities:**
- HTTP API for job submission
- Job queue management
- Container orchestration
- Result collection and storage
- Metrics and monitoring

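As an illustration of job queue management with priorities, the sketch below orders jobs using Go's `container/heap`. It is an in-memory stand-in under stated assumptions: the documented queue is backed by Redis, and the `Job` and `JobQueue` types, the "higher number runs first" convention, and the first-in-first-out tie-break are hypothetical choices made for this example.

```go
package main

import (
	"container/heap"
	"fmt"
)

// Job is a hypothetical queued unit of work.
type Job struct {
	ID       string
	Priority int // higher values run first (assumed convention)
	seq      int // submission order, used to break priority ties
}

// jobHeap implements heap.Interface so the highest-priority job is popped first.
type jobHeap []*Job

func (h jobHeap) Len() int { return len(h) }
func (h jobHeap) Less(i, j int) bool {
	if h[i].Priority != h[j].Priority {
		return h[i].Priority > h[j].Priority
	}
	return h[i].seq < h[j].seq // earlier submission wins a tie
}
func (h jobHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h *jobHeap) Push(x any)   { *h = append(*h, x.(*Job)) }
func (h *jobHeap) Pop() any {
	old := *h
	n := len(old)
	job := old[n-1]
	old[n-1] = nil // drop the reference so it can be garbage collected
	*h = old[:n-1]
	return job
}

// JobQueue wraps the heap with a small submit/next API.
type JobQueue struct {
	h    jobHeap
	next int
}

// Submit enqueues a job with the given priority.
func (q *JobQueue) Submit(id string, priority int) {
	heap.Push(&q.h, &Job{ID: id, Priority: priority, seq: q.next})
	q.next++
}

// Next removes and returns the job that should run next, if any.
func (q *JobQueue) Next() (*Job, bool) {
	if q.h.Len() == 0 {
		return nil, false
	}
	return heap.Pop(&q.h).(*Job), true
}

func main() {
	q := &JobQueue{}
	q.Submit("nightly-sweep", 1)
	q.Submit("urgent-eval", 5)
	q.Submit("urgent-retrain", 5)

	for job, ok := q.Next(); ok; job, ok = q.Next() {
		fmt.Println(job.ID, job.Priority)
	}
}
```
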
### 3. Data Manager Service

```mermaid
graph TB
    subgraph "Data Management"
        API[Data API] --> Storage[Storage Layer]
        Storage --> Metadata[Metadata DB]
        Storage --> Files[File System]
        Storage --> Cache[Redis Cache]

        subgraph "Data Operations"
            Upload[Upload Data] --> Validate[Validate]
            Validate --> Store[Store]
            Store --> Index[Index]
            Index --> Catalog[Catalog]
        end
    end
```

**Features:**
- Data upload and validation
- Metadata management
- File system abstraction
- Caching layer
- Data catalog

### 4. Terminal UI (TUI)

```mermaid
graph TB
    subgraph "TUI Architecture"
        UI[UI Components] --> Model[Data Model]
        Model --> Update[Update Loop]
        Update --> Render[Render]

        subgraph "UI Panels"
            Jobs[Job List]
            Details[Job Details]
            Logs[Log Viewer]
            Status[Status Bar]
        end

        UI --> Jobs
        UI --> Details
        UI --> Logs
        UI --> Status
    end
```

**Components:**
- Bubble Tea framework
- Component-based architecture
- Real-time updates
- Keyboard navigation
- Theme support

## Data Flow

### Job Execution Flow

```mermaid
sequenceDiagram
    participant Client
    participant Auth
    participant Worker
    participant Queue
    participant Container
    participant Storage

    Client->>Auth: Submit job with API key
    Auth->>Client: Validate and return job ID

    Client->>Worker: Execute job request
    Worker->>Queue: Queue job
    Queue->>Worker: Job ready
    Worker->>Container: Start ML container
    Container->>Worker: Execute experiment
    Worker->>Storage: Store results
    Worker->>Client: Return results
```

### Authentication Flow

```mermaid
sequenceDiagram
    participant Client
    participant Auth
    participant PermMgr
    participant Config

    Client->>Auth: Request with API key
    Auth->>Auth: Validate key hash
    Auth->>PermMgr: Get user permissions
    PermMgr->>Config: Load YAML permissions
    Config->>PermMgr: Return permissions
    PermMgr->>Auth: Return resolved permissions
    Auth->>Client: Grant/deny access
```

## Security Architecture

### Defense in Depth

```mermaid
graph TB
    subgraph "Security Layers"
        Network[Network Security]
        Auth[Authentication]
        AuthZ[Authorization]
        Container[Container Security]
        Data[Data Protection]
        Audit[Audit Logging]
    end

    Network --> Auth
    Auth --> AuthZ
    AuthZ --> Container
    Container --> Data
    Data --> Audit
```

**Security Features:**
- API key authentication
- Role-based permissions
- Container isolation
- File system sandboxing
- Comprehensive audit logs
- Input validation and sanitization

### Container Security

```mermaid
graph TB
    subgraph "Container Isolation"
        Host[Host System]
        Podman[Podman Runtime]
        Network[Network Isolation]
        FS[File System Isolation]
        User[User Namespaces]
        ML[ML Container]

        Host --> Podman
        Podman --> Network
        Podman --> FS
        Podman --> User
        User --> ML
    end
```

**Isolation Features:**
- Rootless containers
- Network isolation
- File system sandboxing
- User namespace mapping
- Resource limits

## Configuration Architecture

### Configuration Hierarchy

```mermaid
graph TB
    subgraph "Config Sources"
        Env[Environment Variables]
        File[Config Files]
        CLI[CLI Flags]
        Defaults[Default Values]
    end

    subgraph "Config Processing"
        Merge[Config Merger]
        Validate[Schema Validator]
        Apply[Config Applier]
    end

    Env --> Merge
    File --> Merge
    CLI --> Merge
    Defaults --> Merge

    Merge --> Validate
    Validate --> Apply
```

**Configuration Priority:**
1. CLI flags (highest)
2. Environment variables
3. Configuration files
4. Default values (lowest)

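The priority order above amounts to a layered merge in which higher-priority sources overwrite lower ones. The Go sketch below shows one way to express that, assuming flat string settings; the `Layer` type, the `Merge` function, and the setting names are hypothetical, and schema validation is left out.

```go
package main

import "fmt"

// Layer is one source of settings (defaults, config file, environment, flags).
type Layer map[string]string

// Merge combines layers given from lowest to highest priority. A key set in
// a later layer overrides the same key from any earlier layer.
func Merge(layers ...Layer) Layer {
	out := Layer{}
	for _, layer := range layers {
		for key, value := range layer {
			out[key] = value
		}
	}
	return out
}

func main() {
	defaults := Layer{"port": "9101", "log_level": "info"}
	file := Layer{"log_level": "debug", "redis_addr": "localhost:6379"}
	env := Layer{"port": "9443"}
	flags := Layer{"log_level": "warn"}

	// Lowest priority first: defaults < config file < environment < CLI flags.
	cfg := Merge(defaults, file, env, flags)
	fmt.Println(cfg["port"], cfg["log_level"], cfg["redis_addr"])
}
```

Here `port` comes from the environment, `log_level` from the CLI flag, and `redis_addr` from the config file, since no higher-priority source sets it.
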
## Scalability Architecture

### Horizontal Scaling

```mermaid
graph TB
    subgraph "Scaled Architecture"
        LB[Load Balancer]
        W1[Worker 1]
        W2[Worker 2]
        W3[Worker N]
        Redis[Redis Cluster]
        Storage[Shared Storage]

        LB --> W1
        LB --> W2
        LB --> W3

        W1 --> Redis
        W2 --> Redis
        W3 --> Redis

        W1 --> Storage
        W2 --> Storage
        W3 --> Storage
    end
```

**Scaling Features:**
- Stateless worker services
- Shared job queue (Redis)
- Distributed storage
- Load balancer ready
- Health checks and monitoring

## Technology Stack

### Backend Technologies

| Component | Technology | Purpose |
|-----------|------------|---------|
| **Language** | Go 1.25+ | Core application |
| **Web Framework** | Standard library | HTTP server |
| **Authentication** | Custom | API key + RBAC |
| **Database** | SQLite/PostgreSQL | Metadata storage |
| **Cache** | Redis | Job queue & caching |
| **Containers** | Podman/Docker | Job isolation |
| **UI Framework** | Bubble Tea | Terminal UI |

### Dependencies

```go
// Core dependencies
require (
	github.com/charmbracelet/bubbletea v1.3.10 // TUI framework
	github.com/go-redis/redis/v8 v8.11.5 // Redis client
	github.com/google/uuid v1.6.0 // UUID generation
	github.com/mattn/go-sqlite3 v1.14.32 // SQLite driver
	golang.org/x/crypto v0.45.0 // Crypto utilities
	gopkg.in/yaml.v3 v3.0.1 // YAML parsing
)
```

## Development Architecture

### Project Structure

```
fetch_ml/
├── cmd/                    # CLI applications
│   ├── worker/             # ML worker service
│   ├── tui/                # Terminal UI
│   ├── data_manager/       # Data management
│   └── user_manager/       # User management
├── internal/               # Internal packages
│   ├── auth/               # Authentication system
│   ├── config/             # Configuration management
│   ├── container/          # Container operations
│   ├── database/           # Database operations
│   ├── logging/            # Logging utilities
│   ├── metrics/            # Metrics collection
│   └── network/            # Network utilities
├── configs/                # Configuration files
├── scripts/                # Setup and utility scripts
├── tests/                  # Test suites
└── docs/                   # Documentation
```

### Package Dependencies

```mermaid
graph TB
    subgraph "Application Layer"
        Worker[cmd/worker]
        TUI[cmd/tui]
        DataMgr[cmd/data_manager]
        UserMgr[cmd/user_manager]
    end

    subgraph "Service Layer"
        Auth[internal/auth]
        Config[internal/config]
        Container[internal/container]
        Database[internal/database]
    end

    subgraph "Utility Layer"
        Logging[internal/logging]
        Metrics[internal/metrics]
        Network[internal/network]
    end

    Worker --> Auth
    Worker --> Config
    Worker --> Container
    TUI --> Auth
    DataMgr --> Database
    UserMgr --> Auth

    Auth --> Logging
    Container --> Network
    Database --> Metrics
```

## Monitoring & Observability

### Metrics Collection

```mermaid
graph TB
    subgraph "Metrics Pipeline"
        App[Application] --> Metrics[Metrics Collector]
        Metrics --> Export[Prometheus Exporter]
        Export --> Prometheus[Prometheus Server]
        Prometheus --> Grafana[Grafana Dashboard]

        subgraph "Metric Types"
            Counter[Counters]
            Gauge[Gauges]
            Histogram[Histograms]
            Timer[Timers]
        end

        App --> Counter
        App --> Gauge
        App --> Histogram
        App --> Timer
    end
```

### Logging Architecture

```mermaid
graph TB
    subgraph "Logging Pipeline"
        App[Application] --> Logger[Structured Logger]
        Logger --> File[File Output]
        Logger --> Console[Console Output]
        Logger --> Syslog[Syslog Forwarder]
        Syslog --> Aggregator[Log Aggregator]
        Aggregator --> Storage[Log Storage]
        Storage --> Viewer[Log Viewer]
    end
```

## Deployment Architecture

### Container Deployment

```mermaid
graph TB
    subgraph "Deployment Stack"
        Image[Container Image]
        Registry[Container Registry]
        Orchestrator[Docker Compose]
        Config[ConfigMaps/Secrets]
        Storage[Persistent Storage]

        Image --> Registry
        Registry --> Orchestrator
        Config --> Orchestrator
        Storage --> Orchestrator
    end
```

### Service Discovery

```mermaid
graph TB
    subgraph "Service Mesh"
        Gateway[API Gateway]
        Discovery[Service Discovery]
        Worker[Worker Service]
        Data[Data Service]
        Redis[Redis Cluster]

        Gateway --> Discovery
        Discovery --> Worker
        Discovery --> Data
        Discovery --> Redis
    end
```

## Future Architecture Considerations

### Microservices Evolution

- **API Gateway**: Centralized routing and authentication
- **Service Mesh**: Inter-service communication
- **Event Streaming**: Kafka for job events
- **Distributed Tracing**: OpenTelemetry integration
- **Multi-tenant**: Tenant isolation and quotas

### Homelab Features

- **Docker Compose**: Simple container orchestration
- **Local Development**: Easy setup and testing
- **Security**: Built-in authentication and encryption
- **Monitoring**: Basic health checks and logging

---

This architecture provides a solid foundation for secure, scalable machine learning experiments while maintaining simplicity and developer productivity.