fetch_ml/docs/_pages/architecture.md
Jeremie Fraeys 385d2cf386 docs: add comprehensive documentation with MkDocs site
- Add complete API documentation and architecture guides
- Include quick start, installation, and deployment guides
- Add troubleshooting and security documentation
- Include CLI reference and configuration schema docs
- Add production monitoring and operations guides
- Implement MkDocs configuration with search functionality
- Include comprehensive user and developer documentation

Provides complete documentation for users and developers
covering all aspects of the FetchML platform.
2025-12-04 16:54:57 -05:00


---
layout: page
title: "Homelab Architecture"
permalink: /architecture/
nav_order: 1
---
# Homelab Architecture
Simple, secure architecture for ML experiments in your homelab.
## Components Overview
```mermaid
graph TB
subgraph "Homelab Stack"
CLI[Zig CLI]
API[HTTPS API]
REDIS[Redis Cache]
FS[Local Storage]
end
CLI --> API
API --> REDIS
API --> FS
```
## Core Services
### API Server
- **Purpose**: Secure HTTPS API for ML experiments
- **Port**: 9101 (HTTPS only)
- **Auth**: API key authentication
- **Security**: Rate limiting, IP whitelisting
### Redis
- **Purpose**: Caching and job queuing
- **Port**: 6379 (localhost only)
- **Storage**: Temporary data only
- **Persistence**: Local volume
### Zig CLI
- **Purpose**: High-performance experiment management
- **Language**: Zig for maximum speed and efficiency
- **Features**:
  - Content-addressed storage with deduplication
  - SHA256-based commit ID generation
  - WebSocket communication for real-time updates
  - Rsync-based incremental file transfers
  - Multi-threaded operations
  - Secure API key authentication
  - Auto-sync monitoring with file system watching
  - Priority-based job queuing
  - Memory-efficient operations with arena allocators
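The CLI itself is written in Zig. To stay consistent with the Go used elsewhere on this page, the Go sketch below illustrates one way a SHA256-based commit ID could be derived from a snapshot of experiment files. The scheme (hashing sorted relative paths together with each file's content hash) and the `CommitID` function are assumptions, not the CLI's actual algorithm.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// CommitID is a hypothetical deterministic ID for a snapshot. files maps a
// relative path to its contents. Paths are sorted so the result does not
// depend on map iteration order.
func CommitID(files map[string][]byte) string {
	paths := make([]string, 0, len(files))
	for p := range files {
		paths = append(paths, p)
	}
	sort.Strings(paths)

	h := sha256.New()
	for _, p := range paths {
		sum := sha256.Sum256(files[p]) // content hash of this file
		h.Write([]byte(p))
		h.Write([]byte{0}) // NUL cannot appear in a path, so boundaries stay unambiguous
		h.Write(sum[:])    // fixed-length 32-byte digest follows each path
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	snapshot := map[string][]byte{
		"train.py":         []byte("print('train')\n"),
		"requirements.txt": []byte("numpy\n"),
	}
	fmt.Println(CommitID(snapshot))
}
```

Because the ID depends only on paths and contents, re-syncing an unchanged directory yields the same ID, while editing any file yields a new one.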
## Security Architecture
```mermaid
graph LR
USER[User] --> AUTH[API Key Auth]
AUTH --> RATE[Rate Limiting]
RATE --> WHITELIST[IP Whitelist]
WHITELIST --> API[Secure API]
API --> AUDIT[Audit Logging]
```
### Security Layers
1. **API Key Authentication** - Hashed keys with roles
2. **Rate Limiting** - 30 requests/minute
3. **IP Whitelisting** - Local networks only
4. **Fail2Ban** - Automatic IP blocking
5. **HTTPS/TLS** - Encrypted communication
6. **Audit Logging** - Security events and user actions recorded to `logs/audit.log`
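The "local networks only" whitelisting rule amounts to checking the client address against a set of private prefixes. A minimal sketch using Go's standard `net/netip` package follows. The prefix list (loopback plus the RFC 1918 and IPv6 unique-local ranges) and the `IsAllowed` function are assumptions about what "local networks" covers.

```go
package main

import (
	"fmt"
	"net/netip"
)

// allowedPrefixes is a hypothetical allowlist of loopback and private ranges.
var allowedPrefixes = []netip.Prefix{
	netip.MustParsePrefix("127.0.0.0/8"),
	netip.MustParsePrefix("10.0.0.0/8"),
	netip.MustParsePrefix("172.16.0.0/12"),
	netip.MustParsePrefix("192.168.0.0/16"),
	netip.MustParsePrefix("::1/128"),
	netip.MustParsePrefix("fc00::/7"),
}

// IsAllowed reports whether remote (an "ip:port" string such as
// http.Request.RemoteAddr) falls inside one of the allowed prefixes.
func IsAllowed(remote string) bool {
	addrPort, err := netip.ParseAddrPort(remote)
	if err != nil {
		return false // reject anything that does not parse
	}
	// Unmap converts IPv4-mapped IPv6 addresses (::ffff:a.b.c.d) to plain IPv4.
	addr := addrPort.Addr().Unmap()
	for _, p := range allowedPrefixes {
		if p.Contains(addr) {
			return true
		}
	}
	return false
}

func main() {
	for _, r := range []string{"192.168.1.20:51234", "8.8.8.8:443", "[::1]:9101"} {
		fmt.Println(r, IsAllowed(r))
	}
}
```

Behind a reverse proxy, `RemoteAddr` is the proxy's address, so the check would have to run on a trusted forwarded-for value instead.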
## Data Flow
```mermaid
sequenceDiagram
participant CLI
participant API
participant Redis
participant Storage
CLI->>API: HTTPS Request
API->>API: Validate Auth
API->>Redis: Cache/Queue
API->>Storage: Experiment Data
Storage->>API: Results
API->>CLI: Response
```
## Deployment Options
### Docker Compose (Recommended)
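```yaml
services:
  redis:
    image: redis:7-alpine
    # Publish on loopback only, matching "Redis only on localhost"
    ports: ["127.0.0.1:6379:6379"]
    volumes: ["redis_data:/data"]
  api-server:
    build: .
    ports: ["9101:9101"]
    depends_on: [redis]

# Named volumes referenced above must be declared at the top level
volumes:
  redis_data:
```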
### Local Setup
```bash
./setup.sh && ./manage.sh start
```
## Network Architecture
- **Private Network**: Docker internal network
- **Localhost Access**: Redis only on localhost
- **HTTPS API**: Port 9101, TLS encrypted
- **No External Dependencies**: Everything runs locally
## Storage Architecture
```
data/
├── experiments/ # ML experiment results
├── cache/ # Temporary cache files
└── backups/ # Local backups
logs/
├── app.log # Application logs
├── audit.log # Security events
└── access.log # API access logs
```
## Monitoring Architecture
Simple, lightweight monitoring:
- **Health Checks**: Service availability
- **Log Files**: Structured logging
- **Basic Metrics**: Request counts, error rates
- **Security Events**: Failed auth, rate limits
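A service-availability health check is often just an HTTP endpoint that returns 200 while the process can serve requests. The sketch below uses only the standard library. The `/health` path, the JSON body, and the `healthHandler` and `probe` names are assumptions, not the documented API.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// healthHandler is a hypothetical liveness endpoint: it reports "ok" for GET
// requests and rejects other methods.
func healthHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodGet {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	// Encode writes the JSON object followed by a newline.
	_ = json.NewEncoder(w).Encode(map[string]string{"status": "ok"})
}

// probe exercises healthHandler in-process via httptest and returns the status
// code and body, without opening a real port.
func probe(method string) (int, string) {
	rec := httptest.NewRecorder()
	healthHandler(rec, httptest.NewRequest(method, "/health", nil))
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := probe(http.MethodGet)
	fmt.Print(code, " ", body)
}
```

In a real server the handler would be registered with `mux.HandleFunc("/health", healthHandler)` on the same HTTPS listener as the API, so a container or compose health check can poll it.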
## Homelab Benefits
- **Simple Setup**: One-command installation
- **Local Only**: No external dependencies
- **Secure by Default**: HTTPS, auth, rate limiting
- **Low Resource**: Minimal CPU/memory usage
- **Easy Backup**: Local file system
- **Privacy**: Everything stays on your network
## High-Level Architecture
```mermaid
graph TB
subgraph "Client Layer"
CLI[CLI Tools]
TUI[Terminal UI]
API[REST API]
end
subgraph "Authentication Layer"
Auth[Authentication Service]
RBAC[Role-Based Access Control]
Perm[Permission Manager]
end
subgraph "Core Services"
Worker[ML Worker Service]
DataMgr[Data Manager Service]
Queue[Job Queue]
end
subgraph "Storage Layer"
Redis[(Redis Cache)]
DB[(SQLite/PostgreSQL)]
Files[File Storage]
end
subgraph "Container Runtime"
Podman[Podman/Docker]
Containers[ML Containers]
end
CLI --> Auth
TUI --> Auth
API --> Auth
Auth --> RBAC
RBAC --> Perm
Worker --> Queue
Worker --> DataMgr
Worker --> Podman
DataMgr --> DB
DataMgr --> Files
Queue --> Redis
Podman --> Containers
```
## Zig CLI Architecture
### Component Structure
```mermaid
graph TB
subgraph "Zig CLI Components"
Main[main.zig] --> Commands[commands/]
Commands --> Config[config.zig]
Commands --> Utils[utils/]
Commands --> Net[net/]
Commands --> Errors[errors.zig]
subgraph CommandFiles [Commands]
Init[init.zig]
Sync[sync.zig]
Queue[queue.zig]
Watch[watch.zig]
Status[status.zig]
Monitor[monitor.zig]
Cancel[cancel.zig]
Prune[prune.zig]
end
subgraph UtilFiles [Utils]
Crypto[crypto.zig]
Storage[storage.zig]
Rsync[rsync.zig]
end
subgraph "Network"
WS[ws.zig]
end
end
```
### Performance Optimizations
#### Content-Addressed Storage
- **Deduplication**: Files stored by SHA256 hash
- **Space Efficiency**: Shared files across experiments
- **Fast Lookup**: Hash-based file retrieval
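Content addressing means the key under which a file is stored is derived from its bytes, so identical files shared across experiments are stored once. The in-memory Go sketch below illustrates the idea only: the real CLI is written in Zig and stores files on disk, and the `CAStore` type and its methods are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// CAStore is a hypothetical in-memory content-addressed store.
type CAStore struct {
	blobs map[string][]byte // hex SHA256 digest -> contents
}

func NewCAStore() *CAStore {
	return &CAStore{blobs: make(map[string][]byte)}
}

// Put stores data under its SHA256 digest and returns the digest. Storing the
// same bytes again is a no-op, which is what provides deduplication.
func (s *CAStore) Put(data []byte) string {
	sum := sha256.Sum256(data)
	key := hex.EncodeToString(sum[:])
	if _, exists := s.blobs[key]; !exists {
		// Copy so later changes to the caller's slice cannot alter stored content.
		s.blobs[key] = append([]byte(nil), data...)
	}
	return key
}

// Get retrieves content by digest (hash-based lookup).
func (s *CAStore) Get(key string) ([]byte, bool) {
	data, ok := s.blobs[key]
	return data, ok
}

// Len returns the number of unique blobs stored.
func (s *CAStore) Len() int { return len(s.blobs) }

func main() {
	store := NewCAStore()
	k1 := store.Put([]byte("shared dataset header\n")) // experiment A
	k2 := store.Put([]byte("shared dataset header\n")) // experiment B, same bytes
	fmt.Println(k1 == k2, store.Len())                 // true 1
}
```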
#### Memory Management
- **Arena Allocators**: Efficient bulk allocation
- **Zero-Copy Operations**: Minimized memory copying
- **Automatic Cleanup**: Resource deallocation
#### Network Communication
- **WebSocket Protocol**: Real-time bidirectional communication
- **Connection Pooling**: Reused connections
- **Binary Messaging**: Efficient data transfer
### Security Implementation
```mermaid
graph LR
subgraph "CLI Security"
Config[Config File] --> Hash[SHA256 Hashing]
Hash --> Auth[API Authentication]
Auth --> SSH[SSH Transfer]
SSH --> WS[WebSocket Security]
end
```
## Core Components
### 1. Authentication & Authorization
```mermaid
graph LR
subgraph "Auth Flow"
Client[Client] --> APIKey[API Key]
APIKey --> Hash[Hash Validation]
Hash --> Roles[Role Resolution]
Roles --> Perms[Permission Check]
Perms --> Access[Grant/Deny Access]
end
subgraph "Permission Sources"
YAML[YAML Config]
Inline[Inline Fallback]
Roles --> YAML
Roles --> Inline
end
```
**Features:**
- API key-based authentication
- Role-based access control (RBAC)
- YAML-based permission configuration
- Fallback to inline permissions
- Admin wildcard permissions
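Role resolution with a YAML source, an inline fallback, and an admin wildcard can be modeled as a lookup over two role-to-permission maps. In the sketch below the YAML file is assumed to have already been parsed into a map (for example with `gopkg.in/yaml.v3`, listed under Dependencies). The per-role fallback rule, the `*` wildcard spelling, permission strings such as `jobs:create`, and the `Authorizer` type are illustrative assumptions.

```go
package main

import "fmt"

// Authorizer is a hypothetical permission checker. fromYAML holds roles loaded
// from the YAML config; inline holds built-in roles used when a role is absent
// from YAML (or when no YAML file was loaded).
type Authorizer struct {
	fromYAML map[string][]string
	inline   map[string][]string
}

// permissionsFor resolves a role, preferring YAML over the inline fallback.
func (a Authorizer) permissionsFor(role string) []string {
	if perms, ok := a.fromYAML[role]; ok {
		return perms
	}
	return a.inline[role]
}

// Allowed reports whether any of the user's roles grants perm.
// The permission "*" is treated as an admin wildcard.
func (a Authorizer) Allowed(roles []string, perm string) bool {
	for _, role := range roles {
		for _, p := range a.permissionsFor(role) {
			if p == "*" || p == perm {
				return true
			}
		}
	}
	return false
}

func main() {
	authz := Authorizer{
		fromYAML: map[string][]string{"researcher": {"jobs:create", "jobs:read"}},
		inline:   map[string][]string{"admin": {"*"}, "viewer": {"jobs:read"}},
	}
	fmt.Println(authz.Allowed([]string{"researcher"}, "jobs:create")) // true
	fmt.Println(authz.Allowed([]string{"viewer"}, "jobs:create"))     // false
	fmt.Println(authz.Allowed([]string{"admin"}, "users:delete"))     // true
}
```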
### 2. Worker Service
```mermaid
graph TB
subgraph "Worker Architecture"
API[HTTP API] --> Router[Request Router]
Router --> Auth[Auth Middleware]
Auth --> Queue[Job Queue]
Queue --> Processor[Job Processor]
Processor --> Runtime[Container Runtime]
Runtime --> Storage[Result Storage]
subgraph "Job Lifecycle"
Submit[Submit Job] --> Queue
Queue --> Execute[Execute]
Execute --> Monitor[Monitor]
Monitor --> Complete[Complete]
Complete --> Store[Store Results]
end
end
```
**Responsibilities:**
- HTTP API for job submission
- Job queue management
- Container orchestration
- Result collection and storage
- Metrics and monitoring
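Job queue management includes the priority-based queuing mentioned for the CLI. The shared queue itself lives in Redis; the in-memory sketch below uses Go's `container/heap` only to illustrate the ordering. The `Job` fields, the `JobQueue` type, and the first-in-first-out tie-break among equal priorities are assumptions.

```go
package main

import (
	"container/heap"
	"fmt"
)

// Job is a hypothetical queued job.
type Job struct {
	ID       string
	Priority int    // higher runs first
	seq      uint64 // insertion order, keeps FIFO among equal priorities
}

// jobHeap implements heap.Interface as a max-heap on Priority.
type jobHeap []*Job

func (h jobHeap) Len() int { return len(h) }
func (h jobHeap) Less(i, j int) bool {
	if h[i].Priority != h[j].Priority {
		return h[i].Priority > h[j].Priority
	}
	return h[i].seq < h[j].seq
}
func (h jobHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h *jobHeap) Push(x any)   { *h = append(*h, x.(*Job)) }
func (h *jobHeap) Pop() any {
	old := *h
	n := len(old)
	job := old[n-1]
	old[n-1] = nil // drop the reference so the job can be garbage-collected
	*h = old[:n-1]
	return job
}

// JobQueue wraps the heap with an insertion counter.
type JobQueue struct {
	h    jobHeap
	next uint64
}

func (q *JobQueue) Enqueue(id string, priority int) {
	heap.Push(&q.h, &Job{ID: id, Priority: priority, seq: q.next})
	q.next++
}

// Dequeue returns the highest-priority job, or false if the queue is empty.
func (q *JobQueue) Dequeue() (*Job, bool) {
	if q.h.Len() == 0 {
		return nil, false
	}
	return heap.Pop(&q.h).(*Job), true
}

func main() {
	var q JobQueue
	q.Enqueue("baseline", 1)
	q.Enqueue("hotfix-eval", 10)
	q.Enqueue("sweep-a", 1)
	for job, ok := q.Dequeue(); ok; job, ok = q.Dequeue() {
		fmt.Println(job.ID, job.Priority)
	}
}
```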
### 3. Data Manager Service
```mermaid
graph TB
subgraph "Data Management"
API[Data API] --> Storage[Storage Layer]
Storage --> Metadata[Metadata DB]
Storage --> Files[File System]
Storage --> Cache[Redis Cache]
subgraph "Data Operations"
Upload[Upload Data] --> Validate[Validate]
Validate --> Store[Store]
Store --> Index[Index]
Index --> Catalog[Catalog]
end
end
```
**Features:**
- Data upload and validation
- Metadata management
- File system abstraction
- Caching layer
- Data catalog
### 4. Terminal UI (TUI)
```mermaid
graph TB
subgraph "TUI Architecture"
UI[UI Components] --> Model[Data Model]
Model --> Update[Update Loop]
Update --> Render[Render]
subgraph "UI Panels"
Jobs[Job List]
Details[Job Details]
Logs[Log Viewer]
Status[Status Bar]
end
UI --> Jobs
UI --> Details
UI --> Logs
UI --> Status
end
```
**Components:**
- Bubble Tea framework
- Component-based architecture
- Real-time updates
- Keyboard navigation
- Theme support
## Data Flow
### Job Execution Flow
```mermaid
sequenceDiagram
participant Client
participant Auth
participant Worker
participant Queue
participant Container
participant Storage
Client->>Auth: Submit job with API key
Auth->>Client: Validate and return job ID
Client->>Worker: Execute job request
Worker->>Queue: Queue job
Queue->>Worker: Job ready
Worker->>Container: Start ML container
Container->>Worker: Execute experiment
Worker->>Storage: Store results
Worker->>Client: Return results
```
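The job lifecycle (submit, queue, execute and monitor, complete, store results) can be enforced as a small state machine so that, for example, results are never stored for a job that did not finish. The state names below follow the Worker Service lifecycle diagram; the `Failed` state, the transition table, and the `Advance` function are assumptions for illustration.

```go
package main

import "fmt"

// State is a hypothetical job state following the Job Lifecycle diagram.
type State string

const (
	Submitted State = "submitted"
	Queued    State = "queued"
	Running   State = "running"   // "Execute" and "Monitor" in the diagram
	Completed State = "completed"
	Stored    State = "stored"    // results written to storage
	Failed    State = "failed"    // assumed; not shown in the diagram
)

// transitions lists the allowed next states for each state.
var transitions = map[State][]State{
	Submitted: {Queued},
	Queued:    {Running},
	Running:   {Completed, Failed},
	Completed: {Stored},
}

// Advance returns to if the move is allowed; otherwise it returns from
// unchanged together with an error.
func Advance(from, to State) (State, error) {
	for _, next := range transitions[from] {
		if next == to {
			return to, nil
		}
	}
	return from, fmt.Errorf("invalid transition %s -> %s", from, to)
}

func main() {
	s := Submitted
	for _, next := range []State{Queued, Running, Completed, Stored} {
		var err error
		if s, err = Advance(s, next); err != nil {
			fmt.Println(err)
			return
		}
		fmt.Println("now", s)
	}
	_, err := Advance(Queued, Stored)
	fmt.Println(err) // invalid transition queued -> stored
}
```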
### Authentication Flow
```mermaid
sequenceDiagram
participant Client
participant Auth
participant PermMgr
participant Config
Client->>Auth: Request with API key
Auth->>Auth: Validate key hash
Auth->>PermMgr: Get user permissions
PermMgr->>Config: Load YAML permissions
Config->>PermMgr: Return permissions
PermMgr->>Auth: Return resolved permissions
Auth->>Client: Grant/deny access
```
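The "Validate key hash" step implies the server stores only a digest of each API key and compares digests rather than plaintext keys. A minimal sketch follows, assuming unsalted SHA-256 digests stored as hex. The `keyRecord` type, the `HashKey`, `Authenticate`, and `mustDecode` helpers, and the example user and key are hypothetical. `crypto/subtle.ConstantTimeCompare` keeps the comparison time independent of how many bytes match.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// keyRecord is a hypothetical stored credential: only the digest is kept.
type keyRecord struct {
	User   string
	Digest []byte // SHA-256 of the API key
	Roles  []string
}

// HashKey returns the hex SHA-256 digest of an API key (e.g. for a config file).
func HashKey(key string) string {
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])
}

// Authenticate hashes the presented key and compares it against each stored
// digest in constant time, returning the matching record.
func Authenticate(records []keyRecord, presented string) (keyRecord, bool) {
	sum := sha256.Sum256([]byte(presented))
	for _, r := range records {
		if subtle.ConstantTimeCompare(sum[:], r.Digest) == 1 {
			return r, true
		}
	}
	return keyRecord{}, false
}

// mustDecode converts a hex digest (as stored in config) to bytes.
func mustDecode(h string) []byte {
	b, err := hex.DecodeString(h)
	if err != nil {
		panic(err)
	}
	return b
}

func main() {
	records := []keyRecord{
		{User: "alice", Digest: mustDecode(HashKey("example-key-123")), Roles: []string{"researcher"}},
	}
	if r, ok := Authenticate(records, "example-key-123"); ok {
		fmt.Println("authenticated:", r.User, r.Roles)
	}
	_, ok := Authenticate(records, "wrong-key")
	fmt.Println("wrong key accepted:", ok)
}
```

Plain SHA-256 is only reasonable because API keys are long random strings. User-chosen passwords would instead need a slow key-derivation function such as bcrypt or Argon2, both available in `golang.org/x/crypto`.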
## Security Architecture
### Defense in Depth
```mermaid
graph TB
subgraph "Security Layers"
Network[Network Security]
Auth[Authentication]
AuthZ[Authorization]
Container[Container Security]
Data[Data Protection]
Audit[Audit Logging]
end
Network --> Auth
Auth --> AuthZ
AuthZ --> Container
Container --> Data
Data --> Audit
```
**Security Features:**
- API key authentication
- Role-based permissions
- Container isolation
- File system sandboxing
- Comprehensive audit logs
- Input validation and sanitization
### Container Security
```mermaid
graph TB
subgraph "Container Isolation"
Host[Host System]
Podman[Podman Runtime]
Network[Network Isolation]
FS[File System Isolation]
User[User Namespaces]
ML[ML Container]
Host --> Podman
Podman --> Network
Podman --> FS
Podman --> User
User --> ML
end
```
**Isolation Features:**
- Rootless containers
- Network isolation
- File system sandboxing
- User namespace mapping
- Resource limits
## Configuration Architecture
### Configuration Hierarchy
```mermaid
graph TB
subgraph "Config Sources"
Env[Environment Variables]
File[Config Files]
CLI[CLI Flags]
Defaults[Default Values]
end
subgraph "Config Processing"
Merge[Config Merger]
Validate[Schema Validator]
Apply[Config Applier]
end
Env --> Merge
File --> Merge
CLI --> Merge
Defaults --> Merge
Merge --> Validate
Validate --> Apply
```
**Configuration Priority:**
1. CLI flags (highest)
2. Environment variables
3. Configuration files
4. Default values (lowest)
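This priority order can be implemented by applying sources from lowest to highest priority, letting each later source overwrite keys set by earlier ones. The sketch below represents every source as a flat `map[string]string`. The `Merge` function and the keys `redis_addr` and `log_level` are hypothetical, and schema validation is left out.

```go
package main

import "fmt"

// Merge layers configuration sources. Later arguments win, so callers pass
// them from lowest to highest priority: defaults, file, env, CLI flags.
func Merge(layers ...map[string]string) map[string]string {
	out := make(map[string]string)
	for _, layer := range layers {
		for k, v := range layer {
			out[k] = v
		}
	}
	return out
}

func main() {
	defaults := map[string]string{"port": "9101", "redis_addr": "localhost:6379", "log_level": "info"}
	file := map[string]string{"log_level": "debug"}
	env := map[string]string{"redis_addr": "redis:6379"}
	flags := map[string]string{"log_level": "warn"}

	cfg := Merge(defaults, file, env, flags)
	fmt.Println(cfg["port"], cfg["redis_addr"], cfg["log_level"]) // 9101 redis:6379 warn
}
```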
## Scalability Architecture
### Horizontal Scaling
```mermaid
graph TB
subgraph "Scaled Architecture"
LB[Load Balancer]
W1[Worker 1]
W2[Worker 2]
W3[Worker N]
Redis[Redis Cluster]
Storage[Shared Storage]
LB --> W1
LB --> W2
LB --> W3
W1 --> Redis
W2 --> Redis
W3 --> Redis
W1 --> Storage
W2 --> Storage
W3 --> Storage
end
```
**Scaling Features:**
- Stateless worker services
- Shared job queue (Redis)
- Distributed storage
- Load balancer ready
- Health checks and monitoring
## Technology Stack
### Backend Technologies
| Component | Technology | Purpose |
|-----------|------------|---------|
| **Language** | Go 1.25+ | Core application |
| **Web Framework** | Standard library | HTTP server |
| **Authentication** | Custom | API key + RBAC |
| **Database** | SQLite/PostgreSQL | Metadata storage |
| **Cache** | Redis | Job queue & caching |
| **Containers** | Podman/Docker | Job isolation |
| **UI Framework** | Bubble Tea | Terminal UI |
### Dependencies
```go
// Core dependencies
require (
	github.com/charmbracelet/bubbletea v1.3.10 // TUI framework
	github.com/go-redis/redis/v8 v8.11.5       // Redis client
	github.com/google/uuid v1.6.0              // UUID generation
	github.com/mattn/go-sqlite3 v1.14.32       // SQLite driver
	golang.org/x/crypto v0.45.0                // Crypto utilities
	gopkg.in/yaml.v3 v3.0.1                    // YAML parsing
)
```
## Development Architecture
### Project Structure
```
fetch_ml/
├── cmd/ # CLI applications
│ ├── worker/ # ML worker service
│ ├── tui/ # Terminal UI
│ ├── data_manager/ # Data management
│ └── user_manager/ # User management
├── internal/ # Internal packages
│ ├── auth/ # Authentication system
│ ├── config/ # Configuration management
│ ├── container/ # Container operations
│ ├── database/ # Database operations
│ ├── logging/ # Logging utilities
│ ├── metrics/ # Metrics collection
│ └── network/ # Network utilities
├── configs/ # Configuration files
├── scripts/ # Setup and utility scripts
├── tests/ # Test suites
└── docs/ # Documentation
```
### Package Dependencies
```mermaid
graph TB
subgraph "Application Layer"
Worker[cmd/worker]
TUI[cmd/tui]
DataMgr[cmd/data_manager]
UserMgr[cmd/user_manager]
end
subgraph "Service Layer"
Auth[internal/auth]
Config[internal/config]
Container[internal/container]
Database[internal/database]
end
subgraph "Utility Layer"
Logging[internal/logging]
Metrics[internal/metrics]
Network[internal/network]
end
Worker --> Auth
Worker --> Config
Worker --> Container
TUI --> Auth
DataMgr --> Database
UserMgr --> Auth
Auth --> Logging
Container --> Network
Database --> Metrics
```
## Monitoring & Observability
### Metrics Collection
```mermaid
graph TB
subgraph "Metrics Pipeline"
App[Application] --> Metrics[Metrics Collector]
Metrics --> Export[Prometheus Exporter]
Export --> Prometheus[Prometheus Server]
Prometheus --> Grafana[Grafana Dashboard]
subgraph "Metric Types"
Counter[Counters]
Gauge[Gauges]
Histogram[Histograms]
Timer[Timers]
end
App --> Counter
App --> Gauge
App --> Histogram
App --> Timer
end
```
### Logging Architecture
```mermaid
graph TB
subgraph "Logging Pipeline"
App[Application] --> Logger[Structured Logger]
Logger --> File[File Output]
Logger --> Console[Console Output]
Logger --> Syslog[Syslog Forwarder]
Syslog --> Aggregator[Log Aggregator]
Aggregator --> Storage[Log Storage]
Storage --> Viewer[Log Viewer]
end
```
## Deployment Architecture
### Container Deployment
```mermaid
graph TB
subgraph "Deployment Stack"
Image[Container Image]
Registry[Container Registry]
Orchestrator[Docker Compose]
Config[Configs/Secrets]
Storage[Persistent Storage]
Image --> Registry
Registry --> Orchestrator
Config --> Orchestrator
Storage --> Orchestrator
end
```
### Service Discovery
```mermaid
graph TB
subgraph "Service Mesh"
Gateway[API Gateway]
Discovery[Service Discovery]
Worker[Worker Service]
Data[Data Service]
Redis[Redis Cluster]
Gateway --> Discovery
Discovery --> Worker
Discovery --> Data
Discovery --> Redis
end
```
## Future Architecture Considerations
### Microservices Evolution
- **API Gateway**: Centralized routing and authentication
- **Service Mesh**: Inter-service communication
- **Event Streaming**: Kafka for job events
- **Distributed Tracing**: OpenTelemetry integration
- **Multi-tenant**: Tenant isolation and quotas
### Homelab Features
- **Docker Compose**: Simple container orchestration
- **Local Development**: Easy setup and testing
- **Security**: Built-in authentication and encryption
- **Monitoring**: Basic health checks and logging
---
This architecture provides a solid foundation for secure, scalable machine learning experiments while maintaining simplicity and developer productivity.