Jeremie Fraeys c5049a2fdf	2025-12-04 16:52:09 -05:00

feat: initialize FetchML ML platform with core project structure

- Add comprehensive README with architecture overview and quick start guide
- Set up Go module with production-ready dependencies
- Configure build system with Makefile for development and production builds
- Add Docker Compose for local development environment
- Include project configuration files (linting, Python, etc.)

This establishes the foundation for a production-ready ML experiment platform with task queuing, monitoring, and modern CLI/API interface.
.flake8
.gitignore
.golangci.yml
.pylintrc
docker-compose.yml
go.mod
go.sum
LICENSE
Makefile
pyproject.toml
README.md

README.md

FetchML - Machine Learning Platform

A production-ready ML experiment platform with task queuing, monitoring, and a modern CLI/API.

Features

🚀 Production Resilience - Task leasing, smart retries, dead-letter queues
📊 Monitoring - Grafana/Prometheus/Loki with auto-provisioned dashboards
🔐 Security - API key auth, TLS, rate limiting, IP whitelisting
⚡ Performance - Go API server + Zig CLI for speed
📦 Easy Deployment - Docker Compose (dev) or systemd (prod)

Quick Start

Development (macOS/Linux)

# Clone and start
git clone <your-repo>
cd fetch_ml
docker-compose up -d

# Access Grafana: http://localhost:3000 (admin/admin)

Production (Linux)

# Set up application
sudo ./scripts/setup-prod.sh

# Set up monitoring
sudo ./scripts/setup-monitoring-prod.sh

# Build and install
make prod
make install

# Start services
sudo systemctl start fetchml-api fetchml-worker
sudo systemctl start prometheus grafana loki promtail

Architecture

┌──────────────┐   WebSocket   ┌──────────────┐
│  Zig CLI/TUI │◄─────────────►│  API Server  │
└──────────────┘               │    (Go)      │
                               └──────┬───────┘
                                      │
                        ┌─────────────┼─────────────┐
                        │             │             │
                   ┌────▼────┐   ┌───▼────┐   ┌───▼────┐
                   │  Redis  │   │ Worker │   │  Loki  │
                   │ (Queue) │   │  (Go)  │   │ (Logs) │
                   └─────────┘   └────────┘   └────────┘

Usage

API Server

# Development (stderr logging)
go run cmd/api-server/main.go --config configs/config-dev.yaml

# Production (file logging)
go run cmd/api-server/main.go --config configs/config-no-tls.yaml

CLI

# Build (the subshell keeps your working directory at the repo root)
(cd cli && zig build prod)

# Run experiment
./cli/zig-out/bin/ml run --config config.toml

# Check status
./cli/zig-out/bin/ml status

Docker

make docker-run      # Start all services
make docker-logs     # View logs
make docker-stop     # Stop services

Development

Prerequisites

Go 1.21+
Zig 0.11+
Redis
Docker (for local dev)

Build

make build           # All components
make dev             # Fast dev build
make prod            # Optimized production build

Test

make test            # All tests
make test-unit       # Unit tests only
make test-coverage   # With coverage report

Configuration

Development (`configs/config-dev.yaml`)

logging:
  level: "info"
  file: ""  # stderr only

redis:
  url: "redis://localhost:6379"

Production (`configs/config-no-tls.yaml`)

logging:
  level: "info"
  file: "./logs/fetch_ml.log"  # file only

redis:
  url: "redis://redis:6379"
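
The two configurations above differ mainly in where logs go: an empty `logging.file` means stderr, and a path means a log file. The sketch below shows one way a Go server could act on that setting. The `LoggingConfig` struct and `OpenLogOutput` function are hypothetical names for illustration, not FetchML's actual code, and YAML parsing is left out.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// LoggingConfig mirrors the `logging` block shown above.
// The struct itself is an assumption made for this sketch.
type LoggingConfig struct {
	Level string
	File  string
}

// OpenLogOutput returns stderr when File is empty, otherwise an
// append-mode file (creating its parent directory if needed).
// The returned close function is a no-op for stderr.
func OpenLogOutput(cfg LoggingConfig) (io.Writer, func() error, error) {
	if cfg.File == "" {
		return os.Stderr, func() error { return nil }, nil
	}
	if err := os.MkdirAll(filepath.Dir(cfg.File), 0o755); err != nil {
		return nil, nil, fmt.Errorf("create log dir: %w", err)
	}
	f, err := os.OpenFile(cfg.File, os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0o644)
	if err != nil {
		return nil, nil, fmt.Errorf("open log file: %w", err)
	}
	return f, f.Close, nil
}

func main() {
	// Development-style config: empty file name, so logs go to stderr.
	w, closeFn, err := OpenLogOutput(LoggingConfig{Level: "info", File: ""})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer closeFn()
	fmt.Fprintln(w, "logging to stderr")
}
```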

Monitoring

Grafana Dashboards (Auto-Provisioned)

ML Task Queue - Queue depth, task duration, failure rates
Application Logs - Log streams, error tracking, search

Access: http://localhost:3000 (dev) or http://YOUR_SERVER:3000 (prod)

Metrics

Queue depth and task processing rates
Retry attempts by error category
Dead letter queue size
Lease expirations
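
As a rough illustration of how the four signals above could be tracked and exposed, the sketch below keeps them in memory and renders them in the Prometheus text exposition format. The metric names (`fetchml_queue_depth` and so on) and the `Metrics` type are assumptions made for this example; the names FetchML actually exports may differ, and a real service would more likely use the Prometheus client library.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// Metrics holds the four signals listed above.
// All metric names used in Render are hypothetical.
type Metrics struct {
	mu               sync.Mutex
	queueDepth       int
	deadLetterSize   int
	leaseExpirations int
	retries          map[string]int // retry count keyed by error category
}

func NewMetrics() *Metrics { return &Metrics{retries: map[string]int{}} }

// SetQueueSizes records the current queue and dead-letter gauge values.
func (m *Metrics) SetQueueSizes(depth, deadLetter int) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.queueDepth, m.deadLetterSize = depth, deadLetter
}

// IncLeaseExpired counts one expired lease.
func (m *Metrics) IncLeaseExpired() {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.leaseExpirations++
}

// IncRetry counts one retry under the given error category.
// Categories are assumed to be plain ASCII identifiers.
func (m *Metrics) IncRetry(category string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.retries[category]++
}

// Render returns the metrics in Prometheus text exposition format,
// with retry categories sorted so the output is deterministic.
func (m *Metrics) Render() string {
	m.mu.Lock()
	defer m.mu.Unlock()

	var b strings.Builder
	fmt.Fprintf(&b, "fetchml_queue_depth %d\n", m.queueDepth)
	fmt.Fprintf(&b, "fetchml_dead_letter_size %d\n", m.deadLetterSize)
	fmt.Fprintf(&b, "fetchml_lease_expirations_total %d\n", m.leaseExpirations)

	cats := make([]string, 0, len(m.retries))
	for c := range m.retries {
		cats = append(cats, c)
	}
	sort.Strings(cats)
	for _, c := range cats {
		fmt.Fprintf(&b, "fetchml_retries_total{category=%q} %d\n", c, m.retries[c])
	}
	return b.String()
}

func main() {
	m := NewMetrics()
	m.SetQueueSizes(2, 1)
	m.IncLeaseExpired()
	m.IncRetry("network")
	m.IncRetry("network")
	m.IncRetry("oom")
	fmt.Print(m.Render())
}
```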

Documentation

Getting Started - Detailed setup guide
Production Deployment - Linux deployment
WebSocket API - Protocol documentation
Architecture - System design

Makefile Targets

# Build
make build               # Build all components
make prod                # Production build
make clean               # Clean artifacts

# Docker
make docker-build        # Build image
make docker-run          # Start services
make docker-stop         # Stop services

# Test
make test                # All tests
make test-coverage       # With coverage

# Production (Linux only)
make setup               # Set up app
make setup-monitoring    # Set up monitoring
make install             # Install binaries

Security

TLS/HTTPS - Encryption in transit
API Keys - Hashed with SHA-256
Rate Limiting - Per-user quotas
IP Whitelist - Network restrictions
Audit Logging - All API access logged
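
Two of the controls above, hashed API keys and the IP whitelist, can be sketched with the Go standard library alone. This is an illustrative example under assumptions, not FetchML's actual code: `HashAPIKey`, `VerifyAPIKey`, and `IPAllowed` are hypothetical names, and how keys and CIDR blocks are stored or configured is not shown in this README.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
	"net"
)

// HashAPIKey returns the hex-encoded SHA-256 digest of an API key,
// so only the digest needs to be stored.
func HashAPIKey(key string) string {
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])
}

// VerifyAPIKey hashes the presented key and compares it to the stored
// digest in constant time to avoid leaking information through timing.
func VerifyAPIKey(presented, storedHex string) bool {
	got := HashAPIKey(presented)
	return subtle.ConstantTimeCompare([]byte(got), []byte(storedHex)) == 1
}

// IPAllowed reports whether addr falls inside any of the given CIDR blocks.
func IPAllowed(addr string, cidrs []string) (bool, error) {
	ip := net.ParseIP(addr)
	if ip == nil {
		return false, fmt.Errorf("invalid IP address %q", addr)
	}
	for _, c := range cidrs {
		_, block, err := net.ParseCIDR(c)
		if err != nil {
			return false, fmt.Errorf("invalid CIDR %q: %w", c, err)
		}
		if block.Contains(ip) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	stored := HashAPIKey("example-key") // placeholder key for the demo
	fmt.Println(VerifyAPIKey("example-key", stored)) // prints "true"
	fmt.Println(VerifyAPIKey("wrong-key", stored))   // prints "false"

	ok, err := IPAllowed("10.0.0.5", []string{"10.0.0.0/8"})
	fmt.Println(ok, err) // prints "true <nil>"
}
```

A single unsalted SHA-256 pass is a reasonable fit for long, randomly generated API keys; it would not be appropriate for user-chosen passwords, which call for a slow, salted password hash.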

License

MIT - See LICENSE

Contributing

Contributions welcome! This is a personal homelab project, but PRs are appreciated.