# FetchML - Machine Learning Platform

A production-ready ML experiment platform with task queuing, monitoring, and a modern CLI/API.

## Features

- **🚀 Production Resilience** - Task leasing, smart retries, dead-letter queues
- **📊 Monitoring** - Grafana/Prometheus/Loki with auto-provisioned dashboards
- **🔐 Security** - API key auth, TLS, rate limiting, IP whitelisting
- **⚡ Performance** - Go API server + Zig CLI for speed
- **📦 Easy Deployment** - Docker Compose (dev) or systemd (prod)

## Quick Start

### Development (macOS/Linux)

```bash
# Clone and start
git clone <repository-url>
cd fetch_ml
docker-compose up -d

# Access Grafana: http://localhost:3000 (admin/admin)
```

### Production (Linux)

```bash
# Setup application
sudo ./scripts/setup-prod.sh

# Setup monitoring
sudo ./scripts/setup-monitoring-prod.sh

# Build and install
make prod
make install

# Start services
sudo systemctl start fetchml-api fetchml-worker
sudo systemctl start prometheus grafana loki promtail
```

## Architecture

```
┌──────────────┐   WebSocket   ┌──────────────┐
│ Zig CLI/TUI  │◄─────────────►│  API Server  │
└──────────────┘               │     (Go)     │
                               └──────┬───────┘
                                      │
                        ┌─────────────┼─────────────┐
                        │             │             │
                   ┌────▼────┐    ┌───▼────┐    ┌───▼────┐
                   │  Redis  │    │ Worker │    │  Loki  │
                   │ (Queue) │    │  (Go)  │    │ (Logs) │
                   └─────────┘    └────────┘    └────────┘
```

## Usage

### API Server

```bash
# Development (stderr logging)
go run cmd/api-server/main.go --config configs/config-dev.yaml

# Production (file logging)
go run cmd/api-server/main.go --config configs/config-no-tls.yaml
```

### CLI

```bash
# Build
cd cli && zig build prod

# Run experiment
./cli/zig-out/bin/ml run --config config.toml

# Check status
./cli/zig-out/bin/ml status
```

### Docker

```bash
make docker-run   # Start all services
make docker-logs  # View logs
make docker-stop  # Stop services
```

## Development

### Prerequisites

- Go 1.21+
- Zig 0.11+
- Redis
- Docker (for local dev)

### Build

```bash
make build  # All components
make dev    # Fast dev build
make prod   # Optimized production build
```

### Test

```bash
make test           # All tests
make test-unit      # Unit tests only
make test-coverage  # With coverage report
```

## Configuration

### Development (`configs/config-dev.yaml`)

```yaml
logging:
  level: "info"
  file: ""  # stderr only

redis:
  url: "redis://localhost:6379"
```

### Production (`configs/config-no-tls.yaml`)

```yaml
logging:
  level: "info"
  file: "./logs/fetch_ml.log"  # file only

redis:
  url: "redis://redis:6379"
```

## Monitoring

### Grafana Dashboards (Auto-Provisioned)

- **ML Task Queue** - Queue depth, task duration, failure rates
- **Application Logs** - Log streams, error tracking, search

Access: `http://localhost:3000` (dev) or `http://YOUR_SERVER:3000` (prod)

### Metrics

- Queue depth and task processing rates
- Retry attempts by error category
- Dead letter queue size
- Lease expirations

## Documentation

- **[Getting Started](docs/getting-started.md)** - Detailed setup guide
- **[Production Deployment](docs/production-monitoring.md)** - Linux deployment
- **[WebSocket API](docs/api/)** - Protocol documentation
- **[Architecture](docs/architecture/)** - System design

## Makefile Targets

```bash
# Build
make build             # Build all components
make prod              # Production build
make clean             # Clean artifacts

# Docker
make docker-build      # Build image
make docker-run        # Start services
make docker-stop       # Stop services

# Test
make test              # All tests
make test-coverage     # With coverage

# Production (Linux only)
make setup             # Setup app
make setup-monitoring  # Setup monitoring
make install           # Install binaries
```

## Security

- **TLS/HTTPS** - End-to-end encryption
- **API Keys** - Hashed with SHA256
- **Rate Limiting** - Per-user quotas
- **IP Whitelist** - Network restrictions
- **Audit Logging** - All API access logged

## License

MIT - See [LICENSE](LICENSE)

## Contributing

Contributions welcome! This is a personal homelab project but PRs are appreciated.
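
## Example: API Key Hashing (Sketch)

The Security section states that API keys are hashed with SHA256. The snippet below is a minimal sketch of what that could look like in Go, not the project's actual implementation. The function names `hashAPIKey` and `verifyAPIKey` and the sample key are hypothetical. It assumes only the digest is stored, and that a presented key is hashed and compared against the stored digest in constant time.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// hashAPIKey returns the hex-encoded SHA-256 digest of a raw API key.
// Hypothetical helper: FetchML's real function names are not shown in this README.
func hashAPIKey(rawKey string) string {
	sum := sha256.Sum256([]byte(rawKey))
	return hex.EncodeToString(sum[:])
}

// verifyAPIKey hashes the presented key and compares it with the stored
// digest using a constant-time comparison to avoid leaking timing information.
func verifyAPIKey(presentedKey, storedHash string) bool {
	candidate := hashAPIKey(presentedKey)
	return subtle.ConstantTimeCompare([]byte(candidate), []byte(storedHash)) == 1
}

func main() {
	// Assumption: only the digest is persisted, never the raw key.
	stored := hashAPIKey("example-key-123")

	fmt.Println(len(stored))                              // 64 hex characters
	fmt.Println(verifyAPIKey("example-key-123", stored))  // true
	fmt.Println(verifyAPIKey("wrong-key", stored))        // false
}
```

Storing only the digest means a leaked configuration or database does not reveal usable keys directly. Whether FetchML also salts keys or uses a different comparison is not stated here.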