FetchML - Machine Learning Platform

A production-ready ML experiment platform with task queuing, monitoring, and a modern CLI/API.

Features

🚀 Production Resilience - Task leasing, smart retries, dead-letter queues
📊 Monitoring - Grafana/Prometheus/Loki with auto-provisioned dashboards
🔐 Security - API key auth, TLS, rate limiting, IP whitelisting
⚡ Performance - Go API server + Zig CLI for speed
📦 Easy Deployment - Docker Compose (dev) or systemd (prod)

Quick Start

Development (macOS/Linux)

```bash
# Clone and start
git clone <your-repo>
cd fetch_ml
docker-compose up -d

# Access Grafana: http://localhost:3000 (admin/admin)
```

Production (Linux)

```bash
# Set up the application
sudo ./scripts/setup-prod.sh

# Set up monitoring
sudo ./scripts/setup-monitoring-prod.sh

# Build and install
make prod
make install

# Start services
sudo systemctl start fetchml-api fetchml-worker
sudo systemctl start prometheus grafana loki promtail
```

Architecture

```text
┌──────────────┐   WebSocket   ┌──────────────┐
│  Zig CLI/TUI │◄─────────────►│  API Server  │
└──────────────┘               │    (Go)      │
                               └──────┬───────┘
                                      │
                        ┌─────────────┼─────────────┐
                        │             │             │
                   ┌────▼────┐   ┌───▼────┐   ┌───▼────┐
                   │  Redis  │   │ Worker │   │  Loki  │
                   │ (Queue) │   │  (Go)  │   │ (Logs) │
                   └─────────┘   └────────┘   └────────┘
```

Usage

API Server

```bash
# Development (stderr logging)
go run cmd/api-server/main.go --config configs/config-dev.yaml

# Production (file logging)
go run cmd/api-server/main.go --config configs/config-no-tls.yaml
```

CLI

```bash
# Build
cd cli && zig build prod

# Run experiment
./cli/zig-out/bin/ml run --config config.toml

# Check status
./cli/zig-out/bin/ml status
```

Docker

```bash
make docker-run      # Start all services
make docker-logs     # View logs
make docker-stop     # Stop services
```

Development

Prerequisites

Go 1.21+
Zig 0.11+
Redis
Docker (for local dev)

Build

```bash
make build           # All components
make dev             # Fast dev build
make prod            # Optimized production build
```

Testing

```bash
make test            # All tests
make test-unit       # Unit tests only
make test-coverage   # With coverage report
make test-auth       # Multi-user authentication tests
```

Quick start testing: see the Testing Guide for comprehensive testing documentation, including a 5-minute quick-start guide.

Configuration

Development (`configs/config-dev.yaml`)

```yaml
logging:
  level: "info"
  file: ""  # stderr only

redis:
  url: "redis://localhost:6379"
```

Production (`configs/config-no-tls.yaml`)

```yaml
logging:
  level: "info"
  file: "./logs/fetch_ml.log"  # file only

redis:
  url: "redis://redis:6379"
```

Monitoring

Grafana Dashboards (Auto-Provisioned)

ML Task Queue - Queue depth, task duration, failure rates
Application Logs - Log streams, error tracking, search

Access: http://localhost:3000 (dev) or http://YOUR_SERVER:3000 (prod)

Metrics

Queue depth and task processing rates
Retry attempts by error category
Dead letter queue size
Lease expirations

Documentation

Testing Guide - Comprehensive testing documentation
Quick Start Testing - 5-minute testing guide
Installation - Setup instructions
Architecture - System design
Configuration Reference - Configuration options
CLI Reference - Command-line interface
Deployment - Production deployment
Troubleshooting - Common issues

Makefile Targets

```bash
# Build
make build               # Build all components
make prod                # Production build
make clean               # Clean artifacts

# Docker
make docker-build        # Build image
make docker-run          # Start services
make docker-stop         # Stop services

# Test
make test                # All tests
make test-coverage       # With coverage

# Production (Linux only)
make setup               # Set up app
make setup-monitoring    # Set up monitoring
make install             # Install binaries
```

Security

TLS/HTTPS - End-to-end encryption
API Keys - Hashed with SHA256
Rate Limiting - Per-user quotas
IP Whitelist - Network restrictions
Audit Logging - All API access logged
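
Storing only a SHA-256 digest of each API key means a leaked key store does not hand out usable keys. A minimal sketch with Go's standard library follows; the key length, the hex encoding, and the `newAPIKey`/`hashAPIKey`/`verifyAPIKey` helpers are assumptions, not FetchML's actual code. `subtle.ConstantTimeCompare` is used so that comparison time does not reveal how many leading bytes matched.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// newAPIKey returns a random 32-byte key, hex-encoded, to show the user once.
func newAPIKey() (string, error) {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// hashAPIKey returns the hex-encoded SHA-256 digest that the server stores.
func hashAPIKey(key string) string {
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])
}

// verifyAPIKey hashes the presented key and compares digests in constant time.
func verifyAPIKey(presented, storedHash string) bool {
	got := hashAPIKey(presented)
	return subtle.ConstantTimeCompare([]byte(got), []byte(storedHash)) == 1
}

func main() {
	key, err := newAPIKey()
	if err != nil {
		panic(err)
	}
	stored := hashAPIKey(key) // persist this, never the raw key
	fmt.Println(verifyAPIKey(key, stored))         // true
	fmt.Println(verifyAPIKey("wrong-key", stored)) // false
}
```

A plain, unsalted hash is only reasonable here because randomly generated keys carry far more entropy than human passwords; user-chosen secrets would need a slow password hash instead.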

License

MIT - See LICENSE

Contributing

Contributions are welcome! This is a personal homelab project, but PRs are appreciated.