Jeremie Fraeys b75bd24bba Add CLI jupyter command for transparent Jupyter management
- Add jupyter.zig command with start/stop/status actions
- Update main.zig to include jupyter command in CLI
- CLI now handles all Jupyter setup transparently
- Data scientists only need: ml jupyter start
- Auto-detects container runtime (Podman/Docker)
- Manages container lifecycle automatically
- Provides clear status and error messages
2025-12-06 16:07:09 -05:00

FetchML - Machine Learning Platform

A production-ready ML experiment platform with task queuing, monitoring, and a modern CLI/API.

Features

  • 🚀 Production Resilience - Task leasing, smart retries, dead-letter queues
  • 📊 Monitoring - Grafana/Prometheus/Loki with auto-provisioned dashboards
  • 🔐 Security - API key auth, TLS, rate limiting, IP whitelisting
  • ⚡ Performance - Go API server + Zig CLI for speed
  • 📦 Easy Deployment - Docker Compose (dev) or systemd (prod)

Quick Start

Development (macOS/Linux)

# Clone and start
git clone <your-repo>
cd fetch_ml
docker-compose up -d

# Access Grafana: http://localhost:3000 (admin/admin)

Production (Linux)

# Set up application
sudo ./scripts/setup-prod.sh

# Set up monitoring
sudo ./scripts/setup-monitoring-prod.sh

# Build and install
make prod
make install

# Start services
sudo systemctl start fetchml-api fetchml-worker
sudo systemctl start prometheus grafana loki promtail

Architecture

┌──────────────┐   WebSocket   ┌──────────────┐
│  Zig CLI/TUI │◄─────────────►│  API Server  │
└──────────────┘               │    (Go)      │
                               └──────┬───────┘
                                      │
                        ┌─────────────┼─────────────┐
                        │             │             │
                   ┌────▼────┐   ┌───▼────┐   ┌───▼────┐
                   │  Redis  │   │ Worker │   │  Loki  │
                   │ (Queue) │   │  (Go)  │   │ (Logs) │
                   └─────────┘   └────────┘   └────────┘

Usage

API Server

# Development (stderr logging)
go run cmd/api-server/main.go --config configs/config-dev.yaml

# Production (file logging)
go run cmd/api-server/main.go --config configs/config-no-tls.yaml

CLI

# Build
cd cli && zig build prod

# Run experiment
./cli/zig-out/bin/ml run --config config.toml

# Check status
./cli/zig-out/bin/ml status

Docker

make docker-run      # Start all services
make docker-logs     # View logs
make docker-stop     # Stop services

Development

Prerequisites

  • Go 1.21+
  • Zig 0.11+
  • Redis
  • Docker (for local dev)

Build

make build           # All components
make dev             # Fast dev build
make prod            # Optimized production build

Testing

make test            # All tests
make test-unit       # Unit tests only
make test-coverage   # With coverage report
make test-auth       # Multi-user authentication tests

Quick start: see the Testing Guide for comprehensive testing documentation, including a 5-minute quick start.

Configuration

Development (configs/config-dev.yaml)

logging:
  level: "info"
  file: ""  # stderr only

redis:
  url: "redis://localhost:6379"

Production (configs/config-no-tls.yaml)

logging:
  level: "info"
  file: "./logs/fetch_ml.log"  # file only

redis:
  url: "redis://redis:6379"

Monitoring

Grafana Dashboards (Auto-Provisioned)

  • ML Task Queue - Queue depth, task duration, failure rates
  • Application Logs - Log streams, error tracking, search

Access: http://localhost:3000 (dev) or http://YOUR_SERVER:3000 (prod)

Metrics

  • Queue depth and task processing rates
  • Retry attempts by error category
  • Dead letter queue size
  • Lease expirations
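
As a rough illustration of how counters like these might be tracked, the Go sketch below keeps retry counts per error category plus a lease-expiration counter and renders them in a Prometheus-style text form. The metric names (`retry_attempts_total`, `lease_expirations_total`), the categories, and the `queueMetrics` type are all assumptions made for this example; the names FetchML actually exports are not listed in this README.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// queueMetrics is an in-memory stand-in for the exported counters.
type queueMetrics struct {
	mu               sync.Mutex
	retries          map[string]int // error category -> retry attempts
	leaseExpirations int
}

func newQueueMetrics() *queueMetrics {
	return &queueMetrics{retries: make(map[string]int)}
}

// RecordRetry counts one retry attempt under the given error category.
func (m *queueMetrics) RecordRetry(category string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.retries[category]++
}

// RecordLeaseExpiration counts one task whose lease ran out.
func (m *queueMetrics) RecordLeaseExpiration() {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.leaseExpirations++
}

// Render returns the counters as text, one metric per line, with retry
// categories sorted so the output is deterministic.
func (m *queueMetrics) Render() string {
	m.mu.Lock()
	defer m.mu.Unlock()
	cats := make([]string, 0, len(m.retries))
	for c := range m.retries {
		cats = append(cats, c)
	}
	sort.Strings(cats)
	var b strings.Builder
	for _, c := range cats {
		fmt.Fprintf(&b, "retry_attempts_total{category=%q} %d\n", c, m.retries[c])
	}
	fmt.Fprintf(&b, "lease_expirations_total %d\n", m.leaseExpirations)
	return b.String()
}

func main() {
	m := newQueueMetrics()
	m.RecordRetry("network")
	m.RecordRetry("network")
	m.RecordRetry("oom")
	m.RecordLeaseExpiration()
	fmt.Print(m.Render())
}
```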

Documentation

Makefile Targets

# Build
make build               # Build all components
make prod                # Production build
make clean               # Clean artifacts

# Docker
make docker-build        # Build image
make docker-run          # Start services
make docker-stop         # Stop services

# Test
make test                # All tests
make test-coverage       # With coverage

# Production (Linux only)
make setup               # Set up app
make setup-monitoring    # Set up monitoring
make install             # Install binaries

Security

  • TLS/HTTPS - End-to-end encryption
  • API Keys - Hashed with SHA-256
  • Rate Limiting - Per-user quotas
  • IP Whitelist - Network restrictions
  • Audit Logging - All API access logged
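
Two of the controls above, hashed API keys and the IP whitelist, can be sketched in Go using only the standard library. This is a minimal sketch under assumptions: keys are stored as hex-encoded SHA-256 digests and the whitelist is a list of CIDR ranges. `hashAPIKey`, `keyMatches`, and `ipAllowed` are hypothetical helper names, not FetchML's actual API.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
	"net"
)

// hashAPIKey returns the hex-encoded SHA-256 digest of key, which is the
// form that would be stored instead of the raw key.
func hashAPIKey(key string) string {
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])
}

// keyMatches hashes the presented key and compares it with the stored
// digest in constant time.
func keyMatches(presented, storedHash string) bool {
	got := hashAPIKey(presented)
	return subtle.ConstantTimeCompare([]byte(got), []byte(storedHash)) == 1
}

// ipAllowed reports whether addr falls inside any of the allowed CIDR ranges.
func ipAllowed(addr string, allowed []string) bool {
	ip := net.ParseIP(addr)
	if ip == nil {
		return false
	}
	for _, cidr := range allowed {
		_, network, err := net.ParseCIDR(cidr)
		if err != nil {
			continue // this sketch simply skips malformed entries
		}
		if network.Contains(ip) {
			return true
		}
	}
	return false
}

func main() {
	stored := hashAPIKey("example-key")
	fmt.Println(keyMatches("example-key", stored), keyMatches("wrong-key", stored)) // prints: true false
	fmt.Println(ipAllowed("192.168.1.20", []string{"192.168.1.0/24"}))              // prints: true
}
```

Comparing digests with `subtle.ConstantTimeCompare` rather than `==` avoids leaking how many leading characters matched through timing.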

License

MIT - See LICENSE

Contributing

Contributions welcome! This is a personal homelab project, but PRs are appreciated.