Docker Compose Deployments

This directory contains Docker Compose configurations for different deployment environments.

Environment Configurations

Development (docker-compose.dev.yml)

  • Full development stack with monitoring
  • Includes: API, Worker, Redis, MinIO (snapshots), Prometheus, Grafana, Loki, Promtail
  • Optimized for local development and testing
  • Usage: docker-compose -f deployments/docker-compose.dev.yml up -d

Homelab - Secure (docker-compose.homelab-secure.yml)

  • Secure homelab deployment with authentication and a Caddy reverse proxy
  • TLS is terminated at the reverse proxy (Approach A)
  • Includes: API, Redis (password protected), Caddy reverse proxy
  • Usage: docker-compose -f deployments/docker-compose.homelab-secure.yml up -d

Production (docker-compose.prod.yml)

  • Production deployment configuration
  • Optimized for performance and security
  • External services assumed (Redis, monitoring)
  • Usage: docker-compose -f deployments/docker-compose.prod.yml up -d

Note: docker-compose.prod.yml is a reproducible staging/testing harness. Real production deployments do not require Docker; you can run the Go services directly (systemd) and use Caddy for TLS/WSS termination.
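
The three usage lines above differ only in the compose file name, so a small wrapper can cut down on typing. The following is a minimal sketch, not part of the repository: the function names compose_file_for and dc are hypothetical, and it assumes you run it from the project root so that the deployments/ paths resolve.

```shell
# Map an environment name to its compose file path (paths as listed above).
# compose_file_for and dc are hypothetical helper names, not repo scripts.
compose_file_for() {
  case "$1" in
    dev|homelab-secure|prod)
      echo "deployments/docker-compose.$1.yml"
      ;;
    *)
      echo "unknown environment: $1 (expected dev, homelab-secure, or prod)" >&2
      return 1
      ;;
  esac
}

# Run docker-compose against the chosen environment and pass the
# remaining arguments through unchanged.
dc() {
  env_name="$1"
  shift
  file="$(compose_file_for "$env_name")" || return 1
  docker-compose -f "$file" "$@"
}
```

With these defined, dc dev up -d is equivalent to the Development usage line, and dc prod ps shows the status of the staging harness.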

TLS / WSS Policy

  • The Zig CLI currently supports ws:// only (native wss:// is not implemented).
  • Production deployments terminate TLS/WSS at a reverse proxy (Caddy in docker-compose.prod.yml) and keep the API server on internal ws://.
  • Homelab deployments terminate TLS/WSS at a reverse proxy (Caddy) and keep the API server on internal ws://.
  • Health checks in compose files should use http://localhost:9101/health when server.tls.enabled: false.
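
To check the plain-HTTP health endpoint by hand, for example while waiting for the stack to come up, a polling loop along these lines may help. This is a sketch under assumptions: wait_for_health is a hypothetical name, the retry count and delay are arbitrary defaults, and it relies on curl being installed on the host.

```shell
# Poll a health endpoint until it answers successfully or attempts run out.
# The default URL is the endpoint named above; attempts and delay are
# illustrative defaults, not values taken from the compose files.
wait_for_health() {
  url="${1:-http://localhost:9101/health}"
  attempts="${2:-30}"
  delay="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP errors as failure, -s: no progress output
    if curl -fs --max-time 5 "$url" >/dev/null 2>&1; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "no healthy response from $url after $attempts attempts" >&2
  return 1
}
```

Calling wait_for_health with no arguments probes http://localhost:9101/health up to 30 times, two seconds apart. It returns non-zero if the server never answers, so it can gate a follow-up command with &&.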

Required Volume Mounts

  • base_path (experiments) must be writable by the API server.
  • data_dir should be mounted if you want snapshot/dataset integrity validation via ml validate.

For the default configs:

  • base_path: /data/experiments (dev/homelab configs) or /app/data/experiments (prod configs)
  • data_dir: /data/active
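
Because a read-only or missing base_path mount tends to surface only when the first job runs, it can be worth probing the directory up front. The helper below is a hedged sketch: check_writable_dir is a hypothetical name, and it only proves that the current user can create a file in the directory.

```shell
# Check that a directory exists and that the current user can create a
# file in it. The probe file is removed again on success.
check_writable_dir() {
  dir="$1"
  if [ ! -d "$dir" ]; then
    echo "missing: $dir" >&2
    return 1
  fi
  probe="$dir/.write-test.$$"
  # The subshell keeps a failed redirection from aborting the calling shell.
  if ( : > "$probe" ) 2>/dev/null; then
    rm -f "$probe"
    echo "writable: $dir"
    return 0
  fi
  echo "not writable: $dir" >&2
  return 1
}
```

For the dev or homelab defaults this would be check_writable_dir /data/experiments. To test the mount as the API server sees it, the function has to run inside the api-server container, which assumes the image ships a POSIX shell.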

Quick Start

# Development (most common)
docker-compose -f deployments/docker-compose.dev.yml up -d

# Check status
docker-compose -f deployments/docker-compose.dev.yml ps

# View logs
docker-compose -f deployments/docker-compose.dev.yml logs -f api-server

# Stop services
docker-compose -f deployments/docker-compose.dev.yml down

Dev: MinIO-backed snapshots (smoke test)

The dev compose file provisions a MinIO bucket and uploads a small example snapshot object to:

s3://fetchml-snapshots/snapshots/snap-1.tar.gz

To queue a task that forces the worker to pull the snapshot from MinIO:

  1. Start the dev stack: docker-compose -f deployments/docker-compose.dev.yml up -d

  2. Read the snapshot_sha256 printed by the init job: docker-compose -f deployments/docker-compose.dev.yml logs minio-init

  3. Queue a job using the snapshot fields: ml queue <job-name> --snapshot-id snap-1 --snapshot-sha256 <snapshot_sha256>
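
Since step 3 depends on a value copied by hand from the logs, a quick sanity check before queueing can catch a truncated paste. The sketch below assumes snapshot_sha256 is printed as a 64-character hex digest, which is the usual encoding for SHA-256 but is not stated in this README. The helper names is_sha256_hex and snapshot_queue_cmd are hypothetical.

```shell
# Return 0 if the argument looks like a hex-encoded SHA-256 digest.
is_sha256_hex() {
  case "$1" in
    *[!0-9a-fA-F]*) return 1 ;;  # contains a non-hex character
  esac
  [ "${#1}" -eq 64 ]
}

# Print the queue command from step 3 for a job name and digest.
# The command is only printed, never executed.
snapshot_queue_cmd() {
  job="$1"
  sha="$2"
  if ! is_sha256_hex "$sha"; then
    echo "not a 64-character hex digest: $sha" >&2
    return 1
  fi
  echo "ml queue $job --snapshot-id snap-1 --snapshot-sha256 $sha"
}
```

Run snapshot_queue_cmd train-mnist followed by the value from the minio-init logs, then copy the printed command once it looks right.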

    Note: ml queue by itself generates a random commit ID. For full provenance enforcement (manifest + dependency manifest), use ml sync ./your-project --queue so the server has real code and dependency files.

    Examples:

    • ml queue train-mnist --priority 3 --snapshot-id snap-1 --snapshot-sha256 <snapshot_sha256>
    • ml queue train-a train-b train-c --priority 5 --snapshot-id snap-1 --snapshot-sha256 <snapshot_sha256>

Smoke tests

  • make dev-smoke runs the development stack smoke test.
  • make prod-smoke runs a Docker-based staging smoke test for the production stack, using a localhost-only Caddy configuration.

Environment Variables

Create a .env file in the project root:

# Grafana
GRAFANA_ADMIN_PASSWORD=your_secure_password

# API Configuration
LOG_LEVEL=info

# TLS (for secure deployments)
TLS_CERT_PATH=/app/ssl/cert.pem
TLS_KEY_PATH=/app/ssl/key.pem
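
A missing or empty variable in .env usually shows up only as a confusing container error, so a pre-flight check can be useful. The following is a minimal sketch: check_env_file is a hypothetical helper, and which keys you treat as required is your own choice, not something this README specifies.

```shell
# Report which of the given keys are missing or empty in an env file.
# Usage: check_env_file <file> KEY [KEY...]
check_env_file() {
  file="$1"
  shift
  missing=0
  for key in "$@"; do
    # Require KEY=<at least one character> at the start of a line.
    if ! grep -Eq "^${key}=.+" "$file"; then
      echo "missing or empty: $key" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_env_file .env GRAFANA_ADMIN_PASSWORD LOG_LEVEL returns non-zero and names the offending key if either one is absent or has no value.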

Service Ports

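| Service    | Development | Homelab | Production |
|------------|-------------|---------|------------|
| API Server | 9101        | 9101    | 9101       |
| Redis      | 6379        | 6379    | -          |
| Prometheus | 9090        | -       | -          |
| Grafana    | 3000        | -       | -          |
| Loki       | 3100        | -       | -          |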

Monitoring

  • Development: Full monitoring stack included
  • Homelab: Basic monitoring (configurable)
  • Production: External monitoring assumed

Security Notes

  • If you need HTTPS externally, terminate TLS at a reverse proxy.
  • API keys should be managed via environment variables.
  • Database credentials should use secrets management in production.