Docker Compose Deployments

This directory contains Docker Compose configurations for different deployment environments.

Environment Configurations

Development (`docker-compose.dev.yml`)

- Full development stack with monitoring
- Includes: API, Worker, Redis, MinIO (snapshots), Prometheus, Grafana, Loki, Promtail
- Optimized for local development and testing
- Usage: `docker-compose -f deployments/docker-compose.dev.yml up -d`

Homelab - Secure (`docker-compose.homelab-secure.yml`)

- Secure homelab deployment with authentication and a Caddy reverse proxy
- TLS is terminated at the reverse proxy (Approach A)
- Includes: API, Redis (password protected), Caddy reverse proxy
- Usage: `docker-compose -f deployments/docker-compose.homelab-secure.yml up -d`

Production (`docker-compose.prod.yml`)

- Production deployment configuration
- Optimized for performance and security
- External services assumed (Redis, monitoring)
- Usage: `docker-compose -f deployments/docker-compose.prod.yml up -d`

Note: docker-compose.prod.yml is a reproducible staging/testing harness. Real production deployments do not require Docker; you can run the Go services directly (systemd) and use Caddy for TLS/WSS termination.

TLS / WSS Policy

- The Zig CLI currently supports `ws://` only (native `wss://` is not implemented).
- Production deployments terminate TLS/WSS at a reverse proxy (Caddy in `docker-compose.prod.yml`) and keep the API server on internal `ws://`.
- Homelab deployments terminate TLS/WSS at a reverse proxy (Caddy) and keep the API server on internal `ws://`.
- Health checks in compose files should use `http://localhost:9101/health` when `server.tls.enabled: false`.
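
The health-check rule above can be sketched as a pair of shell helpers. This is only an illustration: `health_url` and `wait_healthy` are hypothetical names that do not exist in the repository, and the sketch covers only the documented plain-HTTP case (`server.tls.enabled: false`).

```shell
# Hypothetical helpers, not part of the repository.

# Print the health-check URL for the documented case server.tls.enabled: false.
# Any other value is rejected because this README does not describe it.
health_url() {
  tls_enabled="$1"
  if [ "$tls_enabled" = "false" ]; then
    printf '%s\n' "http://localhost:9101/health"
  else
    echo "health_url: only server.tls.enabled=false is covered here" >&2
    return 1
  fi
}

# Poll a URL with curl until it answers successfully or the attempts run out.
# $1: URL to poll, $2: number of attempts (defaults to 10), one second apart.
wait_healthy() {
  url="$1"
  attempts="${2:-10}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl -fsS --max-time 2 "$url" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

After bringing a stack up, something like `wait_healthy "$(health_url false)" 30` would block until the API server responds or roughly 30 seconds pass.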

Required Volume Mounts

- `base_path` (experiments) must be writable by the API server.
- `data_dir` should be mounted if you want snapshot/dataset integrity validation via `ml validate`.

For the default configs:

- `base_path`: `/data/experiments` (dev/homelab configs) or `/app/data/experiments` (prod configs)
- `data_dir`: `/data/active`
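
A quick way to verify these mounts from inside the API container could look like the sketch below. `check_mounts` is a hypothetical helper, not something shipped with the project; it checks only that `base_path` is a writable directory and that `data_dir` exists.

```shell
# Hypothetical helper, not part of the repository.
# $1: base_path (must be a writable directory)
# $2: data_dir  (must exist for ml validate integrity checks)
check_mounts() {
  base_path="$1"
  data_dir="$2"
  if [ ! -d "$base_path" ] || [ ! -w "$base_path" ]; then
    echo "base_path is not a writable directory: $base_path" >&2
    return 1
  fi
  if [ ! -d "$data_dir" ]; then
    echo "data_dir is not mounted: $data_dir" >&2
    return 2
  fi
  return 0
}
```

With the default dev/homelab paths this would be called as `check_mounts /data/experiments /data/active`.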

Quick Start

# Development (most common)
docker-compose -f deployments/docker-compose.dev.yml up -d

# Check status
docker-compose -f deployments/docker-compose.dev.yml ps

# View logs
docker-compose -f deployments/docker-compose.dev.yml logs -f api-server

# Stop services
docker-compose -f deployments/docker-compose.dev.yml down

Dev: MinIO-backed snapshots (smoke test)

The dev compose file provisions a MinIO bucket and uploads a small example snapshot object at:

s3://fetchml-snapshots/snapshots/snap-1.tar.gz

To queue a task that forces the worker to pull the snapshot from MinIO:

1. Start the dev stack: `docker-compose -f deployments/docker-compose.dev.yml up -d`
2. Read the `snapshot_sha256` printed by the init job: `docker-compose -f deployments/docker-compose.dev.yml logs minio-init`
3. Queue a job using the snapshot fields: `ml queue <job-name> --snapshot-id snap-1 --snapshot-sha256 <snapshot_sha256>`
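
The steps above can be scripted roughly as follows. The exact log format of `minio-init` is not described here, so this sketch assumes the digest appears somewhere in the log as a 64-character lowercase hex string; `extract_sha256` is a hypothetical helper name.

```shell
# Hypothetical helper, not part of the repository.
# Reads log text on stdin and prints the last 64-character lowercase hex
# string it finds (assumed to be the snapshot_sha256). Prints nothing if
# no such string is present.
extract_sha256() {
  grep -oE '[0-9a-f]{64}' | tail -n 1
}

# Example use with the commands from the steps above (not executed here):
#   sha="$(docker-compose -f deployments/docker-compose.dev.yml logs minio-init | extract_sha256)"
#   ml queue <job-name> --snapshot-id snap-1 --snapshot-sha256 "$sha"
```

If the init job prints the digest in a different form, the pattern needs adjusting; check the raw log output first.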

Smoke tests

- `make dev-smoke` runs the development stack smoke test.
- `make prod-smoke` runs a Docker-based staging smoke test for the production stack, using a localhost-only Caddy configuration.

Note: when run on its own, `ml queue` generates a random commit ID. For full provenance enforcement (manifest + dependency manifest), use `ml sync ./your-project --queue` so the server has the real code and dependency files.

Examples:
- ml queue train-mnist --priority 3 --snapshot-id snap-1 --snapshot-sha256 <snapshot_sha256>
- ml queue train-a train-b train-c --priority 5 --snapshot-id snap-1 --snapshot-sha256 <snapshot_sha256>

Environment Variables

Create a .env file in the project root:

# Grafana
GRAFANA_ADMIN_PASSWORD=your_secure_password

# API Configuration
LOG_LEVEL=info

# TLS (for secure deployments)
TLS_CERT_PATH=/app/ssl/cert.pem
TLS_KEY_PATH=/app/ssl/key.pem
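
Before starting a stack, it can help to confirm that the `.env` file actually defines the variables you rely on. The sketch below is one way to do that; `check_env_file` is a hypothetical helper, and it assumes simple `NAME=value` lines without quoting or `export` prefixes.

```shell
# Hypothetical helper, not part of the repository.
# Usage: check_env_file <file> <VAR>...
# Returns non-zero if any named variable is missing or has an empty value.
check_env_file() {
  env_file="$1"
  shift
  missing=0
  for name in "$@"; do
    # Match lines of the form NAME=value with a non-empty value.
    if ! grep -Eq "^${name}=.+" "$env_file"; then
      echo "missing or empty: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For the variables listed above, a call such as `check_env_file .env GRAFANA_ADMIN_PASSWORD LOG_LEVEL` would flag anything left unset.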

Service Ports

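| Service    | Development | Homelab | Production |
|------------|-------------|---------|------------|
| API Server | 9101        | 9101    | 9101       |
| Redis      | 6379        | 6379    | -          |
| Prometheus | 9090        | -       | -          |
| Grafana    | 3000        | -       | -          |
| Loki       | 3100        | -       | -          |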

Monitoring

- Development: Full monitoring stack included
- Homelab: Basic monitoring (configurable)
- Production: External monitoring assumed

Security Notes

- If you need HTTPS externally, terminate TLS at a reverse proxy.
- Manage API keys via environment variables.
- In production, store database credentials in a secrets manager.