# Docker Compose Deployments

This directory contains Docker Compose configurations for different deployment environments.

## Environment Configurations

### Development (`docker-compose.dev.yml`)

- Full development stack with monitoring
- Includes: API, Worker, Redis, MinIO (snapshots), Prometheus, Grafana, Loki, Promtail
- Optimized for local development and testing
- **Usage**: `docker-compose -f deployments/docker-compose.dev.yml up -d`

### Homelab - Secure (`docker-compose.homelab-secure.yml`)

- Secure homelab deployment with authentication and a Caddy reverse proxy
- TLS is terminated at the reverse proxy (Approach A)
- Includes: API, Redis (password protected), Caddy reverse proxy
- **Usage**: `docker-compose -f deployments/docker-compose.homelab-secure.yml up -d`

### Production (`docker-compose.prod.yml`)

- Production deployment configuration
- Optimized for performance and security
- External services assumed (Redis, monitoring)
- **Usage**: `docker-compose -f deployments/docker-compose.prod.yml up -d`

Note: `docker-compose.prod.yml` is a reproducible staging/testing harness. Real production deployments do not require Docker; you can run the Go services directly (for example, under systemd) and use Caddy for TLS/WSS termination.

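
Because each environment maps to exactly one compose file, a small wrapper can keep the `-f` paths out of day-to-day commands. The sketch below is a hypothetical helper: the function name `compose_file_for` and the short names `dev`, `homelab`, and `prod` are assumptions made for illustration and are not part of this repository. It only maps a name to the file paths listed above.

```bash
# Hypothetical helper: map a short environment name to its compose file.
# The short names (dev, homelab, prod) are assumptions made for this sketch.
compose_file_for() {
  case "$1" in
    dev)     echo "deployments/docker-compose.dev.yml" ;;
    homelab) echo "deployments/docker-compose.homelab-secure.yml" ;;
    prod)    echo "deployments/docker-compose.prod.yml" ;;
    *)       echo "unknown environment: $1" >&2; return 1 ;;
  esac
}
```

With the function loaded in your shell, `docker-compose -f "$(compose_file_for dev)" up -d` is equivalent to the Development usage line above.
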
## TLS / WSS Policy

- The Zig CLI currently supports `ws://` only (native `wss://` is not implemented).
- Production deployments terminate TLS/WSS at a reverse proxy (Caddy in `docker-compose.prod.yml`) and keep the API server on internal `ws://`.
- Homelab deployments terminate TLS/WSS at a reverse proxy (Caddy) and keep the API server on internal `ws://`.
- Health checks in compose files should use `http://localhost:9101/health` when `server.tls.enabled: false`.

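
The plain-HTTP health endpoint named above can also be polled from the host after `up -d`. This is a minimal sketch, not part of the repository: `wait_healthy` is a hypothetical name, the default of 30 attempts and the one-second delay are arbitrary, and it assumes `curl` is installed and that port 9101 is published on localhost.

```bash
# Poll a health endpoint until it answers successfully or the attempts run out.
# Usage: wait_healthy [url] [attempts]
wait_healthy() {
  local url="${1:-http://localhost:9101/health}"
  local attempts="${2:-30}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP error statuses as failure; -s: silent;
    # --max-time bounds each individual probe.
    if curl -fs --max-time 2 -o /dev/null "$url"; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  echo "not healthy after $attempts attempt(s): $url" >&2
  return 1
}
```

Calling `wait_healthy` with no arguments probes the default URL; a non-zero exit status means the endpoint never responded successfully.
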
## Required Volume Mounts

- `base_path` (experiments) must be writable by the API server.
- `data_dir` should be mounted if you want snapshot/dataset integrity validation via `ml validate`.

For the default configs:

- `base_path`: `/data/experiments` (dev/homelab configs) or `/app/data/experiments` (prod configs)
- `data_dir`: `/data/active`

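
A pre-flight check can catch a missing or read-only mount before the API server fails on its first write. The sketch below is illustrative only: `check_mount` is a hypothetical helper, and it is meant to be run inside the container, or against the host directories backing the volumes, as the same user the API server runs as.

```bash
# Return 0 if the directory exists and a file can be created in it.
check_mount() {
  local dir="$1"
  local probe
  if [ ! -d "$dir" ]; then
    echo "missing: $dir" >&2
    return 1
  fi
  # Create and remove a probe file rather than trusting permission bits alone.
  probe="$dir/.write-probe.$$"
  if ( : > "$probe" ) 2>/dev/null; then
    rm -f "$probe"
    echo "writable: $dir"
  else
    echo "not writable: $dir" >&2
    return 1
  fi
}
```

For the default dev/homelab paths, `check_mount /data/experiments` covers `base_path`. The text above only requires `data_dir` to be mounted, so `[ -d /data/active ]` is enough for that one.
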
## Quick Start

```bash
# Development (most common)
docker-compose -f deployments/docker-compose.dev.yml up -d

# Check status
docker-compose -f deployments/docker-compose.dev.yml ps

# View logs
docker-compose -f deployments/docker-compose.dev.yml logs -f api-server

# Stop services
docker-compose -f deployments/docker-compose.dev.yml down
```

## Dev: MinIO-backed snapshots (smoke test)

The dev compose file provisions a MinIO bucket and uploads a small example snapshot object at:

`s3://fetchml-snapshots/snapshots/snap-1.tar.gz`

To queue a task that forces the worker to pull the snapshot from MinIO:

1. Start the dev stack:
   `docker-compose -f deployments/docker-compose.dev.yml up -d`

2. Read the `snapshot_sha256` printed by the init job:
   `docker-compose -f deployments/docker-compose.dev.yml logs minio-init`

3. Queue a job using the snapshot fields:
   `ml queue <job-name> --snapshot-id snap-1 --snapshot-sha256 <snapshot_sha256>`

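
Step 3 is easy to get wrong when the digest is copied by hand from the logs. The sketch below sanity-checks the value and then prints the `ml queue` command from the steps above. It assumes `snapshot_sha256` is a hex-encoded SHA-256 digest (64 hexadecimal characters), which the steps do not state explicitly. `queue_snapshot_cmd` is a hypothetical helper, and it only prints the command rather than running it.

```bash
# Print the ml queue command for snapshot snap-1 after checking the digest shape.
# Usage: queue_snapshot_cmd <job-name> <snapshot_sha256>
queue_snapshot_cmd() {
  local job="$1"
  local sha="$2"
  # Assumption: the digest is hex-encoded SHA-256, i.e. 64 hexadecimal characters.
  case "$sha" in
    *[!0-9a-fA-F]*|"")
      echo "digest is empty or not hexadecimal" >&2
      return 1
      ;;
  esac
  if [ "${#sha}" -ne 64 ]; then
    echo "digest must be 64 characters, got ${#sha}" >&2
    return 1
  fi
  echo "ml queue $job --snapshot-id snap-1 --snapshot-sha256 $sha"
}
```

Printing instead of executing lets you review the command before it reaches the queue. Pipe the output to `sh` once you trust it.
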
## Smoke tests

- `make dev-smoke` runs the development stack smoke test.
- `make prod-smoke` runs a Docker-based staging smoke test for the production stack, using a localhost-only Caddy configuration.

Note: `ml queue` by itself generates a random commit ID. For full provenance enforcement (manifest + dependency manifest), use `ml sync ./your-project --queue` so the server has real code and dependency files.

Examples:

- `ml queue train-mnist --priority 3 --snapshot-id snap-1 --snapshot-sha256 <snapshot_sha256>`
- `ml queue train-a train-b train-c --priority 5 --snapshot-id snap-1 --snapshot-sha256 <snapshot_sha256>`

## Environment Variables

Create a `.env` file in the project root:

```bash
# Grafana
GRAFANA_ADMIN_PASSWORD=your_secure_password

# API Configuration
LOG_LEVEL=info

# TLS (for secure deployments)
TLS_CERT_PATH=/app/ssl/cert.pem
TLS_KEY_PATH=/app/ssl/key.pem
```

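
To avoid leaving the placeholder password in place, the `.env` file can be generated once with a random Grafana password. This is a sketch under assumptions: `write_env` is a hypothetical helper, the variable names and TLS paths are the ones listed above, and the password is 32 hex characters read from `/dev/urandom`.

```bash
# Write a .env file with the variables listed above, unless one already exists.
# Usage: write_env [path]   (defaults to ./.env)
write_env() {
  local path="${1:-.env}"
  local password
  if [ -e "$path" ]; then
    echo "refusing to overwrite $path" >&2
    return 1
  fi
  # 16 random bytes rendered as 32 hexadecimal characters.
  password="$(od -An -tx1 -N16 /dev/urandom | tr -d ' \n')"
  # The subshell limits the umask change to this write, so the new file is
  # created readable and writable by its owner only.
  ( umask 077
    cat > "$path" <<EOF
# Grafana
GRAFANA_ADMIN_PASSWORD=$password

# API Configuration
LOG_LEVEL=info

# TLS (for secure deployments)
TLS_CERT_PATH=/app/ssl/cert.pem
TLS_KEY_PATH=/app/ssl/key.pem
EOF
  )
}
```

Run `write_env` from the project root. A second run exits non-zero instead of replacing an existing file, so a password already in use is not silently rotated.
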
## Service Ports

| Service | Development | Homelab | Production |
|---------|-------------|---------|------------|
| API Server | 9101 | 9101 | 9101 |
| Redis | 6379 | 6379 | - |
| Prometheus | 9090 | - | - |
| Grafana | 3000 | - | - |
| Loki | 3100 | - | - |

## Monitoring

- **Development**: Full monitoring stack included
- **Homelab**: Basic monitoring (configurable)
- **Production**: External monitoring assumed

## Security Notes

- If you need HTTPS externally, terminate TLS at a reverse proxy.
- API keys should be managed via environment variables.
- Database credentials should use secrets management in production.