fetch_ml Configuration Guide
Quick Start
Docker Compose (Recommended)
```bash
# Development with 2 workers
cd deployments
CONFIG_DIR=../configs docker-compose -f docker-compose.dev.yml up -d

# Scale to 4 workers
docker-compose -f docker-compose.dev.yml up -d --scale worker=4

# Production with scheduler
CONFIG_DIR=../configs docker-compose -f docker-compose.prod.yml up -d
```
Key Environment Variables
| Variable | Description | Default |
|---|---|---|
| `CONFIG_DIR` | Path to config directory | `../configs` |
| `DATA_DIR` | Path to data directory | `./data/<env>` |
| `LOG_LEVEL` | Logging level | `info` |
| `REDIS_URL` | Redis connection URL | `redis://redis:6379` |
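To see which value each variable will resolve to in the current shell, you can mirror the table's defaults with shell parameter expansion. This is a minimal sketch that assumes the compose files interpolate these variables as `${VAR:-default}`, which falls back to the default when a variable is unset or empty. The `effective_settings` function name is hypothetical. `DATA_DIR` is left out because its default depends on the environment name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value each variable resolves to,
# using the defaults from the table when a variable is unset or empty.
effective_settings() {
  printf 'CONFIG_DIR=%s\n' "${CONFIG_DIR:-../configs}"
  printf 'LOG_LEVEL=%s\n' "${LOG_LEVEL:-info}"
  printf 'REDIS_URL=%s\n' "${REDIS_URL:-redis://redis:6379}"
}

effective_settings
```

For example, `LOG_LEVEL=debug effective_settings` prints `LOG_LEVEL=debug`. The other two lines keep their defaults unless those variables are also set.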
Architecture Overview
```
┌─────────────────┐     ┌─────────────┐     ┌─────────────────┐
│   API Server    │────▶│    Redis    │◀────│    Scheduler    │
│  (with builtin  │     │    Queue    │     │ (in api-server) │
│   scheduler)    │     └──────┬──────┘     └────────┬────────┘
└────────┬────────┘            │                     │
         │                     │                     │
         │            ┌────────┴────────┐            │
         │            ▼                 ▼            │
         │       ┌──────────┐      ┌──────────┐      │
         └──────▶│ Worker 1 │      │ Worker 2 │      │
                 │ (Podman) │      │ (Podman) │      │
                 └────┬─────┘      └────┬─────┘      │
                      │                 │            │
                      └─────────────────┴────────────┘
                                Heartbeats
```
The scheduler is built into the API server and manages multiple workers dynamically.
Configuration Structure
```
configs/
├── api/
│   ├── dev.yaml               # Development API config
│   ├── multi-user.yaml        # Production multi-worker
│   └── homelab-secure.yaml    # Homelab secure config
├── worker/
│   ├── docker-dev.yaml        # Development worker
│   ├── docker-prod.yaml       # Production worker
│   ├── docker-staging.yaml    # Staging worker
│   ├── docker-standard.yaml   # Standard compliance
│   └── homelab-secure.yaml    # Homelab secure worker
└── schema/
    └── *.yaml                 # Validation schemas
```
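If you script deployments, you may want to pick a worker config from an environment name. A small lookup over the files in the tree above is enough. This is a sketch: the `worker_config_for` function and its environment names are hypothetical, and the optional second argument is the config root, which defaults to `configs` when run from the repository root.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map an environment name to its worker config file.
# Usage: worker_config_for ENV [CONFIG_ROOT]
worker_config_for() {
  local dir="${2:-configs}"
  case "$1" in
    dev)      echo "$dir/worker/docker-dev.yaml" ;;
    staging)  echo "$dir/worker/docker-staging.yaml" ;;
    prod)     echo "$dir/worker/docker-prod.yaml" ;;
    standard) echo "$dir/worker/docker-standard.yaml" ;;
    homelab)  echo "$dir/worker/homelab-secure.yaml" ;;
    *)        echo "unknown environment: $1" >&2; return 1 ;;
  esac
}
```

Used with the worker binary's `-config` flag: `./worker -config "$(worker_config_for prod)"`.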
Scheduler Configuration
The scheduler is configured in the API server config:
```yaml
# configs/api/multi-user.yaml
resources:
  max_workers: 4                 # Max concurrent workers
  desired_rps_per_worker: 3      # Target requests/sec per worker

scheduler:
  enabled: true
  strategy: "round-robin"        # round-robin, least-loaded, priority
  max_concurrent_jobs: 16        # Max jobs across all workers
  queue:
    type: "redis"
    redis_addr: "redis:6379"
  worker_discovery:
    mode: "dynamic"              # dynamic or static
    heartbeat_timeout: "30s"
    health_check_interval: "10s"
```
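The `heartbeat_timeout` setting implies a simple liveness rule: a worker counts as alive while its most recent heartbeat is no older than the timeout. The sketch below makes that rule concrete. It assumes the scheduler compares the last heartbeat time (Unix seconds) with the current time. The function name is hypothetical, and whether the real boundary is inclusive is a guess.

```bash
#!/usr/bin/env bash
# Sketch of the liveness rule implied by worker_discovery.heartbeat_timeout.
# Assumption: the scheduler tracks each worker's last heartbeat in Unix seconds.
HEARTBEAT_TIMEOUT_S=30   # heartbeat_timeout: "30s"

# Succeeds (exit 0) while the last heartbeat is no older than the timeout.
# Usage: is_worker_alive LAST_SEEN NOW
is_worker_alive() {
  local last_seen="$1" now="$2"
  [ $(( now - last_seen )) -le "$HEARTBEAT_TIMEOUT_S" ]
}

# Example:
#   is_worker_alive "$last_seen" "$(date +%s)" && echo alive || echo stale
```

With the worker's `heartbeat_interval: "10s"`, a 30-second timeout means a worker is treated as gone only after roughly three consecutive heartbeats fail to arrive.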
Scheduling Strategies
| Strategy | Description | Use Case |
|---|---|---|
| `round-robin` | Distribute jobs evenly across workers | Balanced load |
| `least-loaded` | Send to the worker with the fewest jobs | Variable job sizes |
| `priority` | Respect job priorities first | Mixed-priority workloads |
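The difference between the first two strategies is easiest to see side by side. The following is an illustrative sketch of how each could choose a worker. It is not the fetch_ml scheduler's code, and the function names and the `NAME:RUNNING_JOBS` input format are hypothetical.

```bash
#!/usr/bin/env bash
# Illustrative worker selection for round-robin and least-loaded.

# round-robin: job number N goes to worker N mod (number of workers).
# Usage: pick_round_robin JOB_INDEX WORKER...
pick_round_robin() {
  local idx="$1"
  shift
  [ "$#" -gt 0 ] || return 1        # need at least one worker
  shift $(( idx % $# ))             # drop workers before the chosen one
  echo "$1"
}

# least-loaded: pick the worker with the fewest running jobs (first wins ties).
# Usage: pick_least_loaded NAME:RUNNING_JOBS...
pick_least_loaded() {
  local best="" best_load="" entry name load
  for entry in "$@"; do
    name="${entry%%:*}"
    load="${entry##*:}"
    if [ -z "$best_load" ] || [ "$load" -lt "$best_load" ]; then
      best="$name"
      best_load="$load"
    fi
  done
  echo "$best"
}
```

Round-robin ignores how busy each worker is. That is why `least-loaded` suits workloads where job sizes vary.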
Worker Configuration
Workers connect to the scheduler through the Redis queue:
```yaml
# configs/worker/docker-prod.yaml
backend:
  type: "redis"
  redis:
    addr: "redis:6379"
    password: ""                 # Set via REDIS_PASSWORD env var
    db: 0

worker:
  id: "${FETCHML_WORKER_ID}"     # Unique worker ID
  mode: "distributed"            # Uses scheduler via Redis
  heartbeat_interval: "10s"
  max_concurrent_jobs: 4         # Jobs this worker can run

sandbox:
  type: "podman"
  podman:
    socket: "/run/podman/podman.sock"
    cpus: "2"
    memory: "4Gi"
```
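Two `max_concurrent_jobs` limits are in play. Each worker caps its own jobs (4 here), and the scheduler caps the total across all workers (16 in `multi-user.yaml`). Assuming the tighter limit wins, the example values line up exactly: 4 workers × 4 jobs = 16. The sketch below works through that arithmetic. The `cluster_capacity` helper is hypothetical.

```bash
#!/usr/bin/env bash
# Effective number of jobs that can run at once, assuming the tighter of
# the per-worker and cluster-wide limits applies.
# Usage: cluster_capacity WORKERS PER_WORKER_MAX SCHEDULER_MAX
cluster_capacity() {
  local raw=$(( $1 * $2 ))
  if [ "$raw" -lt "$3" ]; then
    echo "$raw"
  else
    echo "$3"
  fi
}

cluster_capacity 4 4 16   # 4 workers x 4 jobs = 16, equal to the scheduler cap
cluster_capacity 2 4 16   # 8: the workers, not the scheduler, are the limit
cluster_capacity 6 4 16   # 24 worker slots, but the scheduler caps it at 16
```

Under that assumption, scaling past four workers adds no job slots unless the scheduler's `max_concurrent_jobs` is raised as well.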
Scaling Workers
Docker Compose (Recommended)
```bash
# Development - 2 workers by default
docker-compose -f deployments/docker-compose.dev.yml up -d

# Scale to 4 workers
docker-compose -f deployments/docker-compose.dev.yml up -d --scale worker=4

# Scale down to 1 worker
docker-compose -f deployments/docker-compose.dev.yml up -d --scale worker=1
```
Kubernetes / Manual Deployment
```bash
# Each worker needs a unique ID
export FETCHML_WORKER_ID="worker-$(hostname)-$(date +%s)"
./worker -config configs/worker/docker-prod.yaml
```
Environment-Specific Setups
Development (docker-compose.dev.yml)
- 2 workers by default
- Redis for queue
- Local MinIO for storage
- Caddy reverse proxy
```bash
make dev-up            # Start with 2 workers
make dev-up SCALE=4    # Start with 4 workers
```
Production (docker-compose.prod.yml)
- 4 workers configured
- Redis cluster recommended
- External MinIO/S3
- Health checks enabled
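```bash
# Compose resolves relative host paths against the compose file's directory
# (deployments/), so pass an absolute config path when running from the repo root
CONFIG_DIR="$PWD/configs" DATA_DIR=/var/lib/fetchml \
  docker-compose -f deployments/docker-compose.prod.yml up -d
```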
Staging (docker-compose.staging.yml)
- 2 workers
- Audit logging enabled
- Same as prod but smaller scale
Monitoring
Check Worker Status
```bash
# Via API
curl http://localhost:9101/api/v1/workers

# Via Redis
redis-cli LRANGE fetchml:workers 0 -1
redis-cli HGETALL fetchml:worker:status
```
View Logs
```bash
# All workers
docker-compose -f deployments/docker-compose.dev.yml logs worker

# Specific worker (by container name)
docker logs ml-experiments-worker-1
docker logs ml-experiments-worker-2
```
Troubleshooting
Workers Not Registering
- Check the Redis connection: `redis-cli ping` should return `PONG`
- Verify the worker config sets `mode: distributed`
- Check that the scheduler is enabled in the API server config (`scheduler.enabled: true`)
- Review the worker logs: `docker logs <worker-container>`
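The first two checks are easy to script. The sketch below uses hypothetical function names. The mode check is a plain text match rather than a YAML parse, so it does not verify that `mode` sits under the `worker:` key.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the Redis and worker-mode checks.

# Succeeds if Redis at HOST answers PING with PONG (requires redis-cli).
check_redis() {
  [ "$(redis-cli -h "${1:-localhost}" ping 2>/dev/null)" = "PONG" ]
}

# Succeeds if a worker config file sets mode: distributed (quoted or not).
check_worker_mode() {
  if grep -Eq '^[[:space:]]*mode:[[:space:]]*"?distributed"?' "$1"; then
    echo "ok: $1 sets mode: distributed"
  else
    echo "warn: $1 does not set mode: distributed" >&2
    return 1
  fi
}

# Example:
#   check_redis redis || echo "Redis unreachable"
#   check_worker_mode configs/worker/docker-prod.yaml
```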
Jobs Stuck in Queue
- Check worker capacity: see whether workers are already running `max_concurrent_jobs` jobs each
- Verify workers are healthy: `docker ps`
- Check the Redis queue length: `redis-cli LLEN fetchml:queue:pending`
Worker ID Collisions
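Ensure `FETCHML_WORKER_ID` is unique per worker instance:
```yaml
environment:
  - FETCHML_WORKER_ID=${HOSTNAME}-${COMPOSE_PROJECT_NAME}-${RANDOM}
```
Note: Compose substitutes `${...}` variables once, on the host, when it reads the file. Every replica started with `--scale` therefore gets the same value. `RANDOM` is a Bash shell variable that is normally not exported, so Compose treats it as unset and substitutes an empty string. For scaled services, generate the ID inside each container at startup instead.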
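To give each scaled replica its own ID, you can generate the value inside the container when it starts, rather than in the compose file. The sketch below is a minimal entrypoint. It assumes the image lets you wrap the worker start command and that the `FETCHML_WORKER_ID` entry is dropped from the service's `environment:` list. The `make_worker_id` name and the binary and config paths are hypothetical. By default, Docker sets a container's hostname to its container ID, which differs per replica.

```bash
#!/usr/bin/env bash
# Hypothetical container entrypoint: build a per-replica worker ID at startup.

make_worker_id() {
  # uname -n prints the hostname; Docker defaults it to the container ID.
  echo "worker-$(uname -n)-$(date +%s)"
}

# Keep an ID that was set explicitly; generate one only if missing or empty.
FETCHML_WORKER_ID="${FETCHML_WORKER_ID:-$(make_worker_id)}"
export FETCHML_WORKER_ID
echo "starting worker $FETCHML_WORKER_ID"

# Then hand off to the worker (binary and config paths are assumptions):
# exec ./worker -config configs/worker/docker-prod.yaml
```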
Security Notes
- Workers run in privileged mode for Podman containers
- Redis should be firewalled (not exposed publicly in prod)
- Worker-to-scheduler communication is via Redis only
- No direct API-to-worker connections required
See Also
- `deployments/README.md` - Deployment environments
- `docs/src/deployment.md` - Full deployment guide
- `docs/src/cicd.md` - CI/CD workflows