# fetch_ml Configuration Guide

## Quick Start

### Docker Compose (Recommended)

```bash
# Development with 2 workers
cd deployments
CONFIG_DIR=../configs docker-compose -f docker-compose.dev.yml up -d

# Scale to 4 workers
docker-compose -f docker-compose.dev.yml up -d --scale worker=4

# Production with scheduler
CONFIG_DIR=../configs docker-compose -f docker-compose.prod.yml up -d
```

### Key Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `CONFIG_DIR` | Path to config directory | `../configs` |
| `DATA_DIR` | Path to data directory | `./data/` |
| `LOG_LEVEL` | Logging level | `info` |
| `REDIS_URL` | Redis connection URL | `redis://redis:6379` |

## Architecture Overview

```
┌─────────────────┐     ┌─────────────┐     ┌─────────────────┐
│   API Server    │────▶│    Redis    │◀────│    Scheduler    │
│  (with builtin  │     │    Queue    │     │ (in api-server) │
│   scheduler)    │     └─────────────┘     └─────────────────┘
└─────────────────┘            │                     │
         │                     │                     │
         │            ┌────────┴────────┐            │
         │            ▼                 ▼            │
         │       ┌─────────┐       ┌─────────┐       │
         └──────▶│ Worker 1│       │ Worker 2│       │
                 │ (Podman)│       │ (Podman)│       │
                 └─────────┘       └─────────┘       │
                      │                 │            │
                      └─────────────────┴────────────┘
                                 Heartbeats
```

The scheduler is built into the API server and manages multiple workers dynamically.
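The heartbeat path in the diagram boils down to one rule: a worker counts as alive while its most recent heartbeat is newer than a timeout. The Bash sketch below illustrates that rule under stated assumptions: each worker's last heartbeat is known as a Unix timestamp, the timeout is given in seconds (the scheduler config in this guide uses `heartbeat_timeout: "30s"`), and the function name `live_workers` and its `ID=LAST_SEEN` argument format are hypothetical. It is not how fetch_ml actually stores or evaluates heartbeats in Redis.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of heartbeat-based liveness; not fetch_ml's implementation.

# live_workers NOW TIMEOUT ID=LAST_SEEN...
#   NOW           current Unix time in seconds
#   TIMEOUT       heartbeat timeout in seconds (e.g. 30)
#   ID=LAST_SEEN  one argument per worker: its ID and last heartbeat time
# Prints, one per line, the IDs of workers whose last heartbeat is
# at most TIMEOUT seconds old.
live_workers() {
  local now=$1 timeout=$2
  shift 2
  local entry id last_seen
  for entry in "$@"; do
    id=${entry%%=*}        # text before the first '='
    last_seen=${entry#*=}  # text after the first '='
    if (( now - last_seen <= timeout )); then
      printf '%s\n' "$id"
    fi
  done
}

# At t=1000 with a 30 s timeout, worker-2 (last seen 45 s ago) has expired.
live_workers 1000 30 worker-1=990 worker-2=955 worker-3=971
# → worker-1
#   worker-3
```

The `<=` comparison treats a worker seen exactly `TIMEOUT` seconds ago as still alive; how fetch_ml handles that boundary is not specified in this guide.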
## Configuration Structure

```
configs/
├── api/
│   ├── dev.yaml              # Development API config
│   ├── multi-user.yaml       # Production multi-worker
│   └── homelab-secure.yaml   # Homelab secure config
├── worker/
│   ├── docker-dev.yaml       # Development worker
│   ├── docker-prod.yaml      # Production worker
│   ├── docker-staging.yaml   # Staging worker
│   ├── docker-standard.yaml  # Standard compliance
│   └── homelab-secure.yaml   # Homelab secure worker
└── schema/
    └── *.yaml                # Validation schemas
```

## Scheduler Configuration

The scheduler is configured in the API server config:

```yaml
# configs/api/multi-user.yaml
resources:
  max_workers: 4               # Max concurrent workers
  desired_rps_per_worker: 3    # Target requests/sec per worker

scheduler:
  enabled: true
  strategy: "round-robin"      # round-robin, least-loaded, priority
  max_concurrent_jobs: 16      # Max jobs across all workers

queue:
  type: "redis"
  redis_addr: "redis:6379"

worker_discovery:
  mode: "dynamic"              # dynamic or static
  heartbeat_timeout: "30s"
  health_check_interval: "10s"
```

### Scheduling Strategies

| Strategy | Description | Use Case |
|----------|-------------|----------|
| `round-robin` | Distribute evenly across workers | Balanced load |
| `least-loaded` | Send to the worker with the fewest jobs | Variable job sizes |
| `priority` | Respect job priorities first | Mixed-priority workloads |

## Worker Configuration

Workers connect to the scheduler through the Redis queue:

```yaml
# configs/worker/docker-prod.yaml
backend:
  type: "redis"

redis:
  addr: "redis:6379"
  password: ""                   # Set via REDIS_PASSWORD env var
  db: 0

worker:
  id: "${FETCHML_WORKER_ID}"     # Unique worker ID
  mode: "distributed"            # Uses scheduler via Redis
  heartbeat_interval: "10s"
  max_concurrent_jobs: 4         # Jobs this worker can run

sandbox:
  type: "podman"

podman:
  socket: "/run/podman/podman.sock"
  cpus: "2"
  memory: "4Gi"
```

## Scaling Workers

### Docker Compose (Recommended)

```bash
# Development - 2 workers by default
docker-compose -f deployments/docker-compose.dev.yml up -d

# Scale to 4 workers
docker-compose -f deployments/docker-compose.dev.yml up -d --scale worker=4

# Scale down to 1 worker
docker-compose -f deployments/docker-compose.dev.yml up -d --scale worker=1
```

### Kubernetes / Manual Deployment

```bash
# Each worker needs a unique ID
export FETCHML_WORKER_ID="worker-$(hostname)-$(date +%s)"
./worker -config configs/worker/docker-prod.yaml
```

## Environment-Specific Setups

### Development (docker-compose.dev.yml)

- 2 workers by default
- Redis for the queue
- Local MinIO for storage
- Caddy reverse proxy

```bash
make dev-up           # Start with 2 workers
make dev-up SCALE=4   # Start with 4 workers
```

### Production (docker-compose.prod.yml)

- 4 workers configured
- Redis cluster recommended
- External MinIO/S3
- Health checks enabled

```bash
CONFIG_DIR=./configs DATA_DIR=/var/lib/fetchml \
  docker-compose -f deployments/docker-compose.prod.yml up -d
```

### Staging (docker-compose.staging.yml)

- 2 workers
- Audit logging enabled
- Otherwise the same as production, at a smaller scale

## Monitoring

### Check Worker Status

```bash
# Via API
curl http://localhost:9101/api/v1/workers

# Via Redis
redis-cli LRANGE fetchml:workers 0 -1
redis-cli HGETALL fetchml:worker:status
```

### View Logs

```bash
# All workers
docker-compose -f deployments/docker-compose.dev.yml logs worker

# Specific worker (by container name)
docker logs ml-experiments-worker-1
docker logs ml-experiments-worker-2
```

## Troubleshooting

### Workers Not Registering

1. Check the Redis connection: `redis-cli ping`
2. Verify that the worker config has `mode: distributed`
3. Check that the scheduler is enabled in the API server config (`enabled: true`)
4. Review the worker logs: `docker logs <container>`

### Jobs Stuck in Queue

1. Check worker capacity: make sure `max_concurrent_jobs` has not been reached
2. Verify that the workers are healthy: `docker ps`
3. Check the Redis queue length: `redis-cli LLEN fetchml:queue:pending`

### Worker ID Collisions

Ensure `FETCHML_WORKER_ID` is unique per worker instance:

```yaml
environment:
  - FETCHML_WORKER_ID=${HOSTNAME}-${COMPOSE_PROJECT_NAME}-${RANDOM}
```

Note that Docker Compose interpolates `${...}` once, on the host, when it reads the file. `RANDOM` is a Bash shell variable rather than an environment variable, so Compose substitutes an empty string for it, and the resulting value is identical for every replica started with `--scale`. For scaled replicas, generate the ID when the container starts instead, as in the manual deployment example (by default, Docker sets `HOSTNAME` inside each container to its container ID).

## Security Notes

- Workers run in privileged mode so they can launch Podman containers
- Redis should be firewalled (never exposed publicly in production)
- Worker-to-scheduler communication goes through Redis only
- No direct API-to-worker connections are required

## See Also

- `deployments/README.md` - Deployment environments
- `docs/src/deployment.md` - Full deployment guide
- `docs/src/cicd.md` - CI/CD workflows