fetch_ml/configs/README.md
Jeremie Fraeys 8a7e7695f4
config: consolidate and cleanup configuration files
- Remove redundant config examples (distributed/, standalone/, examples/)
- Delete dev-local.yaml variants (use dev.yaml with env vars)
- Delete prod.yaml (use multi-user.yaml or homelab-secure.yaml)
- Clean up worker configs: remove docker.yaml, homelab-sandbox.yaml
- Update remaining configs with current best practices
- Simplify config schema and documentation
2026-03-04 13:22:52 -05:00

# fetch_ml Configuration Guide
## Quick Start
### Docker Compose (Recommended)
```bash
# Development with 2 workers
cd deployments
CONFIG_DIR=../configs docker-compose -f docker-compose.dev.yml up -d
# Scale to 4 workers
docker-compose -f docker-compose.dev.yml up -d --scale worker=4
# Production with scheduler
CONFIG_DIR=../configs docker-compose -f docker-compose.prod.yml up -d
```
### Key Environment Variables
| Variable | Description | Default |
|----------|-------------|---------|
| `CONFIG_DIR` | Path to config directory | `../configs` |
| `DATA_DIR` | Path to data directory | `./data/<env>` |
| `LOG_LEVEL` | Logging level | `info` |
| `REDIS_URL` | Redis connection URL | `redis://redis:6379` |
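
The defaults in the table can be mirrored in a wrapper script with shell parameter expansion. This is a minimal sketch, not part of the repository: `load_defaults` is a hypothetical helper, `FETCHML_ENV` is a hypothetical variable standing in for the `<env>` placeholder, and the Compose files may resolve these values differently.

```bash
# Apply the documented defaults only when a variable is unset or empty.
# FETCHML_ENV is a hypothetical name used here to fill in <env>.
load_defaults() {
  : "${FETCHML_ENV:=dev}"
  : "${CONFIG_DIR:=../configs}"
  : "${DATA_DIR:=./data/${FETCHML_ENV}}"
  : "${LOG_LEVEL:=info}"
  : "${REDIS_URL:=redis://redis:6379}"
  export CONFIG_DIR DATA_DIR LOG_LEVEL REDIS_URL
}
```

Values already set in the calling shell are left untouched, so `LOG_LEVEL=debug` exported beforehand would survive the call.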
## Architecture Overview
```
┌─────────────────┐     ┌─────────────┐     ┌─────────────────┐
│   API Server    │────▶│    Redis    │◀────│    Scheduler    │
│  (with builtin  │     │    Queue    │     │ (in api-server) │
│   scheduler)    │     └─────────────┘     └─────────────────┘
└─────────────────┘            │                     │
         │                     │                     │
         │            ┌────────┴────────┐            │
         │            ▼                 ▼            │
         │       ┌─────────┐       ┌─────────┐       │
         └──────▶│ Worker 1│       │ Worker 2│       │
                 │ (Podman)│       │ (Podman)│       │
                 └─────────┘       └─────────┘       │
                      │                 │            │
                      └─────────────────┴────────────┘
                                 Heartbeats
```
The scheduler is built into the API server and manages multiple workers dynamically.
## Configuration Structure
```
configs/
├── api/
│   ├── dev.yaml              # Development API config
│   ├── multi-user.yaml       # Production multi-worker
│   └── homelab-secure.yaml   # Homelab secure config
├── worker/
│   ├── docker-dev.yaml       # Development worker
│   ├── docker-prod.yaml      # Production worker
│   ├── docker-staging.yaml   # Staging worker
│   ├── docker-standard.yaml  # Standard compliance
│   └── homelab-secure.yaml   # Homelab secure worker
└── schema/
    └── *.yaml                # Validation schemas
```
## Scheduler Configuration
The scheduler is configured in the API server config:
```yaml
# configs/api/multi-user.yaml
resources:
  max_workers: 4                # Max concurrent workers
  desired_rps_per_worker: 3     # Target requests/sec per worker

scheduler:
  enabled: true
  strategy: "round-robin"       # round-robin, least-loaded, priority
  max_concurrent_jobs: 16       # Max jobs across all workers
  queue:
    type: "redis"
    redis_addr: "redis:6379"
  worker_discovery:
    mode: "dynamic"             # dynamic or static
    heartbeat_timeout: "30s"
    health_check_interval: "10s"
```
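
One plausible reading of `heartbeat_timeout` is that the scheduler stops dispatching to a worker whose last heartbeat is older than the timeout. The check below sketches that rule under that assumption; `is_worker_stale` and its arguments are hypothetical and not taken from the fetch_ml source.

```bash
# Succeeds (exit status 0) when the last heartbeat is older than the timeout.
# Arguments: last heartbeat and current time as Unix timestamps, then the
# timeout, all in whole seconds.
is_worker_stale() {
  last_seen=$1
  now=$2
  timeout=$3
  [ $(( now - last_seen )) -gt "$timeout" ]
}

# Example with made-up timestamps: last heartbeat 45s ago, timeout 30s.
if is_worker_stale 1000 1045 30; then
  echo "worker is stale"
fi
```

Under this reading, a `30s` timeout spans three heartbeat intervals when workers report every `10s`.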
### Scheduling Strategies
| Strategy | Description | Use Case |
|----------|-------------|----------|
| `round-robin` | Distribute evenly across workers | Balanced load |
| `least-loaded` | Send to worker with fewest jobs | Variable job sizes |
| `priority` | Respect job priorities first | Mixed priority workloads |
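
The difference between `round-robin` and `least-loaded` can be illustrated with a small shell sketch. The worker names and job counts are invented, the function names are hypothetical, and this does not reflect how the fetch_ml scheduler is actually implemented.

```bash
# Made-up fleet: worker names, and "name:running_jobs" pairs.
WORKERS="worker-1 worker-2 worker-3"
LOADS="worker-1:2 worker-2:0 worker-3:1"

# round-robin: cycle through the workers in order, ignoring load.
# The result is returned in the global variable "picked".
rr_index=0
pick_round_robin() {
  set -- $WORKERS                 # split the list into positional parameters
  count=$#
  shift "$rr_index"               # skip the workers already used this cycle
  picked=$1
  rr_index=$(( (rr_index + 1) % count ))
}

# least-loaded: choose the worker with the fewest running jobs.
pick_least_loaded() {
  best_name=""
  best_jobs=""
  for entry in $LOADS; do
    name=${entry%%:*}
    jobs=${entry##*:}
    if [ -z "$best_jobs" ] || [ "$jobs" -lt "$best_jobs" ]; then
      best_name=$name
      best_jobs=$jobs
    fi
  done
  picked=$best_name
}
```

With these values, four calls to `pick_round_robin` select `worker-1`, `worker-2`, `worker-3`, then `worker-1` again, while `pick_least_loaded` selects `worker-2`. The functions set a global rather than printing so that the round-robin position survives between calls.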
## Worker Configuration
Workers connect to the scheduler through the Redis queue:
```yaml
# configs/worker/docker-prod.yaml
backend:
  type: "redis"
  redis:
    addr: "redis:6379"
    password: ""                  # Set via REDIS_PASSWORD env var
    db: 0

worker:
  id: "${FETCHML_WORKER_ID}"      # Unique worker ID
  mode: "distributed"             # Uses scheduler via Redis
  heartbeat_interval: "10s"
  max_concurrent_jobs: 4          # Jobs this worker can run

sandbox:
  type: "podman"
  podman:
    socket: "/run/podman/podman.sock"
    cpus: "2"
    memory: "4Gi"
```
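
The example values in this guide line up: `max_workers: 4` times a per-worker `max_concurrent_jobs: 4` equals the scheduler-wide `max_concurrent_jobs: 16`. Whether fetch_ml enforces this relationship is not stated here, so the snippet below is only a sanity check you might run on your own values, which are hard-coded for illustration.

```bash
# Values copied by hand from the two example configs in this guide.
max_workers=4          # max_workers in the API config
jobs_per_worker=4      # max_concurrent_jobs in the worker config
scheduler_limit=16     # max_concurrent_jobs in the API scheduler section

fleet_capacity=$(( max_workers * jobs_per_worker ))

if [ "$fleet_capacity" -lt "$scheduler_limit" ]; then
  echo "scheduler limit ($scheduler_limit) exceeds fleet capacity ($fleet_capacity)"
elif [ "$fleet_capacity" -gt "$scheduler_limit" ]; then
  echo "scheduler limit ($scheduler_limit) caps the fleet below its capacity ($fleet_capacity)"
else
  echo "scheduler limit matches fleet capacity ($fleet_capacity)"
fi
```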
## Scaling Workers
### Docker Compose (Recommended)
```bash
# Development - 2 workers by default
docker-compose -f deployments/docker-compose.dev.yml up -d
# Scale to 4 workers
docker-compose -f deployments/docker-compose.dev.yml up -d --scale worker=4
# Scale down to 1 worker
docker-compose -f deployments/docker-compose.dev.yml up -d --scale worker=1
```
### Kubernetes / Manual Deployment
```bash
# Each worker needs unique ID
export FETCHML_WORKER_ID="worker-$(hostname)-$(date +%s)"
./worker -config configs/worker/docker-prod.yaml
```
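
`date +%s` has one-second resolution, so two workers started on the same host within the same second would receive the same ID. A small variation, sketched here as an assumption rather than a project convention, appends a few random bytes; `make_worker_id` is a hypothetical helper.

```bash
# Build a worker ID from the hostname, the start time, and 4 random bytes
# rendered as 8 hex characters. The "worker-" prefix follows the example
# above; the random suffix is an addition.
make_worker_id() {
  suffix=$(od -An -N4 -tx1 /dev/urandom | tr -d ' \n')
  printf 'worker-%s-%s-%s\n' "$(hostname)" "$(date +%s)" "$suffix"
}

FETCHML_WORKER_ID="$(make_worker_id)"
export FETCHML_WORKER_ID
```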
## Environment-Specific Setups
### Development (docker-compose.dev.yml)
- 2 workers by default
- Redis for queue
- Local MinIO for storage
- Caddy reverse proxy
```bash
make dev-up # Start with 2 workers
make dev-up SCALE=4 # Start with 4 workers
```
### Production (docker-compose.prod.yml)
- 4 workers configured
- Redis cluster recommended
- External MinIO/S3
- Health checks enabled
```bash
CONFIG_DIR=./configs DATA_DIR=/var/lib/fetchml \
docker-compose -f deployments/docker-compose.prod.yml up -d
```
### Staging (docker-compose.staging.yml)
- 2 workers
- Audit logging enabled
- Same setup as production, at a smaller scale
## Monitoring
### Check Worker Status
```bash
# Via API
curl http://localhost:9101/api/v1/workers
# Via Redis
redis-cli LRANGE fetchml:workers 0 -1
redis-cli HGETALL fetchml:worker:status
```
### View Logs
```bash
# All workers
docker-compose -f deployments/docker-compose.dev.yml logs worker
# Specific worker (by container name)
docker logs ml-experiments-worker-1
docker logs ml-experiments-worker-2
```
## Troubleshooting
### Workers Not Registering
1. Check Redis connection: `redis-cli ping`
2. Verify that the worker config sets `mode: "distributed"`
3. Check that the scheduler is enabled (`enabled: true`) in the API server config
4. Review worker logs: `docker logs <worker-container>`
### Jobs Stuck in Queue
1. Check worker capacity: confirm that no worker has reached its `max_concurrent_jobs` limit
2. Verify workers are healthy: `docker ps`
3. Check Redis queue length: `redis-cli LLEN fetchml:queue:pending`
### Worker ID Collisions
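Ensure `FETCHML_WORKER_ID` is unique per worker instance. Note that Compose interpolates `${...}` once, from the environment of the shell that invokes it, so every replica started with `--scale` receives the same value. `RANDOM` is a Bash shell variable that is not exported, so it normally interpolates to an empty string. Treat the following as a template and verify the IDs that workers actually register: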
```yaml
environment:
- FETCHML_WORKER_ID=${HOSTNAME}-${COMPOSE_PROJECT_NAME}-${RANDOM}
```
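
To confirm a suspected collision, it can help to look for repeated entries in whatever list of worker IDs you have, one ID per line. A minimal sketch with standard tools; `find_duplicate_ids` is a hypothetical helper and the IDs are made up.

```bash
# Print each worker ID that appears more than once on stdin (one ID per line).
find_duplicate_ids() {
  sort | uniq -d
}

# Example with made-up IDs: only worker-a is repeated.
printf '%s\n' worker-a worker-b worker-a | find_duplicate_ids
```

Assuming the `fetchml:workers` list holds one ID per registered worker, the output of `redis-cli LRANGE fetchml:workers 0 -1` could be piped into the same function.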
## Security Notes
- Workers run in privileged mode, which they need in order to launch Podman containers
- Redis should be firewalled (not exposed publicly in prod)
- Worker-to-scheduler communication is via Redis only
- No direct API-to-worker connections required
## See Also
- `deployments/README.md` - Deployment environments
- `docs/src/deployment.md` - Full deployment guide
- `docs/src/cicd.md` - CI/CD workflows