fetch_ml Configuration Guide
Quick Start
Docker Compose (Recommended)
# Development with 2 workers
cd deployments
CONFIG_DIR=../configs docker-compose -f docker-compose.dev.yml up -d
# Scale to 4 workers
docker-compose -f docker-compose.dev.yml up -d --scale worker=4
# Production with scheduler
CONFIG_DIR=../configs docker-compose -f docker-compose.prod.yml up -d
Key Environment Variables
| Variable | Description | Default |
|---|---|---|
| CONFIG_DIR | Path to config directory | ../configs |
| DATA_DIR | Path to data directory | ./data/<env> |
| LOG_LEVEL | Logging level | info |
| REDIS_URL | Redis connection URL | redis://redis:6379 |
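The defaults in the table can be applied in a small wrapper before invoking docker-compose. The following is a minimal sketch, not part of the project: the function name `fetchml_defaults` is hypothetical, and it assumes the `<env>` placeholder in DATA_DIR stands for the environment name (dev, staging, prod).

```shell
# Hypothetical helper: fill in the documented default for every variable
# that is not already set, export them, and print the resolved values.
# Usage: fetchml_defaults [ENV_NAME]   (ENV_NAME defaults to "dev")
fetchml_defaults() {
  : "${CONFIG_DIR:=../configs}"
  : "${DATA_DIR:=./data/${1:-dev}}"   # assumption: <env> is the environment name
  : "${LOG_LEVEL:=info}"
  : "${REDIS_URL:=redis://redis:6379}"
  export CONFIG_DIR DATA_DIR LOG_LEVEL REDIS_URL
  printf 'CONFIG_DIR=%s\nDATA_DIR=%s\nLOG_LEVEL=%s\nREDIS_URL=%s\n' \
    "$CONFIG_DIR" "$DATA_DIR" "$LOG_LEVEL" "$REDIS_URL"
}
```

Values that are already set in the calling shell are left untouched, so `LOG_LEVEL=debug` set beforehand survives the call.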
Architecture Overview
┌─────────────────┐     ┌─────────────┐     ┌─────────────────┐
│   API Server    │────▶│    Redis    │◀────│    Scheduler    │
│  (with builtin  │     │    Queue    │     │ (in api-server) │
│   scheduler)    │     └─────────────┘     └─────────────────┘
└─────────────────┘            │                     │
         │                     │                     │
         │            ┌────────┴────────┐            │
         │            ▼                 ▼            │
         │       ┌─────────┐       ┌─────────┐       │
         └──────▶│ Worker 1│       │ Worker 2│       │
                 │ (Podman)│       │ (Podman)│       │
                 └─────────┘       └─────────┘       │
                      │                 │            │
                      └─────────────────┴────────────┘
                                 Heartbeats
The scheduler is built into the API server and manages multiple workers dynamically.
Configuration Structure
configs/
├── api/
│   ├── dev.yaml              # Development API config
│   ├── multi-user.yaml       # Production multi-worker
│   └── homelab-secure.yaml   # Homelab secure config
├── worker/
│   ├── docker-dev.yaml       # Development worker
│   ├── docker-prod.yaml      # Production worker
│   ├── docker-staging.yaml   # Staging worker
│   ├── docker-standard.yaml  # Standard compliance
│   └── homelab-secure.yaml   # Homelab secure worker
└── schema/
    └── *.yaml                # Validation schemas
Scheduler Configuration
The scheduler is configured in the API server config:
# configs/api/multi-user.yaml
resources:
  max_workers: 4               # Max concurrent workers
  desired_rps_per_worker: 3    # Target requests/sec per worker
scheduler:
  enabled: true
  strategy: "round-robin"      # round-robin, least-loaded, priority
  max_concurrent_jobs: 16      # Max jobs across all workers
  queue:
    type: "redis"
    redis_addr: "redis:6379"
  worker_discovery:
    mode: "dynamic"            # dynamic or static
    heartbeat_timeout: "30s"
    health_check_interval: "10s"
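The limits in this file interact with the per-worker limit in the worker config: 4 workers that each run up to 4 jobs give 4 × 4 = 16, which matches max_concurrent_jobs: 16 here. The sketch below only illustrates that arithmetic. It assumes the scheduler-wide limit caps total worker capacity (the guide does not state how the two limits are enforced), and the function name is hypothetical.

```shell
# effective_capacity MAX_WORKERS JOBS_PER_WORKER SCHEDULER_MAX
# Prints the smaller of (workers * jobs per worker) and the scheduler-wide limit.
effective_capacity() (
  worker_total=$(( $1 * $2 ))
  if [ "$worker_total" -lt "$3" ]; then
    echo "$worker_total"
  else
    echo "$3"
  fi
)

effective_capacity 4 4 16   # values from multi-user.yaml and docker-prod.yaml
```

Under the same assumption, scaling to more workers without raising the scheduler limit would not increase the number of jobs running at once.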
Scheduling Strategies
| Strategy | Description | Use Case |
|---|---|---|
| round-robin | Distribute evenly across workers | Balanced load |
| least-loaded | Send to worker with fewest jobs | Variable job sizes |
| priority | Respect job priorities first | Mixed priority workloads |
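To make the difference between the first two strategies concrete, here is a rough sketch of each selection rule. It illustrates the idea only and is not the scheduler's implementation: the function names and the NAME:JOBS input format are hypothetical. The priority strategy is left out because the guide does not describe how job priorities are encoded.

```shell
# pick_round_robin COUNTER WORKER...
# Prints the worker at position COUNTER modulo the number of workers.
# Needs at least one worker.
pick_round_robin() (
  counter=$1
  shift
  shift $(( counter % $# ))
  printf '%s\n' "$1"
)

# pick_least_loaded NAME:JOBS...
# Prints the name with the smallest job count; ties go to the first one listed.
# Names must not contain ':'.
pick_least_loaded() (
  printf '%s\n' "$@" | sort -s -t: -k2,2n | head -n 1 | cut -d: -f1
)

pick_round_robin 5 worker-1 worker-2 worker-3          # 5 % 3 = 2, third worker
pick_least_loaded worker-1:3 worker-2:1 worker-3:2     # worker-2 has the fewest jobs
```

Round-robin ignores how busy each worker is, which is why least-loaded tends to suit workloads where job durations vary widely.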
Worker Configuration
Workers connect to the scheduler through the Redis queue:
# configs/worker/docker-prod.yaml
backend:
  type: "redis"
  redis:
    addr: "redis:6379"
    password: ""                 # Set via REDIS_PASSWORD env var
    db: 0
worker:
  id: "${FETCHML_WORKER_ID}"     # Unique worker ID
  mode: "distributed"            # Uses scheduler via Redis
  heartbeat_interval: "10s"
  max_concurrent_jobs: 4         # Jobs this worker can run
sandbox:
  type: "podman"
  podman:
    socket: "/run/podman/podman.sock"
    cpus: "2"
    memory: "4Gi"
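The worker's heartbeat_interval of 10s pairs with the scheduler's heartbeat_timeout of 30s, so a worker can miss two consecutive heartbeats before the timeout is reached. A minimal sketch of that check follows, assuming a worker is treated as lost once its last heartbeat is older than the timeout. The guide does not spell out the exact rule, and the function name is hypothetical.

```shell
# worker_is_stale LAST_HEARTBEAT NOW TIMEOUT   (all values in seconds)
# Succeeds (exit status 0) when the last heartbeat is older than TIMEOUT.
worker_is_stale() {
  [ $(( $2 - $1 )) -gt "$3" ]
}

# Last heartbeat at t=100, checked at t=120 with a 30s timeout: still alive.
if worker_is_stale 100 120 30; then echo "stale"; else echo "alive"; fi
```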
Scaling Workers
Docker Compose (Recommended)
# Development - 2 workers by default
docker-compose -f deployments/docker-compose.dev.yml up -d
# Scale to 4 workers
docker-compose -f deployments/docker-compose.dev.yml up -d --scale worker=4
# Scale down to 1 worker
docker-compose -f deployments/docker-compose.dev.yml up -d --scale worker=1
Kubernetes / Manual Deployment
# Each worker needs unique ID
export FETCHML_WORKER_ID="worker-$(hostname)-$(date +%s)"
./worker -config configs/worker/docker-prod.yaml
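An ID built from hostname and `date +%s` alone can repeat when several workers start on the same host within the same second. One possible way around that, shown as a sketch with a hypothetical `make_worker_id` helper, is to add a per-process index:

```shell
# make_worker_id INDEX
# Prints worker-<hostname>-<epoch seconds>-<index>; the index keeps IDs
# distinct for workers launched from the same host in the same second.
make_worker_id() {
  printf 'worker-%s-%s-%s\n' "$(hostname)" "$(date +%s)" "$1"
}

# Example launch loop (commented out so the snippet has no side effects):
# for i in 1 2; do
#   FETCHML_WORKER_ID="$(make_worker_id "$i")" \
#     ./worker -config configs/worker/docker-prod.yaml &
# done
```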
Environment-Specific Setups
Development (docker-compose.dev.yml)
- 2 workers by default
- Redis for queue
- Local MinIO for storage
- Caddy reverse proxy
make dev-up # Start with 2 workers
make dev-up SCALE=4 # Start with 4 workers
Production (docker-compose.prod.yml)
- 4 workers configured
- Redis cluster recommended
- External MinIO/S3
- Health checks enabled
CONFIG_DIR=./configs DATA_DIR=/var/lib/fetchml \
docker-compose -f deployments/docker-compose.prod.yml up -d
Staging (docker-compose.staging.yml)
- 2 workers
- Audit logging enabled
- Same setup as production, at a smaller scale
Monitoring
Check Worker Status
# Via API
curl http://localhost:9101/api/v1/workers
# Via Redis
redis-cli LRANGE fetchml:workers 0 -1
redis-cli HGETALL fetchml:worker:status
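After starting or scaling the stack, the API may need a few seconds before it answers. A small retry wrapper, sketched here with a hypothetical `retry` helper, can poll the status endpoint until it responds. It only confirms that the endpoint returns a successful HTTP status; it does not inspect the worker list, since the response format is not documented in this guide.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND...
# Runs COMMAND until it succeeds, at most ATTEMPTS times, sleeping
# DELAY_SECONDS between tries. Exit status 0 on success, 1 otherwise.
retry() (
  attempts=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      exit 1
    fi
    n=$(( n + 1 ))
    sleep "$delay"
  done
)

# Example (commented out because it needs a running API server):
# retry 10 3 curl -fsS -o /dev/null http://localhost:9101/api/v1/workers
```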
View Logs
# All workers
docker-compose -f deployments/docker-compose.dev.yml logs worker
# Specific worker (by container name)
docker logs ml-experiments-worker-1
docker logs ml-experiments-worker-2
Troubleshooting
Workers Not Registering
- Check Redis connection: redis-cli ping
- Verify worker config has mode: distributed
- Check API server scheduler is enabled
- Review worker logs: docker logs <worker-container>
Jobs Stuck in Queue
- Check worker capacity: max_concurrent_jobs not exceeded
- Verify workers are healthy: docker ps
- Check Redis queue length: redis-cli LLEN fetchml:queue:pending
Worker ID Collisions
Ensure FETCHML_WORKER_ID is unique per worker instance:
environment:
  - FETCHML_WORKER_ID=${HOSTNAME}-${COMPOSE_PROJECT_NAME}-${RANDOM}
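Compose substitutes `${...}` from the environment of the docker-compose process, so shell variables that are not exported may expand to empty strings and leave several workers with the same ID. A quick way to spot that is to look for repeated entries among the registered IDs. The sketch below assumes the fetchml:workers list holds one worker ID per entry, which this guide does not confirm, and `duplicate_ids` is a hypothetical helper.

```shell
# duplicate_ids
# Reads one worker ID per line on stdin and prints each ID that occurs
# more than once. Prints nothing when all IDs are unique.
duplicate_ids() {
  sort | uniq -d
}

# Example (commented out because it needs a reachable Redis):
# redis-cli LRANGE fetchml:workers 0 -1 | duplicate_ids
```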
Security Notes
- Workers run in privileged mode so they can launch Podman containers
- Redis should be firewalled (not exposed publicly in prod)
- Worker-to-scheduler communication is via Redis only
- No direct API-to-worker connections required
See Also
- deployments/README.md - Deployment environments
- docs/src/deployment.md - Full deployment guide
- docs/src/cicd.md - CI/CD workflows