fetch_ml/configs
Jeremie Fraeys 7cd86fb88a
2026-03-04 13:24:27 -05:00

fetch_ml Configuration Guide

Quick Start

# Development with 2 workers
cd deployments
CONFIG_DIR=../configs docker-compose -f docker-compose.dev.yml up -d

# Scale to 4 workers
docker-compose -f docker-compose.dev.yml up -d --scale worker=4

# Production with scheduler
CONFIG_DIR=../configs docker-compose -f docker-compose.prod.yml up -d

Key Environment Variables

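| Variable   | Description              | Default            |
|------------|--------------------------|--------------------|
| CONFIG_DIR | Path to config directory | ../configs         |
| DATA_DIR   | Path to data directory   | ./data/<env>       |
| LOG_LEVEL  | Logging level            | info               |
| REDIS_URL  | Redis connection URL     | redis://redis:6379 |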
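
As a rough illustration of how these defaults might be applied, the sketch below resolves the four variables from a mapping such as os.environ. The function name resolve_settings and the env_name parameter (used to fill the <env> placeholder) are assumptions made for this example and are not part of fetch_ml.

```python
import os

# Defaults taken from the table above; "{env}" stands in for "<env>".
DEFAULTS = {
    "CONFIG_DIR": "../configs",
    "DATA_DIR": "./data/{env}",
    "LOG_LEVEL": "info",
    "REDIS_URL": "redis://redis:6379",
}


def resolve_settings(environ, env_name="dev"):
    """Return the effective settings: explicit values win, otherwise the default."""
    settings = {}
    for key, default in DEFAULTS.items():
        value = environ.get(key)
        # Treat an unset or empty variable as "use the default" (an assumption).
        settings[key] = value if value else default.format(env=env_name)
    return settings


if __name__ == "__main__":
    for key, value in resolve_settings(os.environ).items():
        print(f"{key}={value}")
```

Passing a plain dictionary instead of os.environ makes it easy to check what a given compose invocation would see.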

Architecture Overview

┌─────────────────┐     ┌─────────────┐     ┌─────────────────┐
│   API Server    │────▶│   Redis     │◀────│   Scheduler     │
│  (with builtin  │     │   Queue     │     │  (in api-server)│
│   scheduler)    │     └─────────────┘     └─────────────────┘
└─────────────────┘            │                    │
         │                     │                    │
         │            ┌────────┴────────┐          │
         │            ▼                 ▼          │
         │     ┌─────────┐        ┌─────────┐      │
         └────▶│ Worker 1│        │ Worker 2│      │
               │ (Podman)│        │ (Podman)│      │
               └─────────┘        └─────────┘      │
                     │                  │          │
                     └──────────────────┴──────────┘
                              Heartbeats

The scheduler is built into the API server and manages multiple workers dynamically.

Configuration Structure

configs/
├── api/
│   ├── dev.yaml              # Development API config
│   ├── multi-user.yaml       # Production multi-worker
│   └── homelab-secure.yaml   # Homelab secure config
├── worker/
│   ├── docker-dev.yaml       # Development worker
│   ├── docker-prod.yaml      # Production worker
│   ├── docker-staging.yaml   # Staging worker
│   ├── docker-standard.yaml  # Standard compliance
│   └── homelab-secure.yaml   # Homelab secure worker
└── schema/
    └── *.yaml                # Validation schemas

Scheduler Configuration

The scheduler is configured in the API server config:

# configs/api/multi-user.yaml
resources:
  max_workers: 4              # Max concurrent workers
  desired_rps_per_worker: 3   # Target requests/sec per worker

scheduler:
  enabled: true
  strategy: "round-robin"     # round-robin, least-loaded, priority
  max_concurrent_jobs: 16     # Max jobs across all workers
  queue:
    type: "redis"
    redis_addr: "redis:6379"
  worker_discovery:
    mode: "dynamic"           # dynamic or static
    heartbeat_timeout: "30s"
    health_check_interval: "10s"
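
The source does not describe how the scheduler evaluates heartbeats, so the following is only a sketch of one plausible reading of heartbeat_timeout: a worker counts as live while its most recent heartbeat is no older than the timeout. The helper names parse_duration and live_workers are hypothetical.

```python
import re

# Seconds per unit for the simple "<number><unit>" durations used in the config.
_UNITS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}


def parse_duration(text):
    """Convert a simple duration such as "30s" or "10s" to seconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h)", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def live_workers(last_heartbeat, now, heartbeat_timeout="30s"):
    """Return the IDs whose most recent heartbeat is within the timeout.

    last_heartbeat maps a worker ID to the time (in seconds) it was last seen.
    """
    timeout = parse_duration(heartbeat_timeout)
    return sorted(
        worker_id
        for worker_id, seen_at in last_heartbeat.items()
        if now - seen_at <= timeout
    )
```

Under this reading, a worker that sends a heartbeat every 10 seconds can miss a couple of beats before a 30-second timeout removes it from the pool.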

Scheduling Strategies

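| Strategy     | Description                      | Use Case                 |
|--------------|----------------------------------|--------------------------|
| round-robin  | Distribute evenly across workers | Balanced load            |
| least-loaded | Send to worker with fewest jobs  | Variable job sizes       |
| priority     | Respect job priorities first     | Mixed priority workloads |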
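
To make the difference between the three strategies concrete, here is a minimal sketch of each selection rule. It is not the scheduler's actual code: the Worker class, the pick method, and the convention that a larger number means a higher priority are all assumptions for illustration.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Worker:
    worker_id: str
    active_jobs: int = 0


class RoundRobin:
    """Cycle through the workers in order, ignoring current load."""

    def __init__(self):
        self._next = count()

    def pick(self, workers):
        return workers[next(self._next) % len(workers)]


class LeastLoaded:
    """Choose the worker currently running the fewest jobs."""

    def pick(self, workers):
        return min(workers, key=lambda w: w.active_jobs)


def order_by_priority(jobs):
    """Sort (priority, name) pairs so higher priorities are dispatched first.

    sorted() is stable, so jobs with equal priority keep their arrival order.
    """
    return sorted(jobs, key=lambda job: -job[0])
```

Round-robin needs no knowledge of worker state, which is why it suits uniform jobs; least-loaded needs a current job count per worker, which pays off when job sizes vary.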

Worker Configuration

Workers connect to the scheduler through the Redis queue:

# configs/worker/docker-prod.yaml
backend:
  type: "redis"
  redis:
    addr: "redis:6379"
    password: ""           # Set via REDIS_PASSWORD env var
    db: 0

worker:
  id: "${FETCHML_WORKER_ID}"  # Unique worker ID
  mode: "distributed"         # Uses scheduler via Redis
  heartbeat_interval: "10s"
  max_concurrent_jobs: 4      # Jobs this worker can run

sandbox:
  type: "podman"
  podman:
    socket: "/run/podman/podman.sock"
    cpus: "2"
    memory: "4Gi"
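
The per-worker max_concurrent_jobs shown here and the scheduler limits in configs/api/multi-user.yaml are related by simple arithmetic: 4 workers with 4 job slots each give 16 slots, which matches the scheduler's max_concurrent_jobs of 16. The sketch below expresses that relationship; the function name is hypothetical, and whether fetch_ml enforces the limits in exactly this way is an assumption.

```python
def effective_job_limit(max_workers, jobs_per_worker, scheduler_max_jobs):
    """Return the number of jobs that can run at once.

    The workers can never run more than max_workers * jobs_per_worker jobs,
    and the scheduler's own cap applies on top of that.
    """
    worker_capacity = max_workers * jobs_per_worker
    return min(worker_capacity, scheduler_max_jobs)


if __name__ == "__main__":
    # Values from the two example configs: 4 workers, 4 jobs each, cap of 16.
    print(effective_job_limit(4, 4, 16))
```

If the two numbers disagree, the smaller one is the real ceiling, so raising only one of them has no effect on throughput.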

Scaling Workers

# Development - 2 workers by default
docker-compose -f deployments/docker-compose.dev.yml up -d

# Scale to 4 workers
docker-compose -f deployments/docker-compose.dev.yml up -d --scale worker=4

# Scale down to 1 worker
docker-compose -f deployments/docker-compose.dev.yml up -d --scale worker=1

Kubernetes / Manual Deployment

# Each worker needs a unique ID
export FETCHML_WORKER_ID="worker-$(hostname)-$(date +%s)"
./worker -config configs/worker/docker-prod.yaml

Environment-Specific Setups

Development (docker-compose.dev.yml)

  • 2 workers by default
  • Redis for queue
  • Local MinIO for storage
  • Caddy reverse proxy

make dev-up          # Start with 2 workers
make dev-up SCALE=4  # Start with 4 workers

Production (docker-compose.prod.yml)

  • 4 workers configured
  • Redis cluster recommended
  • External MinIO/S3
  • Health checks enabled

CONFIG_DIR=./configs DATA_DIR=/var/lib/fetchml \
  docker-compose -f deployments/docker-compose.prod.yml up -d

Staging (docker-compose.staging.yml)

  • 2 workers
  • Audit logging enabled
  • Same setup as production, at a smaller scale

Monitoring

Check Worker Status

# Via API
curl http://localhost:9101/api/v1/workers

# Via Redis
redis-cli LRANGE fetchml:workers 0 -1
redis-cli HGETALL fetchml:worker:status

View Logs

# All workers
docker-compose -f deployments/docker-compose.dev.yml logs worker

# Specific worker (by container name)
docker logs ml-experiments-worker-1
docker logs ml-experiments-worker-2

Troubleshooting

Workers Not Registering

  1. Check the Redis connection: redis-cli ping
  2. Verify that the worker config sets mode: distributed
  3. Check that the scheduler is enabled in the API server config (scheduler.enabled: true)
  4. Review the worker logs: docker logs <worker-container>

Jobs Stuck in Queue

  1. Check worker capacity: confirm that workers have not already reached max_concurrent_jobs
  2. Verify that workers are healthy: docker ps
  3. Check the Redis queue length: redis-cli LLEN fetchml:queue:pending
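
The three checks above can be combined into a small triage helper. This is only a sketch: the function name, its inputs, and the returned messages are hypothetical, and the numbers would have to be gathered by hand from redis-cli LLEN and docker ps.

```python
def diagnose_queue(pending_jobs, workers, per_worker_limit=4):
    """Suggest which check most likely explains a stuck queue.

    workers maps a worker ID to a (healthy, active_jobs) pair.
    per_worker_limit mirrors max_concurrent_jobs from the worker config.
    """
    if pending_jobs == 0:
        return "queue is empty"
    healthy = {wid: jobs for wid, (ok, jobs) in workers.items() if ok}
    if not healthy:
        return "no healthy workers"
    free_slots = sum(max(per_worker_limit - jobs, 0) for jobs in healthy.values())
    if free_slots == 0:
        return "all healthy workers are at max_concurrent_jobs"
    return "workers have free slots; check scheduler and worker logs"
```

For example, five pending jobs with two healthy workers each running four jobs points to capacity rather than to a connectivity problem.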

Worker ID Collisions

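Ensure FETCHML_WORKER_ID is unique per worker instance. Compose substitutes ${...} from the environment of the shell that runs it, and Bash does not export RANDOM (or, on many systems, HOSTNAME), so export those values before running docker-compose. The value is resolved once per run, so replicas created with --scale share it: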

environment:
  - FETCHML_WORKER_ID=${HOSTNAME}-${COMPOSE_PROJECT_NAME}-${RANDOM}
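
An alternative is to generate the ID in a small launcher. The sketch below follows the worker-<hostname>-<timestamp> pattern from the manual deployment example and appends a random suffix, so that two workers started on the same host within the same second still differ. The function name make_worker_id and the suffix length are assumptions, not something fetch_ml provides.

```python
import socket
import time
import uuid


def make_worker_id(hostname=None, started_at=None):
    """Build an ID like worker-<hostname>-<unix time>-<random suffix>."""
    hostname = hostname or socket.gethostname()
    started_at = int(time.time()) if started_at is None else int(started_at)
    # uuid4 adds randomness so that two workers started on the same host
    # within the same second still receive different IDs.
    suffix = uuid.uuid4().hex[:8]
    return f"worker-{hostname}-{started_at}-{suffix}"


if __name__ == "__main__":
    # Print the ID so a wrapper script can export it as FETCHML_WORKER_ID.
    print(make_worker_id())
```

The printed value can then be exported as FETCHML_WORKER_ID before the worker binary starts.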

Security Notes

  • Workers run in privileged mode so that they can launch Podman containers
  • Redis should be firewalled (not exposed publicly in prod)
  • Worker-to-scheduler communication is via Redis only
  • No direct API-to-worker connections required

See Also

  • deployments/README.md - Deployment environments
  • docs/src/deployment.md - Full deployment guide
  • docs/src/cicd.md - CI/CD workflows