fetch_ml/configs/SECURITY.md
Jeremie Fraeys 86f9ae5a7e
docs(config): reorganize configuration structure and add documentation
Restructure configuration files for better organization:
- Add scheduler configuration examples (scheduler.yaml.example)
- Reorganize worker configs into subdirectories:
  - distributed/ - Multi-node cluster configurations
  - standalone/ - Single-node deployment configs
- Add environment-specific configs:
  - dev-local.yaml, docker-dev.yaml, docker-prod.yaml
  - homelab-secure.yaml, worker-prod.toml
- Add deployment configs for different security modes:
  - docker-standard.yaml, docker-hipaa.yaml, docker-dev.yaml

Add documentation:
- configs/README.md with configuration guidelines
- configs/SECURITY.md with security configuration best practices
2026-02-26 12:04:11 -05:00


Security Guidelines for fetch_ml Distributed Mode

Token Management

# 1. Generate config with tokens
scheduler -init -config scheduler.yaml

# 2. Or generate a single token
scheduler -generate-token

Generating Tokens

Option 1: Initialize full config (recommended)

# Generate config with the default 3 worker tokens
scheduler -init -config /etc/fetch_ml/scheduler.yaml

# Generate config with a different number of worker tokens (here, 5)
scheduler -init -config /etc/fetch_ml/scheduler.yaml -tokens 5

Option 2: Generate single token

# Generate one token
scheduler -generate-token
# Output: wkr_abc123...

Option 3: Using OpenSSL

openssl rand -hex 32

Token Storage

  • NEVER commit tokens to git. Config files that contain real tokens are gitignored.
  • Store tokens in environment variables or a secrets manager.
  • Use .env files for local development (already gitignored).
  • Rotate tokens periodically, and immediately if one may have been exposed.
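
As a minimal sketch of the local .env approach, the helper below writes a token to a file that only its owner can read. The function name write_token_env and the variable name FETCH_ML_WORKER_TOKEN are hypothetical; check how your worker actually reads its token before relying on them.

```shell
# Hypothetical helper: store a token in an env file that only the owner can read.
# FETCH_ML_WORKER_TOKEN is an assumed variable name, not one confirmed by fetch_ml.
write_token_env() {
    env_file="$1"
    token="$2"
    # The subshell keeps the stricter umask from leaking into the caller.
    (
        umask 077
        printf 'FETCH_ML_WORKER_TOKEN=%s\n' "$token" > "$env_file"
    )
    # Tighten permissions in case the file already existed with a looser mode.
    chmod 600 "$env_file"
}
```

For example, write_token_env .env "$(openssl rand -hex 32)" creates a .env file with mode 600 in the current directory.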

Config File Security

configs/
├── scheduler/scheduler.yaml          # ⛔ NEVER commit with real tokens
├── scheduler/scheduler.yaml.example  # ✅ Safe to commit (placeholders)
└── worker/distributed/worker.yaml    # ⛔ NEVER commit with real tokens

All *.yaml files in configs/ subdirectories are gitignored by default.
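
Gitignore rules only help while files stay under the ignored paths, so it can be worth scanning anything you are about to commit or share. The sketch below lists files that contain token-like strings. It assumes real tokens start with wkr_ (as in the sample output in this document) followed by at least 16 alphanumeric characters; that length is a guess, and tokens made with openssl rand carry no prefix and will not be matched.

```shell
# Hypothetical helper: print the names of files that contain token-like strings.
# Assumption: tokens look like "wkr_" plus 16 or more alphanumeric characters.
find_token_files() {
    # -r recurse into directories, -l print file names only, -E extended regex
    grep -rlE -- 'wkr_[A-Za-z0-9]{16,}' "$@" 2>/dev/null
}
```

The function accepts files or directories. It prints nothing and returns a non-zero status when no match is found, so it can gate a commit hook: find_token_files path/to/export && echo "possible token found".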

Distribution Workflow

# On scheduler host:
$ scheduler -init -config /etc/fetch_ml/scheduler.yaml
Config generated: /etc/fetch_ml/scheduler.yaml

Generated 3 worker tokens. Copy the appropriate token to each worker's config.

=== Generated Worker Tokens ===
Copy these to your worker configs:

Worker: worker-01
Token:  wkr_abc123...

Worker: worker-02
Token:  wkr_def456...

# (entry for the third worker omitted for brevity)

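# On each worker host, write the config with that worker's token:
$ cat > /etc/fetch_ml/worker.yaml <<'EOF'
scheduler:
  address: "scheduler-host:7777"
  cert: "/etc/fetch_ml/scheduler.crt"
  token: "wkr_abc123..."  # Copy from above
EOF
# The file now holds a secret; restrict it to the owning account
$ chmod 600 /etc/fetch_ml/worker.yaml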

TLS Configuration

Self-Signed Certs (Development)

scheduler:
  auto_generate_certs: true
  cert_file: "/etc/fetch_ml/scheduler.crt"
  key_file: "/etc/fetch_ml/scheduler.key"

Auto-generated certs are for development only. The scheduler prints the cert path on first run; distribute that certificate (never the private key) to workers over a secure channel.
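
Because workers pin the scheduler certificate, it helps to confirm that the copy on each worker matches the original. A minimal sketch using the standard openssl x509 fingerprint output follows; the function names are hypothetical and not part of fetch_ml.

```shell
# Print the SHA-256 fingerprint of a PEM certificate (empty output on error).
cert_fingerprint() {
    openssl x509 -in "$1" -noout -fingerprint -sha256 2>/dev/null | cut -d= -f2
}

# Succeed only if both files are readable certificates with the same fingerprint.
certs_match() {
    fp_a="$(cert_fingerprint "$1")"
    fp_b="$(cert_fingerprint "$2")"
    [ -n "$fp_a" ] && [ "$fp_a" = "$fp_b" ]
}
```

Run cert_fingerprint /etc/fetch_ml/scheduler.crt on the scheduler and on each worker, then compare the two values over a channel you trust.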

Production TLS

Use proper certificates from your CA:

scheduler:
  auto_generate_certs: false
  cert_file: "/etc/ssl/certs/fetch_ml.crt"
  key_file: "/etc/ssl/private/fetch_ml.key"
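
The private key should be readable only by the account that runs the scheduler. The check below is a sketch of one way to verify that before starting the service; key_is_private is a hypothetical helper name, and it only inspects the permission bits reported by ls.

```shell
# Succeed only if the file exists and grants no access to group or others.
key_is_private() {
    [ -f "$1" ] || return 2
    # Columns 5-10 of the mode string are the group and other permission bits.
    [ "$(ls -ld "$1" | cut -c5-10)" = "------" ]
}
```

For example, key_is_private /etc/ssl/private/fetch_ml.key || echo "key permissions too open". Certificate expiry can be checked separately with openssl x509 -checkend.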

Network Security

  • The scheduler binds to 0.0.0.0:7777 (all interfaces) by default; restrict the port with firewall rules so that only worker hosts can reach it
  • WebSocket connections use WSS with cert pinning (no CA chain required)
  • Every WebSocket connection is authenticated with a token
  • The metrics endpoint (/metrics) has no authentication; bind it to localhost or put it behind a proxy that enforces auth
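
A deployment script can flag wildcard bind addresses so that the firewall step is not forgotten. The sketch below is a hypothetical lint, not part of fetch_ml; it treats 0.0.0.0, [::], * and an empty host as "all interfaces", which is an assumption about the address forms your config may contain.

```shell
# Succeed if a host:port bind address listens on all interfaces.
is_wildcard_bind() {
    # Strip the trailing ":port" to leave only the host part.
    case "${1%:*}" in
        0.0.0.0|'[::]'|'*'|'') return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, is_wildcard_bind "0.0.0.0:7777" && echo "firewall rules required" prints the reminder, while a loopback address such as 127.0.0.1:7777 does not.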

Audit Logging

Enable audit logging to track job lifecycle:

scheduler:
  audit_log: "/var/log/fetch_ml/audit.log"
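
Audit logs can reveal job and worker details, so the log location deserves restrictive permissions. The sketch below prepares the directory and file ahead of the first run; prepare_audit_log is a hypothetical helper, and whether the scheduler creates the path on its own is not covered here.

```shell
# Create the audit log and its directory with owner/group-only access.
prepare_audit_log() {
    log_file="$1"
    log_dir="$(dirname "$log_file")"
    mkdir -p "$log_dir" || return 1
    chmod 750 "$log_dir"
    # ">>" creates the file if missing without truncating existing entries.
    ( umask 027; : >> "$log_file" ) || return 1
    chmod 640 "$log_file"
}
```

For example, prepare_audit_log /var/log/fetch_ml/audit.log, run as the account that owns the scheduler process.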

Security Checklist

  • Tokens generated via scheduler -init or scheduler -generate-token
  • Config files with tokens NOT in git
  • TLS certs distributed securely to workers
  • Scheduler bind address firewalled
  • Metrics endpoint protected (if exposed)
  • Audit logging enabled