Restructure configuration files for better organization:
- Add scheduler configuration examples (scheduler.yaml.example)
- Reorganize worker configs into subdirectories:
  - distributed/ - Multi-node cluster configurations
  - standalone/ - Single-node deployment configs
- Add environment-specific configs:
  - dev-local.yaml, docker-dev.yaml, docker-prod.yaml
  - homelab-secure.yaml, worker-prod.toml
- Add deployment configs for different security modes:
  - docker-standard.yaml, docker-hipaa.yaml, docker-dev.yaml

Add documentation:
- configs/README.md with configuration guidelines
- configs/SECURITY.md with security configuration best practices
Security Guidelines for fetch_ml Distributed Mode
Token Management
Quick Start (Recommended)
# 1. Generate config with tokens
scheduler -init -config scheduler.yaml
# 2. Or generate a single token
scheduler -generate-token
Generating Tokens
Option 1: Initialize full config (recommended)
# Generate config with 3 worker tokens
scheduler -init -config /etc/fetch_ml/scheduler.yaml
# Generate with more tokens
scheduler -init -config /etc/fetch_ml/scheduler.yaml -tokens 5
Option 2: Generate single token
# Generate one token
scheduler -generate-token
# Output: wkr_abc123...
Option 3: Using OpenSSL
openssl rand -hex 32
Token Storage
- NEVER commit tokens to git — config files with real tokens are gitignored
- Store tokens in environment variables or secure secret management
- Use .env files locally (already gitignored)
- Rotate tokens periodically
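One way to keep a token out of the config tree is to write it to a local .env file that only its owner can read. The sketch below assumes a variable name, FETCH_ML_WORKER_TOKEN, that does not come from fetch_ml itself. How the worker actually picks up a token from the environment is not specified here, so treat both helpers as illustrative.

```shell
# write_token_env FILE TOKEN
# Stores TOKEN in FILE as FETCH_ML_WORKER_TOKEN=... (hypothetical variable name)
# and makes sure only the owner can read or write the file.
write_token_env() {
  file="$1"
  token="$2"
  # Create the file under a restrictive umask so it is never world-readable.
  ( umask 077; printf 'FETCH_ML_WORKER_TOKEN=%s\n' "$token" > "$file" )
  # Tighten the mode as well, in case the file already existed.
  chmod 600 "$file"
}

# load_token_env FILE
# Prints the stored token without sourcing (executing) the file.
load_token_env() {
  sed -n 's/^FETCH_ML_WORKER_TOKEN=//p' "$1"
}
```

For example, `write_token_env .env "$(openssl rand -hex 32)"` stores a fresh random token, and `load_token_env .env` prints it back for use in a deployment step.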
Config File Security
configs/
├── scheduler/scheduler.yaml # ⛔ NEVER commit with real tokens
├── scheduler/scheduler.yaml.example # ✅ Safe to commit (placeholders)
└── worker/distributed/worker.yaml # ⛔ NEVER commit with real tokens
All *.yaml files in configs/ subdirectories are gitignored by default.
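The gitignore rule can be backed up by a check that looks for token-like strings before a commit. This is a minimal sketch, not part of fetch_ml: the pattern assumes tokens start with wkr_ followed by at least 16 alphanumeric characters, which is a guess based on the truncated examples in this document, and both function names are hypothetical.

```shell
# Assumed token shape: "wkr_" plus 16 or more alphanumeric characters.
# Adjust this to the real token format.
TOKEN_PATTERN='wkr_[A-Za-z0-9]{16,}'

# contains_token FILE
# Succeeds (exit 0) if FILE contains a token-like string.
contains_token() {
  grep -Eq "$TOKEN_PATTERN" "$1"
}

# check_index
# Fails (exit 1) if any file in the git index contains a token-like string.
# Intended to be called from a pre-commit hook.
check_index() {
  if git grep --cached -l -E "$TOKEN_PATTERN"; then
    echo "possible worker token in the files listed above" >&2
    return 1
  fi
  return 0
}
```

Placeholder values such as those in scheduler.yaml.example pass the check as long as they are shorter than the assumed token length or contain non-alphanumeric characters.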
Distribution Workflow
# On scheduler host:
$ scheduler -init -config /etc/fetch_ml/scheduler.yaml
Config generated: /etc/fetch_ml/scheduler.yaml
Generated 3 worker tokens. Copy the appropriate token to each worker's config.
=== Generated Worker Tokens ===
Copy these to your worker configs:
Worker: worker-01
Token: wkr_abc123...
Worker: worker-02
Token: wkr_def456...
# On each worker host - copy the appropriate token:
$ cat > /etc/fetch_ml/worker.yaml <<EOF
scheduler:
  address: "scheduler-host:7777"
  cert: "/etc/fetch_ml/scheduler.crt"
  token: "wkr_abc123..." # Copy from above
EOF
TLS Configuration
Self-Signed Certs (Development)
scheduler:
  auto_generate_certs: true
  cert_file: "/etc/fetch_ml/scheduler.crt"
  key_file: "/etc/fetch_ml/scheduler.key"
Auto-generated certs are for development only. The scheduler prints the cert path on first run — distribute this to workers securely.
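Because workers pin the scheduler cert, it helps to confirm that the copy on each worker matches the original. A possible check, assuming the openssl CLI is installed on both hosts, is to compare SHA-256 fingerprints over a trusted channel. The function names are hypothetical.

```shell
# cert_fingerprint CERT
# Prints the SHA-256 fingerprint of a PEM certificate (colon-separated hex).
cert_fingerprint() {
  openssl x509 -in "$1" -noout -fingerprint -sha256 | cut -d= -f2
}

# verify_cert CERT EXPECTED
# Succeeds only if CERT has exactly the fingerprint EXPECTED.
verify_cert() {
  [ "$(cert_fingerprint "$1")" = "$2" ]
}
```

On the scheduler host, run `cert_fingerprint /etc/fetch_ml/scheduler.crt` and pass the result to `verify_cert` on each worker after copying the file.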
Production TLS
Use proper certificates from your CA:
scheduler:
  auto_generate_certs: false
  cert_file: "/etc/ssl/certs/fetch_ml.crt"
  key_file: "/etc/ssl/private/fetch_ml.key"
Network Security
- Scheduler bind address defaults to 0.0.0.0:7777, so firewall it appropriately
- WebSocket connections use WSS with cert pinning (no CA chain required)
- Token authentication on every WebSocket connection
- Metrics endpoint (/metrics) has no auth; bind to localhost or add proxy auth
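As one illustration of firewalling the scheduler port, the sketch below prints ufw rules that admit only a worker subnet. It assumes ufw is the host firewall, and the subnet in the usage example is a placeholder. The helper is hypothetical and only prints commands, so they can be reviewed before being run as root.

```shell
# scheduler_firewall_rules SUBNET [PORT]
# Prints ufw commands that allow the scheduler port only from SUBNET.
# Nothing is applied by this function.
scheduler_firewall_rules() {
  subnet="$1"
  port="${2:-7777}"
  # ufw evaluates rules in order, so the allow rule must come before the deny.
  printf 'ufw allow from %s to any port %s proto tcp\n' "$subnet" "$port"
  # The explicit deny matters when the default incoming policy is "allow".
  printf 'ufw deny %s/tcp\n' "$port"
}
```

For example, `scheduler_firewall_rules 10.0.0.0/24` prints two rules for the default port 7777. Equivalent rules can be written for iptables, nftables, or a cloud security group.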
Audit Logging
Enable audit logging to track job lifecycle:
scheduler:
  audit_log: "/var/log/fetch_ml/audit.log"
Security Checklist
- Tokens generated via scheduler -init or scheduler -generate-token
- Config files with tokens NOT in git
- TLS certs distributed securely to workers
- Scheduler bind address firewalled
- Metrics endpoint protected (if exposed)
- Audit logging enabled