# Security Guidelines for fetch_ml Distributed Mode

## Token Management

### Quick Start (Recommended)

```bash
# 1. Generate config with tokens
scheduler -init -config scheduler.yaml

# 2. Or generate a single token
scheduler -generate-token
```

### Generating Tokens

**Option 1: Initialize full config (recommended)**
```bash
# Generate a config with 3 worker tokens (the default)
scheduler -init -config /etc/fetch_ml/scheduler.yaml

# Generate a config with 5 worker tokens
scheduler -init -config /etc/fetch_ml/scheduler.yaml -tokens 5
```

**Option 2: Generate single token**
```bash
# Generate one token
scheduler -generate-token
# Output: wkr_abc123...
```

**Option 3: Using OpenSSL**
```bash
openssl rand -hex 32
```
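
Option 3 produces 32 random bytes printed as 64 hex characters. As a minimal sketch, the hypothetical helper `gen_token` below wraps that command and falls back to `/dev/urandom` when `openssl` is not installed. This document does not say whether the scheduler expects a `wkr_` prefix on manually generated tokens, so the helper emits only the raw hex string.

```bash
#!/usr/bin/env bash
# gen_token is a hypothetical helper, not part of the scheduler CLI.
# It prints 32 random bytes as 64 lowercase hex characters.
gen_token() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex 32
  else
    # Fallback: read 32 bytes from the kernel CSPRNG and hex-encode them.
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
    echo
  fi
}
```

Calling `token="$(gen_token)"` captures the value without echoing it to the terminal.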

### Token Storage

- **NEVER commit tokens to git.** Config files with real tokens are gitignored
- Store tokens in environment variables or a secrets manager
- Use `.env` files locally (already gitignored)
- Rotate tokens periodically
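
One way to follow the `.env` advice above is to write the file with owner-only permissions and export its contents into the shell that launches the worker. The sketch below assumes a variable name, `FETCH_ML_WORKER_TOKEN`, chosen purely for illustration; this document does not state which environment variables, if any, fetch_ml reads.

```bash
#!/usr/bin/env bash
# write_env_file PATH TOKEN
# Hypothetical helper: stores a token in a .env-style file readable only by
# its owner. FETCH_ML_WORKER_TOKEN is an illustrative variable name.
write_env_file() {
  local path="$1" token="$2"
  (
    umask 077   # new files are created without group/other permissions
    printf 'FETCH_ML_WORKER_TOKEN=%s\n' "$token" > "$path"
  )
  chmod 600 "$path"   # also tighten a file that already existed
}

# load_env_file PATH
# Exports every variable assigned in the file into the current shell.
load_env_file() {
  set -a        # auto-export variables assigned while sourcing
  . "$1"
  set +a
}
```

Pass `load_env_file` an absolute or `./`-prefixed path, since `.` searches `PATH` for bare file names.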

### Config File Security

```
configs/
├── scheduler/scheduler.yaml          # ⛔ NEVER commit with real tokens
├── scheduler/scheduler.yaml.example  # ✅ Safe to commit (placeholders)
└── worker/distributed/worker.yaml    # ⛔ NEVER commit with real tokens
```

All `*.yaml` files in `configs/` subdirectories are gitignored by default.
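
Gitignore rules only protect files that match them, so a content scan can serve as a second line of defense. The following sketch defines a hypothetical `find_token_leaks` check that could run before a commit. Its pattern (`wkr_` followed by at least 16 hex characters) is an assumption about the token format, inferred from the truncated `wkr_abc123...` samples in this document.

```bash
#!/usr/bin/env bash
# find_token_leaks DIR
# Hypothetical pre-commit style check: lists files under DIR that contain
# something shaped like a real worker token. The pattern is an assumption
# about the token format, not a documented guarantee.
# Returns 0 when nothing is found, 1 when at least one file matches.
find_token_leaks() {
  local dir="$1" hits
  # -r recurse, -E extended regex, -l list file names only, -I skip binaries
  hits="$(grep -rElI 'wkr_[0-9a-fA-F]{16,}' "$dir" 2>/dev/null)"
  if [ -n "$hits" ]; then
    printf '%s\n' "$hits"
    return 1
  fi
  return 0
}
```

To confirm that a specific path is covered by an ignore rule, `git check-ignore -v configs/scheduler/scheduler.yaml` prints the matching rule when one exists.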

### Distribution Workflow

```bash
# On scheduler host:
$ scheduler -init -config /etc/fetch_ml/scheduler.yaml
Config generated: /etc/fetch_ml/scheduler.yaml

Generated 3 worker tokens. Copy the appropriate token to each worker's config.

=== Generated Worker Tokens ===
Copy these to your worker configs:

Worker: worker-01
Token: wkr_abc123...

Worker: worker-02
Token: wkr_def456...

# On each worker host - copy the appropriate token:
$ cat > /etc/fetch_ml/worker.yaml <<EOF
scheduler:
  address: "scheduler-host:7777"
  cert: "/etc/fetch_ml/scheduler.crt"
  token: "wkr_abc123..." # Copy from above
EOF
```
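
A plain `cat >` creates the worker config with the shell's default permissions, which on many systems leaves it readable by other local users. As a sketch, the hypothetical `write_worker_config` helper below renders the same three fields with owner-only permissions. It assumes the keys nest under `scheduler:` and that the worker process runs as the file's owner.

```bash
#!/usr/bin/env bash
# write_worker_config PATH ADDRESS CERT TOKEN
# Hypothetical helper: renders the worker config from the workflow above
# with owner-only permissions, so the token is never world-readable.
write_worker_config() {
  local path="$1" address="$2" cert="$3" token="$4"
  (
    umask 077   # create the file without group/other permissions
    cat > "$path" <<EOF
scheduler:
  address: "$address"
  cert: "$cert"
  token: "$token"
EOF
  )
  chmod 600 "$path"   # also tighten a pre-existing file
}
```

Reading the token from a variable instead of typing it on the command line keeps it out of shell history, for example `write_worker_config /etc/fetch_ml/worker.yaml scheduler-host:7777 /etc/fetch_ml/scheduler.crt "$TOKEN"`.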

## TLS Configuration

### Self-Signed Certs (Development)

```yaml
scheduler:
  auto_generate_certs: true
  cert_file: "/etc/fetch_ml/scheduler.crt"
  key_file: "/etc/fetch_ml/scheduler.key"
```

Auto-generated certs are for development only. The scheduler prints the cert path on first run; distribute that cert to workers over a secure channel.
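
Because workers pin the scheduler's certificate, it is worth confirming that the copy on each worker is byte-for-byte identical to the original. A minimal sketch: hash the file on the scheduler host, send the digest through a separate channel, and compare on the worker. The helper names below are hypothetical.

```bash
#!/usr/bin/env bash
# file_sha256 PATH
# Prints the SHA-256 digest of a file as 64 hex characters, using whichever
# of sha256sum (GNU coreutils) or shasum (common on macOS) is installed.
file_sha256() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d' ' -f1
  else
    shasum -a 256 "$1" | cut -d' ' -f1
  fi
}

# same_cert LOCAL_COPY EXPECTED_DIGEST
# Succeeds only if the local copy hashes to the digest reported by the
# scheduler host.
same_cert() {
  [ "$(file_sha256 "$1")" = "$2" ]
}
```

On a worker, `same_cert /etc/fetch_ml/scheduler.crt "$EXPECTED" || echo "certificate mismatch"` flags a corrupted or substituted file before the worker starts.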

### Production TLS

Use proper certificates from your CA:

```yaml
scheduler:
  auto_generate_certs: false
  cert_file: "/etc/ssl/certs/fetch_ml.crt"
  key_file: "/etc/ssl/private/fetch_ml.key"
```
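
The private key named in `key_file` should not be readable by other users on the host. As a hedged sketch, the hypothetical `key_is_private` check below inspects the permission string reported by `ls` and succeeds only when group and other have no access. It does not consider ACLs or directory permissions.

```bash
#!/usr/bin/env bash
# key_is_private PATH
# Succeeds only if the file grants no permissions to group or other.
# -L follows symlinks so the target's mode is checked, not the link's.
key_is_private() {
  local mode
  # Characters 5-10 of the mode string are the group and other bits.
  mode="$(ls -lLd "$1" 2>/dev/null | cut -c5-10)"
  [ "$mode" = "------" ]
}
```

For example, `key_is_private /etc/ssl/private/fetch_ml.key || echo "key permissions too broad"` can run as a pre-start check.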

## Network Security

- Scheduler bind address defaults to `0.0.0.0:7777`, which listens on all interfaces; restrict access with a firewall
- WebSocket connections use WSS with cert pinning (no CA chain required)
- Every WebSocket connection requires token authentication
- Metrics endpoint (`/metrics`) has no auth; bind it to localhost or put it behind an authenticating proxy
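
When reviewing a deployment, it helps to know at a glance whether an address exposes a service on every interface. The sketch below is a hypothetical `classify_bind` helper that labels a `host:port` string as `wildcard`, `loopback`, or `specific`. It takes the address as an argument because the config key that holds the bind address is not named in this document.

```bash
#!/usr/bin/env bash
# classify_bind ADDRESS
# Prints "wildcard", "loopback", or "specific" for a host:port bind address.
# Expects a port to be present; IPv6 hosts must be written in brackets.
classify_bind() {
  local host="${1%:*}"   # strip the trailing :port
  case "$host" in
    ''|0.0.0.0|'[::]'|'*') echo wildcard ;;   # listens on all interfaces
    localhost|127.*|'[::1]') echo loopback ;; # reachable from this host only
    *) echo specific ;;                       # a single named interface
  esac
}
```

A `wildcard` result for the scheduler means the firewall is the only barrier; a `loopback` result is the safer choice for the unauthenticated metrics endpoint.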

## Audit Logging

Enable audit logging to track job lifecycle:

```yaml
scheduler:
  audit_log: "/var/log/fetch_ml/audit.log"
```
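
Audit logs may reveal job details, so the log location deserves restrictive permissions. This document does not say whether the scheduler creates the log directory itself, so the sketch below prepares it ahead of time with a hypothetical `prepare_audit_log` helper. It assumes the scheduler runs as the user who owns the directory.

```bash
#!/usr/bin/env bash
# prepare_audit_log PATH
# Hypothetical helper: creates the log directory (mode 750) and an empty
# log file (mode 640) so only the owner can write and only the owner's
# group can read.
prepare_audit_log() {
  local log="$1" dir
  dir="$(dirname "$log")"
  mkdir -p "$dir" || return 1
  chmod 750 "$dir" || return 1
  touch "$log" || return 1   # keeps existing contents if the file exists
  chmod 640 "$log"
}
```

Running `prepare_audit_log /var/log/fetch_ml/audit.log` as the service user before the first start leaves the path ready for the `audit_log` setting above.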

## Security Checklist

- [ ] Tokens generated via `scheduler -init` or `scheduler -generate-token`
- [ ] Config files with tokens NOT in git
- [ ] TLS certs distributed securely to workers
- [ ] Scheduler bind address firewalled
- [ ] Metrics endpoint protected (if exposed)
- [ ] Audit logging enabled