diff --git a/configs/README.md b/configs/README.md
new file mode 100644
index 0000000..31e5042
--- /dev/null
+++ b/configs/README.md
@@ -0,0 +1,60 @@
+# fetch_ml Configuration Guide
+
+## Quick Start
+
+### Standalone Mode (Existing Behavior)
+```bash
+# Single worker, direct queue access
+go run ./cmd/worker -config configs/worker/standalone/worker.yaml
+```
+
+### Distributed Mode
+```bash
+# Terminal 1: Start scheduler
+go run ./cmd/scheduler -config configs/scheduler/scheduler.yaml
+
+# Terminal 2: Start worker
+go run ./cmd/worker -config configs/worker/distributed/worker.yaml
+```
+
+### Single-Node Mode (Zero Config)
+```bash
+# Both scheduler and worker in one process
+go run ./cmd/fetch_ml -config configs/multi-node/single-node.yaml
+```
+
+## Config Structure
+
+```
+configs/
+├── scheduler/
+│   └── scheduler.yaml           # Central scheduler configuration
+├── worker/
+│   ├── standalone/
+│   │   └── worker.yaml          # Direct queue access (Redis/SQLite)
+│   └── distributed/
+│       └── worker.yaml          # WebSocket to scheduler
+└── multi-node/
+    └── single-node.yaml         # Combined scheduler+worker
+```
+
+## Key Configuration Modes
+
+| Mode | Use Case | Backend |
+|------|----------|---------|
+| `standalone` | Single machine, existing behavior | Redis/SQLite/Filesystem |
+| `distributed` | Multiple workers, central scheduler | WebSocket to scheduler |
+| `both` | Quick testing, single process | In-process scheduler |
+
+## Worker Mode Selection
+
+Set `worker.mode` to switch between implementations:
+
+```yaml
+worker:
+  mode: "standalone"    # Uses Redis/SQLite queue.Backend
+  # OR
+  mode: "distributed"   # Uses SchedulerBackend over WebSocket
+```
+
+The worker code is unchanged — only the backend implementation changes.
diff --git a/configs/SECURITY.md b/configs/SECURITY.md
new file mode 100644
index 0000000..7138ff1
--- /dev/null
+++ b/configs/SECURITY.md
@@ -0,0 +1,130 @@
+# Security Guidelines for fetch_ml Distributed Mode
+
+## Token Management
+
+### Quick Start (Recommended)
+
+```bash
+# 1. Generate config with tokens
+scheduler -init -config scheduler.yaml
+
+# 2. Or generate a single token
+scheduler -generate-token
+```
+
+### Generating Tokens
+
+**Option 1: Initialize full config (recommended)**
+```bash
+# Generate config with 3 worker tokens
+scheduler -init -config /etc/fetch_ml/scheduler.yaml
+
+# Generate with more tokens
+scheduler -init -config /etc/fetch_ml/scheduler.yaml -tokens 5
+```
+
+**Option 2: Generate single token**
+```bash
+# Generate one token
+scheduler -generate-token
+# Output: wkr_abc123...
+```
+
+**Option 3: Using OpenSSL**
+```bash
+openssl rand -hex 32
+```
+
+### Token Storage
+
+- **NEVER commit tokens to git** — config files with real tokens are gitignored
+- Store tokens in environment variables or secure secret management
+- Use `.env` files locally (already gitignored)
+- Rotate tokens periodically
+
+### Config File Security
+
+```
+configs/
+├── scheduler/scheduler.yaml          # ⛔ NEVER commit with real tokens
+├── scheduler/scheduler.yaml.example  # ✅ Safe to commit (placeholders)
+└── worker/distributed/worker.yaml    # ⛔ NEVER commit with real tokens
+```
+
+All `*.yaml` files in `configs/` subdirectories are gitignored by default.
+
+### Distribution Workflow
+
+```bash
+# On scheduler host:
+$ scheduler -init -config /etc/fetch_ml/scheduler.yaml
+Config generated: /etc/fetch_ml/scheduler.yaml
+
+Generated 3 worker tokens. Copy the appropriate token to each worker's config.
+
+=== Generated Worker Tokens ===
+Copy these to your worker configs:
+
+Worker: worker-01
+Token: wkr_abc123...
+
+Worker: worker-02
+Token: wkr_def456...
+
+# On each worker host - copy the appropriate token:
+$ cat > /etc/fetch_ml/worker.yaml <