# Security Guide

This document outlines security features, best practices, and hardening procedures for FetchML.

## Overview

FetchML implements defense-in-depth security with multiple layers of protection:

1. **File Ingestion Security** - Path traversal prevention, file type validation
2. **Sandbox Hardening** - Container isolation with seccomp, capability dropping
3. **Secrets Management** - Environment-based credential injection with plaintext detection
4. **Audit Logging** - Tamper-evident logging for compliance (HIPAA)
5. **Authentication** - API key-based access control with RBAC

---

## Security Features

### Authentication & Authorization
- **API Keys**: SHA256-hashed with role-based access control (RBAC)
- **Permissions**: Granular read/write/delete permissions per user
- **IP Whitelisting**: Network-level access control
- **Rate Limiting**: Per-user request quotas

### Communication Security
- **TLS/HTTPS**: End-to-end encryption for API traffic
- **WebSocket Auth**: API key required before upgrade
- **Redis Auth**: Password-protected task queue

### Data Privacy
- **Log Sanitization**: Automatically redacts API keys, passwords, tokens
- **Experiment Isolation**: User-specific experiment directories
- **No Anonymous Access**: All services require authentication

### Network Security
- **Internal Networks**: Backend services (Redis, Loki) not exposed publicly
- **Firewall Rules**: Restrictive port access
- **Container Isolation**: Services run in separate containers/pods

---

## Comprehensive Security Hardening (2026-02)

### File Ingestion Security

All file operations are protected against path traversal attacks:

```go
// All paths are validated with symlink resolution
validator := fileutil.NewSecurePathValidator(basePath)
cleanPath, err := validator.ValidatePath(userInput)
if err != nil {
    return fmt.Errorf("path validation failed: %w", err)
}
```

**Features:**
- Symlink resolution and canonicalization
- Path boundary enforcement (cannot escape base directory)
- Magic bytes validation for ML artifacts (safetensors, GGUF, HDF5)
- Dangerous extension blocking (.pt, .pkl, .exe, .sh)
- Upload limits (size, rate, frequency)

### Sandbox Hardening

Containers run with hardened security defaults:

```yaml
# configs/worker/homelab-sandbox.yaml
sandbox:
  network_mode: "none"                  # No network access by default
  read_only_root: true                  # Read-only filesystem
  no_new_privileges: true               # Prevent privilege escalation
  drop_all_caps: true                   # Drop all capabilities
  allowed_caps: []                      # Add CAP_ only if required
  user_ns: true                         # User namespace isolation
  run_as_uid: 1000                      # Run as non-root user
  run_as_gid: 1000
  seccomp_profile: "default-hardened"   # Restricted syscall profile
  max_runtime_hours: 24
  max_upload_size_bytes: 10737418240    # 10GB
  max_upload_rate_bps: 104857600        # 100MB/s
  max_uploads_per_minute: 10
```

**Seccomp Profile** (`configs/seccomp/default-hardened.json`):
- Blocks: `ptrace`, `mount`, `umount2`, `reboot`, `kexec_load`
- Blocks: `open_by_handle_at`, `perf_event_open`
- Default action: `SCMP_ACT_ERRNO` (deny by default)

### Secrets Management

**Environment Variable Expansion:**

```yaml
# config.yaml - use ${VAR} syntax for secrets
redis_password: "${REDIS_PASSWORD}"
snapshot_store:
  access_key: "${AWS_ACCESS_KEY_ID}"
  secret_key: "${AWS_SECRET_ACCESS_KEY}"
```

**Plaintext Detection:**

The system detects and rejects plaintext secrets using:
- Shannon entropy calculation (>4 bits/char indicates secret)
- Pattern matching: AWS keys (`AKIA`, `ASIA`), GitHub tokens (`ghp_`), etc.

**Loading Process:**
1. Config loaded from YAML
2. Environment variables expanded (`${VAR}` → value)
3. Plaintext secrets detected and rejected
4. Validation fails if secrets don't use env reference syntax

### HIPAA-Compliant Audit Logging

**Tamper-Evident Logging:**

```go
// Each event includes chain hash for integrity
audit.Log(audit.Event{
    EventType: audit.EventFileRead,
    UserID:    "user1",
    Resource:  "/data/file.txt",
})
```

**Event Types:**
- `file_read` - File access logged
- `file_write` - File modification logged
- `file_delete` - File deletion logged
- `auth_success` / `auth_failure` - Authentication events
- `job_queued` / `job_started` / `job_completed` - Job lifecycle

**Chain Hashing:**
- Each event includes SHA-256 hash of previous event
- Modification of any log entry breaks the chain
- `VerifyChain()` function detects tampering

---

## Security Checklist

### Initial Setup

1. **Generate Strong Passwords**
   ```bash
   # Grafana admin password
   openssl rand -base64 32 > .grafana-password

   # Redis password
   openssl rand -base64 32
   ```

2. **Configure Environment Variables**
   ```bash
   cp .env.example .env
   # Edit .env and set:
   # - GRAFANA_ADMIN_PASSWORD
   ```

3. **Enable TLS** (Production only)
   ```yaml
   # configs/api/prod.yaml
   server:
     tls:
       enabled: true
       cert_file: "/secrets/cert.pem"
       key_file: "/secrets/key.pem"
   ```

4. **Configure Firewall**
   ```bash
   # Allow only necessary ports
   sudo ufw allow 22/tcp    # SSH
   sudo ufw allow 443/tcp   # HTTPS
   sudo ufw allow 80/tcp    # HTTP (redirect to HTTPS)
   sudo ufw enable
   ```

### Production Hardening

5. **Restrict IP Access**
   ```yaml
   # configs/api/prod.yaml
   auth:
     ip_whitelist:
       - "10.0.0.0/8"
       - "192.168.0.0/16"
       - "127.0.0.1"
   ```

6. **Enable Audit Logging**
   ```yaml
   logging:
     level: "info"
     audit: true
     file: "/var/log/fetch_ml/audit.log"
   ```

7. **Harden Redis**
   ```bash
   # Set a password at runtime (also persist it in redis.conf)
   redis-cli CONFIG SET requirepass "your-strong-password"

   # rename-command cannot be changed with CONFIG SET.
   # Add these directives to redis.conf and restart Redis:
   #   rename-command FLUSHDB ""
   #   rename-command FLUSHALL ""
   ```

8. **Secure Grafana**
   ```bash
   # Change default admin password
   docker-compose exec grafana grafana-cli admin reset-admin-password new-strong-password
   ```
9. **Regular Updates**
   ```bash
   # Update system packages
   sudo apt update && sudo apt upgrade -y

   # Update containers
   docker-compose pull
   docker-compose up -d   # testing only
   ```

## Password Management

### Generate Secure Passwords

```bash
# Method 1: OpenSSL
openssl rand -base64 32

# Method 2: pwgen (if installed)
pwgen -s 32 1

# Method 3: /dev/urandom
head -c 32 /dev/urandom | base64
```

### Store Passwords Securely

**Development**: Use `.env` file (gitignored)
```bash
echo "REDIS_PASSWORD=$(openssl rand -base64 32)" >> .env
echo "GRAFANA_ADMIN_PASSWORD=$(openssl rand -base64 32)" >> .env
```

**Production**: Use systemd environment files
```bash
sudo mkdir -p /etc/fetch_ml/secrets
sudo chmod 700 /etc/fetch_ml/secrets
echo "REDIS_PASSWORD=..." | sudo tee /etc/fetch_ml/secrets/redis.env
sudo chmod 600 /etc/fetch_ml/secrets/redis.env
```

## API Key Management

### Generate API Keys

```bash
# Generate random API key
openssl rand -hex 32

# Hash for storage
echo -n "your-api-key" | sha256sum
```

### Rotate API Keys

1. Generate a new API key
2. Update your chosen API server config (for example a private copy of `configs/api/homelab-secure.yaml`) with the new hash
3. Distribute the new key to users
4. Remove the old key after a grace period

### Revoke API Keys

Remove the user's entry from your API server config file:
```yaml
auth:
  api_keys:
    # user_to_revoke:  # Comment out or delete
```

## Secret Flow (What lives where)

- **API server config (`configs/api/*.yaml`)**
  - Stores **SHA256 hashes** of API keys (never raw keys).
  - The repo-shipped configs intentionally contain `CHANGE_ME_...` placeholders.
  - For real deployments, make a private copy (e.g. `/etc/fetch_ml/config.yaml`) and fill in real hashes.
- **Docker Compose `.env` / secret files**
  - Used for values that should not be committed (e.g. `REDIS_PASSWORD`, Grafana admin password).
  - `deployments/docker-compose.homelab-secure.yml` requires `REDIS_PASSWORD` to be set explicitly.
- **TLS certs**
  - Provided as mounted files (e.g.
    `/app/ssl/cert.pem`, `/app/ssl/key.pem`).

## Network Security

### Production Network Topology

```
Internet
    ↓
[Firewall] (ports 3000, 9102)
    ↓
[Reverse Proxy] (nginx/Apache) - TLS termination
    ↓
┌─────────────────────┐
│  Application Pod    │
│                     │
│  ┌──────────────┐   │
│  │  API Server  │   │ ← Public (via reverse proxy)
│  └──────────────┘   │
│                     │
│  ┌──────────────┐   │
│  │    Redis     │   │ ← Internal only
│  └──────────────┘   │
│                     │
│  ┌──────────────┐   │
│  │   Grafana    │   │ ← Public (via reverse proxy)
│  └──────────────┘   │
│                     │
│  ┌──────────────┐   │
│  │  Prometheus  │   │ ← Internal only
│  └──────────────┘   │
│                     │
│  ┌──────────────┐   │
│  │     Loki     │   │ ← Internal only
│  └──────────────┘   │
└─────────────────────┘
```

### Recommended Firewall Rules

```bash
# Allow only necessary inbound connections
sudo firewall-cmd --permanent --zone=public --add-rich-rule='
  rule family="ipv4"
  source address="YOUR_NETWORK"
  port port="3000" protocol="tcp" accept'

sudo firewall-cmd --permanent --zone=public --add-rich-rule='
  rule family="ipv4"
  source address="YOUR_NETWORK"
  port port="9102" protocol="tcp" accept'

# Block all other traffic
sudo firewall-cmd --permanent --set-default-zone=drop
sudo firewall-cmd --reload
```

## Incident Response

### Suspected Breach

1. **Immediate Actions**
   - Rotate all API keys
   - Stop affected services
   - Review audit logs

2. **Investigation**
   ```bash
   # Check recent logins
   sudo journalctl -u fetchml-api --since "1 hour ago"

   # Review failed auth attempts
   grep "authentication failed" /var/log/fetch_ml/*.log

   # Check active connections
   ss -tnp | grep :9102
   ```

3. **Recovery**
   - Rotate all passwords and API keys
   - Update firewall rules
   - Patch vulnerabilities
   - Resume services

### Security Monitoring

```bash
# Monitor failed authentication
tail -f /var/log/fetch_ml/api.log | grep "auth.*failed"

# Monitor unusual activity
journalctl -u fetchml-api -f | grep -E "(ERROR|WARN)"

# Check open ports
nmap -p- localhost
```

## Security Best Practices
1. **Principle of Least Privilege**: Grant minimum necessary permissions
2. **Defense in Depth**: Multiple security layers (firewall + auth + TLS)
3. **Regular Updates**: Keep all components patched
4. **Audit Regularly**: Review logs and access patterns
5. **Secure Secrets**: Never commit passwords/keys to git
6. **Network Segmentation**: Isolate services with internal networks
7. **Monitor Everything**: Enable comprehensive logging and alerting
8. **Test Security**: Regular penetration testing and vulnerability scans

## Compliance

### Data Privacy
- Logs are sanitized (no passwords/API keys)
- Experiment data is user-isolated
- No telemetry or external data sharing

### Audit Trail
All API access is logged with:
- Timestamp
- User/API key
- Action performed
- Source IP
- Result (success/failure)

## Getting Help

- **Security Issues**: Report privately via email
- **Questions**: See the documentation or create an issue
- **Updates**: Monitor releases for security patches
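
## Appendix: Chain Hashing Sketch

The chain hashing described under HIPAA-Compliant Audit Logging can be illustrated with a small, self-contained Go sketch. It is not FetchML's implementation: the `Event` fields, `hashEvent`, `appendEvent`, and `verifyChain` are hypothetical names chosen for illustration, and the `|`-joined encoding is an assumption. It only shows why editing any stored entry breaks the chain.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Event is a hypothetical audit record; FetchML's real audit.Event may differ.
type Event struct {
	EventType string
	UserID    string
	Resource  string
	PrevHash  string // hex SHA-256 of the previous event, "" for the first
	Hash      string // hex SHA-256 over PrevHash and this event's fields
}

// hashEvent digests the previous hash together with the event fields.
// A production format should use an unambiguous encoding (e.g. length-prefixed).
func hashEvent(e Event) string {
	sum := sha256.Sum256([]byte(e.PrevHash + "|" + e.EventType + "|" + e.UserID + "|" + e.Resource))
	return hex.EncodeToString(sum[:])
}

// appendEvent links e to the current tail of the chain and returns the new chain.
func appendEvent(chain []Event, e Event) []Event {
	if len(chain) > 0 {
		e.PrevHash = chain[len(chain)-1].Hash
	}
	e.Hash = hashEvent(e)
	return append(chain, e)
}

// verifyChain returns the index of the first inconsistent event, or -1 if intact.
func verifyChain(chain []Event) int {
	prev := ""
	for i, e := range chain {
		if e.PrevHash != prev || e.Hash != hashEvent(e) {
			return i
		}
		prev = e.Hash
	}
	return -1
}

func main() {
	var chain []Event
	chain = appendEvent(chain, Event{EventType: "file_read", UserID: "user1", Resource: "/data/file.txt"})
	chain = appendEvent(chain, Event{EventType: "file_write", UserID: "user1", Resource: "/data/file.txt"})
	fmt.Println(verifyChain(chain)) // -1: chain is intact

	chain[0].Resource = "/data/other.txt" // simulate tampering with the first entry
	fmt.Println(verifyChain(chain))       // 0: index of the first broken entry
}
```

Because each hash covers the previous hash, an attacker who edits one entry would have to recompute every later hash as well. Keeping the latest hash somewhere the log writer cannot modify is one way to make that detectable.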