docs(security): document comprehensive security hardening

Updates documentation with new security features and hardening guide:

**CHANGELOG.md:**
- Added detailed security hardening section (2026-02-23)
- Documents all phases: file ingestion, sandbox, secrets, audit logging, tests
- Lists specific files changed and security controls implemented

**docs/src/security.md:**
- Added Overview section with defense-in-depth layers
- Added Comprehensive Security Hardening section with:
  - File ingestion security with code examples
  - Sandbox hardening with complete YAML config
  - Secrets management with env expansion syntax
  - HIPAA audit logging with tamper-evident chain hashing

2026-02-23 18:03:25 -05:00

Security Guide

This document outlines security features, best practices, and hardening procedures for FetchML.

Overview

FetchML implements defense-in-depth security with multiple layers of protection:

File Ingestion Security - Path traversal prevention, file type validation
Sandbox Hardening - Container isolation with seccomp, capability dropping
Secrets Management - Environment-based credential injection with plaintext detection
Audit Logging - Tamper-evident logging for compliance (HIPAA)
Authentication - API key-based access control with RBAC

Security Features

Authentication & Authorization

API Keys: SHA256-hashed with role-based access control (RBAC)
Permissions: Granular read/write/delete permissions per user
IP Whitelisting: Network-level access control
Rate Limiting: Per-user request quotas

Communication Security

TLS/HTTPS: End-to-end encryption for API traffic
WebSocket Auth: API key required before upgrade
Redis Auth: Password-protected task queue

Data Privacy

Log Sanitization: Automatically redacts API keys, passwords, tokens
Experiment Isolation: User-specific experiment directories
No Anonymous Access: All services require authentication
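A minimal sketch of regex-based log redaction in Go. It assumes credentials appear as `key=value` or `key: value` pairs, or as HTTP bearer tokens. The `Sanitize` function and its patterns are hypothetical, and they would miss secrets in other shapes, such as credentials embedded in URLs.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical redaction rules; FetchML's actual patterns may differ.
var redactionRules = []struct {
	re   *regexp.Regexp
	repl string
}{
	// key=value or key: value pairs whose key names a credential.
	{regexp.MustCompile(`(?i)\b(api[_-]?key|password|passwd|token|secret)(\s*[:=]\s*)\S+`), `${1}${2}[REDACTED]`},
	// HTTP bearer tokens.
	{regexp.MustCompile(`(?i)\b(bearer\s+)[A-Za-z0-9._~+/=-]+`), `${1}[REDACTED]`},
}

// Sanitize returns line with credential values replaced by [REDACTED].
func Sanitize(line string) string {
	for _, r := range redactionRules {
		line = r.re.ReplaceAllString(line, r.repl)
	}
	return line
}

func main() {
	fmt.Println(Sanitize("login ok user=alice api_key=abc123"))
	// login ok user=alice api_key=[REDACTED]
	fmt.Println(Sanitize("Authorization: Bearer eyJhbGciOi.xyz"))
	// Authorization: Bearer [REDACTED]
}
```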

Network Security

Internal Networks: Backend services (Redis, Loki) not exposed publicly
Firewall Rules: Restrictive port access
Container Isolation: Services run in separate containers/pods

Comprehensive Security Hardening (2026-02)

File Ingestion Security

All file operations are protected against path traversal attacks:

// All paths are validated with symlink resolution
validator := fileutil.NewSecurePathValidator(basePath)
cleanPath, err := validator.ValidatePath(userInput)
if err != nil {
    return fmt.Errorf("path validation failed: %w", err)
}

Features:

Symlink resolution and canonicalization
Path boundary enforcement (cannot escape base directory)
Magic bytes validation for ML artifacts (safetensors, GGUF, HDF5)
Dangerous extension blocking (.pt, .pkl, .exe, .sh); .pt and .pkl are pickle-based formats, which can execute arbitrary code when loaded
Upload limits (size, rate, frequency)

Sandbox Hardening

Containers run with hardened security defaults:

# configs/worker/homelab-sandbox.yaml
sandbox:
  network_mode: "none"                 # No network access by default
  read_only_root: true                 # Read-only root filesystem
  no_new_privileges: true              # Prevent privilege escalation
  drop_all_caps: true                  # Drop all Linux capabilities
  allowed_caps: []                     # Add specific CAP_* entries only if required
  user_ns: true                        # User namespace isolation
  run_as_uid: 1000                     # Run as non-root user
  run_as_gid: 1000
  seccomp_profile: "default-hardened"  # Restricted syscall profile
  max_runtime_hours: 24
  max_upload_size_bytes: 10737418240   # 10 GiB
  max_upload_rate_bps: 104857600       # 100 MiB/s
  max_uploads_per_minute: 10

Seccomp Profile (configs/seccomp/default-hardened.json):

Blocks: ptrace, mount, umount2, reboot, kexec_load
Blocks: open_by_handle_at, perf_event_open
Default action: SCMP_ACT_ERRNO (deny by default)

Secrets Management

Environment Variable Expansion:

# config.yaml - use ${VAR} syntax for secrets
redis_password: "${REDIS_PASSWORD}"
snapshot_store:
  access_key: "${AWS_ACCESS_KEY_ID}"
  secret_key: "${AWS_SECRET_ACCESS_KEY}"

Plaintext Detection: The system detects and rejects plaintext secrets using:

Shannon entropy: values above 4 bits per character are treated as likely secrets
Pattern matching for known token formats: AWS access keys (AKIA, ASIA prefixes), GitHub tokens (ghp_), etc.

Loading Process:

1. The config is loaded from YAML
2. Environment variables are expanded (${VAR} → value)
3. Plaintext secrets are detected and rejected
4. Validation fails if a secret field does not use the ${VAR} reference syntax

HIPAA-Compliant Audit Logging

Tamper-Evident Logging:

// Each event includes chain hash for integrity
audit.Log(audit.Event{
    EventType: audit.EventFileRead,
    UserID:    "user1",
    Resource:  "/data/file.txt",
})

Event Types:

file_read - File access logged
file_write - File modification logged
file_delete - File deletion logged
auth_success / auth_failure - Authentication events
job_queued / job_started / job_completed - Job lifecycle

Chain Hashing:

Each event includes the SHA-256 hash of the previous event
Modifying any log entry breaks the chain
VerifyChain() function detects tampering
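The following self-contained Go sketch illustrates the chaining idea. The `Event` fields, the JSON encoding, and the `appendEvent`/`verifyChain` helpers are hypothetical and are not the `audit` package's API. Each hash covers the event's fields plus the previous hash. Editing an entry therefore invalidates its own hash, and rehashing that entry invalidates the next entry's `PrevHash`.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Event is a hypothetical audit record; field names are illustrative.
type Event struct {
	EventType string `json:"event_type"`
	UserID    string `json:"user_id"`
	Resource  string `json:"resource"`
	PrevHash  string `json:"prev_hash"`
	Hash      string `json:"-"` // computed, excluded from the hashed payload
}

// computeHash hashes the event's JSON encoding, which includes PrevHash,
// so each hash depends on the entire history before it.
func computeHash(e Event) string {
	payload, _ := json.Marshal(e) // a struct of strings always marshals
	sum := sha256.Sum256(payload)
	return hex.EncodeToString(sum[:])
}

// appendEvent links e to the end of the chain and returns the new chain.
func appendEvent(chain []Event, e Event) []Event {
	if n := len(chain); n > 0 {
		e.PrevHash = chain[n-1].Hash
	}
	e.Hash = computeHash(e)
	return append(chain, e)
}

// verifyChain returns the index of the first inconsistent event, or -1.
func verifyChain(chain []Event) int {
	prev := ""
	for i, e := range chain {
		if e.PrevHash != prev || computeHash(e) != e.Hash {
			return i
		}
		prev = e.Hash
	}
	return -1
}

func main() {
	var chain []Event
	for _, t := range []string{"file_read", "file_write", "file_delete"} {
		chain = appendEvent(chain, Event{EventType: t, UserID: "user1", Resource: "/data/file.txt"})
	}
	fmt.Println("first bad index:", verifyChain(chain)) // -1
	chain[1].Resource = "/data/other.txt"                // tamper with the middle entry
	fmt.Println("first bad index:", verifyChain(chain)) // 1
}
```

A chain by itself does not reveal deletion of the newest entries. Recording the latest hash somewhere the log writer cannot modify closes that gap.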

Security Checklist

Initial Setup

Generate Strong Passwords

# Grafana admin password
openssl rand -base64 32 > .grafana-password

# Redis password
openssl rand -base64 32

Configure Environment Variables

cp .env.example .env
# Edit .env and set:
# - GRAFANA_ADMIN_PASSWORD
# - REDIS_PASSWORD (required by deployments/docker-compose.homelab-secure.yml)

Enable TLS (Production only)

# configs/api/prod.yaml
server:
  tls:
    enabled: true
    cert_file: "/secrets/cert.pem"
    key_file: "/secrets/key.pem"

Configure Firewall

# Allow only necessary ports
sudo ufw allow 22/tcp    # SSH
sudo ufw allow 443/tcp   # HTTPS
sudo ufw allow 80/tcp    # HTTP (redirect to HTTPS)
sudo ufw enable

Production Hardening

Restrict IP Access

# configs/api/prod.yaml
auth:
  ip_whitelist:
    - "10.0.0.0/8"
    - "192.168.0.0/16"
    - "127.0.0.1"
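The Go sketch below shows how such an allowlist can be matched with the standard `net/netip` package (Go 1.18+). The helper names are hypothetical, and the sketch assumes that a bare address such as `127.0.0.1` means a single host. Behind a reverse proxy, `RemoteAddr` is the proxy's address, so the client IP would have to come from a trusted forwarding header instead.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// parseAllowlist accepts CIDR prefixes ("10.0.0.0/8") and bare addresses
// ("127.0.0.1"), treating a bare address as a single-host prefix.
func parseAllowlist(entries []string) ([]netip.Prefix, error) {
	var out []netip.Prefix
	for _, e := range entries {
		if strings.Contains(e, "/") {
			p, err := netip.ParsePrefix(e)
			if err != nil {
				return nil, err
			}
			out = append(out, p.Masked())
			continue
		}
		a, err := netip.ParseAddr(e)
		if err != nil {
			return nil, err
		}
		out = append(out, netip.PrefixFrom(a, a.BitLen()))
	}
	return out, nil
}

// allowed reports whether remote (an "ip:port" string such as
// http.Request.RemoteAddr) falls inside any allowlisted prefix.
func allowed(list []netip.Prefix, remote string) bool {
	ap, err := netip.ParseAddrPort(remote)
	if err != nil {
		return false
	}
	ip := ap.Addr().Unmap() // treat ::ffff:10.1.2.3 as 10.1.2.3
	for _, p := range list {
		if p.Contains(ip) {
			return true
		}
	}
	return false
}

func main() {
	list, err := parseAllowlist([]string{"10.0.0.0/8", "192.168.0.0/16", "127.0.0.1"})
	if err != nil {
		panic(err)
	}
	for _, remote := range []string{"10.1.2.3:5000", "127.0.0.1:8080", "203.0.113.7:443"} {
		fmt.Println(remote, allowed(list, remote))
	}
}
```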

Enable Audit Logging

logging:
  level: "info"
  audit: true
  file: "/var/log/fetch_ml/audit.log"

Harden Redis

# Redis security (redis.conf, e.g. /etc/redis/redis.conf; restart Redis after editing).
# rename-command is only accepted in the config file, not via CONFIG SET.
requirepass your-strong-password
rename-command FLUSHDB ""
rename-command FLUSHALL ""

Secure Grafana

# Change default admin password
docker-compose exec grafana grafana-cli admin reset-admin-password new-strong-password

Regular Updates

# Update system packages
sudo apt update && sudo apt upgrade -y

# Update containers
docker-compose pull
docker-compose up -d   # testing only

Password Management

Generate Secure Passwords

# Method 1: OpenSSL
openssl rand -base64 32

# Method 2: pwgen (if installed)
pwgen -s 32 1

# Method 3: /dev/urandom
head -c 32 /dev/urandom | base64

Store Passwords Securely

Development: Use .env file (gitignored)

echo "REDIS_PASSWORD=$(openssl rand -base64 32)" >> .env
echo "GRAFANA_ADMIN_PASSWORD=$(openssl rand -base64 32)" >> .env

Production: Use systemd environment files

sudo mkdir -p /etc/fetch_ml/secrets
sudo chmod 700 /etc/fetch_ml/secrets
echo "REDIS_PASSWORD=..." | sudo tee /etc/fetch_ml/secrets/redis.env > /dev/null   # keep the secret off the terminal
sudo chmod 600 /etc/fetch_ml/secrets/redis.env

API Key Management

Generate API Keys

# Generate random API key
openssl rand -hex 32

# Hash for storage
echo -n "your-api-key" | sha256sum
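The two commands above can be reproduced in Go, which is handy when provisioning keys from a tool. This sketch assumes the config stores the lowercase hex SHA-256 digest of the key string, exactly as `sha256sum` prints it but without the trailing `  -`. The helper names are hypothetical. Verification hashes the presented key and compares digests with `crypto/subtle` to avoid timing differences.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// newAPIKey mirrors `openssl rand -hex 32`: 32 random bytes, hex-encoded.
func newAPIKey() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

// hashAPIKey mirrors `echo -n KEY | sha256sum` (hex digest only).
func hashAPIKey(key string) string {
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])
}

// keyMatches compares a presented key against a stored hash in constant time.
func keyMatches(presented, storedHash string) bool {
	got := hashAPIKey(presented)
	return subtle.ConstantTimeCompare([]byte(got), []byte(storedHash)) == 1
}

func main() {
	key, err := newAPIKey()
	if err != nil {
		panic(err)
	}
	stored := hashAPIKey(key) // this value goes in the config, never the key
	fmt.Println("hash:", stored)
	fmt.Println("valid:", keyMatches(key, stored))
}
```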

Rotate API Keys

1. Generate a new API key
2. Update your chosen API server config (for example, a private copy of configs/api/homelab-secure.yaml) with the new hash
3. Distribute the new key to users
4. Remove the old key after a grace period

Revoke API Keys

Remove the user's entry from your API server config file:

auth:
  api_keys:
    # user_to_revoke:  # Comment out or delete

Secret Flow (What lives where)

API server config (configs/api/*.yaml)
- Stores SHA256 hashes of API keys (never raw keys).
- The repo-shipped configs intentionally contain CHANGE_ME_... placeholders.
- For real deployments, make a private copy (e.g. /etc/fetch_ml/config.yaml) and fill in real hashes.
Docker Compose .env / secret files
- Used for values that should not be committed (e.g. REDIS_PASSWORD, Grafana admin password).
- deployments/docker-compose.homelab-secure.yml requires REDIS_PASSWORD to be set explicitly.
TLS certs
- Provided as mounted files (e.g. /app/ssl/cert.pem, /app/ssl/key.pem).

Network Security

Production Network Topology

Internet
    ↓
[Firewall] (ports 3000, 9102)
    ↓
[Reverse Proxy] (nginx/Apache) - TLS termination
    ↓
┌─────────────────────┐
│   Application Pod   │
│                     │
│  ┌──────────────┐   │
│  │ API Server   │   │  ← Public (via reverse proxy)
│  └──────────────┘   │
│                     │
│  ┌──────────────┐   │
│  │   Redis      │   │  ← Internal only
│  └──────────────┘   │
│                     │
│  ┌──────────────┐   │
│  │   Grafana    │   │  ← Public (via reverse proxy)
│  └──────────────┘   │
│                     │
│  ┌──────────────┐   │
│  │ Prometheus   │   │  ← Internal only
│  └──────────────┘   │
│                     │
│  ┌──────────────┐   │
│  │    Loki      │   │  ← Internal only
│  └──────────────┘   │
└─────────────────────┘

Recommended Firewall Rules

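# Allow only necessary inbound connections. The rules go in the "drop" zone,
# which becomes the default below; rules left in "public" would stop applying
# to interfaces that move to the new default zone.
# If you manage the host remotely, also allow SSH (22/tcp) from YOUR_NETWORK first.
sudo firewall-cmd --permanent --zone=drop --add-rich-rule='
  rule family="ipv4"
  source address="YOUR_NETWORK"
  port port="3000" protocol="tcp" accept'

sudo firewall-cmd --permanent --zone=drop --add-rich-rule='
  rule family="ipv4"
  source address="YOUR_NETWORK"
  port port="9102" protocol="tcp" accept'

# Block all other traffic (--set-default-zone is always persistent)
sudo firewall-cmd --set-default-zone=drop
sudo firewall-cmd --reload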

Incident Response

Suspected Breach

Immediate Actions
- Rotate all API keys
- Stop affected services
- Review audit logs

Investigation

# Check recent API server activity
sudo journalctl -u fetchml-api --since "1 hour ago"

# Review failed auth attempts
grep "authentication failed" /var/log/fetch_ml/*.log

# Check active connections
ss -tnp | grep :9102

Recovery
- Rotate all passwords and API keys
- Update firewall rules
- Patch vulnerabilities
- Resume services

Security Monitoring

# Monitor failed authentication
tail -f /var/log/fetch_ml/api.log | grep "auth.*failed"

# Monitor unusual activity
journalctl -u fetchml-api -f | grep -E "(ERROR|WARN)"

# Check open ports
nmap -p- localhost

Security Best Practices

Principle of Least Privilege: Grant minimum necessary permissions
Defense in Depth: Multiple security layers (firewall + auth + TLS)
Regular Updates: Keep all components patched
Audit Regularly: Review logs and access patterns
Secure Secrets: Never commit passwords/keys to git
Network Segmentation: Isolate services with internal networks
Monitor Everything: Enable comprehensive logging and alerting
Test Security: Regular penetration testing and vulnerability scans

Compliance

Data Privacy

Logs are sanitized (no passwords/API keys)
Experiment data is user-isolated
No telemetry or external data sharing

Audit Trail

All API access is logged with:

Timestamp
User/API key
Action performed
Source IP
Result (success/failure)

Getting Help

Security Issues: Report privately via email
Questions: See documentation or create issue
Updates: Monitor releases for security patches
