Security Guide
This document outlines security features, best practices, and hardening procedures for FetchML.
Overview
FetchML implements defense-in-depth security with multiple layers of protection:
- File Ingestion Security - Path traversal prevention, file type validation
- Sandbox Hardening - Container isolation with seccomp, capability dropping
- Secrets Management - Environment-based credential injection with plaintext detection
- Audit Logging - Tamper-evident logging for compliance (HIPAA)
- Authentication - API key-based access control with RBAC
Security Features
Authentication & Authorization
- API Keys: SHA256-hashed with role-based access control (RBAC)
- Permissions: Granular read/write/delete permissions per user
- IP Whitelisting: Network-level access control
- Rate Limiting: Per-user request quotas
Communication Security
- TLS/HTTPS: Encrypts API traffic in transit
- WebSocket Auth: API key required before upgrade
- Redis Auth: Password-protected task queue
Data Privacy
- Log Sanitization: Automatically redacts API keys, passwords, tokens
- Experiment Isolation: User-specific experiment directories
- No Anonymous Access: All services require authentication
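As an illustration of the log sanitization bullet above, here is a minimal Go sketch of regex-based redaction. The `sanitize` function, the pattern, and the `[REDACTED]` marker are assumptions made for this example; they are not FetchML's actual implementation, which may cover more formats.

```go
package main

import (
	"fmt"
	"regexp"
)

// sensitive matches "key=value" or "key: value" pairs whose key looks like a
// credential. The key list is illustrative, not FetchML's real pattern set.
var sensitive = regexp.MustCompile(`(?i)\b(api[_-]?key|password|token)(\s*[=:]\s*)("[^"]*"|\S+)`)

// sanitize replaces the value of every matching pair with a fixed marker,
// keeping the key and separator so the log line stays readable.
func sanitize(line string) string {
	return sensitive.ReplaceAllString(line, "${1}${2}[REDACTED]")
}

func main() {
	fmt.Println(sanitize(`login ok user=alice password=hunter2`))
	fmt.Println(sanitize(`api_key: "abc 123" accepted`))
	fmt.Println(sanitize(`job 42 finished`))
}
```

Redacting at the point where a line is written keeps secrets out of every downstream sink, at the cost of possible false positives on fields that merely share a name with a credential.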
Network Security
- Internal Networks: Backend services (Redis, Loki) not exposed publicly
- Firewall Rules: Restrictive port access
- Container Isolation: Services run in separate containers/pods
Comprehensive Security Hardening (2026-02)
File Ingestion Security
All file operations are protected against path traversal attacks:
// All paths are validated with symlink resolution
validator := fileutil.NewSecurePathValidator(basePath)
cleanPath, err := validator.ValidatePath(userInput)
if err != nil {
	return fmt.Errorf("path validation failed: %w", err)
}
Features:
- Symlink resolution and canonicalization
- Path boundary enforcement (cannot escape base directory)
- Magic bytes validation for ML artifacts (safetensors, GGUF, HDF5)
- Dangerous extension blocking (.pt, .pkl, .exe, .sh)
- Upload limits (size, rate, frequency)
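The first two features (symlink resolution and boundary enforcement) can be sketched with the Go standard library alone. This is a minimal sketch under stated assumptions: `validatePath`, `ErrEscapesBase`, and `demo` are hypothetical names for this example, and the real `fileutil.NewSecurePathValidator` may behave differently, for instance with paths that do not exist yet.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// ErrEscapesBase reports a path that resolves outside the base directory.
var ErrEscapesBase = errors.New("path escapes base directory")

// validatePath is a hypothetical stand-in for ValidatePath. It joins userInput
// onto base, resolves symlinks, and rejects anything that lands outside base.
// The target must already exist, because filepath.EvalSymlinks fails otherwise.
func validatePath(base, userInput string) (string, error) {
	realBase, err := filepath.EvalSymlinks(base)
	if err != nil {
		return "", fmt.Errorf("resolve base: %w", err)
	}
	resolved, err := filepath.EvalSymlinks(filepath.Join(realBase, userInput))
	if err != nil {
		return "", fmt.Errorf("resolve path: %w", err)
	}
	rel, err := filepath.Rel(realBase, resolved)
	if err != nil {
		return "", fmt.Errorf("relative path: %w", err)
	}
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", ErrEscapesBase
	}
	return resolved, nil
}

func must(err error) {
	if err != nil {
		panic(err)
	}
}

// demo builds a throwaway tree and validates three inputs against it.
func demo() (inside, traversal, symlink error) {
	root, err := os.MkdirTemp("", "fetchml-demo")
	must(err)
	defer os.RemoveAll(root)

	base := filepath.Join(root, "base")
	secret := filepath.Join(root, "secret.txt")
	must(os.Mkdir(base, 0o755))
	must(os.WriteFile(secret, []byte("x"), 0o600))
	must(os.WriteFile(filepath.Join(base, "ok.txt"), []byte("x"), 0o600))
	must(os.Symlink(secret, filepath.Join(base, "link")))

	_, inside = validatePath(base, "ok.txt")
	_, traversal = validatePath(base, "../secret.txt")
	_, symlink = validatePath(base, "link")
	return inside, traversal, symlink
}

func main() {
	inside, traversal, symlink := demo()
	fmt.Println("ok.txt:", inside)
	fmt.Println("../secret.txt:", traversal)
	fmt.Println("link:", symlink)
}
```

Comparing the resolved path against the resolved base is what catches the symlink case: a purely lexical check would accept `link` because the string itself never leaves the base directory.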
Sandbox Hardening
Containers run with hardened security defaults:
# configs/worker/homelab-sandbox.yaml
sandbox:
  network_mode: "none" # No network access by default
  read_only_root: true # Read-only filesystem
  no_new_privileges: true # Prevent privilege escalation
  drop_all_caps: true # Drop all capabilities
  allowed_caps: [] # Add CAP_ only if required
  user_ns: true # User namespace isolation
  run_as_uid: 1000 # Run as non-root user
  run_as_gid: 1000
  seccomp_profile: "default-hardened" # Restricted syscall profile
  max_runtime_hours: 24
  max_upload_size_bytes: 10737418240 # 10 GiB
  max_upload_rate_bps: 104857600 # 100 MiB/s
  max_uploads_per_minute: 10
Seccomp Profile (configs/seccomp/default-hardened.json):
- Blocks: ptrace, mount, umount2, reboot, kexec_load
- Blocks: open_by_handle_at, perf_event_open
- Default action: SCMP_ACT_ERRNO (deny by default)
Secrets Management
Environment Variable Expansion:
# config.yaml - use ${VAR} syntax for secrets
redis_password: "${REDIS_PASSWORD}"
snapshot_store:
  access_key: "${AWS_ACCESS_KEY_ID}"
  secret_key: "${AWS_SECRET_ACCESS_KEY}"
Plaintext Detection: The system detects and rejects plaintext secrets using:
- Shannon entropy calculation (>4 bits/char indicates secret)
- Pattern matching: AWS keys (AKIA, ASIA), GitHub tokens (ghp_), etc.
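The two detection heuristics can be sketched in a few lines of Go. The function names and the prefix list below are assumptions for illustration; only the 4 bits/char threshold and the `AKIA`, `ASIA`, and `ghp_` prefixes come from this guide.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// shannonEntropy returns the entropy of s in bits per character.
func shannonEntropy(s string) float64 {
	counts := make(map[rune]int)
	n := 0
	for _, r := range s {
		counts[r]++
		n++
	}
	var h float64
	for _, c := range counts {
		p := float64(c) / float64(n)
		h -= p * math.Log2(p)
	}
	return h
}

// knownPrefixes lists the token prefixes named in this guide.
var knownPrefixes = []string{"AKIA", "ASIA", "ghp_"}

// looksLikePlaintextSecret applies the prefix check first, then the
// entropy threshold of 4 bits per character.
func looksLikePlaintextSecret(v string) bool {
	for _, p := range knownPrefixes {
		if strings.HasPrefix(v, p) {
			return true
		}
	}
	return shannonEntropy(v) > 4.0
}

func main() {
	for _, v := range []string{"localhost", "AKIAIOSFODNN7EXAMPLE", "abcdefghijklmnopqrstuvwxyz012345"} {
		fmt.Printf("%-34s entropy=%.2f secret=%v\n", v, shannonEntropy(v), looksLikePlaintextSecret(v))
	}
}
```

One limitation worth knowing: a string with 16 or fewer distinct characters can never exceed 4 bits/char, so short or repetitive passwords pass the entropy check and rely on pattern matching instead.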
Loading Process:
- Config loaded from YAML
- Environment variables expanded (${VAR} → value)
- Plaintext secrets detected and rejected
- Validation fails if secrets don't use env reference syntax
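A minimal sketch of the reference check and expansion for a single secret field follows. It assumes the raw YAML value is inspected before expansion, which is one plausible reading of the steps above; `resolveSecret` and its `lookup` parameter are hypothetical names, not FetchML's loader API.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
)

// envRef matches a value that consists of exactly one ${VAR} reference.
var envRef = regexp.MustCompile(`^\$\{([A-Za-z_][A-Za-z0-9_]*)\}$`)

// resolveSecret rejects raw values that are not written as ${VAR} and
// returns the value that lookup reports for VAR. Passing lookup in makes the
// function easy to exercise without touching the real environment.
func resolveSecret(field, raw string, lookup func(string) (string, bool)) (string, error) {
	m := envRef.FindStringSubmatch(raw)
	if m == nil {
		return "", fmt.Errorf("%s: plaintext value rejected, use env reference syntax", field)
	}
	val, ok := lookup(m[1])
	if !ok || val == "" {
		return "", fmt.Errorf("%s: environment variable %s is not set", field, m[1])
	}
	return val, nil
}

func main() {
	// In a real loader the raw values would come from the parsed YAML.
	for field, raw := range map[string]string{
		"redis_password": "${REDIS_PASSWORD}",
		"access_key":     "AKIAIOSFODNN7EXAMPLE",
	} {
		if _, err := resolveSecret(field, raw, os.LookupEnv); err != nil {
			fmt.Println("config error:", err)
			continue
		}
		fmt.Println(field, "resolved from environment")
	}
}
```

Failing on an unset variable, rather than silently expanding to an empty string, avoids starting a service with a blank password.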
HIPAA-Compliant Audit Logging
Tamper-Evident Logging:
// Each event includes chain hash for integrity
audit.Log(audit.Event{
	EventType: audit.EventFileRead,
	UserID:    "user1",
	Resource:  "/data/file.txt",
})
Event Types:
- file_read - File access logged
- file_write - File modification logged
- file_delete - File deletion logged
- auth_success / auth_failure - Authentication events
- job_queued / job_started / job_completed - Job lifecycle
Chain Hashing:
- Each event includes SHA-256 hash of previous event
- Modification of any log entry breaks the chain
- VerifyChain() function detects tampering
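To make the chain property concrete, here is a minimal Go sketch of a SHA-256 hash chain. The `Event` layout, the field encoding, and the lowercase `verifyChain` are assumptions for this example; FetchML's `audit` package and its `VerifyChain()` may store and hash events differently.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Event is an illustrative audit record. PrevHash links it to the previous
// record; Hash covers PrevHash plus the record's own fields.
type Event struct {
	EventType string
	UserID    string
	Resource  string
	PrevHash  string
	Hash      string
}

// hashEvent hashes the previous hash together with the event fields.
// Each field is length-prefixed so that ("ab", "c") and ("a", "bc") differ.
func hashEvent(e Event) string {
	h := sha256.New()
	for _, f := range []string{e.PrevHash, e.EventType, e.UserID, e.Resource} {
		fmt.Fprintf(h, "%d:%s;", len(f), f)
	}
	return hex.EncodeToString(h.Sum(nil))
}

// appendEvent links e to the last record in log and appends it.
func appendEvent(log []Event, e Event) []Event {
	if len(log) > 0 {
		e.PrevHash = log[len(log)-1].Hash
	}
	e.Hash = hashEvent(e)
	return append(log, e)
}

// verifyChain returns the index of the first record that fails
// verification, or -1 if the whole chain is intact.
func verifyChain(log []Event) int {
	prev := ""
	for i, e := range log {
		if e.PrevHash != prev || e.Hash != hashEvent(e) {
			return i
		}
		prev = e.Hash
	}
	return -1
}

func main() {
	var log []Event
	log = appendEvent(log, Event{EventType: "auth_success", UserID: "user1"})
	log = appendEvent(log, Event{EventType: "file_read", UserID: "user1", Resource: "/data/file.txt"})
	log = appendEvent(log, Event{EventType: "file_delete", UserID: "user1", Resource: "/data/file.txt"})
	fmt.Println("first broken index before tampering:", verifyChain(log))

	log[1].Resource = "/data/other.txt" // simulate an edit to a stored record
	fmt.Println("first broken index after tampering:", verifyChain(log))
}
```

Because every hash covers the previous one, an attacker who edits one record must recompute every later hash as well. Anchoring the latest hash somewhere outside the log is what makes that rewrite detectable.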
Security Checklist
Initial Setup
- Generate Strong Passwords
# Grafana admin password
openssl rand -base64 32 > .grafana-password
# Redis password
openssl rand -base64 32
- Configure Environment Variables
cp .env.example .env
# Edit .env and set:
# - GRAFANA_ADMIN_PASSWORD
- Enable TLS (Production only)
# configs/api/prod.yaml
server:
  tls:
    enabled: true
    cert_file: "/secrets/cert.pem"
    key_file: "/secrets/key.pem"
- Configure Firewall
# Allow only necessary ports
sudo ufw allow 22/tcp # SSH
sudo ufw allow 443/tcp # HTTPS
sudo ufw allow 80/tcp # HTTP (redirect to HTTPS)
sudo ufw enable
Production Hardening
- Restrict IP Access
# configs/api/prod.yaml
auth:
  ip_whitelist:
    - "10.0.0.0/8"
    - "192.168.0.0/16"
    - "127.0.0.1"
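The whitelist mixes CIDR blocks with a bare address, so a matcher has to handle both. The sketch below shows one way to evaluate such a list with Go's `net/netip` package. `parseWhitelist` and `allowed` are hypothetical names, and treating a bare address as a single-host prefix is an assumption about how the entries are meant to be read.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// parseWhitelist accepts CIDR blocks and bare addresses. A bare address is
// stored as a prefix that covers exactly that one host.
func parseWhitelist(entries []string) ([]netip.Prefix, error) {
	prefixes := make([]netip.Prefix, 0, len(entries))
	for _, e := range entries {
		if strings.Contains(e, "/") {
			p, err := netip.ParsePrefix(e)
			if err != nil {
				return nil, fmt.Errorf("invalid CIDR %q: %w", e, err)
			}
			prefixes = append(prefixes, p.Masked())
			continue
		}
		a, err := netip.ParseAddr(e)
		if err != nil {
			return nil, fmt.Errorf("invalid address %q: %w", e, err)
		}
		prefixes = append(prefixes, netip.PrefixFrom(a, a.BitLen()))
	}
	return prefixes, nil
}

// allowed reports whether remote falls inside any whitelisted prefix.
// Unparseable input is denied.
func allowed(prefixes []netip.Prefix, remote string) bool {
	addr, err := netip.ParseAddr(remote)
	if err != nil {
		return false
	}
	addr = addr.Unmap() // treat IPv4-mapped IPv6 addresses as IPv4
	for _, p := range prefixes {
		if p.Contains(addr) {
			return true
		}
	}
	return false
}

func main() {
	wl, err := parseWhitelist([]string{"10.0.0.0/8", "192.168.0.0/16", "127.0.0.1"})
	if err != nil {
		panic(err)
	}
	for _, ip := range []string{"10.1.2.3", "127.0.0.1", "8.8.8.8"} {
		fmt.Println(ip, "allowed:", allowed(wl, ip))
	}
}
```

When the API sits behind a reverse proxy, the address to check is the client address forwarded by that proxy, not the proxy's own, and it should only be trusted if the proxy is the sole route in.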
- Enable Audit Logging
logging:
  level: "info"
  audit: true
  file: "/var/log/fetch_ml/audit.log"
- Harden Redis
# Redis security
redis-cli CONFIG SET requirepass "your-strong-password"
# rename-command cannot be changed with CONFIG SET; add these lines to redis.conf and restart Redis:
# rename-command FLUSHDB ""
# rename-command FLUSHALL ""
- Secure Grafana
# Change default admin password
docker-compose exec grafana grafana-cli admin reset-admin-password new-strong-password
- Regular Updates
# Update system packages
sudo apt update && sudo apt upgrade -y
# Update containers
docker-compose pull
docker-compose up -d # testing only
Password Management
Generate Secure Passwords
# Method 1: OpenSSL
openssl rand -base64 32
# Method 2: pwgen (if installed)
pwgen -s 32 1
# Method 3: /dev/urandom
head -c 32 /dev/urandom | base64
Store Passwords Securely
Development: Use .env file (gitignored)
echo "REDIS_PASSWORD=$(openssl rand -base64 32)" >> .env
echo "GRAFANA_ADMIN_PASSWORD=$(openssl rand -base64 32)" >> .env
Production: Use systemd environment files
sudo mkdir -p /etc/fetch_ml/secrets
sudo chmod 700 /etc/fetch_ml/secrets
echo "REDIS_PASSWORD=..." | sudo tee /etc/fetch_ml/secrets/redis.env
sudo chmod 600 /etc/fetch_ml/secrets/redis.env
API Key Management
Generate API Keys
# Generate random API key
openssl rand -hex 32
# Hash for storage
echo -n "your-api-key" | sha256sum
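The same two steps can be done in Go with the standard library. This is a sketch, not FetchML's code: `newAPIKey`, `hashAPIKey`, and `keyMatches` are hypothetical names, and the constant-time comparison is an assumption about how a server might check a presented key against the stored hash.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// newAPIKey returns 32 random bytes as 64 hex characters,
// the same shape as `openssl rand -hex 32`.
func newAPIKey() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

// hashAPIKey returns the hex SHA-256 digest of the key bytes,
// matching `echo -n "$key" | sha256sum`.
func hashAPIKey(key string) string {
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])
}

// keyMatches hashes the presented key and compares it with the stored hash
// in constant time.
func keyMatches(presented, storedHash string) bool {
	return subtle.ConstantTimeCompare([]byte(hashAPIKey(presented)), []byte(storedHash)) == 1
}

func main() {
	key, err := newAPIKey()
	if err != nil {
		panic(err)
	}
	stored := hashAPIKey(key)
	fmt.Println("hash for config:", stored)
	fmt.Println("correct key accepted:", keyMatches(key, stored))
	fmt.Println("wrong key accepted:", keyMatches("guess", stored))
}
```

Only the hash goes into the server config; the raw key is shown to the user once and never stored.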
Rotate API Keys
- Generate new API key
- Update your chosen API server config (for example a private copy of configs/api/homelab-secure.yaml) with the new hash
- Distribute new key to users
- Remove old key after grace period
Revoke API Keys
Remove user entry from your API server config file:
auth:
  api_keys:
    # user_to_revoke: # Comment out or delete
Secret Flow (What lives where)
- API server config (configs/api/*.yaml)
  - Stores SHA256 hashes of API keys (never raw keys).
  - The repo-shipped configs intentionally contain CHANGE_ME_... placeholders.
  - For real deployments, make a private copy (e.g. /etc/fetch_ml/config.yaml) and fill in real hashes.
- Docker Compose .env / secret files
  - Used for values that should not be committed (e.g. REDIS_PASSWORD, Grafana admin password).
  - deployments/docker-compose.homelab-secure.yml requires REDIS_PASSWORD to be set explicitly.
- TLS certs
  - Provided as mounted files (e.g. /app/ssl/cert.pem, /app/ssl/key.pem).
Network Security
Production Network Topology
Internet
↓
[Firewall] (ports 3000, 9102)
↓
[Reverse Proxy] (nginx/Apache) - TLS termination
↓
┌─────────────────────┐
│ Application Pod │
│ │
│ ┌──────────────┐ │
│ │ API Server │ │ ← Public (via reverse proxy)
│ └──────────────┘ │
│ │
│ ┌──────────────┐ │
│ │ Redis │ │ ← Internal only
│ └──────────────┘ │
│ │
│ ┌──────────────┐ │
│ │ Grafana │ │ ← Public (via reverse proxy)
│ └──────────────┘ │
│ │
│ ┌──────────────┐ │
│ │ Prometheus │ │ ← Internal only
│ └──────────────┘ │
│ │
│ ┌──────────────┐ │
│ │ Loki │ │ ← Internal only
│ └──────────────┘ │
└─────────────────────┘
Recommended Firewall Rules
# Allow only necessary inbound connections
sudo firewall-cmd --permanent --zone=public --add-rich-rule='
rule family="ipv4"
source address="YOUR_NETWORK"
port port="3000" protocol="tcp" accept'
sudo firewall-cmd --permanent --zone=public --add-rich-rule='
rule family="ipv4"
source address="YOUR_NETWORK"
port port="9102" protocol="tcp" accept'
# Block all other traffic
sudo firewall-cmd --permanent --set-default-zone=drop
sudo firewall-cmd --reload
Incident Response
Suspected Breach
- Immediate Actions
  - Rotate all API keys
  - Stop affected services
  - Review audit logs
- Investigation
# Check recent logins
sudo journalctl -u fetchml-api --since "1 hour ago"
# Review failed auth attempts
grep "authentication failed" /var/log/fetch_ml/*.log
# Check active connections
ss -tnp | grep :9102
- Recovery
  - Rotate all passwords and API keys
  - Update firewall rules
  - Patch vulnerabilities
  - Resume services
Security Monitoring
# Monitor failed authentication
tail -f /var/log/fetch_ml/api.log | grep "auth.*failed"
# Monitor unusual activity
journalctl -u fetchml-api -f | grep -E "(ERROR|WARN)"
# Check open ports
nmap -p- localhost
Security Best Practices
- Principle of Least Privilege: Grant minimum necessary permissions
- Defense in Depth: Multiple security layers (firewall + auth + TLS)
- Regular Updates: Keep all components patched
- Audit Regularly: Review logs and access patterns
- Secure Secrets: Never commit passwords/keys to git
- Network Segmentation: Isolate services with internal networks
- Monitor Everything: Enable comprehensive logging and alerting
- Test Security: Regular penetration testing and vulnerability scans
Compliance
Data Privacy
- Logs are sanitized (no passwords/API keys)
- Experiment data is user-isolated
- No telemetry or external data sharing
Audit Trail
All API access is logged with:
- Timestamp
- User/API key
- Action performed
- Source IP
- Result (success/failure)
Getting Help
- Security Issues: Report privately via email
- Questions: See documentation or create issue
- Updates: Monitor releases for security patches