Updates documentation with new security features and hardening guide:

**CHANGELOG.md:**
- Added detailed security hardening section (2026-02-23)
- Documents all phases: file ingestion, sandbox, secrets, audit logging, tests
- Lists specific files changed and security controls implemented

**docs/src/security.md:**
- Added Overview section with defense-in-depth layers
- Added Comprehensive Security Hardening section with:
  - File ingestion security with code examples
  - Sandbox hardening with complete YAML config
  - Secrets management with env expansion syntax
  - HIPAA audit logging with tamper-evident chain hashing

422 lines
12 KiB
Markdown
# Security Guide

This document outlines security features, best practices, and hardening procedures for FetchML.

## Overview

FetchML implements defense-in-depth security with multiple layers of protection:

1. **File Ingestion Security** - Path traversal prevention, file type validation
2. **Sandbox Hardening** - Container isolation with seccomp, capability dropping
3. **Secrets Management** - Environment-based credential injection with plaintext detection
4. **Audit Logging** - Tamper-evident logging for compliance (HIPAA)
5. **Authentication** - API key-based access control with RBAC

---

## Security Features

### Authentication & Authorization
- **API Keys**: Stored as SHA-256 hashes, with role-based access control (RBAC)
- **Permissions**: Granular read/write/delete permissions per user
- **IP Whitelisting**: Network-level access control
- **Rate Limiting**: Per-user request quotas

### Communication Security
- **TLS/HTTPS**: Encryption in transit for API traffic
- **WebSocket Auth**: API key required before upgrade
- **Redis Auth**: Password-protected task queue

### Data Privacy
- **Log Sanitization**: Automatically redacts API keys, passwords, tokens
- **Experiment Isolation**: User-specific experiment directories
- **No Anonymous Access**: All services require authentication

### Network Security
- **Internal Networks**: Backend services (Redis, Loki) not exposed publicly
- **Firewall Rules**: Restrictive port access
- **Container Isolation**: Services run in separate containers/pods

---

## Comprehensive Security Hardening (2026-02)

### File Ingestion Security

All file operations are protected against path traversal attacks:

```go
// All paths are validated with symlink resolution
validator := fileutil.NewSecurePathValidator(basePath)
cleanPath, err := validator.ValidatePath(userInput)
if err != nil {
	return fmt.Errorf("path validation failed: %w", err)
}
```

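To make the boundary check concrete, here is a minimal, self-contained sketch of symlink-aware path validation. It is not FetchML's `SecurePathValidator`; the helper names `validatePath` and `demo` are hypothetical, and the sketch assumes that only already-existing files need to be validated.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// validatePath is a hypothetical stand-in for ValidatePath. It resolves
// symlinks, then rejects any result that lies outside base.
func validatePath(base, userInput string) (string, error) {
	realBase, err := filepath.EvalSymlinks(base)
	if err != nil {
		return "", fmt.Errorf("resolve base: %w", err)
	}
	// Join cleans the path, so "a/../../x" is collapsed lexically first.
	candidate := filepath.Join(realBase, userInput)
	// EvalSymlinks fails for missing files, so this sketch only
	// accepts paths that already exist.
	resolved, err := filepath.EvalSymlinks(candidate)
	if err != nil {
		return "", fmt.Errorf("resolve path: %w", err)
	}
	rel, err := filepath.Rel(realBase, resolved)
	if err != nil {
		return "", fmt.Errorf("compare paths: %w", err)
	}
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", errors.New("path escapes base directory")
	}
	return resolved, nil
}

// demo builds a throwaway directory and validates three inputs: a regular
// file, a ".." traversal, and a symlink that points outside the base.
func demo() (okErr, dotdotErr, symlinkErr error) {
	base, err := os.MkdirTemp("", "ingest")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(base)
	if err := os.WriteFile(filepath.Join(base, "model.gguf"), []byte("x"), 0o600); err != nil {
		panic(err)
	}
	if err := os.Symlink(filepath.Dir(base), filepath.Join(base, "link")); err != nil {
		panic(err)
	}
	_, okErr = validatePath(base, "model.gguf")
	_, dotdotErr = validatePath(base, "..")
	_, symlinkErr = validatePath(base, "link")
	return okErr, dotdotErr, symlinkErr
}

func main() {
	okErr, dotdotErr, symlinkErr := demo()
	fmt.Println("regular file:", okErr)
	fmt.Println("dot-dot:", dotdotErr)
	fmt.Println("symlink:", symlinkErr)
}
```

A check like this is still subject to time-of-check/time-of-use races if the directory can change between validation and use, so a real validator would likely open the file relative to an already-opened base directory as well.
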
**Features:**
- Symlink resolution and canonicalization
- Path boundary enforcement (cannot escape base directory)
- Magic bytes validation for ML artifacts (safetensors, GGUF, HDF5)
- Dangerous extension blocking (.pt, .pkl, .exe, .sh)
- Upload limits (size, rate, frequency)

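The magic-bytes and extension checks listed above could look roughly like the following sketch. The function names are hypothetical and this is not FetchML's implementation. The GGUF and HDF5 signatures are fixed byte strings; safetensors has no fixed magic number, so the sketch uses a heuristic (an 8-byte little-endian header length followed by a JSON object), and the 100 MiB cap on that header is an assumed sanity limit.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"path/filepath"
	"strings"
)

// blockedExt mirrors the dangerous extensions listed in this section.
var blockedExt = map[string]bool{".pt": true, ".pkl": true, ".exe": true, ".sh": true}

// isBlockedExtension compares the extension case-insensitively.
func isBlockedExtension(name string) bool {
	return blockedExt[strings.ToLower(filepath.Ext(name))]
}

// looksLikeSafetensors is a heuristic: an 8-byte little-endian header
// length followed by the opening brace of the JSON header.
func looksLikeSafetensors(head []byte) bool {
	if len(head) < 9 {
		return false
	}
	n := binary.LittleEndian.Uint64(head[:8])
	return n > 0 && n < 100<<20 && head[8] == '{'
}

// detectFormat inspects the first bytes of an upload and returns the
// recognized format, or "" when nothing matches.
func detectFormat(head []byte) string {
	switch {
	case bytes.HasPrefix(head, []byte("GGUF")):
		return "gguf"
	case bytes.HasPrefix(head, []byte("\x89HDF\r\n\x1a\n")):
		return "hdf5"
	case looksLikeSafetensors(head):
		return "safetensors"
	}
	return ""
}

func main() {
	fmt.Println(detectFormat([]byte("GGUF\x03\x00\x00\x00")))
	fmt.Println(detectFormat([]byte("\x80\x04pickle-like data")) == "")
	fmt.Println(isBlockedExtension("weights.PKL"))
}
```

Checking content as well as the extension matters because a renamed pickle file would pass an extension-only filter.
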
### Sandbox Hardening

Containers run with hardened security defaults:

```yaml
# configs/worker/homelab-sandbox.yaml
sandbox:
  network_mode: "none"                 # No network access by default
  read_only_root: true                 # Read-only filesystem
  no_new_privileges: true              # Prevent privilege escalation
  drop_all_caps: true                  # Drop all capabilities
  allowed_caps: []                     # Add CAP_ only if required
  user_ns: true                        # User namespace isolation
  run_as_uid: 1000                     # Run as non-root user
  run_as_gid: 1000
  seccomp_profile: "default-hardened"  # Restricted syscall profile
  max_runtime_hours: 24
  max_upload_size_bytes: 10737418240   # 10GB
  max_upload_rate_bps: 104857600       # 100MB/s
  max_uploads_per_minute: 10
```

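As an illustration of what these settings mean at the container-runtime level, the sketch below maps a subset of them onto `docker run` flags. This assumes a Docker-compatible runtime, which the configuration above does not state; the `Sandbox` struct and `runArgs` function are hypothetical. `user_ns` is left out because user-namespace setup differs between runtimes.

```go
package main

import (
	"fmt"
	"strings"
)

// Sandbox is a hypothetical in-memory form of the YAML settings above.
type Sandbox struct {
	NetworkMode     string
	ReadOnlyRoot    bool
	NoNewPrivileges bool
	DropAllCaps     bool
	AllowedCaps     []string
	RunAsUID        int
	RunAsGID        int
	SeccompPath     string // path to the seccomp profile JSON on the host
}

// runArgs translates the sandbox settings into docker run flags.
func runArgs(s Sandbox) []string {
	args := []string{"run", "--rm"}
	if s.NetworkMode != "" {
		args = append(args, "--network="+s.NetworkMode)
	}
	if s.ReadOnlyRoot {
		args = append(args, "--read-only")
	}
	if s.NoNewPrivileges {
		args = append(args, "--security-opt=no-new-privileges")
	}
	if s.DropAllCaps {
		args = append(args, "--cap-drop=ALL")
	}
	for _, c := range s.AllowedCaps {
		args = append(args, "--cap-add="+c)
	}
	args = append(args, fmt.Sprintf("--user=%d:%d", s.RunAsUID, s.RunAsGID))
	if s.SeccompPath != "" {
		args = append(args, "--security-opt=seccomp="+s.SeccompPath)
	}
	return args
}

// hasArg reports whether want appears in args.
func hasArg(args []string, want string) bool {
	for _, a := range args {
		if a == want {
			return true
		}
	}
	return false
}

func main() {
	s := Sandbox{
		NetworkMode:     "none",
		ReadOnlyRoot:    true,
		NoNewPrivileges: true,
		DropAllCaps:     true,
		RunAsUID:        1000,
		RunAsGID:        1000,
		SeccompPath:     "configs/seccomp/default-hardened.json",
	}
	fmt.Println("docker " + strings.Join(runArgs(s), " "))
}
```
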
**Seccomp Profile** (`configs/seccomp/default-hardened.json`):
- Blocks: `ptrace`, `mount`, `umount2`, `reboot`, `kexec_load`
- Blocks: `open_by_handle_at`, `perf_event_open`
- Default action: `SCMP_ACT_ERRNO` (deny by default)

### Secrets Management

**Environment Variable Expansion:**
```yaml
# config.yaml - use ${VAR} syntax for secrets
redis_password: "${REDIS_PASSWORD}"
snapshot_store:
  access_key: "${AWS_ACCESS_KEY_ID}"
  secret_key: "${AWS_SECRET_ACCESS_KEY}"
```

**Plaintext Detection:**
The system detects and rejects plaintext secrets using:
- Shannon entropy calculation (values above 4 bits per character are treated as likely secrets)
- Pattern matching: AWS keys (`AKIA`, `ASIA`), GitHub tokens (`ghp_`), etc.

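The two detection signals can be sketched as follows. This is an illustrative approximation, not FetchML's detector: the function names are hypothetical, and only the three prefixes named above are checked.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// shannonEntropy returns the entropy of s in bits per character,
// computed from the frequency of each character in the string.
func shannonEntropy(s string) float64 {
	counts := map[rune]int{}
	n := 0
	for _, r := range s {
		counts[r]++
		n++
	}
	var h float64
	for _, c := range counts {
		p := float64(c) / float64(n)
		h -= p * math.Log2(p)
	}
	return h
}

// looksLikePlaintextSecret applies the prefix patterns first, then the
// entropy threshold of 4 bits per character.
func looksLikePlaintextSecret(v string) bool {
	for _, prefix := range []string{"AKIA", "ASIA", "ghp_"} {
		if strings.HasPrefix(v, prefix) {
			return true
		}
	}
	return shannonEntropy(v) > 4.0
}

func main() {
	for _, v := range []string{"localhost", "AKIAIOSFODNN7EXAMPLE", "k3J9xQ2mLp7VzT0aYbR5cWn8dHfGuE1s"} {
		fmt.Printf("%-34s entropy=%.2f secret=%v\n", v, shannonEntropy(v), looksLikePlaintextSecret(v))
	}
}
```

One limitation of a pure entropy threshold: a string of length n cannot exceed log2(n) bits per character, so values of 16 characters or fewer never cross 4 bits/char. That is presumably why the pattern matching is applied alongside it.
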
**Loading Process:**
1. Config loaded from YAML
2. Environment variables expanded (`${VAR}` → value)
3. Plaintext secrets detected and rejected
4. Validation fails if secrets don't use env reference syntax

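A simplified sketch of the reference check and expansion for a single secret field is shown below. The `resolveSecret` helper is hypothetical, and the sketch validates the raw value before expanding it, which may differ from the order in which FetchML performs the steps above.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
)

// envRef matches a value that consists solely of one ${VAR} reference.
var envRef = regexp.MustCompile(`^\$\{([A-Za-z_][A-Za-z0-9_]*)\}$`)

// resolveSecret rejects literal values and resolves ${VAR} references
// through lookup, which has the same shape as os.LookupEnv.
func resolveSecret(field, raw string, lookup func(string) (string, bool)) (string, error) {
	m := envRef.FindStringSubmatch(raw)
	if m == nil {
		return "", fmt.Errorf("%s: secrets must use ${VAR} syntax, found a literal value", field)
	}
	val, ok := lookup(m[1])
	if !ok || val == "" {
		return "", fmt.Errorf("%s: environment variable %s is not set", field, m[1])
	}
	return val, nil
}

func main() {
	os.Setenv("REDIS_PASSWORD", "example-only")

	v, err := resolveSecret("redis_password", "${REDIS_PASSWORD}", os.LookupEnv)
	fmt.Println(v != "", err)

	_, err = resolveSecret("redis_password", "hunter2", os.LookupEnv)
	fmt.Println(err)
}
```

Passing the lookup function as a parameter keeps the check testable without touching the real process environment.
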
### HIPAA-Compliant Audit Logging

**Tamper-Evident Logging:**
```go
// Each event includes chain hash for integrity
audit.Log(audit.Event{
	EventType: audit.EventFileRead,
	UserID:    "user1",
	Resource:  "/data/file.txt",
})
```

**Event Types:**
- `file_read` - File access logged
- `file_write` - File modification logged
- `file_delete` - File deletion logged
- `auth_success` / `auth_failure` - Authentication events
- `job_queued` / `job_started` / `job_completed` - Job lifecycle

**Chain Hashing:**
- Each event includes SHA-256 hash of previous event
- Modification of any log entry breaks the chain
- `VerifyChain()` function detects tampering

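The chain-hashing idea can be illustrated with a small in-memory sketch. The `Event` struct, `appendEvent`, and `verifyChain` below are hypothetical simplifications and do not reflect the actual `audit` package types or its `VerifyChain()` signature.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Event is a hypothetical, simplified audit record.
type Event struct {
	EventType string
	UserID    string
	Resource  string
	PrevHash  string // hash of the previous event, "" for the first
	Hash      string // hash over PrevHash plus this event's fields
}

// hashEvent hashes the previous hash together with the event fields.
// %q quoting keeps field boundaries unambiguous.
func hashEvent(e Event) string {
	data := fmt.Sprintf("%q|%q|%q|%q", e.PrevHash, e.EventType, e.UserID, e.Resource)
	sum := sha256.Sum256([]byte(data))
	return hex.EncodeToString(sum[:])
}

// appendEvent links e to the tail of the chain and returns the new chain.
func appendEvent(chain []Event, e Event) []Event {
	if len(chain) > 0 {
		e.PrevHash = chain[len(chain)-1].Hash
	}
	e.Hash = hashEvent(e)
	return append(chain, e)
}

// verifyChain returns the index of the first inconsistent event, or -1
// when every link and every hash checks out.
func verifyChain(chain []Event) int {
	prev := ""
	for i, e := range chain {
		if e.PrevHash != prev || e.Hash != hashEvent(e) {
			return i
		}
		prev = e.Hash
	}
	return -1
}

func main() {
	var chain []Event
	chain = appendEvent(chain, Event{EventType: "auth_success", UserID: "user1"})
	chain = appendEvent(chain, Event{EventType: "file_read", UserID: "user1", Resource: "/data/file.txt"})
	chain = appendEvent(chain, Event{EventType: "file_delete", UserID: "user1", Resource: "/data/file.txt"})
	fmt.Println("intact:", verifyChain(chain))

	chain[1].Resource = "/data/other.txt" // simulate tampering
	fmt.Println("after tampering:", verifyChain(chain))
}
```

A chain like this detects edits to individual entries, but someone able to rewrite the whole log could recompute every hash, so the most recent hash generally needs to be stored or published somewhere the log writer cannot modify.
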
---

## Security Checklist

### Initial Setup

1. **Generate Strong Passwords**
   ```bash
   # Grafana admin password
   openssl rand -base64 32 > .grafana-password

   # Redis password
   openssl rand -base64 32
   ```

2. **Configure Environment Variables**
   ```bash
   cp .env.example .env
   # Edit .env and set:
   # - GRAFANA_ADMIN_PASSWORD
   ```

3. **Enable TLS** (Production only)
   ```yaml
   # configs/api/prod.yaml
   server:
     tls:
       enabled: true
       cert_file: "/secrets/cert.pem"
       key_file: "/secrets/key.pem"
   ```

4. **Configure Firewall**
   ```bash
   # Allow only necessary ports
   sudo ufw allow 22/tcp   # SSH
   sudo ufw allow 443/tcp  # HTTPS
   sudo ufw allow 80/tcp   # HTTP (redirect to HTTPS)
   sudo ufw enable
   ```

### Production Hardening

5. **Restrict IP Access**
   ```yaml
   # configs/api/prod.yaml
   auth:
     ip_whitelist:
       - "10.0.0.0/8"
       - "192.168.0.0/16"
       - "127.0.0.1"
   ```

6. **Enable Audit Logging**
   ```yaml
   logging:
     level: "info"
     audit: true
     file: "/var/log/fetch_ml/audit.log"
   ```

7. **Harden Redis**
   ```bash
   # Set a password at runtime (also persist it as requirepass in redis.conf)
   redis-cli CONFIG SET requirepass "your-strong-password"

   # rename-command cannot be changed with CONFIG SET.
   # Add these lines to redis.conf and restart Redis:
   #   rename-command FLUSHDB ""
   #   rename-command FLUSHALL ""
   ```

8. **Secure Grafana**
   ```bash
   # Change default admin password
   docker-compose exec grafana grafana-cli admin reset-admin-password new-strong-password
   ```

9. **Regular Updates**
   ```bash
   # Update system packages
   sudo apt update && sudo apt upgrade -y

   # Update containers
   docker-compose pull
   docker-compose up -d  # testing only
   ```

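For readers wondering how an `ip_whitelist` such as the one in step 5 might be evaluated, here is a small sketch using Go's standard `net/netip` package. The function names are hypothetical and this is not necessarily how FetchML matches addresses; entries without a prefix length (such as `127.0.0.1`) are treated as single-host prefixes.

```go
package main

import (
	"fmt"
	"net/netip"
)

// parseAllowlist accepts CIDR prefixes and bare IP addresses.
func parseAllowlist(entries []string) ([]netip.Prefix, error) {
	var out []netip.Prefix
	for _, e := range entries {
		if p, err := netip.ParsePrefix(e); err == nil {
			out = append(out, p.Masked())
			continue
		}
		addr, err := netip.ParseAddr(e)
		if err != nil {
			return nil, fmt.Errorf("invalid allowlist entry %q: %w", e, err)
		}
		// A bare address becomes a prefix covering exactly one host.
		out = append(out, netip.PrefixFrom(addr, addr.BitLen()))
	}
	return out, nil
}

// allowed reports whether ip falls inside any prefix in list.
// Unparseable addresses are denied.
func allowed(list []netip.Prefix, ip string) bool {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return false
	}
	addr = addr.Unmap() // treat ::ffff:10.0.0.1 like 10.0.0.1
	for _, p := range list {
		if p.Contains(addr) {
			return true
		}
	}
	return false
}

func main() {
	list, err := parseAllowlist([]string{"10.0.0.0/8", "192.168.0.0/16", "127.0.0.1"})
	if err != nil {
		panic(err)
	}
	for _, ip := range []string{"10.1.2.3", "127.0.0.1", "8.8.8.8"} {
		fmt.Println(ip, allowed(list, ip))
	}
}
```

When the API server sits behind a reverse proxy, the address to check is the client address forwarded by the proxy rather than the proxy's own, and it should only be trusted when it comes from that proxy.
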
## Password Management

### Generate Secure Passwords

```bash
# Method 1: OpenSSL
openssl rand -base64 32

# Method 2: pwgen (if installed)
pwgen -s 32 1

# Method 3: /dev/urandom
head -c 32 /dev/urandom | base64
```

### Store Passwords Securely

**Development**: Use `.env` file (gitignored)
```bash
echo "REDIS_PASSWORD=$(openssl rand -base64 32)" >> .env
echo "GRAFANA_ADMIN_PASSWORD=$(openssl rand -base64 32)" >> .env
```

**Production**: Use systemd environment files
```bash
sudo mkdir -p /etc/fetch_ml/secrets
sudo chmod 700 /etc/fetch_ml/secrets
# Redirect tee's output so the secret is not echoed to the terminal
echo "REDIS_PASSWORD=..." | sudo tee /etc/fetch_ml/secrets/redis.env > /dev/null
sudo chmod 600 /etc/fetch_ml/secrets/redis.env
```

## API Key Management

### Generate API Keys

```bash
# Generate random API key
openssl rand -hex 32

# Hash for storage
echo -n "your-api-key" | sha256sum
```

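The same generate-and-hash flow, plus verification of a presented key against the stored hash, can be sketched in Go. The function names are hypothetical and this is not FetchML's authentication code; the constant-time comparison is a common precaution and is an assumption here, not something the configuration above specifies.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// generateAPIKey returns 32 random bytes as 64 hex characters,
// equivalent to `openssl rand -hex 32`.
func generateAPIKey() (string, error) {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// hashAPIKey returns the hex SHA-256 digest that goes into the config,
// equivalent to `echo -n "$key" | sha256sum`.
func hashAPIKey(key string) string {
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])
}

// keyMatches hashes the presented key and compares it with the stored
// hash in constant time.
func keyMatches(presented, storedHash string) bool {
	got := hashAPIKey(presented)
	return subtle.ConstantTimeCompare([]byte(got), []byte(storedHash)) == 1
}

func main() {
	key, err := generateAPIKey()
	if err != nil {
		panic(err)
	}
	stored := hashAPIKey(key)
	fmt.Println("hash for config:", stored)
	fmt.Println("correct key accepted:", keyMatches(key, stored))
	fmt.Println("wrong key accepted:", keyMatches("guess", stored))
}
```

Because only the hash is stored, the raw key is shown once at generation time and cannot be recovered from the config afterwards.
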
### Rotate API Keys

1. Generate new API key
2. Update your chosen API server config (for example a private copy of `configs/api/homelab-secure.yaml`) with the new hash
3. Distribute new key to users
4. Remove old key after grace period

### Revoke API Keys

Remove user entry from your API server config file:
```yaml
auth:
  api_keys:
    # user_to_revoke:  # Comment out or delete
```

## Secret Flow (What lives where)

- **API server config (`configs/api/*.yaml`)**
  - Stores **SHA256 hashes** of API keys (never raw keys).
  - The repo-shipped configs intentionally contain `CHANGE_ME_...` placeholders.
  - For real deployments, make a private copy (e.g. `/etc/fetch_ml/config.yaml`) and fill in real hashes.

- **Docker Compose `.env` / secret files**
  - Used for values that should not be committed (e.g. `REDIS_PASSWORD`, Grafana admin password).
  - `deployments/docker-compose.homelab-secure.yml` requires `REDIS_PASSWORD` to be set explicitly.

- **TLS certs**
  - Provided as mounted files (e.g. `/app/ssl/cert.pem`, `/app/ssl/key.pem`).

## Network Security

### Production Network Topology

```
Internet
    ↓
[Firewall] (ports 3000, 9102)
    ↓
[Reverse Proxy] (nginx/Apache) - TLS termination
    ↓
┌─────────────────────┐
│  Application Pod    │
│                     │
│  ┌──────────────┐   │
│  │  API Server  │   │ ← Public (via reverse proxy)
│  └──────────────┘   │
│                     │
│  ┌──────────────┐   │
│  │    Redis     │   │ ← Internal only
│  └──────────────┘   │
│                     │
│  ┌──────────────┐   │
│  │   Grafana    │   │ ← Public (via reverse proxy)
│  └──────────────┘   │
│                     │
│  ┌──────────────┐   │
│  │  Prometheus  │   │ ← Internal only
│  └──────────────┘   │
│                     │
│  ┌──────────────┐   │
│  │     Loki     │   │ ← Internal only
│  └──────────────┘   │
└─────────────────────┘
```

### Recommended Firewall Rules

```bash
# Allow only necessary inbound connections
sudo firewall-cmd --permanent --zone=public --add-rich-rule='
  rule family="ipv4"
  source address="YOUR_NETWORK"
  port port="3000" protocol="tcp" accept'

sudo firewall-cmd --permanent --zone=public --add-rich-rule='
  rule family="ipv4"
  source address="YOUR_NETWORK"
  port port="9102" protocol="tcp" accept'

# Block all other traffic
sudo firewall-cmd --permanent --set-default-zone=drop
sudo firewall-cmd --reload
```

## Incident Response

### Suspected Breach

1. **Immediate Actions**
   - Rotate all API keys
   - Stop affected services
   - Review audit logs

2. **Investigation**
   ```bash
   # Review recent API service logs (including logins)
   sudo journalctl -u fetchml-api --since "1 hour ago"

   # Review failed auth attempts
   grep "authentication failed" /var/log/fetch_ml/*.log

   # Check active connections
   ss -tnp | grep :9102
   ```

3. **Recovery**
   - Rotate all passwords and API keys
   - Update firewall rules
   - Patch vulnerabilities
   - Resume services

### Security Monitoring

```bash
# Monitor failed authentication
tail -f /var/log/fetch_ml/api.log | grep "auth.*failed"

# Monitor unusual activity
journalctl -u fetchml-api -f | grep -E "(ERROR|WARN)"

# Check open ports
nmap -p- localhost
```

## Security Best Practices

1. **Principle of Least Privilege**: Grant minimum necessary permissions
2. **Defense in Depth**: Multiple security layers (firewall + auth + TLS)
3. **Regular Updates**: Keep all components patched
4. **Audit Regularly**: Review logs and access patterns
5. **Secure Secrets**: Never commit passwords/keys to git
6. **Network Segmentation**: Isolate services with internal networks
7. **Monitor Everything**: Enable comprehensive logging and alerting
8. **Test Security**: Regular penetration testing and vulnerability scans

## Compliance

### Data Privacy
- Logs are sanitized (no passwords/API keys)
- Experiment data is user-isolated
- No telemetry or external data sharing

### Audit Trail
All API access is logged with:
- Timestamp
- User/API key
- Action performed
- Source IP
- Result (success/failure)

## Getting Help

- **Security Issues**: Report privately via email
- **Questions**: See documentation or create issue
- **Updates**: Monitor releases for security patches