fetch_ml/SECURITY.md
Jeremie Fraeys 6580917ba8
2026-02-17 12:34:28 -05:00


# Security Guide for Fetch ML Homelab
This guide covers security best practices for deploying Fetch ML in a homelab environment.
## Quick Setup
Secure setup requires manual configuration:
1. **Generate API keys**: Use the instructions in [API Security](#api-security) below
2. **Create TLS certificates**: Use OpenSSL commands in [Troubleshooting](#troubleshooting)
3. **Configure security**: Copy and edit `configs/api/homelab-secure.yaml`
4. **Set permissions**: Ensure `.api-keys` and `.env.secure` have 600 permissions
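
Steps 1 and 4 can be sketched in shell as follows. This is a minimal sketch, not the project's own tooling: the helper names and the `NAME=value` layout of `.api-keys` are assumptions (this guide only shows `grep "ADMIN_API_KEY" .api-keys`). It relies on `openssl rand` and `sha256sum` being installed.

```bash
#!/usr/bin/env bash
# Sketch only: helper names and the NAME=value layout of .api-keys are assumptions.

# Print a random 64-character hex API key (32 bytes of entropy).
generate_api_key() {
  openssl rand -hex 32
}

# Print the SHA-256 hex digest of a key, for comparison with the hash in the config.
hash_api_key() {
  printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

# Append NAME=key to a key file, creating it with owner-only permissions.
store_api_key() {
  local file="$1" name="$2" key="$3"
  ( umask 077; printf '%s=%s\n' "$name" "$key" >> "$file" )
  chmod 600 "$file"
}

# Example (uncomment to run):
# key="$(generate_api_key)"
# store_api_key .api-keys ADMIN_API_KEY "$key"
# echo "hash for config: $(hash_api_key "$key")"
```

The subshell with `umask 077` means a newly created file is never readable by other users, even briefly, before `chmod 600` runs.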
## Security Features
### Authentication
- **API Key Authentication**: API keys are stored as SHA-256 hashes
- **Role-based Access Control**: Admin, researcher, analyst roles
- **Permission System**: Granular permissions per resource
### Network Security
- **TLS/SSL**: HTTPS encrypted communication
- **IP Whitelisting**: Restrict access to trusted networks
- **Rate Limiting**: Prevent abuse and DoS attacks
- **Reverse Proxy**: Caddy with security headers
### Data Protection
- **Path Traversal Protection**: Prevents directory escape attacks
- **Package Validation**: Blocks dangerous Python packages
- **Input Validation**: Comprehensive input sanitization
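
To illustrate what path traversal protection means in practice, here is a minimal sketch of the usual approach: resolve the requested path and reject it unless it stays under an allowed base directory. `is_within_base` is a hypothetical helper, not Fetch ML's actual check, and it assumes GNU `realpath` (for the `-m` flag).

```bash
#!/usr/bin/env bash
# Sketch only: is_within_base is a hypothetical helper, not Fetch ML's actual check.
# Requires GNU realpath (coreutils) for the -m flag.

# Succeed only if the candidate path, resolved relative to base, stays under base.
is_within_base() {
  local base candidate
  base="$(realpath -m -- "$1")" || return 1
  candidate="$(realpath -m -- "$1/$2")" || return 1
  case "$candidate" in
    "$base"|"$base"/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (uncomment to run):
# is_within_base /srv/data "exp1/run.log"     && echo allowed
# is_within_base /srv/data "../../etc/passwd" || echo rejected
```

Comparing against `"$base"/*` rather than `"$base"*` matters: it stops a sibling directory such as `/srv/data2` from passing as if it were inside `/srv/data`.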
## Configuration Files
### Secure Config Location
- `configs/api/homelab-secure.yaml` - Main secure configuration
### API Keys
- `.api-keys` - Generated API keys (600 permissions)
- Never commit to version control
- Store in password manager
### TLS Certificates
- `ssl/cert.pem` - TLS certificate
- `ssl/key.pem` - Private key (600 permissions)
### Environment Variables
- `.env.secure` - JWT secret and other secrets (600 permissions)
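
A quick way to confirm the permissions listed above is a small audit helper. This is a sketch under assumptions: `check_private` is a hypothetical function, and the file list in the example simply mirrors the files named in this section.

```bash
#!/usr/bin/env bash
# Sketch only: check_private is a hypothetical helper, not part of Fetch ML.

# Report whether a file is readable and writable by its owner only (mode 600).
check_private() {
  local file="$1" mode
  [ -e "$file" ] || { echo "MISSING  $file"; return 2; }
  mode="$(ls -ld -- "$file" | cut -c1-10)"
  if [ "$mode" = "-rw-------" ]; then
    echo "OK       $file"
  else
    echo "INSECURE $file ($mode)"
    return 1
  fi
}

# Example (uncomment to run from the repository root):
# for f in .api-keys .env.secure ssl/key.pem; do check_private "$f"; done
```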
## Deployment Options
### Option 1: Docker Compose (Recommended)
```bash
# Configure secure setup manually (see Quick Setup above)
# Copy and edit the secure configuration
cp configs/api/homelab-secure.yaml configs/api/my-secure.yaml
# Edit with your API keys, TLS settings, and IP whitelist
# Deploy with security overlay
docker-compose -f docker-compose.yml -f docker-compose.homelab-secure.yml up -d
```
### Option 2: Direct Deployment
```bash
# Configure secure setup manually (see Quick Setup above)
# Copy and edit the secure configuration
cp configs/api/homelab-secure.yaml configs/api/my-secure.yaml
# Edit with your API keys, TLS settings, and IP whitelist
# Load environment variables
source .env.secure
# Start server
./api-server -config configs/api/my-secure.yaml
```
## Security Checklist
### Before Deployment
- [ ] Generate unique API keys (don't use defaults)
- [ ] Set strong JWT secret
- [ ] Enable TLS/SSL
- [ ] Configure IP whitelist for your network
- [ ] Set up rate limiting
- [ ] Enable Redis authentication
### Network Security
- [ ] Use HTTPS only (disable HTTP)
- [ ] Restrict API access to trusted IPs
- [ ] Use a reverse proxy (Caddy)
- [ ] Enable security headers
- [ ] Monitor access logs
### Data Protection
- [ ] Regular backups of configuration
- [ ] Secure storage of API keys
- [ ] Encrypt sensitive data at rest
- [ ] Regular security updates
### Monitoring
- [ ] Enable security logging
- [ ] Monitor failed authentication attempts
- [ ] Set up alerts for suspicious activity
- [ ] Regular security audits
## API Security
### Authentication Headers
```bash
# Use API key in header
curl -H "X-API-Key: your-api-key" https://localhost:9101/health
# Or Bearer token
curl -H "Authorization: Bearer your-api-key" https://localhost:9101/health
```
### Rate Limits
- Default: 60 requests per minute
- Burst: 10 requests
- Per IP address
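
To show how a sustained rate and a burst interact, here is a sketch of a per-IP token bucket using the numbers above. A token bucket is one common way to implement such limits; this guide does not say which algorithm Fetch ML uses, and `allow_request` and the variable names are hypothetical. It requires bash 4+ for associative arrays.

```bash
#!/usr/bin/env bash
# Sketch only: a token bucket is assumed here; the server's real algorithm may differ.

RATE_PER_MIN=60   # sustained requests per minute
BURST=10          # bucket capacity

declare -A TOKENS_MILLI LAST_SEEN   # per-IP state; tokens stored in thousandths

# allow_request IP NOW_SECONDS -> exit 0 if allowed, 1 if rate-limited.
allow_request() {
  local ip="$1" now="$2"
  local cap=$((BURST * 1000))
  local tokens="${TOKENS_MILLI[$ip]:-$cap}"
  local last="${LAST_SEEN[$ip]:-$now}"
  # Refill RATE_PER_MIN tokens per 60 seconds, capped at the bucket size.
  tokens=$(( tokens + (now - last) * RATE_PER_MIN * 1000 / 60 ))
  if (( tokens > cap )); then tokens=$cap; fi
  LAST_SEEN[$ip]=$now
  if (( tokens >= 1000 )); then
    TOKENS_MILLI[$ip]=$((tokens - 1000))   # spend one token
    return 0
  fi
  TOKENS_MILLI[$ip]=$tokens
  return 1
}

# Example (uncomment to run):
# allow_request 192.168.1.20 "$(date +%s)" && echo allowed || echo limited
```

With these values a client can send 10 requests at once, after which it earns one request per second.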
### IP Whitelisting
Configure in `configs/api/homelab-secure.yaml`:
```yaml
security:
  ip_whitelist:
    - "127.0.0.1"
    - "192.168.1.0/24" # Your local network
```
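
To check by hand whether a client address falls inside a whitelist entry, the CIDR match can be sketched in bash. `ip_to_int` and `ip_in_cidr` are hypothetical helpers (IPv4 only) for illustration, not the server's actual whitelist implementation.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical IPv4-only helpers, not the server's implementation.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  # 10# forces base 10 so octets with leading zeros are not read as octal.
  echo $(( (10#$a << 24) | (10#$b << 16) | (10#$c << 8) | 10#$d ))
}

# ip_in_cidr IP ENTRY -> exit 0 if IP matches ENTRY (a single address or a CIDR range).
ip_in_cidr() {
  local ip="$1" net="${2%/*}" bits=32
  if [[ "$2" == */* ]]; then bits="${2#*/}"; fi
  local mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$ip") & mask) == ($(ip_to_int "$net") & mask) ))
}

# Example (uncomment to run):
# ip_in_cidr 192.168.1.42 192.168.1.0/24 && echo "whitelisted"
```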
## Container Security
### Docker Security
- Use non-root users
- Minimal container images
- Resource limits
- Network segmentation
### Podman Security
- Rootless containers
- SELinux confinement
- Seccomp profiles
- Read-only filesystems where possible
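
As a rough illustration of how these practices map to command-line flags, the sketch below builds a hardened `docker run` argument list. The image name, user ID, limits, and network name are placeholders, not project defaults, and `hardened_run_args` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch only: the image, UID, limits, and network name are placeholders.

# Fill HARDENED_ARGS with a `docker run` argument list for the given image.
hardened_run_args() {
  local image="$1"
  HARDENED_ARGS=(
    run --detach
    --user 1000:1000                   # non-root user
    --read-only                        # read-only root filesystem
    --cap-drop ALL                     # drop all Linux capabilities
    --security-opt no-new-privileges   # block privilege escalation
    --memory 512m --cpus 1.0           # resource limits
    --pids-limit 256
    --network ml-internal              # hypothetical user-defined network
    "$image"
  )
}

# Example (uncomment to run; the same flags are accepted by podman):
# hardened_run_args example/api-server:latest
# docker "${HARDENED_ARGS[@]}"
```

Building the arguments as an array keeps each flag a separate word, so the list can be inspected or logged before anything is started.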
## Troubleshooting
### Common Issues
**TLS Certificate Errors**
```bash
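# Regenerate a self-signed certificate (valid for 365 days, unencrypted key)
openssl req -x509 -newkey rsa:4096 -keyout ssl/key.pem -out ssl/cert.pem -days 365 -nodes \
  -subj "/C=US/ST=Homelab/L=Local/O=FetchML/CN=localhost"
# Restrict the private key to its owner
chmod 600 ssl/key.pem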
```
**API Key Authentication Failed**
```bash
# Check your API key
grep "ADMIN_API_KEY" .api-keys
# Verify hash matches config
echo -n "your-api-key" | sha256sum | cut -d' ' -f1
```
**IP Whitelist Blocking**
```bash
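# Check your public IP (relevant only when connecting from outside the LAN)
curl -s https://api.ipify.org
# For clients on the local network, check the LAN address instead (Linux)
hostname -I
# Add the address or its CIDR range to ip_whitelist in the config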
```
### Security Logs
Monitor these files:
- `logs/fetch_ml.log` - Application logs
- Caddy access logs (configure if enabled)
- Docker logs: `docker logs ml-experiments-api`
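
Failed authentication attempts can be summarized per client with standard text tools. The sketch below is illustrative only: the log line format (a message containing "authentication failed" and an `ip=ADDR` field) is an assumption, since this guide does not document the actual format of `logs/fetch_ml.log`. `failed_auth_by_ip` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch only: the log format ("authentication failed" plus an ip=ADDR field)
# is an assumption; adjust the patterns to match your actual logs.

# Print "count ip" for every client with at least THRESHOLD failed attempts.
failed_auth_by_ip() {
  local logfile="$1" threshold="${2:-5}"
  grep -i -- 'authentication failed' "$logfile" \
    | grep -oE 'ip=[0-9a-fA-F:.]+' \
    | cut -d= -f2 \
    | sort | uniq -c | sort -rn \
    | awk -v t="$threshold" '$1 >= t { print $1, $2 }'
}

# Example (uncomment to run):
# failed_auth_by_ip logs/fetch_ml.log 5
```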
## Best Practices
1. **Regular Updates**: Keep dependencies updated
2. **Principle of Least Privilege**: Minimal required permissions
3. **Defense in Depth**: Multiple security layers
4. **Monitor and Alert**: Security monitoring
5. **Backup and Recovery**: Regular secure backups
## Emergency Procedures
### Compromised API Keys
1. Immediately revoke compromised keys
2. Generate new API keys
3. Update all clients
4. Review access logs
### Security Incident
1. Isolate affected systems
2. Preserve evidence
3. Review access logs
4. Update security measures
5. Document incident
## Support
For security issues:
- Check logs for error messages
- Review configuration files
- Test with minimal setup
- Report security vulnerabilities responsibly