fetch_ml/CHANGELOG.md
Jeremie Fraeys b00439b86e
docs(security): document comprehensive security hardening
Updates documentation with new security features and hardening guide:

**CHANGELOG.md:**
- Added detailed security hardening section (2026-02-23)
- Documents all phases: file ingestion, sandbox, secrets, audit logging, tests
- Lists specific files changed and security controls implemented

**docs/src/security.md:**
- Added Overview section with defense-in-depth layers
- Added Comprehensive Security Hardening section with:
  - File ingestion security with code examples
  - Sandbox hardening with complete YAML config
  - Secrets management with env expansion syntax
  - HIPAA audit logging with tamper-evident chain hashing
2026-02-23 18:03:25 -05:00


## [Unreleased]
### Security - Comprehensive Hardening (2026-02-23)
**File Ingestion Security (Phase 1):**
- `internal/fileutil/secure.go`: Added `SecurePathValidator` with symlink resolution and path boundary enforcement to prevent path traversal attacks
- `internal/fileutil/filetype.go`: New file with magic bytes validation for ML artifacts (safetensors, GGUF, HDF5, numpy)
- `internal/fileutil/filetype.go`: Dangerous extension blocking (.pt, .pkl, .pickle, .exe, .sh, .zip) to prevent pickle deserialization and executable injection
- `internal/worker/artifacts.go`: Integrated `SecurePathValidator` for artifact path validation
- `internal/worker/config.go`: Added upload limits to `SandboxConfig` (MaxUploadSizeBytes: 10GB, MaxUploadRateBps: 100MB/s, MaxUploadsPerMinute: 10)
**Sandbox Hardening (Phase 2):**
- `internal/worker/config.go`: Added `ApplySecurityDefaults()`, which follows the secure-by-default principle:
  - `NetworkMode`: `"none"` (was empty string)
  - `ReadOnlyRoot`: `true`
  - `NoNewPrivileges`: `true`
  - `DropAllCaps`: `true`
  - `UserNS`: `true` (user namespace)
  - `RunAsUID`/`RunAsGID`: `1000` (non-root)
  - `SeccompProfile`: `"default-hardened"`
- `internal/container/podman.go`: Added `PodmanSecurityConfig` struct and `BuildSecurityArgs()` function
- `internal/container/podman.go`: `BuildPodmanCommand` now accepts security config with full sandbox hardening
- `internal/worker/executor/container.go`: Container executor now passes `SandboxConfig` to Podman command builder
- `configs/seccomp/default-hardened.json`: New hardened seccomp profile blocking dangerous syscalls (ptrace, mount, reboot, kexec_load)
**Secrets Management (Phase 3):**
- `internal/worker/config.go`: Added `expandSecrets()` for environment variable expansion using `${VAR}` syntax
- `internal/worker/config.go`: Added `validateNoPlaintextSecrets()` with entropy-based detection and pattern matching
- `internal/worker/config.go`: Detects AWS keys (`AKIA`/`ASIA`), GitHub tokens (`ghp_`/`gho_`), GitLab tokens (`glpat-`), and OpenAI/Stripe keys (`sk-`)
- `internal/worker/config.go`: Shannon entropy calculation to detect high-entropy secrets (>4 bits/char)
- Secrets are expanded from environment during `LoadConfig()` before validation
**HIPAA-Compliant Audit Logging (Phase 5):**
- `internal/audit/audit.go`: Added tamper-evident chain hashing with SHA-256
- `internal/audit/audit.go`: New file access event types: `EventFileRead`, `EventFileWrite`, `EventFileDelete`
- `internal/audit/audit.go`: `Event` struct extended with `PrevHash`, `EventHash`, `SequenceNum` for integrity chain
- `internal/audit/audit.go`: Added `LogFileAccess()` helper for HIPAA file access logging
- `internal/audit/audit.go`: Added `VerifyChain()` function for tamper detection
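
The tamper-evident chain can be illustrated with a small sketch: each event stores the hash of its predecessor, so editing any earlier record invalidates every hash after it. The `event` struct, its `Type` and `Path` fields, the serialization format, and the function names are assumptions made for illustration. Only `PrevHash`, `EventHash`, `SequenceNum`, and the use of SHA-256 come from the entries above.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// event is a hypothetical, reduced audit record.
type event struct {
	SequenceNum uint64
	Type        string
	Path        string
	PrevHash    string
	EventHash   string
}

// hashEvent hashes every field except EventHash itself.
// %q quoting keeps field boundaries unambiguous.
func hashEvent(e event) string {
	payload := fmt.Sprintf("%d|%q|%q|%q", e.SequenceNum, e.Type, e.Path, e.PrevHash)
	sum := sha256.Sum256([]byte(payload))
	return hex.EncodeToString(sum[:])
}

// appendEvent links a new event to the current tail of the chain.
func appendEvent(chain []event, typ, path string) []event {
	e := event{SequenceNum: uint64(len(chain)) + 1, Type: typ, Path: path}
	if len(chain) > 0 {
		e.PrevHash = chain[len(chain)-1].EventHash
	}
	e.EventHash = hashEvent(e)
	return append(chain, e)
}

// verifyChain returns the index of the first invalid event, or -1 if intact.
func verifyChain(chain []event) int {
	prev := ""
	for i, e := range chain {
		if e.SequenceNum != uint64(i)+1 || e.PrevHash != prev || e.EventHash != hashEvent(e) {
			return i
		}
		prev = e.EventHash
	}
	return -1
}

func main() {
	var chain []event
	chain = appendEvent(chain, "file_read", "/data/a.csv")
	chain = appendEvent(chain, "file_write", "/data/b.csv")
	chain = appendEvent(chain, "file_delete", "/data/a.csv")
	fmt.Println("first invalid index before tampering:", verifyChain(chain))

	chain[1].Path = "/data/other.csv" // simulate tampering
	fmt.Println("first invalid index after tampering:", verifyChain(chain))
}
```

A chain like this detects modification and reordering of stored events. Detecting truncation of the tail would additionally require recording the latest hash somewhere outside the log.
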
**Security Testing (Phase 7):**
- `tests/unit/security/path_traversal_test.go`: 3 tests for `SecurePathValidator` including symlink escape prevention
- `tests/unit/security/filetype_test.go`: 3 tests for magic bytes validation and dangerous extension detection
- `tests/unit/security/secrets_test.go`: 3 tests for env expansion and plaintext secret detection with entropy validation
- `tests/unit/security/audit_test.go`: 4 tests for audit logger chain integrity and file access logging
**Supporting Changes:**
- `internal/storage/db_jobs.go`: Added `DeleteJob()` and `DeleteJobsByPrefix()` methods
- `tests/benchmarks/payload_performance_test.go`: Updated to use `DeleteJob()` for proper test isolation
### Added - CSV Export Features (2026-02-18)
- CLI: `ml compare --csv` - Export run comparisons as CSV with actual run IDs as column headers
- CLI: `ml find --csv` - Export search results as CSV for spreadsheet analysis
- CLI: `ml dataset verify --csv` - Export dataset verification metrics as CSV
- Shell: Updated bash/zsh completions with the `--csv` flag for the `compare` and `find` commands
### Added - Phase 3 Features (2026-02-18)
- CLI: `ml requeue --with-changes` - Iterative experimentation with config overrides (e.g. `--lr=0.002`)
- CLI: `ml requeue --inherit-narrative` - Copy hypothesis/context from parent run
- CLI: `ml requeue --inherit-config` - Copy metadata from parent run
- CLI: `ml requeue --parent` - Link as child run for provenance tracking
- CLI: `ml dataset verify` - Fast dataset checksum validation
- CLI: `ml logs --follow` - Real-time log streaming via WebSocket
- API/WebSocket: Add opcodes for compare (0x30), find (0x31), export (0x32), set outcome (0x33)
### Added - Phase 2 Features (2026-02-18)
- CLI: `ml compare` - Diff two runs showing narrative/metadata/metrics differences
- CLI: `ml find` - Search experiments by tags, outcome, dataset, experiment-group, author
- CLI: `ml export --anonymize` - Export bundles with path/IP/username redaction
- CLI: `ml export --anonymize-level` - `metadata-only` or `full` anonymization
- CLI: `ml outcome set` - Post-run outcome tracking (validates/refutes/inconclusive/partial)
- CLI: Error suggestions with Levenshtein distance for typos
- Shell: Updated bash/zsh completions for all new commands
- Tests: E2E tests for compare, find, export, requeue changes
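
Typo suggestions based on Levenshtein distance can be sketched as below. `levenshtein` and `suggest` are hypothetical helpers, the command list is taken from commands mentioned in this changelog, and the maximum distance of 2 is an assumed threshold, not the CLI's actual setting.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein computes the edit distance between a and b using
// two rolling rows of the dynamic-programming table.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur := make([]int, len(rb)+1)
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev = cur
	}
	return prev[len(rb)]
}

// suggest returns the closest known command within maxDist edits, or "".
func suggest(input string, commands []string, maxDist int) string {
	best, bestDist := "", maxDist+1
	for _, c := range commands {
		if d := levenshtein(input, c); d < bestDist {
			best, bestDist = c, d
		}
	}
	return best
}

func main() {
	commands := []string{"compare", "find", "export", "requeue", "status", "logs"}
	for _, typo := range []string{"compar", "fnd", "xyzzy"} {
		if s := suggest(typo, commands, 2); s != "" {
			fmt.Printf("unknown command %q, did you mean %q?\n", typo, s)
		} else {
			fmt.Printf("unknown command %q\n", typo)
		}
	}
}
```
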
### Added - Phase 0 Features (2026-02-18)
- CLI: Queue-time narrative flags (`--hypothesis`, `--context`, `--intent`, `--expected-outcome`, `--experiment-group`, `--tags`)
- CLI: Enhanced `ml status` output with queue position [pos N] and priority (P:N)
- CLI: `ml narrative set` command for setting run narrative fields
- Shell: Updated completions with new commands and flags
### Security
- Native: fix buffer overflow vulnerabilities in `dataset_hash` (replaced `strcpy` with `strncpy` + null termination)
- Native: fix unsafe `memcpy` in `queue_index` priority queue (added explicit null terminators for string fields)
- Native: add path traversal protection in `queue_index` storage (rejects `..` and null bytes in queue directory paths)
- Native: add mmap size limit (100MB max) to prevent unbounded memory mappings
- Native: modularize C++ libraries with clean layering (common, queue_index, dataset_hash)
### Added
- API/WebSocket: add dataset handlers (list, register, info, search) with DB integration
- API/WebSocket: add metrics persistence to `handleLogMetric` with `websocket_metrics` table
- Storage: add `db_metrics.go` with `RecordMetric`, `GetMetrics`, `GetMetricSummary` methods
- Tests: add payload parsing tests for WebSocket handlers
### Changed
- Config: replace `panic()` with error returns in `smart_defaults.go` for better error handling
- Tests: move WebSocket handler tests to `tests/unit/api/ws/`
### Fixed
- Storage: remove duplicate `db_datasets.go`, consolidate with `db_experiments.go`
### Deprecated
- Config: `ToTUIConfig()` now returns `(*Config, error)` instead of `*Config`
### Removed
- Storage: deleted `internal/storage/db_datasets.go` (duplicate implementation)