fetch_ml/CHANGELOG.md
Jeremie Fraeys f357624685
docs: Update CHANGELOG and add feature documentation
Update documentation for new features:
- Add CHANGELOG entries for research features and privacy enhancements
- Update README with new CLI commands and security features
- Add privacy-security.md documentation for PII detection
- Add research-features.md for narrative and outcome tracking
2026-02-18 21:28:25 -05:00

[Unreleased]

Added - CSV Export Features (2026-02-18)

  • CLI: ml compare --csv - Export run comparisons as CSV with actual run IDs as column headers
  • CLI: ml find --csv - Export search results as CSV for spreadsheet analysis
  • CLI: ml dataset verify --csv - Export dataset verification metrics as CSV
  • Shell: Updated bash/zsh completions with the --csv flag for the compare and find commands
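
The "run IDs as column headers" layout described for ml compare --csv can be pictured with the following minimal Go sketch. It is an illustration under stated assumptions, not the CLI's implementation: the `Run` type, the `compareCSV` function, and the leading `metric` column are all hypothetical.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"sort"
	"strings"
)

// Run is a hypothetical in-memory view of one run: an ID plus named metrics.
type Run struct {
	ID      string
	Metrics map[string]string
}

// compareCSV renders one row per metric and one column per run,
// using the run IDs themselves as the column headers.
func compareCSV(runs []Run) (string, error) {
	var sb strings.Builder
	w := csv.NewWriter(&sb)

	header := []string{"metric"} // assumed name for the first column
	seen := map[string]bool{}
	for _, r := range runs {
		header = append(header, r.ID)
		for name := range r.Metrics {
			seen[name] = true
		}
	}
	if err := w.Write(header); err != nil {
		return "", err
	}

	names := make([]string, 0, len(seen))
	for name := range seen {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic row order

	for _, name := range names {
		row := []string{name}
		for _, r := range runs {
			// A run that lacks the metric yields an empty cell.
			row = append(row, r.Metrics[name])
		}
		if err := w.Write(row); err != nil {
			return "", err
		}
	}
	w.Flush()
	return sb.String(), w.Error()
}

func main() {
	out, err := compareCSV([]Run{
		{ID: "run-a", Metrics: map[string]string{"loss": "0.42", "lr": "0.001"}},
		{ID: "run-b", Metrics: map[string]string{"loss": "0.39"}},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Putting run IDs in the header keeps each metric on a single row, which tends to be the easier shape to chart or filter in a spreadsheet.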

Added - Phase 3 Features (2026-02-18)

  • CLI: ml requeue --with-changes - Iterative experimentation with config overrides (e.g. --lr=0.002)
  • CLI: ml requeue --inherit-narrative - Copy hypothesis/context from parent run
  • CLI: ml requeue --inherit-config - Copy metadata from parent run
  • CLI: ml requeue --parent - Link as child run for provenance tracking
  • CLI: ml dataset verify - Fast dataset checksum validation
  • CLI: ml logs --follow - Real-time log streaming via WebSocket
  • API/WebSocket: Add opcodes for compare (0x30), find (0x31), export (0x32), set outcome (0x33)
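
For readers wiring up a client, the new opcodes could be declared roughly as below. Only the numeric values (0x30 to 0x33) come from the entry above; the constant names, the `opcodeName` helper, and the error text are assumptions made for this sketch, and the actual frame layout is not shown here.

```go
package main

import "fmt"

// Opcode values are taken from the changelog entry above.
// The identifier names are illustrative, not the project's own.
const (
	OpCompare    byte = 0x30
	OpFind       byte = 0x31
	OpExport     byte = 0x32
	OpSetOutcome byte = 0x33
)

// opcodeName maps an opcode byte to a label and reports unknown
// opcodes as errors instead of silently ignoring them.
func opcodeName(op byte) (string, error) {
	switch op {
	case OpCompare:
		return "compare", nil
	case OpFind:
		return "find", nil
	case OpExport:
		return "export", nil
	case OpSetOutcome:
		return "set outcome", nil
	default:
		return "", fmt.Errorf("unknown opcode 0x%02x", op)
	}
}

func main() {
	for _, op := range []byte{0x30, 0x33, 0x7f} {
		name, err := opcodeName(op)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("0x%02x -> %s\n", op, name)
	}
}
```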

Added - Phase 2 Features (2026-02-18)

  • CLI: ml compare - Diff two runs showing narrative/metadata/metrics differences
  • CLI: ml find - Search experiments by tags, outcome, dataset, experiment-group, author
  • CLI: ml export --anonymize - Export bundles with path/IP/username redaction
  • CLI: ml export --anonymize-level - 'metadata-only' or 'full' anonymization
  • CLI: ml outcome set - Post-run outcome tracking (validates/refutes/inconclusive/partial)
  • CLI: Error suggestions with Levenshtein distance for typos
  • Shell: Updated bash/zsh completions for all new commands
  • Tests: E2E tests for compare, find, export, requeue changes
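
The typo suggestions mentioned above rely on Levenshtein distance. A minimal Go sketch of the idea follows; the `suggest` helper, the distance threshold of 2, and the choice to match against command names are assumptions for illustration and may differ from what the CLI does.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// levenshtein computes the edit distance (insert, delete, substitute)
// between a and b using two rolling rows of the DP table.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur := make([]int, len(rb)+1)
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev = cur
	}
	return prev[len(rb)]
}

// suggest returns the candidate closest to input, provided it is
// within maxDist edits; otherwise it reports false.
func suggest(input string, candidates []string, maxDist int) (string, bool) {
	best, bestDist := "", maxDist+1
	for _, c := range candidates {
		if d := levenshtein(input, c); d < bestDist {
			best, bestDist = c, d
		}
	}
	return best, bestDist <= maxDist
}

func main() {
	commands := []string{"compare", "find", "export", "requeue", "outcome", "narrative", "status", "logs"}
	if s, ok := suggest("comapre", commands, 2); ok {
		fmt.Printf("unknown command %q, did you mean %q?\n", "comapre", s)
	}
}
```

A small threshold keeps suggestions from firing on inputs that are not plausibly a misspelling of any known command.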

Added - Phase 0 Features (2026-02-18)

  • CLI: Queue-time narrative flags (--hypothesis, --context, --intent, --expected-outcome, --experiment-group, --tags)
  • CLI: Enhanced ml status output with queue position [pos N] and priority (P:N)
  • CLI: ml narrative set command for setting run narrative fields
  • Shell: Updated completions with new commands and flags

Security

  • Native: fix buffer overflow vulnerabilities in dataset_hash (replaced strcpy with strncpy + null termination)
  • Native: fix unsafe memcpy in queue_index priority queue (added explicit null terminators for string fields)
  • Native: add path traversal protection in queue_index storage (rejects .. and null bytes in queue directory paths)
  • Native: add mmap size limits (100 MB max) to prevent unbounded memory mapping
  • Native: modularize C++ libraries with clean layering (common, queue_index, dataset_hash)
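
The path traversal rule above (reject .. and null bytes in queue directory paths) is implemented in the native C++ code. The Go sketch below only illustrates the rule in the same language as the other examples here; `validateQueueDir` is a hypothetical name, and checking .. per path component rather than as a raw substring is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// validateQueueDir rejects paths that contain a null byte or a ".."
// component. It is an illustration of the rule, not the native code.
func validateQueueDir(path string) error {
	if path == "" {
		return errors.New("queue directory path is empty")
	}
	if strings.Contains(path, "\x00") {
		return errors.New("queue directory path contains a null byte")
	}
	isSep := func(r rune) bool { return r == '/' || r == '\\' }
	for _, part := range strings.FieldsFunc(path, isSep) {
		if part == ".." {
			return errors.New("queue directory path contains a '..' component")
		}
	}
	return nil
}

func main() {
	for _, p := range []string{"/var/lib/ml/queue", "/var/lib/../etc", "queue\x00dir"} {
		if err := validateQueueDir(p); err != nil {
			fmt.Printf("%q rejected: %v\n", p, err)
			continue
		}
		fmt.Printf("%q accepted\n", p)
	}
}
```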

Added

  • API/WebSocket: add dataset handlers (list, register, info, search) with DB integration
  • API/WebSocket: add metrics persistence to handleLogMetric with websocket_metrics table
  • Storage: add db_metrics.go with RecordMetric, GetMetrics, GetMetricSummary methods
  • Tests: add payload parsing tests for WebSocket handlers

Changed

  • Config: replace panic() with error returns in smart_defaults.go for better error handling
  • Tests: move WebSocket handler tests to tests/unit/api/ws/
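
The panic()-to-error change follows a common Go pattern, sketched below. This is not the code in smart_defaults.go: `defaultDataDir`, `errNoHome`, the injected `lookupHome` function, and the `.ml/data` location are all hypothetical and exist only to show the shape of the refactor.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
)

// errNoHome is a hypothetical sentinel error used by this sketch.
var errNoHome = errors.New("home directory not set")

// defaultDataDir derives a default directory from the user's home.
// A panic-based version would crash the process when lookupHome fails;
// returning the wrapped error lets the caller decide what to do.
func defaultDataDir(lookupHome func() (string, error)) (string, error) {
	home, err := lookupHome()
	if err != nil {
		return "", fmt.Errorf("resolve default data dir: %w", err)
	}
	// ".ml/data" is an assumed location, chosen only for the example.
	return filepath.Join(home, ".ml", "data"), nil
}

func main() {
	lookups := []func() (string, error){
		func() (string, error) { return "/home/alice", nil },
		func() (string, error) { return "", errNoHome },
	}
	for _, lookup := range lookups {
		dir, err := defaultDataDir(lookup)
		if err != nil {
			fmt.Println("config error:", err)
			continue
		}
		fmt.Println("data dir:", dir)
	}
}
```

Wrapping with `%w` keeps the original cause available to `errors.Is`, so callers can still distinguish a missing home directory from other failures.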

Fixed

  • Storage: remove duplicate db_datasets.go, consolidate with db_experiments.go

Deprecated

  • Config: ToTUIConfig() now returns (*Config, error) instead of *Config; callers must handle the returned error

Removed

  • Storage: deleted internal/storage/db_datasets.go (duplicate implementation)