fetch_ml/CHANGELOG.md
Jeremie Fraeys 355d2e311a
docs: update README and CHANGELOG
- Update project documentation with latest features
- Update manage-artifacts.sh script
2026-02-16 20:38:57 -05:00

[Unreleased]

Security

  • Native: fix buffer overflow vulnerabilities in dataset_hash (replaced strcpy with strncpy plus explicit null termination).
  • Native: fix unsafe memcpy in the queue_index priority queue (added explicit null terminators for string fields).
  • Native: add path traversal protection in queue_index storage (rejects .. and null bytes in queue directory paths).
  • Native: add mmap size limits (100 MB max) to prevent unbounded memory mapping.
  • Native: modularize C++ libraries with clean layering (common, queue_index, dataset_hash).
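
The path and size checks described above can be illustrated with a minimal sketch. This is not the native C++ code: the function names are hypothetical, the check is written in Python for brevity, and reading "100MB" as 100 * 1024 * 1024 bytes is an assumption.

```python
from pathlib import PurePosixPath

# Assumption: the changelog's "100MB" is interpreted here as 100 MiB.
MAX_MMAP_BYTES = 100 * 1024 * 1024


def is_safe_queue_dir(path: str) -> bool:
    """Return False for empty paths, null bytes, or any '..' component.

    Hypothetical helper; only POSIX-style separators are considered.
    """
    if not path or "\x00" in path:
        return False
    # PurePosixPath does not collapse '..', so it shows up in .parts.
    return ".." not in PurePosixPath(path).parts


def within_mmap_limit(size_bytes: int) -> bool:
    """Return True when a file is small enough to be memory-mapped."""
    return 0 <= size_bytes <= MAX_MMAP_BYTES
```

Checking path components rather than searching for the substring `..` means a directory named `a..b` is still accepted, while `queue/../etc` is rejected.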

Added

  • Tests: add e2e coverage for wss:// upgrade through a TLS-terminating reverse proxy.
  • Worker: verify dataset_specs[].checksum when provided and fail tasks on mismatch.
  • Worker: verify snapshot_id against snapshot_sha256 and fail closed on mismatch (supports local data_dir/snapshots/<snapshot_id> and an optional S3-backed snapshot_store).
  • Worker: stage verified snapshot_id into each task workspace and expose it to training code via FETCH_ML_SNAPSHOT_DIR.
  • Worker: provenance enforcement is trustworthy by default (fail-closed); provenance_best_effort is an opt-in setting.
  • CLI/API: add ml validate to fetch a validation report (commit/task) for provenance + integrity checks.
  • Worker: persist discovered artifacts into run_manifest.json (artifacts.discovery_time, artifacts.files[], artifacts.total_size_bytes) at task completion.
  • Worker: best-effort environment prewarm can build a warmed Podman image keyed by deps_manifest_sha256 and reuse it for subsequent tasks.
  • Worker: export env prewarm hit/miss/built counters and total build time via the worker Prometheus metrics endpoint.
  • API/Worker: ml prune also triggers best-effort garbage collection of warmed env images.
  • API: add /health/ok (when health checks are enabled) and wrap HTTP handlers with Prometheus HTTP request metrics when Prometheus is enabled.
  • CLI/API: add ml logs command to fetch and follow job logs from running or completed experiments via WebSocket.
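
The fail-closed verification described in the Worker entries above can be sketched roughly as follows. This is an illustration, not the worker's implementation: `IntegrityError`, `sha256_file`, `verify_fail_closed`, and `snapshot_dir` are hypothetical names, and hashing a single file with SHA-256 is an assumption, since the changelog does not say how snapshot directories or dataset checksums are computed. Only the FETCH_ML_SNAPSHOT_DIR variable name comes from the changelog.

```python
import hashlib
import os


class IntegrityError(Exception):
    """Raised when a checksum is missing or does not match (hypothetical)."""


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large inputs are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fail_closed(path, expected_hex):
    """Return the digest on match; raise otherwise, including when no
    expected value was supplied (that is the fail-closed part)."""
    if not expected_hex:
        raise IntegrityError("no expected checksum supplied")
    actual = sha256_file(path)
    if actual != expected_hex.strip().lower():
        raise IntegrityError(f"checksum mismatch for {path}")
    return actual


def snapshot_dir():
    """Training-side lookup of the staged snapshot directory, if any."""
    return os.environ.get("FETCH_ML_SNAPSHOT_DIR")
```

Training code would call `snapshot_dir()` and handle a `None` result for tasks that were not given a snapshot.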