[Unreleased]

  • Deployments: production now terminates TLS/WSS at a Caddy reverse proxy, while the API server stays on internal HTTP/WS.
  • Tests: add e2e coverage for wss:// upgrade through a TLS-terminating reverse proxy.
  • Worker: verify dataset_specs[].checksum when provided and fail tasks on mismatch.
  • Worker: verify snapshot_id against snapshot_sha256 and fail closed (supports a local data_dir/snapshots/<snapshot_id> and an optional S3-backed snapshot_store).
  • Worker: stage the verified snapshot into each task workspace and expose its location to training code via FETCH_ML_SNAPSHOT_DIR.
  • Worker: provenance enforcement is now fail-closed by default; opt into best-effort enforcement with provenance_best_effort.
  • CLI/API: add ml validate, which fetches a validation report for a commit or task covering provenance and integrity checks.
  • Worker: best-effort environment prewarm can build a warmed Podman image keyed by deps_manifest_sha256 and reuse it for subsequent tasks with the same manifest hash.
  • Worker: export env prewarm hit/miss/built counters and total build time via the worker Prometheus metrics endpoint.
  • API/Worker: ml prune also triggers best-effort garbage collection of warmed env images.
  • API: add /health/ok (when health checks are enabled), and wrap HTTP handlers with Prometheus HTTP request metrics (when Prometheus is enabled).
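The checksum and snapshot verification entries describe fail-closed behavior: a mismatch stops the task instead of being logged and ignored. The Python sketch below illustrates that pattern for a single file. It assumes the expected value is a bare hex SHA-256 digest; the actual format of dataset_specs[].checksum, and how the worker hashes a snapshot directory for snapshot_sha256, are not described here. The names ChecksumMismatch, sha256_file, and verify_checksum are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Optional


class ChecksumMismatch(Exception):
    """Hypothetical error raised when a file's digest does not match."""


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large datasets are never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path: Path, expected: Optional[str]) -> None:
    """Fail closed: any mismatch raises, so the caller cannot silently continue.

    Assumes `expected` is a bare hex SHA-256 digest. When it is None (no checksum
    provided), verification is skipped, mirroring "when provided".
    """
    if expected is None:
        return
    actual = sha256_file(path)
    if actual != expected.lower():
        raise ChecksumMismatch(f"{path}: expected {expected}, got {actual}")
```

A task runner built on this sketch would let ChecksumMismatch propagate and mark the task failed, rather than catching it and proceeding with unverified data.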
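Training code can locate the staged snapshot through FETCH_ML_SNAPSHOT_DIR. The following is a minimal sketch, assuming the variable holds the path of a directory (as its name suggests). The helper name snapshot_dir is hypothetical, and raising when the variable is unset or the directory is missing is a choice made here for illustration, not documented worker behavior.

```python
import os
from pathlib import Path


def snapshot_dir() -> Path:
    """Return the snapshot directory the worker exposed via FETCH_ML_SNAPSHOT_DIR.

    Raises instead of falling back to some default location, so a job never
    trains on data other than the verified snapshot.
    """
    value = os.environ.get("FETCH_ML_SNAPSHOT_DIR")
    if not value:
        raise RuntimeError("FETCH_ML_SNAPSHOT_DIR is not set")
    path = Path(value)
    if not path.is_dir():
        raise RuntimeError(f"FETCH_ML_SNAPSHOT_DIR is not a directory: {path}")
    return path
```

Usage in a training script might look like `files = sorted(snapshot_dir().iterdir())`.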
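The environment prewarm entries describe reusing a warmed image keyed by deps_manifest_sha256 and counting hits, misses, and builds along with total build time. The sketch below models that keying in memory, with an injected build function standing in for a Podman image build. The class, its fields, and the counter semantics (a failed build counts as a miss, returns no image, and still adds to build time, matching "best-effort") are assumptions for illustration, not the worker's implementation or its metric names.

```python
import hashlib
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class PrewarmCache:
    """Hypothetical in-memory model of a warmed-image cache keyed by manifest hash."""

    # Stand-in for the real image build (e.g. a Podman build); returns an image tag.
    build: Callable[[bytes], str]
    images: Dict[str, str] = field(default_factory=dict)
    hits: int = 0
    misses: int = 0
    built: int = 0
    build_seconds: float = 0.0

    def get_image(self, manifest: bytes) -> Optional[str]:
        # Assumed analogue of deps_manifest_sha256: identical manifests share one key.
        key = hashlib.sha256(manifest).hexdigest()
        if key in self.images:
            self.hits += 1
            return self.images[key]
        self.misses += 1
        start = time.monotonic()
        try:
            tag = self.build(manifest)
        except Exception:
            # Best-effort: a failed build is not fatal; the caller should run
            # the task without a warmed image.
            return None
        finally:
            # Total build time includes failed attempts in this sketch.
            self.build_seconds += time.monotonic() - start
        self.built += 1
        self.images[key] = tag
        return tag
```

Because the key is a hash of the manifest bytes, any byte-level change to the manifest produces a new key and therefore a new build; garbage collection (as ml prune triggers) is what keeps such stale entries from accumulating.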