## [Unreleased]
- Deployments: production now terminates TLS/WSS at Caddy (reverse proxy) and keeps the API server on internal HTTP/WS.
- Tests: add e2e coverage for `wss://` upgrade through a TLS-terminating reverse proxy.
- Worker: verify `dataset_specs[].checksum` when provided and fail tasks on mismatch.
- Worker: verify `snapshot_id` against `snapshot_sha256` and fail closed (supports local `data_dir/snapshots/<snapshot_id>` and an optional S3-backed `snapshot_store`).
- Worker: stage verified `snapshot_id` into each task workspace and expose it to training code via `FETCH_ML_SNAPSHOT_DIR`.
- Worker: provenance enforcement is fail-closed by default; set `provenance_best_effort` to opt in to best-effort behavior.
- CLI/API: add `ml validate` to fetch a validation report (commit/task) for provenance + integrity checks.
- Worker: persist discovered artifacts into `run_manifest.json` (`artifacts.discovery_time`, `artifacts.files[]`, `artifacts.total_size_bytes`) at task completion.
- Worker: best-effort environment prewarm can build a warmed Podman image keyed by `deps_manifest_sha256` and reuse it for subsequent tasks.
- Worker: export env prewarm hit/miss/built counters and total build time via the worker Prometheus metrics endpoint.
- API/Worker: `ml prune` also triggers best-effort garbage collection of warmed env images.
- API: add `/health/ok` (when health checks are enabled) and wrap HTTP handlers with Prometheus HTTP request metrics when Prometheus is enabled.
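
The checksum entries above (`dataset_specs[].checksum`, `snapshot_sha256`, `provenance_best_effort`) describe a fail-closed check. The following is a minimal sketch of that idea, not the worker's actual code: `ProvenanceError`, `sha256_file`, and `verify_checksum` are hypothetical names, hashing a single file is an assumption about how content is digested, and treating a mismatch as fatal even in best-effort mode is also an assumption.

```python
import hashlib
from pathlib import Path
from typing import Optional


class ProvenanceError(Exception):
    """Hypothetical error: checksum missing (strict mode) or mismatched."""


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path: Path, expected: Optional[str],
                    best_effort: bool = False) -> bool:
    """Return True if verified, False if skipped under best-effort mode.

    Assumption: a missing checksum fails closed unless best_effort is set,
    while a mismatch always fails the task.
    """
    if not expected:
        if best_effort:
            return False  # nothing to verify; caller may log and continue
        raise ProvenanceError(f"no checksum provided for {path}")
    actual = sha256_file(path)
    if actual != expected.lower():
        raise ProvenanceError(
            f"checksum mismatch for {path}: expected {expected}, got {actual}"
        )
    return True
```

A caller would run `verify_checksum` before staging data into a task workspace and fail the task on `ProvenanceError`.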
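
The `run_manifest.json` entry above names three fields under `artifacts`: `discovery_time`, `files[]`, and `total_size_bytes`. As a rough illustration of how such a section could be assembled at task completion, the sketch below scans a run directory and merges the result into the manifest. The per-file keys (`path`, `size_bytes`), the timestamp format, and the function names are assumptions for illustration; the real schema may differ.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

MANIFEST_NAME = "run_manifest.json"


def discover_artifacts(run_dir: Path) -> dict:
    """Build an `artifacts` section by listing regular files under run_dir."""
    files = []
    for path in sorted(run_dir.rglob("*")):
        # Skip directories and the manifest itself.
        if path.is_file() and path.name != MANIFEST_NAME:
            files.append({
                "path": path.relative_to(run_dir).as_posix(),  # assumed key
                "size_bytes": path.stat().st_size,             # assumed key
            })
    return {
        "discovery_time": datetime.now(timezone.utc).isoformat(),
        "files": files,
        "total_size_bytes": sum(entry["size_bytes"] for entry in files),
    }


def persist_artifacts(run_dir: Path) -> dict:
    """Merge the discovered artifacts into run_manifest.json and return it."""
    manifest_path = run_dir / MANIFEST_NAME
    manifest = {}
    if manifest_path.exists():
        manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    manifest["artifacts"] = discover_artifacts(run_dir)
    manifest_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest
```

Existing manifest keys are preserved; only the `artifacts` section is overwritten on each call.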