fetch_ml/docs/src/validate.md
Jeremie Fraeys 5144d291cb
docs: comprehensive documentation updates
- Add architecture, CI/CD, CLI reference documentation
- Update installation, operations, and quick-start guides
- Add Jupyter workflow and queue documentation
- New landing page and research runner plan
2026-02-12 12:05:27 -05:00

---
title: "Validation (ml validate)"
url: "/validate/"
---
# Validation (`ml validate`)
The `ml validate` command verifies experiment integrity and provenance.
It can be run against:
- A **commit id** (validates the experiment tree + dependency manifest)
- A **task id** (additionally validates the run's `run_manifest.json` provenance and lifecycle)
## CLI usage
```bash
# Validate by commit
ml validate <commit_id> [--json] [--verbose]
# Validate by task
ml validate --task <task_id> [--json] [--verbose]
```
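
For scripted use, the two invocation forms above can be assembled programmatically. The following is a minimal sketch; `build_validate_argv` is a hypothetical helper, not part of the CLI, and it assumes that exactly one of a commit id or a task id is supplied per call.

```python
from typing import List, Optional


def build_validate_argv(
    commit_id: Optional[str] = None,
    task_id: Optional[str] = None,
    json_output: bool = False,
    verbose: bool = False,
) -> List[str]:
    """Build an argument list for `ml validate` (hypothetical helper)."""
    # Assumption: a single call targets either a commit or a task, never both.
    if (commit_id is None) == (task_id is None):
        raise ValueError("pass exactly one of commit_id or task_id")

    argv = ["ml", "validate"]
    if task_id is not None:
        argv += ["--task", task_id]   # ml validate --task <task_id>
    else:
        argv.append(commit_id)        # ml validate <commit_id>
    if json_output:
        argv.append("--json")
    if verbose:
        argv.append("--verbose")
    return argv
```

The resulting list can be handed to `subprocess.run(argv, capture_output=True, text=True)`; with `--json`, the captured standard output should then contain the raw JSON payload.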
### Output modes
- Default (human): prints a summary with `errors`, `warnings`, and `failed_checks`.
- `--verbose`: prints all checks under `checks` and includes `expected/actual/details` when present.
- `--json`: prints the raw JSON payload.
## Report shape
The API returns a JSON report with the following fields:
- `ok`: overall boolean
- `commit_id`: commit being validated (if known)
- `task_id`: task being validated (when validating by task)
- `checks`: map of check name → `{ ok, expected?, actual?, details? }`
- `errors`: list of high-level failures
- `warnings`: list of non-fatal issues
- `ts`: UTC timestamp
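
A consumer of the `--json` output might model these fields as follows. This is a sketch under assumptions: the class names and `parse_report` are illustrative, the types of `expected`, `actual`, and `details` are not specified here and so are left as `Any`, and every field other than `ok` is treated as optional.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class CheckResult:
    ok: bool
    expected: Optional[Any] = None
    actual: Optional[Any] = None
    details: Optional[Any] = None


@dataclass
class ValidateReport:
    ok: bool
    checks: Dict[str, CheckResult] = field(default_factory=dict)
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)
    commit_id: Optional[str] = None
    task_id: Optional[str] = None
    ts: Optional[str] = None  # kept as a string; the exact format is not assumed


def parse_report(payload: str) -> ValidateReport:
    """Parse a JSON report string into a ValidateReport (illustrative)."""
    raw = json.loads(payload)
    checks = {
        name: CheckResult(
            ok=bool(check.get("ok", False)),
            expected=check.get("expected"),
            actual=check.get("actual"),
            details=check.get("details"),
        )
        for name, check in (raw.get("checks") or {}).items()
    }
    return ValidateReport(
        ok=bool(raw.get("ok", False)),
        checks=checks,
        errors=list(raw.get("errors") or []),      # tolerate missing or null lists
        warnings=list(raw.get("warnings") or []),
        commit_id=raw.get("commit_id"),
        task_id=raw.get("task_id"),
        ts=raw.get("ts"),
    )
```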
## Check semantics
- For **task statuses** `running`, `completed`, or `failed`, run-manifest issues are treated as **errors**.
- For **queued/pending** tasks, run-manifest issues are usually **warnings** (the job may not have started yet).
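
The rule above can be written as a small lookup. This is a simplified sketch, not the validator's code: it treats queued/pending issues as always being warnings (the text says "usually"), and rejecting unknown statuses is an assumption.

```python
ERROR_STATUSES = {"running", "completed", "failed"}
WARNING_STATUSES = {"queued", "pending"}


def run_manifest_issue_severity(task_status: str) -> str:
    """Return "error" or "warning" for a run-manifest issue (illustrative)."""
    if task_status in ERROR_STATUSES:
        return "error"
    if task_status in WARNING_STATUSES:
        # Simplification: the job may not have started yet, so do not fail hard.
        return "warning"
    # Assumption: statuses outside the five named above are rejected.
    raise ValueError(f"unknown task status: {task_status!r}")
```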
## Notable checks
### Experiment integrity
- `experiment_manifest`: validates the experiment manifest (content-addressed integrity)
- `deps_manifest`: validates that a dependency manifest exists and can be hashed
- `expected_manifest_overall_sha`: compares the task's recorded manifest SHA to the current manifest SHA
- `expected_deps_manifest`: compares the task's recorded deps manifest name/SHA to what exists on disk
### Run manifest provenance (task validation)
- `run_manifest`: whether `run_manifest.json` could be found and loaded
- `run_manifest_location`: verifies the manifest was found in the expected bucket:
  - `pending` for queued/pending
  - `running` for running
  - `finished` for completed
  - `failed` for failed
- `run_manifest_task_id`: task id match
- `run_manifest_commit_id`: commit id match
- `run_manifest_deps`: deps manifest name/SHA match
- `run_manifest_snapshot_id`: snapshot id match (when snapshot is part of the task)
- `run_manifest_snapshot_sha256`: snapshot sha256 match (when snapshot sha is recorded)
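
The status-to-bucket mapping used by `run_manifest_location` can be expressed as a table lookup. In this sketch, `check_run_manifest_location` is a hypothetical function; returning `expected`/`actual` only on a mismatch is an assumption based on the example report, not confirmed behavior.

```python
# Task status -> bucket where run_manifest.json is expected to live.
EXPECTED_BUCKET = {
    "queued": "pending",
    "pending": "pending",
    "running": "running",
    "completed": "finished",
    "failed": "failed",
}


def check_run_manifest_location(task_status: str, actual_bucket: str) -> dict:
    """Compare the bucket a manifest was found in with the expected one."""
    expected = EXPECTED_BUCKET[task_status]  # KeyError for unknown statuses
    result = {"ok": actual_bucket == expected}
    if not result["ok"]:
        # Assumption: mismatch details are attached only when the check fails.
        result["expected"] = expected
        result["actual"] = actual_bucket
    return result
```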
### Run manifest lifecycle (task validation)
- `run_manifest_lifecycle`:
  - `running`: must have `started_at`, must not have `ended_at`/`exit_code`
  - `completed`/`failed`: must have `started_at`, `ended_at`, `exit_code`, and `ended_at >= started_at`
  - `queued`/`pending`: must not have `ended_at`/`exit_code`
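
The lifecycle rules can be sketched as a function that returns a list of problems. Several things here are assumptions: `lifecycle_problems` and its messages are illustrative, the manifest is treated as a plain dictionary, and `started_at`/`ended_at` are assumed to be ISO 8601 strings in a consistent timezone style.

```python
from datetime import datetime
from typing import Any, Dict, List


def _parse_ts(value: str) -> datetime:
    # Assumption: ISO 8601 timestamps; a trailing "Z" is rewritten so that
    # datetime.fromisoformat accepts it on older Python versions.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def lifecycle_problems(task_status: str, manifest: Dict[str, Any]) -> List[str]:
    """Return lifecycle rule violations for a run manifest (illustrative)."""
    problems: List[str] = []
    has_started = manifest.get("started_at") is not None
    has_ended = manifest.get("ended_at") is not None
    has_exit = manifest.get("exit_code") is not None  # exit code 0 counts as present

    if task_status == "running":
        if not has_started:
            problems.append("missing started_at")
        if has_ended:
            problems.append("unexpected ended_at")
        if has_exit:
            problems.append("unexpected exit_code")
    elif task_status in ("completed", "failed"):
        for key, present in (
            ("started_at", has_started),
            ("ended_at", has_ended),
            ("exit_code", has_exit),
        ):
            if not present:
                problems.append(f"missing {key}")
        if has_started and has_ended:
            if _parse_ts(manifest["ended_at"]) < _parse_ts(manifest["started_at"]):
                problems.append("ended_at is before started_at")
    elif task_status in ("queued", "pending"):
        if has_ended:
            problems.append("unexpected ended_at")
        if has_exit:
            problems.append("unexpected exit_code")
    else:
        raise ValueError(f"unknown task status: {task_status!r}")
    return problems
```

An empty list means the manifest satisfies the rules for that status.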
## Example report (task validation)
```json
{
  "ok": false,
  "commit_id": "6161616161616161616161616161616161616161",
  "task_id": "task-run-manifest-location-mismatch",
  "checks": {
    "experiment_manifest": {"ok": true},
    "deps_manifest": {"ok": true, "actual": "requirements.txt:..."},
    "run_manifest": {"ok": true},
    "run_manifest_location": {
      "ok": false,
      "expected": "running",
      "actual": "finished"
    }
  },
  "errors": [
    "run manifest location mismatch"
  ],
  "ts": "2025-12-17T18:43:00Z"
}
```
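
A report like the one above can be reduced to its failing checks with a few lines of code. This is a sketch only: `failed_checks` and `summarize` are hypothetical helpers, and the text layout they produce is not the CLI's actual human-readable output.

```python
from typing import Any, Dict, List


def failed_checks(report: Dict[str, Any]) -> List[str]:
    """Return the sorted names of checks whose `ok` is not true."""
    checks = report.get("checks") or {}
    return sorted(name for name, check in checks.items() if not check.get("ok", False))


def summarize(report: Dict[str, Any]) -> str:
    """Render a short text summary of a report (illustrative layout)."""
    lines = [f"ok: {report.get('ok')}"]
    for name in failed_checks(report):
        check = report["checks"][name]
        line = f"FAILED {name}"
        # Include expected/actual only when the check recorded them.
        if "expected" in check or "actual" in check:
            line += f" (expected={check.get('expected')!r}, actual={check.get('actual')!r})"
        lines.append(line)
    for err in report.get("errors") or []:
        lines.append(f"error: {err}")
    for warning in report.get("warnings") or []:
        lines.append(f"warning: {warning}")
    return "\n".join(lines)
```

Passing the parsed example report (for instance via `json.loads`) to `failed_checks` yields only `run_manifest_location`, since the other three checks are `ok`.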