fetch_ml/docs/src/validate.md
Jeremie Fraeys 5144d291cb
2026-02-12 12:05:27 -05:00



Validation (ml validate)

The ml validate command verifies experiment integrity and provenance.

It can be run against:

  • A commit id (validates the experiment tree + dependency manifest)
  • A task id (additionally validates the run's run_manifest.json provenance and lifecycle)

CLI usage

# Validate by commit
ml validate <commit_id> [--json] [--verbose]

# Validate by task
ml validate --task <task_id> [--json] [--verbose]

Output modes

  • Default (human): prints a summary with errors, warnings, and failed_checks.
  • --verbose: prints all checks under checks and includes expected/actual/details when present.
  • --json: prints the raw JSON payload.
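
For scripting, the two invocation forms and the output flags above can be assembled programmatically. The following is a minimal sketch, not part of the ml CLI itself: the helper name build_validate_argv is hypothetical, and it assumes only the flags documented in this section.

```python
from typing import List, Optional


def build_validate_argv(
    commit_id: Optional[str] = None,
    task_id: Optional[str] = None,
    json_output: bool = False,
    verbose: bool = False,
) -> List[str]:
    """Assemble an `ml validate` command line from the documented flags.

    Hypothetical helper for illustration; it does not run the command.
    """
    # The docs describe two forms: by commit id, or by task id via --task.
    if (commit_id is None) == (task_id is None):
        raise ValueError("pass exactly one of commit_id or task_id")

    argv = ["ml", "validate"]
    if task_id is not None:
        argv += ["--task", task_id]
    else:
        argv.append(commit_id)

    if json_output:
        argv.append("--json")
    if verbose:
        argv.append("--verbose")
    return argv
```

The resulting list can be handed to subprocess.run(argv, capture_output=True, text=True) and, with --json, the captured stdout parsed with json.loads. This page does not state whether the process exit code reflects the report's ok field, so a script should read ok from the payload rather than rely on the exit status.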

Report shape

The API returns a JSON report of the form:

  • ok: overall boolean
  • commit_id: commit being validated (if known)
  • task_id: task being validated (when validating by task)
  • checks: map of check name → { ok, expected?, actual?, details? }
  • errors: list of high-level failures
  • warnings: list of non-fatal issues
  • ts: UTC timestamp
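
A consumer of the --json output may want typed access to these fields. The sketch below mirrors the field list above using Python dataclasses; the class and function names (CheckResult, ValidateReport, parse_report) are illustrative assumptions, and every field other than ok is treated as optional because the list marks several of them as conditional.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class CheckResult:
    ok: bool
    expected: Optional[Any] = None  # present only for some checks
    actual: Optional[Any] = None
    details: Optional[Any] = None


@dataclass
class ValidateReport:
    ok: bool
    commit_id: Optional[str] = None  # "if known"
    task_id: Optional[str] = None    # only when validating by task
    checks: Dict[str, CheckResult] = field(default_factory=dict)
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)
    ts: Optional[str] = None


def parse_report(payload: str) -> ValidateReport:
    """Parse a JSON report string into a ValidateReport."""
    raw = json.loads(payload)
    checks = {
        name: CheckResult(
            ok=bool(body.get("ok", False)),
            expected=body.get("expected"),
            actual=body.get("actual"),
            details=body.get("details"),
        )
        for name, body in (raw.get("checks") or {}).items()
    }
    return ValidateReport(
        ok=bool(raw.get("ok", False)),
        commit_id=raw.get("commit_id"),
        task_id=raw.get("task_id"),
        checks=checks,
        errors=list(raw.get("errors") or []),
        warnings=list(raw.get("warnings") or []),
        ts=raw.get("ts"),
    )
```

Missing keys fall back to empty values rather than raising, which is a design choice of this sketch and not a statement about what the API guarantees.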

Check semantics

  • For task statuses running, completed, or failed, run-manifest issues are treated as errors.
  • For queued/pending tasks, run-manifest issues are usually warnings (the job may not have started yet).
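
The severity rule above can be expressed as a small lookup. This is an illustrative sketch only: the function name is hypothetical, and because the rule for queued/pending tasks is described as "usually" a warning, the real validator may make exceptions that this sketch does not model.

```python
# Statuses for which run-manifest issues are treated as errors.
STRICT_STATUSES = {"running", "completed", "failed"}
# Statuses for which run-manifest issues are usually warnings
# (the job may not have started yet).
LENIENT_STATUSES = {"queued", "pending"}


def run_manifest_issue_severity(task_status: str) -> str:
    """Return "error" or "warning" for a run-manifest issue.

    Hypothetical helper; simplifies "usually warnings" to "always warnings".
    """
    if task_status in STRICT_STATUSES:
        return "error"
    if task_status in LENIENT_STATUSES:
        return "warning"
    raise ValueError(f"unknown task status: {task_status!r}")
```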

Notable checks

Experiment integrity

  • experiment_manifest: validates the experiment manifest (content-addressed integrity)
  • deps_manifest: validates that a dependency manifest exists and can be hashed
  • expected_manifest_overall_sha: compares the task's recorded manifest SHA to the current manifest SHA
  • expected_deps_manifest: compares the task's recorded deps manifest name/SHA to what exists on disk

Run manifest provenance (task validation)

  • run_manifest: whether run_manifest.json could be found and loaded
  • run_manifest_location: verifies the manifest was found in the expected bucket:
    • pending for queued/pending
    • running for running
    • finished for completed
    • failed for failed
  • run_manifest_task_id: task id match
  • run_manifest_commit_id: commit id match
  • run_manifest_deps: deps manifest name/SHA match
  • run_manifest_snapshot_id: snapshot id match (when snapshot is part of the task)
  • run_manifest_snapshot_sha256: snapshot sha256 match (when snapshot sha is recorded)
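
The status-to-bucket mapping used by run_manifest_location can be written as a table. The sketch below is an assumption-laden illustration: the function name is hypothetical, and always including expected and actual in the result is a choice made here, since the report shape marks both as optional.

```python
# Expected run-manifest bucket for each task status, as listed above.
EXPECTED_BUCKET = {
    "queued": "pending",
    "pending": "pending",
    "running": "running",
    "completed": "finished",
    "failed": "failed",
}


def check_run_manifest_location(task_status: str, actual_bucket: str) -> dict:
    """Build a check entry comparing the found bucket to the expected one.

    Hypothetical helper; raises KeyError for statuses not listed above.
    """
    expected = EXPECTED_BUCKET[task_status]
    return {
        "ok": actual_bucket == expected,
        "expected": expected,
        "actual": actual_bucket,
    }
```

For a task whose status is running but whose manifest was found under finished, this yields ok set to false with expected "running" and actual "finished".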

Run manifest lifecycle (task validation)

  • run_manifest_lifecycle:
    • running: must have started_at, must not have ended_at/exit_code
    • completed/failed: must have started_at, ended_at, exit_code, and ended_at >= started_at
    • queued/pending: must not have ended_at/exit_code
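
The lifecycle rules above translate directly into a per-status check. The following sketch assumes that run_manifest.json stores started_at and ended_at as ISO 8601 strings (the format of those fields is not documented here) and that an absent field reads as None; the function name and the problem messages are hypothetical.

```python
from datetime import datetime
from typing import Any, List, Mapping


def _parse_ts(value: str) -> datetime:
    # Assumption: ISO 8601 timestamps; map a trailing "Z" to an explicit
    # UTC offset so datetime.fromisoformat accepts it on older Pythons.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def lifecycle_problems(status: str, manifest: Mapping[str, Any]) -> List[str]:
    """Return a list of lifecycle rule violations (empty if none)."""
    started = manifest.get("started_at")
    ended = manifest.get("ended_at")
    exit_code = manifest.get("exit_code")
    problems: List[str] = []

    if status == "running":
        if started is None:
            problems.append("missing started_at")
        if ended is not None:
            problems.append("unexpected ended_at")
        if exit_code is not None:
            problems.append("unexpected exit_code")
    elif status in ("completed", "failed"):
        for name, value in (
            ("started_at", started),
            ("ended_at", ended),
            ("exit_code", exit_code),
        ):
            if value is None:
                problems.append(f"missing {name}")
        if started is not None and ended is not None:
            if _parse_ts(ended) < _parse_ts(started):
                problems.append("ended_at precedes started_at")
    elif status in ("queued", "pending"):
        if ended is not None:
            problems.append("unexpected ended_at")
        if exit_code is not None:
            problems.append("unexpected exit_code")
    else:
        raise ValueError(f"unknown task status: {status!r}")
    return problems
```

Fields are compared against None rather than tested for truthiness so that an exit_code of 0 counts as present.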

Example report (task validation)

{
  "ok": false,
  "commit_id": "6161616161616161616161616161616161616161",
  "task_id": "task-run-manifest-location-mismatch",
  "checks": {
    "experiment_manifest": {"ok": true},
    "deps_manifest": {"ok": true, "actual": "requirements.txt:..."},
    "run_manifest": {"ok": true},
    "run_manifest_location": {
      "ok": false,
      "expected": "running",
      "actual": "finished"
    }
  },
  "errors": [
    "run manifest location mismatch"
  ],
  "ts": "2025-12-17T18:43:00Z"
}
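
To pull the failing checks out of a report like the one above, a short filter is enough. This is a sketch with a hypothetical function name; the line format it produces is illustrative and is not the format printed by the CLI's default human output.

```python
from typing import List


def failed_check_lines(report: dict) -> List[str]:
    """Return one summary line per check whose "ok" is not true."""
    lines: List[str] = []
    # Sort by check name so the output order is stable.
    for name, check in sorted(report.get("checks", {}).items()):
        if check.get("ok"):
            continue
        line = name
        # expected/actual are optional, so only show them when present.
        if "expected" in check or "actual" in check:
            line += (
                f": expected {check.get('expected')!r}, "
                f"actual {check.get('actual')!r}"
            )
        lines.append(line)
    return lines
```

Applied to the example report, this returns a single entry for run_manifest_location showing the expected bucket running against the actual bucket finished.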