docs: comprehensive documentation updates

- Add architecture, CI/CD, CLI reference documentation
- Update installation, operations, and quick-start guides
- Add Jupyter workflow and queue documentation
- New landing page and research runner plan

2026-02-12 12:05:27 -05:00


---
title: "Validation (ml validate)"
url: "/validate/"
---

# Validation (`ml validate`)

The `ml validate` command verifies experiment integrity and provenance.

It can be run against:

- A commit id, which validates the experiment tree and the dependency manifest.
- A task id, which additionally validates the provenance and lifecycle recorded in the run's `run_manifest.json`.

## CLI usage

```shell
# Validate by commit
ml validate <commit_id> [--json] [--verbose]

# Validate by task
ml validate --task <task_id> [--json] [--verbose]
```
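For scripting, a thin wrapper around the `--json` mode can be handy. The sketch below is a hypothetical Python helper: `build_validate_args` and `run_validate` are illustrative names and are not part of the `ml` CLI. It assumes `ml` is on `PATH` and that the JSON report is written to stdout. This page does not document the command's exit status, so the sketch ignores it and relies on the report's `ok` field.

```python
import json
import subprocess
from typing import Any, Callable, Dict, List, Optional


def build_validate_args(commit_id: Optional[str] = None,
                        task_id: Optional[str] = None) -> List[str]:
    """Build argv for `ml validate` in JSON mode; exactly one target is allowed."""
    if (commit_id is None) == (task_id is None):
        raise ValueError("pass exactly one of commit_id or task_id")
    if task_id is not None:
        return ["ml", "validate", "--task", task_id, "--json"]
    return ["ml", "validate", str(commit_id), "--json"]


def run_validate(commit_id: Optional[str] = None,
                 task_id: Optional[str] = None,
                 runner: Callable[..., Any] = subprocess.run) -> Dict[str, Any]:
    """Run `ml validate ... --json` and parse the report printed on stdout.

    The exit status is not checked because it is not documented here;
    callers should inspect the report's `ok` field instead.
    """
    args = build_validate_args(commit_id, task_id)
    proc = runner(args, capture_output=True, text=True)
    return json.loads(proc.stdout)
```

The `runner` parameter exists only so the subprocess call can be swapped out in tests. In normal use, `run_validate(task_id="<task_id>")` returns the parsed report as a dict.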

## Output modes

- Default (human-readable): prints a summary with `errors`, `warnings`, and `failed_checks`.
- `--verbose`: prints every check under `checks`, including `expected`, `actual`, and `details` when present.
- `--json`: prints the raw JSON payload.
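A similar summary can be approximated from a `--json` payload. This sketch assumes that `failed_checks` means the checks whose `ok` is false. The line format is illustrative only and is not the CLI's actual output; both function names are hypothetical.

```python
from typing import Any, Dict, List


def failed_checks(report: Dict[str, Any]) -> List[str]:
    """Names of checks whose `ok` flag is false, in report order."""
    return [name for name, check in report.get("checks", {}).items()
            if not check.get("ok", False)]


def verbose_lines(report: Dict[str, Any]) -> List[str]:
    """One line per check, appending expected/actual/details when present."""
    lines = []
    for name, check in report.get("checks", {}).items():
        status = "PASS" if check.get("ok") else "FAIL"
        # Only include the optional diagnostic fields that the check carries.
        extras = [f"{key}={check[key]!r}"
                  for key in ("expected", "actual", "details") if key in check]
        lines.append(" ".join([status, name, *extras]))
    return lines
```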

## Report shape

The API returns a JSON report of the form:

- `ok`: overall boolean result
- `commit_id`: commit being validated (if known)
- `task_id`: task being validated (when validating by task)
- `checks`: map of check name → `{ ok, expected?, actual?, details? }`
- `errors`: list of high-level failures
- `warnings`: list of non-fatal issues
- `ts`: UTC timestamp
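Consumers of the `--json` output may want a typed view of this shape. The Python sketch below models it with `TypedDict`. The split between required and optional fields is an assumption: only `ok` is treated as mandatory, both at the top level and per check. `parse_report` is a hypothetical helper. The example report on this page omits `warnings`, so the list fields default to empty.

```python
import json
from typing import Any, Dict, List, TypedDict


class _CheckRequired(TypedDict):
    ok: bool


class Check(_CheckRequired, total=False):
    # Optional diagnostic fields, present only for some checks.
    expected: Any
    actual: Any
    details: Any


class Report(TypedDict, total=False):
    ok: bool
    commit_id: str            # if known
    task_id: str              # only when validating by task
    checks: Dict[str, Check]
    errors: List[str]
    warnings: List[str]
    ts: str                   # UTC timestamp


def parse_report(text: str) -> Report:
    """Parse a `--json` payload and check the fields this sketch relies on."""
    data = json.loads(text)
    if not isinstance(data, dict) or not isinstance(data.get("ok"), bool):
        raise ValueError("report needs a boolean 'ok'")
    for name, check in data.get("checks", {}).items():
        if not isinstance(check, dict) or not isinstance(check.get("ok"), bool):
            raise ValueError(f"check {name!r} needs a boolean 'ok'")
    # Tolerate absent collections instead of failing on them.
    data.setdefault("checks", {})
    data.setdefault("errors", [])
    data.setdefault("warnings", [])
    return data
```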

## Check semantics

- For tasks with status `running`, `completed`, or `failed`, run-manifest issues are treated as errors.
- For `queued` or `pending` tasks, run-manifest issues are usually reported as warnings, because the job may not have started yet.
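The severity rule can be expressed as a small lookup. This sketch covers only the typical case. The text above says queued/pending issues are *usually* warnings, so the real validator may escalate some of them. The function and constant names are hypothetical.

```python
from typing import List, Tuple

# Statuses for which run-manifest problems fail validation outright.
ERROR_STATUSES = frozenset({"running", "completed", "failed"})
# Statuses for which the job may not have started yet.
WARNING_STATUSES = frozenset({"queued", "pending"})


def run_manifest_issue_severity(status: str) -> str:
    """Return "error" or "warning" for a run-manifest issue on a task in `status`."""
    if status in ERROR_STATUSES:
        return "error"
    if status in WARNING_STATUSES:
        return "warning"
    raise ValueError(f"unknown task status: {status!r}")


def route_issues(status: str, issues: List[str]) -> Tuple[List[str], List[str]]:
    """Split run-manifest issues into (errors, warnings) for the given status."""
    if run_manifest_issue_severity(status) == "error":
        return list(issues), []
    return [], list(issues)
```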

## Notable checks

### Experiment integrity

- `experiment_manifest`: validates the experiment manifest (content-addressed integrity)
- `deps_manifest`: validates that a dependency manifest exists and can be hashed
- `expected_manifest_overall_sha`: compares the task’s recorded manifest SHA to the current manifest SHA
- `expected_deps_manifest`: compares the task’s recorded deps manifest name/SHA to what exists on disk
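The dependency-manifest checks compare a recorded name/SHA pair with the file on disk. The sketch below assumes the pair is serialized as `<file name>:<sha256 hex>`. That format is inferred from the elided `"requirements.txt:..."` value in the example report on this page. It also assumes SHA-256 over the raw file bytes. Neither detail is specified here, and both helper names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Any, Dict


def deps_manifest_id(path: Path) -> str:
    """Return "<file name>:<sha256 hex>" for a dependency manifest (assumed format)."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    return f"{path.name}:{digest}"


def check_expected_deps(recorded: str, path: Path) -> Dict[str, Any]:
    """Compare the task's recorded deps identifier with the manifest on disk."""
    if not path.is_file():
        return {"ok": False, "expected": recorded, "details": f"{path} not found"}
    actual = deps_manifest_id(path)
    return {"ok": actual == recorded, "expected": recorded, "actual": actual}
```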

### Run manifest provenance (task validation)

- `run_manifest`: whether `run_manifest.json` could be found and loaded
- `run_manifest_location`: verifies that the manifest was found in the expected bucket:
  - `pending` for `queued`/`pending`
  - `running` for `running`
  - `finished` for `completed`
  - `failed` for `failed`
- `run_manifest_task_id`: task id match
- `run_manifest_commit_id`: commit id match
- `run_manifest_deps`: deps manifest name/SHA match
- `run_manifest_snapshot_id`: snapshot id match (when a snapshot is part of the task)
- `run_manifest_snapshot_sha256`: snapshot SHA-256 match (when the snapshot SHA is recorded)
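The status-to-bucket mapping for `run_manifest_location` is a direct lookup. In this sketch, `found_in` stands for the bucket where the manifest was actually located. This page does not describe how buckets map to directories, so that input and the function name are hypothetical. The returned entry mirrors the `{ ok, expected, actual }` check shape.

```python
from typing import Any, Dict

# Bucket in which run_manifest.json is expected for each task status.
EXPECTED_BUCKET = {
    "queued": "pending",
    "pending": "pending",
    "running": "running",
    "completed": "finished",
    "failed": "failed",
}


def check_location(status: str, found_in: str) -> Dict[str, Any]:
    """Build a run_manifest_location-style check entry for a task status."""
    expected = EXPECTED_BUCKET[status]
    return {"ok": found_in == expected, "expected": expected, "actual": found_in}
```

For a `running` task whose manifest sits in `finished`, this yields the same failing entry as the example report on this page.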

### Run manifest lifecycle (task validation)

`run_manifest_lifecycle` enforces these rules per task status:

- `running`: must have `started_at`; must not have `ended_at` or `exit_code`
- `completed`/`failed`: must have `started_at`, `ended_at`, and `exit_code`, with `ended_at >= started_at`
- `queued`/`pending`: must not have `ended_at` or `exit_code`
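The lifecycle rules can be sketched as follows. The sketch makes three assumptions: `started_at` and `ended_at` are ISO 8601 UTC strings (like the report's `ts`); a key counts as present only when its value is not `null`; and the function name is hypothetical. The real validator's messages will differ.

```python
from datetime import datetime
from typing import Any, Dict, List


def _parse_ts(value: str) -> datetime:
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def lifecycle_problems(status: str, manifest: Dict[str, Any]) -> List[str]:
    """List violations of the run_manifest_lifecycle rules for a task status."""

    def has(key: str) -> bool:
        # Present means set and not null, so an exit_code of 0 still counts.
        return manifest.get(key) is not None

    problems: List[str] = []
    if status == "running":
        if not has("started_at"):
            problems.append("missing started_at")
        problems += [f"unexpected {k}" for k in ("ended_at", "exit_code") if has(k)]
    elif status in ("completed", "failed"):
        problems += [f"missing {k}"
                     for k in ("started_at", "ended_at", "exit_code") if not has(k)]
        if has("started_at") and has("ended_at"):
            if _parse_ts(manifest["ended_at"]) < _parse_ts(manifest["started_at"]):
                problems.append("ended_at is before started_at")
    elif status in ("queued", "pending"):
        problems += [f"unexpected {k}" for k in ("ended_at", "exit_code") if has(k)]
    else:
        raise ValueError(f"unknown task status: {status!r}")
    return problems
```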

## Example report (task validation)

```json
{
  "ok": false,
  "commit_id": "6161616161616161616161616161616161616161",
  "task_id": "task-run-manifest-location-mismatch",
  "checks": {
    "experiment_manifest": {"ok": true},
    "deps_manifest": {"ok": true, "actual": "requirements.txt:..."},
    "run_manifest": {"ok": true},
    "run_manifest_location": {
      "ok": false,
      "expected": "running",
      "actual": "finished"
    }
  },
  "errors": [
    "run manifest location mismatch"
  ],
  "ts": "2025-12-17T18:43:00Z"
}
```
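In CI, the report's `ok` field is a natural pass/fail signal. The sketch below defines a hypothetical `gate` helper that turns a parsed report into a process exit status. It prints warnings either way and prints errors only on failure. Whether warnings should also fail a build is a policy choice left to the caller.

```python
import sys
from typing import Any, Dict, TextIO


def gate(report: Dict[str, Any], out: TextIO = sys.stderr) -> int:
    """Map a parsed validation report to a CI exit status (0 = pass, 1 = fail)."""
    # Warnings are non-fatal, so surface them whether or not the report passed.
    for warning in report.get("warnings", []):
        print(f"warning: {warning}", file=out)
    if report.get("ok") is True:
        return 0
    for error in report.get("errors", []):
        print(f"error: {error}", file=out)
    return 1
```

Applied to the example report above, `gate` returns `1` and prints `error: run manifest location mismatch`. A script could call `sys.exit(gate(json.load(sys.stdin)))` and read the output of `ml validate --task <task_id> --json` on stdin.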
