| title | url |
|---|---|
| Validation (ml validate) | /validate/ |
Validation (ml validate)
The `ml validate` command verifies experiment integrity and provenance.
It can be run against:
- A commit id (validates the experiment tree + dependency manifest)
- A task id (additionally validates the run’s `run_manifest.json` provenance and lifecycle)
CLI usage
```bash
# Validate by commit
ml validate <commit_id> [--json] [--verbose]

# Validate by task
ml validate --task <task_id> [--json] [--verbose]
```
Output modes
- Default (human-readable): prints a summary with `errors`, `warnings`, and `failed_checks`.
- `--verbose`: prints all checks under `checks` and includes `expected`/`actual`/`details` when present.
- `--json`: prints the raw JSON payload.
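For scripting, such as gating a CI job, `--json` is the mode to consume. The following is one possible way to do that from Python. It assumes `ml` is on `PATH` and that `--json` writes only the report to stdout. The helper names `validate_task` and `gate` are hypothetical. The sketch decides pass/fail from the report's `ok` field rather than from the process exit code, because this page does not document the exit code.

```python
import json
import subprocess
import sys
from typing import Any, Dict


def validate_task(task_id: str) -> Dict[str, Any]:
    """Run `ml validate --task <task_id> --json` and parse the report.

    Hypothetical helper. Assumes `ml` is on PATH and that --json writes only
    the JSON report to stdout.
    """
    proc = subprocess.run(
        ["ml", "validate", "--task", task_id, "--json"],
        capture_output=True,
        text=True,
    )
    return json.loads(proc.stdout)


def gate(report: Dict[str, Any]) -> int:
    """Hypothetical CI gate: print errors/warnings, return 0 if `ok`, else 1.

    Decides on the report's `ok` field rather than the process exit code,
    whose meaning is not documented here.
    """
    for err in report.get("errors", []):
        print(f"error: {err}", file=sys.stderr)
    for warn in report.get("warnings", []):
        print(f"warning: {warn}", file=sys.stderr)
    return 0 if report.get("ok") else 1
```

In a CI step, this could be used as `sys.exit(gate(validate_task("<task_id>")))`.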
Report shape
The API returns a JSON report of the form:
- `ok`: overall boolean
- `commit_id`: commit being validated (if known)
- `task_id`: task being validated (when validating by task)
- `checks`: map of check name → `{ ok, expected?, actual?, details? }`
- `errors`: list of high-level failures
- `warnings`: list of non-fatal issues
- `ts`: UTC timestamp
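The same shape can be written as Python `TypedDict`s, which helps when type-checking scripts that consume `--json` output. Which keys are always present is an assumption here. The sketch treats `ok`, `checks`, and `ts` as required and everything else as optional; the example report on this page, for instance, has no `warnings` key. `failed_checks` is a hypothetical helper that approximates the list the human-readable mode prints. It is not taken from the tool's source.

```python
from typing import Any, Dict, List, TypedDict


class _CheckRequired(TypedDict):
    ok: bool


class CheckResult(_CheckRequired, total=False):
    # Optional fields, present only when a check has something to report.
    expected: Any
    actual: Any
    details: Any


class _ReportRequired(TypedDict):
    # Assumed to be always present (an assumption, not a documented guarantee).
    ok: bool
    checks: Dict[str, CheckResult]
    ts: str  # UTC timestamp


class ValidationReport(_ReportRequired, total=False):
    commit_id: str  # if known
    task_id: str  # only when validating by task
    errors: List[str]
    warnings: List[str]


def failed_checks(report: ValidationReport) -> List[str]:
    """Hypothetical helper: names of checks whose `ok` is false, in report order."""
    return [name for name, check in report["checks"].items() if not check["ok"]]
```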
Check semantics
- For tasks with status `running`, `completed`, or `failed`, run-manifest issues are treated as errors.
- For `queued`/`pending` tasks, run-manifest issues are usually reported as warnings, since the job may not have started yet.
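The severity rule can be written as a small function. This is a simplification. The rule above says queued/pending issues are *usually* warnings, whereas this sketch always returns `"warning"` for those statuses and rejects any status not named on this page. The function and constant names are hypothetical.

```python
from typing import Literal

Severity = Literal["error", "warning"]

# The job has (or should have) started, so a bad run manifest is a real failure.
STARTED_STATUSES = {"running", "completed", "failed"}
# The job may not have started yet, so run-manifest issues are softer.
NOT_STARTED_STATUSES = {"queued", "pending"}


def run_manifest_issue_severity(status: str) -> Severity:
    """Hypothetical: classify a run-manifest issue by task status.

    Simplification: queued/pending always map to "warning" here, whereas the
    documented behavior is only "usually" a warning.
    """
    if status in STARTED_STATUSES:
        return "error"
    if status in NOT_STARTED_STATUSES:
        return "warning"
    raise ValueError(f"unknown task status: {status!r}")
```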
Notable checks
Experiment integrity
- `experiment_manifest`: validates the experiment manifest (content-addressed integrity)
- `deps_manifest`: validates that a dependency manifest exists and can be hashed
- `expected_manifest_overall_sha`: compares the task’s recorded manifest SHA to the current manifest SHA
- `expected_deps_manifest`: compares the task’s recorded deps manifest name/SHA to what exists on disk
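The following sketch illustrates what `deps_manifest` and `expected_deps_manifest` compare. It hashes a dependency file and checks the result against a recorded value. The `name:sha` format is inferred from the `"requirements.txt:..."` value in the example report, and SHA-256 as the hash function is an assumption. The function names are hypothetical, and the real implementation may hash differently.

```python
import hashlib
from pathlib import Path
from typing import Any, Dict


def deps_manifest_value(name: str, content: bytes) -> str:
    # Hypothetical "name:sha256" encoding, inferred from "requirements.txt:..."
    # in the example report; the real hash scheme is not documented here.
    return f"{name}:{hashlib.sha256(content).hexdigest()}"


def deps_manifest_check(path: Path) -> Dict[str, Any]:
    """Hypothetical shape of `deps_manifest`: the file must exist and be hashable."""
    if not path.is_file():
        return {"ok": False, "details": f"{path.name} not found"}
    return {"ok": True, "actual": deps_manifest_value(path.name, path.read_bytes())}


def expected_deps_manifest_check(recorded: str, current: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical shape of `expected_deps_manifest`.

    The name/SHA recorded on the task must equal what is on disk now.
    """
    actual = current.get("actual")
    return {"ok": actual == recorded, "expected": recorded, "actual": actual}
```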
Run manifest provenance (task validation)
- `run_manifest`: whether `run_manifest.json` could be found and loaded
- `run_manifest_location`: verifies the manifest was found in the expected bucket:
  - `pending` for queued/pending tasks
  - `running` for running tasks
  - `finished` for completed tasks
  - `failed` for failed tasks
- `run_manifest_task_id`: task id match
- `run_manifest_commit_id`: commit id match
- `run_manifest_deps`: deps manifest name/SHA match
- `run_manifest_snapshot_id`: snapshot id match (when a snapshot is part of the task)
- `run_manifest_snapshot_sha256`: snapshot SHA-256 match (when the snapshot SHA is recorded)
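The location rule is a fixed mapping from task status to bucket. The remaining provenance checks are equality comparisons between the task record and `run_manifest.json`. A sketch follows. The key names (`task_id`, `commit_id`, `snapshot_id`, `snapshot_sha256`) and the helper names are assumptions, and the deps name/SHA check is omitted for brevity.

```python
from typing import Any, Dict, Mapping

# Task status -> bucket where run_manifest.json is expected (per the list above).
EXPECTED_BUCKET = {
    "queued": "pending",
    "pending": "pending",
    "running": "running",
    "completed": "finished",
    "failed": "failed",
}


def run_manifest_location_check(status: str, found_in: str) -> Dict[str, Any]:
    """Hypothetical shape of the `run_manifest_location` check."""
    expected = EXPECTED_BUCKET[status]
    return {"ok": found_in == expected, "expected": expected, "actual": found_in}


def _match(task: Mapping[str, Any], manifest: Mapping[str, Any], key: str) -> Dict[str, Any]:
    # Equality between the task record and the run manifest for one field.
    expected, actual = task.get(key), manifest.get(key)
    return {"ok": expected == actual, "expected": expected, "actual": actual}


def provenance_checks(task: Mapping[str, Any], manifest: Mapping[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Hypothetical field-match checks; key names in task/manifest are assumptions."""
    checks = {
        "run_manifest_task_id": _match(task, manifest, "task_id"),
        "run_manifest_commit_id": _match(task, manifest, "commit_id"),
    }
    # Snapshot checks only apply when the task records a snapshot id / sha256.
    if task.get("snapshot_id") is not None:
        checks["run_manifest_snapshot_id"] = _match(task, manifest, "snapshot_id")
    if task.get("snapshot_sha256") is not None:
        checks["run_manifest_snapshot_sha256"] = _match(task, manifest, "snapshot_sha256")
    return checks
```

Consider a task with status `running` whose manifest was found under `finished`. For that case, `run_manifest_location_check` yields `ok: false`, `expected: "running"`, `actual: "finished"`, the same values shown in the example report.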
Run manifest lifecycle (task validation)
- `run_manifest_lifecycle`:
  - `running`: must have `started_at`; must not have `ended_at`/`exit_code`
  - `completed`/`failed`: must have `started_at`, `ended_at`, and `exit_code`, with `ended_at >= started_at`
  - `queued`/`pending`: must not have `ended_at`/`exit_code`
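The lifecycle rules translate directly into a function that returns a list of violations; an empty list means the check passes. In this sketch, the helper name and message strings are hypothetical. It assumes timestamps are ISO 8601 UTC strings like the report's `ts` field. Presence is tested with `is not None`, so an `exit_code` of `0` still counts as present.

```python
from datetime import datetime
from typing import Any, List, Mapping


def _parse_ts(value: str) -> datetime:
    # Accept a trailing "Z" even on Python versions whose fromisoformat() rejects it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def lifecycle_problems(status: str, manifest: Mapping[str, Any]) -> List[str]:
    """Hypothetical: lifecycle violations for run_manifest.json (empty list = pass)."""

    def has(key: str) -> bool:
        # `is not None` so that exit_code == 0 still counts as present.
        return manifest.get(key) is not None

    problems: List[str] = []
    if status == "running":
        if not has("started_at"):
            problems.append("running task missing started_at")
        for key in ("ended_at", "exit_code"):
            if has(key):
                problems.append(f"running task must not have {key}")
    elif status in ("completed", "failed"):
        for key in ("started_at", "ended_at", "exit_code"):
            if not has(key):
                problems.append(f"{status} task missing {key}")
        if has("started_at") and has("ended_at"):
            if _parse_ts(manifest["ended_at"]) < _parse_ts(manifest["started_at"]):
                problems.append("ended_at is before started_at")
    elif status in ("queued", "pending"):
        for key in ("ended_at", "exit_code"):
            if has(key):
                problems.append(f"{status} task must not have {key}")
    else:
        problems.append(f"unknown status {status!r}")
    return problems
```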
Example report (task validation)
```json
{
  "ok": false,
  "commit_id": "6161616161616161616161616161616161616161",
  "task_id": "task-run-manifest-location-mismatch",
  "checks": {
    "experiment_manifest": {"ok": true},
    "deps_manifest": {"ok": true, "actual": "requirements.txt:..."},
    "run_manifest": {"ok": true},
    "run_manifest_location": {
      "ok": false,
      "expected": "running",
      "actual": "finished"
    }
  },
  "errors": [
    "run manifest location mismatch"
  ],
  "ts": "2025-12-17T18:43:00Z"
}
```