
# ML CLI

Fast CLI tool for managing ML experiments.

## Quick Start

```bash
# 1. Build
zig build

# 2. Set up configuration
./zig-out/bin/ml init

# 3. Run experiment
./zig-out/bin/ml sync ./my-experiment --queue
```

## Commands

- `ml init` - Set up configuration
- `ml sync <path>` - Sync project to server
- `ml queue <job1> [job2 ...] [--commit <id>] [--priority N] [--note <text>]` - Queue one or more jobs
- `ml status` - Check system/queue status for your API key
- `ml validate <commit_id> [--json] [--task <task_id>]` - Validate provenance + integrity for a commit or task (includes `run_manifest.json` consistency checks when validating by task)
- `ml info <path|id> [--json] [--base <path>]` - Show run info from `run_manifest.json` (by path or by scanning `finished/failed/running/pending`)
- `ml annotate <path|run_id|task_id> --note <text> [--author <name>] [--base <path>] [--json]` - Append a human annotation to `run_manifest.json`
- `ml narrative set <path|run_id|task_id> [--hypothesis <text>] [--context <text>] [--intent <text>] [--expected-outcome <text>] [--parent-run <id>] [--experiment-group <text>] [--tags <csv>] [--base <path>] [--json]` - Patch the `narrative` field in `run_manifest.json`
- `ml monitor` - Launch monitoring interface (TUI)
- `ml cancel <job>` - Cancel a running/queued job you own
- `ml prune --keep N` - Prune old experiments, keeping the N most recent
- `ml watch <path>` - Auto-sync directory
- `ml experiment log|show|list|delete` - Manage experiments and metrics

Notes:

- `--json` mode is designed to be pipe-friendly: machine-readable JSON is emitted to stdout, while user-facing messages/errors go to stderr.
- When you run `ml validate --task <task_id>`, the server tries to locate the job's `run_manifest.json` under the configured base path (pending/running/finished/failed) and cross-checks key fields (task id, commit id, deps, snapshot).
- For tasks in `running`, `completed`, or `failed` state, a missing `run_manifest.json` is treated as a validation failure. For `queued` tasks, it is treated as a warning (the job may not have started yet).
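
The stream separation described in the first note can be sketched as follows. `emit_like_ml` is a hypothetical stand-in that mimics the documented behavior (JSON on stdout, messages on stderr); it is not part of the CLI, and its JSON payload is invented for illustration.

```bash
# Hypothetical stand-in for an `ml ... --json` call: JSON goes to stdout,
# human-readable messages go to stderr. The payload is made up.
emit_like_ml() {
  echo 'validating...' >&2
  echo '{"ok":true}'
}

# Capture only the JSON; send messages to a log file instead of the pipe.
log_file="$(mktemp)"
json="$(emit_like_ml 2>"$log_file")"
messages="$(cat "$log_file")"
rm -f "$log_file"

printf 'json: %s\n' "$json"
printf 'messages: %s\n' "$messages"
```

With the real CLI, the same pattern should apply to any command that accepts `--json`, for example `ml info <path|id> --json 2>/dev/null`.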

### Experiment workflow (minimal)

- `ml sync ./my-experiment --queue`
  Syncs files, computes a unique commit ID for the directory, and queues a job.

- `ml queue my-job`
  Queues a job named `my-job`. If `--commit` is omitted, the CLI generates a random commit ID
  and records `(job_name, commit_id)` in `~/.ml/history.log` so you don't have to remember hashes.

- `ml queue my-job --note "baseline run; lr=1e-3"`
  Adds a human-readable note to the run; the note is persisted in the run's `run_manifest.json` (under `metadata.note`).

- `ml experiment list`
  Shows recent experiments from history with alias (job name) and commit ID.

- `ml experiment delete <alias|commit>`
  Cancels a running/queued experiment by job name, full commit ID, or short commit prefix.
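
How an alias or short commit prefix could resolve to a full commit ID can be sketched in shell. This is an illustration only: the one-pair-per-line `job_name commit_id` layout, the sample IDs, and the `resolve_commit` helper are assumptions, not the CLI's actual history format or lookup logic.

```bash
# Assumed layout: one "job_name commit_id" pair per line (hypothetical;
# the real ~/.ml/history.log format may differ).
history_file="$(mktemp)"
cat >"$history_file" <<'EOF'
baseline 3f9a1c7e2b
my-job a41d08c6f5
my-job 7be2290d13
EOF

# resolve_commit <alias|commit|prefix>: print the most recent matching
# commit ID, or exit non-zero when nothing matches.
resolve_commit() {
  awk -v q="$1" '
    $1 == q || index($2, q) == 1 { match_id = $2 }   # later lines win
    END { if (match_id != "") print match_id; else exit 1 }
  ' "$history_file"
}

resolve_commit my-job   # by alias: prints 7be2290d13 (latest entry)
resolve_commit 3f9a     # by short commit prefix: prints 3f9a1c7e2b
```

Matching the last entry for an alias is a design choice in this sketch; it reflects the idea that a job name usually refers to its most recent run.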

## Configuration

Create `~/.ml/config.toml`:

```toml
worker_host = "worker.local"
worker_user = "mluser"
worker_base = "/data/ml-experiments"
worker_port = 22
api_key = "your-api-key"
```
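
A quick sanity check that a config file defines the keys shown above might look like the following. Whether the CLI treats every key as required is an assumption here, and `check_config` is a hypothetical helper rather than an `ml` subcommand. The demo writes to a temporary file instead of `~/.ml/config.toml`.

```bash
# check_config <file>: report any of the documented keys that are absent.
# Assumes flat `key = value` lines, as in the example config.
check_config() {
  cfg_missing=0
  for cfg_key in worker_host worker_user worker_base worker_port api_key; do
    if ! grep -Eq "^[[:space:]]*${cfg_key}[[:space:]]*=" "$1"; then
      echo "missing key: ${cfg_key}" >&2
      cfg_missing=1
    fi
  done
  return "$cfg_missing"
}

# Demo against a temporary file with api_key left out on purpose.
demo_config="$(mktemp)"
printf '%s\n' \
  'worker_host = "worker.local"' \
  'worker_user = "mluser"' \
  'worker_base = "/data/ml-experiments"' \
  'worker_port = 22' >"$demo_config"

check_config "$demo_config" || echo "config incomplete"
```

To check a real setup, the helper could be pointed at `~/.ml/config.toml` instead of the temporary file.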

## Install

```bash
# Install system-wide
make install

# Or copy the binary manually (may require sudo, depending on permissions)
cp zig-out/bin/ml /usr/local/bin/
```

## Need Help?

- `ml --help` - Show command help
- `ml <command> --help` - Show command-specific help