# ML CLI
Fast CLI tool for managing ML experiments.
## Quick Start
```bash
# 1. Build
zig build
# 2. Set up configuration
./zig-out/bin/ml init
# 3. Run experiment
./zig-out/bin/ml sync ./my-experiment --queue
```
## Commands
- `ml init` - Set up configuration
- `ml sync <path>` - Sync project to server
- `ml queue <job1> [job2 ...] [--commit <id>] [--priority N]` - Queue one or more jobs
- `ml status` - Check system/queue status for your API key
- `ml validate <commit_id> [--json] [--task <task_id>]` - Validate provenance + integrity for a commit or task (includes `run_manifest.json` consistency checks when validating by task)
- `ml info <path|id> [--json] [--base <path>]` - Show run info from `run_manifest.json` (by path or by scanning `finished/failed/running/pending`)
- `ml monitor` - Launch monitoring interface (TUI)
- `ml cancel <job>` - Cancel a running/queued job you own
- `ml prune --keep N` - Prune old experiments, keeping the N most recent
- `ml watch <path>` - Auto-sync directory
- `ml experiment log|show|list|delete` - Manage experiments and metrics
Notes:
- When running `ml validate --task <task_id>`, the server will try to locate the job's `run_manifest.json` under the configured base path (pending/running/finished/failed) and cross-check key fields (task id, commit id, deps, snapshot).
- For tasks in `running`, `completed`, or `failed` state, a missing `run_manifest.json` is treated as a validation failure. For `queued` tasks, it is treated as a warning (the job may not have started yet).
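
The rule in the second note can be sketched as a small lookup from task state to severity. This only illustrates the behavior described above; `missing_manifest_severity` is a hypothetical helper, not part of the `ml` CLI or the server, and the `unknown` fallback is an assumption for states the notes do not mention.

```bash
#!/usr/bin/env bash
# Sketch only: missing_manifest_severity is a hypothetical helper.
# It maps a task state to how a missing run_manifest.json is treated.
missing_manifest_severity() {
  case "$1" in
    running|completed|failed) echo "failure" ;;  # job started, manifest expected
    queued)                   echo "warning" ;;  # job may not have started yet
    *)                        echo "unknown" ;;  # assumption: state not covered by the notes
  esac
}

for state in queued running completed failed; do
  printf '%s -> %s\n' "$state" "$(missing_manifest_severity "$state")"
done
```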
### Experiment workflow (minimal)
- `ml sync ./my-experiment --queue`
  Syncs files, computes a unique commit ID for the directory, and queues a job.
- `ml queue my-job`
  Queues a job named `my-job`. If `--commit` is omitted, the CLI generates a random commit ID
  and records `(job_name, commit_id)` in `~/.ml/history.log` so you don't have to remember hashes.
- `ml experiment list`
  Shows recent experiments from history with alias (job name) and commit ID.
- `ml experiment delete <alias|commit>`
  Cancels a running/queued experiment by job name, full commit ID, or short commit prefix.
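
To illustrate how an alias, a full commit ID, or a short prefix could each resolve to a commit, here is a minimal sketch. It assumes each history line has the form `<job_name> <commit_id>` separated by whitespace; the real `~/.ml/history.log` format may differ. `resolve_commit` and the sample entries are hypothetical and are not taken from the CLI.

```bash
#!/usr/bin/env bash
# Sketch only: resolve_commit is a hypothetical helper, not part of the ml CLI.
# ASSUMPTION: each history line is "<job_name> <commit_id>", whitespace-separated.
resolve_commit() {
  local history_file="$1" query="$2"
  [ -n "$query" ] || return 1
  # Print the commit ID of every entry whose job name equals the query or
  # whose commit ID starts with it (a full ID is a prefix of itself).
  # Appending "" forces string comparison for numeric-looking values.
  awk -v q="$query" '
    ($1 "") == (q "") || index($2, q) == 1 { print $2 }
  ' "$history_file" | sort -u
}

# Illustrative data, not real history entries.
demo_history="$(mktemp)"
printf '%s\n' 'my-job 3f9a1c7e' 'other-job 3fb20d44' 'my-job 81c0aa12' > "$demo_history"

resolve_commit "$demo_history" my-job   # every commit recorded for my-job
resolve_commit "$demo_history" 3f9      # a prefix that matches one commit
resolve_commit "$demo_history" 3f       # an ambiguous prefix that matches two
rm -f "$demo_history"
```

A query that yields more than one line is ambiguous. A caller would probably want to ask for a longer prefix in that case rather than pick one silently.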
## Configuration
Create `~/.ml/config.toml`:
```toml
worker_host = "worker.local"
worker_user = "mluser"
worker_base = "/data/ml-experiments"
worker_port = 22
api_key = "your-api-key"
```
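
If you create the file by hand, one option is a small shell helper like the sketch below. `write_ml_config` is a hypothetical function, not something the CLI provides. It writes the sample values shown above and restricts the file to owner-only access, which is a suggestion based on the file holding an API key, not a documented requirement.

```bash
#!/usr/bin/env bash
# Sketch only: write_ml_config is a hypothetical helper, not part of the ml CLI.
# It writes the sample values from this README; replace them with your own.
write_ml_config() {
  local config_dir="${1:-$HOME/.ml}"
  local config_file="$config_dir/config.toml"

  # Refuse to clobber an existing configuration.
  if [ -e "$config_file" ]; then
    echo "refusing to overwrite $config_file" >&2
    return 1
  fi

  # Create the directory and file with owner-only permissions,
  # since the file holds an API key.
  (
    umask 077
    mkdir -p "$config_dir" &&
    cat > "$config_file" <<'EOF'
worker_host = "worker.local"
worker_user = "mluser"
worker_base = "/data/ml-experiments"
worker_port = 22
api_key = "your-api-key"
EOF
  )
}

# Usage:
#   write_ml_config            # writes ~/.ml/config.toml
#   write_ml_config /tmp/demo  # writes /tmp/demo/config.toml
```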
## Install
```bash
# Install to system
make install
# Or copy binary manually
cp zig-out/bin/ml /usr/local/bin/
```
## Need Help?
- `ml --help` - Show command help
- `ml <command> --help` - Show command-specific help