docs: update changelog, readme and CLI reference for 0.1.0
- Add 0.1.0 release entry to CHANGELOG.md with CLI and C++ native libs highlights
- Update README.md with current project status
- Sync CLI reference documentation with recent command changes
This commit is contained in:
parent e3dad32029
commit 8b83d60452
3 changed files with 114 additions and 93 deletions
CHANGELOG.md
@@ -1,5 +1,23 @@
## [Unreleased]

### Added - CLI Connection Handling & Execution Mode (2026-03-05)

- CLI: Standardized `ConnectionContext` with `isConnected()` method across all commands
- CLI: Unified `ml info` command - queries remote server when connected, falls back to local manifests
- CLI: Added `execution_mode` config setting (`local`/`remote`/`auto`) for persistent execution preference
- CLI: Added `--mode` flag to `ml init` for setting execution mode during initialization
- CLI: Added `--rerun` flag to `run` command for re-running previous jobs with full config inheritance (dataset, snapshot, resources, narrative)
- CLI: Added `--inherit-narrative` and `--inherit-config` flags for provenance tracking
- WebSocket: Added `query_run_info` opcode (0x28) for remote run info queries
- Server: Added `handleQueryRunInfo` handler to serve run details via WebSocket

### Changed - CLI Command Behavior (2026-03-05)

- CLI: Removed `--local` and `--remote` flags from `ml run`, `ml exec`, and `ml info` commands
- CLI: Execution mode now determined by config setting only (no per-command flags needed)
- CLI: Removed obsolete `printUsage` from `exec/mod.zig` (now an internal module used by `ml run`)
- CLI: Removed `queue` and `requeue` commands - functionality merged into unified `run` command
- CLI: Renamed `exec/` directory to `executor/` (execution logic for local/remote)
- CLI: Renamed `queue/` directory to `submission/` (job submission logic for `run` command)

### Security - Comprehensive Hardening (2026-02-23)

**Test Coverage Implementation (Phase 8):**
@@ -104,6 +104,7 @@ worker_user = "dev_user"
worker_base = "/tmp/ml-experiments"
worker_port = 22
api_key = "your-api-key"
execution_mode = "auto"  # local | remote | auto
EOF

# Core commands
@@ -167,8 +168,9 @@ make cross-platform # builds for Linux/macOS/Windows
## Docs

See `docs/` for detailed guides:
- `docs/src/cli-reference.md` – CLI reference (execution mode, commands, options)
- `docs/src/native-libraries.md` – Native C++ libraries (build, test, deploy)
- `docs/src/zig-cli.md` – CLI reference
- `docs/src/zig-cli.md` – CLI architecture
- `docs/src/quick-start.md` – Full setup guide
- `docs/src/deployment.md` – Production deployment
- `docs/src/research-features.md` – Research workflow features (narrative capture, outcomes, search)
@@ -26,23 +26,16 @@ High-performance command-line interface for experiment management, written in Zig.
| Command | Description | Example |
|---------|-------------|---------|
| `init` | Interactive configuration setup | `ml init` |
| `sync` | Sync project to worker with deduplication | `ml sync ./project --name myjob --queue` |
| `queue` | Queue job for execution | `ml queue myjob --commit abc123 --priority 8` |
| `sync` | Sync project to worker with deduplication | `ml sync ./project --name myjob` |
| `run` | Execute job locally or remotely (respects `execution_mode`) | `ml run myjob --commit abc123` |
| `status` | Get system and worker status | `ml status` |
| `monitor` | Launch TUI monitoring via SSH | `ml monitor` |
| `cancel` | Cancel running job | `ml cancel job123` |
| `prune` | Clean up old experiments | `ml prune --keep 10` |
| `watch` | Auto-sync directory on changes | `ml watch ./project --queue` |
| `watch` | Auto-sync directory on changes | `ml watch ./project` |
| `jupyter` | Manage Jupyter notebook services | `ml jupyter start --name my-nb` |
| `validate` | Validate provenance/integrity for a commit or task | `ml validate <commit_id> --verbose` |
| `info` | Show run info from `run_manifest.json` | `ml info <run_dir>` |
| `requeue` | Re-submit an existing run/commit with new args/resources | `ml requeue <commit_id\|run_id\|task_id\|path> -- --epochs 20` |
| `logs` | Fetch and follow job logs | `ml logs job123 -n 100` |
| `compare` | Diff two runs showing narrative/metadata/metrics | `ml compare run1 run2` |
| `find` | Search experiments by tags/outcome/dataset | `ml find --outcome validated --tag bert` |
| `export` | Export run bundles with optional anonymization | `ml export run123 --anonymize` |
| `outcome` | Set post-run outcome tracking | `ml outcome set run123 --outcome validated` |
| `narrative` | Set run narrative fields | `ml narrative set run123 --hypothesis "LR matters"` |
| `dataset` | Dataset operations (verify, list, etc.) | `ml dataset verify ./data` |

### Command Details
@@ -55,14 +48,93 @@ Creates a configuration template at `~/.ml/config.toml` with:
- Worker connection details
- API authentication
- Base paths and ports
- Execution mode preference

**Execution Mode Options:**
```bash
# Set the default execution mode during init
ml init --mode=local   # Always execute locally
ml init --mode=remote  # Always execute on the remote server
ml init --mode=auto    # Auto-detect based on server connectivity (default)
```

The `execution_mode` setting controls whether commands like `ml run`, `ml exec`, and `ml info` operate locally or remotely. When set to:
- `local`: Commands always execute locally using SQLite storage
- `remote`: Commands always attempt remote server execution
- `auto`: Commands check server connectivity and fall back to local if unavailable
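The `auto` fallback described above amounts to a reachability probe against the configured server. A minimal illustrative sketch in Python (the CLI itself is written in Zig; the host/port/timeout parameters here are placeholders, not documented settings):

```python
# Illustrative sketch of "auto" mode resolution: probe the server and
# fall back to local execution when it is unreachable.
import socket

def resolve_execution_mode(mode: str, host: str, port: int, timeout: float = 2.0) -> str:
    """Resolve 'auto' to 'local' or 'remote' by attempting a TCP connection."""
    if mode in ("local", "remote"):
        return mode                        # explicit modes are used as-is
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "remote"                # server reachable
    except OSError:
        return "local"                     # unreachable: fall back to local
```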

**Modifying Execution Mode:**
Edit `~/.ml/config.toml` directly:
```toml
execution_mode = "local"  # Options: local, remote, auto
```

---

#### `run` - Execute Job (Local or Remote)

Execute a job either locally or on the remote server based on the `execution_mode` configuration.

```bash
# Run with a commit ID (uses execution_mode from config)
ml run my-job --commit abc123def456

# Run with priority
ml run my-job --commit abc123 --priority 8

# Run with resource requirements
ml run my-job --commit abc123 --cpu 4 --memory 16 --gpu 1

# Run with runner args
ml run my-job --commit abc123 -- --epochs 5 --lr 1e-3
```

**Execution Behavior:**
- `execution_mode = "local"`: Runs the job locally using SQLite storage
- `execution_mode = "remote"`: Submits the job to the remote server via WebSocket
- `execution_mode = "auto"`: Connects to the server if available, otherwise runs locally

**Features:**
- Respects `execution_mode` from `~/.ml/config.toml`
- Same interface regardless of local or remote execution
- WebSocket-based communication for remote execution
- SQLite-based tracking for local execution

**Re-running Previous Runs:**
```bash
# Re-run with absolute reproducibility (inherits everything by default)
ml run my-job --rerun run-id-abc123

# Re-run with custom overrides (inherit all, then modify)
ml run my-job --rerun run-id-abc123 --cpu 8 --memory 32

# Selective inheritance (only inherit specific aspects)
ml run my-job --rerun run-id-abc123 --inherit-config     # Only resources/dataset
ml run my-job --rerun run-id-abc123 --inherit-narrative  # Only hypothesis/context/tags

# Full provenance chain - inherit everything and link as a child run
ml run my-job --rerun run-id-abc123 --parent
```

**Default Behavior (Research-First Reproducibility):**
- `--rerun` alone inherits **everything** for absolute reproducibility
- Explicit `--inherit-*` flags opt in to partial inheritance
- User-specified flags always override inherited values

**Inherited Fields:**
- Narrative: hypothesis, context, intent, expected_outcome, tags
- Config: cpu, memory, gpu, gpu_memory, priority, commit_id, snapshot_id, snapshot_sha256
- Provenance: parent_run_id links runs for full lineage tracking
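The inheritance rules above — inherit everything from the parent run, then let explicitly set flags win — can be sketched as a simple dictionary merge. This is a hypothetical helper for illustration, not the CLI's actual code:

```python
# Illustrative sketch of --rerun inheritance: start from the parent run's
# config, then apply only the flags the user explicitly set.
def merge_run_config(inherited: dict, user_flags: dict) -> dict:
    """Inherit everything by default; explicit user flags override."""
    merged = dict(inherited)           # copy so the parent config is untouched
    for key, value in user_flags.items():
        if value is not None:          # None means "flag not given"
            merged[key] = value
    return merged
```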
---

#### `sync` - Project Synchronization
```bash
# Basic sync
ml sync ./my-project

# Sync with custom name and queue
ml sync ./my-project --name "experiment-1" --queue
# Sync with custom name
ml sync ./my-project --name "experiment-1"

# Sync with priority
ml sync ./my-project --priority 9
```
@@ -72,91 +72,14 @@ ml sync ./my-project --priority 9
- Content-addressed storage for deduplication
- SHA256 commit ID generation
- Rsync-based file transfer
- Automatic queuing (with `--queue` flag)
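Content-addressed commit IDs of this kind are typically derived by hashing file paths together with file contents, so identical trees deduplicate to the same ID. A sketch of one plausible scheme (the CLI's exact hashing layout is not specified in these docs):

```python
# Illustrative sketch of content-addressed commit ID generation:
# SHA256 over sorted (relative path, content) pairs, truncated to 40 hex chars.
import hashlib
from pathlib import Path

def commit_id(project_dir: str) -> str:
    """Deterministic commit id: same tree contents -> same id."""
    h = hashlib.sha256()
    root = Path(project_dir)
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        h.update(str(path.relative_to(root)).encode())  # path affects the id
        h.update(path.read_bytes())                     # content affects the id
    return h.hexdigest()[:40]
```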

#### `queue` - Job Management
```bash
# Queue with commit ID
ml queue my-job --commit abc123def456

# Queue with a commit ID prefix (>=7 hex chars; must be unique)
ml queue my-job --commit abc123 --priority 8

# Queue with extra runner args (stored as task.Args)
ml queue my-job --commit abc123 -- --epochs 5 --lr 1e-3
```

**Features:**
- WebSocket-based communication
- Priority queuing system
- API key authentication

**Notes:**
- `--priority` is passed to the server as a single byte (0-255).
- Args are sent via a dedicated queue opcode and become `task.Args` on the worker.
- `--commit` may be a full 40-hex commit id or a unique prefix (>=7 hex chars) resolvable under `worker_base`.
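The prefix resolution described in the notes — a prefix of at least 7 hex characters must match exactly one commit — can be sketched as follows (hypothetical helper; assumes the known commit ids have already been listed from `worker_base`):

```python
# Illustrative sketch of commit prefix resolution: expand a unique
# >=7-hex-char prefix to the full 40-hex commit id, rejecting ambiguity.
def resolve_commit(prefix: str, known_commits: list[str]) -> str:
    """Return the single commit matching the prefix, or raise ValueError."""
    if len(prefix) < 7 or not all(c in "0123456789abcdef" for c in prefix):
        raise ValueError("prefix must be >=7 lowercase hex chars")
    matches = [c for c in known_commits if c.startswith(prefix)]
    if len(matches) != 1:
        raise ValueError(f"prefix {prefix!r} matched {len(matches)} commits")
    return matches[0]
```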

#### `requeue` - Re-submit a Previous Run

Re-submit an existing run with optional config changes, resource overrides, and provenance tracking.

```bash
# Basic requeue
ml requeue <commit_id> -- --epochs 20

# Requeue by commit_id prefix (>=7 hex chars; must be unique)
ml requeue <commit_prefix> -- --epochs 20

# Requeue by run_id/task_id (the CLI scans run_manifest.json under worker_base)
ml requeue <run_id> -- --epochs 20

# Requeue by a run directory or run_manifest.json path
ml requeue /data/ml-experiments/finished/<run_id> -- --epochs 20

# Override priority/resources on requeue
ml requeue <task_id> --priority 10 --gpu 1 -- --epochs 20

# Iterative experimentation with config changes
ml requeue run-id-123 --with-changes --lr=0.002 --batch-size=64

# Inherit narrative from the parent run
ml requeue run-id-123 --inherit-narrative -- --epochs 50

# Inherit config/metadata from the parent
ml requeue run-id-123 --inherit-config --with-changes --lr=0.001

# Link as a child run for provenance tracking
ml requeue run-id-123 --parent -- --epochs 100

# Full provenance chain example
ml requeue run-id-123 --parent --inherit-narrative --inherit-config \
  --with-changes --lr=0.0001 -- --epochs 200
```

**What it does:**
- Locates `run_manifest.json`
- Extracts `commit_id`
- Submits a new queue request using that `commit_id`, with optional overridden args/resources

**Flags:**
- `--with-changes`: Override specific config values (key=value pairs)
- `--inherit-narrative`: Copy hypothesis/context/intent from the parent run
- `--inherit-config`: Copy metadata/tags from the parent run
- `--parent`: Link as a child run for provenance tracking

**Notes:**
- `--priority` is passed to the server as a single byte (0-255).
- Args are sent via a dedicated queue opcode and become `task.Args` on the worker.
- `--commit` may be a full 40-hex commit id or a unique prefix (>=7 hex chars) resolvable under `worker_base`.
- Tasks support optional `snapshot_id` and `dataset_specs` fields server-side (for provenance and dataset resolution).

#### `watch` - Auto-Sync Monitoring
```bash
# Watch a directory for changes
ml watch ./project

# Watch and auto-queue on changes
ml watch ./project --name "dev-exp" --queue
# Watch with custom name
ml watch ./project --name "dev-exp"
```

**Features:**
@@ -379,7 +374,7 @@ ml narrative set run-id-123 --experiment-group lr-sweep-2024
ml narrative set run-id-123 --tags "bert,training,optimizer"
```

**Note:** These fields can also be set at queue time using flags like `--hypothesis`, `--context`, etc.
**Note:** These fields can also be set at run time using flags like `--hypothesis`, `--context`, etc.

---
@@ -463,8 +458,14 @@ worker_user = "mluser"
worker_base = "/data/ml-experiments"
worker_port = 22
api_key = "your-api-key"
execution_mode = "auto"  # local | remote | auto
```

**Execution Mode:**
- `local`: Always execute locally using SQLite storage
- `remote`: Always attempt remote server execution
- `auto`: Check server connectivity and fall back to local if unavailable (default)

### Performance Features

- **Content-Addressed Storage**: Automatic deduplication of identical files