diff --git a/docs/src/cli-reference.md b/docs/src/cli-reference.md
index 0731b40..6d4ea43 100644
--- a/docs/src/cli-reference.md
+++ b/docs/src/cli-reference.md
@@ -37,6 +37,7 @@ High-performance command-line interface for experiment management, written in Zi
 | `validate` | Validate provenance/integrity for a commit or task | `ml validate --verbose` |
 | `info` | Show run info from `run_manifest.json` | `ml info ` |
 | `requeue` | Re-submit an existing run/commit with new args/resources | `ml requeue -- --epochs 20` |
+| `logs` | Fetch and follow job logs | `ml logs job123 -n 100` |
 
 ### Command Details
 
@@ -170,6 +171,46 @@ ml cancel running-job-id
 ```
 Cancels currently running jobs by ID.
 
+#### `logs` - Fetch and Follow Job Logs
+
+Retrieve logs from running or completed ML experiments.
+
+```bash
+# Show full logs for a job
+ml logs job123
+
+# Show last 100 lines (tail)
+ml logs job123 -n 100
+ml logs job123 --tail 100
+
+# Follow logs in real-time (like tail -f)
+ml logs job123 -f
+ml logs job123 --follow
+
+# Combine tail and follow
+ml logs job123 -n 50 -f
+```
+
+**Features:**
+- WebSocket-based log streaming for real-time updates
+- Works with both running and completed jobs
+- Automatic reconnection on network issues
+- Scrollable output with pagination support
+
+**Common Use Cases:**
+```bash
+# Check why a job failed
+ml logs failed-job-abc123
+
+# Monitor a running training job
+ml logs training-job-xyz789 -f
+
+# Get recent errors only
+ml logs job123 -n 20 | grep -i error
+```
+
+---
+
 #### `jupyter` - Jupyter Notebook Management
 
 Manage Jupyter notebook services via WebSocket protocol.
@@ -219,8 +260,6 @@ ml jupyter list # View all running services
 
 ### Configuration
 
-The Zig CLI reads configuration from `~/.ml/config.toml`:
-
 ```toml
 worker_host = "worker.local"
 worker_user = "mluser"
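
The "Get recent errors only" use case pipes `ml logs` into `grep`, which exits with status 1 when nothing matches and so can abort a script running under `set -e`. A minimal sketch of a wrapper that tolerates the no-match case is shown below. The function name `summarize_errors` is hypothetical and not part of the `ml` CLI; it only reads standard input and assumes error lines contain the word "error" in any letter case.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the `ml` CLI): reads log text from
# standard input, then prints a count of error lines followed by the lines.
summarize_errors() {
  local matches count
  # grep exits with status 1 when nothing matches; `|| true` keeps that
  # from aborting callers that run under `set -e`.
  matches=$(grep -i 'error' || true)
  if [ -z "$matches" ]; then
    echo "no error lines found"
    return 0
  fi
  # wc -l may pad its output with spaces on some platforms; strip them.
  count=$(printf '%s\n' "$matches" | wc -l | tr -d ' ')
  echo "$count error line(s):"
  printf '%s\n' "$matches"
}
```

Assuming the flags documented in this diff, it could be used as `ml logs job123 -n 20 | summarize_errors`.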