---
title: "CLI Reference"
url: "/cli-reference/"
weight: 2
---

# Fetch ML CLI Reference

Command-line tools for managing ML experiments in your homelab, centered on a high-performance CLI written in Zig.

## Overview

Fetch ML provides a comprehensive CLI toolkit built with performance and security in mind:

- **Zig CLI** - High-performance experiment management written in Zig
- **Go Commands** - API server, TUI, and data management utilities
- **Management Scripts** - Service orchestration and deployment
- **Setup Scripts** - One-command installation and configuration

## Zig CLI (`./cli/zig-out/bin/ml`)

High-performance command-line interface for experiment management, written in Zig for speed and efficiency.

### Available Commands

| Command | Description | Example |
|---------|-------------|---------|
| `init` | Interactive configuration setup | `ml init` |
| `sync` | Sync project to worker with deduplication | `ml sync ./project --name myjob --queue` |
| `queue` | Queue job for execution | `ml queue myjob --commit abc123 --priority 8` |
| `status` | Get system and worker status | `ml status` |
| `monitor` | Launch TUI monitoring via SSH | `ml monitor` |
| `cancel` | Cancel a running job | `ml cancel job123` |
| `prune` | Clean up old experiments | `ml prune --keep 10` |
| `watch` | Auto-sync directory on changes | `ml watch ./project --queue` |
| `jupyter` | Manage Jupyter notebook services | `ml jupyter start --name my-nb` |
| `validate` | Validate provenance/integrity for a commit or task | `ml validate --verbose` |
| `info` | Show run info from `run_manifest.json` | `ml info ` |
| `requeue` | Re-submit an existing run/commit with new args/resources | `ml requeue -- --epochs 20` |
| `logs` | Fetch and follow job logs | `ml logs job123 -n 100` |

### Command Details

#### `init` - Configuration Setup

```bash
ml init
```

Creates a configuration template at `~/.ml/config.toml` with:

- Worker connection details
- API authentication
- Base paths and ports

#### `sync` - Project Synchronization

```bash
# Basic sync
ml sync ./my-project

# Sync with custom name and queue
ml sync ./my-project --name "experiment-1" --queue

# Sync with priority
ml sync ./my-project --priority 9
```

**Features:**

- Content-addressed storage for deduplication
- SHA256 commit ID generation
- Rsync-based file transfer
- Automatic queuing (with the `--queue` flag)

#### `queue` - Job Management

```bash
# Queue with commit ID
ml queue my-job --commit abc123def456

# Queue with commit ID prefix (>=7 hex chars; must be unique)
ml queue my-job --commit abc123 --priority 8

# Queue with extra runner args (stored as task.Args)
ml queue my-job --commit abc123 -- --epochs 5 --lr 1e-3
```

**Features:**

- WebSocket-based communication
- Priority queuing system
- API key authentication

**Notes:**

- `--priority` is passed to the server as a single byte (0-255).
- Args are sent via a dedicated queue opcode and become `task.Args` on the worker.
- `--commit` may be a full 40-hex commit ID or a unique prefix (>=7 hex chars) resolvable under `worker_base`.

#### `requeue` - Re-submit a Previous Run

```bash
# Requeue directly by commit_id
ml requeue -- --epochs 20

# Requeue by commit_id prefix (>=7 hex chars; must be unique)
ml requeue -- --epochs 20

# Requeue by run_id/task_id (CLI scans run_manifest.json under worker_base)
ml requeue -- --epochs 20

# Requeue by a run directory or run_manifest.json path
ml requeue /data/ml-experiments/finished/ -- --epochs 20

# Override priority/resources on requeue
ml requeue --priority 10 --gpu 1 -- --epochs 20
```

**What it does:**

- Locates `run_manifest.json`
- Extracts `commit_id`
- Submits a new queue request using that `commit_id`, with optionally overridden args/resources

**Notes:**

- Tasks support optional `snapshot_id` and `dataset_specs` fields server-side (for provenance and dataset resolution).
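The prefix-resolution rule described in the notes above (a `--commit` prefix of at least 7 hex characters that must match exactly one stored commit) can be sketched in a few lines. This is an illustrative Python sketch, not the actual Zig implementation; the function name `resolve_commit_prefix` and the in-memory commit list are assumptions.

```python
import re

def resolve_commit_prefix(prefix: str, commit_ids: list) -> str:
    """Resolve a commit ID prefix (>= 7 hex chars) to a unique full 40-hex ID."""
    if not re.fullmatch(r"[0-9a-f]{7,40}", prefix):
        raise ValueError("prefix must be 7-40 lowercase hex characters")
    matches = [c for c in commit_ids if c.startswith(prefix)]
    if not matches:
        raise LookupError("no commit matches prefix %r" % prefix)
    if len(matches) > 1:
        raise LookupError("prefix %r is ambiguous (%d matches)" % (prefix, len(matches)))
    return matches[0]

commits = [
    "abc123def4567890abc123def4567890abc123de",
    "abc999000111222333444555666777888999aaab",
]
print(resolve_commit_prefix("abc123d", commits))  # unique match -> full 40-hex ID
```

In the real CLI the candidate IDs would come from scanning the content-addressed store under `worker_base` rather than an in-memory list.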
#### `watch` - Auto-Sync Monitoring

```bash
# Watch directory for changes
ml watch ./project

# Watch and auto-queue on changes
ml watch ./project --name "dev-exp" --queue
```

**Features:**

- Real-time file system monitoring
- Automatic re-sync on changes
- Configurable polling interval (default: 2 seconds)
- Commit ID comparison for efficiency

#### `prune` - Cleanup Management

```bash
# Keep last N experiments
ml prune --keep 20

# Remove experiments older than N days
ml prune --older-than 30
```

#### `monitor` - Remote Monitoring

```bash
ml monitor
```

Launches the TUI interface via SSH for real-time monitoring.

#### `status` - System Status

`ml status --json` returns a JSON object including an optional `prewarm` field when worker prewarming is active:

```json
{
  "prewarm": [
    {
      "worker_id": "worker-1",
      "task_id": "",
      "started_at": "2025-01-01T00:00:00Z",
      "updated_at": "2025-01-01T00:00:05Z",
      "phase": "datasets",
      "dataset_count": 2
    }
  ]
}
```

#### `cancel` - Job Cancellation

```bash
ml cancel running-job-id
```

Cancels a currently running job by ID.

#### `logs` - Fetch and Follow Job Logs

Retrieve logs from running or completed ML experiments.

```bash
# Show full logs for a job
ml logs job123

# Show last 100 lines (tail)
ml logs job123 -n 100
ml logs job123 --tail 100

# Follow logs in real-time (like tail -f)
ml logs job123 -f
ml logs job123 --follow

# Combine tail and follow
ml logs job123 -n 50 -f
```

**Features:**

- WebSocket-based log streaming for real-time updates
- Works with both running and completed jobs
- Automatic reconnection on network issues
- Scrollable output with pagination support

**Common Use Cases:**

```bash
# Check why a job failed
ml logs failed-job-abc123

# Monitor a running training job
ml logs training-job-xyz789 -f

# Get recent errors only
ml logs job123 -n 20 | grep -i error
```

---

#### `jupyter` - Jupyter Notebook Management

Manage Jupyter notebook services via WebSocket protocol.
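As with the other WebSocket commands, Jupyter requests authenticate with a SHA256-hashed API key (see the Security notes below: "API keys are hashed before transmission"). A minimal sketch of that client-side hashing step, in Python for illustration only (the real CLI does this in Zig):

```python
import hashlib

def hash_api_key(api_key: str) -> str:
    """Hash the API key so the raw key never travels over the wire (illustrative)."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()

# The server would store only this digest and compare digests on each request.
print(hash_api_key("your-api-key"))
```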
```bash
# Start a Jupyter service
ml jupyter start --name my-notebook --workspace /path/to/workspace

# Start with password protection
ml jupyter start --name my-notebook --workspace /path/to/workspace --password mypass

# List running services
ml jupyter list

# Stop a service
ml jupyter stop service-id-12345

# Check service status
ml jupyter status
```

**Features:**

- WebSocket-based binary protocol for low latency
- Secure API key authentication (SHA256-hashed)
- Real-time service management
- Workspace isolation

**Common Use Cases:**

```bash
# Development workflow
ml jupyter start --name dev-notebook --workspace ./notebooks
# ... do development work ...
ml jupyter stop dev-service-123

# Team collaboration
ml jupyter start --name team-analysis --workspace /shared/analysis --password teampass

# Multiple services
ml jupyter list  # View all running services
```

**Security:**

- API keys are hashed before transmission
- Password protection for notebooks
- Workspace path validation
- Service-ID-based authorization

### Configuration

```toml
worker_host = "worker.local"
worker_user = "mluser"
worker_base = "/data/ml-experiments"
worker_port = 22
api_key = "your-api-key"
```

### Performance Features

- **Content-Addressed Storage**: Automatic deduplication of identical files
- **Incremental Sync**: Only transfers changed files
- **SHA256 Hashing**: Reliable commit ID generation
- **WebSocket Communication**: Efficient real-time messaging
- **Multi-threaded**: Concurrent operations where applicable

## Go Commands

### API Server (`./cmd/api-server/main.go`)

Main HTTPS API server for experiment management.
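Conceptually, the server combines API-key authentication with per-client rate limiting (see `security.rate_limit` and its `requests_per_minute: 30` setting in the configuration below). A minimal, illustrative sketch of a sliding-window rate-limit check — the names and data structures here are assumptions, and the real server implements this in Go:

```python
import time
from collections import deque

class RateLimiter:
    """Sliding-window limiter: allow at most `limit` requests per 60 seconds."""

    def __init__(self, limit: int = 30):
        self.limit = limit
        self.hits = {}  # client IP -> deque of request timestamps

    def allow(self, client_ip, now=None):
        now = time.monotonic() if now is None else now
        window = self.hits.setdefault(client_ip, deque())
        # Drop timestamps that fell out of the 60-second window.
        while window and now - window[0] >= 60.0:
            window.popleft()
        if len(window) >= self.limit:
            return False
        window.append(now)
        return True

rl = RateLimiter(limit=30)
print(all(rl.allow("127.0.0.1", now=0.0) for _ in range(30)))  # True: within limit
print(rl.allow("127.0.0.1", now=1.0))  # False: 31st request inside the window
```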
```bash
# Build and run
go run ./cmd/api-server/main.go

# With configuration
./bin/api-server --config configs/api/dev.yaml
```

**Features:**

- HTTPS-only communication
- API key authentication
- Rate limiting and IP whitelisting
- WebSocket support for real-time updates
- Redis integration for caching

### TUI (`./cmd/tui/main.go`)

Terminal user interface for monitoring experiments.

```bash
# Launch TUI
go run ./cmd/tui/main.go
```

**Features:**

- Real-time experiment monitoring
- Interactive job management
- Status visualization
- Log viewing

### Data Manager (`./cmd/data_manager/`)

Utilities for data synchronization and management.

```bash
# Sync data
./data_manager --sync ./data

# Clean old data
./data_manager --cleanup --older-than 30d
```

### Config Lint (`./cmd/configlint/main.go`)

Configuration validation and linting tool.

```bash
# Validate configuration
./configlint configs/api/dev.yaml

# Check schema compliance
./configlint --schema configs/schema/api_server_config.yaml
```

## Management Script (`./tools/manage.sh`)

Simple service management for your homelab.
### Commands

```bash
./tools/manage.sh start     # Start all services
./tools/manage.sh stop      # Stop all services
./tools/manage.sh status    # Check service status
./tools/manage.sh logs      # View logs
./tools/manage.sh monitor   # Basic monitoring
./tools/manage.sh security  # Security status
./tools/manage.sh cleanup   # Clean project artifacts
```

## API Testing

Test the API with curl:

```bash
# Health check
curl -f http://localhost:8080/health

# List experiments
curl -H 'X-API-Key: password' http://localhost:8080/experiments

# Submit experiment
curl -X POST -H 'X-API-Key: password' \
  -H 'Content-Type: application/json' \
  -d '{"name":"test","config":{"type":"basic"}}' \
  http://localhost:8080/experiments
```

## Zig CLI Architecture

The Zig CLI is designed for performance and reliability:

### Core Components

- **Commands** (`cli/src/commands/`): Individual command implementations
- **Config** (`cli/src/config.zig`): Configuration management
- **Network** (`cli/src/net/ws.zig`): WebSocket client implementation
- **Utils** (`cli/src/utils/`): Cryptography, storage, and rsync utilities
- **Errors** (`cli/src/errors.zig`): Centralized error handling

### Performance Optimizations

- **Content-Addressed Storage**: Deduplicates identical files across experiments
- **SHA256 Hashing**: Fast, reliable commit ID generation
- **Rsync Integration**: Efficient incremental file transfers
- **WebSocket Protocol**: Low-latency communication with the worker
- **Memory Management**: Efficient allocation with Zig's allocator system

### Security Features

- **API Key Hashing**: Secure authentication token handling
- **SSH Integration**: Secure file transfers
- **Input Validation**: Comprehensive argument checking
- **Error Handling**: Secure error reporting without information leakage

## Configuration

Main configuration file: `configs/api/dev.yaml`

### Key Settings

```yaml
auth:
  enabled: true
  api_keys:
    dev_user:
      hash: "CHANGE_ME_SHA256_DEV_USER_KEY"
      admin: true
      roles:
        - admin
      permissions:
        '*': true
    researcher_user:
      hash: "CHANGE_ME_SHA256_RESEARCHER_USER_KEY"
      admin: false
      roles:
        - researcher
      permissions:
        'experiments': true
        'datasets': true

server:
  address: ":9101"
  tls:
    enabled: false  # Set to true for production
    cert_file: "./ssl/cert.pem"
    key_file: "./ssl/key.pem"

security:
  rate_limit:
    enabled: true
    requests_per_minute: 30
  ip_whitelist:
    - "127.0.0.1"
    - "::1"
    - "localhost"
    - "10.0.0.0/8"
```

## Docker Commands

If using Docker Compose:

```bash
# Start services (testing only)
docker-compose up -d

# View logs
docker-compose logs -f

# Stop services
docker-compose down

# Check status
docker-compose ps
```

## Troubleshooting

### Common Issues

**Zig CLI not found:**

```bash
# Build the CLI
cd cli && make build

# Check binary exists
ls -la ./cli/zig-out/bin/ml
```

**Configuration not found:**

```bash
# Create configuration
./cli/zig-out/bin/ml init

# Check config file
ls -la ~/.ml/config.toml
```

**Worker connection failed:**

```bash
# Test SSH connection
ssh -p 22 mluser@worker.local

# Check configuration
cat ~/.ml/config.toml
```

**Sync not working:**

```bash
# Check rsync availability
rsync --version

# Test manual sync
rsync -avz ./project/ mluser@worker.local:/tmp/test/
```

**WebSocket connection failed:**

```bash
# Check worker WebSocket port
telnet worker.local 9100

# Verify API key
./cli/zig-out/bin/ml status
```

**API not responding:**

```bash
./tools/manage.sh status
./tools/manage.sh logs
```

**Authentication failed:**

```bash
# Check API key in config
grep -A 5 "api_keys:" configs/api/dev.yaml
```

**Redis connection failed:**

```bash
# Check Redis status
redis-cli ping

# Start Redis
redis-server
```

### Getting Help

```bash
# CLI help
./cli/zig-out/bin/ml help

# Management script help
./tools/manage.sh help

# Check all available commands
make help
```

---

**That's it for the CLI reference!** For complete setup instructions, see the main [index](index.md).