---
title: "CLI Reference"
url: "/cli-reference/"
weight: 2
---

# Fetch ML CLI Reference

Command-line tools for managing ML experiments in your homelab, centered on a high-performance CLI written in Zig.

## Overview

Fetch ML provides a comprehensive CLI toolkit built with performance and security in mind:

- **Zig CLI** - High-performance experiment management written in Zig
- **Go Commands** - API server, TUI, and data management utilities
- **Management Scripts** - Service orchestration and deployment
- **Setup Scripts** - One-command installation and configuration

## Zig CLI (`./cli/zig-out/bin/ml`)

High-performance command-line interface for experiment management, written in Zig for speed and efficiency.

### Available Commands

| Command | Description | Example |
|---------|-------------|---------|
| `init` | Interactive configuration setup | `ml init` |
| `sync` | Sync project to worker with deduplication | `ml sync ./project --name myjob --queue` |
| `queue` | Queue job for execution | `ml queue myjob --commit abc123 --priority 8` |
| `status` | Get system and worker status | `ml status` |
| `monitor` | Launch TUI monitoring via SSH | `ml monitor` |
| `cancel` | Cancel a running job | `ml cancel job123` |
| `prune` | Clean up old experiments | `ml prune --keep 10` |
| `watch` | Auto-sync directory on changes | `ml watch ./project --queue` |
| `jupyter` | Manage Jupyter notebook services | `ml jupyter start --name my-nb` |
| `validate` | Validate provenance/integrity for a commit or task | `ml validate --verbose` |
| `info` | Show run info from `run_manifest.json` | `ml info ` |
| `requeue` | Re-submit an existing run/commit with new args/resources | `ml requeue -- --epochs 20` |
| `logs` | Fetch and follow job logs | `ml logs job123 -n 100` |

### Command Details

#### `init` - Configuration Setup

```bash
ml init
```

Creates a configuration template at `~/.ml/config.toml` with:

- Worker connection details
- API authentication
- Base paths and ports

#### `sync` - Project Synchronization

```bash
# Basic sync
ml sync ./my-project

# Sync with custom name and queue
ml sync ./my-project --name "experiment-1" --queue

# Sync with priority
ml sync ./my-project --priority 9
```

**Features:**

- Content-addressed storage for deduplication
- SHA256 commit ID generation
- Rsync-based file transfer
- Automatic queuing (with the `--queue` flag)

#### `queue` - Job Management

```bash
# Queue with commit ID
ml queue my-job --commit abc123def456

# Queue with commit ID prefix (>=7 hex chars; must be unique)
ml queue my-job --commit abc123 --priority 8

# Queue with extra runner args (stored as task.Args)
ml queue my-job --commit abc123 -- --epochs 5 --lr 1e-3
```

**Features:**

- WebSocket-based communication
- Priority queuing system
- API key authentication

**Notes:**

- `--priority` is passed to the server as a single byte (0-255).
- Args are sent via a dedicated queue opcode and become `task.Args` on the worker.
- `--commit` may be a full 40-hex commit ID or a unique prefix (>=7 hex chars) resolvable under `worker_base`.

#### `requeue` - Re-submit a Previous Run

```bash
# Requeue directly by commit_id
ml requeue -- --epochs 20

# Requeue by commit_id prefix (>=7 hex chars; must be unique)
ml requeue -- --epochs 20

# Requeue by run_id/task_id (CLI scans run_manifest.json under worker_base)
ml requeue -- --epochs 20

# Requeue by a run directory or run_manifest.json path
ml requeue /data/ml-experiments/finished/ -- --epochs 20

# Override priority/resources on requeue
ml requeue --priority 10 --gpu 1 -- --epochs 20
```

**What it does:**

- Locates `run_manifest.json`
- Extracts `commit_id`
- Submits a new queue request using that `commit_id`, with optionally overridden args/resources

**Notes:**

- Tasks support optional `snapshot_id` and `dataset_specs` fields server-side (for provenance and dataset resolution).
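The prefix-resolution rule described in the notes above (a `--commit` prefix of at least 7 hex characters that must match exactly one stored commit) can be sketched in a few lines. This is an illustrative Python sketch, not the actual Zig implementation; the function name `resolve_commit_prefix` and the in-memory commit list are assumptions.

```python
import re

def resolve_commit_prefix(prefix: str, commit_ids: list) -> str:
    """Resolve a commit ID prefix (>= 7 hex chars) to a unique full 40-hex ID."""
    if not re.fullmatch(r"[0-9a-f]{7,40}", prefix):
        raise ValueError("prefix must be 7-40 lowercase hex characters")
    matches = [c for c in commit_ids if c.startswith(prefix)]
    if not matches:
        raise LookupError("no commit matches prefix %r" % prefix)
    if len(matches) > 1:
        raise LookupError("prefix %r is ambiguous (%d matches)" % (prefix, len(matches)))
    return matches[0]

commits = [
    "abc123def4567890abc123def4567890abc123de",
    "abc999000111222333444555666777888999aaab",
]
print(resolve_commit_prefix("abc123d", commits))  # unique match -> full 40-hex ID
```

In the real CLI the candidate IDs would come from scanning the content-addressed store under `worker_base` rather than an in-memory list.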
#### `watch` - Auto-Sync Monitoring

```bash
# Watch directory for changes
ml watch ./project

# Watch and auto-queue on changes
ml watch ./project --name "dev-exp" --queue
```

**Features:**

- Real-time file system monitoring
- Automatic re-sync on changes
- Configurable polling interval (default: 2 seconds)
- Commit ID comparison for efficiency

#### `prune` - Cleanup Management

```bash
# Keep last N experiments
ml prune --keep 20

# Remove experiments older than N days
ml prune --older-than 30
```

#### `monitor` - Remote Monitoring

```bash
ml monitor
```

Launches the TUI interface via SSH for real-time monitoring.

#### `status` - System Status

`ml status --json` returns a JSON object including an optional `prewarm` field when worker prewarming is active:

```json
{
  "prewarm": [
    {
      "worker_id": "worker-1",
      "task_id": "",
      "started_at": "2025-01-01T00:00:00Z",
      "updated_at": "2025-01-01T00:00:05Z",
      "phase": "datasets",
      "dataset_count": 2
    }
  ]
}
```

#### `cancel` - Job Cancellation

```bash
ml cancel running-job-id
```

Cancels a currently running job by ID.

#### `logs` - Fetch and Follow Job Logs

Retrieve logs from running or completed ML experiments.

```bash
# Show full logs for a job
ml logs job123

# Show last 100 lines (tail)
ml logs job123 -n 100
ml logs job123 --tail 100

# Follow logs in real-time (like tail -f)
ml logs job123 -f
ml logs job123 --follow

# Combine tail and follow
ml logs job123 -n 50 -f
```

**Features:**

- WebSocket-based log streaming for real-time updates
- Works with both running and completed jobs
- Automatic reconnection on network issues
- Scrollable output with pagination support

**Common Use Cases:**

```bash
# Check why a job failed
ml logs failed-job-abc123

# Monitor a running training job
ml logs training-job-xyz789 -f

# Get recent errors only
ml logs job123 -n 20 | grep -i error
```

---

#### `jupyter` - Jupyter Notebook Management

Manage Jupyter notebook services via WebSocket protocol.
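As with the other WebSocket commands, Jupyter requests authenticate with a SHA256-hashed API key (see the Security notes below: "API keys are hashed before transmission"). A minimal sketch of that client-side hashing step, in Python for illustration only (the real CLI does this in Zig):

```python
import hashlib

def hash_api_key(api_key: str) -> str:
    """Hash the API key so the raw key never travels over the wire (illustrative)."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()

# The server would store only this digest and compare digests on each request.
print(hash_api_key("your-api-key"))
```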
```bash
# Start a Jupyter service
ml jupyter start --name my-notebook --workspace /path/to/workspace

# Start with password protection
ml jupyter start --name my-notebook --workspace /path/to/workspace --password mypass

# List running services
ml jupyter list

# Stop a service
ml jupyter stop service-id-12345

# Check service status
ml jupyter status
```

**Features:**

- WebSocket-based binary protocol for low latency
- Secure API key authentication (SHA256-hashed)
- Real-time service management
- Workspace isolation

**Common Use Cases:**

```bash
# Development workflow
ml jupyter start --name dev-notebook --workspace ./notebooks
# ... do development work ...
ml jupyter stop dev-service-123

# Team collaboration
ml jupyter start --name team-analysis --workspace /shared/analysis --password teampass

# Multiple services
ml jupyter list  # View all running services
```

**Security:**

- API keys are hashed before transmission
- Password protection for notebooks
- Workspace path validation
- Service-ID-based authorization

### Configuration

```toml
worker_host = "worker.local"
worker_user = "mluser"
worker_base = "/data/ml-experiments"
worker_port = 22
api_key = "your-api-key"
```

### Performance Features

- **Content-Addressed Storage**: Automatic deduplication of identical files
- **Incremental Sync**: Only transfers changed files
- **SHA256 Hashing**: Reliable commit ID generation
- **WebSocket Communication**: Efficient real-time messaging
- **Multi-threaded**: Concurrent operations where applicable

## Go Commands

### API Server (`./cmd/api-server/main.go`)

Main HTTPS API server for experiment management.
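Conceptually, the server combines API-key authentication with per-client rate limiting (see `security.rate_limit` and its `requests_per_minute: 30` setting in the configuration below). A minimal, illustrative sketch of a sliding-window rate-limit check — the names and data structures here are assumptions, and the real server implements this in Go:

```python
import time
from collections import deque

class RateLimiter:
    """Sliding-window limiter: allow at most `limit` requests per 60 seconds."""

    def __init__(self, limit: int = 30):
        self.limit = limit
        self.hits = {}  # client IP -> deque of request timestamps

    def allow(self, client_ip, now=None):
        now = time.monotonic() if now is None else now
        window = self.hits.setdefault(client_ip, deque())
        # Drop timestamps that fell out of the 60-second window.
        while window and now - window[0] >= 60.0:
            window.popleft()
        if len(window) >= self.limit:
            return False
        window.append(now)
        return True

rl = RateLimiter(limit=30)
print(all(rl.allow("127.0.0.1", now=0.0) for _ in range(30)))  # True: within limit
print(rl.allow("127.0.0.1", now=1.0))  # False: 31st request inside the window
```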
```bash
# Build and run
go run ./cmd/api-server/main.go

# With configuration
./bin/api-server --config configs/api/dev.yaml
```

**Features:**

- HTTPS-only communication
- API key authentication
- Rate limiting and IP whitelisting
- WebSocket support for real-time updates
- Redis integration for caching

### TUI (`./cmd/tui/main.go`)

Terminal user interface for monitoring experiments.

```bash
# Launch TUI
go run ./cmd/tui/main.go
```

**Features:**

- Real-time experiment monitoring
- Interactive job management
- Status visualization
- Log viewing

### Data Manager (`./cmd/data_manager/`)

Utilities for data synchronization and management.

```bash
# Sync data
./data_manager --sync ./data

# Clean old data
./data_manager --cleanup --older-than 30d
```

### Config Lint (`./cmd/configlint/main.go`)

Configuration validation and linting tool.

```bash
# Validate configuration
./configlint configs/api/dev.yaml

# Check schema compliance
./configlint --schema configs/schema/api_server_config.yaml
```

## Management Script (`./tools/manage.sh`)

Simple service management for your homelab.
### Commands

```bash
./tools/manage.sh start     # Start all services
./tools/manage.sh stop      # Stop all services
./tools/manage.sh status    # Check service status
./tools/manage.sh logs      # View logs
./tools/manage.sh monitor   # Basic monitoring
./tools/manage.sh security  # Security status
./tools/manage.sh cleanup   # Clean project artifacts
```

## API Testing

Test the API with curl:

```bash
# Health check
curl -f http://localhost:8080/health

# List experiments
curl -H 'X-API-Key: password' http://localhost:8080/experiments

# Submit experiment
curl -X POST -H 'X-API-Key: password' \
  -H 'Content-Type: application/json' \
  -d '{"name":"test","config":{"type":"basic"}}' \
  http://localhost:8080/experiments
```

## Zig CLI Architecture

The Zig CLI is designed for performance and reliability:

### Core Components

- **Commands** (`cli/src/commands/`): Individual command implementations
- **Config** (`cli/src/config.zig`): Configuration management
- **Network** (`cli/src/net/ws.zig`): WebSocket client implementation
- **Utils** (`cli/src/utils/`): Cryptography, storage, and rsync utilities
- **Errors** (`cli/src/errors.zig`): Centralized error handling

### Performance Optimizations

- **Content-Addressed Storage**: Deduplicates identical files across experiments
- **SHA256 Hashing**: Fast, reliable commit ID generation
- **Rsync Integration**: Efficient incremental file transfers
- **WebSocket Protocol**: Low-latency communication with the worker
- **Memory Management**: Efficient allocation with Zig's allocator system

### Security Features

- **API Key Hashing**: Secure authentication token handling
- **SSH Integration**: Secure file transfers
- **Input Validation**: Comprehensive argument checking
- **Error Handling**: Secure error reporting without information leakage

## Configuration

Main configuration file: `configs/api/dev.yaml`

### Key Settings

```yaml
auth:
  enabled: true
  api_keys:
    dev_user:
      hash: "CHANGE_ME_SHA256_DEV_USER_KEY"
      admin: true
      roles:
        - admin
      permissions:
        '*': true
    researcher_user:
      hash: "CHANGE_ME_SHA256_RESEARCHER_USER_KEY"
      admin: false
      roles:
        - researcher
      permissions:
        'experiments': true
        'datasets': true

server:
  address: ":9101"
  tls:
    enabled: false  # Set to true for production
    cert_file: "./ssl/cert.pem"
    key_file: "./ssl/key.pem"

security:
  rate_limit:
    enabled: true
    requests_per_minute: 30
  ip_whitelist:
    - "127.0.0.1"
    - "::1"
    - "localhost"
    - "10.0.0.0/8"
```

## Docker Commands

If using Docker Compose:

```bash
# Start services (testing only)
docker-compose up -d

# View logs
docker-compose logs -f

# Stop services
docker-compose down

# Check status
docker-compose ps
```

## Troubleshooting

### Common Issues

**Zig CLI not found:**

```bash
# Build the CLI
cd cli && make build

# Check binary exists
ls -la ./cli/zig-out/bin/ml
```

**Configuration not found:**

```bash
# Create configuration
./cli/zig-out/bin/ml init

# Check config file
ls -la ~/.ml/config.toml
```

**Worker connection failed:**

```bash
# Test SSH connection
ssh -p 22 mluser@worker.local

# Check configuration
cat ~/.ml/config.toml
```

**Sync not working:**

```bash
# Check rsync availability
rsync --version

# Test manual sync
rsync -avz ./project/ mluser@worker.local:/tmp/test/
```

**WebSocket connection failed:**

```bash
# Check worker WebSocket port
telnet worker.local 9100

# Verify API key
./cli/zig-out/bin/ml status
```

**API not responding:**

```bash
./tools/manage.sh status
./tools/manage.sh logs
```

**Authentication failed:**

```bash
# Check API key in config
grep -A 5 "api_keys:" configs/api/dev.yaml
```

**Redis connection failed:**

```bash
# Check Redis status
redis-cli ping

# Start Redis
redis-server
```

### Getting Help

```bash
# CLI help
./cli/zig-out/bin/ml help

# Management script help
./tools/manage.sh help

# Check all available commands
make help
```

---

**That's it for the CLI reference!** For complete setup instructions, see the main [index](index.md).