Zig CLI Guide

High-performance command-line interface for ML experiment management, written in Zig for maximum speed and efficiency.

Overview

The Zig CLI (ml) is the primary interface for managing ML experiments in your homelab. Built with Zig, it provides exceptional performance for file operations, network communication, and experiment management.

Installation

Download from GitHub Releases:

# Download for your platform (replace <platform> with linux-x86_64, macos-x86_64, or macos-arm64)
curl -LO https://github.com/jfraeys/fetch_ml/releases/latest/download/ml-<platform>.tar.gz

# Extract
tar -xzf ml-<platform>.tar.gz

# Install
chmod +x ml-<platform>
sudo mv ml-<platform> /usr/local/bin/ml

# Verify
ml --help

Platforms:

  • ml-linux-x86_64.tar.gz: Linux (fully static, zero dependencies)
  • ml-macos-x86_64.tar.gz: macOS Intel
  • ml-macos-arm64.tar.gz: macOS Apple Silicon

All release binaries include embedded static rsync for complete independence.

Build from Source

Development Build (uses system rsync):

cd cli
zig build dev
./zig-out/dev/ml-dev --help

Production Build (embedded rsync):

cd cli
# For testing: uses rsync wrapper
zig build prod

# For release with static rsync:
# 1. Place static rsync binary at src/assets/rsync_release.bin
# 2. Build
zig build prod
strip zig-out/prod/ml  # Optional: reduce size

# Verify
./zig-out/prod/ml --help
ls -lh zig-out/prod/ml

See cli/src/assets/README.md for details on obtaining static rsync binaries.

Verify Installation

ml --help
ml --version  # Shows build config

Quick Start

  1. Initialize Configuration

    ml init

  2. Sync Your First Project

    ml sync ./my-project --queue

  3. Monitor Progress

    ml status

Command Reference

init - Configuration Setup

Initialize the CLI configuration file.

ml init

Creates: ~/.ml/config.toml

Configuration Template:

worker_host = "worker.local"
worker_user = "mluser"
worker_base = "/data/ml-experiments"
worker_port = 22
api_key = "your-api-key"

sync - Project Synchronization

Sync project files to the worker with intelligent deduplication.

# Basic sync
ml sync ./project

# Sync with custom name and auto-queue
ml sync ./project --name "experiment-1" --queue

# Sync with priority
ml sync ./project --priority 8

Options:

  • --name <name>: Custom experiment name
  • --queue: Automatically queue after sync
  • --priority N: Set priority (1-10, default 5)

Features:

  • Content-Addressed Storage: Automatic deduplication
  • SHA256 Commit IDs: Reliable change detection
  • Incremental Transfer: Only sync changed files
  • Rsync Backend: Efficient file transfer
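
To illustrate the idea behind incremental transfer, here is a rough sketch that compares two manifests (path to SHA256 digest) and reports which files would need to be sent. It is an assumption about the general technique, not the CLI's code; the actual transfer is delegated to rsync, and the names file_digest and changed_paths are made up for this example.

```python
import hashlib


def file_digest(data: bytes) -> str:
    """SHA256 hex digest of a file's content."""
    return hashlib.sha256(data).hexdigest()


def changed_paths(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return paths that are new or whose digest differs from the previous sync."""
    return sorted(path for path, digest in current.items() if previous.get(path) != digest)


previous = {
    "train.py": file_digest(b"print('v1')"),
    "data.csv": file_digest(b"a,b\n1,2\n"),
}
current = {
    "train.py": file_digest(b"print('v2')"),   # modified
    "data.csv": file_digest(b"a,b\n1,2\n"),    # unchanged
    "eval.py": file_digest(b"print('eval')"),  # new
}

print(changed_paths(previous, current))  # ['eval.py', 'train.py']
```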

queue - Job Management

Queue experiments for execution on the worker.

# Queue with commit ID
ml queue my-job --commit abc123def456

# Queue with priority
ml queue my-job --commit abc123 --priority 8

Options:

  • --commit <id>: Commit ID from sync output
  • --priority N: Execution priority (1-10)

Features:

  • WebSocket Communication: Real-time job submission
  • Priority Queuing: Higher priority jobs run first
  • API Authentication: Secure job submission
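
The "higher priority jobs run first" rule can be sketched with a heap. This is only an illustration of priority queuing: the JobQueue class is hypothetical, the default of 5 is borrowed from the sync options, and running equal-priority jobs in submission order is an assumption the guide does not confirm.

```python
import heapq
import itertools


class JobQueue:
    """Highest priority first; ties run in submission order (assumed)."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._counter = itertools.count()

    def push(self, job: str, priority: int = 5) -> None:
        if not 1 <= priority <= 10:
            raise ValueError("priority must be between 1 and 10")
        # heapq is a min-heap, so negate the priority to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._counter), job))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]


queue = JobQueue()
queue.push("background", priority=1)
queue.push("default-a")
queue.push("urgent", priority=10)
queue.push("default-b")

order = [queue.pop() for _ in range(4)]
print(order)  # ['urgent', 'default-a', 'default-b', 'background']
```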

watch - Auto-Sync Monitoring

Monitor directories for changes and auto-sync.

# Watch for changes
ml watch ./project

# Watch and auto-queue on changes
ml watch ./project --name "dev-exp" --queue

Options:

  • --name <name>: Custom experiment name
  • --queue: Auto-queue on changes
  • --priority N: Set priority for queued jobs

Features:

  • Near-real-time Monitoring: Polls for changes every 2 seconds
  • Change Detection: File modification time tracking
  • Commit Comparison: Only sync when content changes
  • Automatic Queuing: Seamless development workflow
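
Polling with modification-time tracking can be sketched as taking a snapshot of the tree and diffing it against the previous one. The helpers below (snapshot, changed_files) are illustrative names, not the CLI's internals; a watcher built this way would repeat the comparison in a loop with a 2-second sleep and then compare commit IDs before syncing.

```python
import os
import tempfile
from pathlib import Path


def snapshot(root: Path) -> dict[str, int]:
    """Map each file's relative path to its modification time in nanoseconds."""
    return {
        str(path.relative_to(root)): path.stat().st_mtime_ns
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(before: dict[str, int], after: dict[str, int]) -> list[str]:
    """Paths that were added, removed, or modified between two snapshots."""
    return sorted(p for p in before.keys() | after.keys() if before.get(p) != after.get(p))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "train.py").write_text("print('v1')\n")
    before = snapshot(root)

    # Simulate an edit. The mtime is set explicitly (one second later) so the
    # demo does not depend on the filesystem's timestamp resolution.
    (root / "train.py").write_text("print('v2')\n")
    later = before["train.py"] + 10**9
    os.utime(root / "train.py", ns=(later, later))
    (root / "notes.txt").write_text("new file\n")

    detected = changed_files(before, snapshot(root))

print(detected)  # ['notes.txt', 'train.py']
```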

status - System Status

Check system and worker status.

ml status

Displays:

  • Worker connectivity
  • Queue status
  • Running jobs
  • System health

monitor - Remote Monitoring

Launch the TUI over SSH for real-time monitoring.

ml monitor

Features:

  • Real-time Updates: Live experiment status
  • Interactive Interface: Browse and manage experiments
  • SSH Integration: Secure remote access

cancel - Job Cancellation

Cancel running or queued jobs.

ml cancel <job-id>

Arguments:

  • job-id: Job identifier from status output

prune - Cleanup Management

Clean up old experiments to save space.

# Keep last N experiments
ml prune --keep 20

# Remove experiments older than N days
ml prune --older-than 30

Options:

  • --keep N: Keep N most recent experiments
  • --older-than N: Remove experiments older than N days
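
The two selection rules can be sketched as a pure function over (name, creation time) pairs. This is a hypothetical model of the behavior described above: the function name is invented, and how the real command behaves when both flags are given is not documented here, so the sketch simply lets keep take precedence.

```python
from datetime import datetime, timedelta


def select_for_pruning(experiments, now, keep=None, older_than_days=None):
    """Return names of experiments that would be removed, newest first.

    experiments: list of (name, created_at) tuples.
    """
    newest_first = sorted(experiments, key=lambda item: item[1], reverse=True)
    if keep is not None:
        # Everything after the N most recent experiments is removed.
        return [name for name, _ in newest_first[keep:]]
    if older_than_days is not None:
        cutoff = now - timedelta(days=older_than_days)
        return [name for name, created in newest_first if created < cutoff]
    return []


now = datetime(2024, 6, 30)
experiments = [
    ("exp-a", datetime(2024, 6, 29)),
    ("exp-b", datetime(2024, 6, 1)),
    ("exp-c", datetime(2024, 4, 15)),
]

print(select_for_pruning(experiments, now, keep=2))              # ['exp-c']
print(select_for_pruning(experiments, now, older_than_days=20))  # ['exp-b', 'exp-c']
```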

Architecture

  • Testing: Docker Compose (macOS/Linux)
  • Production: Podman + systemd (Linux)

Important: Docker is for testing only. Podman is used for running actual ML experiments in production.

Core Components

cli/src/
├── commands/        # Command implementations
│   ├── init.zig     # Configuration setup
│   ├── sync.zig     # Project synchronization
│   ├── queue.zig    # Job management
│   ├── watch.zig    # Auto-sync monitoring
│   ├── status.zig   # System status
│   ├── monitor.zig  # Remote monitoring
│   ├── cancel.zig   # Job cancellation
│   └── prune.zig    # Cleanup operations
├── config.zig       # Configuration management
├── errors.zig       # Error handling
├── net/             # Network utilities
│   └── ws.zig       # WebSocket client
└── utils/           # Utility functions
    ├── crypto.zig   # Hashing and encryption
    ├── storage.zig  # Content-addressed storage
    └── rsync.zig    # File synchronization

Performance Features

Content-Addressed Storage

  • Deduplication: Identical files shared across experiments
  • Hash-based Storage: Files stored by SHA256 hash
  • Space Efficiency: Reduces storage by up to 90% when experiments share most of their files
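
A minimal sketch of content-addressed storage, assuming a simple on-disk layout where each blob is stored under its SHA256 digest in a two-character prefix directory. The layout and the store_blob helper are assumptions for illustration; the CLI's actual storage format may differ.

```python
import hashlib
import tempfile
from pathlib import Path


def store_blob(store: Path, data: bytes) -> str:
    """Write data under its SHA256 digest; identical content is stored once."""
    digest = hashlib.sha256(data).hexdigest()
    blob = store / digest[:2] / digest
    if not blob.exists():
        blob.parent.mkdir(parents=True, exist_ok=True)
        blob.write_bytes(data)
    return digest


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp)
    # Two experiments share the same dataset file but have different scripts.
    exp1 = [store_blob(store, b"dataset-bytes"), store_blob(store, b"train v1")]
    exp2 = [store_blob(store, b"dataset-bytes"), store_blob(store, b"train v2")]
    blob_count = sum(1 for path in store.rglob("*") if path.is_file())

print(blob_count)  # 3 blobs on disk for 4 logical files
```

Each experiment then only needs to record its list of digests, so the shared dataset costs disk space once.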

SHA256 Commit IDs

  • Reliable Detection: Cryptographic change detection
  • Collision Resistance: Accidental collisions between different contents are practically impossible
  • Fast Computation: Optimized for large directories
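
One common way to derive a directory-level ID is to hash every file's relative path and content digest in sorted order. The sketch below shows that approach under stated assumptions: commit_id is an invented name, and the exact fields and ordering the CLI hashes are not specified in this guide.

```python
import hashlib
import tempfile
from pathlib import Path


def commit_id(root: Path) -> str:
    """Hash relative paths and file contents in sorted order for a stable ID."""
    outer = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix().encode("utf-8")
        outer.update(rel + b"\0")  # separator keeps path and content apart
        outer.update(hashlib.sha256(path.read_bytes()).digest())
    return outer.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "src" / "train.py").write_text("print('v1')\n")
    (root / "config.yaml").write_text("lr: 0.01\n")

    first = commit_id(root)
    unchanged = commit_id(root)

    (root / "config.yaml").write_text("lr: 0.02\n")
    modified = commit_id(root)

print(first == unchanged, first == modified)  # True False
```

Because the ID depends only on paths and content, touching a file without changing it leaves the ID the same, which is what lets a watcher skip redundant syncs.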

WebSocket Protocol

  • Low Latency: Real-time communication
  • Binary Protocol: Efficient message format
  • Connection Pooling: Reused connections

Memory Management

  • Arena Allocators: Efficient memory allocation
  • Zero-copy Operations: Minimized memory usage
  • Resource Cleanup: Automatic resource management

Security Features

Authentication

  • API Key Hashing: Secure token storage
  • SHA256 Hashes: Irreversible token protection
  • Config Validation: Input sanitization
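
As an illustration of storing only a SHA256 hash of an API key, the sketch below hashes a key and verifies a candidate with a constant-time comparison. The function names are hypothetical and the unsalted scheme is an assumption; a plain SHA256 hash is reasonable for long random keys but not for human-chosen passwords.

```python
import hashlib
import hmac


def hash_api_key(api_key: str) -> str:
    """Return the SHA256 hex digest that would be stored instead of the key."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def verify_api_key(candidate: str, stored_hash: str) -> bool:
    """Check a presented key against the stored hash."""
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(hash_api_key(candidate), stored_hash)


stored = hash_api_key("your-api-key")
print(verify_api_key("your-api-key", stored))  # True
print(verify_api_key("wrong-key", stored))     # False
```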

Secure Communication

  • SSH Integration: Encrypted file transfers
  • WebSocket Security: TLS-protected communication
  • Input Validation: Comprehensive argument checking

Error Handling

  • Secure Reporting: No sensitive information leakage
  • Graceful Degradation: Safe error recovery
  • Audit Logging: Operation tracking

Advanced Usage

Workflow Integration

Development Workflow

# 1. Initialize project
ml sync ./project --name "dev" --queue

# 2. Auto-sync during development
ml watch ./project --name "dev" --queue

# 3. Monitor progress
ml status

Batch Processing

# Process multiple experiments
for dir in experiments/*/; do
    ml sync "$dir" --queue
done

Priority Management

# High priority experiment
ml sync ./urgent --priority 10 --queue

# Background processing
ml sync ./background --priority 1 --queue
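
When batch runs are driven from a script, it can help to build the argument list in one place. The sketch below assembles ml sync invocations from the flags documented in this guide (--name, --priority, --queue); the sync_command helper itself is hypothetical, and it only prints the commands rather than executing them.

```python
import shlex


def sync_command(project, name=None, queue=False, priority=None):
    """Build the argv for an `ml sync` call using the flags documented above."""
    argv = ["ml", "sync", project]
    if name is not None:
        argv += ["--name", name]
    if priority is not None:
        if not 1 <= priority <= 10:
            raise ValueError("priority must be between 1 and 10")
        argv += ["--priority", str(priority)]
    if queue:
        argv.append("--queue")
    return argv


batch = [
    sync_command("./urgent", priority=10, queue=True),
    sync_command("./background", priority=1, queue=True),
    sync_command("./project", name="experiment 1"),
]

for argv in batch:
    # shlex.join quotes arguments so the printed line is safe to paste into a shell.
    print(shlex.join(argv))
```

Each list can be passed to subprocess.run(argv, check=True) once the ml binary is on the PATH.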

Configuration Management

Multiple Workers

# ~/.ml/config.toml
worker_host = "worker.local"
worker_user = "mluser"
worker_base = "/data/ml-experiments"
worker_port = 22
api_key = "your-api-key"

Security Settings

# Set restrictive permissions
chmod 600 ~/.ml/config.toml

# Verify configuration
ml status

Troubleshooting

Common Issues

Build Problems

# Check Zig installation
zig version

# Clean build
cd cli && make clean && make build

Connection Issues

# Test SSH connectivity (set these variables to the values in ~/.ml/config.toml)
ssh -p "$worker_port" "$worker_user@$worker_host"

# Verify configuration
cat ~/.ml/config.toml

Sync Failures

# Check rsync
rsync --version

# Manual sync test (variables from ~/.ml/config.toml; add -e "ssh -p $worker_port" for a non-default SSH port)
rsync -avz ./test/ "$worker_user@$worker_host:/tmp/"

Performance Issues

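# Monitor resource usage (Linux; -x matches the process name exactly, -d, joins PIDs with commas)
top -p "$(pgrep -d, -x ml)"

# Check disk space on the worker, where worker_base lives (variables from ~/.ml/config.toml)
ssh -p "$worker_port" "$worker_user@$worker_host" df -h "$worker_base"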

Debug Mode

Enable verbose logging:

# Environment variable
export ML_DEBUG=1
ml sync ./project

# Or use debug build
cd cli && make debug

Performance Benchmarks

File Operations

  • Sync Speed: 100MB/s+ (network limited)
  • Hash Computation: 500MB/s+ (CPU limited)
  • Deduplication: Up to 90% space savings (depends on how much content experiments share)

Memory Usage

  • Base Memory: ~10MB
  • Large Projects: ~50MB (1GB+ projects)
  • Memory Efficiency: Constant per-file overhead

Network Performance

  • WebSocket Latency: <10ms (local network)
  • Connection Setup: <100ms
  • Throughput: Network limited

Contributing

Development Setup

cd cli
zig build-exe src/main.zig

Testing

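# Run tests (zig test takes a root source file, not a directory)
cd cli && zig test src/main.zig

# Integration tests (run each test file under tests/)
zig test tests/<test-file>.zig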

Code Style

  • Follow Zig style guidelines
  • Use explicit error handling
  • Document public APIs
  • Add comprehensive tests

For more information, see the CLI Reference and Architecture pages.