Zig CLI Guide

High-performance command-line interface for ML experiment management, written in Zig for maximum speed and efficiency.

Overview

The Zig CLI (`ml`) is the primary interface for managing ML experiments in your homelab. Built with Zig, it compiles to a native binary designed for fast file operations, network communication, and experiment management.

Installation

Pre-built Binaries (Recommended)

Download from GitHub Releases:

```
# Download for your platform
curl -LO https://github.com/jfraeys/fetch_ml/releases/latest/download/ml-<platform>.tar.gz

# Extract
tar -xzf ml-<platform>.tar.gz

# Install
chmod +x ml-<platform>
sudo mv ml-<platform> /usr/local/bin/ml

# Verify
ml --help
```

Platforms:
- ml-linux-x86_64.tar.gz: Linux (fully static, zero dependencies)
- ml-macos-x86_64.tar.gz: macOS Intel
- ml-macos-arm64.tar.gz: macOS Apple Silicon

All release binaries embed a static rsync, so they do not depend on a system rsync installation.

Build from Source

Development Build (uses system rsync):

```
cd cli
zig build dev
./zig-out/dev/ml-dev --help
```

Production Build (embedded rsync):

```
cd cli
# For testing: uses rsync wrapper
zig build prod

# For release with static rsync:
# 1. Place static rsync binary at src/assets/rsync_release.bin
# 2. Build
zig build prod
strip zig-out/prod/ml  # Optional: reduce size

# Verify
./zig-out/prod/ml --help
ls -lh zig-out/prod/ml
```

See cli/src/assets/README.md for details on obtaining static rsync binaries.

Verify Installation

```
ml --help
ml --version  # Shows build config
```

Quick Start

Initialize Configuration
```
./cli/zig-out/bin/ml init
```

Sync Your First Project

```
./cli/zig-out/bin/ml sync ./my-project --queue
```

Monitor Progress
```
./cli/zig-out/bin/ml status
```

Command Reference

`init` - Configuration Setup

Initialize the CLI configuration file.

```
ml init
```

Creates: ~/.ml/config.toml

Configuration Template:

```
worker_host = "worker.local"
worker_user = "mluser"
worker_base = "/data/ml-experiments"
worker_port = 22
api_key = "your-api-key"
```

`sync` - Project Synchronization

Sync project files to the worker, deduplicating identical content so it is stored only once.

```
# Basic sync
ml sync ./project

# Sync with custom name and auto-queue
ml sync ./project --name "experiment-1" --queue

# Sync with priority
ml sync ./project --priority 8
```

Options:
- --name <name>: Custom experiment name
- --queue: Automatically queue after sync
- --priority N: Set priority (1-10, default 5)

Features:
- Content-Addressed Storage: Automatic deduplication
- SHA256 Commit IDs: Reliable change detection
- Incremental Transfer: Only sync changed files
- Rsync Backend: Efficient file transfer

`queue` - Job Management

Queue experiments for execution on the worker.

```
# Queue with commit ID
ml queue my-job --commit abc123def456

# Queue with priority
ml queue my-job --commit abc123 --priority 8
```

Options:
- --commit <id>: Commit ID from sync output
- --priority N: Execution priority (1-10)

Features:
- WebSocket Communication: Real-time job submission
- Priority Queuing: Higher-priority jobs run first
- API Authentication: Secure job submission
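The ordering rule behind priority queuing can be sketched as a stable sort in which higher numbers run first. The sketch assumes that jobs with equal priority keep their submission order, which this guide does not specify. `order_jobs` and its `<priority> <job-name>` input format are hypothetical and only illustrate the ordering. They do not describe the server's scheduler.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: order queued jobs the way a priority queue would.
# Input: one "<priority> <job-name>" line per job, in submission order.
# Assumption: equal priorities run first-in, first-out (stable sort).
order_jobs() {
  # -s keeps submission order for ties; -k1,1nr sorts on field 1 only,
  # numerically, highest first.
  LC_ALL=C sort -s -k1,1nr
}

# Example: prints "8 urgent-fix", "5 baseline", "5 ablation", "1 cleanup".
printf '%s\n' "5 baseline" "8 urgent-fix" "5 ablation" "1 cleanup" | order_jobs
```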

`watch` - Auto-Sync Monitoring

Monitor directories for changes and auto-sync.

```
# Watch for changes
ml watch ./project

# Watch and auto-queue on changes
ml watch ./project --name "dev-exp" --queue
```

Options:
- --name <name>: Custom experiment name
- --queue: Auto-queue on changes
- --priority N: Set priority for queued jobs

Features:
- Polling: Checks for changes every 2 seconds
- Change Detection: Tracks file modification times
- Commit Comparison: Syncs only when file content actually changes
- Automatic Queuing: Queues each new sync for a hands-off development loop
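Below is a minimal sketch of the polling and modification-time steps. It assumes GNU `find` on Linux, and `mtime_signature` and `poll_dir` are hypothetical helpers that are not part of `ml`. The commit-comparison step is left out on purpose. An mtime change only means a file was touched, and according to the feature list the real command also compares content before it syncs.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of mtime-based polling (not the ml source).
# Assumes GNU find (Linux) and coreutils sha256sum.

# One line per file ("<mtime> <relative path>"), hashed into a single fingerprint.
mtime_signature() {
  find "$1" -type f -printf '%T@ %P\n' | LC_ALL=C sort | sha256sum | cut -d' ' -f1
}

# Every $2 seconds, run the remaining arguments if any file was
# added, removed, or modified since the last check.
poll_dir() {
  local dir=$1 interval=$2 last current
  shift 2
  last=$(mtime_signature "$dir")
  while sleep "$interval"; do
    current=$(mtime_signature "$dir")
    if [ "$current" != "$last" ]; then
      last=$current
      "$@"
    fi
  done
}
```

For example, `poll_dir ./project 2 ml sync ./project --queue` reruns the sync whenever a file in `./project` is added, removed, or touched. Each poll stays cheap because it hashes only mtimes and paths, and that is why the more expensive content comparison waits until something has changed.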

`status` - System Status

Check system and worker status.

```
ml status
```

Displays:
- Worker connectivity
- Queue status
- Running jobs
- System health

`monitor` - Remote Monitoring

Launch the TUI over SSH for real-time monitoring.

```
ml monitor
```

Features:
- Real-time Updates: Live experiment status
- Interactive Interface: Browse and manage experiments
- SSH Integration: Secure remote access

`cancel` - Job Cancellation

Cancel running or queued jobs.

```
ml cancel job-id
```

Arguments:
- job-id: Job identifier from `ml status` output

`prune` - Cleanup Management

Clean up old experiments to save space.

```
# Keep last N experiments
ml prune --keep 20

# Remove experiments older than N days
ml prune --older-than 30
```

Options:
- --keep N: Keep the N most recent experiments
- --older-than N: Remove experiments older than N days
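A dry run like the one below can preview which experiments each policy would select. It assumes that each experiment is a top-level directory under `worker_base` and that "age" means the directory's modification time. This guide confirms neither assumption, and the function names are hypothetical. The sketch requires GNU `find` and deletes nothing.

```bash
#!/usr/bin/env bash
# Hypothetical dry run of the two prune policies; prints candidates only.
# Assumptions: one top-level directory per experiment under the base path,
# and age = directory modification time. Requires GNU find.

# Everything except the N most recently modified experiment directories.
prune_keep_candidates() {
  local base=$1 keep=$2
  find "$base" -mindepth 1 -maxdepth 1 -type d -printf '%T@ %p\n' \
    | LC_ALL=C sort -k1,1nr | tail -n +"$((keep + 1))" | cut -d' ' -f2-
}

# Experiment directories last modified more than N whole days ago.
prune_older_than_candidates() {
  find "$1" -mindepth 1 -maxdepth 1 -type d -mtime +"$2"
}
```

On the worker, `prune_keep_candidates /data/ml-experiments 20` lists the directories that `--keep 20` would be expected to remove under these assumptions.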

Architecture

- Testing: Docker Compose (macOS/Linux)
- Production: Podman + systemd (Linux)

Important: Docker is for testing only. Podman is used for running actual ML experiments in production.

Core Components

```
cli/src/
├── commands/        # Command implementations
│   ├── init.zig     # Configuration setup
│   ├── sync.zig     # Project synchronization
│   ├── queue.zig    # Job management
│   ├── watch.zig    # Auto-sync monitoring
│   ├── status.zig   # System status
│   ├── monitor.zig  # Remote monitoring
│   ├── cancel.zig   # Job cancellation
│   └── prune.zig    # Cleanup operations
├── config.zig       # Configuration management
├── errors.zig       # Error handling
├── net/             # Network utilities
│   └── ws.zig       # WebSocket client
└── utils/           # Utility functions
    ├── crypto.zig   # Hashing and encryption
    ├── storage.zig  # Content-addressed storage
    └── rsync.zig    # File synchronization
```

Performance Features

Content-Addressed Storage

- Deduplication: Identical files are shared across experiments
- Hash-based Storage: Files are stored by their SHA256 hash
- Space Efficiency: Reduces storage by up to 90%, depending on how much content experiments share
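The deduplication idea fits in a few lines. Each file is stored under its SHA256 digest, so a second copy of the same content maps to the object that already exists. This is a minimal illustration that assumes a flat `objects/` directory and coreutils `sha256sum`. This guide does not document the layout `ml` uses on the worker.

```bash
#!/usr/bin/env bash
# Minimal content-addressed store sketch (not the ml implementation).
# Each file is copied to <store>/objects/<sha256>; identical content is stored once.
cas_put() {
  local store=$1 file=$2 hash
  hash=$(sha256sum "$file" | cut -d' ' -f1)
  mkdir -p "$store/objects"
  # Copy only if this content is not stored yet (deduplication).
  [ -e "$store/objects/$hash" ] || cp "$file" "$store/objects/$hash"
  # The digest is the address an experiment would record for this file.
  printf '%s\n' "$hash"
}
```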

SHA256 Commit IDs

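- Reliable Detection: Cryptographic hashing detects any content change
- Collision Resistance: Accidental collisions between different contents are computationally infeasible (though not mathematically impossible)
- Fast Computation: Optimized for large directories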
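One way to derive a single commit ID from a directory is to hash a sorted manifest of per-file hashes and relative paths. The sketch below illustrates that general technique. It is an assumption, not the exact algorithm `ml` uses. `commit_id` is a hypothetical helper that needs GNU `find`, `sort`, and `xargs`.

```bash
#!/usr/bin/env bash
# Hypothetical directory-level SHA256 commit ID (not the ml algorithm).
# Manifest: one "<file-hash>  ./<relative-path>" line per file, in byte order;
# the ID is the SHA256 of that manifest.
commit_id() {
  (cd "$1" && find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r sha256sum) \
    | sha256sum | cut -d' ' -f1
}
```

The manifest includes paths but not timestamps. As a result, restoring a file's content restores the original ID, while a rename produces a new one. Sorting with `LC_ALL=C` keeps the order, and therefore the ID, the same across machines with different locales.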

WebSocket Protocol

- Low Latency: Real-time communication
- Binary Protocol: Efficient message format
- Connection Pooling: Reused connections

Memory Management

- Arena Allocators: Efficient memory allocation
- Zero-copy Operations: Minimized memory usage
- Resource Cleanup: Automatic resource management

Security Features

Authentication

- API Key Hashing: Secure token storage
- SHA256 Hashes: Irreversible token protection
- Config Validation: Input sanitization
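Hashing an API key lets a server keep only the digest and compare digests on each request. The sketch assumes an unsalted SHA256 of the raw key. This guide does not document the exact scheme that `ml` and its server use, and both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of one-way API key hashing; assumes unsalted SHA256 of the raw key.
hash_api_key() {
  # printf '%s' avoids hashing a trailing newline.
  printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

# A server could store only the digest and compare digests on each request.
verify_api_key() {
  local presented=$1 stored_digest=$2
  [ "$(hash_api_key "$presented")" = "$stored_digest" ]
}
```

A plain SHA256 suits long, randomly generated keys. A low-entropy secret, such as a human-chosen password, would call for a slow, salted key-derivation function instead. A production server should also compare digests in constant time, which the shell `=` test does not guarantee.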

Secure Communication

- SSH Integration: Encrypted file transfers
- WebSocket Security: TLS-protected communication
- Input Validation: Comprehensive argument checking
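As an illustration of argument checking, the sketch below validates `--priority` against the 1-10 range documented for `sync` and `queue`, for use in wrapper scripts. `validate_priority` and `sync_with_priority` are hypothetical helpers. The CLI's own parser may report errors differently.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper-side check for the documented --priority range (1-10).
validate_priority() {
  case $1 in
    [1-9]|10) return 0 ;;
    *) echo "error: --priority must be an integer from 1 to 10, got '$1'" >&2
       return 1 ;;
  esac
}

# Example wrapper: refuse to call ml with an out-of-range priority.
sync_with_priority() {
  validate_priority "$2" || return 1
  ml sync "$1" --priority "$2" --queue
}
```

Matching an explicit set of strings, rather than doing arithmetic, also rejects inputs such as `08`, `+5`, or very long digit strings without any octal or overflow surprises.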

Error Handling

- Secure Reporting: No sensitive information leakage
- Graceful Degradation: Safe error recovery
- Audit Logging: Operation tracking

Advanced Usage

Workflow Integration

Development Workflow

```
# 1. Initial sync and queue
ml sync ./project --name "dev" --queue

# 2. Auto-sync during development
ml watch ./project --name "dev" --queue

# 3. Monitor progress
ml status
```

Batch Processing

```
# Process multiple experiments
for dir in experiments/*/; do
    ml sync "$dir" --queue
done
```
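The loop above moves on silently when a sync fails. A slightly sturdier sketch keeps going past a failed sync, names each experiment after its directory, and lists the failures at the end. The naming scheme and the `ML_BIN` override are illustrative assumptions, not CLI features. The override is handy for dry runs with a stub command.

```bash
#!/usr/bin/env bash
# Batch sync that continues after failures and reports them at the end.
# ML_BIN (hypothetical) lets you substitute a stub; it defaults to ml.
batch_sync() {
  local root=$1 ml_bin=${ML_BIN:-ml} dir name
  local -a failed=()
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue            # glob matched nothing
    name=$(basename "$dir")              # "experiments/exp1/" -> "exp1"
    if ! "$ml_bin" sync "$dir" --name "$name" --queue; then
      failed+=("$name")
    fi
  done
  if [ "${#failed[@]}" -gt 0 ]; then
    printf 'failed: %s\n' "${failed[@]}" >&2
    return 1
  fi
}
```

Usage: `batch_sync experiments`. A non-zero exit status means at least one directory failed, and the failed directories are listed on stderr.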

Priority Management

```
# High priority experiment
ml sync ./urgent --priority 10 --queue

# Background processing
ml sync ./background --priority 1 --queue
```

Configuration Management

Worker Configuration

```
# ~/.ml/config.toml
worker_host = "worker.local"
worker_user = "mluser"
worker_base = "/data/ml-experiments"
worker_port = 22
api_key = "your-api-key"
```

Security Settings

```
# Set restrictive permissions
chmod 600 ~/.ml/config.toml

# Verify configuration
ml status
```
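A check like the following can run before other commands and catch a config file whose permissions have drifted. It assumes GNU `stat` on Linux. On macOS, `stat -f '%OLp'` prints the same octal mode. `check_config_perms` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Warn if the config file is not mode 600 (owner read/write only).
# Assumes GNU stat (Linux).
check_config_perms() {
  local file=${1:-$HOME/.ml/config.toml} mode
  mode=$(stat -c '%a' "$file") || return 1
  if [ "$mode" != "600" ]; then
    echo "warning: $file has mode $mode; run: chmod 600 $file" >&2
    return 1
  fi
}
```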

Troubleshooting

Common Issues

Build Problems

```
# Check Zig installation
zig version

# Clean build
cd cli && make clean && make build
```

Connection Issues

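```
# Test SSH connectivity (replace the variables with the worker_port,
# worker_user, and worker_host values from ~/.ml/config.toml)
ssh -p "$worker_port" "$worker_user@$worker_host"

# Verify configuration
cat ~/.ml/config.toml
```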
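The connectivity commands above use shell variables named after the config keys. Below is a minimal sketch for loading them from `~/.ml/config.toml`. It only understands the flat `key = value` lines shown in the configuration template, so it does not handle tables, arrays, or inline comments. Both helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Read one flat `key = value` (or `key = "value"`) entry; not a full TOML parser.
ml_config_get() {
  local key=$1 file=${2:-$HOME/.ml/config.toml}
  sed -n -E "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*\"?([^\"]*)\"?[[:space:]]*$/\1/p" "$file" \
    | head -n 1
}

# Set worker_host, worker_user, worker_base, and worker_port in the current shell.
load_ml_config() {
  local file=${1:-$HOME/.ml/config.toml}
  worker_host=$(ml_config_get worker_host "$file")
  worker_user=$(ml_config_get worker_user "$file")
  worker_base=$(ml_config_get worker_base "$file")
  worker_port=$(ml_config_get worker_port "$file")
}
```

After you run `load_ml_config`, you can paste the `ssh` and `rsync` commands in this Troubleshooting section unchanged.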

Sync Failures

```
# Check rsync
rsync --version

# Manual sync test (pass the SSH port too, in case the worker is not on 22)
rsync -avz -e "ssh -p $worker_port" ./test/ "$worker_user@$worker_host:/tmp/"
```

Performance Issues

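```
# Monitor resource usage of running ml processes
# (Linux procps: -x matches the exact name, -d, joins PIDs with commas for top -p)
top -p "$(pgrep -d, -x ml)"

# Check disk space on the worker (worker_base is a path on the worker, not locally)
ssh -p "$worker_port" "$worker_user@$worker_host" df -h "$worker_base"
```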

Debug Mode

Enable verbose logging:

```
# Environment variable
export ML_DEBUG=1
ml sync ./project

# Or use debug build
cd cli && make debug
```

Performance Benchmarks

File Operations

- Sync Speed: 100 MB/s+ (network limited)
- Hash Computation: 500 MB/s+ (CPU limited)
- Deduplication: Up to 90% space savings, depending on how much content experiments share

Memory Usage

- Base Memory: ~10 MB
- Large Projects (1 GB+): ~50 MB
- Memory Efficiency: Constant per-file overhead

Network Performance

- WebSocket Latency: <10 ms (local network)
- Connection Setup: <100 ms
- Throughput: Network limited

Contributing

Development Setup

```
cd cli
zig build-exe src/main.zig
```

Testing

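```
# Run unit tests (zig test takes a single root source file, not a directory)
cd cli && zig test src/main.zig

# Integration tests: run each test file under tests/
for f in tests/*.zig; do zig test "$f"; done
```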

Code Style

- Follow Zig style guidelines
- Use explicit error handling
- Document public APIs
- Add comprehensive tests

For more information, see the CLI Reference and Architecture pages.