fetch_ml/cli
Jeremie Fraeys 743bc4be3b
cli: update Zig CLI build and native hash integration
- Update build.zig configuration
- Improve queue command implementation
- Enhance native hash support
2026-03-04 13:23:30 -05:00

ML CLI

Fast CLI tool for managing ML experiments. Supports both local mode (SQLite) and server mode (WebSocket).

Build Policy

Native C++ libraries (dataset_hash, etc.) are available when building natively on any platform. Cross-compilation is supported for development on non-native targets but disables native library features.

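| Build Type    | Target                      | Native Libraries | Purpose                    |
|---------------|-----------------------------|------------------|----------------------------|
| Native        | Host platform (Linux, macOS) | Yes              | Dev, staging, production   |
| Cross-compile | Different arch/OS           | Stubbed          | Testing on foreign targets |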

A native build targets the host platform and includes full native library support:

zig build -Doptimize=ReleaseSmall

Cross-Compile (Dev Only)

For testing on different architectures without native library support:

zig build -Dtarget=x86_64-linux-gnu  # from macOS/Windows
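
The "Stubbed" entry in the table above can be pictured with a small comptime switch. This is only a sketch under stated assumptions: the flag `native_enabled`, the namespaces `native_impl` and `stub_impl`, the function `hashDataset`, and the error name `NativeUnavailable` are all hypothetical, and the real build presumably wires the choice through build.zig rather than a hard-coded constant.

```zig
const std = @import("std");

// Assumption: in a real build this flag would come from build.zig;
// it is hard-coded here to model a cross-compiled binary.
const native_enabled = false;

// Hypothetical stand-in for the binding to the native C++ library.
const native_impl = struct {
    pub fn hashDataset(path: []const u8) error{NativeUnavailable}!u64 {
        // Placeholder computation; the real code would call into dataset_hash.
        return std.hash.Wyhash.hash(0, path);
    }
};

// Stub with the same signature that always reports the feature as missing.
const stub_impl = struct {
    pub fn hashDataset(path: []const u8) error{NativeUnavailable}!u64 {
        _ = path;
        return error.NativeUnavailable;
    }
};

// Callers use one name; the implementation is chosen at compile time.
pub const dataset_hash = if (native_enabled) native_impl else stub_impl;
```

With this shape, command code can call `dataset_hash.hashDataset(path)` unconditionally and handle the error when running a cross-compiled build.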

Architecture

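The CLI follows a modular three-layer architecture for maintainability: a shared core, mode-specific operations (local/ and server/), and thin command routers, with utils/ providing supporting helpers: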

src/
├── core/                    # Shared foundation
│   ├── context.zig         # Execution context (allocator, config, mode dispatch)
│   ├── output.zig          # Unified JSON/text output helpers
│   └── flags.zig           # Common flag parsing
├── local/                   # Local mode operations (SQLite)
│   └── experiment_ops.zig  # Experiment CRUD for local DB
├── server/                  # Server mode operations (WebSocket)
│   └── experiment_api.zig  # Experiment API for remote server
├── commands/                # Thin command routers
│   ├── experiment.zig      # 348 lines (was 836)
│   ├── queue.zig           # Job submission
│   └── queue/              # Queue submodules
│       ├── parse.zig       # Job template parsing
│       ├── validate.zig    # Validation logic
│       └── submit.zig      # Job submission
└── utils/                   # Utilities (21 files)

Mode Dispatch Pattern

Commands auto-detect local vs server mode using core.context.Context:

var ctx = core.context.Context.init(allocator, cfg, flags.json);
defer ctx.deinit(); // release context resources on every exit path
if (ctx.isLocal()) {
    return try local.experiment.list(ctx.allocator, ctx.json_output);
} else {
    return try server.experiment.list(ctx.allocator, ctx.json_output);
}

Quick Start

# 1. Build
zig build

# 2. Initialize local tracking (creates fetch_ml.db)
./zig-out/bin/ml init

# 3. Create an experiment, then start, log to, and finish a run locally
./zig-out/bin/ml experiment create --name "baseline"
./zig-out/bin/ml run start --experiment <experiment-id> --name "run-1"
./zig-out/bin/ml experiment log --run <run-id> --name loss --value 0.5
./zig-out/bin/ml run finish --run <run-id>

Commands

Local Mode Commands (SQLite)

  • ml init - Initialize local experiment tracking database
  • ml experiment create --name <name> - Create experiment locally
  • ml experiment list - List experiments from SQLite
  • ml experiment log --run <id> --name <key> --value <val> - Log metrics
  • ml run start --experiment <id> [--name <name>] - Start a run
  • ml run finish --run <id> - Mark run as finished
  • ml run fail --run <id> - Mark run as failed
  • ml run list - List all runs

Server Mode Commands (WebSocket)

  • ml sync <path> - Sync project to server
  • ml queue <job1> [job2 ...] [--commit <id>] [--priority N] [--note <text>] - Queue jobs
  • ml status - Check system/queue status
  • ml validate <commit_id> [--json] [--task <task_id>] - Validate provenance
  • ml cancel <job> - Cancel a running/queued job

Shared Commands (Auto-detect Mode)

  • ml experiment log|show|list|delete - Works in both local and server mode
  • ml monitor - Launch TUI (local SQLite or remote SSH)

Notes:

  • Commands auto-detect mode from config (sqlite:// vs wss://)
  • --json mode is designed to be pipe-friendly
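
The scheme-based auto-detection mentioned in the notes could look roughly like the sketch below. It assumes the mode is decided purely from the URI prefix; `Mode` and `detectMode` are hypothetical names, and the README does not say which config key holds the URI in server mode.

```zig
const std = @import("std");

/// Hypothetical mode enum; the real logic lives in core.context.Context.
pub const Mode = enum { local, server };

/// Picks a mode from the URI scheme; returns null for unknown schemes.
pub fn detectMode(uri: []const u8) ?Mode {
    if (std.mem.startsWith(u8, uri, "sqlite://")) return .local;
    if (std.mem.startsWith(u8, uri, "wss://")) return .server;
    return null;
}
```

Returning an optional leaves it to the caller to decide how to report an unrecognized scheme instead of silently picking a default.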

Core Modules

core.context

Provides unified execution context for all commands:

  • Mode detection: Automatically detects local (SQLite) vs server (WebSocket) mode
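  • Output handling: JSON vs text output based on the --json flag
  • Dispatch helpers: ctx.dispatch(local_fn, server_fn, args) for mode-specific implementations

const std = @import("std");
const core = @import("../core.zig");
const local = @import("../local.zig");
const server = @import("../server.zig");
const config = @import("../config.zig"); // assumed path; adjust to the real config module

pub fn execute(allocator: std.mem.Allocator, args: []const []const u8) !void {
    // Parse shared flags (such as --json) before building the context
    var flags = core.flags.CommonFlags{};
    const remaining = try core.flags.parseCommon(allocator, args, &flags);
    defer remaining.deinit(); // assumes parseCommon returns a managed list

    const cfg = try config.Config.load(allocator);
    var ctx = core.context.Context.init(allocator, cfg, flags.json);
    defer ctx.deinit();

    // Dispatch to the local or server implementation
    if (ctx.isLocal()) {
        return local.experiment.list(ctx.allocator, ctx.json_output);
    } else {
        return server.experiment.list(ctx.allocator, ctx.json_output);
    }
}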
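
The ctx.dispatch(local_fn, server_fn, args) helper named in the bullet list is not shown in this README. A minimal sketch of what such a helper might look like follows; the `Context` here is a simplified stand-in with a hypothetical `mode` field, and the handler signature is reduced to a single bool so the example stays self-contained.

```zig
const std = @import("std");

const Mode = enum { local, server };

// Simplified handler signature; real handlers take an allocator and more.
const Handler = *const fn (bool) []const u8;

// Stand-in for core.context.Context with only the fields this sketch needs.
const Context = struct {
    mode: Mode,
    json_output: bool,

    pub fn isLocal(self: Context) bool {
        return self.mode == .local;
    }

    /// Runs local_fn in local mode and server_fn otherwise.
    pub fn dispatch(self: Context, local_fn: Handler, server_fn: Handler) []const u8 {
        return if (self.isLocal()) local_fn(self.json_output) else server_fn(self.json_output);
    }
};

// Hypothetical mode-specific implementations.
fn listLocal(json_output: bool) []const u8 {
    return if (json_output) "{\"source\":\"sqlite\"}" else "source: sqlite";
}

fn listServer(json_output: bool) []const u8 {
    return if (json_output) "{\"source\":\"websocket\"}" else "source: websocket";
}
```

A router would then reduce to a single call such as `ctx.dispatch(&listLocal, &listServer)`, which keeps the mode check in one place.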

core.output

Unified output helpers that respect the --json flag:

core.output.errorMsg("command", "Error message");        // JSON: {"success":false,...}
core.output.success("command");                          // JSON: {"success":true,...}
core.output.successString("cmd", "key", "value");     // JSON with data
core.output.info("Text output", .{});                   // Text mode only
core.output.usage("cmd", "usage string");              // Help text
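
To illustrate how one helper can serve both output modes, here is a sketch of an error formatter. It is not the actual core.output code: `formatError` is a hypothetical name, the JSON fields other than "success" are assumptions, and no JSON escaping is performed, so it is only safe for messages without quotes or backslashes.

```zig
const std = @import("std");

/// Formats an error as JSON or plain text into the caller's buffer.
/// Field names other than "success" are assumptions for this sketch.
pub fn formatError(buf: []u8, json: bool, command: []const u8, message: []const u8) ![]u8 {
    if (json) {
        // Doubled braces are literal braces in Zig format strings.
        return std.fmt.bufPrint(
            buf,
            "{{\"success\":false,\"command\":\"{s}\",\"error\":\"{s}\"}}",
            .{ command, message },
        );
    }
    return std.fmt.bufPrint(buf, "{s}: {s}", .{ command, message });
}
```

Formatting into a caller-supplied buffer avoids allocation and returns an error if the message does not fit.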

core.flags

Common flag parsing utilities:

var flags = core.flags.CommonFlags{};
var remaining = try core.flags.parseCommon(allocator, args, &flags);

// Check for subcommands
if (core.flags.matchSubcommand(remaining.items, "list")) |sub_args| {
    return try executeList(ctx, sub_args);
}
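
The subcommand check above suggests a helper that returns the remaining arguments on a match. The sketch below shows one plausible shape for it, plus a simple flag lookup; both functions are written from scratch for illustration (`hasFlag` is a hypothetical name) and may differ from the real core.flags implementation.

```zig
const std = @import("std");

/// Returns the arguments after `name` if the first argument matches it.
pub fn matchSubcommand(args: []const []const u8, name: []const u8) ?[]const []const u8 {
    if (args.len == 0) return null;
    if (!std.mem.eql(u8, args[0], name)) return null;
    return args[1..];
}

/// Reports whether `flag` appears anywhere in `args`.
pub fn hasFlag(args: []const []const u8, flag: []const u8) bool {
    for (args) |arg| {
        if (std.mem.eql(u8, arg, flag)) return true;
    }
    return false;
}
```

Because the result is an optional slice, it fits the `if (...) |sub_args|` form used in the snippet above without copying any arguments.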

Configuration

Local Mode (SQLite)

# .fetchml/config.toml or ~/.ml/config.toml
tracking_uri = "sqlite://./fetch_ml.db"
artifact_path = "./experiments/"
sync_uri = ""  # Optional: server to sync with

Server Mode (WebSocket)

# ~/.ml/config.toml
worker_host = "worker.local"
worker_user = "mluser"
worker_base = "/data/ml-experiments"
worker_port = 22
api_key = "your-api-key"

Building

Development

cd cli
zig build

Production (requires SQLite in assets/)

cd cli
make build-sqlite    # Fetch SQLite amalgamation
zig build prod       # Build with embedded SQLite

Install

# Install to system
make install

# Or copy binary manually
cp zig-out/bin/ml /usr/local/bin/

Local/Server Module Pattern

Commands that work in both modes follow this structure:

src/
├── local.zig              # Module index
├── local/
│   └── experiment_ops.zig # Local implementations
├── server.zig             # Module index
└── server/
    └── experiment_api.zig # Server implementations

Adding a New Command

  1. Create local implementation in src/local/<name>_ops.zig
  2. Create server implementation in src/server/<name>_api.zig
  3. Export from src/local.zig and src/server.zig
  4. Create thin router in src/commands/<name>.zig using ctx.dispatch()

Maintainability Cleanup (2026-02)

Recent refactoring improved code organization:

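| Metric               | Before            | After                     |
|----------------------|-------------------|---------------------------|
| experiment.zig       | 836 lines         | 348 lines (58% reduction) |
| queue.zig            | 1203 lines        | Modular structure         |
| Duplicate printUsage | 24 functions      | 1 shared helper           |
| Mode dispatch logic  | Inlined everywhere | core.context.Context      |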

Key Improvements

  1. Core Modules: Unified core.output, core.flags, core.context eliminate duplication
  2. Mode Abstraction: Local/server operations separated into dedicated modules
  3. Queue Decomposition: queue/ submodules for parsing, validation, submission
  4. Bug Fixes: Resolved 15+ compilation errors in narrative.zig, outcome.zig, annotate.zig, etc.

Need Help?

  • ml --help - Show command help
  • ml <command> --help - Show command-specific help