- Add musl-tools to build-cli.yml for static linking support
- Move rsync build from ci.yml into build-cli.yml workflow
- Fix SQLite year parameter in both workflows
- Remove redundant RSYNC_VERSION env var from ci.yml
This consolidates the CLI artifact build process into a single
workflow file, making the CI pipeline easier to maintain.
Exec is now an internal module used by 'ml run', not a standalone
command. Remove the misleading 'ml exec' usage documentation and
replace with simple internal module message.
Add execution_mode enum (local/remote/auto) to config for persistent
control over command execution behavior. Removes --local/--remote flags
from commands to simplify user workflow - no need to check server
connection status manually.
Changes:
- config.zig: Add ExecutionMode enum, execution_mode field, parsing/serialization
- mode.zig: Update detect() to check execution_mode == .local
- init.zig: Add --mode flag (local/remote/auto) for setting during init
- info.zig: Use config execution_mode, removed --local/--remote flags
- run.zig: Use config execution_mode, removed --local/--remote flags
- exec/mod.zig: Use config execution_mode, removed --local/--remote flags
Priority order for determining execution mode:
1. Config setting (execution_mode: local/remote/auto)
2. Auto-detect only if config is 'auto'
Users set mode once during init:
ml init --mode=local # Always use local
ml init --mode=remote # Always use remote
ml init --mode=auto # Auto-detect (default)
Add isConnected() method to common.ConnectionContext to check WebSocket
client connection state. Migrate all server-connected commands to use
the standardized ConnectionContext pattern:
- jupyter/lifecycle.zig: Replace local ConnectionCtx with common.ConnectionContext
- status.zig: Use ConnectionContext, remove manual connection boilerplate,
add connection status indicators (connecting/connected)
- cancel.zig: Use ConnectionContext for server cancel operations
- dataset.zig: Use ConnectionContext for list/register/info/search operations
- exec/remote.zig: Use ConnectionContext for remote job execution
Benefits:
- Eliminates ~160 lines of duplicated connection boilerplate
- Consistent error handling and cleanup across commands
- Single point of change for connection logic
- Adds runtime connection state visibility to status command
Enhance ml info to query server when connected, falling back to local
manifests when offline. Unifies behavior with other commands like run,
exec, and cancel.
CLI changes:
- Add --local and --remote flags for explicit control
- Auto-detect connection state via mode.detect()
- queryRemoteRun(): Query server via WebSocket for run details
- queryLocalRun(): Read local run_manifest.json
- displayRunInfo(): Shared display logic for both sources
- Add connection status indicators (Remote: connecting.../connected)
WebSocket protocol:
- Add query_run_info opcode (0x28) to cli and server
- Add sendQueryRunInfo() method to ws/client.zig
- Protocol: [opcode:1][api_key_hash:16][run_id_len:1][run_id:var]
Server changes:
- Add handleQueryRunInfo() handler to ws/handler.go
- Returns run_id, job_name, user, timestamp, overall_sha, files_count
- Checks PermJobsRead permission
- Looks up run in experiment manager
Usage:
ml info abc123 # Auto: tries remote, falls back to local
ml info abc123 --local # Force local manifest lookup
ml info abc123 --remote # Force remote query (fails if offline)
Removed duplicate help text from doc comments:
- log.zig: Removed usage examples (in printUsage)
- annotate.zig: Removed usage examples (in printUsage)
- experiment.zig: Removed usage examples (in printUsage)
Rationale: printUsage() already contains detailed help text.
Doc comments should not duplicate this information.
All tests pass.
Fixed compilation error in jupyter/lifecycle.zig:
- Changed 'const client' to 'var client' in ConnectionCtx.init()
- Allows errdefer client.close() to work correctly
- close() requires mutable reference to ws.Client
All tests pass.
Created utils/dataset_hash.zig:
- computeDatasetHash(allocator, path) -> [64]u8
- Returns fixed 64-char hex string (stack allocated)
- Provides verifyDatasetIntegrity() for hash comparison
- Enables testing against native C++ implementations
Updated dataset.zig:
- verifyDataset() now automatically computes hash during verification
- Uses utils/dataset_hash.zig for hash computation
- Hash displayed in JSON output for reference
- No separate 'dataset hash' command needed
Benefits:
- Single source of truth for dataset hashing
- Testable independently for correctness verification
- Automatic during dataset verify operation
Since app is not released, removed old commands entirely:
- Deleted exec.zig (533 lines) - modularized version
- Deleted queue.zig (1248 lines) - complete removal
- Unified all functionality into run.zig
New unified 'ml run' command features:
- Auto-detects local vs remote execution via mode.detect()
- Supports --local and --remote flags to force execution mode
- Includes all resource options: --cpu, --memory, --gpu
- Research context: --hypothesis, --context, --intent, --tags
- Validation modes: --dry-run, --validate, --explain
- Uses modular exec/remote.zig and exec/local.zig for execution
Dispatcher updates (main.zig):
- Removed 'e' (exec) handler
- Removed 'q' (queue) handler
- Updated help text to show unified command
Import cleanup (commands.zig):
- Removed queue.zig import
Total code reduction: ~1,700 lines
All tests pass.
Standardized dataset.zig with proper doc comment format:
- Added /// doc comment with usage and subcommand descriptions
- Follows same format as other commands
Removed dataset_hash.zig:
- Hash computation is already automatic in 'dataset verify'
- Standalone 'ml dataset hash' command was redundant
- Users can use 'ml dataset verify <path>' to get hash
All tests pass.
Since app is not released, removed old commands entirely:
- Deleted exec.zig (533 lines)
- Deleted queue.zig (1248 lines)
- Unified functionality into run.zig
New unified 'ml run' command:
- Auto-detects local vs remote execution
- Supports --local and --remote flags to force mode
- Includes all features: priority, resources, research context
- Single command for all execution needs
Updated main.zig dispatcher:
- Removed 'e' (exec) handler
- Removed 'q' (queue) handler
- Updated help text
Total reduction: ~1,700 lines of code
All tests pass.
Move shared utility functions from queue.zig to common.zig:
- buildNarrativeJson() - was duplicated in queue.zig, exec/dryrun.zig, exec/remote.zig
- formatNextSteps() - was duplicated in queue.zig
- dryRun() - was duplicated in exec/dryrun.zig
- JobOptions struct - shared configuration options
Added common.zig import to queue.zig and updated all references.
Reduction: ~80 lines of duplicate code removed
All tests pass.
The has_tracking variable was set but never read. Removed:
- Variable declaration (line 140)
- 6 assignments across tracking flag handlers
Cleanup only, no functional changes.
All tests pass.
Break down exec.zig into focused modules:
- exec/mod.zig - Main entry point and command dispatch (211 lines)
- exec/remote.zig - Remote execution via WebSocket (87 lines)
- exec/local.zig - Local execution with fork/exec (137 lines)
- exec/dryrun.zig - Dry-run preview functionality (53 lines)
Original exec.zig now acts as backward-compatible wrapper.
Benefits:
- Each module <150 lines (highly maintainable)
- Clear separation: remote vs local vs dry-run logic
- Easier to test individual execution paths
- Original 533-line file split into 4 focused modules
All tests pass.
Introduce ConnectionCtx helper struct that encapsulates the common pattern:
- Config.load() + getWebSocketUrl() + hashApiKey() + Client.connect()
Applied to 4 functions in lifecycle.zig:
- startJupyter (was 76 lines, now 58 lines)
- stopJupyter (was 62 lines, now 44 lines)
- removeJupyter (was 101 lines, now 83 lines)
- restoreJupyter (was 62 lines, now 44 lines)
Total reduction: ~50 lines of duplicated boilerplate code.
Also created commands/common.zig for future shared patterns.
All tests pass.
Break down jupyter.zig (31KB, 906 lines) into focused modules:
- jupyter/mod.zig - Main entry point and command dispatch
- jupyter/validation.zig - Security validation functions
- jupyter/lifecycle.zig - Service create/start/stop/remove/restore
- jupyter/query.zig - List, status, and package queries
- jupyter/workspace.zig - Workspace and experiment management
Original jupyter.zig now acts as backward-compatible wrapper.
Removed 5 unimplemented placeholder functions (~50 lines of dead code).
Benefits:
- Each module <250 lines (maintainable)
- Clear separation of concerns
- Easier to test individual components
- Better code organization
All tests pass.
Update errors.zig to use consolidated io module:
- Replace colors.printError with io.printError
- Replace colors.printWarning with io.printWarning
- Replace colors.printInfo with io.printInfo
All tests pass.
Fix broken imports in dataset_hash.zig:
- ui/ui.zig and ui/colors.zig don't exist - replaced with std.debug.print
- Updated colors. to io. for consistency with consolidated utilities
- Remove dependency on non-existent ui module
All tests pass.
Update auth.zig import from colors.zig to io.zig:
- colors.zig is now consolidated into io.zig
- Use io module directly for consistency
All tests pass.
Remove unused suggest.zig utility module:
- Not imported by any file in the codebase
- Levenshtein distance functionality not currently needed
- Contains ~200 lines of unused code
If needed in the future for command suggestions, can be restored from git.
All tests pass.
Integrate ProgressBar into queue.zig for multi-job queuing:
- Show progress bar when queuing 2+ jobs (not in JSON mode)
- Update progress after each successful job queue
- Maintain simple output for single job queuing
- Clean up output for batch operations
Benefits:
- Better UX for batch job queuing
- Visual progress indication for long operations
- Consistent with sync command ProgressBar pattern
All tests pass.
Update progress.zig and integrate into sync command:
- progress.zig: update import from colors.zig to io.zig
- sync.zig: add ProgressBar for multi-run sync operations
- Shows progress bar when syncing 2+ runs (not in JSON mode)
- Updates progress after each successful sync
Benefits:
- Better UX for long-running sync operations
- Visual feedback on sync progress
- Maintains clean output for single runs
All tests pass.
Remove redundant logging.zig (28 lines):
- Functions moved to io.zig: printInfo, printSuccess, printWarning, printError, printProgress, confirm
- All functionality preserved with re-exports in utils.zig
Benefits:
- Reduced file count (22 → 21 utils)
- Single source of truth for I/O operations
- No functional changes
Build passes successfully.
Move JSON accessor functions to io.zig:
- jsonGetString, jsonGetInt, jsonGetFloat, jsonGetBool
- json.zig now re-exports from io.zig for backward compatibility
Benefits:
- Single location for all I/O related utilities
- Consistent with terminal/color consolidation
- Reduced file count
Build passes successfully.
Simplify imports by providing direct re-exports:
- utils.isTTY, utils.getWidth (instead of utils.terminal.isTTY)
- utils.reset, utils.red, utils.green (instead of utils.colors.reset)
- Mark colors, terminal, logging as consolidated into io.zig
- Mark rsync modules as deprecated
Benefits:
- Shorter import paths for common utilities
- Reduced typing: utils.red vs utils.colors.red
- Backward compatibility maintained
Build passes successfully.
Consolidate overlapping utilities:
- colors.zig (35 lines) → re-exports from io.zig
- terminal.zig (36 lines) → re-exports from io.zig
- io.zig now contains all terminal, color, and I/O utilities
Benefits:
- Single source of truth for terminal/color logic
- Reduced file count (25 → 23 utils)
- Easier maintenance with all I/O in one place
Build passes successfully.
Move configuration types to queue/mod.zig:
- TrackingConfig with MLflow, TensorBoard, Wandb sub-configs
- QueueOptions with all queue-related options
queue.zig now re-exports from queue/mod.zig for backward compatibility.
Build passes successfully.
Remove simplified placeholders and implement production versions:
- db.zig: Update UUID comment to reflect crypto RNG is already in use
- tls.zig: Implement proper TLS 1.2 ClientHello message construction
- Full record layer header with correct version
- Proper handshake header
- 32-byte cryptographically secure random bytes
- SNI extension with hostname
- ECDHE cipher suites for forward secrecy
- Correct length calculations for all fields
Build passes successfully with production implementations.
Fix broken hash tests to work with current CLI architecture:
- Update import to use @import(src) module system
- Add hash module export to utils.zig
- Make validatePath() public for testing
- Fix Zig 0.15 API: writeFile options struct, var tmp_dir for cleanup
- Fix file paths: use tmp_dir realpath for hashFile
- Replace std.fs.MAX_PATH_BYTES with 4096 buffer
All hash tests now passing.
Update build system for new modules:
- Makefile: add build targets for new components
- build.zig: include new source files in build graph
Supports new CLI architecture.
Remove obsolete native hash implementation files:
- Delete native/hash.zig (superseded by utils/hash.zig)
- Delete utils/native_bridge.zig (replaced by direct TLS)
- Delete utils/native_hash.zig (consolidated into utils/hash.zig)
Cleanup as part of CLI hardening.
Update command structure with improved implementations:
- exec.zig: consolidated command execution
- queue.zig: improved job queuing with narrative support
- run.zig: enhanced local run execution
- dataset.zig, dataset_hash.zig: improved dataset management
Part of CLI hardening for better UX and reliability.
Add progress reporting and offline sync infrastructure:
- progress.zig: progress bars and status reporting
- sync_manager.zig: offline run synchronization manager
Supports resilient operation with server connectivity issues.
Add signal handling, environment detection, and secrets management:
- signals.zig: graceful Ctrl+C handling and signal management
- environment.zig: user environment detection for telemetry
- secrets.zig: secrets redaction for secure logging
Improves CLI reliability and security posture.
Update experiment creation to track sync status:
- Insert experiments with synced=0 (not synced to server)
- Align with schema update in db.zig for offline tracking
Part of offline run synchronization feature.
Add hardware-accelerated hash detection:
- Implement hasShaNi() using CPUID inline assembly for x86_64
- Detect SHA-NI support (bit 29 of EBX in leaf 7, subleaf 0)
- Cross-platform fallback for non-x86_64 architectures
- Enables hardware-accelerated SHA-256 when available
Improves hashing performance on modern Intel/AMD CPUs.
Add TLS transport abstraction for secure WebSocket connections:
- Create tls.zig module with TlsStream struct for TLS-encrypted sockets
- Implement Transport union in client.zig supporting both TCP and TLS
- Update frame.zig and handshake.zig to use Transport abstraction
- Add TLS handshake, read, write, flush, and close operations
- Support TLS 1.2/1.3 protocol versions with error handling
- Zig 0.15 compatible ArrayList API usage
Enables wss:// protocol support for encrypted server communication.
Remove three unused methods/parameter identified by static analysis:
- canRequeue(): never integrated into scheduling flow
- runMetricsClient clientID param: accepted but never used
- getUsageLocked(): callers inline the logic
Fixes IDE warnings about unused code per AGENTS.md cleanup discipline.
- Update E2E tests for consolidated docker-compose.test.yml
- Remove references to obsolete logs-debug.yml
- Enhance test fixtures and utilities
- Improve integration test coverage for KMS, queue, scheduler
- Update unit tests for config constants and worker execution
- Modernize cleanup-status.sh with new Makefile targets
- Update go.mod and go.sum with latest dependencies
- Remove docker-compose.local.yml and prod.smoke.yml (consolidated)
- Update CI workflow configurations