Enhance ml info to query server when connected, falling back to local
manifests when offline. Unifies behavior with other commands like run,
exec, and cancel.
CLI changes:
- Add --local and --remote flags for explicit control
- Auto-detect connection state via mode.detect()
- queryRemoteRun(): Query server via WebSocket for run details
- queryLocalRun(): Read local run_manifest.json
- displayRunInfo(): Shared display logic for both sources
- Add connection status indicators (Remote: connecting.../connected)
WebSocket protocol:
- Add query_run_info opcode (0x28) to cli and server
- Add sendQueryRunInfo() method to ws/client.zig
- Protocol: [opcode:1][api_key_hash:16][run_id_len:1][run_id:var]
Server changes:
- Add handleQueryRunInfo() handler to ws/handler.go
- Returns run_id, job_name, user, timestamp, overall_sha, files_count
- Checks PermJobsRead permission
- Looks up run in experiment manager
Usage:
ml info abc123 # Auto: tries remote, falls back to local
ml info abc123 --local # Force local manifest lookup
ml info abc123 --remote # Force remote query (fails if offline)
Removed duplicate help text from doc comments:
- log.zig: Removed usage examples (in printUsage)
- annotate.zig: Removed usage examples (in printUsage)
- experiment.zig: Removed usage examples (in printUsage)
Rationale: printUsage() already contains detailed help text.
Doc comments should not duplicate this information.
All tests pass.
Fixed compilation error in jupyter/lifecycle.zig:
- Changed 'const client' to 'var client' in ConnectionCtx.init()
- Allows errdefer client.close() to work correctly
- close() requires mutable reference to ws.Client
All tests pass.
Created utils/dataset_hash.zig:
- computeDatasetHash(allocator, path) -> [64]u8
- Returns fixed 64-char hex string (stack allocated)
- Provides verifyDatasetIntegrity() for hash comparison
- Enables testing against native C++ implementations
Updated dataset.zig:
- verifyDataset() now automatically computes hash during verification
- Uses utils/dataset_hash.zig for hash computation
- Hash displayed in JSON output for reference
- No separate 'dataset hash' command needed
Benefits:
- Single source of truth for dataset hashing
- Testable independently for correctness verification
- Automatic during dataset verify operation
Since app is not released, removed old commands entirely:
- Deleted exec.zig (533 lines) - modularized version
- Deleted queue.zig (1248 lines) - complete removal
- Unified all functionality into run.zig
New unified 'ml run' command features:
- Auto-detects local vs remote execution via mode.detect()
- Supports --local and --remote flags to force execution mode
- Includes all resource options: --cpu, --memory, --gpu
- Research context: --hypothesis, --context, --intent, --tags
- Validation modes: --dry-run, --validate, --explain
- Uses modular exec/remote.zig and exec/local.zig for execution
Dispatcher updates (main.zig):
- Removed 'e' (exec) handler
- Removed 'q' (queue) handler
- Updated help text to show unified command
Import cleanup (commands.zig):
- Removed queue.zig import
Total code reduction: ~1,700 lines
All tests pass.
Standardized dataset.zig with proper doc comment format:
- Added /// doc comment with usage and subcommand descriptions
- Follows same format as other commands
Removed dataset_hash.zig:
- Hash computation is already automatic in 'dataset verify'
- Standalone 'ml dataset hash' command was redundant
- Users can use 'ml dataset verify <path>' to get hash
All tests pass.
Since app is not released, removed old commands entirely:
- Deleted exec.zig (533 lines)
- Deleted queue.zig (1248 lines)
- Unified functionality into run.zig
New unified 'ml run' command:
- Auto-detects local vs remote execution
- Supports --local and --remote flags to force mode
- Includes all features: priority, resources, research context
- Single command for all execution needs
Updated main.zig dispatcher:
- Removed 'e' (exec) handler
- Removed 'q' (queue) handler
- Updated help text
Total reduction: ~1,700 lines of code
All tests pass.
Move shared utility functions from queue.zig to common.zig:
- buildNarrativeJson() - was duplicated in queue.zig, exec/dryrun.zig, exec/remote.zig
- formatNextSteps() - was duplicated in queue.zig
- dryRun() - was duplicated in exec/dryrun.zig
- JobOptions struct - shared configuration options
Added common.zig import to queue.zig and updated all references.
Reduction: ~80 lines of duplicate code removed
All tests pass.
The has_tracking variable was set but never read. Removed:
- Variable declaration (line 140)
- 6 assignments across tracking flag handlers
Cleanup only, no functional changes.
All tests pass.
Break down exec.zig into focused modules:
- exec/mod.zig - Main entry point and command dispatch (211 lines)
- exec/remote.zig - Remote execution via WebSocket (87 lines)
- exec/local.zig - Local execution with fork/exec (137 lines)
- exec/dryrun.zig - Dry-run preview functionality (53 lines)
Original exec.zig now acts as backward-compatible wrapper.
Benefits:
- Each module <150 lines (highly maintainable)
- Clear separation: remote vs local vs dry-run logic
- Easier to test individual execution paths
- Original 533-line file split into 4 focused modules
All tests pass.
Introduce ConnectionCtx helper struct that encapsulates the common pattern:
- Config.load() + getWebSocketUrl() + hashApiKey() + Client.connect()
Applied to 4 functions in lifecycle.zig:
- startJupyter (was 76 lines, now 58 lines)
- stopJupyter (was 62 lines, now 44 lines)
- removeJupyter (was 101 lines, now 83 lines)
- restoreJupyter (was 62 lines, now 44 lines)
Total reduction: ~50 lines of duplicated boilerplate code.
Also created commands/common.zig for future shared patterns.
All tests pass.
Break down jupyter.zig (31KB, 906 lines) into focused modules:
- jupyter/mod.zig - Main entry point and command dispatch
- jupyter/validation.zig - Security validation functions
- jupyter/lifecycle.zig - Service create/start/stop/remove/restore
- jupyter/query.zig - List, status, and package queries
- jupyter/workspace.zig - Workspace and experiment management
Original jupyter.zig now acts as backward-compatible wrapper.
Removed 5 unimplemented placeholder functions (~50 lines of dead code).
Benefits:
- Each module <250 lines (maintainable)
- Clear separation of concerns
- Easier to test individual components
- Better code organization
All tests pass.
Update errors.zig to use consolidated io module:
- Replace colors.printError with io.printError
- Replace colors.printWarning with io.printWarning
- Replace colors.printInfo with io.printInfo
All tests pass.
Fix broken imports in dataset_hash.zig:
- ui/ui.zig and ui/colors.zig don't exist - replaced with std.debug.print
- Updated colors. to io. for consistency with consolidated utilities
- Remove dependency on non-existent ui module
All tests pass.
Update auth.zig import from colors.zig to io.zig:
- colors.zig is now consolidated into io.zig
- Use io module directly for consistency
All tests pass.
Remove unused suggest.zig utility module:
- Not imported by any file in the codebase
- Levenshtein distance functionality not currently needed
- Contains ~200 lines of unused code
If needed in the future for command suggestions, can be restored from git.
All tests pass.
Integrate ProgressBar into queue.zig for multi-job queuing:
- Show progress bar when queuing 2+ jobs (not in JSON mode)
- Update progress after each successful job queue
- Maintain simple output for single job queuing
- Clean up output for batch operations
Benefits:
- Better UX for batch job queuing
- Visual progress indication for long operations
- Consistent with sync command ProgressBar pattern
All tests pass.
Update progress.zig and integrate into sync command:
- progress.zig: update import from colors.zig to io.zig
- sync.zig: add ProgressBar for multi-run sync operations
- Shows progress bar when syncing 2+ runs (not in JSON mode)
- Updates progress after each successful sync
Benefits:
- Better UX for long-running sync operations
- Visual feedback on sync progress
- Maintains clean output for single runs
All tests pass.
Remove redundant logging.zig (28 lines):
- Functions moved to io.zig: printInfo, printSuccess, printWarning, printError, printProgress, confirm
- All functionality preserved with re-exports in utils.zig
Benefits:
- Reduced file count (22 → 21 utils)
- Single source of truth for I/O operations
- No functional changes
Build passes successfully.
Move JSON accessor functions to io.zig:
- jsonGetString, jsonGetInt, jsonGetFloat, jsonGetBool
- json.zig now re-exports from io.zig for backward compatibility
Benefits:
- Single location for all I/O related utilities
- Consistent with terminal/color consolidation
- Reduced file count
Build passes successfully.
Simplify imports by providing direct re-exports:
- utils.isTTY, utils.getWidth (instead of utils.terminal.isTTY)
- utils.reset, utils.red, utils.green (instead of utils.colors.reset)
- Mark colors, terminal, logging as consolidated into io.zig
- Mark rsync modules as deprecated
Benefits:
- Shorter import paths for common utilities
- Reduced typing: utils.red vs utils.colors.red
- Backward compatibility maintained
Build passes successfully.
Consolidate overlapping utilities:
- colors.zig (35 lines) → re-exports from io.zig
- terminal.zig (36 lines) → re-exports from io.zig
- io.zig now contains all terminal, color, and I/O utilities
Benefits:
- Single source of truth for terminal/color logic
- Reduced file count (25 → 23 utils)
- Easier maintenance with all I/O in one place
Build passes successfully.
Move configuration types to queue/mod.zig:
- TrackingConfig with MLflow, TensorBoard, Wandb sub-configs
- QueueOptions with all queue-related options
queue.zig now re-exports from queue/mod.zig for backward compatibility.
Build passes successfully.
Remove simplified placeholders and implement production versions:
- db.zig: Update UUID comment to reflect crypto RNG is already in use
- tls.zig: Implement proper TLS 1.2 ClientHello message construction
- Full record layer header with correct version
- Proper handshake header
- 32-byte cryptographically secure random bytes
- SNI extension with hostname
- ECDHE cipher suites for forward secrecy
- Correct length calculations for all fields
Build passes successfully with production implementations.
Fix broken hash tests to work with current CLI architecture:
- Update import to use @import(src) module system
- Add hash module export to utils.zig
- Make validatePath() public for testing
- Fix Zig 0.15 API: writeFile options struct, var tmp_dir for cleanup
- Fix file paths: use tmp_dir realpath for hashFile
- Replace std.fs.MAX_PATH_BYTES with 4096 buffer
All hash tests now passing.
Update build system for new modules:
- Makefile: add build targets for new components
- build.zig: include new source files in build graph
Supports new CLI architecture.
Remove obsolete native hash implementation files:
- Delete native/hash.zig (superseded by utils/hash.zig)
- Delete utils/native_bridge.zig (replaced by direct TLS)
- Delete utils/native_hash.zig (consolidated into utils/hash.zig)
Cleanup as part of CLI hardening.
Update command structure with improved implementations:
- exec.zig: consolidated command execution
- queue.zig: improved job queuing with narrative support
- run.zig: enhanced local run execution
- dataset.zig, dataset_hash.zig: improved dataset management
Part of CLI hardening for better UX and reliability.
Add progress reporting and offline sync infrastructure:
- progress.zig: progress bars and status reporting
- sync_manager.zig: offline run synchronization manager
Supports resilient operation with server connectivity issues.
Add signal handling, environment detection, and secrets management:
- signals.zig: graceful Ctrl+C handling and signal management
- environment.zig: user environment detection for telemetry
- secrets.zig: secrets redaction for secure logging
Improves CLI reliability and security posture.
Update experiment creation to track sync status:
- Insert experiments with synced=0 (not synced to server)
- Align with schema update in db.zig for offline tracking
Part of offline run synchronization feature.
Add hardware-accelerated hash detection:
- Implement hasShaNi() using CPUID inline assembly for x86_64
- Detect SHA-NI support (bit 29 of EBX in leaf 7, subleaf 0)
- Cross-platform fallback for non-x86_64 architectures
- Enables hardware-accelerated SHA-256 when available
Improves hashing performance on modern Intel/AMD CPUs.
Add TLS transport abstraction for secure WebSocket connections:
- Create tls.zig module with TlsStream struct for TLS-encrypted sockets
- Implement Transport union in client.zig supporting both TCP and TLS
- Update frame.zig and handshake.zig to use Transport abstraction
- Add TLS handshake, read, write, flush, and close operations
- Support TLS 1.2/1.3 protocol versions with error handling
- Zig 0.15 compatible ArrayList API usage
Enables wss:// protocol support for encrypted server communication.
Remove three unused methods/parameter identified by static analysis:
- canRequeue(): never integrated into scheduling flow
- runMetricsClient clientID param: accepted but never used
- getUsageLocked(): callers inline the logic
Fixes IDE warnings about unused code per AGENTS.md cleanup discipline.
- Update E2E tests for consolidated docker-compose.test.yml
- Remove references to obsolete logs-debug.yml
- Enhance test fixtures and utilities
- Improve integration test coverage for KMS, queue, scheduler
- Update unit tests for config constants and worker execution
- Modernize cleanup-status.sh with new Makefile targets
- Update go.mod and go.sum with latest dependencies
- Remove docker-compose.local.yml and prod.smoke.yml (consolidated)
- Update CI workflow configurations