Commit graph

384 commits

Author SHA1 Message Date
Jeremie Fraeys
68062831b0
refactor(cli): remove redundant doc comments from command files
Removed duplicate help text from doc comments:
- log.zig: Removed usage examples (in printUsage)
- annotate.zig: Removed usage examples (in printUsage)
- experiment.zig: Removed usage examples (in printUsage)

Rationale: printUsage() already contains detailed help text.
Doc comments should not duplicate this information.

All tests pass.
2026-03-05 11:06:28 -05:00
Jeremie Fraeys
747579eae4
refactor: misc improvements across codebase
Various improvements:
- Makefile: build optimizations and native lib integration
- prune.zig: cleanup logic refinements
- status.zig: improved status reporting
- experiment_core.zig: core functionality updates
- progress.zig: progress bar improvements
- task.go: domain model updates for task handling

All tests pass.
2026-03-05 10:58:22 -05:00
Jeremie Fraeys
3e557e4565
fix(cli): correct const qualifier in jupyter lifecycle
Fixed compilation error in jupyter/lifecycle.zig:
- Changed 'const client' to 'var client' in ConnectionCtx.init()
- Allows errdefer client.close() to work correctly
- close() requires mutable reference to ws.Client

All tests pass.
2026-03-05 10:57:57 -05:00
Jeremie Fraeys
cb018934e1
feat(cli): add shared dataset_hash utility and automatic hashing
Created utils/dataset_hash.zig:
- computeDatasetHash(allocator, path) -> [64]u8
- Returns fixed 64-char hex string (stack allocated)
- Provides verifyDatasetIntegrity() for hash comparison
- Enables testing against native C++ implementations

Updated dataset.zig:
- verifyDataset() now automatically computes hash during verification
- Uses utils/dataset_hash.zig for hash computation
- Hash displayed in JSON output for reference
- No separate 'dataset hash' command needed

Benefits:
- Single source of truth for dataset hashing
- Testable independently for correctness verification
- Automatic during dataset verify operation
2026-03-05 10:57:39 -05:00
Jeremie Fraeys
e2673be8b5
feat(cli): unify exec, queue, run into single 'run' command
Since the app is not yet released, removed old commands entirely:
- Deleted exec.zig (533 lines) - modularized version
- Deleted queue.zig (1248 lines) - complete removal
- Unified all functionality into run.zig

New unified 'ml run' command features:
- Auto-detects local vs remote execution via mode.detect()
- Supports --local and --remote flags to force execution mode
- Includes all resource options: --cpu, --memory, --gpu
- Research context: --hypothesis, --context, --intent, --tags
- Validation modes: --dry-run, --validate, --explain
- Uses modular exec/remote.zig and exec/local.zig for execution

Dispatcher updates (main.zig):
- Removed 'e' (exec) handler
- Removed 'q' (queue) handler
- Updated help text to show unified command

Import cleanup (commands.zig):
- Removed queue.zig import

Total code reduction: ~1,700 lines
All tests pass.
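
One way the mode selection described above could work is sketched below in Go. The commit does not say how mode.detect() decides, so this makes two assumptions: auto-detection prefers remote execution when a server is reachable, and --local and --remote are mutually exclusive. resolveMode and its parameters are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// Mode is the execution target chosen for a run.
type Mode int

const (
	Local Mode = iota
	Remote
)

func (m Mode) String() string {
	if m == Remote {
		return "remote"
	}
	return "local"
}

// resolveMode is a hypothetical sketch of the selection logic: explicit flags
// win, and otherwise the mode is auto-detected from server reachability
// (an assumed detection rule).
func resolveMode(forceLocal, forceRemote, serverReachable bool) (Mode, error) {
	switch {
	case forceLocal && forceRemote:
		return Local, errors.New("--local and --remote cannot be combined")
	case forceLocal:
		return Local, nil
	case forceRemote:
		return Remote, nil
	case serverReachable:
		return Remote, nil
	default:
		return Local, nil
	}
}

func main() {
	mode, err := resolveMode(false, false, true)
	if err != nil {
		panic(err)
	}
	fmt.Println("selected mode:", mode)
}
```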
2026-03-05 10:57:00 -05:00
Jeremie Fraeys
9b4bd1b103
refactor(cli): standardize dataset command format and remove redundant hash command
Standardized dataset.zig with proper doc comment format:
- Added /// doc comment with usage and subcommand descriptions
- Follows same format as other commands

Removed dataset_hash.zig:
- Hash computation is already automatic in 'dataset verify'
- Standalone 'ml dataset hash' command was redundant
- Users can use 'ml dataset verify <path>' to get hash

All tests pass.
2026-03-05 10:30:55 -05:00
Jeremie Fraeys
b99cd6b0e3
feat(cli): unify exec, queue, run into single 'run' command
Since the app is not yet released, removed old commands entirely:
- Deleted exec.zig (533 lines)
- Deleted queue.zig (1248 lines)
- Unified functionality into run.zig

New unified 'ml run' command:
- Auto-detects local vs remote execution
- Supports --local and --remote flags to force mode
- Includes all features: priority, resources, research context
- Single command for all execution needs

Updated main.zig dispatcher:
- Removed 'e' (exec) handler
- Removed 'q' (queue) handler
- Updated help text

Total reduction: ~1,700 lines of code
All tests pass.
2026-03-05 10:22:37 -05:00
Jeremie Fraeys
0d05ec0317
refactor(cli): consolidate duplicate functions into common.zig
Move shared utility functions from queue.zig to common.zig:
- buildNarrativeJson() - was duplicated in queue.zig, exec/dryrun.zig, exec/remote.zig
- formatNextSteps() - was duplicated in queue.zig
- dryRun() - was duplicated in exec/dryrun.zig
- JobOptions struct - shared configuration options

Added common.zig import to queue.zig and updated all references.

Reduction: ~80 lines of duplicate code removed
All tests pass.
2026-03-05 10:12:44 -05:00
Jeremie Fraeys
ab7da26d77
refactor(cli): remove unused has_tracking variable from queue.zig
The has_tracking variable was set but never read. Removed:
- Variable declaration (line 140)
- 6 assignments across tracking flag handlers

Cleanup only, no functional changes.

All tests pass.
2026-03-05 10:03:05 -05:00
Jeremie Fraeys
6316e4d702
refactor(cli): modularize exec.zig (533 lines)
Break down exec.zig into focused modules:
- exec/mod.zig - Main entry point and command dispatch (211 lines)
- exec/remote.zig - Remote execution via WebSocket (87 lines)
- exec/local.zig - Local execution with fork/exec (137 lines)
- exec/dryrun.zig - Dry-run preview functionality (53 lines)

Original exec.zig now acts as backward-compatible wrapper.

Benefits:
- remote, local, and dry-run modules each <150 lines; mod.zig is 211 lines (highly maintainable)
- Clear separation: remote vs local vs dry-run logic
- Easier to test individual execution paths
- Original 533-line file split into 4 focused modules

All tests pass.
2026-03-05 09:59:00 -05:00
Jeremie Fraeys
923ccaf22b
refactor(cli): consolidate config parsing duplication in jupyter/lifecycle.zig
Introduce ConnectionCtx helper struct that encapsulates the common pattern:
- Config.load() + getWebSocketUrl() + hashApiKey() + Client.connect()

Applied to 4 functions in lifecycle.zig:
- startJupyter (was 76 lines, now 58 lines)
- stopJupyter (was 62 lines, now 44 lines)
- removeJupyter (was 101 lines, now 83 lines)
- restoreJupyter (was 62 lines, now 44 lines)

Total reduction: ~50 lines of duplicated boilerplate code.
Also created commands/common.zig for future shared patterns.

All tests pass.
2026-03-05 09:55:28 -05:00
Jeremie Fraeys
729394b7d5
refactor(cli): add shared error handling utilities for jupyter
Create jupyter/utils.zig with shared WebSocket error handling patterns:
- withWebSocketConnection() - standardized connection wrapper
- handleStandardResponse() - consistent response handling

This consolidates duplicate error handling patterns found across
lifecycle.zig and query.zig, reducing code duplication by ~30 lines.

All tests pass.
2026-03-04 22:07:22 -05:00
Jeremie Fraeys
90e5c6dc17
refactor(cli): modularize jupyter command
Break down jupyter.zig (31KB, 906 lines) into focused modules:
- jupyter/mod.zig - Main entry point and command dispatch
- jupyter/validation.zig - Security validation functions
- jupyter/lifecycle.zig - Service create/start/stop/remove/restore
- jupyter/query.zig - List, status, and package queries
- jupyter/workspace.zig - Workspace and experiment management

Original jupyter.zig now acts as backward-compatible wrapper.
Removed 5 unimplemented placeholder functions (~50 lines of dead code).

Benefits:
- Each module <250 lines (maintainable)
- Clear separation of concerns
- Easier to test individual components
- Better code organization

All tests pass.
2026-03-04 21:55:37 -05:00
Jeremie Fraeys
59a5433444
refactor(cli): remove remaining deprecated imports
Update remaining files using deprecated imports:
- core/output.zig: terminal.zig → io.zig
- net/ws/deps.zig: remove colors.zig export (available via io)

All tests pass.
2026-03-04 21:39:05 -05:00
Jeremie Fraeys
02b64be999
refactor(cli): update errors.zig to use io.zig
Update errors.zig to use consolidated io module:
- Replace colors.printError with io.printError
- Replace colors.printWarning with io.printWarning
- Replace colors.printInfo with io.printInfo

All tests pass.
2026-03-04 21:36:00 -05:00
Jeremie Fraeys
6c192e5144
fix(cli): fix broken imports in dataset_hash.zig
Fix broken imports in dataset_hash.zig:
- ui/ui.zig and ui/colors.zig don't exist - replaced with std.debug.print
- Updated colors. to io. for consistency with consolidated utilities
- Remove dependency on non-existent ui module

All tests pass.
2026-03-04 21:33:47 -05:00
Jeremie Fraeys
38fe9aba73
refactor(cli): update auth.zig to use io.zig
Update auth.zig import from colors.zig to io.zig:
- colors.zig is now consolidated into io.zig
- Use io module directly for consistency

All tests pass.
2026-03-04 21:32:03 -05:00
Jeremie Fraeys
49d5954623
chore(cli): remove dead code - suggest.zig
Remove unused suggest.zig utility module:
- Not imported by any file in the codebase
- Levenshtein distance functionality not currently needed
- Contains ~200 lines of unused code

If needed in the future for command suggestions, can be restored from git.

All tests pass.
2026-03-04 21:31:05 -05:00
Jeremie Fraeys
704099a13b
feat(cli): add ProgressBar to queue command for batch job processing
Integrate ProgressBar into queue.zig for multi-job queuing:
- Show progress bar when queuing 2+ jobs (not in JSON mode)
- Update progress after each successful job queue
- Maintain simple output for single job queuing
- Clean up output for batch operations

Benefits:
- Better UX for batch job queuing
- Visual progress indication for long operations
- Consistent with sync command ProgressBar pattern

All tests pass.
2026-03-04 21:26:58 -05:00
Jeremie Fraeys
ef7d19db9b
feat(cli): integrate ProgressBar into sync command
Update progress.zig and integrate into sync command:
- progress.zig: update import from colors.zig to io.zig
- sync.zig: add ProgressBar for multi-run sync operations
- Shows progress bar when syncing 2+ runs (not in JSON mode)
- Updates progress after each successful sync

Benefits:
- Better UX for long-running sync operations
- Visual feedback on sync progress
- Maintains clean output for single runs

All tests pass.
2026-03-04 21:23:16 -05:00
Jeremie Fraeys
94441fdc76
fix(cli): update imports after logging.zig removal
Update files still referencing deleted logging.zig:
- prune.zig: import io.zig, replace logging. with io.
- deps.zig: re-export log from io module

All Zig tests now pass.
2026-03-04 21:18:18 -05:00
Jeremie Fraeys
5b0bd83bd7
refactor(cli): remove logging.zig, consolidate into io.zig
Remove redundant logging.zig (28 lines):
- Functions moved to io.zig: printInfo, printSuccess, printWarning, printError, printProgress, confirm
- All functionality preserved with re-exports in utils.zig

Benefits:
- Reduced file count (22 → 21 utils)
- Single source of truth for I/O operations
- No functional changes

Build passes successfully.
2026-03-04 21:14:27 -05:00
Jeremie Fraeys
cf7e82c758
refactor(cli): consolidate JSON utilities into io.zig
Move JSON accessor functions to io.zig:
- jsonGetString, jsonGetInt, jsonGetFloat, jsonGetBool
- json.zig now re-exports from io.zig for backward compatibility

Benefits:
- Single location for all I/O related utilities
- Consistent with terminal/color consolidation
- Reduced file count

Build passes successfully.
2026-03-04 21:07:04 -05:00
Jeremie Fraeys
00ffeb93c8
refactor(cli): add flattened re-exports to utils.zig for cleaner imports
Simplify imports by providing direct re-exports:
- utils.isTTY, utils.getWidth (instead of utils.terminal.isTTY)
- utils.reset, utils.red, utils.green (instead of utils.colors.reset)
- Mark colors, terminal, logging as consolidated into io.zig
- Mark rsync modules as deprecated

Benefits:
- Shorter import paths for common utilities
- Reduced typing: utils.red vs utils.colors.red
- Backward compatibility maintained

Build passes successfully.
2026-03-04 21:06:13 -05:00
Jeremie Fraeys
87cefea9ae
refactor(cli): consolidate terminal and color utilities into io.zig
Consolidate overlapping utilities:
- colors.zig (35 lines) → re-exports from io.zig
- terminal.zig (36 lines) → re-exports from io.zig
- io.zig now contains all terminal, color, and I/O utilities

Benefits:
- Single source of truth for terminal/color logic
- Reduced file count (25 → 23 utils)
- Easier maintenance with all I/O in one place

Build passes successfully.
2026-03-04 21:05:37 -05:00
Jeremie Fraeys
7a6d454174
refactor(cli): modularize queue.zig structure
Move configuration types to queue/mod.zig:
- TrackingConfig with MLflow, TensorBoard, Wandb sub-configs
- QueueOptions with all queue-related options

queue.zig now re-exports from queue/mod.zig for backward compatibility.
Build passes successfully.
2026-03-04 21:00:23 -05:00
Jeremie Fraeys
c17811cf2b
refactor(cli): create modular WebSocket client structure
Break monolithic client.zig (1,558 lines) into focused modules:
- connection.zig: Transport, connection logic, URL parsing, TLS setup
- messaging.zig: MessageBuilder, validation, send methods
- state.zig: ClientState, response handling, error conversion
- mod.zig: Public exports and Client struct composition

Benefits:
- Each module <400 lines (maintainability target)
- Clear separation of concerns
- Easier to test individual components
- Foundation for future client refactoring

Original client.zig kept intact for backward compatibility.
Build passes successfully.
2026-03-04 20:58:11 -05:00
Jeremie Fraeys
fd4c342de0
refactor(cli): implement production-ready TLS and UUID generation
Remove simplified placeholders and implement production versions:
- db.zig: Update UUID comment to reflect that a crypto RNG is already in use
- tls.zig: Implement proper TLS 1.2 ClientHello message construction
  - Full record layer header with correct version
  - Proper handshake header
  - 32-byte cryptographically secure random bytes
  - SNI extension with hostname
  - ECDHE cipher suites for forward secrecy
  - Correct length calculations for all fields

Build passes successfully with production implementations.
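
The ClientHello layout listed above can be illustrated with the following Go sketch. It is not the tls.zig implementation: buildClientHello is a hypothetical name, the two cipher suites are picked as examples of ECDHE suites, and the record-layer version 0x0301 is an assumption (a common compatibility choice). The extensions an ECDHE handshake normally also carries, such as supported_groups and signature_algorithms, are left out to keep the length arithmetic visible.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// u16 encodes v as a big-endian 16-bit length field.
func u16(v int) []byte { return []byte{byte(v >> 8), byte(v)} }

// buildClientHello is a hypothetical sketch of a minimal TLS 1.2 ClientHello
// record carrying an SNI extension for hostname.
func buildClientHello(hostname string) ([]byte, error) {
	random := make([]byte, 32) // cryptographically secure client random
	if _, err := rand.Read(random); err != nil {
		return nil, err
	}

	// server_name extension (type 0x0000).
	name := []byte(hostname)
	sni := []byte{0x00, 0x00}
	sni = append(sni, u16(len(name)+5)...) // extension data length
	sni = append(sni, u16(len(name)+3)...) // server_name_list length
	sni = append(sni, 0x00)                // name_type: host_name
	sni = append(sni, u16(len(name))...)   // host name length
	sni = append(sni, name...)

	// ClientHello body.
	body := []byte{0x03, 0x03} // client_version: TLS 1.2
	body = append(body, random...)
	body = append(body, 0x00)       // empty session_id
	body = append(body, 0x00, 0x04) // cipher_suites length in bytes
	body = append(body, 0xC0, 0x2B) // TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
	body = append(body, 0xC0, 0x2F) // TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
	body = append(body, 0x01, 0x00) // one compression method: null
	body = append(body, u16(len(sni))...)
	body = append(body, sni...)

	// Handshake header: type client_hello (1) plus a 24-bit body length.
	hs := []byte{0x01, byte(len(body) >> 16), byte(len(body) >> 8), byte(len(body))}
	hs = append(hs, body...)

	// Record header: content type handshake (22), version, 16-bit length.
	rec := []byte{0x16, 0x03, 0x01}
	rec = append(rec, u16(len(hs))...)
	return append(rec, hs...), nil
}

func main() {
	msg, err := buildClientHello("example.com")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d bytes, record type 0x%02x\n", len(msg), msg[0])
}
```

Each length field is computed from the bytes already assembled beneath it, which is why the message is built from the innermost extension outward.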
2026-03-04 20:41:15 -05:00
Jeremie Fraeys
bb584b3410
test(cli): fix and update hash tests for current architecture
Fix broken hash tests to work with current CLI architecture:
- Update import to use @import(src) module system
- Add hash module export to utils.zig
- Make validatePath() public for testing
- Fix Zig 0.15 API: writeFile options struct, var tmp_dir for cleanup
- Fix file paths: use tmp_dir realpath for hashFile
- Replace std.fs.MAX_PATH_BYTES with 4096 buffer

All hash tests now passing.
2026-03-04 20:35:19 -05:00
Jeremie Fraeys
69951ce5a1
test: update test infrastructure and documentation
Update tests and documentation:
- native/README.md: document C++ native library plans
- restart_recovery_test.go: scheduler restart/recovery tests
- scheduler_fixture.go: test fixtures for scheduler
- hash_test.zig: SHA-NI hash tests (WIP)

Improves test coverage and documentation.
2026-03-04 20:25:48 -05:00
Jeremie Fraeys
dbecb2b521
refactor(cli): update protocol and main entry point
Update core protocol and application structure:
- protocol.zig: enhanced binary protocol handling
- main.zig: updated entry point for new command structure
- manifest.zig: improved manifest handling

Part of CLI hardening architecture improvements.
2026-03-04 20:25:40 -05:00
Jeremie Fraeys
89635b1d8c
build(cli): update build configuration
Update build system for new modules:
- Makefile: add build targets for new components
- build.zig: include new source files in build graph

Supports new CLI architecture.
2026-03-04 20:25:31 -05:00
Jeremie Fraeys
6a0555207e
refactor(cli): remove deprecated native hash modules
Remove obsolete native hash implementation files:
- Delete native/hash.zig (superseded by utils/hash.zig)
- Delete utils/native_bridge.zig (replaced by direct TLS)
- Delete utils/native_hash.zig (consolidated into utils/hash.zig)

Cleanup as part of CLI hardening.
2026-03-04 20:25:25 -05:00
Jeremie Fraeys
4c2af17ad6
feat(cli): consolidate and improve command implementations
Update command structure with improved implementations:
- exec.zig: consolidated command execution
- queue.zig: improved job queuing with narrative support
- run.zig: enhanced local run execution
- dataset.zig, dataset_hash.zig: improved dataset management

Part of CLI hardening for better UX and reliability.
2026-03-04 20:24:28 -05:00
Jeremie Fraeys
f1965b99bd
feat(cli): add progress tracking and sync management
Add progress reporting and offline sync infrastructure:
- progress.zig: progress bars and status reporting
- sync_manager.zig: offline run synchronization manager

Supports resilient operation with server connectivity issues.
2026-03-04 20:23:17 -05:00
Jeremie Fraeys
524f440fe4
feat(cli): add core system components for CLI hardening
Add signal handling, environment detection, and secrets management:
- signals.zig: graceful Ctrl+C handling and signal management
- environment.zig: user environment detection for telemetry
- secrets.zig: secrets redaction for secure logging

Improves CLI reliability and security posture.
2026-03-04 20:23:12 -05:00
Jeremie Fraeys
8ae0875800
feat(cli): add synced column support to experiment commands
Update experiment creation to track sync status:
- Insert experiments with synced=0 (not synced to server)
- Align with schema update in db.zig for offline tracking

Part of offline run synchronization feature.
2026-03-04 20:23:06 -05:00
Jeremie Fraeys
303f17d3b2
feat(cli): implement sync tracking for offline run synchronization
Add SQLite-based sync tracking infrastructure:
- Add synced column to ml_experiments schema (0=not synced, 1=synced)
- Implement SyncDB with initOrOpenSyncDB for sync_pending table
- Add markForSync, markAsSynced, getPendingRuns functions
- Fix SQLite error handling for Zig 0.15 compatibility

Enables tracking experiments that need server synchronization when offline.
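
The bookkeeping behind markForSync, markAsSynced, and getPendingRuns can be sketched as below. This Go illustration swaps the SQLite sync_pending table for an in-memory map so that it runs without a database driver; SyncTracker and its method names are hypothetical, and the false/true flag mirrors the synced=0/synced=1 column values.

```go
package main

import (
	"fmt"
	"sort"
)

// SyncTracker is a hypothetical in-memory stand-in for the sync tracking
// table: it maps a run ID to its synced flag.
type SyncTracker struct {
	synced map[string]bool
}

func NewSyncTracker() *SyncTracker {
	return &SyncTracker{synced: make(map[string]bool)}
}

// MarkForSync records a run as created offline and not yet uploaded.
func (t *SyncTracker) MarkForSync(runID string) {
	t.synced[runID] = false
}

// MarkAsSynced flips the flag once the server has accepted the run.
// Unknown run IDs are ignored.
func (t *SyncTracker) MarkAsSynced(runID string) {
	if _, ok := t.synced[runID]; ok {
		t.synced[runID] = true
	}
}

// PendingRuns returns the IDs that still need synchronization, sorted so the
// result is deterministic.
func (t *SyncTracker) PendingRuns() []string {
	pending := []string{}
	for id, done := range t.synced {
		if !done {
			pending = append(pending, id)
		}
	}
	sort.Strings(pending)
	return pending
}

func main() {
	tracker := NewSyncTracker()
	tracker.MarkForSync("run-a")
	tracker.MarkForSync("run-b")
	tracker.MarkAsSynced("run-a")
	fmt.Println(tracker.PendingRuns())
}
```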
2026-03-04 20:22:56 -05:00
Jeremie Fraeys
fedaba2409
feat(cli): implement CPUID-based SHA-NI detection for hash operations
Add hardware-accelerated hash detection:
- Implement hasShaNi() using CPUID inline assembly for x86_64
- Detect SHA-NI support (bit 29 of EBX in leaf 7, subleaf 0)
- Cross-platform fallback for non-x86_64 architectures
- Enables hardware-accelerated SHA-256 when available

Improves hashing performance on modern Intel/AMD CPUs.
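
The feature test itself reduces to a single bit check on the EBX value returned by CPUID leaf 7, subleaf 0. The Go sketch below shows only that check: Go's standard library has no way to execute CPUID, so the register value is passed in as a parameter, and hasShaNiFromEBX is a hypothetical name.

```go
package main

import "fmt"

// shaNIBit is the position of the SHA extensions flag in EBX for
// CPUID leaf 7, subleaf 0.
const shaNIBit = 29

// hasShaNiFromEBX reports whether the SHA-NI flag is set in an EBX value
// obtained from CPUID leaf 7, subleaf 0 (hypothetical helper; reading the
// register is out of scope here).
func hasShaNiFromEBX(ebx uint32) bool {
	return ebx&(1<<shaNIBit) != 0
}

func main() {
	// Example register values, not read from real hardware.
	fmt.Println(hasShaNiFromEBX(1 << shaNIBit))
	fmt.Println(hasShaNiFromEBX(0))
}
```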
2026-03-04 20:22:21 -05:00
Jeremie Fraeys
cce3ab83ee
feat(cli): implement TLS/WSS support for WebSocket connections
Add TLS transport abstraction for secure WebSocket connections:
- Create tls.zig module with TlsStream struct for TLS-encrypted sockets
- Implement Transport union in client.zig supporting both TCP and TLS
- Update frame.zig and handshake.zig to use Transport abstraction
- Add TLS handshake, read, write, flush, and close operations
- Support TLS 1.2/1.3 protocol versions with error handling
- Zig 0.15 compatible ArrayList API usage

Enables wss:// protocol support for encrypted server communication.
2026-03-04 20:22:12 -05:00
Jeremie Fraeys
08ab628546
refactor(scheduler): remove dead code
Remove two unused methods and one unused parameter identified by static analysis:
- canRequeue(): never integrated into scheduling flow
- runMetricsClient clientID param: accepted but never used
- getUsageLocked(): callers inline the logic

Fixes IDE warnings about unused code per AGENTS.md cleanup discipline.
2026-03-04 13:35:18 -05:00
Jeremie Fraeys
7cd86fb88a
feat: add new API handlers, build scripts, and ADRs
- Introduce audit, plugin, and scheduler API handlers
- Add spec_embed.go for OpenAPI spec embedding
- Create modular build scripts (cli, go, native, cross-platform)
- Add deployment cleanup and health-check utilities
- New ADRs: hot reload, audit store, SSE updates, RBAC, caching, offline mode, KMS regions, tenant offboarding
- Add KMS configuration schema and worker variants
- Include KMS benchmark tests
2026-03-04 13:24:27 -05:00
Jeremie Fraeys
5f53104fcd
test: modernize test suite for streamlined infrastructure
- Update E2E tests for consolidated docker-compose.test.yml
- Remove references to obsolete logs-debug.yml
- Enhance test fixtures and utilities
- Improve integration test coverage for KMS, queue, scheduler
- Update unit tests for config constants and worker execution
- Modernize cleanup-status.sh with new Makefile targets
2026-03-04 13:24:24 -05:00
Jeremie Fraeys
61081655d2
feat: enhance worker execution and scheduler service templates
- Refactor worker configuration management
- Improve container executor lifecycle handling
- Update runloop and worker core logic
- Enhance scheduler service template generation
- Remove obsolete 'scheduler' symlink/directory
2026-03-04 13:24:20 -05:00
Jeremie Fraeys
daf14bfafa
chore: update dependencies and remove obsolete compose files
- Update go.mod and go.sum with latest dependencies
- Remove docker-compose.local.yml and prod.smoke.yml (consolidated)
- Update CI workflow configurations
2026-03-04 13:23:52 -05:00
Jeremie Fraeys
5d75f3576b
docs: comprehensive documentation updates
- Update TEST_COVERAGE_MAP with current requirements
- Refresh ADR-004 with C++ implementation details
- Update architecture, deployment, and security docs
- Improve CLI/TUI UX contract documentation
2026-03-04 13:23:48 -05:00
Jeremie Fraeys
66f262d788
security: improve audit, crypto, and config handling
- Enhance audit checkpoint system
- Update KMS provider and tenant key management
- Refine configuration constants
- Improve TUI config handling
2026-03-04 13:23:42 -05:00
Jeremie Fraeys
a4f2c36069
feat: enhance task domain and scheduler protocol
- Update task domain model
- Improve scheduler hub and priority queue
- Enhance protocol definitions
- Update manifest schema and run handling
2026-03-04 13:23:38 -05:00
Jeremie Fraeys
1f495dfbb7
api: regenerate OpenAPI types and server code
- Update openapi.yaml spec
- Regenerate server_gen.go with oapi-codegen
- Update adapter, routes, and server configuration
2026-03-04 13:23:34 -05:00
Jeremie Fraeys
743bc4be3b
cli: update Zig CLI build and native hash integration
- Update build.zig configuration
- Improve queue command implementation
- Enhance native hash support
2026-03-04 13:23:30 -05:00