- Merge test-native job from ci-native.yml into ci.yml
- Remove duplicate security-scan job (kept in security-scan.yml)
- Remove cache steps from merged native job (infra not available)
- Update Go version to 1.25.0 for consistency
- Update docker-build needs to include test-native
- Delete ci-native.yml
Add SHA256 for rsync 3.3.0: 7399e9a6708c32d678a72a63219e96f23be0be2336e50fd1348498d07041df90
This allows the build to proceed without requiring GPG keyring setup in CI
Cache infrastructure is not implemented in the Forgejo instance, causing timeouts
Removed:
- actions/cache steps for Go modules, Zig build, native libs, apt packages
- Docker buildx layer caching (cache-from/cache-to)
- Switch from ziglang.org/download to github.com/ziglang/zig/releases
- Add curl -fsSL --retry 3 for reliable downloads
- Download to a file and verify it before extracting, instead of piping straight into tar
- Add sendSyncRun method for run synchronization
- Add sendRerunRequest method for queue rerun
- Add sync_run (0x26) and rerun_request (0x27) opcodes
- Fix protocol import path to relative path
- Fix db.Stmt type alias usage in sync.zig
- Update TUI controller loadGPU() to use NVML when available
- Prioritize NVML over nvidia-smi command for better performance
- Show additional metrics: power draw, SM clock when available
- Maintain fallback to nvidia-smi and system_profiler
- Add stub implementation in nvml_gpu.cpp when NVML not available
- CMakeLists.txt checks for both NVML library and headers
- Build succeeds on macOS/non-NVIDIA systems with stub
- Runtime detection via gpu_is_available() prevents runtime errors
- Add native/nvml_gpu/ C++ library wrapping NVIDIA Management Library
- Add Go bindings in internal/worker/gpu_nvml_native.go and gpu_nvml_stub.go
- Update gpu_detector.go to use NVML for accurate GPU count detection
- Update native/CMakeLists.txt to build nvml_gpu library
- Provides real-time GPU utilization, memory, temperature, clocks, power
- Falls back to environment variable when NVML unavailable
- README.md: Replace FETCHML_NATIVE_LIBS with -tags native_libs
- docs/src/native-libraries.md: Update all examples to use build tags
- .forgejo/workflows/ci-native.yml: Use -tags native_libs in all test steps
- Remove deprecated FETCHML_NATIVE_LIBS=1/0 env var references
- Replace worker.DirOverallSHA256HexParallel with worker.DirOverallSHA256Hex
- Fixes in dataset_hash_bench_test.go and hash_bench_test.go
- All benchmarks pass with native_libs build tag
- Remove duplicate hash_selector.go (build tags handle switching)
- Fix benchmark to use worker.DirOverallSHA256Hex
- Fix snapshot_store.go to use integrity.DirOverallSHA256Hex directly
- Native tests pass; benchmarks now correctly compare native vs Go
Go Worker (internal/worker/native_bridge_libs.go):
- Add global hashCtx with sync.Once for lazy initialization
- Eliminates 5-20ms fh_init/fh_cleanup per hash operation
- Uses runtime.NumCPU() for optimal thread count
- Log initialization time for observability
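The lazy-initialization pattern described for the Go worker can be sketched roughly like this. The `hashContext` struct stands in for the native handle and `getHashCtx` is a hypothetical accessor name; the real code would call into the native library instead of building a struct.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// hashContext stands in for the native hashing handle (hypothetical shape).
type hashContext struct {
	threads int
}

var (
	hashCtx     *hashContext
	hashCtxOnce sync.Once
	initCount   int // how many times the expensive init actually ran
)

// getHashCtx initializes the shared context on first use and returns the
// same pointer on every later call, so the setup cost is paid only once.
func getHashCtx() *hashContext {
	hashCtxOnce.Do(func() {
		start := time.Now()
		hashCtx = &hashContext{threads: runtime.NumCPU()}
		initCount++
		fmt.Printf("hash context initialized in %s (%d threads)\n",
			time.Since(start), hashCtx.threads)
	})
	return hashCtx
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = getHashCtx() // concurrent callers share one initialization
		}()
	}
	wg.Wait()
	fmt.Println("init ran", initCount, "time(s)")
}
```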
Zig CLI (cli/src/native/hash.zig):
- Add global_ctx with atomic flag and mutex
- Thread-safe initialization with double-check pattern
- Idempotent init() callable from multiple threads
- Log init time for debugging
- Remove separate 'hash' subcommand
- Integrate native SHA256 hash into 'dataset verify'
- Hash is now computed automatically when verifying datasets
- Shows hash in output (JSON, CSV, and text formats)
- Help text updated to indicate auto-hashing
Replace FETCHML_NATIVE_LIBS=1 environment variable with -tags native_libs
Changes:
- internal/queue/native_queue.go: UseNativeQueue is now const true
- internal/queue/native_queue_stub.go: UseNativeQueue is now const false
- build/docker/simple.Dockerfile: Add -tags native_libs to go build
- deployments/docker-compose.dev.yml: Remove FETCHML_NATIVE_LIBS env var
- native/README.md: Update documentation for build tags
- scripts/test-native-with-redis.sh: New test script with Redis via docker-compose
Benefits:
- Compile-time enforcement (no runtime checks needed)
- Cleaner deployment (no env var management)
- Type safety (const vs var)
- Simpler testing with docker-compose Redis integration
Add comprehensive explanation of the reproducibility problem and fix:
- Document readdir filesystem-dependent ordering issue
- Explain std::sort fix for lexicographic ordering
- Clarify recursive traversal with cycle detection
- Document hidden file and special file exclusions
- Warn researchers about silent omissions and empty hash edge cases
This addresses the core concern: researchers need to understand that
the hash is computed over sorted paths in order to trust cross-machine verification.
- Renamed note.zig to annotate.zig (preserves user's preferred naming)
- Updated all references from 'ml note' to 'ml annotate'
- Re-added experiment.zig with create/list/show subcommands
- Updated main.zig dispatch: 'a' for annotate, 'e' for experiment
- Updated printUsage and test block to reflect changes
- store/store.go: New SQLite storage for TUI local mode
  - Open() with WAL mode and NORMAL synchronous
  - Schema initialization for ml_experiments, ml_runs, ml_metrics, ml_params, ml_tags
  - GetUnsyncedRuns(), GetRunsByExperiment(), MarkRunSynced()
  - GetRunMetrics(), GetRunParams() for run details
- config/config.go: Add local mode configuration fields
  - DBPath, ForceLocal, ProjectRoot fields
  - Experiment struct with Name and Entrypoint
  - IsLocalMode() and GetDBPath() helper methods
- go.mod: Add modernc.org/sqlite v1.36.0 dependency
- queue.zig: Add --rerun <run_id> flag to re-queue completed local runs
  - Requires server connection, rejects in offline mode with clear error
  - HandleRerun function sends rerun request via WebSocket
- sync.zig: Rewrite for WebSocket experiment sync protocol
  - Queries unsynced runs from SQLite ml_runs table
  - Builds sync JSON with metrics and params
  - Sends sync_run message, waits for sync_ack response
  - MarkRunSynced updates synced flag in database
- watch.zig: Add --sync flag for continuous experiment sync
  - Auto-sync runs to server every 30 seconds when online
  - Mode detection with offline error handling
- note.zig: New unified metadata annotation command
  - Supports --text, --hypothesis, --outcome, --confidence, --privacy, --author
  - Stores metadata as tags in SQLite ml_tags table
- log.zig: Simplified to unified logs command (fetch/stream only)
  - Removed metric/param/tag subcommands (now in run wrapper)
  - Supports --follow for live log streaming from server
- cancel.zig: Add local process termination support
  - Sends SIGTERM first, waits 5s, then SIGKILL if needed
  - Updates run status to CANCELLED in SQLite
  - Also supports server job cancellation via WebSocket
- Fork child process and capture stdout/stderr via pipe
- Parse FETCHML_METRIC key=value [step=N] lines from output
- Write run_manifest.json with run metadata
- Insert/update ml_runs table in SQLite with PID tracking
- Stream output to output.log file
- Support entrypoint from config or explicit command after --
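Parsing the metric lines could look something like the Go sketch below. The exact grammar is an assumption: it treats the value as a float, accepts at most one optional `step=N` field, and uses -1 to mean "no step given". The `metric` type and `parseMetricLine` are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// metric is one parsed "FETCHML_METRIC key=value [step=N]" line.
type metric struct {
	Key   string
	Value float64
	Step  int // -1 when the line carries no step
}

// parseMetricLine returns ok=false for any line that is not a well-formed
// metric line, so ordinary program output passes through untouched.
func parseMetricLine(line string) (m metric, ok bool) {
	fields := strings.Fields(line)
	if len(fields) < 2 || len(fields) > 3 || fields[0] != "FETCHML_METRIC" {
		return metric{}, false
	}
	key, raw, found := strings.Cut(fields[1], "=")
	if !found || key == "" {
		return metric{}, false
	}
	value, err := strconv.ParseFloat(raw, 64)
	if err != nil {
		return metric{}, false
	}
	m = metric{Key: key, Value: value, Step: -1}
	if len(fields) == 3 {
		name, n, found := strings.Cut(fields[2], "=")
		if !found || name != "step" {
			return metric{}, false
		}
		step, err := strconv.Atoi(n)
		if err != nil || step < 0 {
			return metric{}, false
		}
		m.Step = step
	}
	return m, true
}

func main() {
	for _, line := range []string{
		"epoch 3 finished",
		"FETCHML_METRIC loss=0.25 step=3",
		"FETCHML_METRIC accuracy=0.9",
	} {
		if m, ok := parseMetricLine(line); ok {
			fmt.Printf("metric %s=%g step=%d\n", m.Key, m.Value, m.Step)
		}
	}
}
```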
- Update experiment.zig with unified commands (local + server modes)
- Add init.zig for local project initialization
- Update sync.zig for project synchronization
- Update main.zig to route new local mode commands (experiment, run, log)
- Support automatic mode detection from config (sqlite:// vs wss://)
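Scheme-based mode detection is simple enough to sketch in a few lines of Go. Only the `sqlite://` and `wss://` prefixes come from the change itself; the `mode` type, constant names, and `detectMode` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

type mode int

const (
	modeLocal  mode = iota // sqlite:// URI, runs tracked in a local database
	modeServer             // wss:// URI, runs tracked on the server
)

// detectMode picks the operating mode from the configured URI's scheme.
func detectMode(uri string) (mode, error) {
	switch {
	case strings.HasPrefix(uri, "sqlite://"):
		return modeLocal, nil
	case strings.HasPrefix(uri, "wss://"):
		return modeServer, nil
	default:
		return 0, fmt.Errorf("unsupported URI %q: expected sqlite:// or wss://", uri)
	}
}

func main() {
	for _, uri := range []string{"sqlite://./fetchml.db", "wss://example.invalid/ws", "http://nope"} {
		m, err := detectMode(uri)
		fmt.Println(uri, m, err)
	}
}
```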
- Add SQLite amalgamation fetch script (make build-sqlite)
- Embed SQLite in release builds, link system lib in dev
- Create sqlite_embedded.zig utility module
- Unify experiment/run/log commands with auto mode detection
- Add Forgejo CI workflow for building with embedded SQLite
- Update READMEs for local mode and build instructions
SQLite follows rsync embedding pattern: assets/sqlite_release_<os>_<arch>/
Zero external dependencies for release builds.
- Fix duplicate-check lint warning in security_test.go
- Mark SHA256 tests as Legacy for backward compatibility
- Convert TODO comments to documentation (task, handlers, privacy)
- Update user_manager_test to use GenerateAPIKey pattern