Jeremie Fraeys
fc2459977c
refactor(worker): update worker tests and native bridge
**Worker Refactoring:**
- Update internal/worker/factory.go, worker.go, snapshot_store.go
- Update native_bridge.go and native_bridge_nocgo.go for native library integration
**Test Updates:**
- Update all worker unit tests for new interfaces
- Update chaos tests
- Update container/podman_test.go
- Add internal/workertest/worker.go for shared test utilities
**Documentation:**
- Update native/README.md
2026-02-23 18:04:22 -05:00
Jeremie Fraeys
158c525bef
fix: resolve benchmark and build tag conflicts
- Remove duplicate hash_selector.go (build tags handle switching)
- Fix benchmark to use worker.DirOverallSHA256Hex
- Fix snapshot_store.go to use integrity.DirOverallSHA256Hex directly
- Native tests pass; benchmarks now correctly compare the native and Go implementations
2026-02-21 14:26:48 -05:00
Jeremie Fraeys
33b893a71a
refactor: adopt PathRegistry in worker snapshot_store.go
Update internal/worker/snapshot_store.go to use the centralized PathRegistry.
Changes:
- Add import for internal/config package
- Update ResolveSnapshot to use config.FromEnv() for directory creation
- Replace os.MkdirAll with paths.EnsureDir() for tmpRoot
- Replace os.MkdirAll with paths.EnsureDir() for extractDir
- Replace os.MkdirAll with paths.EnsureDir() for cacheDir parent
Benefits:
- Consistent directory creation via PathRegistry
- Centralized path management for snapshot storage
- Better error handling for directory creation
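The os.MkdirAll -> EnsureDir swap described above can be sketched roughly as follows. This is a minimal illustration only: the PathRegistry type, its root field, the EXAMPLE_DATA_ROOT variable, and SnapshotTmpRoot are hypothetical stand-ins, not the actual internal/config API. Only the names FromEnv and EnsureDir come from the commit message, and their signatures here are assumed.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// PathRegistry is a hypothetical stand-in for the registry in internal/config.
type PathRegistry struct {
	root string
}

// FromEnv builds a registry from the environment. The variable name
// EXAMPLE_DATA_ROOT and the temp-dir fallback are assumptions for this sketch.
func FromEnv() *PathRegistry {
	root := os.Getenv("EXAMPLE_DATA_ROOT")
	if root == "" {
		root = filepath.Join(os.TempDir(), "example-data")
	}
	return &PathRegistry{root: root}
}

// SnapshotTmpRoot is a hypothetical helper returning a staging directory.
func (p *PathRegistry) SnapshotTmpRoot() string {
	return filepath.Join(p.root, "snapshots", ".tmp")
}

// EnsureDir creates dir and any missing parents, wrapping failures with the
// offending path so callers get a uniform error message.
func (p *PathRegistry) EnsureDir(dir string) error {
	if dir == "" {
		return fmt.Errorf("ensure dir: empty path")
	}
	if err := os.MkdirAll(dir, 0o750); err != nil {
		return fmt.Errorf("ensure dir %q: %w", dir, err)
	}
	return nil
}

func main() {
	paths := FromEnv()
	tmpRoot := paths.SnapshotTmpRoot()
	if err := paths.EnsureDir(tmpRoot); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("ready:", tmpRoot)
}
```

Routing every directory creation through one method is what gives the consistent permissions and error wrapping listed under Benefits; call sites no longer choose their own mode bits or error text.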
2026-02-18 16:56:27 -05:00
Jeremie Fraeys
38fa017b8e
refactor: Phase 6 - Complete migration, remove legacy files
BREAKING CHANGE: Legacy worker files removed, Worker struct simplified
Changes:
1. worker.go - Simplified to 8 fields using composed dependencies:
   - runLoop, runner, metrics, health (from new packages)
   - Removed: server, queue, running, datasetCache, ctx, cancel, etc.
2. factory.go - Updated NewWorker to use new structure
   - Uses lifecycle.NewRunLoop
   - Integrates jupyter.Manager properly
3. Removed legacy files:
   - execution.go (1,016 lines)
   - data_integrity.go (929 lines)
   - runloop.go (555 lines)
   - jupyter_task.go (144 lines)
   - simplified.go (demonstration no longer needed)
4. Fixed references to use new packages:
   - hash_selector.go -> integrity.DirOverallSHA256Hex
   - snapshot_store.go -> integrity.NormalizeSHA256ChecksumHex
   - metrics.go - Removed resource-dependent metrics temporarily
5. Added RecordQueueLatency to metrics.Metrics for lifecycle.MetricsRecorder
Worker struct: 27 fields -> 8 fields (70% reduction)
Build status: Compiles successfully
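The composition described above (a Worker that holds a run loop and a metrics object, with the run loop depending only on a small recorder interface) might look roughly like this. It is a sketch under assumptions: all types are declared locally as stand-ins for the lifecycle and metrics packages, the single-method shape of MetricsRecorder is assumed, TaskDequeued and QueueLatencyCount are hypothetical helpers, and only two of the eight fields are shown.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// MetricsRecorder plays the role of lifecycle.MetricsRecorder. The
// single-method shape is an assumption for this sketch.
type MetricsRecorder interface {
	RecordQueueLatency(d time.Duration)
}

// Metrics stands in for metrics.Metrics.
type Metrics struct {
	mu           sync.Mutex
	queueLatency []time.Duration
}

// RecordQueueLatency stores one queue-wait sample.
func (m *Metrics) RecordQueueLatency(d time.Duration) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.queueLatency = append(m.queueLatency, d)
}

// QueueLatencyCount is a hypothetical accessor used by the demo below.
func (m *Metrics) QueueLatencyCount() int {
	m.mu.Lock()
	defer m.mu.Unlock()
	return len(m.queueLatency)
}

// Compile-time check that *Metrics satisfies the recorder interface.
var _ MetricsRecorder = (*Metrics)(nil)

// RunLoop stands in for the type returned by lifecycle.NewRunLoop.
type RunLoop struct {
	recorder MetricsRecorder
}

// NewRunLoop wires the loop to whatever recorder the factory provides.
func NewRunLoop(rec MetricsRecorder) *RunLoop {
	return &RunLoop{recorder: rec}
}

// TaskDequeued is a hypothetical hook reporting how long a task waited.
func (l *RunLoop) TaskDequeued(waited time.Duration) {
	l.recorder.RecordQueueLatency(waited)
}

// Worker holds composed dependencies instead of owning their state directly.
type Worker struct {
	runLoop *RunLoop
	metrics *Metrics
}

// NewWorker mirrors the factory role: build the parts, then compose them.
func NewWorker() *Worker {
	m := &Metrics{}
	return &Worker{runLoop: NewRunLoop(m), metrics: m}
}

func main() {
	w := NewWorker()
	w.runLoop.TaskDequeued(250 * time.Millisecond)
	fmt.Println("recorded samples:", w.metrics.QueueLatencyCount())
}
```

Because the run loop only sees the interface, adding RecordQueueLatency to the metrics type is enough to plug it in, and tests can substitute a fake recorder without constructing a full Worker.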
2026-02-17 14:39:48 -05:00
Jeremie Fraeys
72b4b29ecd
perf: add profiling benchmarks and parallel Go baseline for C++ optimization
Add comprehensive benchmarking suite for C++ optimization targets:
- tests/benchmarks/dataset_hash_bench_test.go - dirOverallSHA256Hex profiling
- tests/benchmarks/queue_bench_test.go - filesystem queue profiling
- tests/benchmarks/artifact_and_snapshot_bench_test.go - scanArtifacts/extractTarGz profiling
- tests/unit/worker/artifacts_test.go - moved from internal/ for clean separation
Add parallel Go implementation as baseline for C++ comparison:
- internal/worker/data_integrity.go: dirOverallSHA256HexParallel() with worker pool
- Benchmarks show a 2.1x speedup over the sequential version (3.97ms -> 1.90ms)
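For illustration, a worker-pool directory hash in the spirit of dirOverallSHA256HexParallel() could be sketched as below. This is not the repository's implementation: the function names, the way per-file digests are folded into the overall digest (path, NUL byte, file digest, in sorted path order), and the whole-file read are all assumptions chosen to keep the example short.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
	"sort"
	"sync"
)

// listFiles returns the regular files under root as sorted, slash-separated
// relative paths, so the result does not depend on traversal order.
func listFiles(root string) ([]string, error) {
	var rels []string
	err := filepath.Walk(root, func(p string, info os.FileInfo, err error) error {
		if err != nil || !info.Mode().IsRegular() {
			return err // nil for directories and other non-regular entries
		}
		rel, err := filepath.Rel(root, p)
		if err != nil {
			return err
		}
		rels = append(rels, filepath.ToSlash(rel))
		return nil
	})
	sort.Strings(rels)
	return rels, err
}

// hashFile reads the whole file for brevity; a production version would
// more likely stream it through the hash.
func hashFile(path string) ([]byte, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	sum := sha256.Sum256(data)
	return sum[:], nil
}

// hashDir hashes files with a fixed pool of workers, then combines the
// per-file digests in sorted path order (combination scheme is assumed).
func hashDir(root string, workers int) (string, error) {
	rels, err := listFiles(root)
	if err != nil {
		return "", err
	}
	if workers < 1 {
		workers = 1
	}
	sums := make([][]byte, len(rels))
	errs := make([]error, len(rels))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is written by exactly one goroutine.
				sums[i], errs[i] = hashFile(filepath.Join(root, filepath.FromSlash(rels[i])))
			}
		}()
	}
	for i := range rels {
		jobs <- i
	}
	close(jobs)
	wg.Wait()

	overall := sha256.New()
	for i, rel := range rels {
		if errs[i] != nil {
			return "", errs[i]
		}
		overall.Write([]byte(rel))
		overall.Write([]byte{0})
		overall.Write(sums[i])
	}
	return hex.EncodeToString(overall.Sum(nil)), nil
}

func main() {
	one, err := hashDir(".", 1) // a single worker is effectively sequential
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	many, err := hashDir(".", 4)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("digests match:", one == many)
}
```

Hashing runs concurrently but the combination step is ordered, so the digest is the same for any worker count. That property is what makes a parallel version usable as a drop-in baseline against a sequential one.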
Exported wrappers for testing:
- ScanArtifacts() - artifact scanning
- ExtractTarGz() - tar.gz extraction
- DirOverallSHA256HexParallel() - parallel hashing
Profiling results (Apple M2 Ultra):
- dirOverallSHA256Hex: 78% syscall overhead (target for mmap C++)
- rebuildIndex: 96% syscall overhead (target for binary index C++)
- scanArtifacts: 87% syscall overhead (target for fast traversal C++)
- extractTarGz: 95% syscall overhead (target for parallel gzip C++)
Related: C++ optimization strategy in memory 5d5f0bb6
2026-02-12 12:04:02 -05:00
Jeremie Fraeys
82034c68f3
feat(worker): add integrity checks, snapshot staging, and prewarm support
2026-01-05 12:31:13 -05:00