fetch_ml/internal/worker
Jeremie Fraeys 3248279c01
refactor: Phase 3 - Extract data integrity layer
Created integrity package with extracted data utilities:

1. internal/worker/integrity/hash.go (113 lines)
   - FileSHA256Hex() - SHA-256 hash of a single file
   - NormalizeSHA256ChecksumHex() - Checksum normalization
   - DirOverallSHA256Hex() - Directory hash (sequential)
   - DirOverallSHA256HexParallel() - Directory hash (parallel workers)

2. internal/worker/integrity/validate.go (76 lines)
   - DatasetVerifier type for dataset validation
   - VerifyDatasetSpecs() method for checksum validation
   - ProvenanceCalculator type for provenance computation
   - ComputeProvenance() method for task provenance

Note: The package is named 'integrity' rather than 'data' because
.gitignore excludes the data/ directory (reserved for experiment artifacts).

Functions extracted from data_integrity.go:
- fileSHA256Hex → FileSHA256Hex
- normalizeSHA256ChecksumHex → NormalizeSHA256ChecksumHex
- dirOverallSHA256HexGo → DirOverallSHA256Hex
- dirOverallSHA256HexParallel → DirOverallSHA256HexParallel
- verifyDatasetSpecs logic → DatasetVerifier
- computeTaskProvenance logic → ProvenanceCalculator

Build status: Compiles successfully
2026-02-17 14:20:41 -05:00
Name | Last commit | Date
execution | refactor: Phase 4 deferred - Extract GPU utilities and execution helpers | 2026-02-17 14:03:11 -05:00
executor | refactor: Phase 2 - Extract executor implementations | 2026-02-17 14:14:04 -05:00
integrity | refactor: Phase 3 - Extract data integrity layer | 2026-02-17 14:20:41 -05:00
interfaces | refactor: Phase 1 - Extract worker interfaces | 2026-02-17 14:10:03 -05:00
artifacts.go | perf: add profiling benchmarks and parallel Go baseline for C++ optimization | 2026-02-12 12:04:02 -05:00
config.go | refactor: Phase 4 - split worker package into focused files | 2026-02-17 12:57:02 -05:00
data_integrity.go | ci: push all workflow updates | 2026-02-12 13:28:15 -05:00
execution.go | refactor: Phase 4 deferred - Extract GPU utilities and execution helpers | 2026-02-17 14:03:11 -05:00
factory.go | refactor: Phase 4 - split worker package into focused files | 2026-02-17 12:57:02 -05:00
gpu.go | refactor: Phase 4 deferred - Extract GPU utilities and execution helpers | 2026-02-17 14:03:11 -05:00
gpu_detector.go | feat(worker): add integrity checks, snapshot staging, and prewarm support | 2026-01-05 12:31:13 -05:00
hash_selector.go | feat: add native library bridge and queue integration | 2026-02-16 20:38:30 -05:00
jupyter_task.go | feat(core): API, worker, queue, and manifest improvements | 2026-02-12 12:05:17 -05:00
metrics.go | refactor: Phase 4 - split worker package into focused files | 2026-02-17 12:57:02 -05:00
native_bridge.go | feat: implement C++ native libraries for performance-critical operations | 2026-02-16 20:38:04 -05:00
native_bridge_libs.go | feat: add native library bridge and queue integration | 2026-02-16 20:38:30 -05:00
native_bridge_nocgo.go | feat: add native library bridge and queue integration | 2026-02-16 20:38:30 -05:00
runloop.go | feat: add native library bridge and queue integration | 2026-02-16 20:38:30 -05:00
snapshot_store.go | perf: add profiling benchmarks and parallel Go baseline for C++ optimization | 2026-02-12 12:04:02 -05:00
worker.go | refactor: Phase 4 - split worker package into focused files | 2026-02-17 12:57:02 -05:00