fetch_ml/internal
Jeremie Fraeys 3248279c01
refactor: Phase 3 - Extract data integrity layer
Created integrity package with extracted data utilities:

1. internal/worker/integrity/hash.go (113 lines)
   - FileSHA256Hex() - SHA256 hash of a single file
   - NormalizeSHA256ChecksumHex() - Checksum normalization
   - DirOverallSHA256Hex() - Directory hash (sequential)
   - DirOverallSHA256HexParallel() - Directory hash (parallel workers)

2. internal/worker/integrity/validate.go (76 lines)
   - DatasetVerifier type for dataset validation
   - VerifyDatasetSpecs() method for checksum validation
   - ProvenanceCalculator type for provenance computation
   - ComputeProvenance() method for task provenance

Note: Named the package 'integrity' rather than 'data' because of a .gitignore conflict
(the data/ pattern that excludes experiment artifacts would also match internal/worker/data/)

Functions extracted from data_integrity.go:
- fileSHA256Hex → FileSHA256Hex
- normalizeSHA256ChecksumHex → NormalizeSHA256ChecksumHex
- dirOverallSHA256HexGo → DirOverallSHA256Hex
- dirOverallSHA256HexParallel → DirOverallSHA256HexParallel
- verifyDatasetSpecs logic → DatasetVerifier
- computeTaskProvenance logic → ProvenanceCalculator

Build status: Compiles successfully
2026-02-17 14:20:41 -05:00
| Directory | Last commit | Date |
|---|---|---|
| api | refactor: Migrate all test imports from api to api/ws package | 2026-02-17 13:52:20 -05:00 |
| audit | feat(tracking): add pluggable tracking backends and audit support | 2026-01-05 12:33:57 -05:00 |
| auth | feat(api): refactor websocket handlers; add health and prometheus middleware | 2026-01-05 12:31:07 -05:00 |
| config | refactor: Phase 3 - fix config/storage boundaries | 2026-02-17 12:49:53 -05:00 |
| container | feat(jupyter): improve runtime management and update security/workflow docs | 2026-01-05 12:37:27 -05:00 |
| controller | Fix multi-user authentication and clean up debug code | 2025-12-06 12:35:32 -05:00 |
| domain | refactor: extract domain types and consolidate error system (Phases 1-2) | 2026-02-17 12:34:28 -05:00 |
| envpool | feat(worker): add integrity checks, snapshot staging, and prewarm support | 2026-01-05 12:31:13 -05:00 |
| errtypes | Fix multi-user authentication and clean up debug code | 2025-12-06 12:35:32 -05:00 |
| experiment | feat: implement C++ native libraries for performance-critical operations | 2026-02-16 20:38:04 -05:00 |
| fileutil | Fix multi-user authentication and clean up debug code | 2025-12-06 12:35:32 -05:00 |
| jupyter | feat(core): API, worker, queue, and manifest improvements | 2026-02-12 12:05:17 -05:00 |
| logging | feat(jupyter): improve runtime management and update security/workflow docs | 2026-01-05 12:37:27 -05:00 |
| manifest | feat(core): API, worker, queue, and manifest improvements | 2026-02-12 12:05:17 -05:00 |
| metrics | feat(api): refactor websocket handlers; add health and prometheus middleware | 2026-01-05 12:31:07 -05:00 |
| middleware | feat(api): refactor websocket handlers; add health and prometheus middleware | 2026-01-05 12:31:07 -05:00 |
| network | feat(jupyter): improve runtime management and update security/workflow docs | 2026-01-05 12:37:27 -05:00 |
| prommetrics | feat(api): refactor websocket handlers; add health and prometheus middleware | 2026-01-05 12:31:07 -05:00 |
| queue | refactor: Phase 6 - Queue Restructure | 2026-02-17 13:41:06 -05:00 |
| resources | feat(worker): add integrity checks, snapshot staging, and prewarm support | 2026-01-05 12:31:13 -05:00 |
| storage | refactor: Phase 3 - fix config/storage boundaries | 2026-02-17 12:49:53 -05:00 |
| telemetry | Fix multi-user authentication and clean up debug code | 2025-12-06 12:35:32 -05:00 |
| tracking | feat(tracking): add pluggable tracking backends and audit support | 2026-01-05 12:33:57 -05:00 |
| worker | refactor: Phase 3 - Extract data integrity layer | 2026-02-17 14:20:41 -05:00 |