fetch_ml/internal
Jeremie Fraeys 3187ff26ea
refactor: complete maintainability phases 1-9 and fix all tests
Test fixes (all 41 test packages now pass):
- Fix ComputeTaskProvenance - add dataset_specs JSON output
- Fix EnforceTaskProvenance - populate all metadata fields in best-effort mode
- Fix PrewarmNextOnce - preserve prewarm state when queue empty
- Fix RunManifest directory creation in SetupJobDirectories
- Add ManifestWriter to test worker (simpleManifestWriter)
- Fix worker ID mismatch (use cfg.WorkerID)
- Fix WebSocket binary protocol responses
- Implement all WebSocket handlers: QueueJob, QueueJobWithSnapshot, StatusRequest,
  CancelJob, Prune, ValidateRequest (with run manifest validation), LogMetric,
  GetExperiment, DatasetList/Register/Info/Search

Maintainability phases completed:
- Phases 1-6: Domain types, error system, config boundaries, worker/API/queue splits
- Phase 7: TUI cleanup - reorganize model package (jobs.go, messages.go, styles.go, keys.go)
- Phase 8: MLServer unification - consolidate worker + TUI into internal/network/mlserver.go
- Phase 9: CI enforcement - add scripts/ci-checks.sh with 5 checks:
  * No internal/ -> cmd/ imports
  * domain/ has zero internal imports
  * File size limit (500 lines, rigid)
  * No circular imports
  * Package naming conventions

Documentation:
- Add docs/src/file-naming-conventions.md
- Add make ci-checks target

Lines changed: +756/-36 (WebSocket fixes), +518/-320 (TUI), +263/-20 (Phase 8-9)
2026-02-17 20:32:14 -05:00
..
api refactor: Phase 7 - TUI cleanup - reorganize model package 2026-02-17 20:22:04 -05:00
audit feat(tracking): add pluggable tracking backends and audit support 2026-01-05 12:33:57 -05:00
auth feat(api): refactor websocket handlers; add health and prometheus middleware 2026-01-05 12:31:07 -05:00
config refactor: Phase 3 - fix config/storage boundaries 2026-02-17 12:49:53 -05:00
container feat(jupyter): improve runtime management and update security/workflow docs 2026-01-05 12:37:27 -05:00
controller Fix multi-user authentication and clean up debug code 2025-12-06 12:35:32 -05:00
domain refactor: extract domain types and consolidate error system (Phases 1-2) 2026-02-17 12:34:28 -05:00
envpool feat(worker): add integrity checks, snapshot staging, and prewarm support 2026-01-05 12:31:13 -05:00
errtypes Fix multi-user authentication and clean up debug code 2025-12-06 12:35:32 -05:00
experiment refactor: Export SelectDependencyManifest for API helpers 2026-02-17 16:45:59 -05:00
fileutil Fix multi-user authentication and clean up debug code 2025-12-06 12:35:32 -05:00
jupyter feat(core): API, worker, queue, and manifest improvements 2026-02-12 12:05:17 -05:00
logging feat(jupyter): improve runtime management and update security/workflow docs 2026-01-05 12:37:27 -05:00
manifest feat(core): API, worker, queue, and manifest improvements 2026-02-12 12:05:17 -05:00
metrics refactor: Phase 6 - Complete migration, remove legacy files 2026-02-17 14:39:48 -05:00
middleware feat(api): refactor websocket handlers; add health and prometheus middleware 2026-01-05 12:31:07 -05:00
network refactor: Phase 7 - TUI cleanup - reorganize model package 2026-02-17 20:22:04 -05:00
prommetrics feat(api): refactor websocket handlers; add health and prometheus middleware 2026-01-05 12:31:07 -05:00
queue refactor: Phase 6 - Queue Restructure 2026-02-17 13:41:06 -05:00
resources feat(worker): add integrity checks, snapshot staging, and prewarm support 2026-01-05 12:31:13 -05:00
storage refactor: Phase 3 - fix config/storage boundaries 2026-02-17 12:49:53 -05:00
telemetry Fix multi-user authentication and clean up debug code 2025-12-06 12:35:32 -05:00
tracking feat(tracking): add pluggable tracking backends and audit support 2026-01-05 12:33:57 -05:00
worker refactor: complete maintainability phases 1-9 and fix all tests 2026-02-17 20:32:14 -05:00