fetch_ml/internal
Jeremie Fraeys 6580917ba8
refactor: extract domain types and consolidate error system (Phases 1-2)
Phase 1: Extract Domain Types
=============================
- Create internal/domain/ package with canonical types:
  - domain/task.go: Task, Attempt structs
  - domain/tracking.go: TrackingConfig and MLflow/TensorBoard/Wandb configs
  - domain/dataset.go: DatasetSpec
  - domain/status.go: JobStatus constants
  - domain/errors.go: FailureClass system with classification functions
  - domain/doc.go: package documentation

- Update queue/task.go to re-export domain types (backward compatibility)
- Update TUI model/state.go to use domain types via type aliases
- Simplify TUI services: remove ~60 lines of conversion functions
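Phase 1 moves the canonical types into internal/domain/ and keeps queue/task.go and the TUI compiling through re-exports and type aliases. The single-file Go sketch below illustrates that pattern. The field names, the status values, and the QueueTask/QueueJobStatus alias names are assumptions. In the real tree the aliases would point across packages (for example `type Task = domain.Task`) rather than at local types.

```go
package main

import "fmt"

// JobStatus is a hypothetical stand-in for the constants in domain/status.go;
// the string values are assumptions for illustration.
type JobStatus string

const (
	StatusQueued    JobStatus = "queued"
	StatusRunning   JobStatus = "running"
	StatusCompleted JobStatus = "completed"
	StatusFailed    JobStatus = "failed"
)

// Attempt and Task are hypothetical stand-ins for domain.Attempt and
// domain.Task; the fields are assumptions, not the repository's definitions.
type Attempt struct {
	Number int
	Error  string
}

type Task struct {
	ID       string
	Status   JobStatus
	Attempts []Attempt
}

// In the real layout these aliases would live in queue/task.go (or the TUI
// model) and point at the domain package, e.g. `type Task = domain.Task`.
// They alias local types here only so the file compiles on its own.
type QueueTask = Task
type QueueJobStatus = JobStatus

// describe accepts the canonical type; because QueueTask is an alias rather
// than a new named type, callers holding a QueueTask need no conversion.
func describe(t Task) string {
	return fmt.Sprintf("%s:%s (%d attempts)", t.ID, t.Status, len(t.Attempts))
}

func main() {
	qt := QueueTask{ID: "job-1", Status: StatusQueued}
	qt.Attempts = append(qt.Attempts, Attempt{Number: 1})

	var s QueueJobStatus = qt.Status
	fmt.Println(describe(qt), s == StatusQueued) // job-1:queued (1 attempts) true
}
```

An alias declaration (`=`) makes the old and new names the identical type, so existing imports keep compiling and conversion helpers become unnecessary. A defined type (`type Task domain.Task`) would instead still require explicit conversions at every boundary.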

Phase 2: Delete ErrorCategory System
====================================
- Remove deprecated ErrorCategory type and constants
- Remove TaskError struct and related functions
- Remove mapping functions: ClassifyError, IsRetryable, GetUserMessage, RetryDelay
- Update all queue implementations to use domain.FailureClass directly:
  - queue/metrics.go: RecordTaskFailure/Retry now take FailureClass
  - queue/queue.go: RetryTask uses domain.ClassifyFailure
  - queue/filesystem_queue.go: RetryTask and MoveToDeadLetterQueue updated
  - queue/sqlite_queue.go: RetryTask and MoveToDeadLetterQueue updated

Lines eliminated: ~190 lines of conversion and mapping code
Result: Single source of truth for domain types and error classification
2026-02-17 12:34:28 -05:00
Directory | Last commit | Date
api | refactor: improve API structure and WebSocket protocol | 2026-02-16 20:38:12 -05:00
audit | feat(tracking): add pluggable tracking backends and audit support | 2026-01-05 12:33:57 -05:00
auth | feat(api): refactor websocket handlers; add health and prometheus middleware | 2026-01-05 12:31:07 -05:00
config | feat(api): refactor websocket handlers; add health and prometheus middleware | 2026-01-05 12:31:07 -05:00
container | feat(jupyter): improve runtime management and update security/workflow docs | 2026-01-05 12:37:27 -05:00
controller | Fix multi-user authentication and clean up debug code | 2025-12-06 12:35:32 -05:00
domain | refactor: extract domain types and consolidate error system (Phases 1-2) | 2026-02-17 12:34:28 -05:00
envpool | feat(worker): add integrity checks, snapshot staging, and prewarm support | 2026-01-05 12:31:13 -05:00
errtypes | Fix multi-user authentication and clean up debug code | 2025-12-06 12:35:32 -05:00
experiment | feat: implement C++ native libraries for performance-critical operations | 2026-02-16 20:38:04 -05:00
fileutil | Fix multi-user authentication and clean up debug code | 2025-12-06 12:35:32 -05:00
jupyter | feat(core): API, worker, queue, and manifest improvements | 2026-02-12 12:05:17 -05:00
logging | feat(jupyter): improve runtime management and update security/workflow docs | 2026-01-05 12:37:27 -05:00
manifest | feat(core): API, worker, queue, and manifest improvements | 2026-02-12 12:05:17 -05:00
metrics | feat(api): refactor websocket handlers; add health and prometheus middleware | 2026-01-05 12:31:07 -05:00
middleware | feat(api): refactor websocket handlers; add health and prometheus middleware | 2026-01-05 12:31:07 -05:00
network | feat(jupyter): improve runtime management and update security/workflow docs | 2026-01-05 12:37:27 -05:00
prommetrics | feat(api): refactor websocket handlers; add health and prometheus middleware | 2026-01-05 12:31:07 -05:00
queue | refactor: extract domain types and consolidate error system (Phases 1-2) | 2026-02-17 12:34:28 -05:00
resources | feat(worker): add integrity checks, snapshot staging, and prewarm support | 2026-01-05 12:31:13 -05:00
storage | refactor(storage,queue): split storage layer and add sqlite queue backend | 2026-01-05 12:31:02 -05:00
telemetry | Fix multi-user authentication and clean up debug code | 2025-12-06 12:35:32 -05:00
tracking | feat(tracking): add pluggable tracking backends and audit support | 2026-01-05 12:33:57 -05:00
worker | feat: add native library bridge and queue integration | 2026-02-16 20:38:30 -05:00