Jeremie Fraeys
6580917ba8
refactor: extract domain types and consolidate error system (Phases 1-2)
Phase 1: Extract Domain Types
=============================
- Create internal/domain/ package with canonical types:
- domain/task.go: Task, Attempt structs
- domain/tracking.go: TrackingConfig and MLflow/TensorBoard/Wandb configs
- domain/dataset.go: DatasetSpec
- domain/status.go: JobStatus constants
- domain/errors.go: FailureClass system with classification functions
- domain/doc.go: package documentation
- Update queue/task.go to re-export domain types (backward compatibility)
- Update TUI model/state.go to use domain types via type aliases
- Simplify TUI services: remove ~60 lines of conversion functions
Phase 2: Delete ErrorCategory System
====================================
- Remove deprecated ErrorCategory type and constants
- Remove TaskError struct and related functions
- Remove mapping functions: ClassifyError, IsRetryable, GetUserMessage, RetryDelay
- Update all queue implementations to use domain.FailureClass directly:
- queue/metrics.go: RecordTaskFailure/Retry now take FailureClass
- queue/queue.go: RetryTask uses domain.ClassifyFailure
- queue/filesystem_queue.go: RetryTask and MoveToDeadLetterQueue updated
- queue/sqlite_queue.go: RetryTask and MoveToDeadLetterQueue updated
Lines eliminated: ~190 lines of conversion and mapping code
Result: Single source of truth for domain types and error classification