Jeremie Fraeys
6866ba9366
refactor(queue): integrate scheduler backend and storage improvements
Update queue and storage systems for scheduler integration:
- Queue backend with scheduler coordination
- Filesystem queue with batch operations
- Deduplication with tenant-aware keys
- Storage layer with audit logging hooks
- Domain models (Task, Events, Errors) with scheduler fields
- Database layer with tenant isolation
- Dataset storage with integrity checks
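The commit does not show how tenant-aware deduplication keys are built. Below is a minimal sketch. It assumes a key made of the tenant ID plus a SHA-256 hash of the task payload, checked against an in-memory seen-set. `dedupKey`, `deduper`, and `Seen` are hypothetical names, not the project's API.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// dedupKey builds a deduplication key scoped to a tenant.
// Hypothetical format: the real key layout is not described in the commit.
func dedupKey(tenantID string, payload []byte) string {
	sum := sha256.Sum256(payload)
	return tenantID + ":" + hex.EncodeToString(sum[:])
}

// deduper records keys it has already seen (in memory only; a real
// backend would persist them).
type deduper struct {
	mu   sync.Mutex
	seen map[string]struct{}
}

func newDeduper() *deduper {
	return &deduper{seen: make(map[string]struct{})}
}

// Seen reports whether key was already recorded, and records it if not.
func (d *deduper) Seen(key string) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	if _, ok := d.seen[key]; ok {
		return true
	}
	d.seen[key] = struct{}{}
	return false
}

func main() {
	d := newDeduper()
	job := []byte(`{"cmd":"train.py"}`)
	fmt.Println(d.Seen(dedupKey("tenant-a", job))) // false: first submission
	fmt.Println(d.Seen(dedupKey("tenant-a", job))) // true: duplicate for tenant-a
	fmt.Println(d.Seen(dedupKey("tenant-b", job))) // false: different tenant
}
```

Scoping the key by tenant treats identical payloads from different tenants as distinct jobs. A resubmission from the same tenant is still caught.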
2026-02-26 12:06:46 -05:00
Jeremie Fraeys
d9ed8f4ffa
refactor: adopt PathRegistry in queue filesystem_queue.go
Update internal/queue/filesystem_queue.go to use centralized PathRegistry:
Changes:
- Add import for internal/config package
- Update NewFilesystemQueue to use config.FromEnv() for directory creation
- Replace os.MkdirAll with paths.EnsureDir() for all queue directories:
  - pending/entries
  - running
  - finished
  - failed
Benefits:
- Consistent directory creation via PathRegistry
- Centralized path management for queue storage
- Better error handling for directory creation
2026-02-18 16:57:45 -05:00
Jeremie Fraeys
6580917ba8
refactor: extract domain types and consolidate error system (Phases 1-2)
Phase 1: Extract Domain Types
=============================
- Create internal/domain/ package with canonical types:
  - domain/task.go: Task, Attempt structs
  - domain/tracking.go: TrackingConfig and MLflow/TensorBoard/Wandb configs
  - domain/dataset.go: DatasetSpec
  - domain/status.go: JobStatus constants
  - domain/errors.go: FailureClass system with classification functions
  - domain/doc.go: package documentation
- Update queue/task.go to re-export domain types (backward compatibility)
- Update TUI model/state.go to use domain types via type aliases
- Simplify TUI services: remove ~60 lines of conversion functions
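Re-exporting domain types through Go type aliases lets callers keep using the old names without conversion code. The self-contained sketch below collapses both declarations into one file and uses hypothetical field names. `DomainTask` stands in for `domain.Task`, and `QueueTask` stands in for the re-export in queue/task.go.

```go
package main

import "fmt"

// DomainTask stands in for the canonical domain.Task (fields hypothetical).
type DomainTask struct {
	ID     string
	Status string
}

// QueueTask is an alias (note the `=`): it names the same type as
// DomainTask, mirroring a re-export such as `type Task = domain.Task`.
type QueueTask = DomainTask

// DefinedTask is a defined type with the same underlying struct; unlike an
// alias it is a distinct type and needs explicit conversion.
type DefinedTask DomainTask

func process(t DomainTask) string { return t.ID + ":" + t.Status }

func main() {
	q := QueueTask{ID: "job-1", Status: "queued"}
	fmt.Println(process(q)) // no conversion: QueueTask is DomainTask

	d := DefinedTask{ID: "job-2", Status: "running"}
	fmt.Println(process(DomainTask(d))) // process(d) would not compile
}
```

An alias (`type T = U`) names the same type, so values pass freely between the old and new names. A defined type (`type T U`) requires explicit conversions, which is the kind of glue code aliases make unnecessary.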
Phase 2: Delete ErrorCategory System
====================================
- Remove deprecated ErrorCategory type and constants
- Remove TaskError struct and related functions
- Remove mapping functions: ClassifyError, IsRetryable, GetUserMessage, RetryDelay
- Update all queue implementations to use domain.FailureClass directly:
  - queue/metrics.go: RecordTaskFailure/Retry now take FailureClass
  - queue/queue.go: RetryTask uses domain.ClassifyFailure
  - queue/filesystem_queue.go: RetryTask and MoveToDeadLetterQueue updated
  - queue/sqlite_queue.go: RetryTask and MoveToDeadLetterQueue updated
Lines eliminated: ~190 lines of conversion and mapping code
Result: Single source of truth for domain types and error classification
2026-02-17 12:34:28 -05:00
Jeremie Fraeys
2e701340e5
feat(core): API, worker, queue, and manifest improvements
- Add protocol buffer optimizations (internal/api/protocol.go)
- Add filesystem queue backend (internal/queue/filesystem_queue.go)
- Add run manifest support (internal/manifest/run_manifest.go)
- Worker and jupyter task refinements
- Add exported test wrappers for benchmarking
2026-02-12 12:05:17 -05:00