fetch_ml/internal
Jeremie Fraeys 57787e1e7b
feat(scheduler): implement capability-based routing and hub v2
Add comprehensive capability routing system to scheduler hub:
- Capability-aware worker matching with requirement/offer negotiation
- Hub v2 protocol with structured message types and heartbeat management
- Worker capability advertisement and dynamic routing decisions
- Orphan recovery for disconnected workers with state reconciliation
- Template-based job scheduling with capability constraints

Add extensive test coverage:
- Unit tests for capability routing logic and heartbeat mechanics
- Unit tests for orphan recovery scenarios
- E2E tests for capability routing across multiple workers
- Hub capabilities integration tests
- Scheduler fixture helpers for test setup

Protocol improvements:
- Define structured protocol messages for hub-worker communication
- Add capability matching algorithm with scoring
- Implement graceful worker disconnection handling
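As an illustration of capability matching with scoring, the sketch below filters workers on hard requirements and then ranks the remaining ones by how many preferred capabilities they offer. The `Requirement` and `Worker` types, the `Score` and `Pick` functions, and the scoring rule itself are assumptions made for this example; the algorithm actually used in the scheduler may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// Requirement is a hypothetical job-side constraint set.
type Requirement struct {
	Required  []string // capabilities a worker must offer
	Preferred []string // capabilities that raise the score when offered
}

// Worker is a hypothetical worker record with its advertised capabilities.
type Worker struct {
	ID    string
	Offer map[string]bool
}

// Score reports whether w satisfies every required capability and, if so,
// how many preferred capabilities it also offers.
func Score(req Requirement, w Worker) (int, bool) {
	for _, c := range req.Required {
		if !w.Offer[c] {
			return 0, false
		}
	}
	score := 0
	for _, c := range req.Preferred {
		if w.Offer[c] {
			score++
		}
	}
	return score, true
}

// Pick returns the ID of the highest-scoring eligible worker.
// Ties are broken by worker ID so the result is deterministic.
func Pick(req Requirement, workers []Worker) (string, bool) {
	type candidate struct {
		id    string
		score int
	}
	var cands []candidate
	for _, w := range workers {
		if s, ok := Score(req, w); ok {
			cands = append(cands, candidate{id: w.ID, score: s})
		}
	}
	if len(cands) == 0 {
		return "", false
	}
	sort.Slice(cands, func(i, j int) bool {
		if cands[i].score != cands[j].score {
			return cands[i].score > cands[j].score
		}
		return cands[i].id < cands[j].id
	})
	return cands[0].id, true
}

func main() {
	workers := []Worker{
		{ID: "cpu-1", Offer: map[string]bool{"cpu": true}},
		{ID: "gpu-1", Offer: map[string]bool{"cpu": true, "gpu": true}},
		{ID: "gpu-2", Offer: map[string]bool{"cpu": true, "gpu": true, "nvme": true}},
	}
	req := Requirement{Required: []string{"gpu"}, Preferred: []string{"nvme"}}
	id, ok := Pick(req, workers)
	fmt.Println(id, ok)
}
```

Separating the hard filter from the soft score means a job is never routed to a worker that lacks a required capability, however well that worker scores on preferences.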
2026-03-12 12:00:05 -04:00
| Name | Last commit | Date |
|------|-------------|------|
| api | feat(scheduler): implement capability-based routing and hub v2 | 2026-03-12 12:00:05 -04:00 |
| audit | feat(audit): add HTTP audit middleware and tamper-evident logging | 2026-03-08 13:03:02 -04:00 |
| auth | fix: add CGO build tags to consistency tests, remove unused isHex function | 2026-03-08 13:10:00 -04:00 |
| config | feat(domain): add task visibility and supporting infrastructure | 2026-03-08 13:03:27 -04:00 |
| container | feat(domain): add task visibility and supporting infrastructure | 2026-03-08 13:03:27 -04:00 |
| crypto | feat(domain): add task visibility and supporting infrastructure | 2026-03-08 13:03:27 -04:00 |
| domain | feat(domain): add task visibility and supporting infrastructure | 2026-03-08 13:03:27 -04:00 |
| envpool | refactor(utilities): update supporting modules for scheduler integration | 2026-02-26 12:07:15 -05:00 |
| errtypes | refactor(utilities): update supporting modules for scheduler integration | 2026-02-26 12:07:15 -05:00 |
| experiment | feat(domain): add task visibility and supporting infrastructure | 2026-03-08 13:03:27 -04:00 |
| fileutil | feat(domain): add task visibility and supporting infrastructure | 2026-03-08 13:03:27 -04:00 |
| jupyter | feat(domain): add task visibility and supporting infrastructure | 2026-03-08 13:03:27 -04:00 |
| logging | refactor(utilities): update supporting modules for scheduler integration | 2026-02-26 12:07:15 -05:00 |
| manifest | feat: enhance task domain and scheduler protocol | 2026-03-04 13:23:38 -05:00 |
| metrics | refactor: Phase 6 - Complete migration, remove legacy files | 2026-02-17 14:39:48 -05:00 |
| middleware | test: update test suite and remove deprecated privacy middleware | 2026-03-08 13:03:55 -04:00 |
| network | feat(domain): add task visibility and supporting infrastructure | 2026-03-08 13:03:27 -04:00 |
| privacy | refactor(utilities): update supporting modules for scheduler integration | 2026-02-26 12:07:15 -05:00 |
| prommetrics | feat(api): refactor websocket handlers; add health and prometheus middleware | 2026-01-05 12:31:07 -05:00 |
| queue | feat(domain): add task visibility and supporting infrastructure | 2026-03-08 13:03:27 -04:00 |
| resources | refactor(utilities): update supporting modules for scheduler integration | 2026-02-26 12:07:15 -05:00 |
| scheduler | feat(scheduler): implement capability-based routing and hub v2 | 2026-03-12 12:00:05 -04:00 |
| security | refactor(utilities): update supporting modules for scheduler integration | 2026-02-26 12:07:15 -05:00 |
| storage | feat(storage): add groups, tasks, tokens, and audit database schemas | 2026-03-08 12:48:42 -04:00 |
| telemetry | Fix multi-user authentication and clean up debug code | 2025-12-06 12:35:32 -05:00 |
| tracking | refactor(utilities): update supporting modules for scheduler integration | 2026-02-26 12:07:15 -05:00 |
| validation | feat: add security monitoring and validation framework | 2026-02-19 15:34:25 -05:00 |
| worker | refactor(scheduler,worker): improve service management and GPU detection | 2026-03-08 13:03:15 -04:00 |