fetch_ml/internal/scheduler
Jeremie Fraeys 57787e1e7b
feat(scheduler): implement capability-based routing and hub v2
Add a capability-based routing system to the scheduler hub:
- Capability-aware worker matching with requirement/offer negotiation
- Hub v2 protocol with structured message types and heartbeat management
- Worker capability advertisement and dynamic routing decisions
- Orphan recovery for disconnected workers with state reconciliation
- Template-based job scheduling with capability constraints

Add extensive test coverage:
- Unit tests for capability routing logic and heartbeat mechanics
- Unit tests for orphan recovery scenarios
- E2E tests for capability routing across multiple workers
- Hub capabilities integration tests
- Scheduler fixture helpers for test setup

Protocol improvements:
- Define structured protocol messages for hub-worker communication
- Add capability matching algorithm with scoring
- Implement graceful worker disconnection handling
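A scoring-based capability matcher could look roughly like the sketch below. The `Capabilities` struct and the `satisfies`, `score` and `pickWorker` functions are hypothetical names, and the best-fit scoring (with an arbitrary weight of 100 per spare GPU) is an assumed heuristic, not the algorithm in protocol.go or hub.go. Hard requirements filter workers out first; the remaining candidates are ranked by how little spare capacity they would leave, with ties broken by worker ID.

```go
package main

import "fmt"

// Capabilities is a hypothetical shape used for both job requirements and
// worker offers; the field names are illustrative only.
type Capabilities struct {
	GPUs     int
	MemoryGB int
	Labels   map[string]string
}

// satisfies reports whether an offer meets every hard requirement.
func satisfies(req, offer Capabilities) bool {
	if offer.GPUs < req.GPUs || offer.MemoryGB < req.MemoryGB {
		return false
	}
	for k, v := range req.Labels {
		if offer.Labels[k] != v { // reading a nil map is safe and yields ""
			return false
		}
	}
	return true
}

// score favours the tightest fit: less leftover capacity scores higher.
// The weight of 100 per spare GPU is an arbitrary assumption.
func score(req, offer Capabilities) int {
	slack := (offer.GPUs-req.GPUs)*100 + (offer.MemoryGB - req.MemoryGB)
	return -slack
}

// pickWorker returns the ID of the best eligible worker, or "" if none fits.
func pickWorker(req Capabilities, offers map[string]Capabilities) string {
	best, bestScore := "", 0
	for id, offer := range offers {
		if !satisfies(req, offer) {
			continue
		}
		s := score(req, offer)
		if best == "" || s > bestScore || (s == bestScore && id < best) {
			best, bestScore = id, s
		}
	}
	return best
}

func main() {
	offers := map[string]Capabilities{
		"big":   {GPUs: 8, MemoryGB: 512, Labels: map[string]string{"arch": "amd64"}},
		"small": {GPUs: 1, MemoryGB: 64, Labels: map[string]string{"arch": "amd64"}},
		"arm":   {GPUs: 1, MemoryGB: 64, Labels: map[string]string{"arch": "arm64"}},
	}
	req := Capabilities{GPUs: 1, MemoryGB: 32, Labels: map[string]string{"arch": "amd64"}}
	fmt.Println(pickWorker(req, offers)) // small
}
```

Best fit keeps large workers free for jobs that actually need them; a spread or load-balancing score would be an equally reasonable choice under different goals.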
2026-03-12 12:00:05 -04:00
auth.go                     refactor(scheduler,worker): improve service management and GPU detection    2026-03-08 13:03:15 -04:00
hub.go                      feat(scheduler): implement capability-based routing and hub v2              2026-03-12 12:00:05 -04:00
hub_capabilities_test.go    feat(scheduler): implement capability-based routing and hub v2              2026-03-12 12:00:05 -04:00
pacing.go                   feat(scheduler): implement multi-tenant job scheduler with gang scheduling  2026-02-26 12:03:23 -05:00
plugin_quota.go             refactor(scheduler): remove dead code                                       2026-03-04 13:35:18 -05:00
port_allocator.go           refactor(scheduler,worker): improve service management and GPU detection    2026-03-08 13:03:15 -04:00
priority_queue.go           feat: enhance task domain and scheduler protocol                            2026-03-04 13:23:38 -05:00
protocol.go                 feat(scheduler): implement capability-based routing and hub v2              2026-03-12 12:00:05 -04:00
scheduler_conn.go           refactor(scheduler,worker): improve service management and GPU detection    2026-03-08 13:03:15 -04:00
service_manager.go          refactor(scheduler,worker): improve service management and GPU detection    2026-03-08 13:03:15 -04:00
service_manager_unix.go     feat(scheduler): implement multi-tenant job scheduler with gang scheduling  2026-02-26 12:03:23 -05:00
service_manager_windows.go  feat(scheduler): implement multi-tenant job scheduler with gang scheduling  2026-02-26 12:03:23 -05:00
service_templates.go        refactor(scheduler,worker): improve service management and GPU detection    2026-03-08 13:03:15 -04:00
state.go                    refactor(scheduler,worker): improve service management and GPU detection    2026-03-08 13:03:15 -04:00
template.go                 feat(scheduler): implement capability-based routing and hub v2              2026-03-12 12:00:05 -04:00