## Problem
TestEndToEndJobLifecycle was failing with two issues:
1. Race condition: Workers signaled ready before job was processed, receiving
MsgNoWork instead of MsgJobAssign
2. getTask() didn't check pendingAcceptance - assigned-but-not-yet-accepted
tasks returned nil
## Changes
### Test Fix (restart_recovery_test.go)
- Replace single-shot select with retry loop that re-signals workers as ready
- Handle both assignment and non-assignment messages correctly
- Add 10ms delay between non-assignment messages to allow job processing
- Use 2-second deadline with 100ms timeout intervals
### Scheduler Fix (hub.go)
- Extend getTask() to check pendingAcceptance map after batch/service queues
- Allows GetTask() to find tasks in 'assigned' state before acceptance
- Maintains backward compatibility with existing queue/running lookups
## Testing
make test now passes: 475 passed, 0 failed, 34 skipped
Remove three unused methods/parameter identified by static analysis:
- canRequeue(): never integrated into scheduling flow
- runMetricsClient clientID param: accepted but never used
- getUsageLocked(): callers inline the logic
Fixes IDE warnings about unused code per AGENTS.md cleanup discipline.
- Add plugin_quota.go with GPU quota management for scheduler
- Update scheduler hub and protocol for plugin support
- Add comprehensive plugin quota unit tests
- Update gang service and WebSocket queue integration tests
Add new scheduler component for distributed ML workload orchestration:
- Hub-based coordination for multi-worker clusters
- Pacing controller for rate limiting job submissions
- Priority queue with preemption support
- Port allocator for dynamic service discovery
- Protocol handlers for worker-scheduler communication
- Service manager with OS-specific implementations
- Connection management and state persistence
- Template system for service deployment
Includes comprehensive test suite:
- Unit tests for all core components
- Integration tests for distributed scenarios
- Benchmark tests for performance validation
- Mock fixtures for isolated testing
Refs: scheduler-architecture.md