fetch_ml/internal
Jeremie Fraeys 0b5e99f720
refactor(scheduler,worker): improve service management and GPU detection

Scheduler enhancements:
- auth.go: Group membership validation in authentication
- hub.go: Task distribution with group affinity
- port_allocator.go: Dynamic port allocation with conflict resolution
- scheduler_conn.go: Connection pooling and retry logic
- service_manager.go: Lifecycle management for scheduler services
- service_templates.go: Template-based service configuration
- state.go: Persistent state management with recovery

Worker improvements:
- config.go: Extended configuration for task visibility rules
- execution/setup.go: Sandboxed execution environment setup
- executor/container.go: Container runtime integration
- executor/runner.go: Task runner with visibility enforcement
- gpu_detector.go: Robust GPU detection (NVIDIA, AMD, Apple Silicon, CPU fallback)
- integrity/validate.go: Data integrity validation
- lifecycle/runloop.go: Improved runloop with graceful shutdown
- lifecycle/service_manager.go: Service lifecycle coordination
- process/isolation.go + isolation_unix.go: Process isolation with namespaces/cgroups
- tenant/manager.go: Multi-tenant resource isolation
- tenant/middleware.go: Tenant context propagation
- worker.go: Core worker with group-scoped task execution
2026-03-08 13:03:15 -04:00
api feat(audit): add HTTP audit middleware and tamper-evident logging 2026-03-08 13:03:02 -04:00
audit feat(audit): add HTTP audit middleware and tamper-evident logging 2026-03-08 13:03:02 -04:00
auth feat(auth): add token-based access and structured logging 2026-03-08 12:51:07 -04:00
config security: improve audit, crypto, and config handling 2026-03-04 13:23:42 -05:00
container refactor(jupyter): enhance security and scheduler integration 2026-02-26 12:06:35 -05:00
crypto security: improve audit, crypto, and config handling 2026-03-04 13:23:42 -05:00
domain refactor: misc improvements across codebase 2026-03-05 10:58:22 -05:00
envpool refactor(utilities): update supporting modules for scheduler integration 2026-02-26 12:07:15 -05:00
errtypes refactor(utilities): update supporting modules for scheduler integration 2026-02-26 12:07:15 -05:00
experiment refactor(jupyter): enhance security and scheduler integration 2026-02-26 12:06:35 -05:00
fileutil refactor(utilities): update supporting modules for scheduler integration 2026-02-26 12:07:15 -05:00
jupyter refactor(jupyter): enhance security and scheduler integration 2026-02-26 12:06:35 -05:00
logging refactor(utilities): update supporting modules for scheduler integration 2026-02-26 12:07:15 -05:00
manifest feat: enhance task domain and scheduler protocol 2026-03-04 13:23:38 -05:00
metrics refactor: Phase 6 - Complete migration, remove legacy files 2026-02-17 14:39:48 -05:00
middleware feat(audit): add HTTP audit middleware and tamper-evident logging 2026-03-08 13:03:02 -04:00
network refactor(utilities): update supporting modules for scheduler integration 2026-02-26 12:07:15 -05:00
privacy refactor(utilities): update supporting modules for scheduler integration 2026-02-26 12:07:15 -05:00
prommetrics feat(api): refactor websocket handlers; add health and prometheus middleware 2026-01-05 12:31:07 -05:00
queue refactor(queue): integrate scheduler backend and storage improvements 2026-02-26 12:06:46 -05:00
resources refactor(utilities): update supporting modules for scheduler integration 2026-02-26 12:07:15 -05:00
scheduler refactor(scheduler,worker): improve service management and GPU detection 2026-03-08 13:03:15 -04:00
security refactor(utilities): update supporting modules for scheduler integration 2026-02-26 12:07:15 -05:00
storage feat(storage): add groups, tasks, tokens, and audit database schemas 2026-03-08 12:48:42 -04:00
telemetry Fix multi-user authentication and clean up debug code 2025-12-06 12:35:32 -05:00
tracking refactor(utilities): update supporting modules for scheduler integration 2026-02-26 12:07:15 -05:00
validation feat: add security monitoring and validation framework 2026-02-19 15:34:25 -05:00
worker refactor(scheduler,worker): improve service management and GPU detection 2026-03-08 13:03:15 -04:00