fetch_ml/internal
Jeremie Fraeys 17170667e2
feat(worker): improve lifecycle management and vLLM plugin

Lifecycle improvements:
- runloop.go: refined state machine with better error recovery
- service_manager.go: service dependency management and health checks
- states.go: add states for capability advertisement and draining

Container execution:
- container.go: improved OCI runtime integration with supply chain checks
- Add image verification and signature validation
- Better resource limits enforcement for GPU/memory

vLLM plugin updates:
- vllm.go: support for vLLM 0.3+ with new engine arguments
- Add quantization-aware scheduling (AWQ, GPTQ, FP8)
- Improve model download and caching logic

Configuration:
- config.go: add capability advertisement configuration
- snapshot_store.go: improve snapshot management for checkpointing
2026-03-12 12:05:02 -04:00
Name         Last commit                                                                   Date
api          feat(api): add structured error package and refactor handlers                 2026-03-12 12:04:46 -04:00
audit        feat(audit): add HTTP audit middleware and tamper-evident logging             2026-03-08 13:03:02 -04:00
auth         feat(crypto,auth): harden KMS and improve permission handling                 2026-03-12 12:04:32 -04:00
config       feat(domain): add task visibility and supporting infrastructure               2026-03-08 13:03:27 -04:00
container    feat(domain): add task visibility and supporting infrastructure               2026-03-08 13:03:27 -04:00
crypto       feat(crypto,auth): harden KMS and improve permission handling                 2026-03-12 12:04:32 -04:00
domain       feat(domain): add task visibility and supporting infrastructure               2026-03-08 13:03:27 -04:00
envpool      refactor(utilities): update supporting modules for scheduler integration      2026-02-26 12:07:15 -05:00
errtypes     refactor(utilities): update supporting modules for scheduler integration      2026-02-26 12:07:15 -05:00
experiment   feat(domain): add task visibility and supporting infrastructure               2026-03-08 13:03:27 -04:00
fileutil     feat(domain): add task visibility and supporting infrastructure               2026-03-08 13:03:27 -04:00
jupyter      feat(domain): add task visibility and supporting infrastructure               2026-03-08 13:03:27 -04:00
logging      refactor(utilities): update supporting modules for scheduler integration      2026-02-26 12:07:15 -05:00
manifest     feat: enhance task domain and scheduler protocol                              2026-03-04 13:23:38 -05:00
metrics      refactor: Phase 6 - Complete migration, remove legacy files                   2026-02-17 14:39:48 -05:00
middleware   feat(crypto,auth): harden KMS and improve permission handling                 2026-03-12 12:04:32 -04:00
network      feat(domain): add task visibility and supporting infrastructure               2026-03-08 13:03:27 -04:00
privacy      refactor(utilities): update supporting modules for scheduler integration      2026-02-26 12:07:15 -05:00
prommetrics  feat(api): refactor websocket handlers; add health and prometheus middleware  2026-01-05 12:31:07 -05:00
queue        feat(domain): add task visibility and supporting infrastructure               2026-03-08 13:03:27 -04:00
resources    refactor(utilities): update supporting modules for scheduler integration      2026-02-26 12:07:15 -05:00
scheduler    feat(scheduler): implement capability-based routing and hub v2                2026-03-12 12:00:05 -04:00
security     refactor(utilities): update supporting modules for scheduler integration      2026-02-26 12:07:15 -05:00
storage      feat(storage): add groups, tasks, tokens, and audit database schemas          2026-03-08 12:48:42 -04:00
telemetry    Fix multi-user authentication and clean up debug code                         2025-12-06 12:35:32 -05:00
tracking     refactor(utilities): update supporting modules for scheduler integration      2026-02-26 12:07:15 -05:00
validation   feat: add security monitoring and validation framework                        2026-02-19 15:34:25 -05:00
worker       feat(worker): improve lifecycle management and vLLM plugin                    2026-03-12 12:05:02 -04:00