Commit graph

3 commits

Author SHA1 Message Date
Jeremie Fraeys
61660dc925
refactor: co-locate security, storage, telemetry, tracking, worker tests
Move unit tests from tests/unit/ to internal/ following Go conventions:

Security tests:
- tests/unit/security/* -> internal/security/* (audit, config_integrity, filetype, gpu_audit, hipaa_validation, manifest_filename, path_traversal, resource_quota, secrets)

Storage tests:
- tests/unit/storage/* -> internal/storage/* (db, experiment_metadata)

Telemetry tests:
- tests/unit/telemetry/* -> internal/telemetry/* (telemetry)

Tracking tests:
- tests/unit/reproducibility/* -> internal/tracking/* (config_hash, environment_capture)

Worker tests:
- tests/unit/worker/* -> internal/worker/* (artifacts, config, hash_bench, plugins/jupyter_task, plugins/vllm, prewarm_v1, run_manifest_execution, snapshot_stage, snapshot_store, worker)

Update import paths in test files to reflect new locations.
2026-03-12 16:37:03 -04:00
Jeremie Fraeys
17170667e2
feat(worker): improve lifecycle management and vLLM plugin
Lifecycle improvements:
- runloop.go: refined state machine with better error recovery
- service_manager.go: service dependency management and health checks
- states.go: add states for capability advertisement and draining

Container execution:
- container.go: improved OCI runtime integration with supply chain checks
- Add image verification and signature validation
- Better resource limits enforcement for GPU/memory

vLLM plugin updates:
- vllm.go: support for vLLM 0.3+ with new engine arguments
- Add quantization-aware scheduling (AWQ, GPTQ, FP8)
- Improve model download and caching logic

Configuration:
- config.go: add capability advertisement configuration
- snapshot_store.go: improve snapshot management for checkpointing
2026-03-12 12:05:02 -04:00
Jeremie Fraeys
95adcba437
feat(worker): add Jupyter/vLLM plugins and process isolation
Extend worker capabilities with new execution plugins and security features:
- Jupyter plugin for notebook-based ML experiments
- vLLM plugin for LLM inference workloads
- Cross-platform process isolation (Unix/Windows)
- Network policy enforcement with platform-specific implementations
- Service manager integration for lifecycle management
- Scheduler backend integration for queue coordination

Update lifecycle management:
- Enhanced runloop with state transitions
- Service manager integration for plugin coordination
- Improved state persistence and recovery

Add test coverage:
- Unit tests for Jupyter and vLLM plugins
- Updated worker execution tests
2026-02-26 12:03:59 -05:00