Commit graph

4 commits

Author SHA1 Message Date
Jeremie Fraeys
7194826871
feat: implement research-grade maintainability phases 1,3,4,7
Phase 1: Event Sourcing
- Add TaskEvent types (queued, started, completed, failed, etc.)
- Create EventStore with Redis Streams (append-only)
- Support event querying by task ID and time range

Phase 3: Diagnosable Failures
- Enhance TaskExecutionError with Context map, Timestamp, Recoverable flag
- Update container.go to populate error context (image, GPU, duration)
- Add WithContext helper for building error context
- Create cmd/errors CLI for querying task errors

Phase 4: Testable Security
- Add security fields to PodmanConfig (Privileged, Network, ReadOnlyMounts)
- Create ValidateSecurityPolicy() with ErrSecurityViolation
- Add security contract tests (privileged rejection, host network rejection)
- Tests serve as executable security documentation

Phase 7: Reproducible Builds
- Add BuildHash and BuildTime ldflags to Makefile
- Create verify-build target for reproducibility testing
- Add -version and -verify flags to api-server

All tests pass:
- go test ./internal/errtypes/...
- go test ./internal/container/... -run Security
- go test ./internal/queue/...
- go build ./cmd/api-server/...
2026-02-18 15:27:50 -05:00
Jeremie Fraeys
a1ce267b86
feat: Implement all worker stub methods with real functionality
- VerifySnapshot: SHA256 verification using integrity package
- EnforceTaskProvenance: Strict and best-effort provenance validation
- RunJupyterTask: Full Jupyter service lifecycle (start/stop/remove/restore/list_packages)
- RunJob: Job execution using executor.JobRunner
- PrewarmNextOnce: Prewarming with queue integration

All methods now use new architecture components instead of placeholders
2026-02-17 17:37:56 -05:00
Jeremie Fraeys
4c8c9dfe4b
refactor: Export SelectDependencyManifest for API helpers
- Renamed selectDependencyManifest to SelectDependencyManifest (exported)
- Added re-export in worker package for backward compatibility
- Updated internal call in container.go to use exported function
- API helpers can now access via worker.SelectDependencyManifest

Build status: Compiles successfully
2026-02-17 16:45:59 -05:00
Jeremie Fraeys
22f3d66f1d
refactor: Phase 2 - Extract executor implementations
Created executor package with extracted job execution logic:

1. internal/worker/executor/local.go (104 lines)
   - LocalExecutor implements JobExecutor interface
   - Execute() method for local bash script execution
   - generateScript() helper for creating experiment scripts

2. internal/worker/executor/container.go (229 lines)
   - ContainerExecutor implements JobExecutor interface
   - Execute() method for podman container execution
   - EnvironmentPool interface for image caching
   - Tracking tool provisioning (MLflow, TensorBoard, Wandb)
   - Volume and cache setup
   - selectDependencyManifest() helper

3. internal/worker/executor/runner.go (131 lines)
   - JobRunner orchestrates execution
   - ExecutionMode enum (Auto, Local, Container)
   - Run() method with directory setup and executor selection
   - finalize() for success/failure handling

Key design decisions:
- Executors depend on interfaces (ManifestWriter, not Worker)
- JobRunner composes both executors
- No direct Worker dependencies in executor package
- SetupJobDirectories reused from execution package

Build status: Compiles successfully
2026-02-17 14:14:04 -05:00