Phase 1: Event Sourcing
- Add TaskEvent types (queued, started, completed, failed, etc.)
- Create EventStore with Redis Streams (append-only)
- Support event querying by task ID and time range

Phase 3: Diagnosable Failures
- Enhance TaskExecutionError with Context map, Timestamp, Recoverable flag
- Update container.go to populate error context (image, GPU, duration)
- Add WithContext helper for building error context
- Create cmd/errors CLI for querying task errors

Phase 4: Testable Security
- Add security fields to PodmanConfig (Privileged, Network, ReadOnlyMounts)
- Create ValidateSecurityPolicy() with ErrSecurityViolation
- Add security contract tests (privileged rejection, host network rejection)
- Tests serve as executable security documentation

Phase 7: Reproducible Builds
- Add BuildHash and BuildTime ldflags to Makefile
- Create verify-build target for reproducibility testing
- Add -version and -verify flags to api-server

All tests pass:
- go test ./internal/errtypes/...
- go test ./internal/container/... -run Security
- go test ./internal/queue/...
- go build ./cmd/api-server/...
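The Phase 4 notes above name PodmanConfig, ValidateSecurityPolicy() and ErrSecurityViolation but do not show their shape. The following is a minimal sketch of how such a check could look, not the repository's implementation. The field types, the error messages, the "host" network value and the IsSecurityViolation helper are all assumptions made for illustration. ReadOnlyMounts is carried but not enforced, because the notes only mention privileged and host-network rejection.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrSecurityViolation is the sentinel error named in the notes.
// The message text is an assumption.
var ErrSecurityViolation = errors.New("security policy violation")

// PodmanConfig holds only the three security fields the notes mention.
// The field types are assumptions.
type PodmanConfig struct {
	Privileged     bool
	Network        string
	ReadOnlyMounts bool
}

// ValidateSecurityPolicy rejects privileged containers and host networking,
// the two cases the contract tests in the notes are said to cover.
func ValidateSecurityPolicy(cfg PodmanConfig) error {
	if cfg.Privileged {
		return fmt.Errorf("%w: privileged containers are not allowed", ErrSecurityViolation)
	}
	if cfg.Network == "host" {
		return fmt.Errorf("%w: host network is not allowed", ErrSecurityViolation)
	}
	return nil
}

// IsSecurityViolation is a hypothetical helper that reports whether err
// wraps ErrSecurityViolation.
func IsSecurityViolation(err error) bool {
	return errors.Is(err, ErrSecurityViolation)
}

func main() {
	cases := []PodmanConfig{
		{Privileged: true, Network: "none"},
		{Network: "host"},
		{Network: "none", ReadOnlyMounts: true},
	}
	for _, c := range cases {
		fmt.Printf("%+v -> %v\n", c, ValidateSecurityPolicy(c))
	}
}
```

Wrapping the sentinel with %w lets a contract test assert on the error class with errors.Is rather than on message text.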
API Server
WebSocket API server for the ML CLI tool...
Usage
./bin/api-server --config configs/api/dev.yaml
Endpoints
- GET /health - Health check
- WS /ws - WebSocket endpoint for CLI communication
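As an illustration of the health endpoint, here is a minimal Go sketch of a handler for GET /health using only the standard library. It is not taken from main.go: the plain-text "ok" body, the 405 response for other methods, and the helper names newMux and probe are assumptions. The /ws endpoint is left out because a WebSocket handler needs a library outside the standard library.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// healthHandler answers GET /health with 200 and a short body.
// The "ok" body is an assumption; the README does not specify the response.
func healthHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodGet {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	w.WriteHeader(http.StatusOK)
	fmt.Fprint(w, "ok")
}

// newMux registers the health route (hypothetical helper).
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/health", healthHandler)
	return mux
}

// probe runs one in-process request against the mux and returns the
// status code and response body, without opening a network port.
func probe(method, path string) (int, string) {
	rec := httptest.NewRecorder()
	newMux().ServeHTTP(rec, httptest.NewRequest(method, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := probe(http.MethodGet, "/health")
	fmt.Println(code, body)
}
```

Against a running server, the same check could be made with any HTTP client by requesting /health on the address the server listens on.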
Binary Protocol
See CLI README for protocol details.
Configuration
Uses the same configuration file as the worker. The experiment base path is read from the base_path configuration key.
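To make the base_path lookup concrete, here is a rough Go sketch that pulls a top-level base_path value out of YAML-like text. It is a simplification for illustration only and not a YAML parser: it assumes the key sits at the top level on a single line, and it does not handle trailing comments or multi-line values. The function names and the redis_addr key in the sample are hypothetical. A real server would more likely use a YAML library.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"strings"
)

// basePath scans r line by line and returns the value of a top-level
// base_path key. Indented lines are treated as nested and skipped.
func basePath(r io.Reader) (string, error) {
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, " ") || strings.HasPrefix(line, "\t") {
			continue // nested key, not top level
		}
		key, val, ok := strings.Cut(line, ":")
		if !ok || strings.TrimSpace(key) != "base_path" {
			continue
		}
		// Drop surrounding whitespace and simple quoting.
		val = strings.Trim(strings.TrimSpace(val), `"'`)
		if val == "" {
			return "", errors.New("base_path is empty")
		}
		return val, nil
	}
	if err := sc.Err(); err != nil {
		return "", err
	}
	return "", errors.New("base_path not found")
}

// basePathFromString is a convenience wrapper for in-memory config text.
func basePathFromString(s string) (string, error) {
	return basePath(strings.NewReader(s))
}

func main() {
	// Sample config text; redis_addr is a made-up key for illustration.
	cfg := "redis_addr: localhost:6379\nbase_path: /data/experiments\n"
	p, err := basePathFromString(cfg)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("experiment base path:", p)
}
```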
Example
# Start API server
./bin/api-server --listen :9100
# In another terminal, test with CLI
./cli/zig-out/bin/ml status