Commit graph

12 commits

Author SHA1 Message Date
Jeremie Fraeys
188cf55939
refactor(api): overhaul WebSocket handler and protocol layer
Major WebSocket handler refactor:
- Rewrite ws/handler.go with structured message routing and backpressure
- Add connection lifecycle management with heartbeats and timeouts
- Implement graceful connection draining for zero-downtime restarts

Protocol improvements:
- Define structured protocol types in protocol.go for hub communication
- Add versioned message envelopes for backward compatibility
- Standardize error codes and response formats across WebSocket API

Job streaming via WebSocket:
- Simplify ws/jobs.go with async job status streaming
- Add compression for high-volume job updates

Testing:
- Update websocket_e2e_test.go for new protocol semantics
- Add connection resilience tests
2026-03-12 12:01:21 -04:00
Jeremie Fraeys
c52179dcbe
feat(auth): add token-based access and structured logging
Add comprehensive authentication and authorization enhancements:

- tokens.go: New token management system for public task access and cloning
  * SHA-256 hashed token storage for security
  * Token generation, validation, and automatic cleanup
  * Support for public access and clone permissions

- api_key.go: Extend User struct with Groups field
  * Lab group membership (ml-lab, nlp-group)
  * Integration with permission system for group-based access

- flags.go: Security hardening - migrate to structured logging
  * Replace log.Printf with log/slog to prevent log injection attacks
  * Consistent structured output for all auth warnings
  * Safe handling of file paths and errors in logs

- permissions.go: Add task sharing permission constants
  * PermissionTasksReadOwn: Access own tasks
  * PermissionTasksReadLab: Access lab group tasks
  * PermissionTasksReadAll: Admin/institution-wide access
  * PermissionTasksShare: Grant access to other users
  * PermissionTasksClone: Create copies of shared tasks
  * CanAccessTask() method with visibility checks

- database.go: Improve error handling
  * Add structured error logging on row close failures
2026-03-08 12:51:07 -04:00
Jeremie Fraeys
c6a224d5fc
feat(cli,server): unify info command with remote/local support
Enhance ml info to query server when connected, falling back to local
manifests when offline. Unifies behavior with other commands like run,
exec, and cancel.

CLI changes:
- Add --local and --remote flags for explicit control
- Auto-detect connection state via mode.detect()
- queryRemoteRun(): Query server via WebSocket for run details
- queryLocalRun(): Read local run_manifest.json
- displayRunInfo(): Shared display logic for both sources
- Add connection status indicators (Remote: connecting.../connected)

WebSocket protocol:
- Add query_run_info opcode (0x28) to cli and server
- Add sendQueryRunInfo() method to ws/client.zig
- Protocol: [opcode:1][api_key_hash:16][run_id_len:1][run_id:var]

Server changes:
- Add handleQueryRunInfo() handler to ws/handler.go
- Returns run_id, job_name, user, timestamp, overall_sha, files_count
- Checks PermJobsRead permission
- Looks up run in experiment manager

Usage:
  ml info abc123              # Auto: tries remote, falls back to local
  ml info abc123 --local      # Force local manifest lookup
  ml info abc123 --remote     # Force remote query (fails if offline)
2026-03-05 12:07:00 -05:00
Jeremie Fraeys
420de879ff
feat(api): integrate scheduler protocol and WebSocket enhancements
Update API layer for scheduler integration:
- WebSocket handlers with scheduler protocol support
- Jobs WebSocket endpoint with priority queue integration
- Validation middleware for scheduler messages
- Server configuration with security hardening
- Protocol definitions for worker-scheduler communication
- Dataset handlers with tenant isolation checks
- Response helpers with audit context
- OpenAPI spec updates for new endpoints
2026-02-26 12:05:57 -05:00
Jeremie Fraeys
6028779239
feat: update CLI, TUI, and security documentation
- Add safety checks to Zig build
- Add TUI with job management and narrative views
- Add WebSocket support and export services
- Add smart configuration defaults
- Update API routes with security headers
- Update SECURITY.md with comprehensive policy
- Add Makefile security scanning targets
2026-02-19 15:35:05 -05:00
Jeremie Fraeys
260e18499e
feat: Research features - narrative fields and outcome tracking
Add comprehensive research context tracking to jobs:
- Narrative fields: hypothesis, context, intent, expected_outcome
- Experiment groups and tags for organization
- Run comparison (compare command) for diff analysis
- Run search (find command) with criteria filtering
- Run export (export command) for data portability
- Outcome setting (outcome command) for experiment validation

Update queue and requeue commands to support narrative fields.
Add narrative validation to manifest validator.
Add WebSocket handlers for compare, find, export, and outcome operations.

Includes E2E tests for phase 2 features.
2026-02-18 21:27:05 -05:00
Jeremie Fraeys
10e6416e11
refactor: update WebSocket handlers and database schemas
- Update datasets handlers with improved error handling
- Refactor WebSocket handler for better organization
- Clean up jobs.go handler implementation
- Add websocket_metrics table to Postgres and SQLite schemas
2026-02-18 14:36:30 -05:00
Jeremie Fraeys
f92e0bbdf9
feat: implement WebSocket handlers by delegating to sub-packages
Implemented WebSocket handlers by creating and integrating sub-packages:

**New package: api/datasets**
- HandleDatasetList, HandleDatasetRegister, HandleDatasetInfo, HandleDatasetSearch
- Binary protocol parsing for each operation

**Updated ws/handler.go**
- Added jobsHandler, jupyterHandler, datasetsHandler fields
- Updated NewHandler to accept sub-handlers
- Implemented handleAnnotateRun -> api/jobs
- Implemented handleSetRunNarrative -> api/jobs
- Implemented handleStartJupyter -> api/jupyter
- Implemented handleStopJupyter -> api/jupyter
- Implemented handleListJupyter -> api/jupyter
- Implemented handleDatasetList -> api/datasets
- Implemented handleDatasetRegister -> api/datasets
- Implemented handleDatasetInfo -> api/datasets
- Implemented handleDatasetSearch -> api/datasets

**Updated api/routes.go**
- Create jobs, jupyter, and datasets handlers
- Pass all handlers to ws.NewHandler

Build passes, all tests pass.
2026-02-17 20:49:31 -05:00
Jeremie Fraeys
3694d4e56f
refactor: extract ws handlers to separate files to reduce handler.go size
- Extract job handlers (handleQueueJob, handleQueueJobWithSnapshot, handleCancelJob, handlePrune) to ws/jobs.go (209 lines)
- Extract validation handler (handleValidateRequest) to ws/validate.go (167 lines)
- Reduce ws/handler.go from 879 to 474 lines (under 500 line target)
- Keep core framework in handler.go: Handler struct, dispatch, packet sending, auth helpers
- All handlers remain as methods on Handler for backward compatibility

Result: handler.go 474 lines, jobs.go 209 lines, validate.go 167 lines
2026-02-17 20:38:03 -05:00
Jeremie Fraeys
fb2bbbaae5
refactor: Phase 7 - TUI cleanup - reorganize model package
Phase 7 of the monorepo maintainability plan:

New files created:
- model/jobs.go - Job type, JobStatus constants, list.Item interface
- model/messages.go - tea.Msg types (JobsLoadedMsg, StatusMsg, TickMsg, etc.)
- model/styles.go - NewJobListDelegate(), JobListTitleStyle(), SpinnerStyle()
- model/keys.go - KeyMap struct, DefaultKeys() function

Modified files:
- model/state.go - reduced from 226 to ~130 lines
  - Removed: Job, JobStatus, KeyMap, Keys, inline styles
  - Kept: State struct, domain re-exports, ViewMode, DatasetInfo, InitialState()
- controller/commands.go - use model. prefix for message types
- controller/controller.go - use model. prefix for message types
- controller/settings.go - use model.SettingsContentMsg

Deleted files:
- controller/keys.go (moved to model/keys.go since State references KeyMap)

Result:
- No file >150 lines in model/ package
- Single concern per file: state, jobs, messages, styles, keys
- All 41 test packages pass
2026-02-17 20:22:04 -05:00
Jeremie Fraeys
a1ce267b86
feat: Implement all worker stub methods with real functionality
- VerifySnapshot: SHA256 verification using integrity package
- EnforceTaskProvenance: Strict and best-effort provenance validation
- RunJupyterTask: Full Jupyter service lifecycle (start/stop/remove/restore/list_packages)
- RunJob: Job execution using executor.JobRunner
- PrewarmNextOnce: Prewarming with queue integration

All methods now use new architecture components instead of placeholders
2026-02-17 17:37:56 -05:00
Jeremie Fraeys
f0ffbb4a3d
refactor: Phase 5 complete - API packages extracted
Extracted all deferred API packages from monolithic ws_*.go files:

- api/routes.go (75 lines) - Extracted route registration from server.go
- api/errors.go (108 lines) - Standardized error responses and error codes
- api/jobs/handlers.go (271 lines) - Job WebSocket handlers
  * HandleAnnotateRun, HandleSetRunNarrative
  * HandleCancelJob, HandlePruneJobs, HandleListJobs
- api/jupyter/handlers.go (244 lines) - Jupyter WebSocket handlers
  * HandleStartJupyter, HandleStopJupyter
  * HandleListJupyter, HandleListJupyterPackages
  * HandleRemoveJupyter, HandleRestoreJupyter
- api/validate/handlers.go (163 lines) - Validation WebSocket handlers
  * HandleValidate, HandleGetValidateStatus, HandleListValidations
- api/ws/handler.go (298 lines) - WebSocket handler framework
  * Core WebSocket handling logic
  * Opcode constants and error codes

Lines redistributed: ~1,150 lines from ws_jobs.go (1,365), ws_jupyter.go (512),
ws_validate.go (523), ws_handler.go (379) into focused packages.

Note: Original ws_*.go files still present - cleanup in next commit.
Build status: Compiles successfully
2026-02-17 13:25:58 -05:00