Commit graph

55 commits

Author SHA1 Message Date
Jeremie Fraeys
61660dc925
refactor: co-locate security, storage, telemetry, tracking, worker tests
Move unit tests from tests/unit/ to internal/ following Go conventions:

Security tests:
- tests/unit/security/* -> internal/security/* (audit, config_integrity, filetype, gpu_audit, hipaa_validation, manifest_filename, path_traversal, resource_quota, secrets)

Storage tests:
- tests/unit/storage/* -> internal/storage/* (db, experiment_metadata)

Telemetry tests:
- tests/unit/telemetry/* -> internal/telemetry/* (telemetry)

Tracking tests:
- tests/unit/reproducibility/* -> internal/tracking/* (config_hash, environment_capture)

Worker tests:
- tests/unit/worker/* -> internal/worker/* (artifacts, config, hash_bench, plugins/jupyter_task, plugins/vllm, prewarm_v1, run_manifest_execution, snapshot_stage, snapshot_store, worker)

Update import paths in test files to reflect new locations.
2026-03-12 16:37:03 -04:00
Jeremie Fraeys
74e06017b5
refactor: co-locate scheduler non-hub tests with source code
Move unit tests from tests/unit/scheduler/ to internal/scheduler/ following Go conventions:
- capability_routing_test.go - Worker capability-based job routing tests
- failure_scenarios_test.go - Scheduler failure handling and recovery tests
- heartbeat_test.go - Worker heartbeat monitoring tests
- plugin_quota_test.go - Plugin resource quota enforcement tests
- port_allocator_test.go - Dynamic port allocation for services tests
- priority_queue_test.go - Job priority queue implementation tests
- service_templates_test.go - Service template management tests
- state_store_test.go - Scheduler state persistence tests

Note: orphan_recovery_test.go excluded from this commit - will be handled with hub refactoring due to significant test changes.
2026-03-12 16:36:29 -04:00
Jeremie Fraeys
ee0b90cfc5
refactor: co-locate queue and resources tests, add manager tests
Move unit tests from tests/unit/ to internal/ following Go conventions:
- tests/unit/queue/* -> internal/queue/* (dedup, filesystem_fallback, queue_permissions, queue_spec, queue, sqlite_queue tests)
- tests/unit/gpu/* -> internal/resources/* (gpu_detector, gpu_golden tests)
- tests/unit/resources/* -> internal/resources/* (manager_test.go)

Update import paths in test files to reflect new locations.

Note: GPU tests consolidated into resources package since GPU detection is part of resource management. Manager tests show significant new test coverage (166 lines).
2026-03-12 16:36:02 -04:00
Jeremie Fraeys
ca6ad970c3
refactor: co-locate logging, manifest, network, privacy, prommetrics tests
Move unit tests from tests/unit/ to internal/ following Go conventions:
- tests/unit/logging/* -> internal/logging/* (logging tests)
- tests/unit/manifest/* -> internal/manifest/* (run_manifest, schema tests)
- tests/unit/network/* -> internal/network/* (retry, ssh_pool, ssh tests)
- tests/unit/privacy/* -> internal/privacy/* (pii tests)
- tests/unit/metrics/* -> internal/prommetrics/* (metrics tests)

Update import paths in test files to reflect new locations.

Note: metrics_test.go moved from tests/unit/metrics/ to internal/prommetrics/ to match the actual package name.
2026-03-12 16:35:37 -04:00
Jeremie Fraeys
cf84246115
refactor: co-locate config, container, envpool, errors, experiment, jupyter tests
Move unit tests from tests/unit/ to internal/ following Go conventions:
- tests/unit/config/* -> internal/config/* (constants, mode_paths, paths, validation)
- tests/unit/container/* -> internal/container/* (podman, security tests)
- tests/unit/envpool/* -> internal/envpool/* (envpool tests)
- tests/unit/errors/* -> internal/errtypes/* (errors_test.go moved to errtypes package)
- tests/unit/experiment/* -> internal/experiment/* (manager tests)
- tests/unit/jupyter/* -> internal/jupyter/* (config, package_blacklist, service_manager, trash_restore)

Update import paths in test files to reflect new locations.

Note: errors_test.go moved from tests/unit/errors/ to internal/errtypes/ to match the package structure.
2026-03-12 16:35:15 -04:00
Jeremie Fraeys
a4e2ecdbe6
refactor: co-locate api, audit, auth tests with source code
Move unit tests from tests/unit/ to internal/ following Go conventions:
- tests/unit/api/* -> internal/api/* (WebSocket handlers, helpers, duplicate detection)
- tests/unit/audit/* -> internal/audit/* (alert, sealed, verifier tests)
- tests/unit/auth/* -> internal/auth/* (API key, keychain, user manager)
- tests/unit/crypto/kms/* -> internal/auth/kms/* (cache, protocol tests)

Update import paths in test files to reflect new locations.

Benefits:
- Tests live alongside the code they test
- Easier navigation and maintenance
- Clearer package boundaries
- Follows standard Go project layout
2026-03-12 16:34:54 -04:00
Jeremie Fraeys
c5524562e9
test(scheduler): remove unused fields in service slot pool separation test
Remove ID and GPUCount fields from batchJob in TestServiceSlotPoolSeparation
that were assigned but never used. The test only validates SlotPool values.
2026-03-12 12:10:33 -04:00
Jeremie Fraeys
2bd7f97ae2
test(integration,unit): update test suites for new features and APIs
Integration test updates:
- jupyter_experiment_test.go: update for new workspace handling
- run_manifest_test.go: reproducibility manifest validation
- secrets_integration_test.go: KMS and secret provider tests
- storage_redis_integration_test.go: Redis-backed storage tests

Unit test updates:
- response_helpers_test.go: API response helper tests
- config_hash_test.go: configuration hashing for reproducibility
- filetype_test.go: security file type detection tests

Load testing:
- load_test.go: scheduler load and stress tests
2026-03-12 12:09:15 -04:00
Jeremie Fraeys
17170667e2
feat(worker): improve lifecycle management and vLLM plugin
Lifecycle improvements:
- runloop.go: refined state machine with better error recovery
- service_manager.go: service dependency management and health checks
- states.go: add states for capability advertisement and draining

Container execution:
- container.go: improved OCI runtime integration with supply chain checks
- Add image verification and signature validation
- Better resource limits enforcement for GPU/memory

vLLM plugin updates:
- vllm.go: support for vLLM 0.3+ with new engine arguments
- Add quantization-aware scheduling (AWQ, GPTQ, FP8)
- Improve model download and caching logic

Configuration:
- config.go: add capability advertisement configuration
- snapshot_store.go: improve snapshot management for checkpointing
2026-03-12 12:05:02 -04:00
Jeremie Fraeys
37c4d4e9c7
feat(crypto,auth): harden KMS and improve permission handling
KMS improvements:
- cache.go: add LRU eviction with memory-bounded caches
- provider.go: refactor provider initialization and key rotation
- tenant_keys.go: per-tenant key isolation with envelope encryption

Auth layer updates:
- hybrid.go: refine hybrid auth flow for API key + JWT
- permissions_loader.go: faster permission caching with hot-reload
- validator.go: stricter validation with detailed error messages

Security middleware:
- security.go: add rate limiting headers and CORS refinement

Testing and benchmarks:
- Add KMS cache and protocol unit tests
- Add KMS benchmark tests for encryption throughput
- Update KMS integration tests for tenant isolation
2026-03-12 12:04:32 -04:00
Jeremie Fraeys
de83300962
feat(worker): refactor GPU detection with macOS Metal support
GPU detection refactor:
- Major rewrite of gpu_detector.go with unified detection interface
- Support for NVIDIA (NVML), AMD (ROCm), and Apple Metal
- Runtime GPU capability querying for scheduler matching

macOS improvements:
- gpu_macos.go: native Metal device enumeration and memory queries
- Support for Apple Silicon (M1/M2/M3) unified memory reporting
- Fallback to system profiler for Intel Macs

Testing infrastructure:
- Add gpu_detector_mock.go for testing without hardware
- Update gpu_golden_test.go with platform-specific expectations
- Cross-platform GPU info validation
2026-03-12 12:02:41 -04:00
Jeremie Fraeys
57787e1e7b
feat(scheduler): implement capability-based routing and hub v2
Add comprehensive capability routing system to scheduler hub:
- Capability-aware worker matching with requirement/offer negotiation
- Hub v2 protocol with structured message types and heartbeat management
- Worker capability advertisement and dynamic routing decisions
- Orphan recovery for disconnected workers with state reconciliation
- Template-based job scheduling with capability constraints

Add extensive test coverage:
- Unit tests for capability routing logic and heartbeat mechanics
- Unit tests for orphan recovery scenarios
- E2E tests for capability routing across multiple workers
- Hub capabilities integration tests
- Scheduler fixture helpers for test setup

Protocol improvements:
- Define structured protocol messages for hub-worker communication
- Add capability matching algorithm with scoring
- Implement graceful worker disconnection handling
2026-03-12 12:00:05 -04:00
Jeremie Fraeys
c74e91dd69
test: update test suite and remove deprecated privacy middleware
Test improvements:
- fixtures/: Updated mocks, fixtures with group context, SSH server, TUI driver
- integration/: WebSocket queue and handler tests with groups
- e2e/: WebSocket and TLS proxy end-to-end tests
- unit/api/ws_test.go: WebSocket API tests
- unit/scheduler/service_templates_test.go: Service template tests
- benchmarks/scheduler_bench_test.go: Performance benchmarks

Cleanup:
- Remove privacy middleware (replaced by audit system)
- Remove privacy_test.go
2026-03-08 13:03:55 -04:00
Jeremie Fraeys
5f53104fcd
test: modernize test suite for streamlined infrastructure
- Update E2E tests for consolidated docker-compose.test.yml
- Remove references to obsolete logs-debug.yml
- Enhance test fixtures and utilities
- Improve integration test coverage for KMS, queue, scheduler
- Update unit tests for config constants and worker execution
- Modernize cleanup-status.sh with new Makefile targets
2026-03-04 13:24:24 -05:00
Jeremie Fraeys
16343e6c2a
test(kms): add comprehensive unit and integration tests
Unit tests for DEK cache:
- Put/Get operations, TTL expiry, LRU eviction
- Tenant isolation, flush/clear, stats, empty DEK rejection

Unit tests for KMS protocol:
- Encrypt/decrypt round-trip with MemoryProvider
- Multi-tenant isolation (wrong key fails MAC verification)
- Cache hit verification, key rotation flow
- Health check protocol

Integration tests with testcontainers:
- VaultProvider with hashicorp/vault:1.15 container
- AWSProvider with localstack/localstack container
- TenantKeyManager end-to-end with MemoryProvider
2026-03-03 19:14:31 -05:00
Jeremie Fraeys
da104367d6
feat: add Plugin GPU Quota implementation and tests
- Add plugin_quota.go with GPU quota management for scheduler
- Update scheduler hub and protocol for plugin support
- Add comprehensive plugin quota unit tests
- Update gang service and WebSocket queue integration tests
2026-02-26 14:35:05 -05:00
Jeremie Fraeys
d87c556afa
test(all): update test suite for scheduler and security features
Update comprehensive test coverage:
- E2E tests with scheduler integration
- Integration tests with tenant isolation
- Unit tests with security assertions
- Security tests with audit validation
- Audit verification tests
- Auth tests with tenant scoping
- Config validation tests
- Container security tests
- Worker tests with scheduler mock
- Environment pool tests
- Load tests with distributed patterns
- Test fixtures with scheduler support
- Update go.mod/go.sum with new dependencies
2026-02-26 12:08:46 -05:00
Jeremie Fraeys
95adcba437
feat(worker): add Jupyter/vLLM plugins and process isolation
Extend worker capabilities with new execution plugins and security features:
- Jupyter plugin for notebook-based ML experiments
- vLLM plugin for LLM inference workloads
- Cross-platform process isolation (Unix/Windows)
- Network policy enforcement with platform-specific implementations
- Service manager integration for lifecycle management
- Scheduler backend integration for queue coordination

Update lifecycle management:
- Enhanced runloop with state transitions
- Service manager integration for plugin coordination
- Improved state persistence and recovery

Add test coverage:
- Unit tests for Jupyter and vLLM plugins
- Updated worker execution tests
2026-02-26 12:03:59 -05:00
Jeremie Fraeys
a981e89005
feat(security): add audit subsystem and tenant isolation
Implement comprehensive audit and security infrastructure:
- Immutable audit logs with platform-specific backends (Linux/Other)
- Sealed log entries with tamper-evident checksums
- Audit alert system for real-time security notifications
- Log rotation with retention policies
- Checkpoint-based audit verification

Add multi-tenant security features:
- Tenant manager with quota enforcement
- Middleware for tenant authentication/authorization
- Per-tenant cryptographic key isolation
- Supply chain security for container verification
- Cross-platform secure file utilities (Unix/Windows)

Add test coverage:
- Unit tests for audit alerts and sealed logs
- Platform-specific audit backend tests
2026-02-26 12:03:45 -05:00
Jeremie Fraeys
43e6446587
feat(scheduler): implement multi-tenant job scheduler with gang scheduling
Add new scheduler component for distributed ML workload orchestration:
- Hub-based coordination for multi-worker clusters
- Pacing controller for rate limiting job submissions
- Priority queue with preemption support
- Port allocator for dynamic service discovery
- Protocol handlers for worker-scheduler communication
- Service manager with OS-specific implementations
- Connection management and state persistence
- Template system for service deployment

Includes comprehensive test suite:
- Unit tests for all core components
- Integration tests for distributed scenarios
- Benchmark tests for performance validation
- Mock fixtures for isolated testing

Refs: scheduler-architecture.md
2026-02-26 12:03:23 -05:00
Jeremie Fraeys
6fc2e373c1
fix: resolve IDE warnings and test errors
Bug fixes and cleanup for test infrastructure:

- schema_test.go: Fix SchemaVersion reference with proper manifest import
- schema_test.go: Update all schema.json paths to internal/manifest location
- manifestenv.go: Remove unused helper functions (isArtifactsType, getPackagePath)
- nobaredetector.go: Fix exprToString syntax error, remove unused functions

All tests now pass without errors or warnings
2026-02-23 20:26:20 -05:00
Jeremie Fraeys
9f9d75dd68
test(phase-4): reproducibility crossover tests
Implement reproducibility crossover requirements:

- TestManifestEnvironmentCapture: Environment population with ConfigHash and DetectionMethod
- TestConfigHashPostDefaults: Hash computation after env expansion and defaults

Verifies manifest.Environment is properly populated for reproducibility tracking
2026-02-23 20:25:37 -05:00
Jeremie Fraeys
8f9bcef754
test(phase-3): prerequisite security and reproducibility tests
Implement 4 prerequisite test requirements:

- TestConfigIntegrityVerification: Config signing, tamper detection, hash stability
- TestManifestFilenameNonce: Cryptographic nonce generation and filename patterns
- TestGPUDetectionAudit: Structured logging of GPU detection at startup
- TestResourceEnvVarParsing: Resource env var parsing and override behavior

Also update manifest run_manifest.go:
- Add nonce-based filename support to WriteToDir
- Add nonce-based file detection to LoadFromDir
2026-02-23 20:25:26 -05:00
Jeremie Fraeys
f71352202e
test(phase-1-2): naming alignment and partial test completion
Rename and enhance existing tests to align with coverage map:
- TestGPUDetectorAMDVendorAlias -> TestAMDAliasManifestRecord
- TestScanArtifacts_SkipsKnownPathsAndLogs -> TestScanExclusionsRecorded
- Add env var expansion verification to TestHIPAAValidation_InlineCredentials
- Record exclusions in manifest.Artifacts for audit trail
2026-02-23 20:25:07 -05:00
Jeremie Fraeys
b33c6c4878
test(security): Add PHI denylist tests to secrets validation
Add comprehensive PHI detection tests:
- patient_id rejection
- ssn rejection
- medical_record_number rejection
- diagnosis_code rejection
- Mixed secrets with PHI rejection
- Normal secrets acceptance (HF_TOKEN, WANDB_API_KEY, etc.)

Ensures AllowedSecrets PHI denylist validation works correctly
across all PHI pattern variations.

Part of: PHI denylist validation from security plan
2026-02-23 19:44:33 -05:00
Jeremie Fraeys
17d5c75e33
fix(security): Path validation improvements for symlink resolution
Fix ValidatePath to correctly resolve symlinks and handle edge cases:
- Resolve symlinks before boundary check to prevent traversal
- Handle macOS /private prefix correctly
- Add fallback for non-existent paths (parent directory resolution)
- Double boundary checks: before AND after symlink resolution
- Prevent race conditions between check and use

Update path traversal tests:
- Correct test expectations for "..." (three dots is a valid filename, not a traversal)
- Add tests for symlink escape attempts
- Add unicode attack tests
- Add deeply nested traversal tests

Security impact: Prevents path traversal via symlink following in artifact
scanning and other file operations.
2026-02-23 19:44:16 -05:00
Jeremie Fraeys
58c1a5fa58
feat(audit): Tamper-evident audit chain verification system
Add ChainVerifier for cryptographic audit log verification:
- VerifyLogFile(): Validates entire audit chain integrity
- Detects tampering at specific event index (FirstTampered)
- Returns chain root hash for external verification
- GetChainRootHash(): Standalone hash computation
- VerifyAndAlert(): Boolean tampering detection with logging

Add audit-verifier CLI tool:
- Standalone binary for audit chain verification
- Takes log path argument and reports tampering

Update audit logger for chain integrity:
- Each event includes sequence number and hash chain
- SHA-256 linking: hash_n = SHA-256(prev_hash || event_n)
- Tamper detection through hash chain validation
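
The linking rule above, hash_n = SHA-256(prev_hash || event_n), can be sketched as follows. The helper names (`chainHash`, `buildChain`, `firstTampered`) are hypothetical and not taken from the repository, and the sketch assumes the first event is linked to an empty previous hash and that events are raw byte slices.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// chainHash applies the linking rule from the commit message:
// hash_n = SHA-256(prev_hash || event_n).
func chainHash(prevHash, event []byte) []byte {
	h := sha256.New()
	h.Write(prevHash)
	h.Write(event)
	return h.Sum(nil)
}

// buildChain returns the hash stored with each event.
// Assumption: the first event is linked to an empty previous hash.
func buildChain(events [][]byte) [][]byte {
	hashes := make([][]byte, 0, len(events))
	var prev []byte
	for _, e := range events {
		prev = chainHash(prev, e)
		hashes = append(hashes, prev)
	}
	return hashes
}

// firstTampered recomputes the chain and returns the index of the first
// event whose stored hash does not match, or -1 if the chain is intact.
func firstTampered(events, stored [][]byte) int {
	var prev []byte
	for i, e := range events {
		want := chainHash(prev, e)
		if i >= len(stored) || !bytes.Equal(want, stored[i]) {
			return i
		}
		prev = want
	}
	return -1
}

func main() {
	events := [][]byte{[]byte("login"), []byte("read file"), []byte("logout")}
	stored := buildChain(events)
	fmt.Println("intact:", firstTampered(events, stored))

	events[1] = []byte("read other file") // simulate tampering with one event
	fmt.Println("first tampered index:", firstTampered(events, stored))
}
```

Because each hash depends on every earlier event, the last hash in the chain can serve as a root hash for external verification.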

Add comprehensive test coverage:
- Empty log handling
- Valid chain verification
- Tampering detection with modification
- Root hash consistency
- Alert mechanism tests

Part of: V.7 audit verification from security plan
2026-02-23 19:43:50 -05:00
Jeremie Fraeys
9434f4c8e6
feat(security): Artifact ingestion caps enforcement
Add MaxArtifactFiles and MaxArtifactTotalBytes to SandboxConfig:
- Default MaxArtifactFiles: 10,000 (configurable via SecurityDefaults)
- Default MaxArtifactTotalBytes: 100GB (configurable via SecurityDefaults)
- ApplySecurityDefaults() sets defaults if not specified

Enforce caps in scanArtifacts() during directory walk:
- Returns error immediately when MaxArtifactFiles exceeded
- Returns error immediately when MaxArtifactTotalBytes exceeded
- Prevents resource exhaustion attacks from malicious artifact trees

Update all call sites to pass SandboxConfig for cap enforcement:
- Native bridge libs updated to pass caps argument
- Benchmark tests updated with nil caps (unlimited for benchmarks)
- Unit tests updated with nil caps

Closes: artifact ingestion caps items from security plan
2026-02-23 19:43:28 -05:00
Jeremie Fraeys
a8180f1f26
feat(security): HIPAA compliance mode and PHI denylist validation
Add compliance_mode field to Config with strict HIPAA validation:
- Requires SnapshotStore.Secure=true in HIPAA mode
- Requires NetworkMode="none" for tenant isolation
- Requires non-empty SeccompProfile
- Requires NoNewPrivileges=true
- Enforces credentials via environment variables only (no inline YAML)
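
The first four requirements above can be sketched as a single validation pass. The `settings` struct is a hypothetical flattened view of Config, and the mode value "hipaa" is an assumption; the environment-variable credential check is left out because the commit message does not say how inline credentials are detected.

```go
package main

import (
	"errors"
	"fmt"
)

// settings is a hypothetical flattened view of the fields named in the commit message.
type settings struct {
	ComplianceMode      string
	SnapshotStoreSecure bool
	NetworkMode         string
	SeccompProfile      string
	NoNewPrivileges     bool
}

// validateCompliance collects every violation rather than stopping at the first.
// Assumption: the strict checks apply only when ComplianceMode is "hipaa".
func validateCompliance(s settings) error {
	if s.ComplianceMode != "hipaa" {
		return nil
	}
	var errs []error
	if !s.SnapshotStoreSecure {
		errs = append(errs, errors.New("hipaa: SnapshotStore.Secure must be true"))
	}
	if s.NetworkMode != "none" {
		errs = append(errs, errors.New(`hipaa: NetworkMode must be "none"`))
	}
	if s.SeccompProfile == "" {
		errs = append(errs, errors.New("hipaa: SeccompProfile must not be empty"))
	}
	if !s.NoNewPrivileges {
		errs = append(errs, errors.New("hipaa: NoNewPrivileges must be true"))
	}
	return errors.Join(errs...) // nil when no violations were collected
}

func main() {
	err := validateCompliance(settings{ComplianceMode: "hipaa", NetworkMode: "bridge"})
	fmt.Println(err)
}
```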

Add PHI denylist validation for AllowedSecrets:
- Blocks secrets matching patterns: patient, ssn, mrn, medical_record,
  diagnosis, dob, birth, mrn_number, patient_id, patient_name
- Prevents accidental PHI exfiltration via secret channels
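
A denylist check over the patterns above might look like the following sketch. The function name `validateAllowedSecrets` is hypothetical, and case-insensitive substring matching is an assumption; the commit message does not specify the matching rule.

```go
package main

import (
	"fmt"
	"strings"
)

// phiPatterns lists the denylist terms named in the commit message.
var phiPatterns = []string{
	"patient", "ssn", "mrn", "medical_record", "diagnosis",
	"dob", "birth", "mrn_number", "patient_id", "patient_name",
}

// validateAllowedSecrets (hypothetical helper) rejects any secret name that
// contains a PHI pattern. Assumption: matching is a case-insensitive substring test.
func validateAllowedSecrets(names []string) error {
	for _, name := range names {
		lower := strings.ToLower(name)
		for _, p := range phiPatterns {
			if strings.Contains(lower, p) {
				return fmt.Errorf("secret %q matches PHI pattern %q", name, p)
			}
		}
	}
	return nil
}

func main() {
	fmt.Println(validateAllowedSecrets([]string{"HF_TOKEN", "WANDB_API_KEY"}))
	fmt.Println(validateAllowedSecrets([]string{"HF_TOKEN", "PATIENT_ID"}))
}
```

Substring matching is deliberately broad and can reject harmless names that happen to contain a short pattern such as "dob"; a stricter rule would match on delimited tokens instead.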

Add comprehensive test coverage in hipaa_validation_test.go:
- Network mode enforcement tests
- NoNewPrivileges requirement tests
- Seccomp profile validation tests
- Inline credential rejection tests
- PHI denylist validation tests

Closes: compliance_mode, PHI denylist items from security plan
2026-02-23 19:43:19 -05:00
Jeremie Fraeys
fc2459977c
refactor(worker): update worker tests and native bridge
**Worker Refactoring:**
- Update internal/worker/factory.go, worker.go, snapshot_store.go
- Update native_bridge.go and native_bridge_nocgo.go for native library integration

**Test Updates:**
- Update all worker unit tests for new interfaces
- Update chaos tests
- Update container/podman_test.go
- Add internal/workertest/worker.go for shared test utilities

**Documentation:**
- Update native/README.md
2026-02-23 18:04:22 -05:00
Jeremie Fraeys
fccced6bb3
test(security): add comprehensive security unit tests
Adds 13 security tests across 4 files for hardening verification:

**Path Traversal Tests (path_traversal_test.go):**
- TestSecurePathValidator_ValidRelativePath
- TestSecurePathValidator_PathTraversalBlocked
- TestSecurePathValidator_SymlinkEscape
- Tests symlink resolution and path boundary enforcement

**File Type Validation Tests (filetype_test.go):**
- TestValidateFileType_AllowedTypes
- TestValidateFileType_DangerousTypesBlocked
- TestValidateModelFile
- Tests magic bytes validation and dangerous extension blocking

**Secrets Management Tests (secrets_test.go):**
- TestExpandSecrets_BasicExpansion
- TestExpandSecrets_NestedAndMissingVars
- TestValidateNoPlaintextSecrets_HeuristicDetection
- Tests env variable expansion and plaintext secret detection with entropy

**Audit Logging Tests (audit_test.go):**
- TestAuditLogger_ChainIntegrity
- TestAuditLogger_VerifyChain
- TestAuditLogger_LogFileAccess
- TestAuditLogger_Disabled
- Tests tamper-evident chain hashing and file access logging
2026-02-23 18:00:45 -05:00
Jeremie Fraeys
ab20212d07
test: Update duplicate detection tests
2026-02-23 14:14:21 -05:00
Jeremie Fraeys
3b194ff2e8
feat: GPU detection transparency and artifact scanner improvements
- Surface GPUDetectionInfo from parseGPUCountFromConfig for detection metadata
- Document FETCH_ML_TOTAL_CPU and FETCH_ML_GPU_SLOTS_PER_GPU env vars
- Add debug logging for all env var overrides to stderr
- Track config-layer auto-detection in GPUDetectionInfo.ConfigLayerAutoDetected
- Add --include-all flag to artifact scanner (includeAll parameter)
- Add AMD production mode enforcement (error in non-local mode)
- Add GPU detector unit tests for env overrides and AMD aliasing
2026-02-23 12:29:34 -05:00
Jeremie Fraeys
bf4a8bcf78
test(auth): skip keychain tests when dbus unavailable
2026-02-21 21:20:03 -05:00
Jeremie Fraeys
5f8e7c59a5
fix: resolve undefined DirOverallSHA256HexParallel in benchmark files
- Replace worker.DirOverallSHA256HexParallel with worker.DirOverallSHA256Hex
- Fixes in dataset_hash_bench_test.go and hash_bench_test.go
- All benchmarks pass with native_libs build tag
2026-02-21 14:30:22 -05:00
Jeremie Fraeys
23e5f3d1dc
refactor(api): internal refactoring for TUI and worker modules
- Refactor internal/worker and internal/queue packages
- Update cmd/tui for monitoring interface
- Update test configurations
2026-02-20 15:51:23 -05:00
Jeremie Fraeys
02811c0ffe
fix: resolve TODOs and standardize tests
- Fix duplicate check in security_test.go (lint warning)
- Mark SHA256 tests as Legacy for backward compatibility
- Convert TODO comments to documentation (task, handlers, privacy)
- Update user_manager_test to use GenerateAPIKey pattern
2026-02-19 15:34:59 -05:00
Jeremie Fraeys
27c8b08a16
test: Reorganize and add unit tests
Reorganize tests for better structure and coverage:
- Move container/security_test.go from internal/ to tests/unit/container/
- Move related tests to proper unit test locations
- Delete orphaned test files (startup_blacklist_test.go)
- Add privacy middleware unit tests
- Add worker config unit tests
- Update E2E tests for homelab and websocket scenarios
- Update test fixtures with utility functions
- Add CLI helper script for arraylist fixes
2026-02-18 21:28:13 -05:00
Jeremie Fraeys
0687ffa21f
refactor: move queue spec tests to tests/unit/ and fix test failures
- Move queue_spec_test.go from internal/queue/ to tests/unit/queue/
- Update imports to use github.com/jfraeys/fetch_ml/internal/queue
- Remove duplicate docker-compose.dev.yml from root (exists in deployments/)
- Fix spec tests: add required Status field, JobName field
- Fix loop variable capture in priority ordering test
- Fix missing closing brace between test functions
- Fix existing queue_test.go: change 50ms to 1s for Redis min duration
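
The loop variable capture fix mentioned above usually takes the following shape. This is a generic sketch, not the project's test: the `job` type and `namesViaClosures` are hypothetical. Before Go 1.22 every closure created in a `for` loop shared one loop variable, so a per-iteration copy was needed.

```go
package main

import "fmt"

// job is a hypothetical stand-in for the queue task used in the ordering test.
type job struct {
	name     string
	priority int
}

// namesViaClosures builds one closure per job and then runs them all.
func namesViaClosures(jobs []job) []string {
	var getters []func() string
	for _, j := range jobs {
		j := j // per-iteration copy: needed before Go 1.22, harmless afterwards
		getters = append(getters, func() string { return j.name })
	}
	names := make([]string, 0, len(getters))
	for _, get := range getters {
		names = append(names, get())
	}
	return names
}

func main() {
	jobs := []job{{"high", 10}, {"medium", 5}, {"low", 1}}
	fmt.Println(namesViaClosures(jobs))
}
```

Without the copy, toolchains older than Go 1.22 would have every closure return the name of the last job.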

All tests pass: go test ./tests/unit/queue/...
2026-02-18 15:45:30 -05:00
Jeremie Fraeys
de877a3030
feat: implement WebSocket handler improvements and metrics persistence
- Add websocket_metrics table to SQLite and Postgres schemas
- Create db_metrics.go with RecordMetric, GetMetrics, GetMetricSummary methods
- Integrate metrics persistence into handleLogMetric WebSocket handler
- Remove duplicate db_datasets.go to fix type mismatches
- Move tests to tests/unit/api/ws/ following project structure
- Add payload parsing tests for handleLogMetric, handleGetExperiment, handleStatusRequest
- Update handler.go line count to 541 (still above the 500-line target)
2026-02-18 14:36:05 -05:00
Jeremie Fraeys
8ecdd36155
test(integration): add websocket queue and hash benchmarks
- Add websocket queue integration test
- Add worker hash benchmark test
- Add native detection script
2026-02-18 12:46:06 -05:00
Jeremie Fraeys
2a922542b1
test: fix ws_test.go to use updated NewHandler signature
Updated test file to pass jobs, jupyter, and datasets handlers to NewHandler.
All tests now pass.
2026-02-17 20:51:57 -05:00
Jeremie Fraeys
a1ce267b86
feat: Implement all worker stub methods with real functionality
- VerifySnapshot: SHA256 verification using integrity package
- EnforceTaskProvenance: Strict and best-effort provenance validation
- RunJupyterTask: Full Jupyter service lifecycle (start/stop/remove/restore/list_packages)
- RunJob: Job execution using executor.JobRunner
- PrewarmNextOnce: Prewarming with queue integration

All methods now use new architecture components instead of placeholders
2026-02-17 17:37:56 -05:00
Jeremie Fraeys
a775513037
refactor: Fix test_helpers.go package to worker_test
- Changed package from worker to worker_test to match other test files
- Updated all type references to use worker.* prefix
- Fixed Worker field access to use exported fields (ID, Config, etc.)

Build status: Compiles successfully
2026-02-17 16:57:21 -05:00
Jeremie Fraeys
713dba896c
refactor: Add test compatibility methods to worker package
- Added ComputeTaskProvenance function (delegates to integrity.ProvenanceCalculator)
- Added Worker.VerifyDatasetSpecs method
- Added Worker.EnforceTaskProvenance method (placeholder)
- Added Worker.VerifySnapshot method (placeholder)
- All methods added for backward compatibility with existing tests

Build status: Compiles successfully
2026-02-17 16:55:22 -05:00
Jeremie Fraeys
d8cc2a4efa
refactor: Migrate all test imports from api to api/ws package
Updated 6 test files to use proper api/ws package imports:

1. tests/e2e/websocket_e2e_test.go
   - api.NewWSHandler → ws.NewHandler

2. tests/e2e/wss_reverse_proxy_e2e_test.go
   - api.NewWSHandler → ws.NewHandler

3. tests/integration/ws_handler_integration_test.go
   - api.NewWSHandler → wspkg.NewHandler
   - api.Opcode* → wspkg.Opcode*

4. tests/integration/websocket_queue_integration_test.go
   - api.NewWSHandler → wspkg.NewHandler
   - api.Opcode* → wspkg.Opcode*

5. tests/unit/api/ws_test.go
   - api.NewWSHandler → wspkg.NewHandler
   - api.Opcode* → wspkg.Opcode*

6. tests/unit/api/ws_jobs_args_test.go
   - api.Opcode* → wspkg.Opcode*

Removed api/ws_compat.go shim as all tests now use proper imports.

Build status: Compiles successfully
2026-02-17 13:52:20 -05:00
Jeremie Fraeys
d2ffe042a4
cleanup: Remove obsolete ws_jupyter_errorcode_test.go
Removed tests/unit/jupyter/ws_jupyter_errorcode_test.go which referenced
non-existent api.JupyterTaskErrorCode function.

This test was validating functionality that was removed during Phase 5
API refactoring. The jupyter error code logic is now handled in the
api/jupyter/ package.

Build status: Compiles successfully
2026-02-17 13:45:01 -05:00
Jeremie Fraeys
d1bef0a450
refactor: Phase 3 - fix config/storage boundaries
Move schema ownership to infrastructure layer:

- Redis keys: config/constants.go -> queue/keys.go (TaskQueueKey, TaskPrefix, etc.)
- Filesystem paths: config/paths.go -> storage/paths.go (JobPaths)
- Create config/shared.go with RedisConfig, SSHConfig
- Update all imports: worker/, api/helpers, api/ws_jobs, api/ws_validate
- Clean up: remove duplicates from queue/task.go, queue/queue.go, config/paths.go

Build status: Compiles successfully
2026-02-17 12:49:53 -05:00
Jeremie Fraeys
7305e2bc21
test: add comprehensive test coverage and command improvements
- Add logs and debug end-to-end tests
- Add test helper utilities
- Improve test fixtures and templates
- Update API server and config lint commands
- Add multi-user database initialization
2026-02-16 20:38:15 -05:00
Jeremie Fraeys
2854d3df95
chore(cleanup): remove legacy artifacts and add tooling configs
- Remove .github/ directory (migrated to .forgejo/)
- Remove .local-artifacts/ benchmark results
- Add AGENTS.md for coding assistants
- Add .windsurf/rules/ for development guidelines
- Update .gitignore
2026-02-12 12:06:09 -05:00