fetch_ml

Author	SHA1	Message	Date
Jeremie Fraeys	8f2495deb0	chore(cleanup): remove obsolete files and update .gitignore Remove deprecated components replaced by new scheduler: - Delete internal/controller/pacing_controller.go (replaced by scheduler/pacing.go) - Delete internal/manifest/schema_test.go (consolidated into tests/unit/) - Delete internal/workertest/worker.go (consolidated into tests/fixtures/) - Update .gitignore with scheduler binary and new patterns	2026-02-26 12:09:18 -05:00
Jeremie Fraeys	4cdb68907e	refactor(utilities): update supporting modules for scheduler integration Update utility modules: - File utilities with secure file operations - Environment pool with resource tracking - Error types with scheduler error categories - Logging with audit context support - Network/SSH with connection pooling - Privacy/PII handling with tenant boundaries - Resource manager with scheduler allocation - Security monitor with audit integration - Tracking plugins (MLflow, TensorBoard) with auth - Crypto signing with tenant keys - Database init with multi-user support	2026-02-26 12:07:15 -05:00
Jeremie Fraeys	6866ba9366	refactor(queue): integrate scheduler backend and storage improvements Update queue and storage systems for scheduler integration: - Queue backend with scheduler coordination - Filesystem queue with batch operations - Deduplication with tenant-aware keys - Storage layer with audit logging hooks - Domain models (Task, Events, Errors) with scheduler fields - Database layer with tenant isolation - Dataset storage with integrity checks	2026-02-26 12:06:46 -05:00
Jeremie Fraeys	6b2c377680	refactor(jupyter): enhance security and scheduler integration Update Jupyter integration for security and scheduler support: - Enhanced security configuration with audit logging - Health monitoring with scheduler event integration - Package manager with network policy enforcement - Service manager with lifecycle hooks - Network manager with tenant isolation - Workspace metadata with tenant tags - Config with resource limits - Podman container integration improvements - Experiment manager with tracking integration - Manifest runner with security checks	2026-02-26 12:06:35 -05:00
Jeremie Fraeys	3fb6902fa1	feat(worker): integrate scheduler endpoints and security hardening Update worker system for scheduler integration: - Worker server with scheduler registration - Configuration with scheduler endpoint support - Artifact handling with integrity verification - Container executor with supply chain validation - Local executor enhancements - GPU detection improvements (cross-platform) - Error handling with execution context - Factory pattern for executor instantiation - Hash integrity with native library support	2026-02-26 12:06:16 -05:00
Jeremie Fraeys	ef11d88a75	refactor(auth): add tenant scoping and permission enhancements Update authentication system for multi-tenant support: - API key management with tenant scoping - Permission checks for multi-tenant operations - Database layer with tenant isolation - Keychain integration with audit logging	2026-02-26 12:06:08 -05:00
Jeremie Fraeys	420de879ff	feat(api): integrate scheduler protocol and WebSocket enhancements Update API layer for scheduler integration: - WebSocket handlers with scheduler protocol support - Jobs WebSocket endpoint with priority queue integration - Validation middleware for scheduler messages - Server configuration with security hardening - Protocol definitions for worker-scheduler communication - Dataset handlers with tenant isolation checks - Response helpers with audit context - OpenAPI spec updates for new endpoints	2026-02-26 12:05:57 -05:00
Jeremie Fraeys	95adcba437	feat(worker): add Jupyter/vLLM plugins and process isolation Extend worker capabilities with new execution plugins and security features: - Jupyter plugin for notebook-based ML experiments - vLLM plugin for LLM inference workloads - Cross-platform process isolation (Unix/Windows) - Network policy enforcement with platform-specific implementations - Service manager integration for lifecycle management - Scheduler backend integration for queue coordination Update lifecycle management: - Enhanced runloop with state transitions - Service manager integration for plugin coordination - Improved state persistence and recovery Add test coverage: - Unit tests for Jupyter and vLLM plugins - Updated worker execution tests	2026-02-26 12:03:59 -05:00
Jeremie Fraeys	a981e89005	feat(security): add audit subsystem and tenant isolation Implement comprehensive audit and security infrastructure: - Immutable audit logs with platform-specific backends (Linux/Other) - Sealed log entries with tamper-evident checksums - Audit alert system for real-time security notifications - Log rotation with retention policies - Checkpoint-based audit verification Add multi-tenant security features: - Tenant manager with quota enforcement - Middleware for tenant authentication/authorization - Per-tenant cryptographic key isolation - Supply chain security for container verification - Cross-platform secure file utilities (Unix/Windows) Add test coverage: - Unit tests for audit alerts and sealed logs - Platform-specific audit backend tests	2026-02-26 12:03:45 -05:00
Jeremie Fraeys	43e6446587	feat(scheduler): implement multi-tenant job scheduler with gang scheduling Add new scheduler component for distributed ML workload orchestration: - Hub-based coordination for multi-worker clusters - Pacing controller for rate limiting job submissions - Priority queue with preemption support - Port allocator for dynamic service discovery - Protocol handlers for worker-scheduler communication - Service manager with OS-specific implementations - Connection management and state persistence - Template system for service deployment Includes comprehensive test suite: - Unit tests for all core components - Integration tests for distributed scenarios - Benchmark tests for performance validation - Mock fixtures for isolated testing Refs: scheduler-architecture.md	2026-02-26 12:03:23 -05:00
Jeremie Fraeys	8f9bcef754	test(phase-3): prerequisite security and reproducibility tests Implement 4 prerequisite test requirements: - TestConfigIntegrityVerification: Config signing, tamper detection, hash stability - TestManifestFilenameNonce: Cryptographic nonce generation and filename patterns - TestGPUDetectionAudit: Structured logging of GPU detection at startup - TestResourceEnvVarParsing: Resource env var parsing and override behavior Also update manifest run_manifest.go: - Add nonce-based filename support to WriteToDir - Add nonce-based file detection to LoadFromDir	2026-02-23 20:25:26 -05:00
Jeremie Fraeys	f71352202e	test(phase-1-2): naming alignment and partial test completion Rename and enhance existing tests to align with coverage map: - TestGPUDetectorAMDVendorAlias -> TestAMDAliasManifestRecord - TestScanArtifacts_SkipsKnownPathsAndLogs -> TestScanExclusionsRecorded - Add env var expansion verification to TestHIPAAValidation_InlineCredentials - Record exclusions in manifest.Artifacts for audit trail	2026-02-23 20:25:07 -05:00
Jeremie Fraeys	17d5c75e33	fix(security): Path validation improvements for symlink resolution Fix ValidatePath to correctly resolve symlinks and handle edge cases: - Resolve symlinks before boundary check to prevent traversal - Handle macOS /private prefix correctly - Add fallback for non-existent paths (parent directory resolution) - Double boundary checks: before AND after symlink resolution - Prevent race conditions between check and use Update path traversal tests: - Correct test expectations for "..." (three dots is valid filename, not traversal) - Add tests for symlink escape attempts - Add unicode attack tests - Add deeply nested traversal tests Security impact: Prevents path traversal via symlink following in artifact scanning and other file operations.	2026-02-23 19:44:16 -05:00
Jeremie Fraeys	58c1a5fa58	feat(audit): Tamper-evident audit chain verification system Add ChainVerifier for cryptographic audit log verification: - VerifyLogFile(): Validates entire audit chain integrity - Detects tampering at specific event index (FirstTampered) - Returns chain root hash for external verification - GetChainRootHash(): Standalone hash computation - VerifyAndAlert(): Boolean tampering detection with logging Add audit-verifier CLI tool: - Standalone binary for audit chain verification - Takes log path argument and reports tampering Update audit logger for chain integrity: - Each event includes sequence number and hash chain - SHA-256 linking: hash_n = SHA-256(prev_hash || event_n) - Tamper detection through hash chain validation Add comprehensive test coverage: - Empty log handling - Valid chain verification - Tampering detection with modification - Root hash consistency - Alert mechanism tests Part of: V.7 audit verification from security plan	2026-02-23 19:43:50 -05:00
Jeremie Fraeys	4a4d3de8e1	feat(security): Manifest security - nonce generation, environment tracking, schema validation Add cryptographically secure manifest filename nonce generation: - GenerateManifestNonce() creates 16-byte random nonce (32 hex chars) - GenerateManifestFilename() creates unique filenames: run_manifest_<nonce>.json - Prevents enumeration attacks on manifest files Add ExecutionEnvironment struct to manifest: - Captures ConfigHash for reproducibility verification - Records GPU detection method (auto-detected, env override, config, etc.) - Records sandbox settings (NoNewPrivileges, DropAllCaps, NetworkMode) - Records compliance mode and manifest nonce - Records artifact scan exclusions with reason Add JSON Schema validation: - schema.json: Canonical schema for manifest validation - schema_version.go: Schema versioning and compatibility checking - schema_test.go: Drift detection with SHA-256 hash verification - Validates required fields (run_id, environment.config_hash, etc.) - Validates compliance_mode enum values (hipaa, standard) - Validates no negative sizes in artifacts Closes: manifest nonce, environment tracking, scan exclusions from security plan	2026-02-23 19:43:39 -05:00
Jeremie Fraeys	9434f4c8e6	feat(security): Artifact ingestion caps enforcement Add MaxArtifactFiles and MaxArtifactTotalBytes to SandboxConfig: - Default MaxArtifactFiles: 10,000 (configurable via SecurityDefaults) - Default MaxArtifactTotalBytes: 100GB (configurable via SecurityDefaults) - ApplySecurityDefaults() sets defaults if not specified Enforce caps in scanArtifacts() during directory walk: - Returns error immediately when MaxArtifactFiles exceeded - Returns error immediately when MaxArtifactTotalBytes exceeded - Prevents resource exhaustion attacks from malicious artifact trees Update all call sites to pass SandboxConfig for cap enforcement: - Native bridge libs updated to pass caps argument - Benchmark tests updated with nil caps (unlimited for benchmarks) - Unit tests updated with nil caps Closes: artifact ingestion caps items from security plan	2026-02-23 19:43:28 -05:00
Jeremie Fraeys	a8180f1f26	feat(security): HIPAA compliance mode and PHI denylist validation Add compliance_mode field to Config with strict HIPAA validation: - Requires SnapshotStore.Secure=true in HIPAA mode - Requires NetworkMode="none" for tenant isolation - Requires non-empty SeccompProfile - Requires NoNewPrivileges=true - Enforces credentials via environment variables only (no inline YAML) Add PHI denylist validation for AllowedSecrets: - Blocks secrets matching patterns: patient, ssn, mrn, medical_record, diagnosis, dob, birth, mrn_number, patient_id, patient_name - Prevents accidental PHI exfiltration via secret channels Add comprehensive test coverage in hipaa_validation_test.go: - Network mode enforcement tests - NoNewPrivileges requirement tests - Seccomp profile validation tests - Inline credential rejection tests - PHI denylist validation tests Closes: compliance_mode, PHI denylist items from security plan	2026-02-23 19:43:19 -05:00
Jeremie Fraeys	fc2459977c	refactor(worker): update worker tests and native bridge Worker Refactoring: - Update internal/worker/factory.go, worker.go, snapshot_store.go - Update native_bridge.go and native_bridge_nocgo.go for native library integration Test Updates: - Update all worker unit tests for new interfaces - Update chaos tests - Update container/podman_test.go - Add internal/workertest/worker.go for shared test utilities Documentation: - Update native/README.md	2026-02-23 18:04:22 -05:00
Jeremie Fraeys	a70d8aad8e	refactor: remove dead code and fix unused variables Cleanup: - Delete internal/worker/testutil.go (150 lines of unused test utilities) - Remove unused stateDir() function from internal/jupyter/service_manager.go - Silence unused variable warning in internal/worker/executor/container.go	2026-02-23 18:03:38 -05:00
Jeremie Fraeys	92aab06d76	feat(security): implement comprehensive security hardening phases 1-5,7 Implements defense-in-depth security for HIPAA and multi-tenant requirements: Phase 1 - File Ingestion Security: - SecurePathValidator with symlink resolution and path boundary enforcement in internal/fileutil/secure.go - Magic bytes validation for ML artifacts (safetensors, GGUF, HDF5, numpy) in internal/fileutil/filetype.go - Dangerous extension blocking (.pt, .pkl, .exe, .sh, .zip) - Upload limits (10GB size, 100MB/s rate, 10 uploads/min) Phase 2 - Sandbox Hardening: - ApplySecurityDefaults() with secure-by-default principle - network_mode: none, read_only_root: true, no_new_privileges: true - drop_all_caps: true, user_ns: true, run_as_uid/gid: 1000 - PodmanSecurityConfig and BuildSecurityArgs() in internal/container/podman.go - BuildPodmanCommand now accepts full security configuration - Container executor passes SandboxConfig to Podman command builder - configs/seccomp/default-hardened.json blocks dangerous syscalls (ptrace, mount, reboot, kexec_load, open_by_handle_at) Phase 3 - Secrets Management: - expandSecrets() for environment variable expansion using ${VAR} syntax - validateNoPlaintextSecrets() with entropy-based detection - Pattern matching for AWS, GitHub, GitLab, OpenAI, Stripe tokens - Shannon entropy calculation (>4 bits/char triggers detection) - Secrets expanded during LoadConfig() before validation Phase 5 - HIPAA Audit Logging: - Tamper-evident chain hashing with SHA-256 in internal/audit/audit.go - Event struct extended with PrevHash, EventHash, SequenceNum - File access event types: EventFileRead, EventFileWrite, EventFileDelete - LogFileAccess() helper for HIPAA compliance - VerifyChain() function for tamper detection Supporting Changes: - Add DeleteJob() and DeleteJobsByPrefix() to storage package - Integrate SecurePathValidator in artifact scanning	2026-02-23 18:00:33 -05:00
Jeremie Fraeys	3b194ff2e8	feat: GPU detection transparency and artifact scanner improvements - Surface GPUDetectionInfo from parseGPUCountFromConfig for detection metadata - Document FETCH_ML_TOTAL_CPU and FETCH_ML_GPU_SLOTS_PER_GPU env vars - Add debug logging for all env var overrides to stderr - Track config-layer auto-detection in GPUDetectionInfo.ConfigLayerAutoDetected - Add --include-all flag to artifact scanner (includeAll parameter) - Add AMD production mode enforcement (error in non-local mode) - Add GPU detector unit tests for env overrides and AMD aliasing	2026-02-23 12:29:34 -05:00
Jeremie Fraeys	1b0781dc68	fix(auth): make DeleteAPIKey resilient to keyring errors DeleteAPIKey now ignores primary keyring errors (e.g., dbus unavailable) and always cleans up the fallback store	2026-02-21 21:19:46 -05:00
Jeremie Fraeys	be39b37aec	feat: native GPU detection and NVML bridge for macOS and Linux - Add dynamic NVML loading for Linux GPU detection - Add macOS GPU detection via IOKit framework - Add Zig NVML wrapper for cross-platform GPU queries - Update native bridge to support platform-specific GPU libs - Add CMake support for NVML dynamic library	2026-02-21 17:59:59 -05:00
Jeremie Fraeys	05b7af6991	feat: implement NVML-based GPU monitoring - Add native/nvml_gpu/ C++ library wrapping NVIDIA Management Library - Add Go bindings in internal/worker/gpu_nvml_native.go and gpu_nvml_stub.go - Update gpu_detector.go to use NVML for accurate GPU count detection - Update native/CMakeLists.txt to build nvml_gpu library - Provides real-time GPU utilization, memory, temperature, clocks, power - Falls back to environment variable when NVML unavailable	2026-02-21 15:16:09 -05:00
Jeremie Fraeys	e557313e08	fix: context reuse benchmark uses temp directory - Replace hardcoded testdata path with b.TempDir() - Add createSmallDataset helper for self-contained benchmarks - Fixes FAIL: BenchmarkContextReuse / BenchmarkSequentialHashes	2026-02-21 14:38:00 -05:00
Jeremie Fraeys	158c525bef	fix: resolve benchmark and build tag conflicts - Remove duplicate hash_selector.go (build tags handle switching) - Fix benchmark to use worker.DirOverallSHA256Hex - Fix snapshot_store.go to use integrity.DirOverallSHA256Hex directly - Native tests pass, benchmarks now correctly test native vs Go	2026-02-21 14:26:48 -05:00
Jeremie Fraeys	90d702823b	fix: correct C type cast and add context reuse benchmark - Fix C.uint32_t cast for runtime.NumCPU() in native_bridge_libs.go - Add context_reuse_bench_test.go to verify performance gains - All native tests pass (8/8) - Benchmarks functional	2026-02-21 14:20:40 -05:00
Jeremie Fraeys	d1ac558107	perf: implement context reuse Go Worker (internal/worker/native_bridge_libs.go): - Add global hashCtx with sync.Once for lazy initialization - Eliminates 5-20ms fh_init/fh_cleanup per hash operation - Uses runtime.NumCPU() for optimal thread count - Log initialization time for observability Zig CLI (cli/src/native/hash.zig): - Add global_ctx with atomic flag and mutex - Thread-safe initialization with double-check pattern - Idempotent init() callable from multiple threads - Log init time for debugging	2026-02-21 14:19:14 -05:00
Jeremie Fraeys	48d00b8322	feat: integrate native queue backend into worker and API - Add QueueBackendNative constant to backend.go - Add case for native queue in NewBackend() switch - Native queue uses same FilesystemPath config - Build tag -tags native_libs enables native implementation Native library integration now complete: - dataset_hash: Worker (hash_selector), CLI (verify auto-hash) - queue_index: Worker/API (backend selection with 'native' type)	2026-02-21 14:11:10 -05:00
Jeremie Fraeys	c89d970210	refactor: migrate from env var to build tags for native libs Replace FETCHML_NATIVE_LIBS=1 environment variable with -tags native_libs: Changes: - internal/queue/native_queue.go: UseNativeQueue is now const true - internal/queue/native_queue_stub.go: UseNativeQueue is now const false - build/docker/simple.Dockerfile: Add -tags native_libs to go build - deployments/docker-compose.dev.yml: Remove FETCHML_NATIVE_LIBS env var - native/README.md: Update documentation for build tags - scripts/test-native-with-redis.sh: New test script with Redis via docker-compose Benefits: - Compile-time enforcement (no runtime checks needed) - Cleaner deployment (no env var management) - Type safety (const vs var) - Simpler testing with docker-compose Redis integration	2026-02-21 13:43:58 -05:00
Jeremie Fraeys	23e5f3d1dc	refactor(api): internal refactoring for TUI and worker modules - Refactor internal/worker and internal/queue packages - Update cmd/tui for monitoring interface - Update test configurations	2026-02-20 15:51:23 -05:00
Jeremie Fraeys	6028779239	feat: update CLI, TUI, and security documentation - Add safety checks to Zig build - Add TUI with job management and narrative views - Add WebSocket support and export services - Add smart configuration defaults - Update API routes with security headers - Update SECURITY.md with comprehensive policy - Add Makefile security scanning targets	2026-02-19 15:35:05 -05:00
Jeremie Fraeys	02811c0ffe	fix: resolve TODOs and standardize tests - Fix duplicate check in security_test.go lint warning - Mark SHA256 tests as Legacy for backward compatibility - Convert TODO comments to documentation (task, handlers, privacy) - Update user_manager_test to use GenerateAPIKey pattern	2026-02-19 15:34:59 -05:00
Jeremie Fraeys	37aad7ae87	feat: add manifest signing and native hashing support - Integrate RunManifest.Validate with existing Validator - Add manifest Sign() and Verify() methods - Add native C++ hashing libraries (dataset_hash, queue_index) - Add native bridge for Go/C++ integration - Add deduplication support in queue	2026-02-19 15:34:39 -05:00
Jeremie Fraeys	a3f9bf8731	feat: implement tamper-evident audit logging - Add hash-chained audit log entries for tamper detection - Add EventRecorder interface for structured event logging - Add TaskEvent helper method for consistent event emission	2026-02-19 15:34:28 -05:00
Jeremie Fraeys	e4d286f2e5	feat: add security monitoring and validation framework - Implement anomaly detection monitor (brute force, path traversal, etc.) - Add input validation framework with safety rules - Add environment-based secrets manager with redaction - Add security test suite for path traversal and injection - Add CI security scanning workflow	2026-02-19 15:34:25 -05:00
Jeremie Fraeys	34aaba8f17	feat: implement Argon2id hashing and Ed25519 manifest signing - Add Argon2id-based API key hashing with salt support - Implement Ed25519 manifest signing (key generation, sign, verify) - Add gen-keys CLI tool for manifest signing keys - Fix hash-key command to hash provided key (not generate new one) - Complete isHex helper function	2026-02-19 15:34:20 -05:00
Jeremie Fraeys	27c8b08a16	test: Reorganize and add unit tests Reorganize tests for better structure and coverage: - Move container/security_test.go from internal/ to tests/unit/container/ - Move related tests to proper unit test locations - Delete orphaned test files (startup_blacklist_test.go) - Add privacy middleware unit tests - Add worker config unit tests - Update E2E tests for homelab and websocket scenarios - Update test fixtures with utility functions - Add CLI helper script for arraylist fixes	2026-02-18 21:28:13 -05:00
Jeremie Fraeys	4756348c48	feat: Worker sandboxing and security configuration Add security hardening features for worker execution: - Worker config with sandboxing options (network_mode, read_only, secrets) - Execution setup with security context propagation - Podman container runtime security enhancements - Security configuration management in config package - Add homelab-sandbox.yaml example configuration Supports running jobs in isolated, restricted environments.	2026-02-18 21:27:59 -05:00
Jeremie Fraeys	cb826b74a3	feat: WebSocket API infrastructure improvements Enhance WebSocket client and server components: - Add new WebSocket opcodes (CompareRuns, FindRuns, ExportRun, SetRunOutcome) - Improve WebSocket client with additional response handlers - Add crypto utilities for secure WebSocket communications - Add I/O utilities for WebSocket payload handling - Enhance validation for WebSocket message payloads - Update routes for new WebSocket endpoints - Improve monitor and validate command WebSocket integrations	2026-02-18 21:27:48 -05:00
Jeremie Fraeys	aaeef69bab	feat: Privacy and PII detection Add privacy protection features to prevent accidental PII leakage: - PII detection engine supporting emails, phone numbers, SSNs, credit cards - CLI privacy command for scanning files and text - Privacy middleware for API request/response filtering - Suggestion utility for privacy-preserving alternatives Integrates PII scanning into manifest validation for narrative fields.	2026-02-18 21:27:23 -05:00
Jeremie Fraeys	260e18499e	feat: Research features - narrative fields and outcome tracking Add comprehensive research context tracking to jobs: - Narrative fields: hypothesis, context, intent, expected_outcome - Experiment groups and tags for organization - Run comparison (compare command) for diff analysis - Run search (find command) with criteria filtering - Run export (export command) for data portability - Outcome setting (outcome command) for experiment validation Update queue and requeue commands to support narrative fields. Add narrative validation to manifest validator. Add WebSocket handlers for compare, find, export, and outcome operations. Includes E2E tests for phase 2 features.	2026-02-18 21:27:05 -05:00
Jeremie Fraeys	38b6c3323a	refactor: adopt PathRegistry in jupyter workspace_metadata.go Update internal/jupyter/workspace_metadata.go to use centralized PathRegistry: Changes: - Add import for internal/config package - Update saveMetadata() to use config.FromEnv() for directory creation - Replace os.MkdirAll with paths.EnsureDir() for metadata directory Benefits: - Consistent directory creation via PathRegistry - Centralized path management for workspace metadata - Better error handling for directory creation	2026-02-18 16:58:36 -05:00
Jeremie Fraeys	d9ed8f4ffa	refactor: adopt PathRegistry in queue filesystem_queue.go Update internal/queue/filesystem_queue.go to use centralized PathRegistry: Changes: - Add import for internal/config package - Update NewFilesystemQueue to use config.FromEnv() for directory creation - Replace os.MkdirAll with paths.EnsureDir() for all queue directories: - pending/entries - running - finished - failed Benefits: - Consistent directory creation via PathRegistry - Centralized path management for queue storage - Better error handling for directory creation	2026-02-18 16:57:45 -05:00
Jeremie Fraeys	f7afb36a7c	refactor: adopt PathRegistry in execution/setup.go Update internal/worker/execution/setup.go to use centralized PathRegistry: Changes: - Add import for internal/config package - Update SetupJobDirectories to use config.FromEnv() for directory creation - Replace all os.MkdirAll calls with paths.EnsureDir() - pendingDir creation - jobDir creation - outputDir (running) creation Benefits: - Consistent directory creation via PathRegistry - Centralized path management for job execution directories - Better error handling for directory creation failures	2026-02-18 16:57:04 -05:00
Jeremie Fraeys	33b893a71a	refactor: adopt PathRegistry in worker snapshot_store.go Update internal/worker/snapshot_store.go to use centralized PathRegistry: Changes: - Add import for internal/config package - Update ResolveSnapshot to use config.FromEnv() for directory creation - Replace os.MkdirAll with paths.EnsureDir() for tmpRoot - Replace os.MkdirAll with paths.EnsureDir() for extractDir - Replace os.MkdirAll with paths.EnsureDir() for cacheDir parent Benefits: - Consistent directory creation via PathRegistry - Centralized path management for snapshot storage - Better error handling for directory creation	2026-02-18 16:56:27 -05:00
Jeremie Fraeys	a5059c5231	refactor: adopt PathRegistry in worker config Update internal/worker/config.go to use centralized PathRegistry: Changes: - Initialize PathRegistry with config.FromEnv() in LoadConfig - Update BasePath default to use paths.ExperimentsDir() - Update DataDir default to use paths.DataDir() - Simplify DataDir logic by using PathRegistry directly Benefits: - Consistent directory locations via PathRegistry - Centralized path management across worker and api-server - Simpler configuration with fewer conditional branches	2026-02-18 16:55:18 -05:00
Jeremie Fraeys	4bee42493b	refactor: adopt PathRegistry in api server_config.go Update internal/api/server_config.go to use centralized PathRegistry: Changes: - Update EnsureLogDirectory() to use config.FromEnv().LogDir() with EnsureDir() - Update Validate() to use PathRegistry for default BasePath and DataDir - Remove hardcoded /tmp/ml-experiments default - Use paths.ExperimentsDir() and paths.DataDir() for consistent paths Benefits: - Consistent directory locations via PathRegistry - Centralized directory creation with EnsureDir() - Better error handling for directory creation	2026-02-18 16:54:24 -05:00
Jeremie Fraeys	2101e4a01c	refactor: adopt PathRegistry in experiment manager Update internal/experiment/manager.go to use centralized PathRegistry: Changes: - Add import for internal/config package - Add NewManagerFromPaths() constructor using PathRegistry - Update Initialize() to use config.FromEnv().ExperimentsDir() with EnsureDir() - Update archiveExperiment() to use PathRegistry pattern Benefits: - Consistent experiment directory location via PathRegistry - Centralized directory creation with EnsureDir() - Backward compatible: existing NewManager() still works - New code can use NewManagerFromPaths() for PathRegistry integration	2026-02-18 16:53:41 -05:00
Jeremie Fraeys	3e744bf312	refactor: adopt PathRegistry in jupyter service_manager.go Update internal/jupyter/service_manager.go to use centralized PathRegistry: Changes: - Import config package for PathRegistry access - Update stateDir() to use config.FromEnv().JupyterStateDir() - Update workspaceBaseDir() to use config.FromEnv().ActiveDataDir() - Update trashBaseDir() to use config.FromEnv().JupyterStateDir() - Update NewServiceManager() to use PathRegistry for workspace metadata file - Update loadServices() to use PathRegistry for services file path - Update saveServices() to use PathRegistry with EnsureDir() - Rename parameter 'config' to 'svcConfig' to avoid shadowing import Benefits: - Consistent path management across codebase - Centralized directory creation with EnsureDir() - Environment variable override still supported (backward compatible) - Proper error handling for directory creation failures	2026-02-18 16:52:03 -05:00

109 commits
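
Illustration for commit 58c1a5fa58: the message gives the linking rule hash_n = SHA-256(prev_hash || event_n) and says verification reports the first tampered event index. The following is a minimal Go sketch of that rule only, not the repository's ChainVerifier. The names chainHash, buildChain and firstTampered are hypothetical, and the empty initial prev_hash and the use of plain strings as events are assumptions made for the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// chainHash computes SHA-256(prevHash || event), the linking rule quoted
// in the commit message.
func chainHash(prevHash, event []byte) []byte {
	h := sha256.New()
	h.Write(prevHash)
	h.Write(event)
	return h.Sum(nil)
}

// buildChain returns one hex-encoded hash per event.
// Assumption: the first event is linked to an empty prev hash.
func buildChain(events []string) []string {
	hashes := make([]string, len(events))
	var prev []byte
	for i, e := range events {
		prev = chainHash(prev, []byte(e))
		hashes[i] = hex.EncodeToString(prev)
	}
	return hashes
}

// firstTampered recomputes the chain and returns the index of the first
// event whose stored hash does not match, or -1 if every event matches.
func firstTampered(events, hashes []string) int {
	var prev []byte
	for i, e := range events {
		if i >= len(hashes) {
			return i // an event with no stored hash counts as a mismatch
		}
		want := chainHash(prev, []byte(e))
		if hex.EncodeToString(want) != hashes[i] {
			return i
		}
		prev = want
	}
	return -1
}

func main() {
	events := []string{"login", "file_read", "logout"}
	hashes := buildChain(events)

	// Modify the second event after the hashes were recorded.
	events[1] = "file_write"
	fmt.Println("first tampered index:", firstTampered(events, hashes))
}
```

Because each hash depends on the previous one, changing any event invalidates its own hash and, if the stored hashes were regenerated from that point, every later hash as well; the last hash can therefore serve as a root value to compare against an externally stored copy.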