fetch_ml

Author	SHA1	Message	Date
Jeremie Fraeys	9b2d5986a3	docs(architecture): add technical documentation for scheduler and security Add comprehensive architecture documentation: - scheduler-architecture.md - Design of distributed job scheduler - Hub coordination model - Gang scheduling algorithm - Service discovery mechanisms - Failure recovery strategies - multi-tenant-security.md - Security isolation patterns - Tenant boundary enforcement - Resource quota management - Cross-tenant data protection - runtime-security.md - Operational security guidelines - Container security configurations - Network policy enforcement - Audit logging requirements	2026-02-26 12:04:33 -05:00
Jeremie Fraeys	685f79c4a7	ci(deploy): add Forgejo workflows and deployment automation Add CI/CD pipelines for Forgejo/GitHub Actions: - build.yml - Main build pipeline with matrix builds - deploy-staging.yml - Automated staging deployment - deploy-prod.yml - Production deployment with rollback support - security-modes-test.yml - Security mode validation tests Add deployment artifacts: - docker-compose.staging.yml for staging environment - ROLLBACK.md with rollback procedures and playbooks Supports multi-environment deployment workflow with proper gates between staging and production.	2026-02-26 12:04:23 -05:00
Jeremie Fraeys	86f9ae5a7e	docs(config): reorganize configuration structure and add documentation Restructure configuration files for better organization: - Add scheduler configuration examples (scheduler.yaml.example) - Reorganize worker configs into subdirectories: - distributed/ - Multi-node cluster configurations - standalone/ - Single-node deployment configs - Add environment-specific configs: - dev-local.yaml, docker-dev.yaml, docker-prod.yaml - homelab-secure.yaml, worker-prod.toml - Add deployment configs for different security modes: - docker-standard.yaml, docker-hipaa.yaml, docker-dev.yaml Add documentation: - configs/README.md with configuration guidelines - configs/SECURITY.md with security configuration best practices	2026-02-26 12:04:11 -05:00
Jeremie Fraeys	95adcba437	feat(worker): add Jupyter/vLLM plugins and process isolation Extend worker capabilities with new execution plugins and security features: - Jupyter plugin for notebook-based ML experiments - vLLM plugin for LLM inference workloads - Cross-platform process isolation (Unix/Windows) - Network policy enforcement with platform-specific implementations - Service manager integration for lifecycle management - Scheduler backend integration for queue coordination Update lifecycle management: - Enhanced runloop with state transitions - Service manager integration for plugin coordination - Improved state persistence and recovery Add test coverage: - Unit tests for Jupyter and vLLM plugins - Updated worker execution tests	2026-02-26 12:03:59 -05:00
Jeremie Fraeys	a981e89005	feat(security): add audit subsystem and tenant isolation Implement comprehensive audit and security infrastructure: - Immutable audit logs with platform-specific backends (Linux/Other) - Sealed log entries with tamper-evident checksums - Audit alert system for real-time security notifications - Log rotation with retention policies - Checkpoint-based audit verification Add multi-tenant security features: - Tenant manager with quota enforcement - Middleware for tenant authentication/authorization - Per-tenant cryptographic key isolation - Supply chain security for container verification - Cross-platform secure file utilities (Unix/Windows) Add test coverage: - Unit tests for audit alerts and sealed logs - Platform-specific audit backend tests	2026-02-26 12:03:45 -05:00
Jeremie Fraeys	43e6446587	feat(scheduler): implement multi-tenant job scheduler with gang scheduling Add new scheduler component for distributed ML workload orchestration: - Hub-based coordination for multi-worker clusters - Pacing controller for rate limiting job submissions - Priority queue with preemption support - Port allocator for dynamic service discovery - Protocol handlers for worker-scheduler communication - Service manager with OS-specific implementations - Connection management and state persistence - Template system for service deployment Includes comprehensive test suite: - Unit tests for all core components - Integration tests for distributed scenarios - Benchmark tests for performance validation - Mock fixtures for isolated testing Refs: scheduler-architecture.md	2026-02-26 12:03:23 -05:00
Jeremie Fraeys	6e0e7d9d2e	fix(smoke-test): copy promtail config file instead of directory Copy just promtail-config.yml to temp root instead of entire monitoring/ directory. This fixes the mount error where promtail couldn't find its config at the expected path.	2026-02-24 11:57:35 -05:00
Jeremie Fraeys	bcc432a524	fix(deployments): use relative paths instead of FETCHML_REPO_ROOT with wrong fallback Replace all .. with proper relative paths: - Build context: Use '.' (current directory = project root when using --project-directory) - Volume mounts: Use './data/...' instead of '../data/...' - Config mounts: Use './configs/...' instead of '../configs/...' The '..' fallback was incorrect - when --project-directory is set to repo root, '..' would point to parent of repo instead of repo itself. Using '.' or './path' correctly resolves relative to project root. Environment variables for data directories (SMOKE_TEST_DATA_DIR, PROD_DATA_DIR, HOMELAB_DATA_DIR, LOCAL_DATA_DIR) are preserved for runtime customization.	2026-02-24 11:53:19 -05:00
Jeremie Fraeys	cebcb6115f	fix(smoke-test): add FETCHML_REPO_ROOT to env file Ensure FETCHML_REPO_ROOT is set in the env file passed to docker-compose. This fixes path resolution so fallback paths don't incorrectly use parent directory.	2026-02-24 11:48:10 -05:00
Jeremie Fraeys	3ff5ef320a	fix(deployments): add HOMELAB_DATA_DIR support to homelab-secure Update docker-compose.homelab-secure.yml to use HOMELAB_DATA_DIR environment variable with fallback to data/homelab for all volume mounts.	2026-02-24 11:43:38 -05:00
Jeremie Fraeys	5691b06876	fix(deployments): add env var support for data directories Update all docker-compose files to use environment variables for data paths: - docker-compose.local.yml: Use LOCAL_DATA_DIR with fallback to ../data/dev - docker-compose.prod.yml: Use PROD_DATA_DIR with fallback to data/prod - docker-compose.prod.smoke.yml: Use SMOKE_TEST_DATA_DIR with fallback This allows smoke tests and local development to use temp directories instead of repo-relative paths, avoiding file sharing permission issues on macOS with Docker Desktop or Colima.	2026-02-24 11:43:11 -05:00
Jeremie Fraeys	ce4106a837	fix(smoke-test): copy monitoring configs to temp directory Promtail mounts monitoring configs from repo root which fails in Colima: - Copy monitoring/ directory to temp SMOKE_TEST_DATA_DIR - Update promtail volume path to use SMOKE_TEST_DATA_DIR for configs - This ensures all mounts are from accessible temp directories	2026-02-24 11:40:32 -05:00
Jeremie Fraeys	225ef5bfb5	fix(smoke-test): use actual env file instead of process substitution Process substitution <(echo ...) doesn't work with docker-compose. Write the env file to an actual temp file instead.	2026-02-24 11:38:18 -05:00
Jeremie Fraeys	bff2336db2	fix(smoke-test): use temp directory for smoke test data Use /tmp for smoke test data to avoid file sharing issues on macOS/Colima: - smoke-test.sh: Create temp dir with mktemp, export SMOKE_TEST_DATA_DIR - docker-compose.dev.yml: Use SMOKE_TEST_DATA_DIR with fallback to data/dev - Remove file sharing permission checks (no longer needed with tmp) This avoids Docker Desktop/Colima file sharing permission issues entirely by using a system temp directory that's always accessible.	2026-02-24 11:37:45 -05:00
Jeremie Fraeys	d3a861063f	fix(smoke-test): add Colima-specific file sharing instructions Detect if user is running Colima and provide appropriate fix instructions: - Check for colima command presence - If Colima detected: suggest virtiofs/sshfs mount options - Show colima.yaml mount configuration example - Include verification command: colima ssh -- ls ... Maintains Docker Desktop instructions for non-Colima users.	2026-02-24 11:35:58 -05:00
Jeremie Fraeys	00f938861c	fix(smoke-test): add Docker file sharing permission check for macOS Add pre-flight check to detect Docker Desktop file sharing issues: - After creating data directories, verify Docker can access them - If access fails, print helpful error message with fix instructions - Directs users to Docker Desktop Settings -> Resources -> File sharing Prevents confusing 'operation not permitted' errors during smoke tests.	2026-02-24 11:35:23 -05:00
Jeremie Fraeys	8a054169ad	fix(docker): skip NVML GPU build for non-GPU systems Dockerfile targets systems without GPUs: - Add -DBUILD_NVML_GPU=OFF to cmake in simple.Dockerfile - Add BUILD_NVML_GPU option to native/CMakeLists.txt (default ON) - Conditionally include nvml_gpu subdirectory - Update all_native_libs target to exclude nvml_gpu when disabled This allows native libraries (dataset_hash, queue_index) to build without requiring NVIDIA drivers/libraries.	2026-02-23 20:47:13 -05:00
Jeremie Fraeys	2a41032414	fix(deployments): fix docker-compose build context paths Fix build context resolution in smoke test scripts: - docker-compose.dev.yml: Use ${FETCHML_REPO_ROOT:-..} for api-server and worker - docker-compose.prod.smoke.yml: Simplify dockerfile path (remove redundant FETCHML_REPO_ROOT) Previously used 'context: ..' which resolved incorrectly when docker-compose was run with --project-directory. Now consistently uses FETCHML_REPO_ROOT env var for proper path resolution in both dev and prod smoke tests.	2026-02-23 20:30:07 -05:00
Jeremie Fraeys	6fc2e373c1	fix: resolve IDE warnings and test errors Bug fixes and cleanup for test infrastructure: - schema_test.go: Fix SchemaVersion reference with proper manifest import - schema_test.go: Update all schema.json paths to internal/manifest location - manifestenv.go: Remove unused helper functions (isArtifactsType, getPackagePath) - nobaredetector.go: Fix exprToString syntax error, remove unused functions All tests now pass without errors or warnings	2026-02-23 20:26:20 -05:00
Jeremie Fraeys	799afb9efa	docs: update coverage map and development documentation Comprehensive documentation updates for 100% test coverage: - TEST_COVERAGE_MAP.md: 49/49 requirements marked complete (100% coverage) - CHANGELOG.md: Document Phase 8 test coverage implementation - DEVELOPMENT.md: Add testing strategy and property-based test guidelines - README.md: Add Testing & Security section with coverage highlights All security and reproducibility requirements now tracked and tested	2026-02-23 20:26:13 -05:00
Jeremie Fraeys	e0aae73cf4	test(phase-7-9): audit verification, fault injection, integration tests Implement V.7, V.9, and integration test requirements: Audit Verification (V.7): - TestAuditVerificationJob: Chain verification and tamper detection Fault Injection (V.9): - TestNVMLUnavailableProvenanceFail, TestManifestWritePartialFailure - TestRedisUnavailableQueueBehavior, TestAuditLogUnavailableHaltsJob - TestConfigHashFailureProvenanceClosed, TestDiskFullDuringArtifactScan Integration Tests: - TestCrossTenantIsolation: Filesystem isolation verification - TestRunManifestReproducibility: Cross-run reproducibility - TestAuditLogPHIRedaction: PHI leak prevention	2026-02-23 20:26:01 -05:00
Jeremie Fraeys	80370e9f4a	test(phase-6): property-based tests with gopter Implement property-based invariant verification: - TestPropertyConfigHashAlwaysPresent: Valid configs produce non-empty hash - TestPropertyConfigHashDeterministic: Same config produces same hash - TestPropertyDetectionSourceAlwaysValid: CreateDetectorWithInfo returns valid source - TestPropertyProvenanceFailClosed: Strict mode fails on incomplete env - TestPropertyScanArtifactsNeverNilEnvironment: Artifacts can hold Environment - TestPropertyManifestEnvironmentSurvivesRoundtrip: Environment survives write/load Uses gopter for property-based testing with deterministic seeds	2026-02-23 20:25:49 -05:00
Jeremie Fraeys	9f9d75dd68	test(phase-4): reproducibility crossover tests Implement reproducibility crossover requirements: - TestManifestEnvironmentCapture: Environment population with ConfigHash and DetectionMethod - TestConfigHashPostDefaults: Hash computation after env expansion and defaults Verifies manifest.Environment is properly populated for reproducibility tracking	2026-02-23 20:25:37 -05:00
Jeremie Fraeys	8f9bcef754	test(phase-3): prerequisite security and reproducibility tests Implement 4 prerequisite test requirements: - TestConfigIntegrityVerification: Config signing, tamper detection, hash stability - TestManifestFilenameNonce: Cryptographic nonce generation and filename patterns - TestGPUDetectionAudit: Structured logging of GPU detection at startup - TestResourceEnvVarParsing: Resource env var parsing and override behavior Also update manifest run_manifest.go: - Add nonce-based filename support to WriteToDir - Add nonce-based file detection to LoadFromDir	2026-02-23 20:25:26 -05:00
Jeremie Fraeys	f71352202e	test(phase-1-2): naming alignment and partial test completion Rename and enhance existing tests to align with coverage map: - TestGPUDetectorAMDVendorAlias -> TestAMDAliasManifestRecord - TestScanArtifacts_SkipsKnownPathsAndLogs -> TestScanExclusionsRecorded - Add env var expansion verification to TestHIPAAValidation_InlineCredentials - Record exclusions in manifest.Artifacts for audit trail	2026-02-23 20:25:07 -05:00
Jeremie Fraeys	a769d9a430	chore(deps): Update dependencies for verification and security features Add dependencies for verification framework: - golang.org/x/tools/go/analysis (custom linting) - Testing framework updates Add dependencies for audit system: - crypto/sha256 for chain hashing - encoding/hex for hash representation All dependencies verified compatible with Go 1.25+ toolchain.	2026-02-23 19:44:41 -05:00
Jeremie Fraeys	b33c6c4878	test(security): Add PHI denylist tests to secrets validation Add comprehensive PHI detection tests: - patient_id rejection - ssn rejection - medical_record_number rejection - diagnosis_code rejection - Mixed secrets with PHI rejection - Normal secrets acceptance (HF_TOKEN, WANDB_API_KEY, etc.) Ensures AllowedSecrets PHI denylist validation works correctly across all PHI pattern variations. Part of: PHI denylist validation from security plan	2026-02-23 19:44:33 -05:00
Jeremie Fraeys	fe75b6e27a	build(verification): Add Makefile targets and CI for verification suite Add verification targets to Makefile: - verify-schema: Check manifest schema hasn't drifted (V.1) - test-schema-validation: Test schema validation with examples - lint-custom: Build and run fetchml-vet analyzers (V.4) - verify-audit: Run audit chain verification tests (V.7) - verify-audit-chain: CLI tool for verifying specific log files - verify-all: Run all verification checks (CI target) - verify-quick: Fast checks for development - verify-full: Comprehensive verification with unit/integration tests Add install targets for verification tools: - install-property-test-deps: gopter for property-based testing (V.2) - install-mutation-test-deps: go-mutesting for mutation testing (V.3) - install-security-scan-deps: gosec, nancy for supply chain (V.6) - install-scorecard: OpenSSF Scorecard (V.10) Add Forgejo CI workflow (.forgejo/workflows/verification.yml): - Runs on every push and PR - Schema drift detection - Custom linting - Audit chain verification - Security scanning integration Add verification documentation (docs/src/verification.md): - V.1: Schema validation details - V.4: Custom linting rules - V.7: Audit chain verification - CI integration guide	2026-02-23 19:44:25 -05:00
Jeremie Fraeys	17d5c75e33	fix(security): Path validation improvements for symlink resolution Fix ValidatePath to correctly resolve symlinks and handle edge cases: - Resolve symlinks before boundary check to prevent traversal - Handle macOS /private prefix correctly - Add fallback for non-existent paths (parent directory resolution) - Double boundary checks: before AND after symlink resolution - Prevent race conditions between check and use Update path traversal tests: - Correct test expectations for "..." (three dots is valid filename, not traversal) - Add tests for symlink escape attempts - Add unicode attack tests - Add deeply nested traversal tests Security impact: Prevents path traversal via symlink following in artifact scanning and other file operations.	2026-02-23 19:44:16 -05:00
Jeremie Fraeys	651318bc93	test(security): Integration tests for sandbox escape and secrets handling Add sandbox escape integration tests: - Container breakout attempts via privileged mode - Host path mounting restrictions - Network namespace isolation verification - Capability dropping validation - Seccomp profile enforcement Add secrets integration tests: - End-to-end credential expansion testing - PHI denylist enforcement in real configs - Environment variable reference resolution - Plaintext secret detection across config boundaries - Secret rotation workflow validation Tests run with real container runtime (Podman/Docker) when available. Provides defense-in-depth beyond unit tests. Part of: security integration testing from security plan	2026-02-23 19:44:07 -05:00
Jeremie Fraeys	90ae9edfff	feat(verification): Custom linting tool (fetchml-vet) for structural invariants Add golang.org/x/tools/go/analysis based linting tool: - fetchml-vet: Custom go vet tool for security invariants Add analyzers for critical security patterns: - noBareDetector: Ensures CreateDetector always captures DetectionInfo (prevents silent metadata loss in GPU detection) - manifestEnv: Validates functions returning Artifacts populate Environment (ensures reproducibility metadata capture) - noInlineCredentials: Detects inline credential patterns in config structs (enforces environment variable references) - hipaaComplete: Validates HIPAA mode configs have all required fields (structural check for compliance completeness) Integration with make lint-custom: - Builds bin/fetchml-vet from tools/fetchml-vet/cmd/fetchml-vet/ - Runs with: go vet -vettool=bin/fetchml-vet ./internal/... Part of: V.4 custom linting from security plan	2026-02-23 19:44:00 -05:00
Jeremie Fraeys	58c1a5fa58	feat(audit): Tamper-evident audit chain verification system Add ChainVerifier for cryptographic audit log verification: - VerifyLogFile(): Validates entire audit chain integrity - Detects tampering at specific event index (FirstTampered) - Returns chain root hash for external verification - GetChainRootHash(): Standalone hash computation - VerifyAndAlert(): Boolean tampering detection with logging Add audit-verifier CLI tool: - Standalone binary for audit chain verification - Takes log path argument and reports tampering Update audit logger for chain integrity: - Each event includes sequence number and hash chain - SHA-256 linking: hash_n = SHA-256(prev_hash || event_n) - Tamper detection through hash chain validation Add comprehensive test coverage: - Empty log handling - Valid chain verification - Tampering detection with modification - Root hash consistency - Alert mechanism tests Part of: V.7 audit verification from security plan	2026-02-23 19:43:50 -05:00
Jeremie Fraeys	4a4d3de8e1	feat(security): Manifest security - nonce generation, environment tracking, schema validation Add cryptographically secure manifest filename nonce generation: - GenerateManifestNonce() creates 16-byte random nonce (32 hex chars) - GenerateManifestFilename() creates unique filenames: run_manifest_<nonce>.json - Prevents enumeration attacks on manifest files Add ExecutionEnvironment struct to manifest: - Captures ConfigHash for reproducibility verification - Records GPU detection method (auto-detected, env override, config, etc.) - Records sandbox settings (NoNewPrivileges, DropAllCaps, NetworkMode) - Records compliance mode and manifest nonce - Records artifact scan exclusions with reason Add JSON Schema validation: - schema.json: Canonical schema for manifest validation - schema_version.go: Schema versioning and compatibility checking - schema_test.go: Drift detection with SHA-256 hash verification - Validates required fields (run_id, environment.config_hash, etc.) - Validates compliance_mode enum values (hipaa, standard) - Validates no negative sizes in artifacts Closes: manifest nonce, environment tracking, scan exclusions from security plan	2026-02-23 19:43:39 -05:00
Jeremie Fraeys	9434f4c8e6	feat(security): Artifact ingestion caps enforcement Add MaxArtifactFiles and MaxArtifactTotalBytes to SandboxConfig: - Default MaxArtifactFiles: 10,000 (configurable via SecurityDefaults) - Default MaxArtifactTotalBytes: 100GB (configurable via SecurityDefaults) - ApplySecurityDefaults() sets defaults if not specified Enforce caps in scanArtifacts() during directory walk: - Returns error immediately when MaxArtifactFiles exceeded - Returns error immediately when MaxArtifactTotalBytes exceeded - Prevents resource exhaustion attacks from malicious artifact trees Update all call sites to pass SandboxConfig for cap enforcement: - Native bridge libs updated to pass caps argument - Benchmark tests updated with nil caps (unlimited for benchmarks) - Unit tests updated with nil caps Closes: artifact ingestion caps items from security plan	2026-02-23 19:43:28 -05:00
Jeremie Fraeys	a8180f1f26	feat(security): HIPAA compliance mode and PHI denylist validation Add compliance_mode field to Config with strict HIPAA validation: - Requires SnapshotStore.Secure=true in HIPAA mode - Requires NetworkMode="none" for tenant isolation - Requires non-empty SeccompProfile - Requires NoNewPrivileges=true - Enforces credentials via environment variables only (no inline YAML) Add PHI denylist validation for AllowedSecrets: - Blocks secrets matching patterns: patient, ssn, mrn, medical_record, diagnosis, dob, birth, mrn_number, patient_id, patient_name - Prevents accidental PHI exfiltration via secret channels Add comprehensive test coverage in hipaa_validation_test.go: - Network mode enforcement tests - NoNewPrivileges requirement tests - Seccomp profile validation tests - Inline credential rejection tests - PHI denylist validation tests Closes: compliance_mode, PHI denylist items from security plan	2026-02-23 19:43:19 -05:00
Jeremie Fraeys	fc2459977c	refactor(worker): update worker tests and native bridge Worker Refactoring: - Update internal/worker/factory.go, worker.go, snapshot_store.go - Update native_bridge.go and native_bridge_nocgo.go for native library integration Test Updates: - Update all worker unit tests for new interfaces - Update chaos tests - Update container/podman_test.go - Add internal/workertest/worker.go for shared test utilities Documentation: - Update native/README.md	2026-02-23 18:04:22 -05:00
Jeremie Fraeys	4b8df60e83	deploy: clean up docker-compose configurations Remove unnecessary service definitions and debug logging configuration from docker-compose files across all deployment environments.	2026-02-23 18:04:09 -05:00
Jeremie Fraeys	305e1b3f2e	ci: update test and benchmark scripts scripts/benchmarks/run-benchmarks-local.sh: - Add support for native library benchmarks scripts/ci/test.sh: - Update CI test commands for new test structure scripts/dev/smoke-test.sh: - Improve smoke test reliability and output	2026-02-23 18:04:01 -05:00
Jeremie Fraeys	be67cb77d3	test(benchmarks): update benchmark tests with job cleanup and improvements Payload Performance Test: - Add job cleanup after each iteration using DeleteJob() - Ensure isolated memory measurements between test runs All Benchmark Tests: - General improvements and maintenance updates	2026-02-23 18:03:54 -05:00
Jeremie Fraeys	54ddab887e	build: update Makefile and Zig build for new targets Makefile: - Add native build targets and test infrastructure - Update benchmark and CI test commands cli/build.zig: - Build configuration updates for CLI compilation	2026-02-23 18:03:47 -05:00
Jeremie Fraeys	a70d8aad8e	refactor: remove dead code and fix unused variables Cleanup: - Delete internal/worker/testutil.go (150 lines of unused test utilities) - Remove unused stateDir() function from internal/jupyter/service_manager.go - Silence unused variable warning in internal/worker/executor/container.go	2026-02-23 18:03:38 -05:00
Jeremie Fraeys	b00439b86e	docs(security): document comprehensive security hardening Updates documentation with new security features and hardening guide: CHANGELOG.md: - Added detailed security hardening section (2026-02-23) - Documents all phases: file ingestion, sandbox, secrets, audit logging, tests - Lists specific files changed and security controls implemented docs/src/security.md: - Added Overview section with defense-in-depth layers - Added Comprehensive Security Hardening section with: - File ingestion security with code examples - Sandbox hardening with complete YAML config - Secrets management with env expansion syntax - HIPAA audit logging with tamper-evident chain hashing	2026-02-23 18:03:25 -05:00
Jeremie Fraeys	fccced6bb3	test(security): add comprehensive security unit tests Adds 13 security tests across 4 files for hardening verification: Path Traversal Tests (path_traversal_test.go): - TestSecurePathValidator_ValidRelativePath - TestSecurePathValidator_PathTraversalBlocked - TestSecurePathValidator_SymlinkEscape - Tests symlink resolution and path boundary enforcement File Type Validation Tests (filetype_test.go): - TestValidateFileType_AllowedTypes - TestValidateFileType_DangerousTypesBlocked - TestValidateModelFile - Tests magic bytes validation and dangerous extension blocking Secrets Management Tests (secrets_test.go): - TestExpandSecrets_BasicExpansion - TestExpandSecrets_NestedAndMissingVars - TestValidateNoPlaintextSecrets_HeuristicDetection - Tests env variable expansion and plaintext secret detection with entropy Audit Logging Tests (audit_test.go): - TestAuditLogger_ChainIntegrity - TestAuditLogger_VerifyChain - TestAuditLogger_LogFileAccess - TestAuditLogger_Disabled - Tests tamper-evident chain hashing and file access logging	2026-02-23 18:00:45 -05:00
Jeremie Fraeys	92aab06d76	feat(security): implement comprehensive security hardening phases 1-5,7 Implements defense-in-depth security for HIPAA and multi-tenant requirements: Phase 1 - File Ingestion Security: - SecurePathValidator with symlink resolution and path boundary enforcement in internal/fileutil/secure.go - Magic bytes validation for ML artifacts (safetensors, GGUF, HDF5, numpy) in internal/fileutil/filetype.go - Dangerous extension blocking (.pt, .pkl, .exe, .sh, .zip) - Upload limits (10GB size, 100MB/s rate, 10 uploads/min) Phase 2 - Sandbox Hardening: - ApplySecurityDefaults() with secure-by-default principle - network_mode: none, read_only_root: true, no_new_privileges: true - drop_all_caps: true, user_ns: true, run_as_uid/gid: 1000 - PodmanSecurityConfig and BuildSecurityArgs() in internal/container/podman.go - BuildPodmanCommand now accepts full security configuration - Container executor passes SandboxConfig to Podman command builder - configs/seccomp/default-hardened.json blocks dangerous syscalls (ptrace, mount, reboot, kexec_load, open_by_handle_at) Phase 3 - Secrets Management: - expandSecrets() for environment variable expansion using ${VAR} syntax - validateNoPlaintextSecrets() with entropy-based detection - Pattern matching for AWS, GitHub, GitLab, OpenAI, Stripe tokens - Shannon entropy calculation (>4 bits/char triggers detection) - Secrets expanded during LoadConfig() before validation Phase 5 - HIPAA Audit Logging: - Tamper-evident chain hashing with SHA-256 in internal/audit/audit.go - Event struct extended with PrevHash, EventHash, SequenceNum - File access event types: EventFileRead, EventFileWrite, EventFileDelete - LogFileAccess() helper for HIPAA compliance - VerifyChain() function for tamper detection Supporting Changes: - Add DeleteJob() and DeleteJobsByPrefix() to storage package - Integrate SecurePathValidator in artifact scanning	2026-02-23 18:00:33 -05:00
Jeremie Fraeys	aed59967b7	fix(make): Reduce profile-ws-queue test count to prevent timeouts Change -count=5 to -count=2 to avoid resource contention 5 sequential runs with 60s timeout each could exceed reasonable time limits	2026-02-23 14:44:34 -05:00
Jeremie Fraeys	ec9e845bb6	fix(test): Fix WebSocketQueue test timeout and race conditions Reduce worker polling interval from 5ms to 1ms for faster task pickup Add 100ms buffer after job submission to allow queue to settle Increase timeout from 30s to 60s to prevent flaky failures Fixes intermittent timeout issues in integration tests	2026-02-23 14:38:18 -05:00
Jeremie Fraeys	551e6d4dbc	fix(make): Create tests/bin directory for CPU profiling output Add @mkdir -p tests/bin to profile-load, profile-load-norate, and profile-ws-queue targets Fixes 'no such file or directory' error when writing CPU profile files	2026-02-23 14:31:08 -05:00
Jeremie Fraeys	7d1ba75092	chore: Update security scan workflow and SQLite build script	2026-02-23 14:24:00 -05:00
Jeremie Fraeys	6d200b5ac2	fix(docker): Use named volume for Redis to fix permission errors Replace bind mount with Docker named volume for Redis data This fixes 'operation not permitted' errors on macOS Docker Desktop where bind mounts fail due to file sharing restrictions	2026-02-23 14:20:23 -05:00
Jeremie Fraeys	0ea2ac00cd	fix(scripts): Create data directories before starting Docker Fix Docker mount permission error by creating data/dev/* directories before docker-compose up, preventing 'operation not permitted' error	2026-02-23 14:17:37 -05:00
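
Commit 58c1a5fa58 above describes the audit chain as hash_n = SHA-256(prev_hash || event_n), with tampering reported at the first affected event index. The following is a minimal Go sketch of that linking rule only, not the repository's ChainVerifier. The names chainHash, buildChain and firstTampered are hypothetical, and two details are assumptions: events are plain strings, and the first event is linked to an empty previous hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// chainHash returns hex(SHA-256(prevHash || event)).
// Hypothetical helper; the real event encoding is not shown in the log.
func chainHash(prevHash, event string) string {
	h := sha256.New()
	h.Write([]byte(prevHash))
	h.Write([]byte(event))
	return hex.EncodeToString(h.Sum(nil))
}

// buildChain returns one hash per event, each linked to the previous hash.
// Assumption: the first event is linked to the empty string.
func buildChain(events []string) []string {
	hashes := make([]string, len(events))
	prev := ""
	for i, e := range events {
		prev = chainHash(prev, e)
		hashes[i] = prev
	}
	return hashes
}

// firstTampered recomputes the chain and returns the index of the first
// event whose stored hash does not match, or -1 if the chain is intact.
func firstTampered(events, hashes []string) int {
	prev := ""
	for i, e := range events {
		if i >= len(hashes) || hashes[i] != chainHash(prev, e) {
			return i
		}
		prev = hashes[i]
	}
	return -1
}

func main() {
	events := []string{"job_started", "file_read", "job_finished"}
	hashes := buildChain(events)
	fmt.Println(firstTampered(events, hashes)) // prints -1 (intact chain)

	events[1] = "file_write" // modify one logged event after the fact
	fmt.Println(firstTampered(events, hashes)) // prints 1
}
```

Because each hash covers the previous one, changing any event invalidates its own stored hash, and rewriting that hash would in turn break every later link. That is why a single root hash can stand in for the whole log.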


309 commits