Commit graph

343 commits

Author SHA1 Message Date
Jeremie Fraeys
7cd86fb88a
feat: add new API handlers, build scripts, and ADRs
Some checks failed
Build Pipeline / Sign HIPAA Config (push) Has been skipped
Build Pipeline / Generate SLSA Provenance (push) Has been skipped
Checkout test / test (push) Successful in 6s
CI Pipeline / Test (ubuntu-latest on self-hosted) (push) Failing after 1s
CI Pipeline / Dev Compose Smoke Test (push) Has been skipped
CI Pipeline / Security Scan (push) Has been skipped
CI Pipeline / Test Scripts (push) Has been skipped
CI Pipeline / Test Native Libraries (push) Has been skipped
CI Pipeline / Native Library Build Matrix (push) Has been skipped
Contract Tests / Spec Drift Detection (push) Failing after 11s
Contract Tests / API Contract Tests (push) Has been skipped
Deploy API Docs / Build API Documentation (push) Failing after 5s
Deploy API Docs / Deploy to GitHub Pages (push) Has been skipped
Documentation / build-and-publish (push) Failing after 40s
Test Matrix / test-native-vs-pure (cgo) (push) Failing after 14s
Test Matrix / test-native-vs-pure (native) (push) Failing after 35s
Test Matrix / test-native-vs-pure (pure) (push) Failing after 18s
CI Pipeline / Trigger Build Workflow (push) Failing after 1s
Build CLI with Embedded SQLite / build (arm64, aarch64-linux) (push) Has been cancelled
Build CLI with Embedded SQLite / build (x86_64, x86_64-linux) (push) Has been cancelled
Build CLI with Embedded SQLite / build-macos (arm64) (push) Has been cancelled
Build CLI with Embedded SQLite / build-macos (x86_64) (push) Has been cancelled
Security Scan / Security Analysis (push) Has been cancelled
Security Scan / Native Library Security (push) Has been cancelled
Verification & Maintenance / V.1 - Schema Drift Detection (push) Has been cancelled
Verification & Maintenance / V.4 - Custom Go Vet Analyzers (push) Has been cancelled
Verification & Maintenance / V.7 - Audit Chain Integrity (push) Has been cancelled
Verification & Maintenance / V.6 - Extended Security Scanning (push) Has been cancelled
Verification & Maintenance / V.10 - OpenSSF Scorecard (push) Has been cancelled
Verification & Maintenance / Verification Summary (push) Has been cancelled
- Introduce audit, plugin, and scheduler API handlers
- Add spec_embed.go for OpenAPI spec embedding
- Create modular build scripts (cli, go, native, cross-platform)
- Add deployment cleanup and health-check utilities
- New ADRs: hot reload, audit store, SSE updates, RBAC, caching, offline mode, KMS regions, tenant offboarding
- Add KMS configuration schema and worker variants
- Include KMS benchmark tests
2026-03-04 13:24:27 -05:00
Jeremie Fraeys
5f53104fcd
test: modernize test suite for streamlined infrastructure
- Update E2E tests for consolidated docker-compose.test.yml
- Remove references to obsolete logs-debug.yml
- Enhance test fixtures and utilities
- Improve integration test coverage for KMS, queue, scheduler
- Update unit tests for config constants and worker execution
- Modernize cleanup-status.sh with new Makefile targets
2026-03-04 13:24:24 -05:00
Jeremie Fraeys
61081655d2
feat: enhance worker execution and scheduler service templates
- Refactor worker configuration management
- Improve container executor lifecycle handling
- Update runloop and worker core logic
- Enhance scheduler service template generation
- Remove obsolete 'scheduler' symlink/directory
2026-03-04 13:24:20 -05:00
Jeremie Fraeys
daf14bfafa
chore: update dependencies and remove obsolete compose files
- Update go.mod and go.sum with latest dependencies
- Remove docker-compose.local.yml and prod.smoke.yml (consolidated)
- Update CI workflow configurations
2026-03-04 13:23:52 -05:00
Jeremie Fraeys
5d75f3576b
docs: comprehensive documentation updates
- Update TEST_COVERAGE_MAP with current requirements
- Refresh ADR-004 with C++ implementation details
- Update architecture, deployment, and security docs
- Improve CLI/TUI UX contract documentation
2026-03-04 13:23:48 -05:00
Jeremie Fraeys
66f262d788
security: improve audit, crypto, and config handling
- Enhance audit checkpoint system
- Update KMS provider and tenant key management
- Refine configuration constants
- Improve TUI config handling
2026-03-04 13:23:42 -05:00
Jeremie Fraeys
a4f2c36069
feat: enhance task domain and scheduler protocol
- Update task domain model
- Improve scheduler hub and priority queue
- Enhance protocol definitions
- Update manifest schema and run handling
2026-03-04 13:23:38 -05:00
Jeremie Fraeys
1f495dfbb7
api: regenerate OpenAPI types and server code
- Update openapi.yaml spec
- Regenerate server_gen.go with oapi-codegen
- Update adapter, routes, and server configuration
2026-03-04 13:23:34 -05:00
Jeremie Fraeys
743bc4be3b
cli: update Zig CLI build and native hash integration
- Update build.zig configuration
- Improve queue command implementation
- Enhance native hash support
2026-03-04 13:23:30 -05:00
Jeremie Fraeys
fd5f2ad672
deploy: remove obsolete deployment files
- Delete Caddyfile.smoke (merged into prod.yml smoke profile)
- Remove deployments/Makefile (use root Makefile instead)
- Remove deploy.sh (use scripts/deploy/deploy.sh)
- Clean up nested configs/worker/ in deployments/
- Update homelab-secure and staging compose files
2026-03-04 13:22:56 -05:00
Jeremie Fraeys
8a7e7695f4
config: consolidate and cleanup configuration files
- Remove redundant config examples (distributed/, standalone/, examples/)
- Delete dev-local.yaml variants (use dev.yaml with env vars)
- Delete prod.yaml (use multi-user.yaml or homelab-secure.yaml)
- Clean up worker configs: remove docker.yaml, homelab-sandbox.yaml
- Update remaining configs with current best practices
- Simplify config schema and documentation
2026-03-04 13:22:52 -05:00
Jeremie Fraeys
7948639b1e
docs: update documentation for streamlined Makefile
- Replace 'make test-full' with 'make test' throughout docs
- Replace 'make self-cleanup' with 'make clean'
- Replace 'make tech-excellence' with 'make complete-suite'
- Replace 'make deploy-up' with 'make dev-up'
- Update docker-compose commands to docker compose v2
- Update CI workflow to use new Makefile targets
2026-03-04 13:22:29 -05:00
Jeremie Fraeys
e08feae6ab
chore: migrate scripts from docker-compose v1 to v2
- Update all scripts to use 'docker compose' instead of 'docker-compose'
- Fix compose file paths after consolidation (test.yml, prod.yml)
- Update cleanup.sh to handle --profile debug and --profile smoke
- Update test fixtures to reference consolidated compose files
2026-03-04 13:22:26 -05:00
Jeremie Fraeys
537faa6825
build: streamline Makefile from 1000+ to ~730 lines
- Replace bloated 1016-line Makefile with focused 730-line version
- Add DC := docker compose variable for v2 compatibility
- Consolidate 5 install-* targets into single install-tools
- Remove redundant targets: self-cleanup, test-full, deploy-up/down/status
- Add missing implementations for all 85 declared .PHONY targets
- Include native lib variants, verification, security, OpenAPI, docs, perf targets
2026-03-04 13:22:21 -05:00
Jeremie Fraeys
98a0d42213
deploy: consolidate docker-compose files using profiles
- Merge logs-debug.yml into test.yml with 'debug' profile
- Merge local.yml into dev.yml with 'local' profile
- Merge prod.smoke.yml into prod.yml with 'smoke' profile
- Reduces compose files from 8 to 5, simplifies maintenance
- Update TEST_COMPOSE to use deployments/docker-compose.test.yml
2026-03-04 13:22:17 -05:00
Jeremie Fraeys
16343e6c2a
test(kms): add comprehensive unit and integration tests
Unit tests for DEK cache:
- Put/Get operations, TTL expiry, LRU eviction
- Tenant isolation, flush/clear, stats, empty DEK rejection

Unit tests for KMS protocol:
- Encrypt/decrypt round-trip with MemoryProvider
- Multi-tenant isolation (wrong key fails MAC verification)
- Cache hit verification, key rotation flow
- Health check protocol

Integration tests with testcontainers:
- VaultProvider with hashicorp/vault:1.15 container
- AWSProvider with localstack/localstack container
- TenantKeyManager end-to-end with MemoryProvider
2026-03-03 19:14:31 -05:00
Jeremie Fraeys
e1ec255ad2
refactor(crypto): integrate KMS with TenantKeyManager
Replace in-memory root keys with KMS interface:
- GenerateDataEncryptionKey: generate DEK, wrap via KMS, cache
- UnwrapDataEncryptionKey: cache check, KMS decrypt, cache store
- EncryptArtifact/DecryptArtifact: use DEK from KMS
- RotateTenantKey: create new KMS key, flush cache
- RevokeTenant: disable KMS key, schedule deletion per ADR-015

Remove deprecated methods: wrapKey, unwrapKey (replaced by KMS)
2026-03-03 19:14:27 -05:00
Jeremie Fraeys
7c03c8b5bd
feat(kms): add HashiCorp Vault and AWS KMS providers
Implement VaultProvider with Transit engine:
- AppRole, Kubernetes, and Token authentication
- Encrypt/Decrypt via /transit/encrypt and /transit/decrypt
- Key lifecycle via /transit/keys API
- Health check via /sys/health

Implement AWSProvider with SDK v2:
- Per-region key naming with alias prefix
- Encrypt/Decrypt via KMS SDK
- Key lifecycle (CreateKey, Disable, ScheduleDeletion, Enable)
- AWS endpoint support for LocalStack testing
2026-03-03 19:14:21 -05:00
Jeremie Fraeys
cb25677695
feat(kms): implement core KMS infrastructure with DEK cache
Add KMSProvider interface for external key management systems:
- Encrypt/Decrypt operations for DEK wrapping
- Key lifecycle management (Create, Disable, ScheduleDeletion, Enable)
- HealthCheck and Close methods

Implement MemoryProvider for development/testing:
- XOR encryption with HMAC-SHA256 authentication
- Secure random key generation using crypto/rand
- MAC verification to detect wrong keys

Implement DEKCache per ADR-012:
- 15-minute TTL with configurable grace window (1 hour)
- LRU eviction with 1000 entry limit
- Cache key includes (tenantID, artifactID, kmsKeyID) for isolation
- Thread-safe operations with RWMutex
- Secure memory wiping on eviction/cleanup

Add config package with types:
- ProviderType enum (vault, aws, memory)
- VaultConfig with AppRole/Kubernetes/Token auth
- AWSConfig with region and alias prefix
- CacheConfig with TTL, MaxEntries, GraceWindow
- Validation methods for all config types
2026-03-03 19:13:55 -05:00
Jeremie Fraeys
da104367d6
feat: add Plugin GPU Quota implementation and tests
Some checks failed
Build Pipeline / Build Binaries (push) Failing after 1m59s
Build Pipeline / Build Docker Images (push) Has been skipped
Build Pipeline / Sign HIPAA Config (push) Has been skipped
Build Pipeline / Generate SLSA Provenance (push) Has been skipped
Checkout test / test (push) Successful in 5s
CI Pipeline / Test (ubuntu-latest on self-hosted) (push) Failing after 1s
CI Pipeline / Dev Compose Smoke Test (push) Has been skipped
CI Pipeline / Security Scan (push) Has been skipped
CI Pipeline / Test Scripts (push) Has been skipped
CI Pipeline / Test Native Libraries (push) Has been skipped
CI Pipeline / Native Library Build Matrix (push) Has been skipped
Documentation / build-and-publish (push) Failing after 35s
CI Pipeline / Trigger Build Workflow (push) Failing after 0s
Security Scan / Security Analysis (push) Has been cancelled
Security Scan / Native Library Security (push) Has been cancelled
Verification & Maintenance / V.1 - Schema Drift Detection (push) Has been cancelled
Verification & Maintenance / V.4 - Custom Go Vet Analyzers (push) Has been cancelled
Verification & Maintenance / V.7 - Audit Chain Integrity (push) Has been cancelled
Verification & Maintenance / V.6 - Extended Security Scanning (push) Has been cancelled
Verification & Maintenance / V.10 - OpenSSF Scorecard (push) Has been cancelled
Verification & Maintenance / Verification Summary (push) Has been cancelled
- Add plugin_quota.go with GPU quota management for scheduler

- Update scheduler hub and protocol for plugin support

- Add comprehensive plugin quota unit tests

- Update gang service and WebSocket queue integration tests
2026-02-26 14:35:05 -05:00
Jeremie Fraeys
ef05f200ba
build: add Scheduler & Services section to Makefile help
- Add new help section for scheduler and service targets

- Document dev-up, dev-down, prod-up, prod-down targets
2026-02-26 14:34:58 -05:00
Jeremie Fraeys
a653a2d0ed
ci: add plugin, quota, and scheduler tests to workflows
- Add plugin quota, service templates, scheduler tests to ci.yml

- Add vLLM plugin and audit logging test steps

- Add plugin configuration validation to security-modes-test.yml:

  - Verify HIPAA mode disables plugins

  - Verify standard mode enables plugins with security

  - Verify dev mode enables plugins with relaxed security
2026-02-26 14:34:49 -05:00
Jeremie Fraeys
b3a0c78903
config: add Plugin GPU Quota, plugins, and audit logging to configs
- Add Plugin GPU Quota config section to scheduler.yaml.example

- Add audit logging config to homelab-secure.yaml (HIPAA-compliant)

- Add Jupyter and vLLM plugin configs to all worker configs:

  - Security settings (passwords, trusted channels, blocked packages)

  - Resource limits (GPU, memory, CPU)

  - Model cache paths and quantization options for vLLM

- Disable plugins in HIPAA deployment mode for compliance

- Update deployments README with plugin services and GPU quotas
2026-02-26 14:34:42 -05:00
Jeremie Fraeys
90ea18555c
docs: add vLLM workflow and cross-link documentation
Some checks failed
Security Scan / Security Analysis (push) Waiting to run
Security Scan / Native Library Security (push) Waiting to run
Verification & Maintenance / V.1 - Schema Drift Detection (push) Waiting to run
Verification & Maintenance / V.4 - Custom Go Vet Analyzers (push) Waiting to run
Verification & Maintenance / V.7 - Audit Chain Integrity (push) Waiting to run
Verification & Maintenance / V.6 - Extended Security Scanning (push) Waiting to run
Verification & Maintenance / V.10 - OpenSSF Scorecard (push) Waiting to run
Verification & Maintenance / Verification Summary (push) Blocked by required conditions
Build Pipeline / Build Binaries (push) Failing after 2m4s
Build Pipeline / Build Docker Images (push) Has been skipped
Build Pipeline / Sign HIPAA Config (push) Has been skipped
Build Pipeline / Generate SLSA Provenance (push) Has been skipped
Checkout test / test (push) Successful in 5s
CI Pipeline / Test (push) Failing after 1s
CI Pipeline / Dev Compose Smoke Test (push) Has been skipped
CI Pipeline / Security Scan (push) Has been skipped
CI Pipeline / Test Scripts (push) Has been skipped
CI Pipeline / Test Native Libraries (push) Has been skipped
CI Pipeline / Native Library Build Matrix (push) Has been skipped
Contract Tests / Spec Drift Detection (push) Failing after 16s
Contract Tests / API Contract Tests (push) Has been skipped
Deploy API Docs / Build API Documentation (push) Failing after 5s
Deploy API Docs / Deploy to GitHub Pages (push) Has been skipped
Documentation / build-and-publish (push) Failing after 44s
CI Pipeline / Trigger Build Workflow (push) Failing after 0s
- Add new vLLM workflow documentation (vllm-workflow.md)
- Update scheduler-architecture.md with Plugin GPU Quota and audit logging
- Add See Also sections to jupyter-workflow.md, quick-start.md,
  configuration-reference.md for better navigation
- Update landing page and index with vLLM and scheduler links
- Cross-link all documentation for improved discoverability
2026-02-26 13:04:39 -05:00
Jeremie Fraeys
8f2495deb0
chore(cleanup): remove obsolete files and update .gitignore
Remove deprecated components replaced by new scheduler:
- Delete internal/controller/pacing_controller.go (replaced by scheduler/pacing.go)
- Delete internal/manifest/schema_test.go (consolidated into tests/unit/)
- Delete internal/workertest/worker.go (consolidated into tests/fixtures/)
- Update .gitignore with scheduler binary and new patterns
2026-02-26 12:09:18 -05:00
Jeremie Fraeys
dddc2913e1
chore(tools): update scripts, native libs, and documentation
Update tooling and documentation:
- Smoke test script with scheduler health checks
- Release cleanup script
- Native test scripts with Redis integration
- TUI SSH test script
- Performance regression detector with scheduler metrics
- Profiler with distributed tracing
- Native CMake with test targets
- Dataset hash tests
- Storage symlink resistance tests
- Configuration reference documentation updates
2026-02-26 12:08:58 -05:00
Jeremie Fraeys
d87c556afa
test(all): update test suite for scheduler and security features
Update comprehensive test coverage:
- E2E tests with scheduler integration
- Integration tests with tenant isolation
- Unit tests with security assertions
- Security tests with audit validation
- Audit verification tests
- Auth tests with tenant scoping
- Config validation tests
- Container security tests
- Worker tests with scheduler mock
- Environment pool tests
- Load tests with distributed patterns
- Test fixtures with scheduler support
- Update go.mod/go.sum with new dependencies
2026-02-26 12:08:46 -05:00
Jeremie Fraeys
c459285cab
chore(deploy): update deployment configs and TUI for scheduler
Update deployment and CLI tooling:
- TUI models (jobs, state) with scheduler data
- TUI store with scheduler endpoints
- TUI config with scheduler settings
- Deployment Makefile with scheduler targets
- Deploy script with scheduler registration
- Docker Compose files with scheduler services
- Remove obsolete Dockerfiles (api-server, full-prod, test)
- Update remaining Dockerfiles with scheduler integration
2026-02-26 12:08:31 -05:00
Jeremie Fraeys
4cdb68907e
refactor(utilities): update supporting modules for scheduler integration
Update utility modules:
- File utilities with secure file operations
- Environment pool with resource tracking
- Error types with scheduler error categories
- Logging with audit context support
- Network/SSH with connection pooling
- Privacy/PII handling with tenant boundaries
- Resource manager with scheduler allocation
- Security monitor with audit integration
- Tracking plugins (MLflow, TensorBoard) with auth
- Crypto signing with tenant keys
- Database init with multi-user support
2026-02-26 12:07:15 -05:00
Jeremie Fraeys
6866ba9366
refactor(queue): integrate scheduler backend and storage improvements
Update queue and storage systems for scheduler integration:
- Queue backend with scheduler coordination
- Filesystem queue with batch operations
- Deduplication with tenant-aware keys
- Storage layer with audit logging hooks
- Domain models (Task, Events, Errors) with scheduler fields
- Database layer with tenant isolation
- Dataset storage with integrity checks
2026-02-26 12:06:46 -05:00
Jeremie Fraeys
6b2c377680
refactor(jupyter): enhance security and scheduler integration
Update Jupyter integration for security and scheduler support:
- Enhanced security configuration with audit logging
- Health monitoring with scheduler event integration
- Package manager with network policy enforcement
- Service manager with lifecycle hooks
- Network manager with tenant isolation
- Workspace metadata with tenant tags
- Config with resource limits
- Podman container integration improvements
- Experiment manager with tracking integration
- Manifest runner with security checks
2026-02-26 12:06:35 -05:00
Jeremie Fraeys
3fb6902fa1
feat(worker): integrate scheduler endpoints and security hardening
Update worker system for scheduler integration:
- Worker server with scheduler registration
- Configuration with scheduler endpoint support
- Artifact handling with integrity verification
- Container executor with supply chain validation
- Local executor enhancements
- GPU detection improvements (cross-platform)
- Error handling with execution context
- Factory pattern for executor instantiation
- Hash integrity with native library support
2026-02-26 12:06:16 -05:00
Jeremie Fraeys
ef11d88a75
refactor(auth): add tenant scoping and permission enhancements
Update authentication system for multi-tenant support:
- API key management with tenant scoping
- Permission checks for multi-tenant operations
- Database layer with tenant isolation
- Keychain integration with audit logging
2026-02-26 12:06:08 -05:00
Jeremie Fraeys
420de879ff
feat(api): integrate scheduler protocol and WebSocket enhancements
Update API layer for scheduler integration:
- WebSocket handlers with scheduler protocol support
- Jobs WebSocket endpoint with priority queue integration
- Validation middleware for scheduler messages
- Server configuration with security hardening
- Protocol definitions for worker-scheduler communication
- Dataset handlers with tenant isolation checks
- Response helpers with audit context
- OpenAPI spec updates for new endpoints
2026-02-26 12:05:57 -05:00
Jeremie Fraeys
9b2d5986a3
docs(architecture): add technical documentation for scheduler and security
Add comprehensive architecture documentation:
- scheduler-architecture.md - Design of distributed job scheduler
  - Hub coordination model
  - Gang scheduling algorithm
  - Service discovery mechanisms
  - Failure recovery strategies

- multi-tenant-security.md - Security isolation patterns
  - Tenant boundary enforcement
  - Resource quota management
  - Cross-tenant data protection

- runtime-security.md - Operational security guidelines
  - Container security configurations
  - Network policy enforcement
  - Audit logging requirements
2026-02-26 12:04:33 -05:00
Jeremie Fraeys
685f79c4a7
ci(deploy): add Forgejo workflows and deployment automation
Add CI/CD pipelines for Forgejo/GitHub Actions:
- build.yml - Main build pipeline with matrix builds
- deploy-staging.yml - Automated staging deployment
- deploy-prod.yml - Production deployment with rollback support
- security-modes-test.yml - Security mode validation tests

Add deployment artifacts:
- docker-compose.staging.yml for staging environment
- ROLLBACK.md with rollback procedures and playbooks

Supports multi-environment deployment workflow with proper
gates between staging and production.
2026-02-26 12:04:23 -05:00
Jeremie Fraeys
86f9ae5a7e
docs(config): reorganize configuration structure and add documentation
Restructure configuration files for better organization:
- Add scheduler configuration examples (scheduler.yaml.example)
- Reorganize worker configs into subdirectories:
  - distributed/ - Multi-node cluster configurations
  - standalone/ - Single-node deployment configs
- Add environment-specific configs:
  - dev-local.yaml, docker-dev.yaml, docker-prod.yaml
  - homelab-secure.yaml, worker-prod.toml
- Add deployment configs for different security modes:
  - docker-standard.yaml, docker-hipaa.yaml, docker-dev.yaml

Add documentation:
- configs/README.md with configuration guidelines
- configs/SECURITY.md with security configuration best practices
2026-02-26 12:04:11 -05:00
Jeremie Fraeys
95adcba437
feat(worker): add Jupyter/vLLM plugins and process isolation
Extend worker capabilities with new execution plugins and security features:
- Jupyter plugin for notebook-based ML experiments
- vLLM plugin for LLM inference workloads
- Cross-platform process isolation (Unix/Windows)
- Network policy enforcement with platform-specific implementations
- Service manager integration for lifecycle management
- Scheduler backend integration for queue coordination

Update lifecycle management:
- Enhanced runloop with state transitions
- Service manager integration for plugin coordination
- Improved state persistence and recovery

Add test coverage:
- Unit tests for Jupyter and vLLM plugins
- Updated worker execution tests
2026-02-26 12:03:59 -05:00
Jeremie Fraeys
a981e89005
feat(security): add audit subsystem and tenant isolation
Implement comprehensive audit and security infrastructure:
- Immutable audit logs with platform-specific backends (Linux/Other)
- Sealed log entries with tamper-evident checksums
- Audit alert system for real-time security notifications
- Log rotation with retention policies
- Checkpoint-based audit verification

Add multi-tenant security features:
- Tenant manager with quota enforcement
- Middleware for tenant authentication/authorization
- Per-tenant cryptographic key isolation
- Supply chain security for container verification
- Cross-platform secure file utilities (Unix/Windows)

Add test coverage:
- Unit tests for audit alerts and sealed logs
- Platform-specific audit backend tests
2026-02-26 12:03:45 -05:00
Jeremie Fraeys
43e6446587
feat(scheduler): implement multi-tenant job scheduler with gang scheduling
Add new scheduler component for distributed ML workload orchestration:
- Hub-based coordination for multi-worker clusters
- Pacing controller for rate limiting job submissions
- Priority queue with preemption support
- Port allocator for dynamic service discovery
- Protocol handlers for worker-scheduler communication
- Service manager with OS-specific implementations
- Connection management and state persistence
- Template system for service deployment

Includes comprehensive test suite:
- Unit tests for all core components
- Integration tests for distributed scenarios
- Benchmark tests for performance validation
- Mock fixtures for isolated testing

Refs: scheduler-architecture.md
2026-02-26 12:03:23 -05:00
Jeremie Fraeys
6e0e7d9d2e
fix(smoke-test): copy promtail config file instead of directory
Some checks failed
Checkout test / test (push) Successful in 5s
CI/CD Pipeline / Test (push) Failing after 1s
CI/CD Pipeline / Dev Compose Smoke Test (push) Has been skipped
CI/CD Pipeline / Build (push) Has been skipped
CI/CD Pipeline / Test Scripts (push) Has been skipped
CI/CD Pipeline / Test Native Libraries (push) Has been skipped
CI/CD Pipeline / GPU Golden Test Matrix (push) Has been skipped
Documentation / build-and-publish (push) Failing after 38s
CI/CD Pipeline / Docker Build (push) Has been skipped
Build CLI with Embedded SQLite / build (arm64, aarch64-linux) (push) Has been cancelled
Build CLI with Embedded SQLite / build (x86_64, x86_64-linux) (push) Has been cancelled
Build CLI with Embedded SQLite / build-macos (arm64) (push) Has been cancelled
Build CLI with Embedded SQLite / build-macos (x86_64) (push) Has been cancelled
Security Scan / Security Analysis (push) Has been cancelled
Security Scan / Native Library Security (push) Has been cancelled
Verification & Maintenance / V.1 - Schema Drift Detection (push) Has been cancelled
Verification & Maintenance / V.4 - Custom Go Vet Analyzers (push) Has been cancelled
Verification & Maintenance / V.7 - Audit Chain Integrity (push) Has been cancelled
Verification & Maintenance / V.6 - Extended Security Scanning (push) Has been cancelled
Verification & Maintenance / V.10 - OpenSSF Scorecard (push) Has been cancelled
Verification & Maintenance / Verification Summary (push) Has been cancelled
Copy just promtail-config.yml to temp root instead of entire monitoring/
directory. This fixes the mount error where promtail couldn't find its
config at the expected path.
2026-02-24 11:57:35 -05:00
Jeremie Fraeys
bcc432a524
fix(deployments): use relative paths instead of FETCHML_REPO_ROOT with wrong fallback
Replace all .. with proper relative paths:

- Build context: Use '.' (current directory = project root when using --project-directory)
- Volume mounts: Use './data/...' instead of '../data/...'
- Config mounts: Use './configs/...' instead of '../configs/...'

The '..' fallback was incorrect - when --project-directory is set to repo root,
'..' would point to parent of repo instead of repo itself. Using '.' or
'./path' correctly resolves relative to project root.

Environment variables for data directories (SMOKE_TEST_DATA_DIR, PROD_DATA_DIR,
HOMELAB_DATA_DIR, LOCAL_DATA_DIR) are preserved for runtime customization.
2026-02-24 11:53:19 -05:00
Jeremie Fraeys
cebcb6115f
fix(smoke-test): add FETCHML_REPO_ROOT to env file
Ensure FETCHML_REPO_ROOT is set in the env file passed to docker-compose.
This fixes path resolution so fallback paths don't incorrectly use parent directory.
2026-02-24 11:48:10 -05:00
Jeremie Fraeys
3ff5ef320a
fix(deployments): add HOMELAB_DATA_DIR support to homelab-secure
Update docker-compose.homelab-secure.yml to use HOMELAB_DATA_DIR
environment variable with fallback to data/homelab for all volume mounts.
2026-02-24 11:43:38 -05:00
Jeremie Fraeys
5691b06876
fix(deployments): add env var support for data directories
Update all docker-compose files to use environment variables for data paths:

- docker-compose.local.yml: Use LOCAL_DATA_DIR with fallback to ../data/dev
- docker-compose.prod.yml: Use PROD_DATA_DIR with fallback to data/prod
- docker-compose.prod.smoke.yml: Use SMOKE_TEST_DATA_DIR with fallback

This allows smoke tests and local development to use temp directories
instead of repo-relative paths, avoiding file sharing permission issues
on macOS with Docker Desktop or Colima.
2026-02-24 11:43:11 -05:00
Jeremie Fraeys
ce4106a837
fix(smoke-test): copy monitoring configs to temp directory
Promtail mounts monitoring configs from repo root which fails in Colima:

- Copy monitoring/ directory to temp SMOKE_TEST_DATA_DIR
- Update promtail volume path to use SMOKE_TEST_DATA_DIR for configs
- This ensures all mounts are from accessible temp directories
2026-02-24 11:40:32 -05:00
Jeremie Fraeys
225ef5bfb5
fix(smoke-test): use actual env file instead of process substitution
Process substitution <(echo ...) doesn't work with docker-compose.
Write the env file to an actual temp file instead.
2026-02-24 11:38:18 -05:00
Jeremie Fraeys
bff2336db2
fix(smoke-test): use temp directory for smoke test data
Use /tmp for smoke test data to avoid file sharing issues on macOS/Colima:

- smoke-test.sh: Create temp dir with mktemp, export SMOKE_TEST_DATA_DIR
- docker-compose.dev.yml: Use SMOKE_TEST_DATA_DIR with fallback to data/dev
- Remove file sharing permission checks (no longer needed with tmp)

This avoids Docker Desktop/Colima file sharing permission issues entirely
by using a system temp directory that's always accessible.
2026-02-24 11:37:45 -05:00
Jeremie Fraeys
d3a861063f
fix(smoke-test): add Colima-specific file sharing instructions
Detect if user is running Colima and provide appropriate fix instructions:

- Check for colima command presence
- If Colima detected: suggest virtiofs/sshfs mount options
- Show colima.yaml mount configuration example
- Include verification command: colima ssh -- ls ...

Maintains Docker Desktop instructions for non-Colima users.
2026-02-24 11:35:58 -05:00
Jeremie Fraeys
00f938861c
fix(smoke-test): add Docker file sharing permission check for macOS
Add pre-flight check to detect Docker Desktop file sharing issues:

- After creating data directories, verify Docker can access them
- If access fails, print helpful error message with fix instructions
- Directs users to Docker Desktop Settings -> Resources -> File sharing

Prevents confusing 'operation not permitted' errors during smoke tests.
2026-02-24 11:35:23 -05:00