Commit graph

14 commits

Author SHA1 Message Date
Jeremie Fraeys
b3a0c78903
config: add Plugin GPU Quota, plugins, and audit logging to configs
- Add Plugin GPU Quota config section to scheduler.yaml.example

- Add audit logging config to homelab-secure.yaml (HIPAA-compliant)

- Add Jupyter and vLLM plugin configs to all worker configs:

  - Security settings (passwords, trusted channels, blocked packages)

  - Resource limits (GPU, memory, CPU)

  - Model cache paths and quantization options for vLLM

- Disable plugins in HIPAA deployment mode for compliance

- Update deployments README with plugin services and GPU quotas
2026-02-26 14:34:42 -05:00
Jeremie Fraeys
86f9ae5a7e
docs(config): reorganize configuration structure and add documentation
Restructure configuration files for better organization:
- Add scheduler configuration examples (scheduler.yaml.example)
- Reorganize worker configs into subdirectories:
  - distributed/ - Multi-node cluster configurations
  - standalone/ - Single-node deployment configs
- Add environment-specific configs:
  - dev-local.yaml, docker-dev.yaml, docker-prod.yaml
  - homelab-secure.yaml, worker-prod.toml
- Add deployment configs for different security modes:
  - docker-standard.yaml, docker-hipaa.yaml, docker-dev.yaml

Add documentation:
- configs/README.md with configuration guidelines
- configs/SECURITY.md with security configuration best practices
2026-02-26 12:04:11 -05:00
Jeremie Fraeys
92aab06d76
feat(security): implement comprehensive security hardening phases 1-5,7
Implements defense-in-depth security for HIPAA and multi-tenant requirements:

**Phase 1 - File Ingestion Security:**
- SecurePathValidator with symlink resolution and path boundary enforcement
  in internal/fileutil/secure.go
- Magic bytes validation for ML artifacts (safetensors, GGUF, HDF5, numpy)
  in internal/fileutil/filetype.go
- Dangerous extension blocking (.pt, .pkl, .exe, .sh, .zip)
- Upload limits (10GB size, 100MB/s rate, 10 uploads/min)

**Phase 2 - Sandbox Hardening:**
- ApplySecurityDefaults() with secure-by-default principle
  - network_mode: none, read_only_root: true, no_new_privileges: true
  - drop_all_caps: true, user_ns: true, run_as_uid/gid: 1000
- PodmanSecurityConfig and BuildSecurityArgs() in internal/container/podman.go
- BuildPodmanCommand now accepts full security configuration
- Container executor passes SandboxConfig to Podman command builder
- configs/seccomp/default-hardened.json blocks dangerous syscalls
  (ptrace, mount, reboot, kexec_load, open_by_handle_at)

**Phase 3 - Secrets Management:**
- expandSecrets() for environment variable expansion using ${VAR} syntax
- validateNoPlaintextSecrets() with entropy-based detection
- Pattern matching for AWS, GitHub, GitLab, OpenAI, Stripe tokens
- Shannon entropy calculation (>4 bits/char triggers detection)
- Secrets expanded during LoadConfig() before validation

**Phase 5 - HIPAA Audit Logging:**
- Tamper-evident chain hashing with SHA-256 in internal/audit/audit.go
- Event struct extended with PrevHash, EventHash, SequenceNum
- File access event types: EventFileRead, EventFileWrite, EventFileDelete
- LogFileAccess() helper for HIPAA compliance
- VerifyChain() function for tamper detection

**Supporting Changes:**
- Add DeleteJob() and DeleteJobsByPrefix() to storage package
- Integrate SecurePathValidator in artifact scanning
2026-02-23 18:00:33 -05:00
Jeremie Fraeys
6028779239
feat: update CLI, TUI, and security documentation
- Add safety checks to Zig build
- Add TUI with job management and narrative views
- Add WebSocket support and export services
- Add smart configuration defaults
- Update API routes with security headers
- Update SECURITY.md with comprehensive policy
- Add Makefile security scanning targets
2026-02-19 15:35:05 -05:00
Jeremie Fraeys
4756348c48
feat: Worker sandboxing and security configuration
Add security hardening features for worker execution:
- Worker config with sandboxing options (network_mode, read_only, secrets)
- Execution setup with security context propagation
- Podman container runtime security enhancements
- Security configuration management in config package
- Add homelab-sandbox.yaml example configuration

Supports running jobs in isolated, restricted environments.
2026-02-18 21:27:59 -05:00
Jeremie Fraeys
8b4e1753d1
chore: update configurations and deployment files
- Add Redis secure configuration
- Update worker configurations for homelab and Docker
- Add Forgejo workflow configurations
- Update docker-compose files with improved networking
- Add Caddy configurations for different environments
2026-02-16 20:38:19 -05:00
Jeremie Fraeys
2209ae24c6
chore(config): update configurations and deployment scripts
- Update API server and worker config schemas
- Refine Docker Compose configurations (dev/prod)
- Update deployment scripts and documentation
2026-02-12 12:05:37 -05:00
Jeremie Fraeys
f726806770 chore(ops): reorganize deployments/monitoring and remove legacy scripts 2026-01-05 12:31:26 -05:00
Jeremie Fraeys
cd5640ebd2 Slim and secure: move scripts, clean configs, remove secrets
- Move ci-test.sh and setup.sh to scripts/
- Trim docs/src/zig-cli.md to current structure
- Replace hardcoded secrets with placeholders in configs
- Update .gitignore to block .env*, secrets/, keys, build artifacts
- Slim README.md to reflect current CLI/TUI split
- Add cleanup trap to ci-test.sh
- Ensure no secrets are committed
2025-12-07 13:57:51 -05:00
Jeremie Fraeys
83ba2f3415 Fix multi-user authentication and WebSocket issues
- Fix CLI WebSocket port (9101 vs 9103) in both status and authenticateUser
- Add researcher_user and analyst_user to server config with proper permissions
- Fix API key hashes for all users (complete 64-char SHA256)
- Enable IP whitelist with localhost and private network ranges
- Fix memory leaks in WebSocket handshake (proper key cleanup)
- Fix binary character display in server responses
- All authentication tests now pass: admin, researcher, analyst

Status: Multi-user authentication fully functional
2025-12-06 13:38:08 -05:00
Jeremie Fraeys
7125dc3ab8 Partially fix API server and CLI connection
- Add Redis configuration to local config
- Fix API key format (api_keys vs apikeys)
- Update CLI to use port 9101
- Disable IP whitelist for testing
- Server now connects to Redis and authenticates
- WebSocket connection reaches server but handshake fails
- CLI needs WebSocket protocol implementation fix

Status: Server running, auth working, WebSocket handshake needs debugging
2025-12-06 13:19:07 -05:00
Jeremie Fraeys
5a19358d00 Organize configs and scripts, create testing protocol
- Reorganize configs into environments/, workers/, deprecated/ folders
- Reorganize scripts into testing/, deployment/, maintenance/, benchmarks/ folders
- Add comprehensive testing guide documentation
- Add new Makefile targets: test-full, test-auth, test-status
- Update script paths in Makefile to match new organization
- Create testing protocol documentation
- Add cleanup status checking functionality

Testing framework now includes:
- Quick authentication tests (make test-auth)
- Full test suite runner (make test-full)
- Cleanup status monitoring (make test-status)
- Comprehensive documentation and troubleshooting guides
2025-12-06 13:08:15 -05:00
Jeremie Fraeys
ea15af1833 Fix multi-user authentication and clean up debug code
- Fix YAML tags in auth config struct (json -> yaml)
- Update CLI configs to use pre-hashed API keys
- Remove double hashing in WebSocket client
- Fix port mapping (9102 -> 9103) in CLI commands
- Update permission keys to use jobs:read, jobs:create, etc.
- Clean up all debug logging from CLI and server
- All user roles now authenticate correctly:
  * Admin: Can queue jobs and see all jobs
  * Researcher: Can queue jobs and see own jobs
  * Analyst: Can see status (read-only access)

Multi-user authentication is now fully functional.
2025-12-06 12:35:32 -05:00
Jeremie Fraeys
3de1e6e9ab feat: add comprehensive configuration and deployment infrastructure
- Add development and production configuration templates
- Include Docker build files for containerized deployment
- Add Nginx configuration with SSL/TLS setup
- Include environment configuration examples
- Add SSL certificate setup and management
- Configure application schemas and validation
- Support for both local and production deployment scenarios

Provides flexible deployment options from development to production
with proper security, monitoring, and configuration management.
2025-12-04 16:54:02 -05:00