ML Experiment Manager - Deployment Guide
Overview
The ML Experiment Manager supports several deployment methods, from local development and homelab Docker setups to cloud container services.
Quick Start
Docker Compose (Recommended for Development)
# Clone repository
git clone https://github.com/your-org/fetch_ml.git
cd fetch_ml
# Start all services (testing only)
docker-compose up -d
# Check status
docker-compose ps
# View logs
docker-compose logs -f api-server
Access the API at http://localhost:9100
Deployment Options
1. Local Development
Prerequisites
Container Runtimes:
- Docker Compose: For testing and development only
- Podman: For production experiment execution
Other requirements:
- Go 1.25+
- Zig 0.15.2
- Redis 7+
- Docker & Docker Compose (optional)
Manual Setup
# Start Redis
redis-server
# Build and run Go server
go build -o bin/api-server ./cmd/api-server
./bin/api-server -config configs/config-local.yaml
# Build Zig CLI
cd cli
zig build prod
./zig-out/bin/ml --help
2. Docker Deployment
Build Image
docker build -t ml-experiment-manager:latest .
Run Container
docker run -d \
  --name ml-api \
  -p 9100:9100 \
  -p 9101:9101 \
  -v "$(pwd)/configs:/app/configs:ro" \
  -v experiment-data:/data/ml-experiments \
  ml-experiment-manager:latest
Docker Compose
# Detached mode (runs in the background)
docker-compose -f docker-compose.yml up -d
# Foreground mode with logs attached
docker-compose -f docker-compose.yml up
3. Homelab Setup
# Use the simple setup script
./setup.sh
# Or manually with Docker Compose (testing only)
docker-compose up -d
4. Cloud Deployment
AWS ECS
# Build and push to ECR
aws ecr get-login-password | docker login --username AWS --password-stdin $ECR_REGISTRY
docker build -t $ECR_REGISTRY/ml-experiment-manager:latest .
docker push $ECR_REGISTRY/ml-experiment-manager:latest
# Deploy with ECS CLI
ecs-cli compose --project-name ml-experiment-manager up
Google Cloud Run
# Build and push
gcloud builds submit --tag gcr.io/$PROJECT_ID/ml-experiment-manager
# Deploy
gcloud run deploy ml-experiment-manager \
--image gcr.io/$PROJECT_ID/ml-experiment-manager \
--platform managed \
--region us-central1 \
--allow-unauthenticated
Configuration
Environment Variables
# configs/config-local.yaml
base_path: "/data/ml-experiments"
auth:
  enabled: true
  api_keys:
    - "your-production-api-key"
server:
  address: ":9100"
  tls:
    enabled: true
    cert_file: "/app/ssl/cert.pem"
    key_file: "/app/ssl/key.pem"
Docker Compose Environment
# docker-compose.yml
version: '3.8'
services:
  api-server:
    environment:
      - REDIS_URL=redis://redis:6379
      - LOG_LEVEL=info
    volumes:
      - ./configs:/configs:ro
      - ./data:/data/experiments
Monitoring & Logging
Health Checks
- HTTP: GET /health
- WebSocket: Connection test
- Redis: Ping check
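The checks above can be combined into a startup gate that waits until a service responds. The sketch below is a minimal POSIX shell retry helper, not something shipped with the project: the function name `wait_for` and the retry counts in the examples are illustrative, and it assumes that `/health` on port 9100 answers with a 2xx status once the server is ready.

```shell
#!/bin/sh
# wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0 or ATTEMPTS is exhausted.
# Returns 0 on the first success, 1 if every attempt failed.
wait_for() {
  wf_attempts=$1
  wf_delay=$2
  shift 2
  wf_i=1
  while [ "$wf_i" -le "$wf_attempts" ]; do
    # Discard the command's output; only its exit status matters here.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    wf_i=$((wf_i + 1))
    sleep "$wf_delay"
  done
  return 1
}

# Example calls (retry counts are arbitrary):
#   wait_for 30 2 curl -fsS http://localhost:9100/health
#   wait_for 10 1 docker-compose exec -T redis redis-cli ping
```

Passing the probe as a command keeps the helper independent of any one check, so the same loop covers both the HTTP and the Redis probe.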
Metrics
- Prometheus metrics at /metrics
- Custom application metrics
- Container resource usage
Logging
- Structured JSON logging
- Log levels: DEBUG, INFO, WARN, ERROR
- Centralized logging via ELK stack
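As an illustration of how the four levels are usually ordered, the sketch below ranks them and decides whether a message at a given level passes a configured threshold. It assumes the common convention that a threshold suppresses everything below it; the guide does not state how the server itself filters, and `level_rank` and `should_log` are hypothetical helper names.

```shell
#!/bin/sh
# level_rank LEVEL
# Prints a numeric rank for a log level name (case-insensitive).
# Fails for names outside DEBUG, INFO, WARN, ERROR.
level_rank() {
  case "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')" in
    DEBUG) echo 0 ;;
    INFO)  echo 1 ;;
    WARN)  echo 2 ;;
    ERROR) echo 3 ;;
    *)     return 1 ;;
  esac
}

# should_log THRESHOLD LEVEL
# Succeeds when LEVEL is at or above THRESHOLD; returns 2 on an unknown name.
should_log() {
  sl_min=$(level_rank "$1") || return 2
  sl_lvl=$(level_rank "$2") || return 2
  [ "$sl_lvl" -ge "$sl_min" ]
}

# Example:
#   should_log "${LOG_LEVEL:-info}" debug || echo "debug messages are hidden"
```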
Security
TLS Configuration
# Generate self-signed cert (development)
openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -days 365 -nodes
# Production - use Let's Encrypt
certbot certonly --standalone -d ml-experiments.example.com
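Whichever way the certificate is issued, it helps to check how long it remains valid. The sketch below wraps the `-checkend` option of `openssl x509`, which exits non-zero when the certificate expires within the given number of seconds. The helper name `cert_valid_for` and the 30-day window in the example are assumptions chosen for illustration.

```shell
#!/bin/sh
# cert_valid_for CERT_FILE DAYS
# Succeeds if the certificate in CERT_FILE is still valid DAYS days from now.
cert_valid_for() {
  cv_seconds=$(( $2 * 86400 ))
  # -checkend exits 0 if the certificate does not expire within the window.
  openssl x509 -checkend "$cv_seconds" -noout -in "$1" >/dev/null
}

# Example (path taken from the config above, 30 days is arbitrary):
#   cert_valid_for /app/ssl/cert.pem 30 || echo "certificate expires within 30 days"
```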
Network Security
- Firewall rules (ports 9100, 9101, 6379)
- VPN access for internal services
- API key authentication
- Rate limiting
Performance Tuning
Resource Allocation
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "1Gi"
    cpu: "1000m"
Scaling Strategies
- Horizontal pod autoscaling
- Redis clustering
- Load balancing
- CDN for static assets
Backup & Recovery
Data Backup
# Backup experiment data (Redis snapshot)
# BGSAVE runs in the background; let it finish before copying the dump file
docker-compose exec redis redis-cli BGSAVE
docker cp "$(docker-compose ps -q redis)":/data/dump.rdb ./redis-backup.rdb
# Backup data volume
docker run --rm -v ml-experiments_redis_data:/data -v "$(pwd)":/backup alpine tar czf /backup/redis-backup.tar.gz -C /data .
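`BGSAVE` returns as soon as the background save has started, so copying `dump.rdb` straight away can pick up the previous snapshot. One way to close that gap is to poll `INFO persistence` until the save has finished. The sketch below assumes the Compose service is named `redis`, as in the commands above; `redis_cmd` and `wait_for_bgsave` are hypothetical helper names, and the retry limits are arbitrary.

```shell
#!/bin/sh
# redis_cmd ARGS...
# Runs redis-cli inside the Compose "redis" service.
# -T disables TTY allocation so the output can be piped.
redis_cmd() {
  docker-compose exec -T redis redis-cli "$@"
}

# wait_for_bgsave [MAX_TRIES] [DELAY_SECONDS]
# Polls INFO persistence until no background save is running, then
# succeeds only if the last background save reported "ok".
wait_for_bgsave() {
  wb_max=${1:-30}
  wb_delay=${2:-1}
  wb_tries=0
  while ! redis_cmd INFO persistence | grep -q 'rdb_bgsave_in_progress:0'; do
    wb_tries=$((wb_tries + 1))
    if [ "$wb_tries" -ge "$wb_max" ]; then
      return 1
    fi
    sleep "$wb_delay"
  done
  redis_cmd INFO persistence | grep -q 'rdb_last_bgsave_status:ok'
}

# Example:
#   redis_cmd BGSAVE && wait_for_bgsave 30 1 &&
#     docker cp "$(docker-compose ps -q redis)":/data/dump.rdb ./redis-backup.rdb
```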
Disaster Recovery
- Restore Redis data
- Restart services
- Verify experiment metadata
- Test API endpoints
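Before the first step, it can be worth confirming that the backup archive is readable and actually contains a Redis snapshot. This is a minimal sketch, assuming the archive was created by the `tar czf` command in the Data Backup section and that Redis uses the `dump.rdb` file name shown there; `archive_has_dump` is a hypothetical helper name.

```shell
#!/bin/sh
# archive_has_dump ARCHIVE
# Succeeds if the gzip-compressed tar ARCHIVE can be listed
# and one of its entries is named dump.rdb.
archive_has_dump() {
  # A corrupt or non-gzip file makes tar fail, which fails the check.
  ah_listing=$(tar tzf "$1" 2>/dev/null) || return 1
  printf '%s\n' "$ah_listing" | grep -Eq '(^|/)dump\.rdb$'
}

# Example:
#   archive_has_dump ./redis-backup.tar.gz || echo "backup archive is not usable"
```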
Troubleshooting
Common Issues
API Server Not Starting
# Check logs
docker-compose logs api-server
# Check configuration
cat configs/config-local.yaml
# Check Redis connection
docker-compose exec redis redis-cli ping
WebSocket Connection Issues
# Test WebSocket
wscat -c ws://localhost:9100/ws
# Check TLS
openssl s_client -connect localhost:9101 -servername localhost
Performance Issues
# Check resource usage
docker-compose exec api-server ps aux
# Check Redis memory
docker-compose exec redis redis-cli info memory
Debug Mode
# Enable debug logging
export LOG_LEVEL=debug
./bin/api-server -config configs/config-local.yaml
CI/CD Integration
GitHub Actions
- Automated testing on PR
- Multi-platform builds
- Security scanning
- Automatic releases
Deployment Pipeline
- Code commit → GitHub
- CI/CD pipeline triggers
- Build and test
- Security scan
- Deploy to staging
- Run integration tests
- Deploy to production
- Post-deployment verification
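The key property of the pipeline above is that each stage gates the next: a failed scan or test stops the run before anything is deployed. The sketch below shows that gating in plain shell. It is not the project's actual GitHub Actions workflow; `run_pipeline` and the stage names in the example are hypothetical.

```shell
#!/bin/sh
# run_pipeline STAGE...
# Runs each STAGE (a command or shell function name) in order and stops
# at the first failure, reporting which stage failed on stderr.
run_pipeline() {
  for rp_stage in "$@"; do
    echo "==> $rp_stage"
    if ! "$rp_stage"; then
      echo "FAILED: $rp_stage" >&2
      return 1
    fi
  done
  echo "pipeline complete"
}

# Example, with each stage defined as a shell function elsewhere:
#   run_pipeline build_and_test security_scan deploy_staging \
#     integration_tests deploy_production verify_deployment
```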
Support
For deployment issues:
1. Check this guide
2. Review logs
3. Check GitHub Issues
4. Contact maintainers