# Quick Start

Get Fetch ML running in minutes with Docker Compose and integrated monitoring.

## Prerequisites

**Container Runtimes:**

- **Docker Compose**: For testing and development only
- **Podman**: For production experiment execution

**Requirements:**

- Go 1.25+
- Zig 0.15+
- Docker Compose (testing and development only)
- 4GB+ RAM
- 2GB+ disk space
- Git

## One-Command Setup

```bash
# Clone and start
git clone https://github.com/jfraeys/fetch_ml.git
cd fetch_ml
make dev-up

# Wait for services (30 seconds)
sleep 30

# Verify setup
curl http://localhost:8080/health
```

Note: the development compose runs the API server over HTTP/WS for CLI compatibility. For HTTPS/WSS, terminate TLS at a reverse proxy.

**Access Services:**

- **API Server (via Caddy)**: http://localhost:8080
- **API Server (via Caddy + internal TLS)**: https://localhost:8443
- **Grafana**: http://localhost:3000 (admin/admin123)
- **Prometheus**: http://localhost:9090
- **Loki**: http://localhost:3100

## Development Setup

### Build Components

```bash
# Build all components
make build

# Development build
make dev
```

### Start Services

```bash
# Start development stack with monitoring
make dev-up

# Check status
make dev-status

# Stop services
make dev-down
```

### Verify Setup

```bash
# Check API health
curl -f http://localhost:8080/health

# Check monitoring services
curl -f http://localhost:3000/api/health
# Quote the URL so the shell does not treat "?" as a glob character
curl -f "http://localhost:9090/api/v1/query?query=up"
curl -f http://localhost:3100/ready

# Check Redis
docker exec ml-experiments-redis redis-cli ping
```

## First Experiment

### 1. Setup CLI

```bash
# Build CLI (the subshell keeps you in the repository root)
(cd cli && zig build --release=fast)

# Initialize CLI config
./cli/zig-out/bin/ml init
```

### 2. Queue Job

```bash
# Simple test job
echo "test experiment" | ./cli/zig-out/bin/ml queue test-job

# Check status
./cli/zig-out/bin/ml status
```

### 3. Monitor Progress

```bash
# View in Grafana (`open` is macOS; use `xdg-open` on Linux)
open http://localhost:3000

# Check logs in Grafana Log Analysis dashboard
# Or view container logs
docker logs ml-experiments-api -f
```

## Key Commands

### Development Commands

```bash
make help              # Show all commands
make build             # Build all components
make dev-up            # Start dev environment
make dev-down          # Stop dev environment
make dev-status        # Check dev status
make test              # Run tests
make test-unit         # Run unit tests
make test-integration  # Run integration tests
```

### CLI Commands

```bash
# Build CLI (the subshell keeps you in the repository root)
(cd cli && zig build --release=fast)

# Common operations
./cli/zig-out/bin/ml status          # Check system status
./cli/zig-out/bin/ml queue job-name  # Queue job
./cli/zig-out/bin/ml --help          # Show help
```

### Monitoring Commands

```bash
# Access monitoring services (`open` is macOS; use `xdg-open` on Linux)
open http://localhost:3000  # Grafana
open http://localhost:9090  # Prometheus
open http://localhost:3100  # Loki

# (Optional) Re-generate Grafana provisioning (datasources/providers)
python3 scripts/setup_monitoring.py
```

## Configuration

### Environment Setup

```bash
# Copy example environment
cp deployments/env.dev.example .env

# Edit as needed
vim .env
```

**Key Variables**:

- `LOG_LEVEL=info`
- `GRAFANA_ADMIN_PASSWORD=admin123`

### CLI Configuration

```bash
# Setup CLI config
mkdir -p ~/.ml

# Create config file if needed
touch ~/.ml/config.toml

# Edit configuration
vim ~/.ml/config.toml
```

## Testing

### Quick Test

```bash
# 5-minute authentication test
make test-auth

# Clean up
make self-cleanup
```

### Full Test Suite

```bash
# Run all tests
make test

# Run with coverage
make test-coverage

# Run specific test types
make test-unit
make test-integration
make test-e2e
```

### Load Testing

```bash
# Run load tests
make load-test

# Run benchmarks
make benchmark

# Track performance
./scripts/track_performance.sh
```

## Troubleshooting

### Common Issues

**Port Conflicts**:

```bash
# Check port usage
lsof -i :8080
lsof -i :8443
lsof -i :3000
lsof -i :9090

# Kill the conflicting process (replace <PID> with the PID reported by lsof)
kill -9 <PID>
```

**Build Issues**:

```bash
# Fix Go modules
go mod tidy

# Fix Zig build (Zig 0.13+ caches in .zig-cache; older releases used zig-cache)
cd cli && rm -rf zig-out .zig-cache zig-cache && zig build --release=fast
```

**Container Issues**:

```bash
# Check container status
docker ps --filter "name=ml-"

# View logs
docker logs ml-experiments-api
docker logs ml-experiments-grafana

# Restart services
make dev-down && make dev-up
```

**Monitoring Issues**:

```bash
# Re-setup monitoring
python3 scripts/setup_monitoring.py

# Restart Grafana
docker restart ml-experiments-grafana

# Check datasources in Grafana
# Settings → Data Sources → Test connection
```

### Debug Mode

```bash
# Enable debug logging
export LOG_LEVEL=debug
make dev-up
```

## Next Steps

### Explore Features

1. **Job Management**: Queue and monitor ML experiments
2. **WebSocket Communication**: Real-time updates
3. **Multi-User Authentication**: Role-based access control
4. **Performance Monitoring**: Grafana dashboards and metrics
5. **Log Aggregation**: Centralized logging with Loki

### Advanced Configuration

- **Production Setup**: See [Deployment Guide](deployment.md)
- **Performance Monitoring**: See [Performance Monitoring](performance-monitoring.md)
- **Testing Procedures**: See [Testing Guide](testing.md)
- **CLI Reference**: See [CLI Reference](cli-reference.md)

### Production Deployment

For production deployment:

1. Review [Deployment Guide](deployment.md)
2. Set up production monitoring
3. Configure security and authentication
4. Set up backup procedures

## Help and Support

### Get Help

```bash
make help                    # Show all available commands
./cli/zig-out/bin/ml --help  # CLI help
```

### Documentation

- **[Testing Guide](testing.md)** - Comprehensive testing procedures
- **[Deployment Guide](deployment.md)** - Production deployment
- **[Performance Monitoring](performance-monitoring.md)** - Monitoring setup
- **[Architecture Guide](architecture.md)** - System architecture
- **[Troubleshooting](troubleshooting.md)** - Common issues

### Community

- Check logs: `docker logs ml-experiments-api`
- Review documentation in `docs/src/`
- Use `--debug` flag with CLI commands for detailed output

---

*Ready in minutes!*

## See Also

- **[Architecture](architecture.md)** - System architecture overview
- **[Scheduler Architecture](scheduler-architecture.md)** - Job scheduling and service management
- **[Jupyter Workflow](jupyter-workflow.md)** - Jupyter notebook services
- **[vLLM Workflow](vllm-workflow.md)** - LLM inference services
- **[Configuration Reference](configuration-reference.md)** - Configuration options
- **[Security Guide](security.md)** - Security best practices
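
## Appendix: Waiting for Services Without a Fixed Sleep

The One-Command Setup waits a fixed 30 seconds before checking the health endpoint. On slower machines that may not be enough, and on faster ones it is wasted time. A minimal sketch of a polling alternative follows. The `wait_until` helper is hypothetical and not part of Fetch ML or its Makefile; it simply retries a command once per second until it succeeds or the retry budget runs out.

```bash
# wait_until MAX_SECONDS COMMAND [ARGS...]
# Hypothetical helper (not shipped with Fetch ML).
# Re-runs COMMAND until it exits 0, sleeping 1 second between attempts.
# Returns 0 as soon as COMMAND succeeds, 1 once MAX_SECONDS sleeps are used up.
# The budget counts sleeps only, so the real wall-clock limit is approximate.
wait_until() {
  _wu_max="$1"
  shift
  _wu_elapsed=0
  until "$@" >/dev/null 2>&1; do
    if [ "$_wu_elapsed" -ge "$_wu_max" ]; then
      return 1
    fi
    sleep 1
    _wu_elapsed=$((_wu_elapsed + 1))
  done
  return 0
}

# Example (needs the dev stack started with `make dev-up`):
#   wait_until 60 curl -fsS http://localhost:8080/health && echo "API ready"
```

The same call works for the other endpoints listed under Verify Setup, for example `wait_until 60 curl -fsS http://localhost:3100/ready` for Loki. Passing the command as arguments rather than a string avoids `eval` and keeps quoting predictable.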