# Quick Start

Get Fetch ML running in minutes with Docker Compose and integrated monitoring.

## Prerequisites

**Container Runtimes:**
- **Docker Compose**: For testing and development only
- **Podman**: For production experiment execution

**Requirements:**
- Go 1.25+
- Zig 0.15+
- Docker Compose (testing only)
- 4GB+ RAM
- 2GB+ disk space
- Git
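
Before building, you can confirm that the toolchain meets the minimums above. The sketch below is an illustrative helper, not a script from the repository: the function names are hypothetical, and it assumes `go version` prints something like `go version go1.25.0 linux/amd64` while `zig version` prints a bare version such as `0.15.1`.

```bash
# Hypothetical helpers: compare dotted numeric versions without relying on `sort -V`.
# version_ge A B succeeds (exit 0) when version A >= version B.
version_ge() {
  local a=$1 b=$2 x y
  while [ -n "$a" ] || [ -n "$b" ]; do
    x=${a%%.*}; y=${b%%.*}
    # Drop non-numeric suffixes such as "rc1" and treat missing parts as 0.
    x=${x%%[!0-9]*}; y=${y%%[!0-9]*}
    x=${x:-0}; y=${y:-0}
    if [ "$x" -gt "$y" ]; then return 0; fi
    if [ "$x" -lt "$y" ]; then return 1; fi
    # Advance to the next dot-separated component.
    case $a in *.*) a=${a#*.} ;; *) a= ;; esac
    case $b in *.*) b=${b#*.} ;; *) b= ;; esac
  done
  return 0
}

# Extract "1.25.0" from "go version go1.25.0 linux/amd64" (empty if Go is missing).
go_version() {
  go version 2>/dev/null | sed -n 's/.*go\([0-9][0-9.]*\).*/\1/p'
}

# Report Go and Zig against the listed minimums; non-zero exit if either is too old.
check_prereqs() {
  local ok=0 have
  have=$(go_version)
  if version_ge "$have" 1.25; then echo "go $have: ok"; else echo "go ${have:-missing}: need 1.25+"; ok=1; fi
  have=$(zig version 2>/dev/null)
  if version_ge "$have" 0.15; then echo "zig $have: ok"; else echo "zig ${have:-missing}: need 0.15+"; ok=1; fi
  return "$ok"
}
```

Paste or source the snippet, then run `check_prereqs`. A pure-shell comparison is used because `sort -V` is not available on every platform.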

## One-Command Setup

```bash
# Clone and start
git clone https://github.com/jfraeys/fetch_ml.git
cd fetch_ml
make dev-up

# Wait for services (30 seconds)
sleep 30

# Verify setup
curl http://localhost:8080/health
```
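
The fixed `sleep 30` either waits longer than needed or not long enough on a slow machine. A hedged alternative is to poll until the service answers. `wait_for` below is a hypothetical helper (not part of the repository) that retries any command once per second, up to a maximum number of attempts.

```bash
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# usage: wait_for MAX_ATTEMPTS COMMAND [ARGS...]
wait_for() {
  local max=$1 attempt=1
  shift
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "gave up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep 1
  done
}
```

For example, `wait_for 60 curl -fsS -o /dev/null http://localhost:8080/health` returns as soon as the API responds and gives up after about a minute.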

Note: the development Compose stack serves the API over plain HTTP/WS for CLI compatibility. For HTTPS/WSS, terminate TLS at a reverse proxy.

**Access Services:**
- **API Server (via Caddy)**: http://localhost:8080
- **API Server (via Caddy + internal TLS)**: https://localhost:8443
- **Grafana**: http://localhost:3000 (admin/admin123)
- **Prometheus**: http://localhost:9090
- **Loki**: http://localhost:3100

## Development Setup

### Build Components

```bash
# Build all components
make build

# Development build
make dev
```

### Start Services

```bash
# Start development stack with monitoring
make dev-up

# Check status
make dev-status

# Stop services
make dev-down
```

### Verify Setup

```bash
# Check API health
curl -f http://localhost:8080/health

# Check monitoring services (quote URLs containing "?" so shells like zsh don't glob them)
curl -f http://localhost:3000/api/health
curl -f 'http://localhost:9090/api/v1/query?query=up'
curl -f http://localhost:3100/ready

# Check Redis (a healthy server replies PONG)
docker exec ml-experiments-redis redis-cli ping
```
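
To run the HTTP checks above in one pass with a single pass/fail result, you could wrap them in a loop. This is an illustrative sketch, not a script shipped with the repository: `check_services` is hypothetical, the service names are labels chosen here, and the URLs are the ones from the block above. Redis is left out because it is checked through `docker exec` rather than HTTP.

```bash
# Hypothetical helper: probe each HTTP endpoint from the Verify Setup block.
# Prints one OK/FAIL line per service and returns non-zero if any check failed.
check_services() {
  local name url failures=0
  while read -r name url; do
    [ -n "$name" ] || continue
    # -f: treat HTTP errors as failures; --max-time keeps a hung service from blocking.
    if curl -fsS -o /dev/null --max-time 5 "$url"; then
      printf 'OK   %s\n' "$name"
    else
      printf 'FAIL %s (%s)\n' "$name" "$url"
      failures=$((failures + 1))
    fi
  done <<'EOF'
api        http://localhost:8080/health
grafana    http://localhost:3000/api/health
prometheus http://localhost:9090/api/v1/query?query=up
loki       http://localhost:3100/ready
EOF
  [ "$failures" -eq 0 ]
}
```

Because the loop reads from a here-document rather than a pipe, it runs in the current shell and the `failures` counter survives to the final check.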

## First Experiment

### 1. Setup CLI

```bash
# Build CLI (the subshell keeps you in the repository root)
(cd cli && zig build --release=fast)

# Initialize CLI config
./cli/zig-out/bin/ml init
```
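
Typing `./cli/zig-out/bin/ml` every time gets tedious. Assuming you keep the default Zig output directory shown above, one option is to put it on your `PATH` for the current shell session. The `path_prepend` helper below is hypothetical; it only avoids adding the same directory twice.

```bash
# Hypothetical helper: prepend a directory to PATH unless it is already present.
path_prepend() {
  case ":$PATH:" in
    *":$1:"*) ;;                      # already on PATH, nothing to do
    *) PATH="$1${PATH:+:$PATH}" ;;
  esac
  export PATH
}

# Run from the repository root so $PWD points at the checkout.
path_prepend "$PWD/cli/zig-out/bin"
```

After this, `ml status` and `ml --help` work from any directory in that shell. To make it permanent, add the equivalent line with the absolute path to your shell profile.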

### 2. Queue Job

```bash
# Simple test job
echo "test experiment" | ./cli/zig-out/bin/ml queue test-job

# Check status
./cli/zig-out/bin/ml status
```
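
To exercise the queue with more than one job, a shell loop over `ml queue <job-name>` is enough. The sketch below uses only the `queue` subcommand shown above. The `queue_batch` function, the `ML_BIN` variable and the `<prefix>-<n>` naming scheme are illustrative assumptions, not part of the CLI.

```bash
# Path to the CLI binary; override ML_BIN if it lives elsewhere (assumption).
ML_BIN=${ML_BIN:-./cli/zig-out/bin/ml}

# Hypothetical helper: queue COUNT jobs named PREFIX-1 .. PREFIX-COUNT.
# Stops at the first failure so a misconfigured CLI doesn't repeat the same error.
queue_batch() {
  local prefix=$1 count=$2 i=1
  while [ "$i" -le "$count" ]; do
    "$ML_BIN" queue "${prefix}-${i}" || return 1
    i=$((i + 1))
  done
}
```

For example, `queue_batch test-job 3` queues `test-job-1` through `test-job-3`. Follow it with `ml status` to see them.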

### 3. Monitor Progress

```bash
# View in Grafana
open http://localhost:3000

# Check logs in Grafana Log Analysis dashboard
# Or view container logs
docker logs ml-experiments-api -f
```
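
`open` is the macOS command for launching a URL; most Linux desktops use `xdg-open` instead. If you switch between the two, a small wrapper like the hypothetical `open_url` below picks an opener based on `uname -s`.

```bash
# Hypothetical helper: choose the URL opener for the current OS.
url_opener() {
  case "$(uname -s)" in
    Darwin) echo open ;;
    *)      echo xdg-open ;;   # typical Linux desktops; WSL or headless hosts may need something else
  esac
}

# Open a URL in the default browser, e.g. open_url http://localhost:3000
open_url() {
  "$(url_opener)" "$1"
}
```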

## Key Commands

### Development Commands

```bash
make help              # Show all commands
make build             # Build all components
make dev-up            # Start dev environment
make dev-down          # Stop dev environment
make dev-status        # Check dev status
make test              # Run tests
make test-unit         # Run unit tests
make test-integration  # Run integration tests
```

### CLI Commands

```bash
# Build CLI (the subshell keeps you in the repository root)
(cd cli && zig build --release=fast)

# Common operations
./cli/zig-out/bin/ml status          # Check system status
./cli/zig-out/bin/ml queue job-name  # Queue job
./cli/zig-out/bin/ml --help          # Show help
```

### Monitoring Commands

```bash
# Access monitoring services
open http://localhost:3000  # Grafana
open http://localhost:9090  # Prometheus
open http://localhost:3100  # Loki

# (Optional) Re-generate Grafana provisioning (datasources/providers)
python3 scripts/setup_monitoring.py
```

## Configuration

### Environment Setup

```bash
# Copy example environment
cp deployments/env.dev.example .env

# Edit as needed
vim .env
```

**Key Variables**:
- `LOG_LEVEL=info`
- `GRAFANA_ADMIN_PASSWORD=admin123`
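
If you prefer not to open an editor, a key in `.env` can be updated in place. The `set_env_var` function below is a hypothetical helper, not part of the repository. It assumes plain `KEY=value` lines (no `export` prefix, no multi-line values) and appends the key if it is not present yet.

```bash
# Hypothetical helper: set KEY=VALUE in an env file, replacing an existing KEY line
# in place or appending it at the end. Assumes simple KEY=value lines.
set_env_var() {
  local key=$1 value=$2 file=${3:-.env} tmp
  tmp=$(mktemp) || return 1
  if [ -f "$file" ]; then
    # Passing values via ENVIRON avoids awk -v escape processing, so backslashes survive.
    KEY=$key VALUE=$value awk '
      index($0, ENVIRON["KEY"] "=") == 1 { print ENVIRON["KEY"] "=" ENVIRON["VALUE"]; found = 1; next }
      { print }
      END { if (!found) print ENVIRON["KEY"] "=" ENVIRON["VALUE"] }
    ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  else
    printf '%s=%s\n' "$key" "$value" > "$tmp"
  fi
  mv "$tmp" "$file"
}
```

For example, `set_env_var GRAFANA_ADMIN_PASSWORD '<new-password>'` replaces the default Grafana password. This assumes the Compose stack reads the value from `.env` on the next `make dev-up`.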

### CLI Configuration

```bash
# Setup CLI config
mkdir -p ~/.ml

# Create config file if needed
touch ~/.ml/config.toml

# Edit configuration
vim ~/.ml/config.toml
```

## Testing

### Quick Test

```bash
# 5-minute authentication test
make test-auth

# Clean up
make self-cleanup
```

### Full Test Suite

```bash
# Run all tests
make test

# Run with coverage
make test-coverage

# Run specific test types
make test-unit
make test-integration
make test-e2e
```

### Load Testing

```bash
# Run load tests
make load-test

# Run benchmarks
make benchmark

# Track performance
./scripts/track_performance.sh
```

## Troubleshooting

### Common Issues

**Port Conflicts**:
```bash
# Check port usage
lsof -i :8080
lsof -i :8443
lsof -i :3000
lsof -i :9090
lsof -i :3100

# Stop a conflicting process (SIGTERM first; use kill -9 only if it won't exit)
kill <PID>
```
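
To check all of the stack's ports at once, you can loop over them with `lsof -t`, which prints only PIDs. `check_ports` is a hypothetical sketch. `-sTCP:LISTEN` restricts matches to listening sockets, so client connections that merely talk to one of these ports are ignored.

```bash
# Hypothetical helper: report which Fetch ML dev ports already have a listener.
# Returns non-zero if any port is busy.
check_ports() {
  local port pids busy=0
  for port in 8080 8443 3000 9090 3100; do
    # -t: PIDs only; -nP: skip host/port name lookups; -sTCP:LISTEN: listeners only.
    pids=$(lsof -t -nP -iTCP:"$port" -sTCP:LISTEN 2>/dev/null | tr '\n' ' ')
    if [ -n "$pids" ]; then
      printf 'port %s in use by PID(s): %s\n' "$port" "${pids% }"
      busy=1
    fi
  done
  return "$busy"
}
```

Run `check_ports` before `make dev-up`. No output and a zero exit status mean all five ports are free.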

**Build Issues**:
```bash
# Fix Go modules
go mod tidy

# Clean and rebuild the Zig CLI (Zig 0.13+ keeps its local cache in .zig-cache)
(cd cli && rm -rf zig-out .zig-cache && zig build --release=fast)
```

**Container Issues**:
```bash
# Check container status
docker ps --filter "name=ml-"

# View logs
docker logs ml-experiments-api
docker logs ml-experiments-grafana

# Restart services
make dev-down && make dev-up
```
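
When you don't yet know which container is misbehaving, it can help to dump the last few log lines of every `ml-` container in one go. This is a minimal sketch, assuming the containers keep the naming used above (the `name=ml-` filter matches any container whose name contains `ml-`). `tail_ml_logs` is a hypothetical helper built only on `docker ps --format` and `docker logs --tail`.

```bash
# Hypothetical helper: print the last N log lines (default 50) of every ml-* container.
tail_ml_logs() {
  local lines=${1:-50} name
  docker ps --filter "name=ml-" --format '{{.Names}}' |
    while read -r name; do
      echo "===== $name ====="
      # </dev/null keeps docker from consuming the container list on stdin.
      docker logs --tail "$lines" "$name" 2>&1 </dev/null
    done
}
```

For example, `tail_ml_logs 20` shows the 20 most recent lines from each running container.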

**Monitoring Issues**:
```bash
# Re-setup monitoring
python3 scripts/setup_monitoring.py

# Restart Grafana
docker restart ml-experiments-grafana

# Check datasources in Grafana:
# Connections → Data sources (Configuration → Data sources on older versions),
# open each data source and click "Save & test"
```

### Debug Mode

```bash
# Enable debug logging
export LOG_LEVEL=debug
make dev-up
```

## Next Steps

### Explore Features

1. **Job Management**: Queue and monitor ML experiments
2. **WebSocket Communication**: Real-time updates
3. **Multi-User Authentication**: Role-based access control
4. **Performance Monitoring**: Grafana dashboards and metrics
5. **Log Aggregation**: Centralized logging with Loki

### Advanced Configuration

- **Production Setup**: See [Deployment Guide](deployment.md)
- **Performance Monitoring**: See [Performance Monitoring](performance-monitoring.md)
- **Testing Procedures**: See [Testing Guide](testing.md)
- **CLI Reference**: See [CLI Reference](cli-reference.md)

### Production Deployment

For production deployment:

1. Review [Deployment Guide](deployment.md)
2. Set up production monitoring
3. Configure security and authentication
4. Set up backup procedures

## Help and Support

### Get Help

```bash
make help                    # Show all available commands
./cli/zig-out/bin/ml --help  # CLI help
```

### Documentation

- **[Testing Guide](testing.md)** - Comprehensive testing procedures
- **[Deployment Guide](deployment.md)** - Production deployment
- **[Performance Monitoring](performance-monitoring.md)** - Monitoring setup
- **[Architecture Guide](architecture.md)** - System architecture
- **[Troubleshooting](troubleshooting.md)** - Common issues

### Community

- Check logs: `docker logs ml-experiments-api`
- Review documentation in `docs/src/`
- Use the `--debug` flag with CLI commands for detailed output

---

*Ready in minutes!*