# Quick Start

Get Fetch ML running in minutes with Docker Compose and integrated monitoring.

## Prerequisites

**Container Runtimes:**
- **Docker Compose**: For testing and development only
- **Podman**: For production experiment execution

**Requirements:**
- Go 1.25+
- Zig 0.15+
- Docker Compose (testing only)
- 4GB+ RAM
- 2GB+ disk space
- Git

## One-Command Setup

```bash
# Clone and start
git clone https://github.com/jfraeys/fetch_ml.git
cd fetch_ml
make dev-up

# Wait for services (30 seconds)
sleep 30

# Verify setup
curl http://localhost:8080/health
```
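
A fixed `sleep 30` can be too short on a slow machine and wasteful on a fast one. As a minimal sketch, the wait can instead poll the health endpoint until it responds. `wait_for` is a hypothetical helper defined here, not something the repository provides, and the one-second retry interval is an arbitrary choice.

```bash
# Sketch: retry a command once per second until it succeeds or the retry budget runs out.
# wait_for is a hypothetical helper, not part of the Fetch ML Makefile or CLI.
wait_for() {
  local max_retries="$1"
  shift
  local attempts=0
  until "$@" >/dev/null 2>&1; do
    if [ "$attempts" -ge "$max_retries" ]; then
      echo "gave up after ${max_retries} retries: $*" >&2
      return 1
    fi
    sleep 1
    attempts=$((attempts + 1))
  done
}
```

For example, `wait_for 60 curl -fsS http://localhost:8080/health` returns as soon as the API answers, or fails after about 60 one-second retries.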

Note: the development Compose stack runs the API server over plain HTTP/WS for CLI compatibility. For HTTPS/WSS, terminate TLS at a reverse proxy; in the dev stack, Caddy exposes the HTTPS endpoint listed below.

**Access Services:**
- **API Server (via Caddy)**: http://localhost:8080
- **API Server (via Caddy + internal TLS)**: https://localhost:8443
- **Grafana**: http://localhost:3000 (admin/admin123)
- **Prometheus**: http://localhost:9090
- **Loki**: http://localhost:3100

## Development Setup

### Build Components

```bash
# Build all components
make build

# Development build
make dev
```

### Start Services

```bash
# Start development stack with monitoring
make dev-up

# Check status
make dev-status

# Stop services
make dev-down
```

### Verify Setup

```bash
# Check API health
curl -f http://localhost:8080/health

# Check monitoring services
curl -f http://localhost:3000/api/health
curl -f "http://localhost:9090/api/v1/query?query=up"  # quoted so the shell does not glob the "?"
curl -f http://localhost:3100/ready

# Check Redis
docker exec ml-experiments-redis redis-cli ping
```
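
The individual HTTP checks above can be rolled into one pass that reports every service instead of stopping at the first failure. This is a sketch only: `probe` and `check_services` are hypothetical names, the endpoints are the ones listed in this guide, and the 5-second timeout is an assumption.

```bash
# Sketch: probe each dev endpoint from this guide and print one line per service.
# probe and check_services are hypothetical helper names.
probe() {
  # -f: fail on HTTP errors, -s: quiet, --max-time: do not hang on a dead port
  curl -fs -o /dev/null --max-time 5 "$1"
}

check_services() {
  local failed=0 name url
  while read -r name url; do
    if probe "$url"; then
      echo "ok    $name"
    else
      echo "FAIL  $name ($url)"
      failed=1
    fi
  done <<'EOF'
api        http://localhost:8080/health
grafana    http://localhost:3000/api/health
prometheus http://localhost:9090/api/v1/query?query=up
loki       http://localhost:3100/ready
EOF
  return "$failed"
}
```

Run `check_services` after `make dev-up`. It exits non-zero if any endpoint fails, so it can gate later steps in a script.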

## First Experiment

### 1. Setup CLI

```bash
# Build CLI (the subshell keeps you in the repository root)
(cd cli && zig build --release=fast)

# Initialize CLI config
./cli/zig-out/bin/ml init
```

### 2. Queue Job

```bash
# Simple test job
echo "test experiment" | ./cli/zig-out/bin/ml queue test-job

# Check status
./cli/zig-out/bin/ml status
```

### 3. Monitor Progress

```bash
# View in Grafana (`open` is macOS; use `xdg-open` on Linux)
open http://localhost:3000

# Check logs in Grafana Log Analysis dashboard
# Or view container logs
docker logs ml-experiments-api -f
```

## Key Commands

### Development Commands

```bash
make help              # Show all commands
make build             # Build all components
make dev-up            # Start dev environment
make dev-down          # Stop dev environment
make dev-status        # Check dev status
make test              # Run tests
make test-unit         # Run unit tests
make test-integration  # Run integration tests
```

### CLI Commands

```bash
# Build CLI
(cd cli && zig build --release=fast)  # subshell: stay in the repository root

# Common operations
./cli/zig-out/bin/ml status          # Check system status
./cli/zig-out/bin/ml queue job-name  # Queue job
./cli/zig-out/bin/ml --help         # Show help
```
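
Typing the full `./cli/zig-out/bin/ml` path gets tedious. One option, sketched below, is to put the build output directory on `PATH` for the current shell session. It assumes you run it from the repository root; `prepend_path` is a hypothetical helper, not something the project ships.

```bash
# Sketch: add a directory to the front of PATH unless it is already there.
# prepend_path is a hypothetical helper name.
prepend_path() {
  case ":$PATH:" in
    *":$1:"*) ;;            # already on PATH, leave it unchanged
    *) PATH="$1:$PATH" ;;
  esac
  export PATH
}

# Assumes the current directory is the repository root.
prepend_path "$(pwd)/cli/zig-out/bin"
```

After this, `ml status` and `ml --help` resolve without the path prefix until the shell session ends.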

### Monitoring Commands

```bash
# Access monitoring services (`open` is macOS; use `xdg-open` on Linux)
open http://localhost:3000        # Grafana
open http://localhost:9090        # Prometheus
open http://localhost:3100/ready  # Loki (no web UI; this is its readiness endpoint)

# (Optional) Re-generate Grafana provisioning (datasources/providers)
python3 scripts/setup_monitoring.py
```

## Configuration

### Environment Setup

```bash
# Copy example environment
cp deployments/env.dev.example .env

# Edit as needed
vim .env
```

**Key Variables**:
- `LOG_LEVEL=info`
- `GRAFANA_ADMIN_PASSWORD=admin123`
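
To confirm which of these variables your `.env` actually sets after editing, a small lookup can help. The sketch below is illustrative: `show_env_keys` is a hypothetical helper, and it only matches plain `KEY=value` lines (no `export` prefix, no leading whitespace).

```bash
# Sketch: print the KEY=value line for each requested key, or note that it is missing.
# show_env_keys is a hypothetical helper name.
show_env_keys() {
  local file="$1"
  shift
  local key
  for key in "$@"; do
    # Commented-out lines such as "# LOG_LEVEL=..." do not match the anchored pattern.
    grep "^${key}=" "$file" || echo "${key} is not set in ${file}"
  done
}
```

For example, `show_env_keys .env LOG_LEVEL GRAFANA_ADMIN_PASSWORD` prints the two assignments listed above if both are present.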

### CLI Configuration

```bash
# Setup CLI config
mkdir -p ~/.ml

# Create config file if needed
touch ~/.ml/config.toml

# Edit configuration
vim ~/.ml/config.toml
```

## Testing

### Quick Test

```bash
# 5-minute authentication test
make test-auth

# Clean up
make self-cleanup
```

### Full Test Suite

```bash
# Run all tests
make test

# Run with coverage
make test-coverage

# Run specific test types
make test-unit
make test-integration
make test-e2e
```

### Load Testing

```bash
# Run load tests
make load-test

# Run benchmarks
make benchmark

# Track performance
./scripts/track_performance.sh
```

## Troubleshooting

### Common Issues

**Port Conflicts**:
```bash
# Check port usage
lsof -i :8080
lsof -i :8443
lsof -i :3000
lsof -i :9090
lsof -i :3100

# Stop the conflicting process (plain kill first; escalate to kill -9 only if it ignores SIGTERM)
kill <PID>
```
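
The per-port checks can also be run in a single loop. The following is a sketch that assumes `lsof` is installed (if it is missing, every port is reported as free); `check_ports` is a hypothetical helper name.

```bash
# Sketch: report whether anything is listening on each given TCP port.
# check_ports is a hypothetical helper; it relies on lsof being available.
check_ports() {
  local port listeners
  for port in "$@"; do
    # -nP skips DNS and service-name lookups; -sTCP:LISTEN limits output to listeners
    listeners=$(lsof -nP -iTCP:"$port" -sTCP:LISTEN 2>/dev/null)
    if [ -n "$listeners" ]; then
      echo "port $port: in use"
      echo "$listeners"
    else
      echo "port $port: free"
    fi
  done
}
```

Calling `check_ports 8080 8443 3000 9090 3100` covers every port this guide exposes and prints the owning process for any that are taken.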

**Build Issues**:
```bash
# Fix Go modules
go mod tidy

# Fix Zig build (Zig 0.13+ names its cache .zig-cache; older releases used zig-cache)
(cd cli && rm -rf zig-out .zig-cache zig-cache && zig build --release=fast)
```

**Container Issues**:
```bash
# Check container status
docker ps --filter "name=ml-"

# View logs
docker logs ml-experiments-api
docker logs ml-experiments-grafana

# Restart services
make dev-down && make dev-up
```

**Monitoring Issues**:
```bash
# Re-setup monitoring
python3 scripts/setup_monitoring.py

# Restart Grafana
docker restart ml-experiments-grafana

# Check datasources in Grafana
# Settings → Data Sources → Test connection
```

### Debug Mode

```bash
# Enable debug logging
export LOG_LEVEL=debug
make dev-up
```

## Next Steps

### Explore Features

1. **Job Management**: Queue and monitor ML experiments
2. **WebSocket Communication**: Real-time updates
3. **Multi-User Authentication**: Role-based access control
4. **Performance Monitoring**: Grafana dashboards and metrics
5. **Log Aggregation**: Centralized logging with Loki

### Advanced Configuration

- **Production Setup**: See [Deployment Guide](deployment.md)
- **Performance Monitoring**: See [Performance Monitoring](performance-monitoring.md)
- **Testing Procedures**: See [Testing Guide](testing.md)
- **CLI Reference**: See [CLI Reference](cli-reference.md)

### Production Deployment

For production deployment:
1. Review [Deployment Guide](deployment.md)
2. Set up production monitoring
3. Configure security and authentication
4. Set up backup procedures

## Help and Support

### Get Help

```bash
make help              # Show all available commands
./cli/zig-out/bin/ml --help  # CLI help
```

### Documentation

- **[Testing Guide](testing.md)** - Comprehensive testing procedures
- **[Deployment Guide](deployment.md)** - Production deployment
- **[Performance Monitoring](performance-monitoring.md)** - Monitoring setup
- **[Architecture Guide](architecture.md)** - System architecture
- **[Troubleshooting](troubleshooting.md)** - Common issues

### Community

- Check logs: `docker logs ml-experiments-api`
- Review documentation in `docs/src/`
- Use `--debug` flag with CLI commands for detailed output

---

*Ready in minutes!*

## See Also

- **[Architecture](architecture.md)** - System architecture overview
- **[Scheduler Architecture](scheduler-architecture.md)** - Job scheduling and service management
- **[Jupyter Workflow](jupyter-workflow.md)** - Jupyter notebook services
- **[vLLM Workflow](vllm-workflow.md)** - LLM inference services
- **[Configuration Reference](configuration-reference.md)** - Configuration options
- **[Security Guide](security.md)** - Security best practices