fetch_ml/docs/src/quick-start.md
Jeremie Fraeys 90ea18555c
2026-02-26 13:04:39 -05:00

Quick Start

Get Fetch ML running in minutes with Docker Compose and integrated monitoring.

Prerequisites

Container Runtimes:

  • Docker Compose: For testing and development only
  • Podman: For production experiment execution

Requirements:

  • Go 1.25+
  • Zig 0.15+
  • Docker Compose (testing only)
  • 4GB+ RAM
  • 2GB+ disk space
  • Git

One-Command Setup

# Clone and start
git clone https://github.com/jfraeys/fetch_ml.git
cd fetch_ml
make dev-up

# Wait for services (30 seconds)
sleep 30

# Verify setup
curl http://localhost:8080/health

Note: the development compose runs the API server over HTTP/WS for CLI compatibility. For HTTPS/WSS, terminate TLS at a reverse proxy.
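
A fixed `sleep 30` can be too short on a slow machine and wasteful on a fast one. As a minimal sketch, the wait could instead poll the health endpoint used above until it answers. `wait_for_url` is a hypothetical helper written for this example, not something shipped in the repository, and the attempt count and delay are arbitrary defaults.

```shell
# Hypothetical helper (not part of Fetch ML): poll a URL until curl succeeds.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_url() {
  _url="$1"
  _attempts="${2:-30}"
  _delay="${3:-1}"
  _i=1
  while [ "$_i" -le "$_attempts" ]; do
    # -f: fail on HTTP errors, -s: quiet, -o /dev/null: discard the body
    if curl -fs -o /dev/null "$_url"; then
      echo "ready after $_i attempt(s): $_url"
      return 0
    fi
    sleep "$_delay"
    _i=$((_i + 1))
  done
  echo "gave up after $_attempts attempt(s): $_url" >&2
  return 1
}
```

For example, `make dev-up && wait_for_url http://localhost:8080/health 30 1` retries up to 30 times, one second apart, and returns as soon as the API responds.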

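Access Services:

  • API: http://localhost:8080
  • Grafana: http://localhost:3000
  • Prometheus: http://localhost:9090
  • Loki: http://localhost:3100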

Development Setup

Build Components

# Build all components
make build

# Development build
make dev

Start Services

# Start development stack with monitoring
make dev-up

# Check status
make dev-status

# Stop services
make dev-down

Verify Setup

# Check API health
curl -f http://localhost:8080/health

# Check monitoring services
curl -f http://localhost:3000/api/health
curl -f "http://localhost:9090/api/v1/query?query=up"
curl -f http://localhost:3100/ready

# Check Redis
docker exec ml-experiments-redis redis-cli ping
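
The individual checks above can also be folded into a single pass/fail summary. The following is a sketch only: `check_stack` is a hypothetical function name, and the endpoints are simply the ones listed in this section.

```shell
# Hypothetical helper: check each endpoint once and summarize the result.
# The URLs are the ones used in the verification commands above.
check_stack() {
  _failed=0
  for _entry in \
    "api=http://localhost:8080/health" \
    "grafana=http://localhost:3000/api/health" \
    "prometheus=http://localhost:9090/api/v1/query?query=up" \
    "loki=http://localhost:3100/ready"
  do
    _name="${_entry%%=*}"   # text before the first '='
    _url="${_entry#*=}"     # text after the first '='
    if curl -fs -o /dev/null "$_url"; then
      echo "OK   $_name"
    else
      echo "FAIL $_name"
      _failed=1
    fi
  done
  # Non-zero when at least one service did not respond successfully
  return "$_failed"
}
```

Calling `check_stack` prints one line per service and returns a non-zero status if any check failed, which makes it usable in scripts such as `check_stack || make dev-status`.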

First Experiment

1. Setup CLI

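# Build CLI (the subshell keeps your working directory at the repo root,
# so the ./cli/... paths below still resolve)
(cd cli && zig build --release=fast)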

# Initialize CLI config
./cli/zig-out/bin/ml init

2. Queue Job

# Simple test job
echo "test experiment" | ./cli/zig-out/bin/ml queue test-job

# Check status
./cli/zig-out/bin/ml status

3. Monitor Progress

# View in Grafana (open is macOS-only; use xdg-open on Linux)
open http://localhost:3000

# Check logs in Grafana Log Analysis dashboard
# Or view container logs
docker logs ml-experiments-api -f

Key Commands

Development Commands

make help              # Show all commands
make build             # Build all components
make dev-up            # Start dev environment
make dev-down          # Stop dev environment
make dev-status        # Check dev status
make test              # Run tests
make test-unit         # Run unit tests
make test-integration  # Run integration tests

CLI Commands

# Build CLI (run from the repo root; the subshell avoids changing directory)
(cd cli && zig build --release=fast)

# Common operations
./cli/zig-out/bin/ml status          # Check system status
./cli/zig-out/bin/ml queue job-name  # Queue job
./cli/zig-out/bin/ml --help         # Show help
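
Typing the full binary path gets repetitive, and it only works from the repository root. One possible workaround, sketched here, is a small shell function that forwards to the built binary. `FETCH_ML_ROOT` is an assumed variable name chosen for this example, not a setting the project defines.

```shell
# Hypothetical convenience wrapper: call the built CLI from any directory.
# FETCH_ML_ROOT is an assumed variable pointing at your fetch_ml checkout;
# when it is unset, the current directory is used instead.
ml() {
  "${FETCH_ML_ROOT:-$PWD}/cli/zig-out/bin/ml" "$@"
}
```

With `FETCH_ML_ROOT` set to the checkout path, `ml status` and `ml --help` then behave like the longer commands above.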

Monitoring Commands

# Access monitoring services (open is macOS-only; use xdg-open on Linux)
open http://localhost:3000  # Grafana
open http://localhost:9090  # Prometheus
open http://localhost:3100  # Loki

# (Optional) Re-generate Grafana provisioning (datasources/providers)
python3 scripts/setup_monitoring.py

Configuration

Environment Setup

# Copy example environment
cp deployments/env.dev.example .env

# Edit as needed
vim .env

Key Variables:

  • LOG_LEVEL=info
  • GRAFANA_ADMIN_PASSWORD=admin123
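
Instead of editing `.env` by hand, a variable can be set from a script. The helper below is a sketch under stated assumptions: `set_env_var` is a hypothetical name, it expects simple `KEY=value` lines, and it moves the updated key to the end of the file rather than preserving its position.

```shell
# Hypothetical helper: set KEY=VALUE in an env file, replacing any old value.
# Usage: set_env_var KEY VALUE [FILE]   (FILE defaults to .env)
set_env_var() {
  _key="$1"
  _value="$2"
  _file="${3:-.env}"
  _tmp="$(mktemp)" || return 1
  if [ -f "$_file" ]; then
    # Keep every line except an existing assignment of this key.
    # grep exits non-zero when nothing is left, which is fine here.
    grep -v "^${_key}=" "$_file" > "$_tmp" || true
  fi
  printf '%s=%s\n' "$_key" "$_value" >> "$_tmp"
  # Replace the file in one step; mktemp files are readable by the owner only
  mv "$_tmp" "$_file"
}
```

For example, `set_env_var LOG_LEVEL debug .env` leaves exactly one `LOG_LEVEL` line in the file and keeps the other variables untouched.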

CLI Configuration

# Setup CLI config
mkdir -p ~/.ml

# Create config file if needed
touch ~/.ml/config.toml

# Edit configuration
vim ~/.ml/config.toml

Testing

Quick Test

# 5-minute authentication test
make test-auth

# Clean up
make self-cleanup

Full Test Suite

# Run all tests
make test

# Run with coverage
make test-coverage

# Run specific test types
make test-unit
make test-integration
make test-e2e

Load Testing

# Run load tests
make load-test

# Run benchmarks
make benchmark

# Track performance
./scripts/track_performance.sh

Troubleshooting

Common Issues

Port Conflicts:

# Check port usage
lsof -i :8080
lsof -i :8443
lsof -i :3000
lsof -i :9090

# Stop the conflicting process (escalate to kill -9 only if it does not exit)
kill <PID>
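
To check all of the ports in one go, the `lsof` calls can be wrapped in a loop. This is a minimal sketch: `report_ports` is a hypothetical helper, and it only looks at listening TCP sockets visible to the current user.

```shell
# Hypothetical helper: report which of the given TCP ports have a listener.
# Usage: report_ports PORT [PORT...]
report_ports() {
  for _port in "$@"; do
    # -t prints bare PIDs; -sTCP:LISTEN limits matches to listening sockets
    _pids="$(lsof -t -iTCP:"$_port" -sTCP:LISTEN 2>/dev/null | tr '\n' ' ' | sed 's/ *$//')" || true
    if [ -n "$_pids" ]; then
      echo "port $_port: in use by PID(s) $_pids"
    else
      echo "port $_port: free"
    fi
  done
}
```

Running `report_ports 8080 8443 3000 9090` covers the same ports as the individual commands above and prints one line per port.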

Build Issues:

# Fix Go modules
go mod tidy

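# Fix Zig build (Zig 0.13+ names the cache directory .zig-cache;
# the older zig-cache name is removed as well in case it exists)
(cd cli && rm -rf zig-out zig-cache .zig-cache && zig build --release=fast)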

Container Issues:

# Check container status
docker ps --filter "name=ml-"

# View logs
docker logs ml-experiments-api
docker logs ml-experiments-grafana

# Restart services
make dev-down && make dev-up
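
When it is not obvious which container is misbehaving, it can help to print the tail of every matching container's log at once. The sketch below combines the `docker ps` filter shown above with `docker logs --tail`; `tail_ml_logs` is a hypothetical helper name, and the default of 20 lines is arbitrary.

```shell
# Hypothetical helper: show the last N log lines of every running container
# whose name matches the "ml-" filter used above.
# Usage: tail_ml_logs [LINES]
tail_ml_logs() {
  _lines="${1:-20}"
  # --format '{{.Names}}' prints one container name per line
  docker ps --filter "name=ml-" --format '{{.Names}}' |
    while IFS= read -r _name; do
      echo "== $_name =="
      # Containers may log to stderr, so merge both streams
      docker logs --tail "$_lines" "$_name" 2>&1
    done
}
```

`tail_ml_logs 50` prints a header per container followed by its most recent 50 log lines.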

Monitoring Issues:

# Re-setup monitoring
python3 scripts/setup_monitoring.py

# Restart Grafana
docker restart ml-experiments-grafana

# Check datasources in Grafana
# Settings → Data Sources → Test connection

Debug Mode

# Enable debug logging
export LOG_LEVEL=debug
make dev-up

Next Steps

Explore Features

  1. Job Management: Queue and monitor ML experiments
  2. WebSocket Communication: Real-time updates
  3. Multi-User Authentication: Role-based access control
  4. Performance Monitoring: Grafana dashboards and metrics
  5. Log Aggregation: Centralized logging with Loki

Advanced Configuration

Production Deployment

For production deployment:

  1. Review the Deployment Guide
  2. Set up production monitoring
  3. Configure security and authentication
  4. Set up backup procedures

Help and Support

Get Help

make help              # Show all available commands
./cli/zig-out/bin/ml --help  # CLI help

Documentation

Community

  • Check logs: docker logs ml-experiments-api
  • Review documentation in docs/src/
  • Use the --debug flag with CLI commands for detailed output

Ready in minutes!

See Also