# Quick Start

Get Fetch ML running in minutes with Docker Compose and integrated monitoring.
## Prerequisites

Container Runtimes:

- Docker Compose: for testing and development only
- Podman: for production experiment execution

Requirements:

- Go 1.25+
- Zig 0.15+
- Docker Compose (testing only)
- 4 GB+ RAM
- 2 GB+ disk space
- Git
## One-Command Setup

```bash
# Clone and start
git clone https://github.com/jfraeys/fetch_ml.git
cd fetch_ml
make dev-up

# Wait for services (~30 seconds)
sleep 30

# Verify setup
curl http://localhost:8080/health
```

Note: the development compose runs the API server over HTTP/WS for CLI compatibility. For HTTPS/WSS, terminate TLS at a reverse proxy.
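Rather than a fixed `sleep 30`, you can poll the health endpoint until it responds. A minimal sketch (the `wait_for_health` helper is ours for illustration, not part of the repo):

```sh
#!/usr/bin/env sh
# wait_for_health <url> [tries] -- hypothetical helper, not shipped with the project.
# Polls the given health endpoint once per second, giving up after `tries` attempts.
wait_for_health() {
  url="$1"
  tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -fsS "$url" >/dev/null 2>&1; then
      echo "ready"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "timed out" >&2
  return 1
}
```

After `make dev-up`, `wait_for_health http://localhost:8080/health` returns as soon as the API answers instead of always waiting the full 30 seconds.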
Access Services:

- API Server (via Caddy): http://localhost:8080
- API Server (via Caddy + internal TLS): https://localhost:8443
- Grafana: http://localhost:3000 (admin/admin123)
- Prometheus: http://localhost:9090
- Loki: http://localhost:3100
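Since Caddy already fronts the API, terminating TLS as the note above suggests amounts to a short Caddy site block. A sketch only (the repo's actual Caddy config may differ; the upstream address is assumed from the service list):

```
# Sketch: serve HTTPS on 8443 with Caddy's internal CA, proxying to the API.
localhost:8443 {
    tls internal
    reverse_proxy localhost:8080
}
```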
## Development Setup

### Build Components

```bash
# Build all components
make build

# Development build
make dev
```

### Start Services

```bash
# Start development stack with monitoring
make dev-up

# Check status
make dev-status

# Stop services
make dev-down
```
### Verify Setup

```bash
# Check API health
curl -f http://localhost:8080/health

# Check monitoring services
curl -f http://localhost:3000/api/health
curl -f "http://localhost:9090/api/v1/query?query=up"
curl -f http://localhost:3100/ready

# Check Redis
docker exec ml-experiments-redis redis-cli ping
```
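The individual checks above can be wrapped in a small pass/fail report. A sketch under our own naming (the `check` helper is hypothetical; the endpoints are the ones listed above):

```sh
#!/usr/bin/env sh
# check <name> <url> -- hypothetical helper: print OK or FAIL for one endpoint.
check() {
  name="$1"
  url="$2"
  if curl -fsS "$url" >/dev/null 2>&1; then
    echo "OK   $name"
  else
    echo "FAIL $name"
  fi
}

check api        "http://localhost:8080/health"
check grafana    "http://localhost:3000/api/health"
check prometheus "http://localhost:9090/api/v1/query?query=up"
check loki       "http://localhost:3100/ready"
```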
## First Experiment

### 1. Set Up the CLI

```bash
# Build the CLI
cd cli && zig build --release=fast

# Initialize the CLI config
./cli/zig-out/bin/ml init
```

### 2. Queue a Job

```bash
# Simple test job
echo "test experiment" | ./cli/zig-out/bin/ml queue test-job

# Check status
./cli/zig-out/bin/ml status
```

### 3. Monitor Progress

```bash
# View in Grafana
open http://localhost:3000

# Check logs in the Grafana Log Analysis dashboard,
# or view container logs directly:
docker logs -f ml-experiments-api
```
## Key Commands

### Development Commands

```bash
make help             # Show all commands
make build            # Build all components
make dev-up           # Start dev environment
make dev-down         # Stop dev environment
make dev-status       # Check dev status
make test             # Run tests
make test-unit        # Run unit tests
make test-integration # Run integration tests
```

### CLI Commands

```bash
# Build the CLI
cd cli && zig build --release=fast

# Common operations
./cli/zig-out/bin/ml status         # Check system status
./cli/zig-out/bin/ml queue job-name # Queue a job
./cli/zig-out/bin/ml --help         # Show help
```

### Monitoring Commands

```bash
# Open monitoring services
open http://localhost:3000 # Grafana
open http://localhost:9090 # Prometheus
open http://localhost:3100 # Loki

# (Optional) Regenerate Grafana provisioning (datasources/providers)
python3 scripts/setup_monitoring.py
```
## Configuration

### Environment Setup

```bash
# Copy the example environment
cp deployments/env.dev.example .env

# Edit as needed
vim .env
```

Key Variables:

```
LOG_LEVEL=info
GRAFANA_ADMIN_PASSWORD=admin123
```

### CLI Configuration

```bash
# Set up the CLI config directory
mkdir -p ~/.ml

# Create the config file if needed
touch ~/.ml/config.toml

# Edit configuration
vim ~/.ml/config.toml
```
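The config file's schema is not shown here; see the CLI Reference for the actual keys. Purely to illustrate the TOML shape, a hypothetical `~/.ml/config.toml` might look like:

```toml
# Hypothetical example -- key names are illustrative, not the CLI's real schema.
[server]
url = "http://localhost:8080"

[logging]
level = "info"
```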
## Testing

### Quick Test

```bash
# 5-minute authentication test
make test-auth

# Clean up
make self-cleanup
```

### Full Test Suite

```bash
# Run all tests
make test

# Run with coverage
make test-coverage

# Run specific test types
make test-unit
make test-integration
make test-e2e
```

### Load Testing

```bash
# Run load tests
make load-test

# Run benchmarks
make benchmark

# Track performance
./scripts/track_performance.sh
```
## Troubleshooting

### Common Issues

Port Conflicts:

```bash
# Check port usage
lsof -i :8080
lsof -i :8443
lsof -i :3000
lsof -i :9090

# Kill conflicting processes
kill -9 <PID>
```

Build Issues:

```bash
# Fix Go modules
go mod tidy

# Fix the Zig build
cd cli && rm -rf zig-out zig-cache && zig build --release=fast
```

Container Issues:

```bash
# Check container status
docker ps --filter "name=ml-"

# View logs
docker logs ml-experiments-api
docker logs ml-experiments-grafana

# Restart services
make dev-down && make dev-up
```

Monitoring Issues:

```bash
# Re-run the monitoring setup
python3 scripts/setup_monitoring.py

# Restart Grafana
docker restart ml-experiments-grafana

# Check datasources in Grafana:
# Settings → Data Sources → Test connection
```
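The provisioning that `setup_monitoring.py` regenerates follows Grafana's standard datasource provisioning format, which is useful to know when a datasource test fails. A minimal sketch (service hostnames and the file's exact contents are assumptions about the compose stack, not copied from the repo):

```yaml
# Grafana datasource provisioning (sketch; URLs assume the dev compose network).
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
```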
### Debug Mode

```bash
# Enable debug logging
export LOG_LEVEL=debug
make dev-up
```
## Next Steps

### Explore Features

- Job Management: queue and monitor ML experiments
- WebSocket Communication: real-time updates
- Multi-User Authentication: role-based access control
- Performance Monitoring: Grafana dashboards and metrics
- Log Aggregation: centralized logging with Loki

### Advanced Configuration

- Production Setup: see the Deployment Guide
- Performance Monitoring: see Performance Monitoring
- Testing Procedures: see the Testing Guide
- CLI Reference: see the CLI Reference

### Production Deployment

For production deployment:

- Review the Deployment Guide
- Set up production monitoring
- Configure security and authentication
- Set up backup procedures
## Help and Support

### Get Help

```bash
make help                   # Show all available commands
./cli/zig-out/bin/ml --help # CLI help
```

### Documentation

- Testing Guide - comprehensive testing procedures
- Deployment Guide - production deployment
- Performance Monitoring - monitoring setup
- Architecture Guide - system architecture
- Troubleshooting - common issues

### Community

- Check logs: `docker logs ml-experiments-api`
- Review documentation in `docs/src/`
- Use the `--debug` flag with CLI commands for detailed output

Ready in minutes!
## See Also

- Architecture - system architecture overview
- Scheduler Architecture - job scheduling and service management
- Jupyter Workflow - Jupyter notebook services
- vLLM Workflow - LLM inference services
- Configuration Reference - configuration options
- Security Guide - security best practices