jfraeysd/fetch_ml

Jeremie Fraeys c0eeeda940 feat(experiment): improve experiment lifecycle and update first-experiment guide

2026-01-05 12:37:34 -05:00

First Experiment

Run your first machine learning experiment with Fetch ML.

Prerequisites

Container Runtimes:

Docker Compose: For testing and development only
Podman: For production experiment execution
Fetch ML installed and running
API key (see Security and API Key Process)
Basic ML knowledge

Experiment Workflow

1. Prepare Your ML Code

Create a simple Python script:

# experiment.py
import argparse
import json

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--epochs', type=int, default=10)
    parser.add_argument('--lr', type=float, default=0.001)
    parser.add_argument('--output', default='results.json')
    
    args = parser.parse_args()
    
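    # Simulate training (no real model is trained; the values are synthetic)
    results = {
        'epochs': args.epochs,
        'learning_rate': args.lr,
        # Clamp so large arguments cannot produce accuracy > 1 or loss < 0
        'accuracy': min(1.0, 0.85 + (args.lr * 0.1)),
        'loss': max(0.0, 0.5 - (args.epochs * 0.01)),
        'training_time': args.epochs * 0.1
    }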
    
    # Save results
    with open(args.output, 'w') as f:
        json.dump(results, f, indent=2)
    
    print(f"Training completed: {results}")
    return results

if __name__ == '__main__':
    main()
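
Before submitting, it can help to run the script locally (for example `python experiment.py --epochs 20 --lr 0.01 --output experiment_results.json`) and confirm the output file has the expected shape. The sketch below is one way to do that check; `check_results` is a hypothetical helper written for this guide, not part of Fetch ML.

```python
import json
from pathlib import Path

# Keys written by experiment.py in the script above
EXPECTED_KEYS = {"epochs", "learning_rate", "accuracy", "loss", "training_time"}


def check_results(path):
    """Load a results file and verify that every expected key is present."""
    results = json.loads(Path(path).read_text())
    missing = EXPECTED_KEYS - set(results)
    if missing:
        raise ValueError(f"results file is missing keys: {sorted(missing)}")
    return results
```

Calling `check_results("experiment_results.json")` returns the parsed dictionary, or raises `ValueError` naming the missing keys.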

2. Submit Job via API

# Submit experiment
curl -X POST http://localhost:8080/api/v1/jobs \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "job_name": "first-experiment",
    "args": "--epochs 20 --lr 0.01 --output experiment_results.json",
    "priority": 1,
    "metadata": {
      "experiment_type": "training",
      "dataset": "sample_data"
    }
  }'
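
The same request can be built from Python with only the standard library. This is a minimal sketch, assuming the endpoint path, headers, and JSON field names shown in the curl command above; `build_submit_request` is a hypothetical helper, not a Fetch ML client API.

```python
import json
import urllib.request


def build_submit_request(base_url, api_key, job_name, args, priority=1, metadata=None):
    """Build (but do not send) a POST request mirroring the curl example.

    Hypothetical helper for illustration; the path and field names are
    copied from the curl command in this guide.
    """
    payload = {
        "job_name": job_name,
        "args": args,
        "priority": priority,
        "metadata": metadata or {},
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


request = build_submit_request(
    "http://localhost:8080",
    "your-api-key",
    "first-experiment",
    "--epochs 20 --lr 0.01 --output experiment_results.json",
    metadata={"experiment_type": "training", "dataset": "sample_data"},
)
# To send it (requires a running server):
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(response.status, response.read().decode("utf-8"))
```

Serializing the payload with `json.dumps` avoids the manual quoting that the shell version needs.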

3. Monitor Progress

# Check job status
curl -H "X-API-Key: your-api-key" \
  http://localhost:8080/api/v1/jobs/first-experiment

# List all jobs
curl -H "X-API-Key: your-api-key" \
  http://localhost:8080/api/v1/jobs

# Get job metrics
curl -H "X-API-Key: your-api-key" \
  http://localhost:8080/api/v1/jobs/first-experiment/metrics
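
Rather than re-running the status command by hand, a script can poll until the job finishes. The sketch below assumes the status strings that appear in this guide ("pending", "completed", "failed"); `wait_for_job` and its parameters are hypothetical, and the status lookup is passed in as a callable so the loop does not depend on any particular HTTP client.

```python
import time

# Terminal status values that appear in this guide
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(fetch_status, job_name, interval=2.0, max_interval=30.0,
                 timeout=600.0, sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status(job_name) until the job reaches a terminal status.

    fetch_status is any callable returning the status string, for example a
    wrapper around GET /api/v1/jobs/<job_name>. The delay doubles after each
    poll, capped at max_interval. Raises TimeoutError if the next wait would
    pass the deadline.
    """
    deadline = clock() + timeout
    delay = interval
    while True:
        status = fetch_status(job_name)
        if status in TERMINAL_STATUSES:
            return status
        if clock() + delay > deadline:
            raise TimeoutError(f"{job_name} still {status!r} after {timeout}s")
        sleep(delay)
        delay = min(delay * 2, max_interval)
```

Doubling the delay keeps short jobs responsive while avoiding a steady stream of requests for long ones.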

4. Use CLI

# Build the CLI (the subshell keeps the working directory at the repository root)
(cd cli && zig build --release=fast)

# Submit with CLI
./cli/zig-out/bin/ml submit \
  --name "cli-experiment" \
  --args "--epochs 15 --lr 0.005" \
  --server http://localhost:8080

# Monitor with CLI
./cli/zig-out/bin/ml list-jobs --server http://localhost:8080
./cli/zig-out/bin/ml job-status cli-experiment --server http://localhost:8080

Advanced Experiment

Hyperparameter Tuning

# Submit multiple experiments
for lr in 0.001 0.01 0.1; do
  curl -X POST http://localhost:8080/api/v1/jobs \
    -H "Content-Type: application/json" \
    -H "X-API-Key: your-api-key" \
    -d "{
      \"job_name\": \"tune-lr-$lr\",
      \"args\": \"--epochs 10 --lr $lr\",
      \"metadata\": {\"learning_rate\": $lr}
    }"
done
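
When more than one hyperparameter varies, generating the payloads in Python avoids nested shell loops and hand-escaped JSON. This is a sketch only: `grid_payloads` is a hypothetical helper, the `tune-...` naming pattern is an illustrative choice, and the payload fields follow the loop above.

```python
import itertools
import json


def grid_payloads(grid, fixed_args=""):
    """Yield one job payload per combination of hyperparameter values.

    grid maps a CLI flag name (without dashes) to the values to try.
    """
    names = sorted(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        combo = dict(zip(names, values))
        flags = " ".join(f"--{name} {value}" for name, value in combo.items())
        suffix = "-".join(f"{name}-{value}" for name, value in combo.items())
        yield {
            "job_name": f"tune-{suffix}",
            "args": f"{fixed_args} {flags}".strip(),
            "metadata": combo,
        }


payloads = list(grid_payloads({"lr": [0.001, 0.01, 0.1], "epochs": [10, 20]}))
bodies = [json.dumps(p) for p in payloads]  # each is a body for POST /api/v1/jobs
```

Each combination gets a distinct `job_name`, so results can be matched back to their hyperparameters.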

Batch Processing

# Submit batch job
curl -X POST http://localhost:8080/api/v1/jobs \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "job_name": "batch-processing",
    "args": "--input data/ --output results/ --batch-size 32",
    "priority": 2,
    "datasets": ["training_data", "validation_data"]
  }'

Results and Output

Access Results

# Download results
curl -H "X-API-Key: your-api-key" \
  http://localhost:8080/api/v1/jobs/first-experiment/results

# View job details
curl -H "X-API-Key: your-api-key" \
  http://localhost:8080/api/v1/jobs/first-experiment | jq .

Result Format

{
  "job_id": "first-experiment",
  "status": "completed",
  "results": {
    "epochs": 20,
    "learning_rate": 0.01,
    "accuracy": 0.851,
    "loss": 0.3,
    "training_time": 2.0
  },
  "metrics": {
    "gpu_utilization": "85%",
    "memory_usage": "2GB",
    "execution_time": "120s"
  }
}

Best Practices

Job Naming

Use descriptive names: model-training-v2, data-preprocessing
Include version numbers: experiment-v1, experiment-v2
Add timestamps: daily-batch-2024-01-15
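
These conventions are easy to apply consistently with a small helper. The sketch below is hypothetical: this guide does not state which characters Fetch ML accepts in job names, so restricting names to lowercase letters, digits, and hyphens is an assumption.

```python
import re
from datetime import date


def make_job_name(description, version=None, day=None):
    """Combine a description, optional version, and optional date into a name.

    Lowercases the description and replaces runs of other characters with '-'.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    if not slug:
        raise ValueError("description must contain at least one letter or digit")
    parts = [slug]
    if version is not None:
        parts.append(f"v{version}")
    if day is not None:
        parts.append(day.isoformat())
    return "-".join(parts)
```

For example, `make_job_name("Model Training", version=2)` gives `model-training-v2`, and `make_job_name("daily batch", day=date(2024, 1, 15))` gives `daily-batch-2024-01-15`.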

Metadata Usage

{
  "metadata": {
    "experiment_type": "training",
    "model_version": "v2.1",
    "dataset": "imagenet-2024",
    "environment": "gpu",
    "team": "ml-team"
  }
}

Error Handling

# Check failed jobs
curl -H "X-API-Key: your-api-key" \
  "http://localhost:8080/api/v1/jobs?status=failed"

# Retry failed job
curl -X POST http://localhost:8080/api/v1/jobs \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "job_name": "retry-experiment",
    "args": "--epochs 20 --lr 0.01",
    "metadata": {"retry_of": "first-experiment"}
  }'
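
Retry payloads can also be derived from the failed job's record instead of being retyped. This is a sketch under assumptions: the guide does not show the shape of the job listing response, so the `job_name`, `args`, and `metadata` keys on the input are assumed, and `retry_attempt` is a hypothetical metadata key added for illustration.

```python
def build_retry_payload(failed_job, attempt=1):
    """Create a new submission payload that points back at a failed job.

    failed_job is assumed to be a dict with at least "job_name" and "args".
    The original record is not modified.
    """
    original = failed_job["job_name"]
    metadata = dict(failed_job.get("metadata", {}))
    metadata["retry_of"] = original          # same key as the curl example
    metadata["retry_attempt"] = attempt      # hypothetical extra key
    return {
        "job_name": f"{original}-retry-{attempt}",
        "args": failed_job["args"],
        "metadata": metadata,
    }


failed = {"job_name": "first-experiment", "args": "--epochs 20 --lr 0.01"}
retry = build_retry_payload(failed)
```

Including the attempt number in the name keeps repeated retries from colliding with each other.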

Related Documentation

Quick Start - Local development environment and dev stack
Testing Guide - Test your experiments
Deployment - Scale to production
Performance Monitoring - Track experiment performance

Troubleshooting

Job stuck in pending?

Check worker status: curl http://localhost:8080/api/v1/workers
Verify resources: docker stats
Check logs: docker logs ml-experiments-api

Job failed?

Check error message: curl -H "X-API-Key: your-api-key" http://localhost:8080/api/v1/jobs/job-id
Review job arguments
Verify input data

No results?

Check job completion status
Verify output file paths
Check storage permissions
