fetch_ml/docs/src/first-experiment.md
Jeremie Fraeys 385d2cf386 docs: add comprehensive documentation with MkDocs site
2025-12-04 16:54:57 -05:00

First Experiment

Run your first machine learning experiment with Fetch ML.

Prerequisites

Container Runtimes:

  • Docker Compose: For testing and development only
  • Podman: For production experiment execution

You will also need:

  • Fetch ML installed and running
  • API key (see Security and API Key Process)
  • Basic ML knowledge

Experiment Workflow

1. Prepare Your ML Code

Create a simple Python script:

# experiment.py
import argparse
import json

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--epochs', type=int, default=10)
    parser.add_argument('--lr', type=float, default=0.001)
    parser.add_argument('--output', default='results.json')
    
    args = parser.parse_args()
    
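    # Simulate training (synthetic metrics; no model is actually trained)
    results = {
        'epochs': args.epochs,
        'learning_rate': args.lr,
        'accuracy': min(1.0, 0.85 + (args.lr * 0.1)),  # capped at 1.0
        'loss': max(0.0, 0.5 - (args.epochs * 0.01)),  # floored at 0.0
        'training_time': args.epochs * 0.1
    }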
    
    # Save results
    with open(args.output, 'w') as f:
        json.dump(results, f, indent=2)
    
    print(f"Training completed: {results}")
    return results

if __name__ == '__main__':
    main()

2. Submit Job via API

# Submit experiment
curl -X POST http://localhost:9101/api/v1/jobs \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "job_name": "first-experiment",
    "args": "--epochs 20 --lr 0.01 --output experiment_results.json",
    "priority": 1,
    "metadata": {
      "experiment_type": "training",
      "dataset": "sample_data"
    }
  }'
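
The same submission can be scripted from Python. The following is a minimal sketch using only the standard library; it reuses the endpoint, headers, and payload fields from the curl example above. The helper names `build_submit_request` and `send` are hypothetical, and the shape of the server's response is not documented here, so the sketch returns it as raw text.

```python
import json
import urllib.request

API_URL = "http://localhost:9101/api/v1/jobs"  # endpoint from the curl example


def build_submit_request(api_key, job_name, args, priority=1, metadata=None):
    """Build (but do not send) the POST request that submits a job."""
    payload = {
        "job_name": job_name,
        "args": args,
        "priority": priority,
        "metadata": metadata or {},
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def send(request, timeout=10):
    """Send the request; needs a running Fetch ML server. Returns the raw body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


request = build_submit_request(
    api_key="your-api-key",
    job_name="first-experiment",
    args="--epochs 20 --lr 0.01 --output experiment_results.json",
    metadata={"experiment_type": "training", "dataset": "sample_data"},
)
# Inspect the JSON body before sending it with send(request).
print(request.data.decode("utf-8"))
```

Building the request separately from sending it makes the payload easy to inspect or log before anything reaches the server.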

3. Monitor Progress

# Check job status
curl -H "X-API-Key: your-api-key" \
  http://localhost:9101/api/v1/jobs/first-experiment

# List all jobs
curl -H "X-API-Key: your-api-key" \
  http://localhost:9101/api/v1/jobs

# Get job metrics
curl -H "X-API-Key: your-api-key" \
  http://localhost:9101/api/v1/jobs/first-experiment/metrics
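
Instead of re-running the status command by hand, a script can poll until the job finishes. The sketch below is one way to do that. It assumes the status values `completed` and `failed`, which appear elsewhere in this guide, are the terminal states; the `running` value in the demo is a guess. The function `wait_for_job` and the `fetch_status` callback are hypothetical: in practice the callback would call the status endpoint and extract the `status` field.

```python
import time

TERMINAL_STATUSES = {"completed", "failed"}  # status strings that appear in this guide


def wait_for_job(fetch_status, job_name, interval=2.0, max_polls=30, sleep=time.sleep):
    """Poll fetch_status(job_name) until a terminal status or the poll budget runs out."""
    for attempt in range(max_polls):
        status = fetch_status(job_name)
        if status in TERMINAL_STATUSES:
            return status
        if attempt < max_polls - 1:
            sleep(interval)  # wait before asking again
    raise TimeoutError(f"{job_name} did not finish within {max_polls} polls")


# Demo with a fake status source so the sketch runs without a server.
fake_statuses = iter(["pending", "running", "completed"])
final = wait_for_job(
    lambda name: next(fake_statuses),
    "first-experiment",
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(final)
```

Passing the status lookup in as a callback keeps the polling logic independent of the HTTP client and easy to test.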

4. Use CLI

# Build the CLI in a subshell so later paths stay relative to the repository root
(cd cli && zig build dev)

# Submit with CLI
./cli/zig-out/dev/ml submit \
  --name "cli-experiment" \
  --args "--epochs 15 --lr 0.005" \
  --server http://localhost:9101

# Monitor with CLI
./cli/zig-out/dev/ml list-jobs --server http://localhost:9101
./cli/zig-out/dev/ml job-status cli-experiment --server http://localhost:9101

Advanced Experiment

Hyperparameter Tuning

# Submit multiple experiments
for lr in 0.001 0.01 0.1; do
  curl -X POST http://localhost:9101/api/v1/jobs \
    -H "Content-Type: application/json" \
    -H "X-API-Key: your-api-key" \
    -d "{
      \"job_name\": \"tune-lr-$lr\",
      \"args\": \"--epochs 10 --lr $lr\",
      \"metadata\": {\"learning_rate\": $lr}
    }"
done
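
The escaped quotes in the shell loop are easy to get wrong. As an alternative, the payloads can be generated in Python and serialized with `json.dumps`. This sketch mirrors the fields used in the loop above; the function name `tuning_payloads` is hypothetical, and each payload would still need to be POSTed to `/api/v1/jobs`.

```python
import json


def tuning_payloads(learning_rates, epochs=10):
    """Return one job payload per learning rate, mirroring the shell loop."""
    payloads = []
    for lr in learning_rates:
        payloads.append({
            "job_name": f"tune-lr-{lr}",
            "args": f"--epochs {epochs} --lr {lr}",
            "metadata": {"learning_rate": lr},
        })
    return payloads


for payload in tuning_payloads([0.001, 0.01, 0.1]):
    # json.dumps handles quoting, so no manual escaping is needed.
    print(json.dumps(payload))
```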

Batch Processing

# Submit batch job
curl -X POST http://localhost:9101/api/v1/jobs \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "job_name": "batch-processing",
    "args": "--input data/ --output results/ --batch-size 32",
    "priority": 2,
    "datasets": ["training_data", "validation_data"]
  }'

Results and Output

Access Results

# Download results
curl -H "X-API-Key: your-api-key" \
  http://localhost:9101/api/v1/jobs/first-experiment/results

# View job details
curl -H "X-API-Key: your-api-key" \
  http://localhost:9101/api/v1/jobs/first-experiment | jq .

Result Format

{
  "job_id": "first-experiment",
  "status": "completed",
  "results": {
    "epochs": 20,
    "learning_rate": 0.01,
    "accuracy": 0.851,
    "loss": 0.3,
    "training_time": 2.0
  },
  "metrics": {
    "gpu_utilization": "85%",
    "memory_usage": "2GB",
    "execution_time": "120s"
  }
}
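
The metrics in this format are strings with units attached, so a script that compares runs has to convert them first. The sketch below shows one way to do this, assuming percentages always end in `%` and durations always end in `s` as in the sample above. The function name `summarize` and the abbreviated `SAMPLE` document are illustrative only.

```python
import json

# Abbreviated copy of the result format shown above.
SAMPLE = """
{
  "job_id": "first-experiment",
  "status": "completed",
  "results": {"loss": 0.3, "training_time": 2.0},
  "metrics": {"gpu_utilization": "85%", "memory_usage": "2GB", "execution_time": "120s"}
}
"""


def summarize(raw):
    """Reduce a results document to a few typed fields."""
    doc = json.loads(raw)
    metrics = doc["metrics"]
    return {
        "job_id": doc["job_id"],
        "succeeded": doc["status"] == "completed",
        "loss": doc["results"]["loss"],
        # Assumes the "%" and "s" suffixes used in the sample.
        "gpu_utilization_pct": float(metrics["gpu_utilization"].rstrip("%")),
        "execution_seconds": float(metrics["execution_time"].rstrip("s")),
    }


print(summarize(SAMPLE))
```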

Best Practices

Job Naming

  • Use descriptive names: model-training-v2, data-preprocessing
  • Include version numbers: experiment-v1, experiment-v2
  • Add timestamps: daily-batch-2024-01-15
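
These conventions can be enforced with a small helper so that names stay consistent across a team. This is a sketch only: `make_job_name` is a hypothetical function, and the lowercase-with-hyphens rule is an assumption based on the examples above, not a documented Fetch ML requirement.

```python
import datetime
import re


def make_job_name(base, version=None, date=None):
    """Build a name such as 'model-training-v2' or 'daily-batch-2024-01-15'."""
    # Lowercase the base and collapse anything non-alphanumeric into hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", base.lower()).strip("-")
    parts = [slug]
    if version is not None:
        parts.append(f"v{version}")
    if date is not None:
        parts.append(date.isoformat())  # YYYY-MM-DD
    return "-".join(parts)


print(make_job_name("Model Training", version=2))
print(make_job_name("daily batch", date=datetime.date(2024, 1, 15)))
```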

Metadata Usage

{
  "metadata": {
    "experiment_type": "training",
    "model_version": "v2.1",
    "dataset": "imagenet-2024",
    "environment": "gpu",
    "team": "ml-team"
  }
}

Error Handling

# Check failed jobs
curl -H "X-API-Key: your-api-key" \
  "http://localhost:9101/api/v1/jobs?status=failed"

# Retry failed job
curl -X POST http://localhost:9101/api/v1/jobs \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "job_name": "retry-experiment",
    "args": "--epochs 20 --lr 0.01",
    "metadata": {"retry_of": "first-experiment"}
  }'
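
When retrying several jobs, the retry payload can be derived from the original submission rather than typed by hand. The sketch below assumes the original payload (job name, arguments, metadata) is still available to the caller. The `retry_payload` function and the `-retry-N` naming scheme are hypothetical; only the `retry_of` metadata key comes from the example above.

```python
def retry_payload(original, attempt=1):
    """Derive a resubmission payload that records which job it retries."""
    metadata = dict(original.get("metadata", {}))  # copy so the original is untouched
    metadata["retry_of"] = original["job_name"]
    return {
        "job_name": f"{original['job_name']}-retry-{attempt}",
        "args": original["args"],
        "metadata": metadata,
    }


failed_job = {
    "job_name": "first-experiment",
    "args": "--epochs 20 --lr 0.01",
    "metadata": {"experiment_type": "training"},
}
print(retry_payload(failed_job))
```

Recording the attempt number in the name avoids reusing the name of an earlier retry.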

Next Steps

  • [Development Setup](development-setup.md) - Local development environment
  • [Testing Guide](testing.md) - Test your experiments
  • [Production Deployment](deployment.md) - Scale to production
  • Monitoring - Track experiment performance

Troubleshooting

Job stuck in pending?

  • Check worker status: curl -H "X-API-Key: your-api-key" http://localhost:9101/api/v1/workers
  • Verify resources: docker stats
  • Check logs: docker-compose logs api-server

Job failed?

  • Check error message: curl -H "X-API-Key: your-api-key" http://localhost:9101/api/v1/jobs/<job-name>
  • Review job arguments
  • Verify input data

No results?

  • Check job completion status
  • Verify output file paths
  • Check storage permissions