# First Experiment

Run your first machine learning experiment with Fetch ML.

## Prerequisites

**Container Runtimes:**
- **Docker Compose**: For testing and development only
- **Podman**: For production experiment execution

**Also required:**
- Fetch ML installed and running
- API key (see [Security](security.md) and [API Key Process](api-key-process.md))
- Basic ML knowledge

## Experiment Workflow

### 1. Prepare Your ML Code

Create a simple Python script:

```python
# experiment.py
import argparse
import json
import sys
import time

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--epochs', type=int, default=10)
    parser.add_argument('--lr', type=float, default=0.001)
    parser.add_argument('--output', default='results.json')

    args = parser.parse_args()

    # Simulate training (the values are derived from the arguments, not from a real model)
    results = {
        'epochs': args.epochs,
        'learning_rate': args.lr,
        'accuracy': 0.85 + (args.lr * 0.1),
        'loss': 0.5 - (args.epochs * 0.01),
        'training_time': args.epochs * 0.1
    }

    # Save results
    with open(args.output, 'w') as f:
        json.dump(results, f, indent=2)

    print(f"Training completed: {results}")
    return results

if __name__ == '__main__':
    main()
```

### 2. Submit Job via API

```bash
# Submit experiment
curl -X POST http://localhost:8080/api/v1/jobs \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "job_name": "first-experiment",
    "args": "--epochs 20 --lr 0.01 --output experiment_results.json",
    "priority": 1,
    "metadata": {
      "experiment_type": "training",
      "dataset": "sample_data"
    }
  }'
```

### 3. Monitor Progress

```bash
# Check job status
curl -H "X-API-Key: your-api-key" \
  http://localhost:8080/api/v1/jobs/first-experiment

# List all jobs
curl -H "X-API-Key: your-api-key" \
  http://localhost:8080/api/v1/jobs

# Get job metrics
curl -H "X-API-Key: your-api-key" \
  http://localhost:8080/api/v1/jobs/first-experiment/metrics
```

### 4. Use the CLI

```bash
# Build the CLI in a subshell so the paths below stay relative to the repository root
(cd cli && zig build --release=fast)

# Submit with CLI
./cli/zig-out/bin/ml submit \
  --name "cli-experiment" \
  --args "--epochs 15 --lr 0.005" \
  --server http://localhost:8080

# Monitor with CLI
./cli/zig-out/bin/ml list-jobs --server http://localhost:8080
./cli/zig-out/bin/ml job-status cli-experiment --server http://localhost:8080
```

## Advanced Experiment

### Hyperparameter Tuning

```bash
# Submit multiple experiments, one job per learning rate
for lr in 0.001 0.01 0.1; do
  curl -X POST http://localhost:8080/api/v1/jobs \
    -H "Content-Type: application/json" \
    -H "X-API-Key: your-api-key" \
    -d "{
      \"job_name\": \"tune-lr-$lr\",
      \"args\": \"--epochs 10 --lr $lr\",
      \"metadata\": {\"learning_rate\": $lr}
    }"
done
```

### Batch Processing

```bash
# Submit batch job
curl -X POST http://localhost:8080/api/v1/jobs \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "job_name": "batch-processing",
    "args": "--input data/ --output results/ --batch-size 32",
    "priority": 2,
    "datasets": ["training_data", "validation_data"]
  }'
```

## Results and Output

### Access Results

```bash
# Download results
curl -H "X-API-Key: your-api-key" \
  http://localhost:8080/api/v1/jobs/first-experiment/results

# View job details
curl -H "X-API-Key: your-api-key" \
  http://localhost:8080/api/v1/jobs/first-experiment | jq .
```

### Result Format

```json
{
  "job_id": "first-experiment",
  "status": "completed",
  "results": {
    "epochs": 20,
    "learning_rate": 0.01,
    "accuracy": 0.851,
    "loss": 0.3,
    "training_time": 2.0
  },
  "metrics": {
    "gpu_utilization": "85%",
    "memory_usage": "2GB",
    "execution_time": "120s"
  }
}
```

## Best Practices

### Job Naming

- Use descriptive names: `model-training-v2`, `data-preprocessing`
- Include version numbers: `experiment-v1`, `experiment-v2`
- Add timestamps: `daily-batch-2024-01-15`

### Metadata Usage

```json
{
  "metadata": {
    "experiment_type": "training",
    "model_version": "v2.1",
    "dataset": "imagenet-2024",
    "environment": "gpu",
    "team": "ml-team"
  }
}
```

### Error Handling

```bash
# Check failed jobs
curl -H "X-API-Key: your-api-key" \
  "http://localhost:8080/api/v1/jobs?status=failed"

# Retry failed job by resubmitting it under a new name
curl -X POST http://localhost:8080/api/v1/jobs \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "job_name": "retry-experiment",
    "args": "--epochs 20 --lr 0.01",
    "metadata": {"retry_of": "first-experiment"}
  }'
```

## Related Documentation

- [Quick Start](quick-start.md) - Local development environment and dev stack
- [Testing Guide](testing.md) - Test your experiments
- [Deployment](deployment.md) - Scale to production
- [Performance Monitoring](performance-monitoring.md) - Track experiment performance

## Troubleshooting

**Job stuck in pending?**
- Check worker status: `curl http://localhost:8080/api/v1/workers`
- Verify resources: `docker stats`
- Check logs: `docker logs ml-experiments-api`

**Job failed?**
- Check error message: `curl -H "X-API-Key: your-api-key" http://localhost:8080/api/v1/jobs/<job-id>`
- Review job arguments
- Verify input data

**No results?**
- Check job completion status
- Verify output file paths
- Check storage permissions
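
The status checks above can also be scripted. The following is a minimal sketch, not part of Fetch ML itself: it assumes the job document returned by `/api/v1/jobs/<name>` carries a `status` field as in the Result Format example, and that `completed` and `failed` are the only terminal states (the only two that appear in this guide). The helper names `fetch_job` and `wait_for_job` are hypothetical.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8080/api/v1"  # same host and port as the curl examples
TERMINAL_STATES = {"completed", "failed"}  # assumption: no other terminal states exist


def fetch_job(job_name, api_key, base_url=BASE_URL):
    """GET /jobs/<job_name> and return the decoded JSON body."""
    request = urllib.request.Request(
        f"{base_url}/jobs/{job_name}",
        headers={"X-API-Key": api_key},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def wait_for_job(job_name, fetch, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Poll fetch(job_name) until the job reports a terminal status.

    Returns the last job document. Raises TimeoutError if the job is still
    non-terminal once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(job_name)
        if job.get("status") in TERMINAL_STATES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"{job_name} still {job.get('status')!r} after {timeout}s"
            )
        sleep(interval)
```

A call such as `wait_for_job("first-experiment", lambda name: fetch_job(name, "your-api-key"))` would block until the job finishes or the timeout expires. Passing `fetch` as a parameter keeps the polling loop testable without a running server.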