First Experiment
Run your first machine learning experiment with Fetch ML.
Prerequisites
Container Runtimes:
- Docker Compose: For testing and development only
- Podman: For production experiment execution
- Fetch ML installed and running
- API key (see Security and API Key Process)
- Basic ML knowledge
Experiment Workflow
1. Prepare Your ML Code
Create a simple Python script:
# experiment.py
import argparse
import json


def main():
    # Command-line arguments supplied through the job's "args" string
    parser = argparse.ArgumentParser()
    parser.add_argument('--epochs', type=int, default=10)
    parser.add_argument('--lr', type=float, default=0.001)
    parser.add_argument('--output', default='results.json')
    args = parser.parse_args()

    # Simulate training (placeholder formulas, no real model is trained)
    results = {
        'epochs': args.epochs,
        'learning_rate': args.lr,
        'accuracy': 0.85 + (args.lr * 0.1),
        'loss': 0.5 - (args.epochs * 0.01),
        'training_time': args.epochs * 0.1
    }

    # Save results
    with open(args.output, 'w') as f:
        json.dump(results, f, indent=2)

    print(f"Training completed: {results}")
    return results


if __name__ == '__main__':
    main()
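Before submitting, it can help to run the script locally and confirm the output file has the shape you expect. The following is a minimal sketch, not part of Fetch ML: `missing_keys` and `check_results_file` are hypothetical helper names, and the expected keys are simply the ones written by experiment.py above.

```python
import json

# Keys written by experiment.py (taken from the script above).
EXPECTED_KEYS = {"epochs", "learning_rate", "accuracy", "loss", "training_time"}


def missing_keys(results):
    """Return the expected keys that are absent from a results dict, sorted."""
    return sorted(EXPECTED_KEYS - set(results))


def check_results_file(path):
    """Load a results JSON file and report any missing keys."""
    with open(path) as f:
        return missing_keys(json.load(f))
```

For example, after `python experiment.py --output results.json`, calling `check_results_file("results.json")` should return an empty list.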
2. Submit Job via API
# Submit experiment
curl -X POST http://localhost:8080/api/v1/jobs \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "job_name": "first-experiment",
    "args": "--epochs 20 --lr 0.01 --output experiment_results.json",
    "priority": 1,
    "metadata": {
      "experiment_type": "training",
      "dataset": "sample_data"
    }
  }'
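The same submission can be scripted from Python. This is a sketch using only the standard library; it assumes the endpoint, headers, and payload fields shown in the curl command above. `build_submit_request` is a hypothetical helper name, not part of any Fetch ML client.

```python
import json
import urllib.request


def build_submit_request(base_url, api_key, job_name, args, priority=1, metadata=None):
    """Build (but do not send) a POST request mirroring the curl example."""
    payload = {
        "job_name": job_name,
        "args": args,
        "priority": priority,
        "metadata": metadata or {},
    }
    return urllib.request.Request(
        url=f"{base_url}/api/v1/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def send(request, timeout=10):
    """Send the request and return the decoded response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Building the request separately from sending it makes the payload easy to inspect or log before anything reaches the server, for example `send(build_submit_request("http://localhost:8080", "your-api-key", "first-experiment", "--epochs 20 --lr 0.01"))`.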
3. Monitor Progress
# Check job status
curl -H "X-API-Key: your-api-key" \
  http://localhost:8080/api/v1/jobs/first-experiment
# List all jobs
curl -H "X-API-Key: your-api-key" \
  http://localhost:8080/api/v1/jobs
# Get job metrics
curl -H "X-API-Key: your-api-key" \
  http://localhost:8080/api/v1/jobs/first-experiment/metrics
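Rather than re-running the status command by hand, a small polling loop can wait for the job to finish. The sketch below is an assumption-heavy illustration: it treats "completed" and "failed" (the status values that appear elsewhere in this guide) as the terminal states, and it takes a caller-supplied `fetch_status` function so the HTTP details stay out of the loop. `wait_for_job` is a hypothetical name.

```python
import time

# Assumed terminal states, based on the status values used in this guide.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_job(fetch_status, job_name, interval=2.0, max_polls=30, sleep=time.sleep):
    """Poll fetch_status(job_name) until a terminal state or the poll limit.

    fetch_status is any callable returning the job's status string, for
    example a wrapper around GET /api/v1/jobs/<job_name>.
    """
    for _ in range(max_polls):
        status = fetch_status(job_name)
        if status in TERMINAL_STATES:
            return status
        sleep(interval)
    raise TimeoutError(f"job {job_name!r} did not finish after {max_polls} polls")
```

Capping the number of polls keeps a stuck job from blocking a script forever; the Troubleshooting section covers what to check when a job stays pending.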
4. Use CLI
# Build the CLI (run from the repository root; the subshell keeps you there)
(cd cli && zig build --release=fast)
# Submit with CLI
./cli/zig-out/bin/ml submit \
  --name "cli-experiment" \
  --args "--epochs 15 --lr 0.005" \
  --server http://localhost:8080
# Monitor with CLI
./cli/zig-out/bin/ml list-jobs --server http://localhost:8080
./cli/zig-out/bin/ml job-status cli-experiment --server http://localhost:8080
Advanced Experiment
Hyperparameter Tuning
# Submit multiple experiments
for lr in 0.001 0.01 0.1; do
  curl -X POST http://localhost:8080/api/v1/jobs \
    -H "Content-Type: application/json" \
    -H "X-API-Key: your-api-key" \
    -d "{
      \"job_name\": \"tune-lr-$lr\",
      \"args\": \"--epochs 10 --lr $lr\",
      \"metadata\": {\"learning_rate\": $lr}
    }"
done
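When the sweep covers more than one hyperparameter, generating the payloads in Python avoids nested shell loops and quote escaping. This sketch only builds the JSON bodies; the field names follow the loop above, while `tuning_payloads` and the `-ep-` naming scheme are illustrative assumptions.

```python
import itertools


def tuning_payloads(learning_rates, epochs_options):
    """Yield one job payload per (learning rate, epochs) combination."""
    for lr, epochs in itertools.product(learning_rates, epochs_options):
        yield {
            "job_name": f"tune-lr-{lr}-ep-{epochs}",
            "args": f"--epochs {epochs} --lr {lr}",
            "metadata": {"learning_rate": lr, "epochs": epochs},
        }


# Example grid: three learning rates crossed with two epoch counts.
payloads = list(tuning_payloads([0.001, 0.01, 0.1], [10, 20]))
```

Each payload can then be posted to /api/v1/jobs exactly as in the curl loop.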
Batch Processing
# Submit batch job
curl -X POST http://localhost:8080/api/v1/jobs \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "job_name": "batch-processing",
    "args": "--input data/ --output results/ --batch-size 32",
    "priority": 2,
    "datasets": ["training_data", "validation_data"]
  }'
Results and Output
Access Results
# Download results
curl -H "X-API-Key: your-api-key" \
  http://localhost:8080/api/v1/jobs/first-experiment/results
# View job details (pretty-printed with jq)
curl -H "X-API-Key: your-api-key" \
  http://localhost:8080/api/v1/jobs/first-experiment | jq .
Result Format
{
  "job_id": "first-experiment",
  "status": "completed",
  "results": {
    "epochs": 20,
    "learning_rate": 0.01,
    "accuracy": 0.851,
    "loss": 0.3,
    "training_time": 2.0
  },
  "metrics": {
    "gpu_utilization": "85%",
    "memory_usage": "2GB",
    "execution_time": "120s"
  }
}
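The metrics in this format are strings with units attached, so they need parsing before they can be compared or plotted. A minimal sketch, assuming the units are always "%", "GB", and "s" as in the example above (the real API may use others); `parse_metrics` is a hypothetical helper.

```python
def _strip_unit(value, unit):
    """Convert a string like '120s' to a float, checking the unit suffix."""
    if not value.endswith(unit):
        raise ValueError(f"expected {value!r} to end with {unit!r}")
    return float(value[: -len(unit)])


def parse_metrics(metrics):
    """Turn the string metrics from the result format into plain numbers."""
    return {
        "gpu_utilization_pct": _strip_unit(metrics["gpu_utilization"], "%"),
        "memory_gb": _strip_unit(metrics["memory_usage"], "GB"),
        "execution_seconds": _strip_unit(metrics["execution_time"], "s"),
    }
```

Raising on an unexpected unit is deliberate: silently misreading "512MB" as gigabytes would be harder to notice than an error.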
Best Practices
Job Naming
- Use descriptive names: model-training-v2, data-preprocessing
- Include version numbers: experiment-v1, experiment-v2
- Add timestamps: daily-batch-2024-01-15
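These conventions are easy to apply consistently with a small helper. The sketch below is one possible approach, not a Fetch ML requirement; `make_job_name` is a hypothetical function, and the lowercase, hyphen-separated format simply mirrors the examples above.

```python
import datetime
import re


def make_job_name(base, version=None, date=None):
    """Build a job name such as 'experiment-v2' or 'daily-batch-2024-01-15'."""
    # Lowercase the base and collapse anything non-alphanumeric into hyphens.
    parts = [re.sub(r"[^a-z0-9]+", "-", base.lower()).strip("-")]
    if version is not None:
        parts.append(f"v{version}")
    if date is not None:
        parts.append(date.isoformat())
    return "-".join(parts)


print(make_job_name("Daily Batch", date=datetime.date(2024, 1, 15)))
print(make_job_name("experiment", version=2))
```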
Metadata Usage
{
  "metadata": {
    "experiment_type": "training",
    "model_version": "v2.1",
    "dataset": "imagenet-2024",
    "environment": "gpu",
    "team": "ml-team"
  }
}
Error Handling
# Check failed jobs
curl -H "X-API-Key: your-api-key" \
  "http://localhost:8080/api/v1/jobs?status=failed"
# Retry failed job
curl -X POST http://localhost:8080/api/v1/jobs \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "job_name": "retry-experiment",
    "args": "--epochs 20 --lr 0.01",
    "metadata": {"retry_of": "first-experiment"}
  }'
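If several jobs fail, the retry payloads can be derived from the failed job records instead of being written by hand. This sketch assumes each record exposes the same "job_name", "args", and "metadata" fields that were submitted; the actual response shape of the failed-jobs listing is not documented here, so treat that as an assumption. `build_retry_payload` is a hypothetical helper.

```python
def build_retry_payload(failed_job, attempt=1):
    """Create a resubmission payload that links back to the failed job."""
    original_name = failed_job["job_name"]
    return {
        "job_name": f"{original_name}-retry-{attempt}",
        "args": failed_job["args"],
        # Keep the original metadata and record which job this retries.
        "metadata": {**failed_job.get("metadata", {}), "retry_of": original_name},
    }


retry = build_retry_payload(
    {"job_name": "first-experiment", "args": "--epochs 20 --lr 0.01"}
)
```

Including the attempt number in the name keeps repeated retries distinct, and the "retry_of" metadata key matches the one used in the curl example above.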
Related Documentation
- Quick Start - Local development environment and dev stack
- Testing Guide - Test your experiments
- Deployment - Scale to production
- Performance Monitoring - Track experiment performance
Troubleshooting
Job stuck in pending?
- Check worker status: curl http://localhost:8080/api/v1/workers
- Verify resources: docker stats
- Check logs: docker logs ml-experiments-api
Job failed?
- Check error message: curl -H "X-API-Key: your-api-key" http://localhost:8080/api/v1/jobs/<job-id>
- Review job arguments
- Verify input data
No results?
- Check job completion status
- Verify output file paths
- Check storage permissions