# First Experiment
Run your first machine learning experiment with Fetch ML.
## Prerequisites
- Fetch ML installed and running
- API key (see [Security](security.md) and [API Key Process](api-key-process.md))
- Basic ML knowledge

**Container Runtimes:**
- **Docker Compose**: For testing and development only
- **Podman**: For production experiment execution
## Experiment Workflow
### 1. Prepare Your ML Code
Create a simple Python script:
```python
# experiment.py
import argparse
import json


def main():
    # Hyperparameters and the output path come from the command line
    parser = argparse.ArgumentParser()
    parser.add_argument('--epochs', type=int, default=10)
    parser.add_argument('--lr', type=float, default=0.001)
    parser.add_argument('--output', default='results.json')
    args = parser.parse_args()

    # Simulate training
    results = {
        'epochs': args.epochs,
        'learning_rate': args.lr,
        'accuracy': 0.85 + (args.lr * 0.1),
        'loss': 0.5 - (args.epochs * 0.01),
        'training_time': args.epochs * 0.1
    }

    # Save results
    with open(args.output, 'w') as f:
        json.dump(results, f, indent=2)

    print(f"Training completed: {results}")
    return results


if __name__ == '__main__':
    main()
```
### 2. Submit Job via API
```bash
# Submit experiment
curl -X POST http://localhost:8080/api/v1/jobs \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "job_name": "first-experiment",
    "args": "--epochs 20 --lr 0.01 --output experiment_results.json",
    "priority": 1,
    "metadata": {
      "experiment_type": "training",
      "dataset": "sample_data"
    }
  }'
```
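
The same request can also be built from Python, which avoids shell quoting problems in the JSON body. The following is a minimal sketch using only the standard library. The helper name `build_submit_request` is hypothetical, and the endpoint, header, and field names are copied from the curl example above rather than from a verified API reference.

```python
import json
import urllib.request

API_URL = "http://localhost:8080/api/v1/jobs"  # endpoint taken from the curl example


def build_submit_request(job_name, args, api_key, priority=1, metadata=None):
    """Build (but do not send) a job-submission request.

    Hypothetical helper: the payload fields mirror the curl example.
    """
    payload = {
        "job_name": job_name,
        "args": args,
        "priority": priority,
        "metadata": metadata or {},
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


req = build_submit_request(
    "first-experiment",
    "--epochs 20 --lr 0.01 --output experiment_results.json",
    api_key="your-api-key",
    metadata={"experiment_type": "training", "dataset": "sample_data"},
)
# Sending requires a running server:
#     with urllib.request.urlopen(req) as resp:
#         print(resp.read().decode("utf-8"))
```

Building the request separately from sending it makes the payload easy to inspect or log before anything reaches the server.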
### 3. Monitor Progress
```bash
# Check job status
curl -H "X-API-Key: your-api-key" \
  http://localhost:8080/api/v1/jobs/first-experiment

# List all jobs
curl -H "X-API-Key: your-api-key" \
  http://localhost:8080/api/v1/jobs

# Get job metrics
curl -H "X-API-Key: your-api-key" \
  http://localhost:8080/api/v1/jobs/first-experiment/metrics
```
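
Instead of re-running the status command by hand, a script can poll until the job finishes. The sketch below keeps the HTTP call out of the loop: `fetch_status` is a hypothetical callable you supply (for example, a wrapper around the job status endpoint above). The statuses `completed` and `failed` appear elsewhere in this guide; treating them as the only terminal states is an assumption.

```python
import time


def wait_for_job(fetch_status, poll_interval=2.0, timeout=300.0, sleep=time.sleep):
    """Poll until the job reaches a terminal state or the timeout expires.

    fetch_status: zero-argument callable returning the job's status string.
    sleep: injectable so the loop can be exercised without real delays.
    """
    terminal = {"completed", "failed"}  # assumed terminal states
    waited = 0.0
    while True:
        status = fetch_status()
        if status in terminal:
            return status
        if waited >= timeout:
            raise TimeoutError(f"job still '{status}' after {waited:.0f}s")
        sleep(poll_interval)
        waited += poll_interval
```

A fixed polling interval is the simplest choice; for long-running jobs, a growing interval would reduce load on the API server.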
### 4. Use CLI
```bash
# Build the CLI (the subshell keeps your working directory at the repository root)
(cd cli && zig build --release=fast)

# Submit with CLI
./cli/zig-out/bin/ml submit \
  --name "cli-experiment" \
  --args "--epochs 15 --lr 0.005" \
  --server http://localhost:8080

# Monitor with CLI
./cli/zig-out/bin/ml list-jobs --server http://localhost:8080
./cli/zig-out/bin/ml job-status cli-experiment --server http://localhost:8080
```
## Advanced Experiment
### Hyperparameter Tuning
```bash
# Submit multiple experiments
for lr in 0.001 0.01 0.1; do
  curl -X POST http://localhost:8080/api/v1/jobs \
    -H "Content-Type: application/json" \
    -H "X-API-Key: your-api-key" \
    -d "{
      \"job_name\": \"tune-lr-$lr\",
      \"args\": \"--epochs 10 --lr $lr\",
      \"metadata\": {\"learning_rate\": $lr}
    }"
done
```
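
The escaped quotes in the shell loop are easy to get wrong. As an alternative, the payloads for the sweep can be generated in Python and serialized with `json.dumps`. This is a sketch only: `build_sweep_payloads` is a hypothetical helper, and the field names follow the loop above.

```python
import json


def build_sweep_payloads(learning_rates, epochs=10):
    """Return one job payload per learning rate, mirroring the shell loop."""
    payloads = []
    for lr in learning_rates:
        payloads.append({
            "job_name": f"tune-lr-{lr}",
            "args": f"--epochs {epochs} --lr {lr}",
            "metadata": {"learning_rate": lr},
        })
    return payloads


sweep = build_sweep_payloads([0.001, 0.01, 0.1])
for payload in sweep:
    # Each line is a valid JSON body for POST /api/v1/jobs
    print(json.dumps(payload))
```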
### Batch Processing
```bash
# Submit batch job
curl -X POST http://localhost:8080/api/v1/jobs \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "job_name": "batch-processing",
    "args": "--input data/ --output results/ --batch-size 32",
    "priority": 2,
    "datasets": ["training_data", "validation_data"]
  }'
```
## Results and Output
### Access Results
```bash
# Download results
curl -H "X-API-Key: your-api-key" \
  http://localhost:8080/api/v1/jobs/first-experiment/results

# View job details
curl -H "X-API-Key: your-api-key" \
  http://localhost:8080/api/v1/jobs/first-experiment | jq .
```
### Result Format
```json
{
  "job_id": "first-experiment",
  "status": "completed",
  "results": {
    "epochs": 20,
    "learning_rate": 0.01,
    "accuracy": 0.851,
    "loss": 0.3,
    "training_time": 2.0
  },
  "metrics": {
    "gpu_utilization": "85%",
    "memory_usage": "2GB",
    "execution_time": "120s"
  }
}
```
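
Once several jobs have finished, documents in this format can be compared programmatically. The sketch below picks the completed run with the highest value of a chosen metric. `best_run` is a hypothetical helper, and the records in `runs` are illustrative values shaped like the format above, not real output.

```python
def best_run(result_docs, metric="accuracy"):
    """Return the completed result document with the highest metric, or None."""
    completed = [doc for doc in result_docs if doc.get("status") == "completed"]
    if not completed:
        return None
    return max(completed, key=lambda doc: doc["results"][metric])


# Illustrative records only; field names follow the result format above
runs = [
    {"job_id": "tune-lr-0.001", "status": "completed", "results": {"accuracy": 0.8501}},
    {"job_id": "tune-lr-0.01", "status": "completed", "results": {"accuracy": 0.851}},
    {"job_id": "tune-lr-0.1", "status": "failed", "results": {}},
]

best = best_run(runs)
print(best["job_id"] if best else "no completed runs")
```

Failed runs are filtered out before the comparison, so a job without a `results` payload cannot cause a lookup error.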
## Best Practices
### Job Naming
- Use descriptive names: `model-training-v2`, `data-preprocessing`
- Include version numbers: `experiment-v1`, `experiment-v2`
- Add timestamps: `daily-batch-2024-01-15`
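
These conventions can be applied consistently with a small helper. This is a sketch under the assumption that names are built as `base`, then an optional `v<version>` suffix, then an optional ISO date; `make_job_name` is a hypothetical function, not part of Fetch ML.

```python
from datetime import date


def make_job_name(base, version=None, day=None):
    """Join a descriptive base name with optional version and date suffixes."""
    parts = [base]
    if version is not None:
        parts.append(f"v{version}")
    if day is not None:
        parts.append(day.isoformat())  # YYYY-MM-DD
    return "-".join(parts)


print(make_job_name("experiment", version=2))
print(make_job_name("daily-batch", day=date(2024, 1, 15)))
```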
### Metadata Usage
```json
{
  "metadata": {
    "experiment_type": "training",
    "model_version": "v2.1",
    "dataset": "imagenet-2024",
    "environment": "gpu",
    "team": "ml-team"
  }
}
```
### Error Handling
```bash
# Check failed jobs
curl -H "X-API-Key: your-api-key" \
  "http://localhost:8080/api/v1/jobs?status=failed"

# Retry failed job
curl -X POST http://localhost:8080/api/v1/jobs \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "job_name": "retry-experiment",
    "args": "--epochs 20 --lr 0.01",
    "metadata": {"retry_of": "first-experiment"}
  }'
```
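
When retrying many jobs, the retry payload can be derived from the failed job's original submission so that the arguments stay identical. The sketch below assumes you still have the original payload as a dictionary. `build_retry_payload` and the `-retry-<n>` naming scheme are hypothetical; only the `retry_of` metadata key comes from the example above.

```python
def build_retry_payload(failed_job, attempt=1):
    """Copy a failed job's arguments into a new payload linked via retry_of."""
    original_name = failed_job["job_name"]
    metadata = dict(failed_job.get("metadata", {}))  # copy; leave the input untouched
    metadata["retry_of"] = original_name
    return {
        "job_name": f"{original_name}-retry-{attempt}",  # assumed naming scheme
        "args": failed_job["args"],
        "metadata": metadata,
    }


failed = {
    "job_name": "first-experiment",
    "args": "--epochs 20 --lr 0.01",
    "metadata": {"experiment_type": "training"},
}
retry = build_retry_payload(failed)
print(retry)
```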
## Related Documentation
- [Quick Start](quick-start.md) - Local development environment and dev stack
- [Testing Guide](testing.md) - Test your experiments
- [Deployment](deployment.md) - Scale to production
- [Performance Monitoring](performance-monitoring.md) - Track experiment performance
## Troubleshooting
**Job stuck in pending?**
- Check worker status: `curl -H "X-API-Key: your-api-key" http://localhost:8080/api/v1/workers`
- Verify resources: `docker stats`
- Check logs: `docker logs ml-experiments-api`
**Job failed?**
- Check error message: `curl -H "X-API-Key: your-api-key" http://localhost:8080/api/v1/jobs/<job-id>`
- Review job arguments
- Verify input data
**No results?**
- Check job completion status
- Verify output file paths
- Check storage permissions
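
The output path and permission checks can be scripted. The sketch below uses only the standard library and must run in the same environment as the job (for example, inside the container) to be meaningful. `check_output_path` is a hypothetical helper, not part of Fetch ML.

```python
import os


def check_output_path(path):
    """Return a list of problems that would prevent writing a file at `path`."""
    problems = []
    directory = os.path.dirname(os.path.abspath(path))
    if not os.path.isdir(directory):
        problems.append(f"directory does not exist: {directory}")
    elif not os.access(directory, os.W_OK):
        problems.append(f"directory is not writable: {directory}")
    if os.path.isdir(path):
        problems.append(f"path is a directory, not a file: {path}")
    return problems


for problem in check_output_path("experiment_results.json"):
    print(problem)
```

An empty list means the script should be able to create the file; any entries point to the path or permission issue to fix first.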