# First Experiment
Run your first machine learning experiment with Fetch ML.

## Prerequisites

- Container Runtimes:
    - Docker Compose: For testing and development only
    - Podman: For production experiment execution
- Fetch ML installed and running
- API key (see Security and API Key Process)
- Basic ML knowledge
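
You can confirm the local prerequisites from a shell by checking that the runtimes are on your `PATH` and that your API key is exported. This is a minimal sketch. It assumes the executables are named `docker` and `podman`, and `FETCHML_API_KEY` is a hypothetical variable name chosen for illustration, not one Fetch ML is documented to read.

```python
import os
import shutil


def check_prerequisites(env_var="FETCHML_API_KEY"):
    """Report which prerequisites are visible from this environment.

    FETCHML_API_KEY is a hypothetical variable name used only for this sketch.
    """
    return {
        # Docker (with Compose) is used for testing and development.
        "docker": shutil.which("docker") is not None,
        # Podman is used for production experiment execution.
        "podman": shutil.which("podman") is not None,
        # True if an API key has been exported under the assumed name.
        "api_key": bool(os.environ.get(env_var)),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name:8} {'ok' if ok else 'missing'}")
```

This only checks that the tools are installed. It does not check that the Fetch ML services are running.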

## Experiment Workflow

### 1. Prepare Your ML Code
Create a simple Python script:

```python
# experiment.py
import argparse
import json


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--epochs', type=int, default=10)
    parser.add_argument('--lr', type=float, default=0.001)
    parser.add_argument('--output', default='results.json')
    args = parser.parse_args()

    # Simulate training: placeholder metrics derived from the arguments
    results = {
        'epochs': args.epochs,
        'learning_rate': args.lr,
        'accuracy': 0.85 + (args.lr * 0.1),
        'loss': 0.5 - (args.epochs * 0.01),
        'training_time': args.epochs * 0.1
    }

    # Save results
    with open(args.output, 'w') as f:
        json.dump(results, f, indent=2)

    print(f"Training completed: {results}")
    return results


if __name__ == '__main__':
    main()
```
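
The script only simulates training, so its metrics are fixed functions of the arguments. That means you can predict what a job should report before you submit it. The sketch below copies those formulas into a helper and evaluates them for the job submitted in the next step (`--epochs 20 --lr 0.01`). The name `simulate_training` is ours and is not part of Fetch ML.

```python
def simulate_training(epochs, lr):
    """Same placeholder formulas as experiment.py; no real training happens."""
    return {
        "epochs": epochs,
        "learning_rate": lr,
        "accuracy": 0.85 + (lr * 0.1),
        "loss": 0.5 - (epochs * 0.01),
        "training_time": epochs * 0.1,
    }


expected = simulate_training(epochs=20, lr=0.01)
# Round for display, since float arithmetic can leave tiny residues.
print({name: round(value, 4) for name, value in expected.items()})
```

For that job, the expected values are an accuracy of about 0.851, a loss of 0.3, and a training time of 2.0. The placeholder loss formula goes negative once `--epochs` exceeds 50.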

### 2. Submit Job via API

```bash
# Submit experiment
curl -X POST http://localhost:9101/api/v1/jobs \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "job_name": "first-experiment",
    "args": "--epochs 20 --lr 0.01 --output experiment_results.json",
    "priority": 1,
    "metadata": {
      "experiment_type": "training",
      "dataset": "sample_data"
    }
  }'
```
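
To submit from Python instead of curl, the standard library is enough. This sketch builds the same POST request, with the endpoint, headers, and JSON fields copied from the curl call above. It assumes the server answers with a JSON body, because this guide does not document the response format.

```python
import json
import urllib.request

API_URL = "http://localhost:9101/api/v1/jobs"


def build_submit_request(api_key, job_name, args, priority=1, metadata=None, url=API_URL):
    """Build (but do not send) a POST request mirroring the curl call above."""
    payload = {"job_name": job_name, "args": args, "priority": priority}
    if metadata:
        payload["metadata"] = metadata
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def submit_job(request, timeout=10):
    """Send the request. Assumes the server replies with a JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Once the API is running, `submit_job(build_submit_request("your-api-key", "first-experiment", "--epochs 20 --lr 0.01 --output experiment_results.json"))` sends the request. Building the request separately from sending it lets you inspect or test it offline.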

### 3. Monitor Progress

```bash
# Check job status
curl -H "X-API-Key: your-api-key" \
  http://localhost:9101/api/v1/jobs/first-experiment

# List all jobs
curl -H "X-API-Key: your-api-key" \
  http://localhost:9101/api/v1/jobs

# Get job metrics
curl -H "X-API-Key: your-api-key" \
  http://localhost:9101/api/v1/jobs/first-experiment/metrics
```
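
Instead of re-running the status command by hand, you can poll until the job finishes. This sketch treats `completed` and `failed` as the terminal statuses, which are the two that appear in this guide. It reads the `status` field shown in the Result Format section. `fetch_status` is any callable you supply that returns the job JSON as a dict, for example a wrapper around `GET /api/v1/jobs/<job_name>`.

```python
import time

# Assumption: the only terminal statuses are the two that appear in this guide.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(fetch_status, job_name, poll_interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll fetch_status(job_name) until the job reaches a terminal status.

    Returns the last job document. Raises TimeoutError if the job has not
    finished within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status(job_name)
        status = job.get("status")
        if status in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{job_name} still {status!r} after {timeout}s")
        sleep(poll_interval)
```

Passing in `fetch_status` and `sleep` keeps the loop independent of any HTTP client, so you can test it without a server.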

### 4. Use the CLI

```bash
# Build the CLI (the subshell keeps you in the repository root)
(cd cli && zig build dev)

# Submit with CLI
./cli/zig-out/dev/ml submit \
  --name "cli-experiment" \
  --args "--epochs 15 --lr 0.005" \
  --server http://localhost:9101

# Monitor with CLI
./cli/zig-out/dev/ml list-jobs --server http://localhost:9101
./cli/zig-out/dev/ml job-status cli-experiment --server http://localhost:9101
```
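
To drive the CLI from a script, such as a notebook or a sweep driver, you can build the same command as a Python argument list. This sketch uses only the subcommand and flags shown above. It assumes you run it from the repository root, where `./cli/zig-out/dev/ml` resolves.

```python
import subprocess

# Assumed location, relative to the repository root, as in the commands above.
ML_BIN = "./cli/zig-out/dev/ml"


def cli_submit_command(name, args, server="http://localhost:9101", binary=ML_BIN):
    """Build the argv for `ml submit` using only the flags shown above."""
    return [binary, "submit", "--name", name, "--args", args, "--server", server]


def cli_submit(name, args, **kwargs):
    # Passing a list (no shell) delivers the --args string as a single argument.
    return subprocess.run(
        cli_submit_command(name, args, **kwargs),
        check=True,
        capture_output=True,
        text=True,
    )
```

Because the command is a list rather than a shell string, the `--args` value reaches `ml` as one argument, just like the quoted shell version.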

## Advanced Experiments

### Hyperparameter Tuning

```bash
# Submit multiple experiments
for lr in 0.001 0.01 0.1; do
  curl -X POST http://localhost:9101/api/v1/jobs \
    -H "Content-Type: application/json" \
    -H "X-API-Key: your-api-key" \
    -d "{
      \"job_name\": \"tune-lr-$lr\",
      \"args\": \"--epochs 10 --lr $lr\",
      \"metadata\": {\"learning_rate\": $lr}
    }"
done
```
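
The escaped quotes in the shell loop are easy to get wrong. Generating the payloads in Python and serializing them with `json.dumps` removes the need for hand-escaping. This sketch reproduces the loop's job names, arguments, and metadata. Each printed line can be posted to `/api/v1/jobs` as in the loop above. The helper name `sweep_payloads` is ours.

```python
import json


def sweep_payloads(learning_rates, epochs=10):
    """One job payload per learning rate, mirroring the shell loop above."""
    return [
        {
            "job_name": f"tune-lr-{lr}",
            "args": f"--epochs {epochs} --lr {lr}",
            "metadata": {"learning_rate": lr},
        }
        for lr in learning_rates
    ]


for payload in sweep_payloads([0.001, 0.01, 0.1]):
    # json.dumps handles quoting, so nothing needs escaping by hand.
    print(json.dumps(payload))
```

Python prints very small floats in exponent form (`0.00001` becomes `1e-05`), and that also changes the job name. If you sweep below `0.0001`, format the rate explicitly.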
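
### Batch Processing

These `args` assume a batch-processing script that accepts `--input`, `--output`, and `--batch-size`. The sample `experiment.py` defines only `--epochs`, `--lr`, and `--output`, so its `argparse` parser would reject this argument string.

```bash
# Submit batch job
curl -X POST http://localhost:9101/api/v1/jobs \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "job_name": "batch-processing",
    "args": "--input data/ --output results/ --batch-size 32",
    "priority": 2,
    "datasets": ["training_data", "validation_data"]
  }'
```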

## Results and Output

### Access Results

```bash
# Download results
curl -H "X-API-Key: your-api-key" \
  http://localhost:9101/api/v1/jobs/first-experiment/results

# View job details
curl -H "X-API-Key: your-api-key" \
  http://localhost:9101/api/v1/jobs/first-experiment | jq .
```
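
After a sweep such as the hyperparameter tuning example, you will usually fetch each job's details and compare them. This sketch assumes each job document has the `status` and `results` fields shown in the Result Format example below, and it picks the best completed run. The function name `best_run` is ours.

```python
def best_run(jobs, metric="accuracy", higher_is_better=True):
    """Return the completed job whose results[metric] is best.

    Jobs that did not complete, or that do not report the metric, are ignored.
    """
    candidates = [
        job for job in jobs
        if job.get("status") == "completed" and metric in job.get("results", {})
    ]
    if not candidates:
        raise ValueError(f"no completed job reports {metric!r}")
    pick = max if higher_is_better else min
    return pick(candidates, key=lambda job: job["results"][metric])
```

Use `higher_is_better=False` for metrics such as `loss`, where lower values are better.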

### Result Format

```json
{
  "job_id": "first-experiment",
  "status": "completed",
  "results": {
    "epochs": 20,
    "learning_rate": 0.01,
    "accuracy": 0.851,
    "loss": 0.3,
    "training_time": 2.0
  },
  "metrics": {
    "gpu_utilization": "85%",
    "memory_usage": "2GB",
    "execution_time": "120s"
  }
}
```
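
The `metrics` values are strings with units attached, so you need to parse them before you can chart or compare them. This sketch assumes the number-then-unit form shown above and returns `(value, unit)` pairs. It leaves unit conversion to the caller on purpose: for example, this guide does not say whether `GB` means 10^9 or 2^30 bytes. The function names are ours.

```python
import re

# A number (optionally signed or fractional) followed by an optional unit.
_METRIC_RE = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?)\s*([A-Za-z%]*)\s*$")


def parse_metric(text):
    """Split a metric string such as "85%" or "2GB" into (value, unit)."""
    match = _METRIC_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised metric value: {text!r}")
    return float(match.group(1)), match.group(2)


def parse_metrics(metrics):
    """Parse every entry of a job's "metrics" object."""
    return {name: parse_metric(value) for name, value in metrics.items()}
```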

## Best Practices

### Job Naming

- Use descriptive names: `model-training-v2`, `data-preprocessing`
- Include version numbers: `experiment-v1`, `experiment-v2`
- Add timestamps: `daily-batch-2024-01-15`
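
If scripts generate your job names, a small helper keeps them consistent with these conventions. This sketch rests on our own assumption: it lowercases the base name and replaces any other characters with hyphens to match the examples above, because this guide does not state which characters the API accepts. The name `make_job_name` is ours.

```python
import datetime as dt
import re


def make_job_name(base, version=None, date=None):
    """Compose a job name such as "experiment-v2" or "daily-batch-2024-01-15".

    make_job_name("experiment", version=2)                  -> "experiment-v2"
    make_job_name("daily-batch", date=dt.date(2024, 1, 15)) -> "daily-batch-2024-01-15"
    """
    # Assumption: lowercase words joined by hyphens, as in the examples above.
    slug = re.sub(r"[^a-z0-9]+", "-", base.lower()).strip("-")
    parts = [slug]
    if version is not None:
        parts.append(f"v{version}")
    if date is not None:
        parts.append(date.isoformat())
    return "-".join(parts)


if __name__ == "__main__":
    print(make_job_name("daily-batch", date=dt.date.today()))
```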

### Metadata Usage

```json
{
  "metadata": {
    "experiment_type": "training",
    "model_version": "v2.1",
    "dataset": "imagenet-2024",
    "environment": "gpu",
    "team": "ml-team"
  }
}
```

### Error Handling

```bash
# Check failed jobs
curl -H "X-API-Key: your-api-key" \
  "http://localhost:9101/api/v1/jobs?status=failed"

# Retry failed job
curl -X POST http://localhost:9101/api/v1/jobs \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "job_name": "retry-experiment",
    "args": "--epochs 20 --lr 0.01",
    "metadata": {"retry_of": "first-experiment"}
  }'
```
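
When you retry from a script, you can derive the retry's name and metadata from the failed job so that retries stay traceable. This sketch keeps the `retry_of` key from the example above. The `-retry-N` naming scheme and the extra `attempt` metadata key are our own assumptions (metadata appears to be free-form, as in Metadata Usage). You supply the original arguments yourself, because the job details shown in Result Format do not include them.

```python
def build_retry_payload(failed_job_id, args, attempt=1):
    """Payload for resubmitting a failed job.

    The "-retry-N" suffix and the "attempt" key are assumptions of this sketch;
    "retry_of" follows the example above.
    """
    return {
        "job_name": f"{failed_job_id}-retry-{attempt}",
        "args": args,
        "metadata": {"retry_of": failed_job_id, "attempt": attempt},
    }


def retry_payloads(jobs, args_by_job):
    """Build payloads for every failed job whose original args are known.

    Assumes job documents carry "job_id" and "status" as in Result Format.
    """
    return [
        build_retry_payload(job["job_id"], args_by_job[job["job_id"]])
        for job in jobs
        if job.get("status") == "failed" and job.get("job_id") in args_by_job
    ]
```

You can then POST each payload to `/api/v1/jobs` exactly like the retry request above.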
## Related Documentation
- [Development Setup](development-setup.md) - Local development environment
- [Testing Guide](testing.md) - Test your experiments
- [Production Deployment](deployment.md) - Scale to production
- Monitoring - Track experiment performance
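
## Troubleshooting

**Job stuck in pending?**

- Check worker status: `curl -H "X-API-Key: your-api-key" http://localhost:9101/api/v1/workers`
- Verify resources: `docker stats`
- Check logs: `docker-compose logs api-server`

**Job failed?**

- Check the error message: `curl -H "X-API-Key: your-api-key" http://localhost:9101/api/v1/jobs/<job-id>`
- Review the job arguments
- Verify the input data

**No results?**
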
- Check job completion status
- Verify output file paths
- Check storage permissions
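
For the last two checks, a short script can report problems with an output path before you resubmit. Run it where the job actually executes (for example, inside the worker container), because relative paths such as `results.json` resolve against the job's working directory there. The helper name `check_output_path` is ours.

```python
import os


def check_output_path(path):
    """Return a list of problems that would stop a job from writing `path`."""
    problems = []
    directory = os.path.dirname(os.path.abspath(path))
    if not os.path.isdir(directory):
        problems.append(f"directory does not exist: {directory}")
    elif not os.access(directory, os.W_OK):
        problems.append(f"directory is not writable: {directory}")
    if os.path.exists(path) and not os.access(path, os.W_OK):
        problems.append(f"existing file is not writable: {path}")
    return problems


if __name__ == "__main__":
    for problem in check_output_path("results.json"):
        print(problem)
```

`os.access` checks the permissions of the real user running the script. A job process running as a different user can therefore see different results.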