# First Experiment

Run your first machine learning experiment with Fetch ML.

## Prerequisites

- Fetch ML installed and running
- A container runtime:
  - **Docker Compose**: for testing and development only
  - **Podman**: for production experiment execution
- An API key (see [Security](security.md) and [API Key Process](api-key-process.md))
- Basic ML knowledge

## Experiment Workflow

### 1. Prepare Your ML Code

Create a simple Python script:

```python
# experiment.py
import argparse
import json

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--epochs', type=int, default=10)
    parser.add_argument('--lr', type=float, default=0.001)
    parser.add_argument('--output', default='results.json')

    args = parser.parse_args()

    # Simulate training
    results = {
        'epochs': args.epochs,
        'learning_rate': args.lr,
        'accuracy': 0.85 + (args.lr * 0.1),
        'loss': 0.5 - (args.epochs * 0.01),
        'training_time': args.epochs * 0.1
    }

    # Save results
    with open(args.output, 'w') as f:
        json.dump(results, f, indent=2)

    print(f"Training completed: {results}")
    return results

if __name__ == '__main__':
    main()
```

### 2. Submit Job via API

```bash
# Submit experiment
curl -X POST http://localhost:8080/api/v1/jobs \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "job_name": "first-experiment",
    "args": "--epochs 20 --lr 0.01 --output experiment_results.json",
    "priority": 1,
    "metadata": {
      "experiment_type": "training",
      "dataset": "sample_data"
    }
  }'
```
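The same submission can be scripted from Python, which avoids shell quoting when the payload grows. This is a minimal sketch using only the standard library; the endpoint, headers, and fields are copied from the curl example above, while the helper names `build_submit_request` and `submit` are hypothetical and the JSON response body is an assumption.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # same host and port as the curl examples


def build_submit_request(api_key, job_name, args, priority=1, metadata=None):
    """Build (but do not send) the POST request for /api/v1/jobs."""
    payload = {"job_name": job_name, "args": args, "priority": priority}
    if metadata:
        payload["metadata"] = metadata
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def submit(request, timeout=10):
    """Send the request; assumes (not confirmed here) that the server replies with JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_submit_request(
    "your-api-key",
    "first-experiment",
    "--epochs 20 --lr 0.01 --output experiment_results.json",
    metadata={"experiment_type": "training", "dataset": "sample_data"},
)
print(request.get_method(), request.full_url)
```

Call `submit(request)` once the server is running; building the request separately makes it easy to inspect the payload before anything is sent.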

### 3. Monitor Progress

```bash
# Check job status
curl -H "X-API-Key: your-api-key" \
  http://localhost:8080/api/v1/jobs/first-experiment

# List all jobs
curl -H "X-API-Key: your-api-key" \
  http://localhost:8080/api/v1/jobs

# Get job metrics
curl -H "X-API-Key: your-api-key" \
  http://localhost:8080/api/v1/jobs/first-experiment/metrics
```
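Rather than re-running the status command by hand, a script can poll until the job finishes. The sketch below is illustrative only: `wait_for_job` is a hypothetical helper, the status lookup is passed in as a callable so the loop stays independent of the HTTP client, and treating `completed` and `failed` as the terminal states (with `running` as an intermediate one) is an assumption.

```python
import time


def wait_for_job(fetch_status, job_name, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll until the job reaches a terminal status or the timeout elapses.

    fetch_status is any callable that takes a job name and returns its
    status string, e.g. a thin wrapper around GET /api/v1/jobs/<name>.
    """
    terminal = {"completed", "failed"}  # assumed terminal states
    waited = 0.0
    while True:
        status = fetch_status(job_name)
        if status in terminal:
            return status
        if waited >= timeout:
            raise TimeoutError(f"{job_name} still '{status}' after {timeout}s")
        sleep(interval)
        waited += interval


# Demonstration with a stand-in for the real API call.
fake_statuses = iter(["pending", "running", "completed"])
final = wait_for_job(
    lambda name: next(fake_statuses),
    "first-experiment",
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(final)
```

In real use, replace the lambda with a function that queries the status endpoint and returns the `status` field.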

### 4. Use CLI

```bash
# Build the CLI from the repository root (the subshell leaves the working directory unchanged)
(cd cli && zig build --release=fast)

# Submit with CLI
./cli/zig-out/bin/ml submit \
  --name "cli-experiment" \
  --args "--epochs 15 --lr 0.005" \
  --server http://localhost:8080

# Monitor with CLI
./cli/zig-out/bin/ml list-jobs --server http://localhost:8080
./cli/zig-out/bin/ml job-status cli-experiment --server http://localhost:8080
```

## Advanced Experiment

### Hyperparameter Tuning

```bash
# Submit multiple experiments
for lr in 0.001 0.01 0.1; do
  curl -X POST http://localhost:8080/api/v1/jobs \
    -H "Content-Type: application/json" \
    -H "X-API-Key: your-api-key" \
    -d "{
      \"job_name\": \"tune-lr-$lr\",
      \"args\": \"--epochs 10 --lr $lr\",
      \"metadata\": {\"learning_rate\": $lr}
    }"
done
```
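The shell loop above needs escaped quotes inside the JSON body. As a sketch of an alternative, the payloads can be generated in Python and serialized with `json.dumps`; the field names match the loop above, and `tuning_payloads` is a hypothetical helper name.

```python
import json


def tuning_payloads(learning_rates, epochs=10):
    """Return one job payload per learning rate, mirroring the shell loop."""
    payloads = []
    for lr in learning_rates:
        payloads.append({
            "job_name": f"tune-lr-{lr}",
            "args": f"--epochs {epochs} --lr {lr}",
            "metadata": {"learning_rate": lr},
        })
    return payloads


for payload in tuning_payloads([0.001, 0.01, 0.1]):
    # Each line is a complete request body for POST /api/v1/jobs.
    print(json.dumps(payload))
```

Very small values are formatted in scientific notation by Python (for example `1e-05`), so check that the resulting job names are acceptable before submitting them.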

### Batch Processing

```bash
# Submit batch job
curl -X POST http://localhost:8080/api/v1/jobs \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "job_name": "batch-processing",
    "args": "--input data/ --output results/ --batch-size 32",
    "priority": 2,
    "datasets": ["training_data", "validation_data"]
  }'
```

## Results and Output

### Access Results

```bash
# Download results
curl -H "X-API-Key: your-api-key" \
  http://localhost:8080/api/v1/jobs/first-experiment/results

# View job details
curl -H "X-API-Key: your-api-key" \
  http://localhost:8080/api/v1/jobs/first-experiment | jq .
```

### Result Format

```json
{
  "job_id": "first-experiment",
  "status": "completed",
  "results": {
    "epochs": 20,
    "learning_rate": 0.01,
    "accuracy": 0.851,
    "loss": 0.3,
    "training_time": 2.0
  },
  "metrics": {
    "gpu_utilization": "85%",
    "memory_usage": "2GB",
    "execution_time": "120s"
  }
}
```
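Because the metrics in this sample are strings with units attached, a script that compares runs has to split them first. The following sketch parses a trimmed copy of the sample response; `parse_metric` is a hypothetical helper, and it assumes every metric is a number followed by an optional unit, as in the sample.

```python
import json
import re

# Trimmed copy of the sample response shown above.
response_text = """
{
  "job_id": "first-experiment",
  "status": "completed",
  "results": {"epochs": 20, "learning_rate": 0.01, "loss": 0.3},
  "metrics": {"gpu_utilization": "85%", "memory_usage": "2GB", "execution_time": "120s"}
}
"""


def parse_metric(value):
    """Split a metric string such as '85%' or '120s' into (number, unit)."""
    match = re.fullmatch(r"\s*([0-9]+(?:\.[0-9]+)?)\s*([A-Za-z%]*)\s*", value)
    if match is None:
        raise ValueError(f"unrecognized metric value: {value!r}")
    return float(match.group(1)), match.group(2)


job = json.loads(response_text)
if job["status"] != "completed":
    raise RuntimeError(f"job ended with status {job['status']}")

# Map each metric name to a (number, unit) tuple.
metrics = {name: parse_metric(raw) for name, raw in job["metrics"].items()}
print(job["results"]["loss"], metrics["execution_time"])
```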

## Best Practices

### Job Naming

- Use descriptive names: `model-training-v2`, `data-preprocessing`
- Include version numbers: `experiment-v1`, `experiment-v2`
- Add timestamps: `daily-batch-2024-01-15`
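
These conventions are easier to keep consistent when names are generated rather than typed. A small sketch, assuming job names should be lowercase and hyphen-separated (the examples above follow that pattern, but the server's actual naming rules are not stated here); `job_name` is a hypothetical helper.

```python
import datetime
import re


def job_name(base, version=None, day=None):
    """Build a name such as 'model-training-v2' or 'daily-batch-2024-01-15'."""
    # Lowercase the base and collapse anything that is not a letter or digit into '-'.
    slug = re.sub(r"[^a-z0-9]+", "-", base.lower()).strip("-")
    parts = [slug]
    if version is not None:
        parts.append(f"v{version}")
    if day is not None:
        parts.append(day.isoformat())
    return "-".join(parts)


print(job_name("Model Training", version=2))
print(job_name("daily batch", day=datetime.date(2024, 1, 15)))
```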

### Metadata Usage

```json
{
  "metadata": {
    "experiment_type": "training",
    "model_version": "v2.1",
    "dataset": "imagenet-2024",
    "environment": "gpu",
    "team": "ml-team"
  }
}
```

### Error Handling

```bash
# Check failed jobs
curl -H "X-API-Key: your-api-key" \
  "http://localhost:8080/api/v1/jobs?status=failed"

# Retry failed job
curl -X POST http://localhost:8080/api/v1/jobs \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "job_name": "retry-experiment",
    "args": "--epochs 20 --lr 0.01",
    "metadata": {"retry_of": "first-experiment"}
  }'
```
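If the original submission payload is kept around, the retry body can be derived from it instead of being rewritten by hand. This is only a sketch: `retry_payload` is a hypothetical helper, the `-retry-N` suffix is an illustrative naming choice rather than a Fetch ML convention, and `retry_of` is the metadata key used in the example above.

```python
def retry_payload(original, attempt=1):
    """Copy a submission payload, rename it, and link it to the original job."""
    retry = dict(original)
    retry["job_name"] = f"{original['job_name']}-retry-{attempt}"
    # Copy the metadata so the original payload is left untouched.
    metadata = dict(original.get("metadata", {}))
    metadata["retry_of"] = original["job_name"]
    retry["metadata"] = metadata
    return retry


first = {
    "job_name": "first-experiment",
    "args": "--epochs 20 --lr 0.01",
    "metadata": {"experiment_type": "training"},
}
print(retry_payload(first))
```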

## Related Documentation

- [Quick Start](quick-start.md) - Local development environment and dev stack
- [Testing Guide](testing.md) - Test your experiments
- [Deployment](deployment.md) - Scale to production
- [Performance Monitoring](performance-monitoring.md) - Track experiment performance

## Troubleshooting

**Job stuck in pending?**
- Check worker status: `curl http://localhost:8080/api/v1/workers`
- Verify resources: `docker stats`
- Check logs: `docker logs ml-experiments-api`

**Job failed?**
- Check error message: `curl -H "X-API-Key: your-api-key" http://localhost:8080/api/v1/jobs/<job-id>`
- Review job arguments
- Verify input data

**No results?**
- Check job completion status
- Verify output file paths
- Check storage permissions