Jeremie Fraeys 385d2cf386 docs: add comprehensive documentation with MkDocs site

- Add complete API documentation and architecture guides
- Include quick start, installation, and deployment guides
- Add troubleshooting and security documentation
- Include CLI reference and configuration schema docs
- Add production monitoring and operations guides
- Implement MkDocs configuration with search functionality
- Include comprehensive user and developer documentation

Provides complete documentation for users and developers
covering all aspects of the FetchML platform.

2025-12-04 16:54:57 -05:00

5.2 KiB


# First Experiment

Run your first machine learning experiment with Fetch ML.

## Prerequisites

- Container runtime:
  - Docker Compose: for testing and development only
  - Podman: for production experiment execution
- Fetch ML installed and running
- An API key (see Security and API Key Process)
- Basic ML knowledge

## Experiment Workflow

### 1. Prepare Your ML Code

Create a simple Python script that simulates training and writes its metrics to a JSON file:

```python
# experiment.py
import argparse
import json


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--epochs', type=int, default=10)
    parser.add_argument('--lr', type=float, default=0.001)
    parser.add_argument('--output', default='results.json')

    args = parser.parse_args()

    # Simulate training: each metric is a simple function of the arguments
    results = {
        'epochs': args.epochs,
        'learning_rate': args.lr,
        'accuracy': 0.85 + (args.lr * 0.1),
        'loss': 0.5 - (args.epochs * 0.01),
        'training_time': args.epochs * 0.1
    }

    # Save results to the requested output file
    with open(args.output, 'w') as f:
        json.dump(results, f, indent=2)

    print(f"Training completed: {results}")
    return results


if __name__ == '__main__':
    main()
```

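Because training is only simulated, every metric is plain arithmetic on the arguments. The sketch below mirrors the formulas in `experiment.py`, which lets you predict what the job from step 2 should report. The function name `simulated_metrics` is illustrative and is not part of Fetch ML.

```python
# Hypothetical helper mirroring experiment.py's placeholder formulas
def simulated_metrics(epochs: int, lr: float) -> dict:
    return {
        "accuracy": 0.85 + lr * 0.1,     # grows slightly with the learning rate
        "loss": 0.5 - epochs * 0.01,     # shrinks by 0.01 per epoch
        "training_time": epochs * 0.1,   # 0.1 simulated seconds per epoch
    }


# Arguments submitted in step 2: --epochs 20 --lr 0.01
metrics = simulated_metrics(epochs=20, lr=0.01)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these arguments, the script reports an accuracy of about 0.851, a loss of 0.3, and a training time of 2.0. The `loss` formula goes negative once `--epochs` exceeds 50. That is acceptable for a placeholder, but a real training script would not behave this way.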
### 2. Submit Job via API

```bash
# Submit experiment
curl -X POST http://localhost:9101/api/v1/jobs \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "job_name": "first-experiment",
    "args": "--epochs 20 --lr 0.01 --output experiment_results.json",
    "priority": 1,
    "metadata": {
      "experiment_type": "training",
      "dataset": "sample_data"
    }
  }'
```

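If you prefer to submit from Python, the standard library's `urllib.request` can build the same request. This sketch assumes only what the curl example shows: a `POST` to `/api/v1/jobs` with an `X-API-Key` header and a JSON body. The helper `build_submit_request` is hypothetical. This page does not document the response body, so the commented-out send step simply assumes the response is JSON.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9101"
API_KEY = "your-api-key"  # replace with your real key


# Hypothetical helper; endpoint, headers, and fields are copied from the curl example
def build_submit_request(base_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a job-submission request."""
    return urllib.request.Request(
        url=f"{base_url}/api/v1/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


payload = {
    "job_name": "first-experiment",
    "args": "--epochs 20 --lr 0.01 --output experiment_results.json",
    "priority": 1,
    "metadata": {"experiment_type": "training", "dataset": "sample_data"},
}
request = build_submit_request(BASE_URL, API_KEY, payload)

# To send it (requires a running server; assumes a JSON response):
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```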
### 3. Monitor Progress

```bash
# Check job status
curl -H "X-API-Key: your-api-key" \
  http://localhost:9101/api/v1/jobs/first-experiment

# List all jobs
curl -H "X-API-Key: your-api-key" \
  http://localhost:9101/api/v1/jobs

# Get job metrics
curl -H "X-API-Key: your-api-key" \
  http://localhost:9101/api/v1/jobs/first-experiment/metrics
```

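To wait for a job from a script, poll its status until it reaches a terminal state. This sketch makes two assumptions. First, the status endpoint returns JSON with a `status` field, as in the Result Format example on this page. Second, `completed` and `failed` are the terminal states, since those are the two that appear on this page. Both `fetch_status` and `wait_for_job` are hypothetical names. The status function is passed in so the loop can be tested without a server.

```python
import json
import time
import urllib.request
from typing import Callable

TERMINAL_STATES = {"completed", "failed"}  # assumption: states seen elsewhere on this page


def fetch_status(base_url: str, api_key: str, job_name: str) -> str:
    """GET /api/v1/jobs/<job_name> and return its 'status' field (assumed response shape)."""
    req = urllib.request.Request(
        f"{base_url}/api/v1/jobs/{job_name}", headers={"X-API-Key": api_key}
    )
    with urllib.request.urlopen(req) as response:
        return json.loads(response.read())["status"]


def wait_for_job(get_status: Callable[[], str], interval: float = 5.0,
                 timeout: float = 600.0) -> str:
    """Call get_status() until it returns a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout} seconds")
        time.sleep(interval)


# Usage against a running server:
# final = wait_for_job(
#     lambda: fetch_status("http://localhost:9101", "your-api-key", "first-experiment"))
```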
### 4. Use CLI

```bash
# Build the CLI (run from the repository root; the subshell keeps you there)
(cd cli && zig build dev)

# Submit with CLI
./cli/zig-out/dev/ml submit \
  --name "cli-experiment" \
  --args "--epochs 15 --lr 0.005" \
  --server http://localhost:9101

# Monitor with CLI
./cli/zig-out/dev/ml list-jobs --server http://localhost:9101
./cli/zig-out/dev/ml job-status cli-experiment --server http://localhost:9101
```

## Advanced Experiment

### Hyperparameter Tuning

```bash
# Submit multiple experiments
for lr in 0.001 0.01 0.1; do
  curl -X POST http://localhost:9101/api/v1/jobs \
    -H "Content-Type: application/json" \
    -H "X-API-Key: your-api-key" \
    -d "{
      \"job_name\": \"tune-lr-$lr\",
      \"args\": \"--epochs 10 --lr $lr\",
      \"metadata\": {\"learning_rate\": $lr}
    }"
done
```

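Building the request bodies in Python avoids the escaped quotes in the shell loop and extends naturally to a grid over several hyperparameters. This sketch assumes the same payload fields as the loop above. The helper `sweep_payloads` and the `-ep-<epochs>` name suffix are illustrative. Each payload would be sent as a `POST` to `/api/v1/jobs`.

```python
import itertools
import json


# Hypothetical helper; payload fields follow the shell loop above
def sweep_payloads(learning_rates, epochs_options):
    """Yield one job payload per (learning rate, epochs) combination."""
    for lr, epochs in itertools.product(learning_rates, epochs_options):
        yield {
            "job_name": f"tune-lr-{lr}-ep-{epochs}",
            "args": f"--epochs {epochs} --lr {lr}",
            "metadata": {"learning_rate": lr, "epochs": epochs},
        }


payloads = list(sweep_payloads([0.001, 0.01, 0.1], [10, 20]))
for p in payloads:
    print(json.dumps(p))  # one JSON request body per job
```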
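### Batch Processing

This example assumes a batch script that accepts `--input`, `--output`, and `--batch-size`. The `experiment.py` script from step 1 does not define `--input` or `--batch-size`, so it would exit with an argument error.

```bash
# Submit batch job
curl -X POST http://localhost:9101/api/v1/jobs \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "job_name": "batch-processing",
    "args": "--input data/ --output results/ --batch-size 32",
    "priority": 2,
    "datasets": ["training_data", "validation_data"]
  }'
```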

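A minimal argument parser for such a batch script might look like the following. It is a hypothetical sketch: only the option names come from the example above, and the help texts and default are illustrative.

```python
import argparse


# Hypothetical batch script; only the option names come from the example above
def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Batch processing job")
    parser.add_argument("--input", required=True, help="directory containing input data")
    parser.add_argument("--output", required=True, help="directory for result files")
    parser.add_argument("--batch-size", type=int, default=32, help="items per batch")
    return parser


# Parse the args string from the batch example (plain whitespace split; it has no quotes)
args = build_parser().parse_args("--input data/ --output results/ --batch-size 32".split())
print(args.input, args.output, args.batch_size)  # argparse maps --batch-size to batch_size
```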
## Results and Output

### Access Results

```bash
# Download results
curl -H "X-API-Key: your-api-key" \
  http://localhost:9101/api/v1/jobs/first-experiment/results

# View job details
curl -H "X-API-Key: your-api-key" \
  http://localhost:9101/api/v1/jobs/first-experiment | jq .
```

### Result Format

```json
{
  "job_id": "first-experiment",
  "status": "completed",
  "results": {
    "epochs": 20,
    "learning_rate": 0.01,
    "accuracy": 0.851,
    "loss": 0.3,
    "training_time": 2.0
  },
  "metrics": {
    "gpu_utilization": "85%",
    "memory_usage": "2GB",
    "execution_time": "120s"
  }
}
```

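When comparing runs, such as the learning-rate sweep above, you can read the `results` object from each job's details and rank the completed jobs. This sketch assumes each job's details follow the format above, and `best_run` is a hypothetical helper. The sample accuracies are not real server output. They come from `experiment.py`'s formula (0.85 + lr × 0.1) for the three swept learning rates.

```python
# Hypothetical helper; assumes job details shaped like the Result Format example
def best_run(jobs, metric="accuracy", higher_is_better=True):
    """Return the job_id of the completed job with the best value for `metric`."""
    completed = [j for j in jobs if j.get("status") == "completed"]
    if not completed:
        raise ValueError("no completed jobs to compare")
    chooser = max if higher_is_better else min
    return chooser(completed, key=lambda j: j["results"][metric])["job_id"]


# Accuracies computed from experiment.py's formula for each swept learning rate
jobs = [
    {"job_id": "tune-lr-0.001", "status": "completed", "results": {"accuracy": 0.8501}},
    {"job_id": "tune-lr-0.01", "status": "completed", "results": {"accuracy": 0.851}},
    {"job_id": "tune-lr-0.1", "status": "completed", "results": {"accuracy": 0.86}},
]
print(best_run(jobs))
```

With the toy formula, the largest learning rate always wins. Real training runs will not be that predictable.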
## Best Practices

### Job Naming

- Use descriptive names: `model-training-v2`, `data-preprocessing`
- Include version numbers: `experiment-v1`, `experiment-v2`
- Add dates: `daily-batch-2024-01-15`

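A small helper can apply these conventions consistently. This is an illustrative sketch, and `make_job_name` is not part of Fetch ML. It assumes names should be lowercase and hyphen-separated, like the examples above.

```python
import datetime
import re


# Hypothetical helper, not part of Fetch ML
def make_job_name(base, version=None, day=None):
    """Combine a descriptive base name with an optional version and ISO date."""
    slug = re.sub(r"[^a-z0-9]+", "-", base.lower()).strip("-")  # lowercase, hyphen-separated
    parts = [slug]
    if version is not None:
        parts.append(f"v{version}")
    if day is not None:
        parts.append(day.isoformat())  # YYYY-MM-DD
    return "-".join(parts)


print(make_job_name("Daily batch", day=datetime.date(2024, 1, 15)))
print(make_job_name("experiment", version=2))
```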
### Metadata Usage

```json
{
  "metadata": {
    "experiment_type": "training",
    "model_version": "v2.1",
    "dataset": "imagenet-2024",
    "environment": "gpu",
    "team": "ml-team"
  }
}
```

### Error Handling

```bash
# List failed jobs
curl -H "X-API-Key: your-api-key" \
  "http://localhost:9101/api/v1/jobs?status=failed"

# Retry a failed job by resubmitting it under a new name
curl -X POST http://localhost:9101/api/v1/jobs \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "job_name": "retry-experiment",
    "args": "--epochs 20 --lr 0.01",
    "metadata": {"retry_of": "first-experiment"}
  }'
```

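Resubmitting by hand gets error-prone when several jobs fail. The sketch below builds a retry payload that links back to the original job through the `retry_of` metadata key used above. It assumes you still have the original job's name and args. The helper `build_retry_payload` and the `-retry-N` naming suffix are illustrative conventions, not Fetch ML features.

```python
# Hypothetical helper; 'retry_of' matches the metadata key used in the curl example
def build_retry_payload(original_name, args, attempt=1, extra_metadata=None):
    """Create a new job payload that records which job it retries."""
    metadata = dict(extra_metadata or {})  # copy so the caller's dict is not modified
    metadata["retry_of"] = original_name
    return {
        "job_name": f"{original_name}-retry-{attempt}",
        "args": args,
        "metadata": metadata,
    }


payload = build_retry_payload("first-experiment", "--epochs 20 --lr 0.01")
print(payload)
```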
## Related Documentation

- [Development Setup](development-setup.md) - Local development environment
- [Testing Guide](testing.md) - Test your experiments
- [Production Deployment](deployment.md) - Scale to production
- Monitoring - Track experiment performance

## Troubleshooting

### Job stuck in pending?

- Check worker status: `curl -H "X-API-Key: your-api-key" http://localhost:9101/api/v1/workers`
- Verify resources: `docker stats`
- Check logs: `docker-compose logs api-server`

### Job failed?

- Check the error message: `curl -H "X-API-Key: your-api-key" http://localhost:9101/api/v1/jobs/<job-id>`
- Review the job arguments
- Verify the input data

### No results?

- Check that the job has completed
- Verify the output file paths
- Check storage permissions
