
Jupyter Workspace and Experiment Integration

This guide describes the integration between Jupyter workspaces and FetchML experiments, enabling seamless resource chaining and data synchronization.

Overview

The Jupyter-experiment integration allows you to:

  • Link Jupyter workspaces with specific experiments
  • Automatically track experiment metadata in workspaces
  • Queue experiments directly from Jupyter workspaces
  • Synchronize data between workspaces and experiments
  • Maintain resource sharing and context across development and production workflows

Architecture

Components

  1. Workspace Metadata Manager - Tracks relationships between workspaces and experiments
  2. Service Manager Integration - Links Jupyter services with experiment context
  3. CLI Commands - Provides user-facing integration commands
  4. API Endpoints - Enables programmatic workspace-experiment management

Data Flow

Jupyter Workspace ←→ Workspace Metadata ←→ Experiment Manager
        ↓                      ↓                    ↓
   Notebooks            Link Metadata      Experiment Data
   Scripts              Sync History       Metrics & Results
   Requirements         Auto-sync Config   Job Queue

Quick Start

1. Create a Jupyter workspace

# Create a new workspace
mkdir my_experiment_workspace
cd my_experiment_workspace

# Add notebooks and scripts
# (See examples/jupyter_experiment_integration.py for sample setup)

2. Start Jupyter service

# Start Jupyter with workspace
ml jupyter start --workspace ./my_experiment_workspace --name my_experiment

# Access at http://localhost:8888

3. Link workspace with experiment

# Create experiment
ml experiment create --name "my_experiment" --description "Test experiment"

# Link workspace with experiment
ml jupyter experiment link --workspace ./my_experiment_workspace --experiment <experiment_id>

4. Work in Jupyter

  • Open notebooks in browser
  • Develop and test code interactively
  • Use MLflow for experiment tracking
  • Save results and models

5. Queue for production

# Queue experiment from workspace
ml jupyter experiment queue --workspace ./my_experiment_workspace --script experiment.py --name "production_run"

# Monitor progress
ml status
ml monitor

6. Sync data

# Push workspace changes to experiment
ml jupyter experiment sync --workspace ./my_experiment_workspace --direction push

# Pull experiment results to workspace
ml jupyter experiment sync --workspace ./my_experiment_workspace --direction pull

CLI Commands

ml jupyter experiment link

Link a Jupyter workspace with an experiment.

ml jupyter experiment link --workspace <path> --experiment <id>

Options:

  • --workspace: Path to Jupyter workspace (default: ./workspace)
  • --experiment: Experiment ID to link with

Creates:

  • .jupyter_experiment.json metadata file in workspace
  • Link record in workspace metadata manager
  • Association between workspace and experiment

ml jupyter experiment queue

Queue an experiment from a linked workspace.

ml jupyter experiment queue --workspace <path> --script <file> --name <name>

Options:

  • --workspace: Path to workspace (default: ./workspace)
  • --script: Python script to execute
  • --name: Name for the queued job

Behavior:

  • Detects linked experiment automatically
  • Passes experiment context to job queue
  • Uses workspace resources and configuration

ml jupyter experiment sync

Synchronize data between workspace and experiment.

ml jupyter experiment sync --workspace <path> --direction <pull|push>

Options:

  • --workspace: Path to workspace (default: ./workspace)
  • --direction: Sync direction (pull or push)

Sync Types:

  • Pull: Download experiment metrics, results, and data to workspace
  • Push: Upload workspace notebooks, scripts, and results to experiment

ml jupyter experiment status

Show experiment status for a workspace.

ml jupyter experiment status [workspace_path]

Displays:

  • Linked experiment information
  • Last sync time
  • Experiment metadata
  • Service association

API Endpoints

/api/jupyter/experiments/link

Method: POST

Link a workspace with an experiment.

{
  "workspace": "/path/to/workspace",
  "experiment_id": "experiment_123",
  "service_id": "jupyter-service-456"
}

Response:

{
  "status": "linked",
  "data": {
    "workspace_path": "/path/to/workspace",
    "experiment_id": "experiment_123",
    "linked_at": "2023-12-06T10:30:00Z",
    "sync_direction": "bidirectional"
  }
}

/api/jupyter/experiments/sync

Method: POST

Synchronize workspace with experiment.

{
  "workspace": "/path/to/workspace",
  "experiment_id": "experiment_123",
  "direction": "push",
  "sync_type": "all"
}

Response:

{
  "workspace": "/path/to/workspace",
  "experiment_id": "experiment_123",
  "direction": "push",
  "sync_type": "all",
  "synced_at": "2023-12-06T10:35:00Z",
  "status": "completed"
}

/api/jupyter/services

Methods: GET, POST, DELETE

Manage Jupyter services.

GET: List all services
POST: Start a new service
DELETE: Stop a service

Workspace Metadata

.jupyter_experiment.json

Each linked workspace contains a metadata file:

{
  "experiment_id": "experiment_123",
  "service_id": "jupyter-service-456",
  "linked_at": 1701864600,
  "last_sync": 1701865200,
  "sync_direction": "bidirectional",
  "auto_sync": false,
  "jupyter_integration": true,
  "workspace_path": "/path/to/workspace",
  "tags": ["development", "ml-experiment"]
}

Metadata Manager

The workspace metadata manager maintains:

  • Workspace-experiment relationships
  • Sync history and timestamps
  • Auto-sync configuration
  • Tags and additional metadata
  • Service associations

Best Practices

Workspace Organization

  1. One workspace per experiment - Keep workspaces focused on specific experiments
  2. Use descriptive names - Name workspaces and services clearly
  3. Version control - Track workspace changes with git
  4. Clean separation - Separate data, code, and results

Experiment Development Workflow

  1. Create workspace with notebooks and scripts
  2. Link with experiment for tracking
  3. Develop interactively in Jupyter
  4. Test locally with sample data
  5. Queue for production when ready
  6. Monitor results and iterate

Data Management

  1. Use requirements.txt for dependencies
  2. Store data separately from notebooks
  3. Use MLflow for experiment tracking
  4. Sync regularly to preserve work
  5. Clean up old workspaces

Resource Management

  1. Monitor service usage with ml jupyter list
  2. Stop unused services with ml jupyter stop
  3. Use resource limits in configuration
  4. Enable auto-sync for automated workflows
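For automated workflows, the sync command can be driven from a script. A minimal sketch that builds the CLI invocation for `ml jupyter experiment sync` (the helper name is illustrative; execute the result with `subprocess.run(..., check=True)`):

```python
def sync_command(workspace: str, direction: str) -> list[str]:
    """Build the argv for `ml jupyter experiment sync` (see CLI Commands above)."""
    if direction not in ("push", "pull"):
        raise ValueError("direction must be 'push' or 'pull'")
    return [
        "ml", "jupyter", "experiment", "sync",
        "--workspace", workspace,
        "--direction", direction,
    ]
```

Validating the direction up front keeps a scheduled job from silently invoking the CLI with a bad flag value.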

Troubleshooting

Common Issues

Workspace not linked:

Error: No experiment link found in workspace

Solution: Run ml jupyter experiment link first

Service not found:

Error: Service not found

Solution: Check service name with ml jupyter list

Sync failed:

Error: Failed to sync workspace

Solution: Check workspace permissions and confirm the experiment still exists

Debug Commands

# Check workspace metadata
cat ./workspace/.jupyter_experiment.json

# List all services
ml jupyter list

# Check experiment status
ml jupyter experiment status

# View service logs
podman logs <service_id>

Recovery

Lost workspace link:

  1. Find experiment ID with ml experiment list
  2. Re-link with ml jupyter experiment link
  3. Sync data with ml jupyter experiment sync --direction pull

Service stuck:

  1. Stop with ml jupyter stop <service_name>
  2. Check logs for errors
  3. Restart with ml jupyter start

Examples

Complete Workflow

# 1. Setup workspace
mkdir my_ml_project
cd my_ml_project
echo "numpy>=1.20.0" > requirements.txt
echo "mlflow>=1.20.0" >> requirements.txt

# 2. Start Jupyter
ml jupyter start --workspace . --name my_project

# 3. Create experiment
ml experiment create --name "my_project" --description "ML project experiment"

# 4. Link workspace
ml jupyter experiment link --workspace . --experiment <experiment_id>

# 5. Work in Jupyter (browser)
# - Create notebooks
# - Write experiment scripts
# - Test locally

# 6. Queue for production
ml jupyter experiment queue --workspace . --script train_model.py --name "production_run"

# 7. Monitor
ml status
ml monitor

# 8. Sync results
ml jupyter experiment sync --workspace . --direction pull

# 9. Cleanup
ml jupyter stop my_project

Python Integration

import requests

# Base URL of the FetchML API (adjust host/port to your deployment)
BASE_URL = 'http://localhost:9101'

# Link workspace with an experiment
response = requests.post(f'{BASE_URL}/api/jupyter/experiments/link', json={
    'workspace': '/path/to/workspace',
    'experiment_id': 'experiment_123'
})
response.raise_for_status()  # fail loudly on HTTP errors
print(response.json()['status'])  # "linked"

# Push workspace contents to the experiment
response = requests.post(f'{BASE_URL}/api/jupyter/experiments/sync', json={
    'workspace': '/path/to/workspace',
    'experiment_id': 'experiment_123',
    'direction': 'push',
    'sync_type': 'all'
})
response.raise_for_status()
print(response.json()['synced_at'])

Configuration

Service Configuration

Jupyter services can be configured with experiment-specific settings:

service:
  default_resources:
    memory_limit: "8G"
    cpu_limit: "2"
    gpu_access: false
  max_services: 5
  auto_sync_interval: "30m"

Workspace Settings

Workspace metadata supports custom configuration:

{
  "auto_sync": true,
  "sync_interval": "15m",
  "sync_direction": "bidirectional",
  "tags": ["development", "production"],
  "additional_data": {
    "environment": "test",
    "team": "ml-team"
  }
}
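The `sync_interval` and `auto_sync_interval` values above use a duration shorthand. A sketch of a parser for that shorthand, assuming a `<number><unit>` format with s/m/h units (the exact accepted units are an assumption, not confirmed by the configuration schema):

```python
import re

def parse_interval(interval: str) -> int:
    """Convert a duration shorthand like '15m' or '30m' to seconds."""
    match = re.fullmatch(r"(\d+)([smh])", interval)
    if match is None:
        raise ValueError(f"unrecognized interval: {interval!r}")
    value, unit = int(match.group(1)), match.group(2)
    # Scale by the unit: seconds, minutes, or hours
    return value * {"s": 1, "m": 60, "h": 3600}[unit]
```

For example, `parse_interval("15m")` yields the number of seconds a scheduler should sleep between syncs.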

Migration Guide

From Standalone Jupyter

  1. Create workspace from existing notebooks
  2. Link with experiment using new commands
  3. Update scripts to use experiment context
  4. Migrate data to experiment storage
  5. Update workflows to use integration

From Job Queue Only

  1. Create workspace for development
  2. Link with existing experiments
  3. Add interactive development phase
  4. Implement sync workflows
  5. Update CI/CD pipelines

Future Enhancements

Planned improvements:

  • Auto-sync with file watching
  • Workspace templates
  • Collaborative workspaces
  • Advanced resource sharing
  • Git integration
  • Docker compose support
  • Kubernetes integration
  • Advanced monitoring