
Jupyter Workspace and Experiment Integration

This guide describes the integration between Jupyter workspaces and FetchML experiments, enabling seamless resource chaining and data synchronization.

Overview

The Jupyter-experiment integration allows you to:

  • Link Jupyter workspaces with specific experiments
  • Automatically track experiment metadata in workspaces
  • Queue experiments directly from Jupyter workspaces
  • Synchronize data between workspaces and experiments
  • Maintain resource sharing and context across development and production workflows

Architecture

Components

  1. Workspace Metadata Manager - Tracks relationships between workspaces and experiments
  2. Service Manager Integration - Links Jupyter services with experiment context
  3. CLI Commands - Provides user-facing integration commands
  4. API Endpoints - Enables programmatic workspace-experiment management

Data Flow

Jupyter Workspace ←→ Workspace Metadata ←→ Experiment Manager
        ↓                      ↓                    ↓
   Notebooks            Link Metadata      Experiment Data
   Scripts              Sync History       Metrics & Results
   Requirements         Auto-sync Config   Job Queue

Quick Start

1. Create a Jupyter workspace

# Create a new workspace
mkdir my_experiment_workspace
cd my_experiment_workspace

# Add notebooks and scripts
# (See examples/jupyter_experiment_integration.py for sample setup)

2. Start Jupyter service

# Start Jupyter with workspace
ml jupyter start --workspace ./my_experiment_workspace --name my_experiment

# Access at http://localhost:8888

3. Link workspace with experiment

# Create experiment
ml experiment create --name "my_experiment" --description "Test experiment"

# Link workspace with experiment
ml jupyter experiment link --workspace ./my_experiment_workspace --experiment <experiment_id>

4. Work in Jupyter

  • Open notebooks in browser
  • Develop and test code interactively
  • Use MLflow for experiment tracking
  • Save results and models

5. Queue for production

# Queue experiment from workspace
ml jupyter experiment queue --workspace ./my_experiment_workspace --script experiment.py --name "production_run"

# Monitor progress
ml status
ml monitor

6. Sync data

# Push workspace changes to experiment
ml jupyter experiment sync --workspace ./my_experiment_workspace --direction push

# Pull experiment results to workspace
ml jupyter experiment sync --workspace ./my_experiment_workspace --direction pull

CLI Commands

ml jupyter experiment link

Link a Jupyter workspace with an experiment.

ml jupyter experiment link --workspace <path> --experiment <id>

Options:

  • --workspace: Path to Jupyter workspace (default: ./workspace)
  • --experiment: Experiment ID to link with

Creates:

  • .jupyter_experiment.json metadata file in workspace
  • Link record in workspace metadata manager
  • Association between workspace and experiment

ml jupyter experiment queue

Queue an experiment from a linked workspace.

ml jupyter experiment queue --workspace <path> --script <file> --name <name>

Options:

  • --workspace: Path to workspace (default: ./workspace)
  • --script: Python script to execute
  • --name: Name for the queued job

Behavior:

  • Detects linked experiment automatically
  • Passes experiment context to job queue
  • Uses workspace resources and configuration

ml jupyter experiment sync

Synchronize data between workspace and experiment.

ml jupyter experiment sync --workspace <path> --direction <pull|push>

Options:

  • --workspace: Path to workspace (default: ./workspace)
  • --direction: Sync direction (pull or push)

Sync Types:

  • Pull: Download experiment metrics, results, and data to workspace
  • Push: Upload workspace notebooks, scripts, and results to experiment

ml jupyter experiment status

Show experiment status for a workspace.

ml jupyter experiment status [workspace_path]

Displays:

  • Linked experiment information
  • Last sync time
  • Experiment metadata
  • Service association

API Endpoints

/api/jupyter/experiments/link

Method: POST

Link a workspace with an experiment.

{
  "workspace": "/path/to/workspace",
  "experiment_id": "experiment_123",
  "service_id": "jupyter-service-456"
}

Response:

{
  "status": "linked",
  "data": {
    "workspace_path": "/path/to/workspace",
    "experiment_id": "experiment_123",
    "linked_at": "2023-12-06T10:30:00Z",
    "sync_direction": "bidirectional"
  }
}

/api/jupyter/experiments/sync

Method: POST

Synchronize workspace with experiment.

{
  "workspace": "/path/to/workspace",
  "experiment_id": "experiment_123",
  "direction": "push",
  "sync_type": "all"
}

Response:

{
  "workspace": "/path/to/workspace",
  "experiment_id": "experiment_123",
  "direction": "push",
  "sync_type": "all",
  "synced_at": "2023-12-06T10:35:00Z",
  "status": "completed"
}

/api/jupyter/services

Methods: GET, POST, DELETE

Manage Jupyter services.

GET: List all services
POST: Start a new service
DELETE: Stop a service

Workspace Metadata

.jupyter_experiment.json

Each linked workspace contains a metadata file:

{
  "experiment_id": "experiment_123",
  "service_id": "jupyter-service-456",
  "linked_at": 1701864600,
  "last_sync": 1701865200,
  "sync_direction": "bidirectional",
  "auto_sync": false,
  "jupyter_integration": true,
  "workspace_path": "/path/to/workspace",
  "tags": ["development", "ml-experiment"]
}

Metadata Manager

The workspace metadata manager maintains:

  • Workspace-experiment relationships
  • Sync history and timestamps
  • Auto-sync configuration
  • Tags and additional metadata
  • Service associations

Best Practices

Workspace Organization

  1. One workspace per experiment - Keep workspaces focused on specific experiments
  2. Use descriptive names - Name workspaces and services clearly
  3. Version control - Track workspace changes with git
  4. Clean separation - Separate data, code, and results

Experiment Development Workflow

  1. Create workspace with notebooks and scripts
  2. Link with experiment for tracking
  3. Develop interactively in Jupyter
  4. Test locally with sample data
  5. Queue for production when ready
  6. Monitor results and iterate

Data Management

  1. Use requirements.txt for dependencies
  2. Store data separately from notebooks
  3. Use MLflow for experiment tracking
  4. Sync regularly to preserve work
  5. Clean up old workspaces

Resource Management

  1. Monitor service usage with ml jupyter list
  2. Stop unused services with ml jupyter stop
  3. Use resource limits in configuration
  4. Enable auto-sync for automated workflows
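For automated workflows, the sync command can be driven from a script. A minimal sketch that builds the CLI invocation for `ml jupyter experiment sync` (the helper name is illustrative; execute the result with `subprocess.run(..., check=True)`):

```python
def sync_command(workspace: str, direction: str) -> list[str]:
    """Build the argv for `ml jupyter experiment sync` (see CLI Commands above)."""
    if direction not in ("push", "pull"):
        raise ValueError("direction must be 'push' or 'pull'")
    return [
        "ml", "jupyter", "experiment", "sync",
        "--workspace", workspace,
        "--direction", direction,
    ]
```

Validating the direction up front keeps a scheduled job from silently invoking the CLI with a bad flag value.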

Troubleshooting

Common Issues

Workspace not linked:

Error: No experiment link found in workspace

Solution: Run ml jupyter experiment link first

Service not found:

Error: Service not found

Solution: Check service name with ml jupyter list

Sync failed:

Error: Failed to sync workspace

Solution: Check workspace permissions and confirm the experiment still exists

Debug Commands

# Check workspace metadata
cat ./workspace/.jupyter_experiment.json

# List all services
ml jupyter list

# Check experiment status
ml jupyter experiment status

# View service logs
podman logs <service_id>

Recovery

Lost workspace link:

  1. Find experiment ID with ml experiment list
  2. Re-link with ml jupyter experiment link
  3. Sync data with ml jupyter experiment sync --direction pull

Service stuck:

  1. Stop with ml jupyter stop <service_name>
  2. Check logs for errors
  3. Restart with ml jupyter start

Examples

Complete Workflow

# 1. Setup workspace
mkdir my_ml_project
cd my_ml_project
echo "numpy>=1.20.0" > requirements.txt
echo "mlflow>=1.20.0" >> requirements.txt

# 2. Start Jupyter
ml jupyter start --workspace . --name my_project

# 3. Create experiment
ml experiment create --name "my_project" --description "ML project experiment"

# 4. Link workspace
ml jupyter experiment link --workspace . --experiment <experiment_id>

# 5. Work in Jupyter (browser)
# - Create notebooks
# - Write experiment scripts
# - Test locally

# 6. Queue for production
ml jupyter experiment queue --workspace . --script train_model.py --name "production_run"

# 7. Monitor
ml status
ml monitor

# 8. Sync results
ml jupyter experiment sync --workspace . --direction pull

# 9. Cleanup
ml jupyter stop my_project

Python Integration

import requests

# Base URL of the FetchML API (adjust host/port to your deployment)
BASE_URL = 'http://localhost:9101'

# Link workspace with an experiment
response = requests.post(f'{BASE_URL}/api/jupyter/experiments/link', json={
    'workspace': '/path/to/workspace',
    'experiment_id': 'experiment_123'
})
response.raise_for_status()  # fail loudly on HTTP errors
print(response.json()['status'])  # "linked"

# Push workspace contents to the experiment
response = requests.post(f'{BASE_URL}/api/jupyter/experiments/sync', json={
    'workspace': '/path/to/workspace',
    'experiment_id': 'experiment_123',
    'direction': 'push',
    'sync_type': 'all'
})
response.raise_for_status()
print(response.json()['synced_at'])

Configuration

Service Configuration

Jupyter services can be configured with experiment-specific settings:

service:
  default_resources:
    memory_limit: "8G"
    cpu_limit: "2"
    gpu_access: false
  max_services: 5
  auto_sync_interval: "30m"

Workspace Settings

Workspace metadata supports custom configuration:

{
  "auto_sync": true,
  "sync_interval": "15m",
  "sync_direction": "bidirectional",
  "tags": ["development", "production"],
  "additional_data": {
    "environment": "test",
    "team": "ml-team"
  }
}
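The `sync_interval` and `auto_sync_interval` values above use a duration shorthand. A sketch of a parser for that shorthand, assuming a `<number><unit>` format with s/m/h units (the exact accepted units are an assumption, not confirmed by the configuration schema):

```python
import re

def parse_interval(interval: str) -> int:
    """Convert a duration shorthand like '15m' or '30m' to seconds."""
    match = re.fullmatch(r"(\d+)([smh])", interval)
    if match is None:
        raise ValueError(f"unrecognized interval: {interval!r}")
    value, unit = int(match.group(1)), match.group(2)
    # Scale by the unit: seconds, minutes, or hours
    return value * {"s": 1, "m": 60, "h": 3600}[unit]
```

For example, `parse_interval("15m")` yields the number of seconds a scheduler should sleep between syncs.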

Migration Guide

From Standalone Jupyter

  1. Create workspace from existing notebooks
  2. Link with experiment using new commands
  3. Update scripts to use experiment context
  4. Migrate data to experiment storage
  5. Update workflows to use integration

From Job Queue Only

  1. Create workspace for development
  2. Link with existing experiments
  3. Add interactive development phase
  4. Implement sync workflows
  5. Update CI/CD pipelines

Future Enhancements

Planned improvements:

  • Auto-sync with file watching
  • Workspace templates
  • Collaborative workspaces
  • Advanced resource sharing
  • Git integration
  • Docker compose support
  • Kubernetes integration
  • Advanced monitoring