Jupyter Workspace and Experiment Integration
This guide describes the integration between Jupyter workspaces and FetchML experiments, enabling seamless resource chaining and data synchronization.
Overview
The Jupyter-experiment integration allows you to:
- Link Jupyter workspaces with specific experiments
- Automatically track experiment metadata in workspaces
- Queue experiments directly from Jupyter workspaces
- Synchronize data between workspaces and experiments
- Maintain resource sharing and context across development and production workflows
Architecture
Components
- Workspace Metadata Manager - Tracks relationships between workspaces and experiments
- Service Manager Integration - Links Jupyter services with experiment context
- CLI Commands - Provides user-facing integration commands
- API Endpoints - Enables programmatic workspace-experiment management
Data Flow
Jupyter Workspace  ←→  Workspace Metadata  ←→  Experiment Manager
        ↓                      ↓                        ↓
   Notebooks              Link Metadata            Experiment Data
   Scripts                Sync History             Metrics & Results
   Requirements           Auto-sync Config         Job Queue
Quick Start
1. Create a Jupyter workspace
# Create a new workspace
mkdir my_experiment_workspace
cd my_experiment_workspace
# Add notebooks and scripts
# (See examples/jupyter_experiment_integration.py for sample setup)
2. Start Jupyter service
# Start Jupyter with workspace
ml jupyter start --workspace ./my_experiment_workspace --name my_experiment
# Access at http://localhost:8888
3. Link workspace with experiment
# Create experiment
ml experiment create --name "my_experiment" --description "Test experiment"
# Link workspace with experiment
ml jupyter experiment link --workspace ./my_experiment_workspace --experiment <experiment_id>
4. Work in Jupyter
- Open notebooks in browser
- Develop and test code interactively
- Use MLflow for experiment tracking
- Save results and models
5. Queue for production
# Queue experiment from workspace
ml jupyter experiment queue --workspace ./my_experiment_workspace --script experiment.py --name "production_run"
# Monitor progress
ml status
ml monitor
6. Sync data
# Push workspace changes to experiment
ml jupyter experiment sync --workspace ./my_experiment_workspace --direction push
# Pull experiment results to workspace
ml jupyter experiment sync --workspace ./my_experiment_workspace --direction pull
CLI Commands
ml jupyter experiment link
Link a Jupyter workspace with an experiment.
ml jupyter experiment link --workspace <path> --experiment <id>
Options:
- --workspace: Path to Jupyter workspace (default: ./workspace)
- --experiment: Experiment ID to link with
Creates:
- .jupyter_experiment.json metadata file in the workspace
- Link record in the workspace metadata manager
- Association between workspace and experiment
ml jupyter experiment queue
Queue an experiment from a linked workspace.
ml jupyter experiment queue --workspace <path> --script <file> --name <name>
Options:
- --workspace: Path to workspace (default: ./workspace)
- --script: Python script to execute
- --name: Name for the queued job
Behavior:
- Detects linked experiment automatically
- Passes experiment context to job queue
- Uses workspace resources and configuration
ml jupyter experiment sync
Synchronize data between workspace and experiment.
ml jupyter experiment sync --workspace <path> --direction <pull|push>
Options:
- --workspace: Path to workspace (default: ./workspace)
- --direction: Sync direction (pull or push)
Sync Types:
- Pull: Download experiment metrics, results, and data to workspace
- Push: Upload workspace notebooks, scripts, and results to experiment
ml jupyter experiment status
Show experiment status for a workspace.
ml jupyter experiment status [workspace_path]
Displays:
- Linked experiment information
- Last sync time
- Experiment metadata
- Service association
API Endpoints
/api/jupyter/experiments/link
Method: POST
Link a workspace with an experiment.
{
"workspace": "/path/to/workspace",
"experiment_id": "experiment_123",
"service_id": "jupyter-service-456"
}
Response:
{
"status": "linked",
"data": {
"workspace_path": "/path/to/workspace",
"experiment_id": "experiment_123",
"linked_at": "2023-12-06T10:30:00Z",
"sync_direction": "bidirectional"
}
}
/api/jupyter/experiments/sync
Method: POST
Synchronize workspace with experiment.
{
"workspace": "/path/to/workspace",
"experiment_id": "experiment_123",
"direction": "push",
"sync_type": "all"
}
Response:
{
"workspace": "/path/to/workspace",
"experiment_id": "experiment_123",
"direction": "push",
"sync_type": "all",
"synced_at": "2023-12-06T10:35:00Z",
"status": "completed"
}
/api/jupyter/services
Methods: GET, POST, DELETE
Manage Jupyter services.
- GET: List all services
- POST: Start new service
- DELETE: Stop service
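The services endpoint can also be driven from Python. The following is a minimal sketch, assuming the API is served on localhost:9101 as in the Python Integration example later in this guide; the `service_id` query parameter for DELETE is an assumption, not a documented contract.

```python
import requests

# Assumed base URL, matching the Python Integration example in this guide
SERVICES_URL = "http://localhost:9101/api/jupyter/services"

def list_services():
    """GET: return the list of running Jupyter services."""
    resp = requests.get(SERVICES_URL, timeout=10)
    resp.raise_for_status()
    return resp.json()

def stop_service(service_id):
    """DELETE: stop a running service.

    The service_id parameter name is hypothetical; check the actual
    API schema before relying on it.
    """
    resp = requests.delete(SERVICES_URL, params={"service_id": service_id}, timeout=10)
    resp.raise_for_status()
    return resp.json()
```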
Workspace Metadata
.jupyter_experiment.json
Each linked workspace contains a metadata file:
{
"experiment_id": "experiment_123",
"service_id": "jupyter-service-456",
"linked_at": 1701864600,
"last_sync": 1701865200,
"sync_direction": "bidirectional",
"auto_sync": false,
"jupyter_integration": true,
"workspace_path": "/path/to/workspace",
"tags": ["development", "ml-experiment"]
}
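Because the metadata file is plain JSON, it is easy to inspect programmatically. A small illustrative helper (the function name and the derived `last_sync_age_s` field are not part of the product, just a sketch):

```python
import json
import time
from pathlib import Path

def read_link_metadata(workspace):
    """Load a workspace's .jupyter_experiment.json link file.

    Returns None if the workspace has never been linked.
    linked_at / last_sync are stored as Unix timestamps.
    """
    meta_file = Path(workspace) / ".jupyter_experiment.json"
    if not meta_file.exists():
        return None
    meta = json.loads(meta_file.read_text())
    # Derived convenience field (not stored in the file): seconds since last sync
    meta["last_sync_age_s"] = time.time() - meta["last_sync"]
    return meta
```

This is handy for scripting checks such as "warn if a workspace has not synced in the last hour".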
Metadata Manager
The workspace metadata manager maintains:
- Workspace-experiment relationships
- Sync history and timestamps
- Auto-sync configuration
- Tags and additional metadata
- Service associations
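Conceptually, the manager is a registry keyed by workspace path. The sketch below is not the actual implementation, only an illustration of the responsibilities listed above (class and method names are hypothetical):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class WorkspaceLink:
    """One workspace-experiment relationship plus its sync state."""
    workspace_path: str
    experiment_id: str
    service_id: Optional[str] = None
    sync_direction: str = "bidirectional"
    auto_sync: bool = False
    tags: list = field(default_factory=list)
    sync_history: list = field(default_factory=list)  # (direction, timestamp) pairs

class WorkspaceMetadataManager:
    """Illustrative in-memory registry of workspace-experiment links."""

    def __init__(self):
        self._links = {}

    def link(self, workspace_path, experiment_id, **kwargs):
        rec = WorkspaceLink(workspace_path, experiment_id, **kwargs)
        self._links[workspace_path] = rec
        return rec

    def record_sync(self, workspace_path, direction, timestamp):
        self._links[workspace_path].sync_history.append((direction, timestamp))

    def get(self, workspace_path):
        return self._links.get(workspace_path)
```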
Best Practices
Workspace Organization
- One workspace per experiment - Keep workspaces focused on specific experiments
- Use descriptive names - Name workspaces and services clearly
- Version control - Track workspace changes with git
- Clean separation - Separate data, code, and results
Experiment Development Workflow
- Create workspace with notebooks and scripts
- Link with experiment for tracking
- Develop interactively in Jupyter
- Test locally with sample data
- Queue for production when ready
- Monitor results and iterate
Data Management
- Use requirements.txt for dependencies
- Store data separately from notebooks
- Use MLflow for experiment tracking
- Sync regularly to preserve work
- Clean up old workspaces
Resource Management
- Monitor service usage with ml jupyter list
- Stop unused services with ml jupyter stop
- Use resource limits in configuration
- Enable auto-sync for automated workflows
Troubleshooting
Common Issues
Workspace not linked:
Error: No experiment link found in workspace
Solution: Run ml jupyter experiment link first
Service not found:
Error: Service not found
Solution: Check service name with ml jupyter list
Sync failed:
Error: Failed to sync workspace
Solution: Check workspace permissions and confirm the experiment exists
Debug Commands
# Check workspace metadata
cat ./workspace/.jupyter_experiment.json
# List all services
ml jupyter list
# Check experiment status
ml jupyter experiment status
# View service logs
podman logs <service_id>
Recovery
Lost workspace link:
- Find experiment ID with ml experiment list
- Re-link with ml jupyter experiment link
- Sync data with ml jupyter experiment sync --direction pull
Service stuck:
- Stop with ml jupyter stop <service_name>
- Check logs for errors
- Restart with ml jupyter start
Examples
Complete Workflow
# 1. Setup workspace
mkdir my_ml_project
cd my_ml_project
echo "numpy>=1.20.0" > requirements.txt
echo "mlflow>=1.20.0" >> requirements.txt
# 2. Start Jupyter
ml jupyter start --workspace . --name my_project
# 3. Create experiment
ml experiment create --name "my_project" --description "ML project experiment"
# 4. Link workspace
ml jupyter experiment link --workspace . --experiment <experiment_id>
# 5. Work in Jupyter (browser)
# - Create notebooks
# - Write experiment scripts
# - Test locally
# 6. Queue for production
ml jupyter experiment queue --workspace . --script train_model.py --name "production_run"
# 7. Monitor
ml status
ml monitor
# 8. Sync results
ml jupyter experiment sync --workspace . --direction pull
# 9. Cleanup
ml jupyter stop my_project
Python Integration
import requests

BASE_URL = "http://localhost:9101"

# Link workspace with an experiment
response = requests.post(f"{BASE_URL}/api/jupyter/experiments/link", json={
    'workspace': '/path/to/workspace',
    'experiment_id': 'experiment_123'
})
response.raise_for_status()
print(response.json())

# Push workspace contents to the experiment
response = requests.post(f"{BASE_URL}/api/jupyter/experiments/sync", json={
    'workspace': '/path/to/workspace',
    'experiment_id': 'experiment_123',
    'direction': 'push',
    'sync_type': 'all'
})
response.raise_for_status()
print(response.json())
Configuration
Service Configuration
Jupyter services can be configured with experiment-specific settings:
service:
default_resources:
memory_limit: "8G"
cpu_limit: "2"
gpu_access: false
max_services: 5
auto_sync_interval: "30m"
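Interval values such as "30m" need to be parsed before use. A minimal sketch of how a consumer of this configuration might validate it (the helper names are illustrative, not part of the product):

```python
def parse_interval(value):
    """Convert interval strings like '30s', '30m', or '2h' to seconds."""
    units = {"s": 1, "m": 60, "h": 3600}
    return int(value[:-1]) * units[value[-1]]

def validate_service_config(cfg):
    """Sanity-check the service configuration block; return the
    auto-sync interval in seconds."""
    res = cfg["default_resources"]
    if not res["memory_limit"][-1] in ("G", "M"):
        raise ValueError("memory_limit needs a G or M suffix")
    if int(res["cpu_limit"]) < 1 or cfg["max_services"] < 1:
        raise ValueError("cpu_limit and max_services must be at least 1")
    return parse_interval(cfg["auto_sync_interval"])
```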
Workspace Settings
Workspace metadata supports custom configuration:
{
"auto_sync": true,
"sync_interval": "15m",
"sync_direction": "bidirectional",
"tags": ["development", "production"],
"additional_data": {
"environment": "test",
"team": "ml-team"
}
}
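Since these settings live in the same JSON metadata file, custom fields can be updated with a read-merge-write, sketched below (the helper name is hypothetical):

```python
import json
from pathlib import Path

def update_workspace_settings(workspace, **overrides):
    """Merge overrides into the workspace's .jupyter_experiment.json
    and write the result back to disk."""
    meta_file = Path(workspace) / ".jupyter_experiment.json"
    settings = json.loads(meta_file.read_text()) if meta_file.exists() else {}
    settings.update(overrides)
    meta_file.write_text(json.dumps(settings, indent=2))
    return settings
```

For example, enabling auto-sync with a 15-minute interval is a one-liner: `update_workspace_settings("./workspace", auto_sync=True, sync_interval="15m")`.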
Migration Guide
From Standalone Jupyter
- Create workspace from existing notebooks
- Link with experiment using new commands
- Update scripts to use experiment context
- Migrate data to experiment storage
- Update workflows to use integration
From Job Queue Only
- Create workspace for development
- Link with existing experiments
- Add interactive development phase
- Implement sync workflows
- Update CI/CD pipelines
Future Enhancements
Planned improvements:
- Auto-sync with file watching
- Workspace templates
- Collaborative workspaces
- Advanced resource sharing
- Git integration
- Docker compose support
- Kubernetes integration
- Advanced monitoring