# Jupyter Workspace and Experiment Integration

This guide describes the integration between Jupyter workspaces and FetchML experiments, enabling seamless resource chaining and data synchronization.

## Overview

The Jupyter-experiment integration allows you to:

- Link Jupyter workspaces with specific experiments
- Automatically track experiment metadata in workspaces
- Queue experiments directly from Jupyter workspaces
- Synchronize data between workspaces and experiments
- Maintain resource sharing and context across development and production workflows

## Architecture

### Components

1. **Workspace Metadata Manager** - Tracks relationships between workspaces and experiments
2. **Service Manager Integration** - Links Jupyter services with experiment context
3. **CLI Commands** - Provides user-facing integration commands
4. **API Endpoints** - Enables programmatic workspace-experiment management

### Data Flow

```
Jupyter Workspace  ←→  Workspace Metadata  ←→  Experiment Manager
        ↓                     ↓                        ↓
    Notebooks            Link Metadata            Experiment Data
    Scripts              Sync History             Metrics & Results
    Requirements         Auto-sync Config         Job Queue
```

## Quick Start

### 1. Create a Jupyter workspace

```bash
# Create a new workspace
mkdir my_experiment_workspace
cd my_experiment_workspace

# Add notebooks and scripts
# (See examples/jupyter_experiment_integration.py for sample setup)
```

### 2. Start Jupyter service

```bash
# Start Jupyter with the workspace
ml jupyter start --workspace ./my_experiment_workspace --name my_experiment

# Access at http://localhost:8888
```

### 3. Link workspace with experiment

```bash
# Create the experiment
ml experiment create --name "my_experiment" --description "Test experiment"

# Link the workspace with the experiment
ml jupyter experiment link --workspace ./my_experiment_workspace --experiment <experiment_id>
```

### 4. Work in Jupyter

- Open notebooks in the browser
- Develop and test code interactively
- Use MLflow for experiment tracking
- Save results and models

### 5. Queue for production

```bash
# Queue an experiment from the workspace
ml jupyter experiment queue --workspace ./my_experiment_workspace --script experiment.py --name "production_run"

# Monitor progress
ml status
ml monitor
```

### 6. Sync data

```bash
# Push workspace changes to the experiment
ml jupyter experiment sync --workspace ./my_experiment_workspace --direction push

# Pull experiment results into the workspace
ml jupyter experiment sync --workspace ./my_experiment_workspace --direction pull
```

## CLI Commands

### `ml jupyter experiment link`

Link a Jupyter workspace with an experiment.

```bash
ml jupyter experiment link --workspace <workspace_path> --experiment <experiment_id>
```

**Options:**

- `--workspace`: Path to Jupyter workspace (default: `./workspace`)
- `--experiment`: Experiment ID to link with

**Creates:**

- `.jupyter_experiment.json` metadata file in the workspace
- Link record in the workspace metadata manager
- Association between the workspace and the experiment

### `ml jupyter experiment queue`

Queue an experiment from a linked workspace.

```bash
ml jupyter experiment queue --workspace <workspace_path> --script <script.py> --name <job_name>
```

**Options:**

- `--workspace`: Path to workspace (default: `./workspace`)
- `--script`: Python script to execute
- `--name`: Name for the queued job

**Behavior:**

- Detects the linked experiment automatically
- Passes experiment context to the job queue
- Uses workspace resources and configuration

### `ml jupyter experiment sync`

Synchronize data between a workspace and an experiment.

```bash
ml jupyter experiment sync --workspace <workspace_path> --direction <pull|push>
```

**Options:**

- `--workspace`: Path to workspace (default: `./workspace`)
- `--direction`: Sync direction (`pull` or `push`)

**Sync Types:**

- **Pull**: Download experiment metrics, results, and data to the workspace
- **Push**: Upload workspace notebooks, scripts, and results to the experiment

### `ml jupyter experiment status`

Show experiment status for a workspace.
```bash
ml jupyter experiment status [workspace_path]
```

**Displays:**

- Linked experiment information
- Last sync time
- Experiment metadata
- Service association

## API Endpoints

### `/api/jupyter/experiments/link`

**Method:** POST

Link a workspace with an experiment.

```json
{
  "workspace": "/path/to/workspace",
  "experiment_id": "experiment_123",
  "service_id": "jupyter-service-456"
}
```

**Response:**

```json
{
  "status": "linked",
  "data": {
    "workspace_path": "/path/to/workspace",
    "experiment_id": "experiment_123",
    "linked_at": "2023-12-06T10:30:00Z",
    "sync_direction": "bidirectional"
  }
}
```

### `/api/jupyter/experiments/sync`

**Method:** POST

Synchronize a workspace with an experiment.

```json
{
  "workspace": "/path/to/workspace",
  "experiment_id": "experiment_123",
  "direction": "push",
  "sync_type": "all"
}
```

**Response:**

```json
{
  "workspace": "/path/to/workspace",
  "experiment_id": "experiment_123",
  "direction": "push",
  "sync_type": "all",
  "synced_at": "2023-12-06T10:35:00Z",
  "status": "completed"
}
```

### `/api/jupyter/services`

**Methods:** GET, POST, DELETE

Manage Jupyter services.

- **GET:** List all services
- **POST:** Start a new service
- **DELETE:** Stop a service

## Workspace Metadata

### `.jupyter_experiment.json`

Each linked workspace contains a metadata file:

```json
{
  "experiment_id": "experiment_123",
  "service_id": "jupyter-service-456",
  "linked_at": 1701864600,
  "last_sync": 1701865200,
  "sync_direction": "bidirectional",
  "auto_sync": false,
  "jupyter_integration": true,
  "workspace_path": "/path/to/workspace",
  "tags": ["development", "ml-experiment"]
}
```

### Metadata Manager

The workspace metadata manager maintains:

- Workspace-experiment relationships
- Sync history and timestamps
- Auto-sync configuration
- Tags and additional metadata
- Service associations

## Best Practices

### Workspace Organization

1. **One workspace per experiment** - Keep workspaces focused on specific experiments
2. **Use descriptive names** - Name workspaces and services clearly
3. **Version control** - Track workspace changes with git
4. **Clean separation** - Separate data, code, and results

### Experiment Development Workflow

1. **Create workspace** with notebooks and scripts
2. **Link with experiment** for tracking
3. **Develop interactively** in Jupyter
4. **Test locally** with sample data
5. **Queue for production** when ready
6. **Monitor results** and iterate

### Data Management

1. **Use requirements.txt** for dependencies
2. **Store data separately** from notebooks
3. **Use MLflow** for experiment tracking
4. **Sync regularly** to preserve work
5. **Clean up** old workspaces

### Resource Management

1. **Monitor service usage** with `ml jupyter list`
2. **Stop unused services** with `ml jupyter stop`
3. **Use resource limits** in configuration
4. **Enable auto-sync** for automated workflows

## Troubleshooting

### Common Issues

**Workspace not linked:**

```bash
Error: No experiment link found in workspace
```

**Solution:** Run `ml jupyter experiment link` first.

**Service not found:**

```bash
Error: Service not found
```

**Solution:** Check the service name with `ml jupyter list`.

**Sync failed:**

```bash
Error: Failed to sync workspace
```

**Solution:** Check workspace permissions and verify that the experiment exists.

### Debug Commands

```bash
# Check workspace metadata
cat ./workspace/.jupyter_experiment.json

# List all services
ml jupyter list

# Check experiment status
ml jupyter experiment status

# View service logs
podman logs <container_name>
```

### Recovery

**Lost workspace link:**

1. Find the experiment ID with `ml experiment list`
2. Re-link with `ml jupyter experiment link`
3. Sync data with `ml jupyter experiment sync --direction pull`

**Service stuck:**

1. Stop with `ml jupyter stop <service_name>`
2. Check the logs for errors
3. Restart with `ml jupyter start`

## Examples

### Complete Workflow

```bash
# 1. Setup workspace
mkdir my_ml_project
cd my_ml_project
echo "numpy>=1.20.0" > requirements.txt
echo "mlflow>=1.20.0" >> requirements.txt
# 2. Start Jupyter
ml jupyter start --workspace . --name my_project

# 3. Create experiment
ml experiment create --name "my_project" --description "ML project experiment"

# 4. Link workspace
ml jupyter experiment link --workspace . --experiment <experiment_id>

# 5. Work in Jupyter (browser)
#    - Create notebooks
#    - Write experiment scripts
#    - Test locally

# 6. Queue for production
ml jupyter experiment queue --workspace . --script train_model.py --name "production_run"

# 7. Monitor
ml status
ml monitor

# 8. Sync results
ml jupyter experiment sync --workspace . --direction pull

# 9. Cleanup
ml jupyter stop my_project
```

### Python Integration

```python
import requests

# Link workspace
response = requests.post('http://localhost:9101/api/jupyter/experiments/link', json={
    'workspace': '/path/to/workspace',
    'experiment_id': 'experiment_123'
})

# Sync workspace
response = requests.post('http://localhost:9101/api/jupyter/experiments/sync', json={
    'workspace': '/path/to/workspace',
    'experiment_id': 'experiment_123',
    'direction': 'push',
    'sync_type': 'all'
})
```

## Configuration

### Service Configuration

Jupyter services can be configured with experiment-specific settings:

```yaml
service:
  default_resources:
    memory_limit: "8G"
    cpu_limit: "2"
    gpu_access: false
  max_services: 5
  auto_sync_interval: "30m"
```

### Workspace Settings

Workspace metadata supports custom configuration:

```json
{
  "auto_sync": true,
  "sync_interval": "15m",
  "sync_direction": "bidirectional",
  "tags": ["development", "production"],
  "additional_data": {
    "environment": "test",
    "team": "ml-team"
  }
}
```

## Migration Guide

### From Standalone Jupyter

1. **Create workspace** from existing notebooks
2. **Link with experiment** using the new commands
3. **Update scripts** to use experiment context
4. **Migrate data** to experiment storage
5. **Update workflows** to use the integration

### From Job Queue Only

1. **Create workspace** for development
2. **Link with existing experiments**
3. **Add interactive development** phase
4. **Implement sync workflows**
5. **Update CI/CD pipelines**

## Future Enhancements

Planned improvements:

- **Auto-sync with file watching**
- **Workspace templates**
- **Collaborative workspaces**
- **Advanced resource sharing**
- **Git integration**
- **Docker compose support**
- **Kubernetes integration**
- **Advanced monitoring**
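Of the planned improvements above, auto-sync with file watching can already be prototyped against the existing metadata: the `last_sync` Unix timestamp in `.jupyter_experiment.json` is enough to detect which workspace files changed since the last sync. The sketch below is a minimal, stdlib-only illustration; `changed_since_last_sync` is a hypothetical helper, not part of the current CLI or API.

```python
import json
import time
from pathlib import Path

def changed_since_last_sync(workspace: str) -> list:
    """Return workspace files modified after the last recorded sync.

    Reads the `last_sync` Unix timestamp from .jupyter_experiment.json
    (see the Workspace Metadata section) and compares it against each
    file's mtime. Hypothetical helper for a file-watching auto-sync loop.
    """
    ws = Path(workspace)
    meta = json.loads((ws / ".jupyter_experiment.json").read_text())
    last_sync = meta.get("last_sync", 0)
    return [
        p for p in sorted(ws.rglob("*"))
        if p.is_file()
        and p.name != ".jupyter_experiment.json"  # ignore the metadata file itself
        and p.stat().st_mtime > last_sync
    ]

if __name__ == "__main__":
    import tempfile
    # Demo: a workspace whose notebook changed after last_sync an hour ago.
    with tempfile.TemporaryDirectory() as d:
        Path(d, ".jupyter_experiment.json").write_text(
            json.dumps({"experiment_id": "experiment_123",
                        "last_sync": time.time() - 3600})
        )
        Path(d, "notebook.ipynb").write_text("{}")  # mtime is "now"
        print([p.name for p in changed_since_last_sync(d)])
```

A watcher built on this could poll periodically and, whenever the list is non-empty, trigger `ml jupyter experiment sync --direction push` or POST to `/api/jupyter/experiments/sync`.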