# Jupyter Workflow Guide

Comprehensive guide to Jupyter workspace management, experiment integration, and secure package management in FetchML.

## Overview

The Jupyter workflow system provides:

- **Workspace Management**: Isolated development environments
- **Experiment Integration**: Seamless linking with ML experiments
- **Package Management**: Secure package installation from trusted sources
- **Resource Sharing**: Data and context synchronization
- **Security Controls**: Approved channels and package filtering

## Quick Start

### Create Jupyter Workspace

```bash
# Start development stack
make dev-up

# Create workspace
./cli/zig-out/bin/ml jupyter create my-workspace

# Access workspace
open http://localhost:8888
```

### Link with Experiment

```bash
# Queue experiment from workspace
./cli/zig-out/bin/ml jupyter queue --workspace my-workspace --experiment my-experiment

# Monitor progress
./cli/zig-out/bin/ml status
```

## Workspace Management

### Creating Workspaces

```bash
# Create new workspace
./cli/zig-out/bin/ml jupyter create workspace-name

# Create with specific configuration
./cli/zig-out/bin/ml jupyter create workspace-name \
  --cpu=4 \
  --memory=8g \
  --gpu=1 \
  --image=jupyter/scipy-notebook:latest

# List workspaces
./cli/zig-out/bin/ml jupyter list

# Workspace details
./cli/zig-out/bin/ml jupyter info workspace-name
```

### Workspace Configuration

**Resource Allocation**:

```yaml
# workspace-config.yaml
resources:
  cpu: 4
  memory: 8g
  gpu: 1
  disk: 20g

environment:
  python_version: "3.11"
  jupyter_version: "latest"

security:
  trusted_channels: ["conda-forge", "defaults", "pytorch"]
  blocked_packages: ["aiohttp", "telnetlib"]
```

You can also override the blocked package list at runtime using an environment variable on the worker:

```bash
export FETCHML_JUPYTER_BLOCKED_PACKAGES="aiohttp,telnetlib"
```

Some base images (including the default `quay.io/jupyter/base-notebook`) ship with common HTTP client libraries like `requests`, `urllib3`, and `httpx` preinstalled.
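
To make the runtime override more concrete, here is a minimal Python sketch of how a comma-separated value such as the one exported above could be parsed and checked against an install request. It is an illustration only, not the FetchML worker's implementation: the helper names (`normalize`, `parse_blocked`, `is_blocked`) are hypothetical, and the name normalization (PEP 503 style) is an assumption about how names might be compared.

```python
import os
import re

ENV_VAR = "FETCHML_JUPYTER_BLOCKED_PACKAGES"


def normalize(name: str) -> str:
    """Normalize a package name PEP 503 style (lowercase, runs of -_. become -)."""
    return re.sub(r"[-_.]+", "-", name.strip()).lower()


def parse_blocked(value: str) -> set[str]:
    """Split a comma-separated list into a set of normalized names."""
    return {normalize(item) for item in value.split(",") if item.strip()}


def is_blocked(requirement: str, blocked: set[str]) -> bool:
    """Check the name part of a requirement such as 'pandas==2.0.0'."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    return bool(match) and normalize(match.group(1)) in blocked


if __name__ == "__main__":
    # Falls back to the example value from this guide when the variable is unset.
    blocked = parse_blocked(os.environ.get(ENV_VAR, "aiohttp,telnetlib"))
    for requirement in ("aiohttp>=3.9", "numpy"):
        print(requirement, "blocked" if is_blocked(requirement, blocked) else "allowed")
```

Comparing normalized names means that spellings such as `Telnetlib` and `telnetlib` are treated as the same entry; whether the worker does the same is not stated in this guide.
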

If you want to **block installing** packages like `requests`, `urllib3`, and `httpx` for security reasons but still use a base image that already includes them, you can disable the **startup image scan** separately:

```bash
# Block installs (user requests)
export FETCHML_JUPYTER_BLOCKED_PACKAGES="requests,urllib3,httpx"

# Allow base images that already contain these packages to start
export FETCHML_JUPYTER_STARTUP_BLOCKED_PACKAGES="off"
```

If you want startup scanning enabled, set `FETCHML_JUPYTER_STARTUP_BLOCKED_PACKAGES` to a comma-separated list of package names.

### Access Control

```bash
# Set workspace permissions
./cli/zig-out/bin/ml jupyter access workspace-name \
  --user=data-scientist \
  --role=editor

# Revoke access
./cli/zig-out/bin/ml jupyter revoke workspace-name data-scientist
```

## Experiment Integration

### Architecture

```
Jupyter Workspace ←→ Workspace Metadata ←→ Experiment Manager
        ↓                     ↓                     ↓
    Notebooks           Link Metadata        Experiment Data
    Scripts             Sync History         Metrics & Results
```

### Linking Workspaces and Experiments

```bash
# Link existing workspace to experiment
./cli/zig-out/bin/ml jupyter link workspace-name experiment-id

# Create workspace linked to new experiment
./cli/zig-out/bin/ml jupyter create workspace-name \
  --experiment experiment-id

# Queue experiment from workspace
./cli/zig-out/bin/ml jupyter queue \
  --workspace workspace-name \
  --config experiment-config.yaml
```

### Data Synchronization

**Automatic Sync**:

- Notebook metadata
- Experiment results
- Configuration files
- Resource usage metrics

**Manual Sync**:

```bash
# Sync workspace to experiment
./cli/zig-out/bin/ml jupyter sync workspace-name --to-experiment

# Sync experiment to workspace
./cli/zig-out/bin/ml jupyter sync workspace-name --from-experiment

# Force full sync
./cli/zig-out/bin/ml jupyter sync workspace-name --full
```

### Workspace Metadata

**Tracked Information**:

- Workspace creation and modification dates
- Linked experiment IDs
- Resource usage history
- Package installation records
- Notebook execution history

```bash
# View workspace metadata
./cli/zig-out/bin/ml jupyter metadata workspace-name

# Export metadata
./cli/zig-out/bin/ml jupyter export workspace-name --format=json
```

## Package Management

### Security Features

**Trusted Channels** (default):

- `conda-forge` - Community-maintained packages
- `defaults` - Anaconda default packages
- `pytorch` - PyTorch ecosystem packages
- `nvidia` - NVIDIA GPU packages

**Blocked Packages** (security):

- `requests` - HTTP client library
- `urllib3` - HTTP library
- `socket` - Network sockets
- `subprocess` - Process execution
- `os.system` - System commands

### Package Installation

**In Jupyter Notebook**:

```python
# Install package (checks security)
!pip install numpy pandas scikit-learn

# Install from conda
!conda install -c conda-forge matplotlib seaborn

# Check package status
!pip list
```

**From CLI**:

```bash
# Install package in workspace
./cli/zig-out/bin/ml jupyter install workspace-name numpy

# Install with version
./cli/zig-out/bin/ml jupyter install workspace-name "pandas==2.0.0"

# Install from conda
./cli/zig-out/bin/ml jupyter install workspace-name matplotlib --conda

# List installed packages
./cli/zig-out/bin/ml jupyter packages workspace-name
```

### Package Approval Workflow

**Optional Approval Process**:

1. **Request**: User requests package installation
2. **Review**: Admin reviews package security
3. **Approval**: Package added to allowlist
4. **Installation**: Package installed in workspace

```bash
# Request package (requires approval)
./cli/zig-out/bin/ml jupyter request workspace-name custom-package

# Review requests (admin)
./cli/zig-out/bin/ml jupyter review --pending

# Approve request
./cli/zig-out/bin/ml jupyter approve request-id

# Deny request
./cli/zig-out/bin/ml jupyter deny request-id --reason="Security concern"
```

### Custom Channel Configuration

```yaml
# workspace-security.yaml
package_management:
  trusted_channels:
    - conda-forge
    - defaults
    - pytorch
    - nvidia
    - company-internal  # Custom channel

  blocked_packages:
    - requests
    - urllib3
    - socket
    - subprocess
    - os.system

  approval_required:
    - tensorflow
    - pytorch
    - custom-package

  allowlist:
    - numpy
    - pandas
    - scikit-learn
    - matplotlib
```

## Security and Compliance

### Workspace Isolation

**Network Isolation**:

- Workspaces run in isolated networks
- Controlled outbound internet access
- Inter-workspace communication blocked

**File System Isolation**:

- Separate storage volumes per workspace
- Controlled file access permissions
- Automatic cleanup on workspace deletion

### Audit Trail

**Tracked Activities**:

- Package installations and removals
- Notebook execution history
- Data access patterns
- Resource usage metrics
- User access logs

```bash
# View audit log
./cli/zig-out/bin/ml jupyter audit workspace-name

# Export audit report
./cli/zig-out/bin/ml jupyter audit workspace-name --export=csv

# Security scan
./cli/zig-out/bin/ml jupyter security-scan workspace-name
```

### Compliance Features

**Data Protection**:

- Automatic data encryption
- Secure data transfer protocols
- GDPR compliance features
- Data retention policies

**Access Controls**:

- Role-based permissions
- Multi-factor authentication
- Session timeout management
- IP whitelisting

## Advanced Features

### Custom Images

```bash
# Build custom workspace image
./cli/zig-out/bin/ml jupyter build custom-image \
  --base=jupyter/scipy-notebook \
  --packages="numpy pandas scikit-learn" \
  --gpu-support

# Use custom image
./cli/zig-out/bin/ml jupyter create workspace-name \
  --image=custom-image
```

### Workspace Templates

```yaml
# data-science-template.yaml
name: data-science-workspace
resources:
  cpu: 8
  memory: 16g
  gpu: 1

packages:
  - numpy
  - pandas
  - scikit-learn
  - matplotlib
  - seaborn
  - jupyterlab

security:
  trusted_channels: ["conda-forge", "defaults"]
  approval_required: []

environment:
  PYTHONPATH: "/workspace"
  JUPYTER_ENABLE_LAB: "yes"
```

```bash
# Create from template
./cli/zig-out/bin/ml jupyter create workspace-name \
  --template=data-science-template
```

### Collaboration Features

**Workspace Sharing**:

```bash
# Share workspace with team
./cli/zig-out/bin/ml jupyter share workspace-name \
  --team=data-science-team \
  --role=collaborator

# Collaborative notebooks
# Multiple users can edit simultaneously
# Real-time cursor tracking
# Comment and review features
```

**Version Control**:

```bash
# Git integration
./cli/zig-out/bin/ml jupyter git workspace-name init
./cli/zig-out/bin/ml jupyter git workspace-name add .
./cli/zig-out/bin/ml jupyter git workspace-name commit -m "Initial commit"

# Notebook versioning
./cli/zig-out/bin/ml jupyter version workspace-name notebook.ipynb
```

## Monitoring and Troubleshooting

### Performance Monitoring

```bash
# Workspace resource usage
./cli/zig-out/bin/ml jupyter stats workspace-name

# Real-time monitoring
./cli/zig-out/bin/ml jupyter monitor workspace-name

# Performance report
./cli/zig-out/bin/ml jupyter report workspace-name --format=html
```

### Common Issues

**Package Installation Failures**:

```bash
# Check package security
./cli/zig-out/bin/ml jupyter check-package package-name

# Bypass security (admin only)
./cli/zig-out/bin/ml jupyter install workspace-name package-name --force

# Clear package cache
./cli/zig-out/bin/ml jupyter clear-cache workspace-name
```

**Workspace Access Issues**:

```bash
# Check workspace status
./cli/zig-out/bin/ml jupyter status workspace-name

# Restart workspace
./cli/zig-out/bin/ml jupyter restart workspace-name

# Reset workspace
./cli/zig-out/bin/ml jupyter reset workspace-name --hard
```

**Performance Issues**:

```bash
# Check resource limits
./cli/zig-out/bin/ml jupyter limits workspace-name

# Scale resources
./cli/zig-out/bin/ml jupyter scale workspace-name --cpu=8 --memory=16g

# Optimize performance
./cli/zig-out/bin/ml jupyter optimize workspace-name
```

## Best Practices

### Workspace Organization

1. **Use Descriptive Names**: `project-name-environment`
2. **Resource Planning**: Allocate appropriate CPU/memory
3. **Regular Cleanup**: Remove unused workspaces
4. **Version Control**: Track important changes

### Package Management

1. **Minimal Packages**: Install only necessary packages
2. **Version Pinning**: Use specific package versions
3. **Security First**: Always use trusted channels
4. **Regular Updates**: Keep packages updated

### Security Practices

1. **Principle of Least Privilege**: Minimal required permissions
2. **Regular Audits**: Review workspace activities
3. **Data Classification**: Handle sensitive data appropriately
4. **Compliance**: Follow organizational policies

## API Integration

### Programmatic Workspace Management

```python
import requests

# Placeholder: replace with the address of your FetchML server
BASE_URL = "http://SERVER_HOST:PORT"

# Create workspace
response = requests.post(f'{BASE_URL}/api/v1/jupyter/workspaces', json={
    'name': 'my-workspace',
    'resources': {'cpu': 4, 'memory': '8g'},
    'security': {'trusted_channels': ['conda-forge']}
})

# Install package
requests.post(f'{BASE_URL}/api/v1/jupyter/workspaces/my-workspace/packages', json={
    'package': 'numpy',
    'version': '1.24.0'
})

# Link to experiment
requests.post(f'{BASE_URL}/api/v1/jupyter/workspaces/my-workspace/experiments', json={
    'experiment_id': 'exp-123'
})
```

### Webhooks

```yaml
# workspace-webhooks.yaml
events:
  - workspace_created
  - package_installed
  - experiment_linked

actions:
  - slack_notification
  - email_alert
  - log_event
```

## WebSocket Protocol

### Overview

The Jupyter CLI commands use a binary WebSocket protocol for efficient, low-latency communication with the FetchML server. This provides better performance than HTTP and allows for real-time updates.

### Connection

```bash
# WebSocket endpoint
ws://SERVER_HOST:PORT/ws

# TLS-enabled endpoint
wss://SERVER_HOST:PORT/ws
```

**Authentication**: The API key is hashed using SHA-256, and the first 16 bytes of the hash are sent with each request.

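
As a rough illustration of the authentication step, the sketch below derives a 16-byte value from an API key with Python's standard `hashlib`. Two details are assumptions rather than facts from this guide: that the key is encoded as UTF-8 before hashing, and that the 16 bytes are taken from the raw digest rather than from its hexadecimal text form. The function name `api_key_hash` and the sample key are hypothetical.

```python
import hashlib

HASH_PREFIX_LEN = 16  # number of hash bytes sent with each request


def api_key_hash(api_key: str) -> bytes:
    """Return the first 16 bytes of the SHA-256 hash of an API key.

    Assumptions (not confirmed by this guide): UTF-8 encoding of the key,
    and truncation of the raw 32-byte digest rather than its hex string.
    """
    digest = hashlib.sha256(api_key.encode("utf-8")).digest()  # 32 raw bytes
    return digest[:HASH_PREFIX_LEN]


if __name__ == "__main__":
    prefix = api_key_hash("example-api-key")  # hypothetical key for illustration
    print(len(prefix), prefix.hex())
```

If the server instead expects the first 16 characters of the hexadecimal digest, only the truncation step would change (`hexdigest()[:16].encode()`); check the server implementation before relying on either form.
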
### Binary Message Format

All Jupyter commands follow a binary protocol for optimal performance:

**Start Jupyter Service** (Opcode: 0x0D):

```
[opcode:1][api_key_hash:16][name_len:1][name:var][workspace_len:2][workspace:var][password_len:1][password:var]
```

**Stop Jupyter Service** (Opcode: 0x0E):

```
[opcode:1][api_key_hash:16][service_id_len:1][service_id:var]
```

**List Jupyter Services** (Opcode: 0x0F):

```
[opcode:1][api_key_hash:16]
```

### Response Packets

The server responds with structured response packets:

**Success Response**:

```
[packet_type:0x00][timestamp:8][message_len:2][message:var]
```

**Error Response**:

```
[packet_type:0x01][timestamp:8][error_code:1][message_len:2][message:var][details_len:2][details:var]
```

**Data Response** (for list command):

```
[packet_type:0x04][timestamp:8][type_len:2][type:var][payload_len:4][payload:var]
```

### CLI Examples

**Start Service**:

```bash
# Basic start
ml jupyter start --name my-notebook --workspace /path/to/workspace

# With password
ml jupyter start --name my-notebook --workspace /path/to/workspace --password mypass
```

**Stop Service**:

```bash
ml jupyter stop service-id-12345
```

**List Services**:

```bash
ml jupyter list
```

### Error Codes

Common error codes in binary responses:

- `0x00`: Unknown error
- `0x01`: Invalid request format
- `0x02`: Authentication failed
- `0x03`: Permission denied
- `0x10`: Server overloaded
- `0x14`: Timeout

### WebSocket vs HTTP

**Advantages of WebSocket**:

- ✅ Lower latency (persistent connection)
- ✅ Binary protocol (smaller payloads)
- ✅ Real-time updates possible
- ✅ Reduced server load
- ✅ Single connection for CLI

**When to use HTTP**:

- For programmatic API access
- For web-based integrations
- When WebSocket is unavailable

## See Also

- **[Testing Guide](testing.md)** - Testing Jupyter workflows
- **[Deployment Guide](deployment.md)** - Production deployment
- **[Security Guide](security.md)** - Security best practices
- **[API Reference](api-key-process.md)** - API documentation
- **[CLI Reference](cli-reference.md)** - Command-line tools