Jupyter Workflow Guide
Comprehensive guide to Jupyter workspace management, experiment integration, and secure package management in FetchML.
Overview
The Jupyter workflow system provides:
- Workspace Management: Isolated development environments
- Experiment Integration: Seamless linking with ML experiments
- Package Management: Secure package installation from trusted sources
- Resource Sharing: Data and context synchronization
- Security Controls: Approved channels and package filtering
Quick Start
Create Jupyter Workspace
# Start development stack
make dev-up
# Create workspace
./cli/zig-out/bin/ml jupyter create my-workspace
# Access workspace (open is macOS-specific; on Linux use xdg-open)
open http://localhost:8888
Link with Experiment
# Queue experiment from workspace
./cli/zig-out/bin/ml jupyter queue --workspace my-workspace --experiment my-experiment
# Monitor progress
./cli/zig-out/bin/ml status
Workspace Management
Creating Workspaces
# Create new workspace
./cli/zig-out/bin/ml jupyter create workspace-name
# Create with specific configuration
./cli/zig-out/bin/ml jupyter create workspace-name \
--cpu=4 \
--memory=8g \
--gpu=1 \
--image=jupyter/scipy-notebook:latest
# List workspaces
./cli/zig-out/bin/ml jupyter list
# Workspace details
./cli/zig-out/bin/ml jupyter info workspace-name
Workspace Configuration
Resource Allocation:
# workspace-config.yaml
resources:
  cpu: 4
  memory: 8g
  gpu: 1
  disk: 20g
environment:
  python_version: "3.11"
  jupyter_version: "latest"
security:
  trusted_channels: ["conda-forge", "defaults", "pytorch"]
  blocked_packages: ["requests", "urllib3"]
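Size values such as 8g and 20g in the configuration above can be converted to byte counts when validating a config file. The following is a minimal sketch, not FetchML's own parser: the function name parse_size is hypothetical, and it assumes binary units (1g = 1024^3 bytes), which the guide does not confirm.

```python
import re

# Assumption: suffixes are binary multiples (k = 1024, m = 1024**2, ...).
_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_size(value: str) -> int:
    """Convert a size string such as '8g' into a number of bytes."""
    match = re.fullmatch(r"(\d+)([kmgt])", value.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


if __name__ == "__main__":
    print(parse_size("8g"))   # memory from workspace-config.yaml
    print(parse_size("20g"))  # disk from workspace-config.yaml
```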
Access Control
# Set workspace permissions
./cli/zig-out/bin/ml jupyter access workspace-name \
--user=data-scientist \
--role=editor
# Revoke access
./cli/zig-out/bin/ml jupyter revoke workspace-name data-scientist
Experiment Integration
Architecture
Jupyter Workspace ←→ Workspace Metadata ←→ Experiment Manager
        ↓                    ↓                     ↓
    Notebooks          Link Metadata        Experiment Data
     Scripts            Sync History       Metrics & Results
Linking Workspaces and Experiments
# Link existing workspace to experiment
./cli/zig-out/bin/ml jupyter link workspace-name experiment-id
# Create workspace linked to new experiment
./cli/zig-out/bin/ml jupyter create workspace-name \
--experiment experiment-id
# Queue experiment from workspace
./cli/zig-out/bin/ml jupyter queue \
--workspace workspace-name \
--config experiment-config.yaml
Data Synchronization
Automatic Sync:
- Notebook metadata
- Experiment results
- Configuration files
- Resource usage metrics
Manual Sync:
# Sync workspace to experiment
./cli/zig-out/bin/ml jupyter sync workspace-name --to-experiment
# Sync experiment to workspace
./cli/zig-out/bin/ml jupyter sync workspace-name --from-experiment
# Force full sync
./cli/zig-out/bin/ml jupyter sync workspace-name --full
Workspace Metadata
Tracked Information:
- Workspace creation and modification dates
- Linked experiment IDs
- Resource usage history
- Package installation records
- Notebook execution history
# View workspace metadata
./cli/zig-out/bin/ml jupyter metadata workspace-name
# Export metadata
./cli/zig-out/bin/ml jupyter export workspace-name --format=json
Package Management
Security Features
Trusted Channels (default):
- conda-forge: Community-maintained packages
- defaults: Anaconda default packages
- pytorch: PyTorch ecosystem packages
- nvidia: NVIDIA GPU packages
Blocked Packages (security):
- requests: HTTP client library
- urllib3: HTTP library
- socket: Network sockets
- subprocess: Process execution
- os.system: System commands
Package Installation
In Jupyter Notebook:
# Install package (checks security)
!pip install numpy pandas scikit-learn
# Install from conda
!conda install -c conda-forge matplotlib seaborn
# Check package status
!pip list
From CLI:
# Install package in workspace
./cli/zig-out/bin/ml jupyter install workspace-name numpy
# Install with version
./cli/zig-out/bin/ml jupyter install workspace-name "pandas==2.0.0"
# Install from conda
./cli/zig-out/bin/ml jupyter install workspace-name matplotlib --conda
# List installed packages
./cli/zig-out/bin/ml jupyter packages workspace-name
Package Approval Workflow
Optional Approval Process:
- Request: User requests package installation
- Review: Admin reviews package security
- Approval: Package added to allowlist
- Installation: Package installed in workspace
# Request package (requires approval)
./cli/zig-out/bin/ml jupyter request workspace-name custom-package
# Review requests (admin)
./cli/zig-out/bin/ml jupyter review --pending
# Approve request
./cli/zig-out/bin/ml jupyter approve request-id
# Deny request
./cli/zig-out/bin/ml jupyter deny request-id --reason="Security concern"
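The approval steps above can be read as a small state machine: a request is either approved and then installed, or denied. The sketch below illustrates that reading only. The state names, the TRANSITIONS table, and the advance function are hypothetical and do not describe FetchML internals.

```python
from enum import Enum


class RequestState(Enum):
    REQUESTED = "requested"
    APPROVED = "approved"
    DENIED = "denied"
    INSTALLED = "installed"


# Assumed transitions: request -> approve -> install, or request -> deny.
TRANSITIONS = {
    RequestState.REQUESTED: {RequestState.APPROVED, RequestState.DENIED},
    RequestState.APPROVED: {RequestState.INSTALLED},
    RequestState.DENIED: set(),
    RequestState.INSTALLED: set(),
}


def advance(current: RequestState, target: RequestState) -> RequestState:
    """Return the target state, or raise if the transition is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


if __name__ == "__main__":
    state = RequestState.REQUESTED
    state = advance(state, RequestState.APPROVED)
    state = advance(state, RequestState.INSTALLED)
    print(state.value)
```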
Custom Channel Configuration
# workspace-security.yaml
package_management:
  trusted_channels:
    - conda-forge
    - defaults
    - pytorch
    - nvidia
    - company-internal  # Custom channel
  blocked_packages:
    - requests
    - urllib3
    - socket
    - subprocess
    - os.system
  approval_required:
    - tensorflow
    - pytorch
    - custom-package
  allowlist:
    - numpy
    - pandas
    - scikit-learn
    - matplotlib
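To make the interaction between these lists concrete, the sketch below evaluates an install request against a policy shaped like the configuration above. It is an illustration under stated assumptions, not the FetchML implementation. The function name check_package and the result strings are hypothetical. The precedence (blocked, then channel, then approval) is assumed, as is the choice to allow unlisted packages from a trusted channel.

```python
# Policy mirroring workspace-security.yaml, written as a plain dict so the
# example needs no YAML parser.
POLICY = {
    "trusted_channels": {"conda-forge", "defaults", "pytorch", "nvidia", "company-internal"},
    "blocked_packages": {"requests", "urllib3", "socket", "subprocess", "os.system"},
    "approval_required": {"tensorflow", "pytorch", "custom-package"},
    "allowlist": {"numpy", "pandas", "scikit-learn", "matplotlib"},
}


def check_package(package: str, channel: str, policy: dict = POLICY) -> str:
    """Return 'blocked', 'untrusted_channel', 'needs_approval', or 'allowed'."""
    # Assumed precedence: a blocked package is rejected regardless of channel.
    if package in policy["blocked_packages"]:
        return "blocked"
    if channel not in policy["trusted_channels"]:
        return "untrusted_channel"
    if package in policy["approval_required"]:
        return "needs_approval"
    # Assumption: allowlisted and unlisted packages are both installable
    # from a trusted channel.
    return "allowed"


if __name__ == "__main__":
    for name, channel in [("numpy", "conda-forge"), ("requests", "conda-forge"),
                          ("tensorflow", "defaults"), ("numpy", "unknown-channel")]:
        print(name, channel, check_package(name, channel))
```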
Security and Compliance
Workspace Isolation
Network Isolation:
- Workspaces run in isolated networks
- Controlled outbound internet access
- Inter-workspace communication blocked
File System Isolation:
- Separate storage volumes per workspace
- Controlled file access permissions
- Automatic cleanup on workspace deletion
Audit Trail
Tracked Activities:
- Package installations and removals
- Notebook execution history
- Data access patterns
- Resource usage metrics
- User access logs
# View audit log
./cli/zig-out/bin/ml jupyter audit workspace-name
# Export audit report
./cli/zig-out/bin/ml jupyter audit workspace-name --export=csv
# Security scan
./cli/zig-out/bin/ml jupyter security-scan workspace-name
Compliance Features
Data Protection:
- Automatic data encryption
- Secure data transfer protocols
- GDPR compliance features
- Data retention policies
Access Controls:
- Role-based permissions
- Multi-factor authentication
- Session timeout management
- IP whitelisting
Advanced Features
Custom Images
# Build custom workspace image
./cli/zig-out/bin/ml jupyter build custom-image \
--base=jupyter/scipy-notebook \
--packages="numpy pandas scikit-learn" \
--gpu-support
# Use custom image
./cli/zig-out/bin/ml jupyter create workspace-name \
--image=custom-image
Workspace Templates
# data-science-template.yaml
name: data-science-workspace
resources:
  cpu: 8
  memory: 16g
  gpu: 1
packages:
  - numpy
  - pandas
  - scikit-learn
  - matplotlib
  - seaborn
  - jupyterlab
security:
  trusted_channels: ["conda-forge", "defaults"]
  approval_required: []
environment:
  PYTHONPATH: "/workspace"
  JUPYTER_ENABLE_LAB: "yes"
# Create from template
./cli/zig-out/bin/ml jupyter create workspace-name \
--template=data-science-template
Collaboration Features
Workspace Sharing:
# Share workspace with team
./cli/zig-out/bin/ml jupyter share workspace-name \
--team=data-science-team \
--role=collaborator
# Collaborative notebooks
# Multiple users can edit simultaneously
# Real-time cursor tracking
# Comment and review features
Version Control:
# Git integration
./cli/zig-out/bin/ml jupyter git workspace-name init
./cli/zig-out/bin/ml jupyter git workspace-name add .
./cli/zig-out/bin/ml jupyter git workspace-name commit -m "Initial commit"
# Notebook versioning
./cli/zig-out/bin/ml jupyter version workspace-name notebook.ipynb
Monitoring and Troubleshooting
Performance Monitoring
# Workspace resource usage
./cli/zig-out/bin/ml jupyter stats workspace-name
# Real-time monitoring
./cli/zig-out/bin/ml jupyter monitor workspace-name
# Performance report
./cli/zig-out/bin/ml jupyter report workspace-name --format=html
Common Issues
Package Installation Failures:
# Check package security
./cli/zig-out/bin/ml jupyter check-package package-name
# Bypass security (admin only)
./cli/zig-out/bin/ml jupyter install workspace-name package-name --force
# Clear package cache
./cli/zig-out/bin/ml jupyter clear-cache workspace-name
Workspace Access Issues:
# Check workspace status
./cli/zig-out/bin/ml jupyter status workspace-name
# Restart workspace
./cli/zig-out/bin/ml jupyter restart workspace-name
# Reset workspace
./cli/zig-out/bin/ml jupyter reset workspace-name --hard
Performance Issues:
# Check resource limits
./cli/zig-out/bin/ml jupyter limits workspace-name
# Scale resources
./cli/zig-out/bin/ml jupyter scale workspace-name --cpu=8 --memory=16g
# Optimize performance
./cli/zig-out/bin/ml jupyter optimize workspace-name
Best Practices
Workspace Organization
- Use Descriptive Names: project-name-environment
- Resource Planning: Allocate appropriate CPU/memory
- Regular Cleanup: Remove unused workspaces
- Version Control: Track important changes
Package Management
- Minimal Packages: Install only necessary packages
- Version Pinning: Use specific package versions
- Security First: Always use trusted channels
- Regular Updates: Keep packages updated
Security Practices
- Principle of Least Privilege: Minimal required permissions
- Regular Audits: Review workspace activities
- Data Classification: Handle sensitive data appropriately
- Compliance: Follow organizational policies
API Integration
Programmatic Workspace Management
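import requests
# Placeholder: replace with the address of your FetchML server
BASE_URL = 'http://SERVER_HOST:PORT'
# Create workspace
response = requests.post(f'{BASE_URL}/api/v1/jupyter/workspaces', json={
    'name': 'my-workspace',
    'resources': {'cpu': 4, 'memory': '8g'},
    'security': {'trusted_channels': ['conda-forge']}
})
response.raise_for_status()  # Stop early if the server returned an HTTP error
# Install package
response = requests.post(f'{BASE_URL}/api/v1/jupyter/workspaces/my-workspace/packages', json={
    'package': 'numpy',
    'version': '1.24.0'
})
response.raise_for_status()
# Link to experiment
response = requests.post(f'{BASE_URL}/api/v1/jupyter/workspaces/my-workspace/experiments', json={
    'experiment_id': 'exp-123'
})
response.raise_for_status()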
Webhooks
# workspace-webhooks.yaml
events:
- workspace_created
- package_installed
- experiment_linked
actions:
- slack_notification
- email_alert
- log_event
WebSocket Protocol
Overview
The Jupyter CLI commands use a binary WebSocket protocol for low-latency communication with the FetchML server. The persistent connection avoids the per-request overhead of HTTP and allows real-time updates.
Connection
# WebSocket endpoint
ws://SERVER_HOST:PORT/ws
# TLS-enabled endpoint
wss://SERVER_HOST:PORT/ws
Authentication: The API key is hashed with SHA-256, and the first 16 bytes of the hash are sent with each request.
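The hashing step can be sketched in a few lines of Python. This is an illustration rather than the CLI's own code: the function name api_key_hash is hypothetical, and it assumes the key is UTF-8 encoded and that "first 16 bytes" refers to the raw digest, not its hex string.

```python
import hashlib


def api_key_hash(api_key: str) -> bytes:
    """Return the first 16 bytes of the SHA-256 digest of the API key."""
    # Assumption: the key is encoded as UTF-8 before hashing.
    digest = hashlib.sha256(api_key.encode("utf-8")).digest()
    return digest[:16]


if __name__ == "__main__":
    # 'example-key' is a made-up value used only for demonstration.
    print(api_key_hash("example-key").hex())
```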
Binary Message Format
All Jupyter commands follow a binary protocol for optimal performance:
Start Jupyter Service (Opcode: 0x0D):
[opcode:1][api_key_hash:16][name_len:1][name:var][workspace_len:2][workspace:var][password_len:1][password:var]
Stop Jupyter Service (Opcode: 0x0E):
[opcode:1][api_key_hash:16][service_id_len:1][service_id:var]
List Jupyter Services (Opcode: 0x0F):
[opcode:1][api_key_hash:16]
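The three layouts above can be assembled with Python's struct module. The sketch below is one possible encoding, not the reference client. The function names are hypothetical, and it assumes multi-byte length fields are big-endian and strings are UTF-8, neither of which the guide states.

```python
import hashlib
import struct

OP_START, OP_STOP, OP_LIST = 0x0D, 0x0E, 0x0F


def _key_hash(api_key: str) -> bytes:
    # First 16 bytes of the SHA-256 digest (assumes a UTF-8 encoded key).
    return hashlib.sha256(api_key.encode("utf-8")).digest()[:16]


def _field(text: str, len_fmt: str) -> bytes:
    # Length-prefixed field. struct.pack raises struct.error if the length
    # does not fit in the prefix (1 byte for ">B", 2 bytes for ">H").
    raw = text.encode("utf-8")
    return struct.pack(len_fmt, len(raw)) + raw


def build_start(api_key: str, name: str, workspace: str, password: str = "") -> bytes:
    # [opcode:1][api_key_hash:16][name_len:1][name][workspace_len:2][workspace][password_len:1][password]
    return (bytes([OP_START]) + _key_hash(api_key)
            + _field(name, ">B") + _field(workspace, ">H") + _field(password, ">B"))


def build_stop(api_key: str, service_id: str) -> bytes:
    # [opcode:1][api_key_hash:16][service_id_len:1][service_id]
    return bytes([OP_STOP]) + _key_hash(api_key) + _field(service_id, ">B")


def build_list(api_key: str) -> bytes:
    # [opcode:1][api_key_hash:16]
    return bytes([OP_LIST]) + _key_hash(api_key)


if __name__ == "__main__":
    message = build_start("example-key", "my-notebook", "/path/to/workspace")
    print(len(message), message.hex())
```

Building each message as a single bytes value keeps the framing in one place, so a different byte order would only require changing the format strings.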
Response Packets
The server responds with structured response packets:
Success Response:
[packet_type:0x00][timestamp:8][message_len:2][message:var]
Error Response:
[packet_type:0x01][timestamp:8][error_code:1][message_len:2][message:var][details_len:2][details:var]
Data Response (for list command):
[packet_type:0x04][timestamp:8][type_len:2][type:var][payload_len:4][payload:var]
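A client can decode these packets by reading the type byte, the timestamp, and then each length-prefixed field in order. The sketch below shows one way to do that. The function name parse_response and the returned dictionary keys are hypothetical, and it assumes big-endian integers, an unsigned 8-byte timestamp, and UTF-8 text, none of which the guide specifies.

```python
import struct


def parse_response(packet: bytes) -> dict:
    """Decode a success (0x00), error (0x01), or data (0x04) response packet."""
    packet_type = packet[0]
    (timestamp,) = struct.unpack_from(">Q", packet, 1)  # assumed unsigned, big-endian
    offset = 9

    def read_field(len_fmt: str) -> bytes:
        # Read a length prefix in the given format, then that many bytes.
        nonlocal offset
        (length,) = struct.unpack_from(len_fmt, packet, offset)
        offset += struct.calcsize(len_fmt)
        data = packet[offset:offset + length]
        if len(data) != length:
            raise ValueError("truncated packet")
        offset += length
        return data

    if packet_type == 0x00:
        return {"type": "success", "timestamp": timestamp,
                "message": read_field(">H").decode("utf-8")}
    if packet_type == 0x01:
        error_code = packet[offset]
        offset += 1
        message = read_field(">H").decode("utf-8")
        details = read_field(">H").decode("utf-8")
        return {"type": "error", "timestamp": timestamp, "error_code": error_code,
                "message": message, "details": details}
    if packet_type == 0x04:
        data_type = read_field(">H").decode("utf-8")
        payload = read_field(">I")  # payload is left as raw bytes
        return {"type": "data", "timestamp": timestamp,
                "data_type": data_type, "payload": payload}
    raise ValueError(f"unsupported packet type: {packet_type:#04x}")


if __name__ == "__main__":
    sample = bytes([0x00]) + struct.pack(">Q", 1700000000) + struct.pack(">H", 2) + b"ok"
    print(parse_response(sample))
```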
CLI Examples
Start Service:
# Basic start
ml jupyter start --name my-notebook --workspace /path/to/workspace
# With password
ml jupyter start --name my-notebook --workspace /path/to/workspace --password mypass
Stop Service:
ml jupyter stop service-id-12345
List Services:
ml jupyter list
Error Codes
Common error codes in binary responses:
- 0x00: Unknown error
- 0x01: Invalid request format
- 0x02: Authentication failed
- 0x03: Permission denied
- 0x10: Server overloaded
- 0x14: Timeout
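When handling error responses in client code, the codes listed above can be mapped to readable messages with a small lookup table. This is a sketch: the names ERROR_CODES and describe_error are hypothetical, and only the codes documented here are included.

```python
# Error codes as documented in this guide. The server may define others.
ERROR_CODES = {
    0x00: "Unknown error",
    0x01: "Invalid request format",
    0x02: "Authentication failed",
    0x03: "Permission denied",
    0x10: "Server overloaded",
    0x14: "Timeout",
}


def describe_error(code: int) -> str:
    """Return a readable description, with a fallback for unlisted codes."""
    return ERROR_CODES.get(code, f"Unrecognized error code 0x{code:02X}")


if __name__ == "__main__":
    print(describe_error(0x02))
    print(describe_error(0x7F))
```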
WebSocket vs HTTP
Advantages of WebSocket:
- ✅ Lower latency (persistent connection)
- ✅ Binary protocol (smaller payloads)
- ✅ Real-time updates possible
- ✅ Reduced server load
- ✅ Single connection for CLI
When to use HTTP:
- For programmatic API access
- For web-based integrations
- When WebSocket is unavailable
See Also
- Testing Guide - Testing Jupyter workflows
- Deployment Guide - Production deployment
- Security Guide - Security best practices
- API Reference - API documentation
- CLI Reference - Command-line tools