fetch_ml/docs/src/jupyter-workflow.md
Jeremie Fraeys 5144d291cb
2026-02-12 12:05:27 -05:00

Jupyter Workflow Guide

Comprehensive guide to Jupyter workspace management, experiment integration, and secure package management in FetchML.

Overview

The Jupyter workflow system provides:

  • Workspace Management: Isolated development environments
  • Experiment Integration: Seamless linking with ML experiments
  • Package Management: Secure package installation from trusted sources
  • Resource Sharing: Data and context synchronization
  • Security Controls: Approved channels and package filtering

Quick Start

Create Jupyter Workspace

# Start development stack
make dev-up

# Create workspace
./cli/zig-out/bin/ml jupyter create my-workspace

# Access workspace
open http://localhost:8888

# Queue experiment from workspace
./cli/zig-out/bin/ml jupyter queue --workspace my-workspace --experiment my-experiment

# Monitor progress
./cli/zig-out/bin/ml status

Workspace Management

Creating Workspaces

# Create new workspace
./cli/zig-out/bin/ml jupyter create workspace-name

# Create with specific configuration
./cli/zig-out/bin/ml jupyter create workspace-name \
  --cpu=4 \
  --memory=8g \
  --gpu=1 \
  --image=jupyter/scipy-notebook:latest

# List workspaces
./cli/zig-out/bin/ml jupyter list

# Workspace details
./cli/zig-out/bin/ml jupyter info workspace-name

Workspace Configuration

Resource Allocation:

# workspace-config.yaml
resources:
  cpu: 4
  memory: 8g
  gpu: 1
  disk: 20g

environment:
  python_version: "3.11"
  jupyter_version: "latest"
  
security:
  trusted_channels: ["conda-forge", "defaults", "pytorch"]
  blocked_packages: ["aiohttp", "telnetlib"]
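
The memory and disk values in this configuration are a number followed by a unit suffix. The following is a minimal Python sketch of how such a block could be validated before it is submitted. It assumes that the suffixes k, m, g, and t denote binary multiples (the guide does not define the units), and `parse_size` and `validate_resources` are hypothetical helper names, not part of FetchML.

```python
import re

# Assumption: suffixes are binary multiples (1g = 1024**3 bytes).
_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_size(value: str) -> int:
    """Convert a size string such as '8g' into a number of bytes."""
    match = re.fullmatch(r"(\d+)([kmgt])", value.strip().lower())
    if match is None:
        raise ValueError(f"invalid size: {value!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def validate_resources(resources: dict) -> dict:
    """Check the 'resources' mapping and normalize sizes to bytes."""
    cpu = resources["cpu"]
    gpu = resources.get("gpu", 0)
    if not isinstance(cpu, int) or cpu < 1:
        raise ValueError("cpu must be a positive integer")
    if not isinstance(gpu, int) or gpu < 0:
        raise ValueError("gpu must be a non-negative integer")
    return {
        "cpu": cpu,
        "gpu": gpu,
        "memory_bytes": parse_size(resources["memory"]),
        "disk_bytes": parse_size(resources["disk"]),
    }


if __name__ == "__main__":
    print(validate_resources({"cpu": 4, "memory": "8g", "gpu": 1, "disk": "20g"}))
```

Normalizing to bytes up front makes it easy to compare a request against a quota without re-parsing strings.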

You can also override the blocked package list at runtime using an environment variable on the worker:

export FETCHML_JUPYTER_BLOCKED_PACKAGES="aiohttp,telnetlib"

Some base images (including the default quay.io/jupyter/base-notebook) ship with common HTTP client libraries like requests, urllib3, and httpx preinstalled.

To block installation of packages such as requests, urllib3, and httpx while still using a base image that already includes them, disable the startup image scan separately:

# Block installs (user requests)
export FETCHML_JUPYTER_BLOCKED_PACKAGES="requests,urllib3,httpx"

# Allow base images that already contain these packages to start
export FETCHML_JUPYTER_STARTUP_BLOCKED_PACKAGES="off"

To keep startup scanning enabled, set FETCHML_JUPYTER_STARTUP_BLOCKED_PACKAGES to a comma-separated list of package names instead of "off".
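
The two variables can be read as a pair of blocklists: one applied to install requests and one applied when an image starts. The Python sketch below shows one way to interpret them. It is illustrative only: the function names are hypothetical, and the fallback used when the startup variable is unset (reuse the install blocklist) is an assumption, not documented worker behavior.

```python
from typing import Mapping, Optional


def parse_package_list(raw: Optional[str]) -> frozenset:
    """Split a comma-separated value into normalized package names."""
    if not raw:
        return frozenset()
    return frozenset(p.strip().lower() for p in raw.split(",") if p.strip())


def install_blocklist(env: Mapping[str, str]) -> frozenset:
    """Packages that users may not install."""
    return parse_package_list(env.get("FETCHML_JUPYTER_BLOCKED_PACKAGES"))


def startup_blocklist(env: Mapping[str, str]) -> frozenset:
    """Packages that must not be present in an image at startup."""
    raw = env.get("FETCHML_JUPYTER_STARTUP_BLOCKED_PACKAGES")
    if raw is None:
        # Assumption: with no explicit setting, reuse the install blocklist.
        return install_blocklist(env)
    if raw.strip().lower() == "off":
        return frozenset()  # startup scan disabled
    return parse_package_list(raw)


if __name__ == "__main__":
    import os
    print(sorted(install_blocklist(os.environ)))
    print(sorted(startup_blocklist(os.environ)))
```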

Access Control

# Set workspace permissions
./cli/zig-out/bin/ml jupyter access workspace-name \
  --user=data-scientist \
  --role=editor

# Revoke access
./cli/zig-out/bin/ml jupyter revoke workspace-name data-scientist

Experiment Integration

Architecture

Jupyter Workspace ←→ Workspace Metadata ←→ Experiment Manager
        ↓                      ↓                    ↓
   Notebooks            Link Metadata      Experiment Data
   Scripts              Sync History       Metrics & Results

Linking Workspaces and Experiments

# Link existing workspace to experiment
./cli/zig-out/bin/ml jupyter link workspace-name experiment-id

# Create workspace linked to new experiment
./cli/zig-out/bin/ml jupyter create workspace-name \
  --experiment experiment-id

# Queue experiment from workspace
./cli/zig-out/bin/ml jupyter queue \
  --workspace workspace-name \
  --config experiment-config.yaml

Data Synchronization

Automatic Sync:

  • Notebook metadata
  • Experiment results
  • Configuration files
  • Resource usage metrics

Manual Sync:

# Sync workspace to experiment
./cli/zig-out/bin/ml jupyter sync workspace-name --to-experiment

# Sync experiment to workspace
./cli/zig-out/bin/ml jupyter sync workspace-name --from-experiment

# Force full sync
./cli/zig-out/bin/ml jupyter sync workspace-name --full

Workspace Metadata

Tracked Information:

  • Workspace creation and modification dates
  • Linked experiment IDs
  • Resource usage history
  • Package installation records
  • Notebook execution history

# View workspace metadata
./cli/zig-out/bin/ml jupyter metadata workspace-name

# Export metadata
./cli/zig-out/bin/ml jupyter export workspace-name --format=json

Package Management

Security Features

Trusted Channels (default):

  • conda-forge - Community-maintained packages
  • defaults - Anaconda default packages
  • pytorch - PyTorch ecosystem packages
  • nvidia - NVIDIA GPU packages

Blocked Packages (security):

  • requests - HTTP client library
  • urllib3 - HTTP library
  • socket - Network sockets
  • subprocess - Process execution
  • os.system - System commands

Package Installation

In Jupyter Notebook:

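# Install packages into the running kernel's environment (subject to security checks)
%pip install numpy pandas scikit-learn

# Install from conda (-y skips the interactive confirmation prompt)
!conda install -y -c conda-forge matplotlib seaborn

# List installed packages
%pip list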

From CLI:

# Install package in workspace
./cli/zig-out/bin/ml jupyter install workspace-name numpy

# Install with version
./cli/zig-out/bin/ml jupyter install workspace-name "pandas==2.0.0"

# Install from conda
./cli/zig-out/bin/ml jupyter install workspace-name matplotlib --conda

# List installed packages
./cli/zig-out/bin/ml jupyter packages workspace-name

Package Approval Workflow

Optional Approval Process:

  1. Request: User requests package installation
  2. Review: Admin reviews package security
  3. Approval: Package added to allowlist
  4. Installation: Package installed in workspace

# Request package (requires approval)
./cli/zig-out/bin/ml jupyter request workspace-name custom-package

# Review requests (admin)
./cli/zig-out/bin/ml jupyter review --pending

# Approve request
./cli/zig-out/bin/ml jupyter approve request-id

# Deny request
./cli/zig-out/bin/ml jupyter deny request-id --reason="Security concern"

Custom Channel Configuration

# workspace-security.yaml
package_management:
  trusted_channels:
    - conda-forge
    - defaults
    - pytorch
    - nvidia
    - company-internal  # Custom channel
  
  blocked_packages:
    - requests
    - urllib3
    - socket
    - subprocess
    - os.system
  
  approval_required:
    - tensorflow
    - pytorch
    - custom-package
  
  allowlist:
    - numpy
    - pandas
    - scikit-learn
    - matplotlib
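
The four lists in this file can be read as a small decision procedure. The Python sketch below shows one plausible evaluation order. It is an illustration under stated assumptions, not FetchML's actual logic: blocked packages are always denied, an untrusted channel is denied, a package in approval_required needs approval unless it is already on the allowlist, and everything else is allowed. `PackagePolicy`, `Decision`, and `evaluate` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    NEEDS_APPROVAL = "needs_approval"


@dataclass(frozen=True)
class PackagePolicy:
    trusted_channels: frozenset
    blocked_packages: frozenset
    approval_required: frozenset
    allowlist: frozenset


def evaluate(policy: PackagePolicy, package: str,
             channel: Optional[str] = None) -> Decision:
    """Decide what to do with an install request (assumed precedence)."""
    name = package.strip().lower()
    if name in policy.blocked_packages:
        return Decision.DENY
    if channel is not None and channel not in policy.trusted_channels:
        return Decision.DENY
    if name in policy.approval_required and name not in policy.allowlist:
        return Decision.NEEDS_APPROVAL
    return Decision.ALLOW


# Values copied from the workspace-security.yaml example.
POLICY = PackagePolicy(
    trusted_channels=frozenset(
        {"conda-forge", "defaults", "pytorch", "nvidia", "company-internal"}),
    blocked_packages=frozenset(
        {"requests", "urllib3", "socket", "subprocess", "os.system"}),
    approval_required=frozenset({"tensorflow", "pytorch", "custom-package"}),
    allowlist=frozenset({"numpy", "pandas", "scikit-learn", "matplotlib"}),
)

if __name__ == "__main__":
    for pkg in ("numpy", "requests", "tensorflow"):
        print(pkg, evaluate(POLICY, pkg, channel="conda-forge").value)
```

Checking the blocklist first means an approval or allowlist entry can never override an explicit block, which is the conservative choice for a security control.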

Security and Compliance

Workspace Isolation

Network Isolation:

  • Workspaces run in isolated networks
  • Controlled outbound internet access
  • Inter-workspace communication blocked

File System Isolation:

  • Separate storage volumes per workspace
  • Controlled file access permissions
  • Automatic cleanup on workspace deletion

Audit Trail

Tracked Activities:

  • Package installations and removals
  • Notebook execution history
  • Data access patterns
  • Resource usage metrics
  • User access logs

# View audit log
./cli/zig-out/bin/ml jupyter audit workspace-name

# Export audit report
./cli/zig-out/bin/ml jupyter audit workspace-name --export=csv

# Security scan
./cli/zig-out/bin/ml jupyter security-scan workspace-name

Compliance Features

Data Protection:

  • Automatic data encryption
  • Secure data transfer protocols
  • GDPR compliance features
  • Data retention policies

Access Controls:

  • Role-based permissions
  • Multi-factor authentication
  • Session timeout management
  • IP allowlisting

Advanced Features

Custom Images

# Build custom workspace image
./cli/zig-out/bin/ml jupyter build custom-image \
  --base=jupyter/scipy-notebook \
  --packages="numpy pandas scikit-learn" \
  --gpu-support

# Use custom image
./cli/zig-out/bin/ml jupyter create workspace-name \
  --image=custom-image

Workspace Templates

# data-science-template.yaml
name: data-science-workspace
resources:
  cpu: 8
  memory: 16g
  gpu: 1

packages:
  - numpy
  - pandas
  - scikit-learn
  - matplotlib
  - seaborn
  - jupyterlab

security:
  trusted_channels: ["conda-forge", "defaults"]
  approval_required: []

environment:
  PYTHONPATH: "/workspace"
  JUPYTER_ENABLE_LAB: "yes"

# Create from template
./cli/zig-out/bin/ml jupyter create workspace-name \
  --template=data-science-template

Collaboration Features

Workspace Sharing:

# Share workspace with team
./cli/zig-out/bin/ml jupyter share workspace-name \
  --team=data-science-team \
  --role=collaborator

# Collaborative notebooks
# Multiple users can edit simultaneously
# Real-time cursor tracking
# Comment and review features

Version Control:

# Git integration
./cli/zig-out/bin/ml jupyter git workspace-name init
./cli/zig-out/bin/ml jupyter git workspace-name add .
./cli/zig-out/bin/ml jupyter git workspace-name commit -m "Initial commit"

# Notebook versioning
./cli/zig-out/bin/ml jupyter version workspace-name notebook.ipynb

Monitoring and Troubleshooting

Performance Monitoring

# Workspace resource usage
./cli/zig-out/bin/ml jupyter stats workspace-name

# Real-time monitoring
./cli/zig-out/bin/ml jupyter monitor workspace-name

# Performance report
./cli/zig-out/bin/ml jupyter report workspace-name --format=html

Common Issues

Package Installation Failures:

# Check package security
./cli/zig-out/bin/ml jupyter check-package package-name

# Bypass security (admin only)
./cli/zig-out/bin/ml jupyter install workspace-name package-name --force

# Clear package cache
./cli/zig-out/bin/ml jupyter clear-cache workspace-name

Workspace Access Issues:

# Check workspace status
./cli/zig-out/bin/ml jupyter status workspace-name

# Restart workspace
./cli/zig-out/bin/ml jupyter restart workspace-name

# Reset workspace
./cli/zig-out/bin/ml jupyter reset workspace-name --hard

Performance Issues:

# Check resource limits
./cli/zig-out/bin/ml jupyter limits workspace-name

# Scale resources
./cli/zig-out/bin/ml jupyter scale workspace-name --cpu=8 --memory=16g

# Optimize performance
./cli/zig-out/bin/ml jupyter optimize workspace-name

Best Practices

Workspace Organization

  1. Use Descriptive Names: project-name-environment
  2. Resource Planning: Allocate appropriate CPU/memory
  3. Regular Cleanup: Remove unused workspaces
  4. Version Control: Track important changes

Package Management

  1. Minimal Packages: Install only necessary packages
  2. Version Pinning: Use specific package versions
  3. Security First: Always use trusted channels
  4. Regular Updates: Keep packages updated

Security Practices

  1. Principle of Least Privilege: Minimal required permissions
  2. Regular Audits: Review workspace activities
  3. Data Classification: Handle sensitive data appropriately
  4. Compliance: Follow organizational policies

API Integration

Programmatic Workspace Management

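import requests

# Placeholder: replace SERVER_HOST and PORT with your FetchML server address.
# requests needs an absolute URL; a bare '/api/...' path raises MissingSchema.
BASE_URL = 'http://SERVER_HOST:PORT'
WORKSPACES_URL = f'{BASE_URL}/api/v1/jupyter/workspaces'

# Create workspace
response = requests.post(WORKSPACES_URL, json={
    'name': 'my-workspace',
    'resources': {'cpu': 4, 'memory': '8g'},
    'security': {'trusted_channels': ['conda-forge']}
}, timeout=30)
response.raise_for_status()  # Stop early on an HTTP error status

# Install package
requests.post(f'{WORKSPACES_URL}/my-workspace/packages', json={
    'package': 'numpy',
    'version': '1.24.0'
}, timeout=30).raise_for_status()

# Link to experiment
requests.post(f'{WORKSPACES_URL}/my-workspace/experiments', json={
    'experiment_id': 'exp-123'
}, timeout=30).raise_for_status()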

Webhooks

# workspace-webhooks.yaml
events:
  - workspace_created
  - package_installed
  - experiment_linked

actions:
  - slack_notification
  - email_alert
  - log_event

WebSocket Protocol

Overview

The Jupyter CLI commands communicate with the FetchML server over a binary WebSocket protocol. The persistent connection avoids per-request HTTP overhead, keeps payloads small, and allows the server to push real-time updates.

Connection

# WebSocket endpoint
ws://SERVER_HOST:PORT/ws

# TLS-enabled endpoint
wss://SERVER_HOST:PORT/ws

Authentication: The API key is hashed with SHA-256, and the first 16 bytes of the digest are sent with each request.
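
A minimal Python sketch of deriving that 16-byte value. It assumes the key is UTF-8 encoded before hashing and that the raw digest bytes (not a hex string) are truncated; neither detail is stated above. `api_key_hash` is a hypothetical helper name.

```python
import hashlib


def api_key_hash(api_key: str) -> bytes:
    """Return the first 16 bytes of the SHA-256 digest of the API key."""
    # Assumption: the key is encoded as UTF-8 before hashing.
    digest = hashlib.sha256(api_key.encode("utf-8")).digest()
    return digest[:16]


if __name__ == "__main__":
    print(api_key_hash("example-key").hex())
```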

Binary Message Format

All Jupyter commands follow a binary protocol for optimal performance:

Start Jupyter Service (Opcode: 0x0D):

[opcode:1][api_key_hash:16][name_len:1][name:var][workspace_len:2][workspace:var][password_len:1][password:var]

Stop Jupyter Service (Opcode: 0x0E):

[opcode:1][api_key_hash:16][service_id_len:1][service_id:var]

List Jupyter Services (Opcode: 0x0F):

[opcode:1][api_key_hash:16]
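
The three layouts above can be assembled with Python's struct module. This is a sketch under stated assumptions: multi-byte length fields are big-endian (network byte order) and strings are UTF-8 encoded, neither of which the layouts specify. The function names are hypothetical.

```python
import struct

OP_START, OP_STOP, OP_LIST = 0x0D, 0x0E, 0x0F


def _field(data: bytes, len_fmt: str) -> bytes:
    """Prefix data with its length; struct.error is raised if it does not fit."""
    return struct.pack(len_fmt, len(data)) + data


def _header(opcode: int, key_hash: bytes) -> bytes:
    if len(key_hash) != 16:
        raise ValueError("api_key_hash must be exactly 16 bytes")
    return bytes([opcode]) + key_hash


def encode_start(key_hash: bytes, name: str, workspace: str,
                 password: str = "") -> bytes:
    """[opcode][hash][name_len:1][name][workspace_len:2][workspace][password_len:1][password]"""
    return (
        _header(OP_START, key_hash)
        + _field(name.encode("utf-8"), ">B")
        + _field(workspace.encode("utf-8"), ">H")  # assumption: big-endian
        + _field(password.encode("utf-8"), ">B")
    )


def encode_stop(key_hash: bytes, service_id: str) -> bytes:
    """[opcode][hash][service_id_len:1][service_id]"""
    return _header(OP_STOP, key_hash) + _field(service_id.encode("utf-8"), ">B")


def encode_list(key_hash: bytes) -> bytes:
    """[opcode][hash]"""
    return _header(OP_LIST, key_hash)


if __name__ == "__main__":
    demo_hash = bytes(16)  # placeholder hash for demonstration only
    print(encode_start(demo_hash, "my-notebook", "/path/to/workspace").hex())
```

Because the name and password lengths occupy a single byte, each is limited to 255 bytes; the workspace path has a two-byte length and can be up to 65535 bytes.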

Response Packets

The server responds with structured response packets:

Success Response:

[packet_type:0x00][timestamp:8][message_len:2][message:var]

Error Response:

[packet_type:0x01][timestamp:8][error_code:1][message_len:2][message:var][details_len:2][details:var]

Data Response (for list command):

[packet_type:0x04][timestamp:8][type_len:2][type:var][payload_len:4][payload:var]
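
A client can decode these packets by reading the fixed header and then each length-prefixed field in order. The Python sketch below assumes big-endian integers, an unsigned 64-bit timestamp, and UTF-8 text fields; the layouts above do not state these details. `Response` and `parse_response` are hypothetical names.

```python
import struct
from dataclasses import dataclass

PACKET_SUCCESS, PACKET_ERROR, PACKET_DATA = 0x00, 0x01, 0x04


@dataclass
class Response:
    kind: str
    timestamp: int
    fields: dict


def _read_field(buf: bytes, offset: int, len_fmt: str):
    """Read one length-prefixed field; return (data, offset after the field)."""
    (length,) = struct.unpack_from(len_fmt, buf, offset)
    start = offset + struct.calcsize(len_fmt)
    end = start + length
    if end > len(buf):
        raise ValueError("truncated packet")
    return buf[start:end], end


def parse_response(buf: bytes) -> Response:
    # Header: 1-byte packet type followed by an 8-byte timestamp.
    packet_type, timestamp = struct.unpack_from(">BQ", buf, 0)
    offset = 9
    if packet_type == PACKET_SUCCESS:
        message, offset = _read_field(buf, offset, ">H")
        return Response("success", timestamp,
                        {"message": message.decode("utf-8")})
    if packet_type == PACKET_ERROR:
        (code,) = struct.unpack_from(">B", buf, offset)
        message, offset = _read_field(buf, offset + 1, ">H")
        details, offset = _read_field(buf, offset, ">H")
        return Response("error", timestamp, {
            "error_code": code,
            "message": message.decode("utf-8"),
            "details": details.decode("utf-8"),
        })
    if packet_type == PACKET_DATA:
        data_type, offset = _read_field(buf, offset, ">H")
        payload, offset = _read_field(buf, offset, ">I")
        # The payload is left as raw bytes; its encoding depends on the type.
        return Response("data", timestamp,
                        {"type": data_type.decode("utf-8"), "payload": payload})
    raise ValueError(f"unknown packet type 0x{packet_type:02X}")
```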

CLI Examples

Start Service:

# Basic start
ml jupyter start --name my-notebook --workspace /path/to/workspace

# With password
ml jupyter start --name my-notebook --workspace /path/to/workspace --password mypass

Stop Service:

ml jupyter stop service-id-12345

List Services:

ml jupyter list

Error Codes

Common error codes in binary responses:

  • 0x00: Unknown error
  • 0x01: Invalid request format
  • 0x02: Authentication failed
  • 0x03: Permission denied
  • 0x10: Server overloaded
  • 0x14: Timeout
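
On the client side, these codes map naturally onto an enumeration. In the Python sketch below, the numeric values come from the list above, while the enum member names and the choice of which codes count as retryable are assumptions made for illustration.

```python
from enum import IntEnum


class ErrorCode(IntEnum):
    UNKNOWN = 0x00
    INVALID_REQUEST = 0x01
    AUTHENTICATION_FAILED = 0x02
    PERMISSION_DENIED = 0x03
    SERVER_OVERLOADED = 0x10
    TIMEOUT = 0x14


# Assumption: only transient conditions are worth retrying.
RETRYABLE = frozenset({ErrorCode.SERVER_OVERLOADED, ErrorCode.TIMEOUT})


def describe(code: int) -> str:
    """Return a readable label, including for codes not listed here."""
    try:
        return ErrorCode(code).name.replace("_", " ").lower()
    except ValueError:
        return f"unlisted error code 0x{code:02X}"


def is_retryable(code: int) -> bool:
    return code in RETRYABLE


if __name__ == "__main__":
    for raw in (0x02, 0x14, 0x7F):
        print(hex(raw), describe(raw), is_retryable(raw))
```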

WebSocket vs HTTP

Advantages of WebSocket:

  • Lower latency (persistent connection)
  • Binary protocol (smaller payloads)
  • Real-time updates possible
  • Reduced server load
  • Single connection for CLI

When to use HTTP:

  • For programmatic API access
  • For web-based integrations
  • When WebSocket is unavailable

See Also