docs: comprehensive documentation updates

- Add architecture, CI/CD, CLI reference documentation
- Update installation, operations, and quick-start guides
- Add Jupyter workflow and queue documentation
- New landing page and research runner plan

2026-02-12 12:05:27 -05:00

Jupyter Workflow Guide

Comprehensive guide to Jupyter workspace management, experiment integration, and secure package management in FetchML.

Overview

The Jupyter workflow system provides:

Workspace Management: Isolated development environments
Experiment Integration: Seamless linking with ML experiments
Package Management: Secure package installation from trusted sources
Resource Sharing: Data and context synchronization
Security Controls: Approved channels and package filtering

Quick Start

Create Jupyter Workspace

# Start development stack
make dev-up

# Create workspace
./cli/zig-out/bin/ml jupyter create my-workspace

# Access workspace
open http://localhost:8888

Link with Experiment

# Queue experiment from workspace
./cli/zig-out/bin/ml jupyter queue --workspace my-workspace --experiment my-experiment

# Monitor progress
./cli/zig-out/bin/ml status

Workspace Management

Creating Workspaces

# Create new workspace
./cli/zig-out/bin/ml jupyter create workspace-name

# Create with specific configuration
./cli/zig-out/bin/ml jupyter create workspace-name \
  --cpu=4 \
  --memory=8g \
  --gpu=1 \
  --image=jupyter/scipy-notebook:latest

# List workspaces
./cli/zig-out/bin/ml jupyter list

# Workspace details
./cli/zig-out/bin/ml jupyter info workspace-name

Workspace Configuration

Resource Allocation:

# workspace-config.yaml
resources:
  cpu: 4
  memory: 8g
  gpu: 1
  disk: 20g

environment:
  python_version: "3.11"
  jupyter_version: "latest"
  
security:
  trusted_channels: ["conda-forge", "defaults", "pytorch"]
  blocked_packages: ["aiohttp", "telnetlib"]

You can also override the blocked package list at runtime using an environment variable on the worker:

export FETCHML_JUPYTER_BLOCKED_PACKAGES="aiohttp,telnetlib"

Some base images (including the default quay.io/jupyter/base-notebook) ship with common HTTP client libraries like requests, urllib3, and httpx preinstalled.

To block installation of packages such as requests, urllib3, and httpx while still using a base image that already includes them, disable the startup image scan separately:

# Block installs (user requests)
export FETCHML_JUPYTER_BLOCKED_PACKAGES="requests,urllib3,httpx"

# Allow base images that already contain these packages to start
export FETCHML_JUPYTER_STARTUP_BLOCKED_PACKAGES="off"

To enable startup scanning, set FETCHML_JUPYTER_STARTUP_BLOCKED_PACKAGES to a comma-separated list of package names.
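
As a rough illustration of how the two variables could be interpreted, the sketch below parses a comma-separated value into a set of names and treats "off" as "no check". The function name parse_blocked_packages, the case-insensitive matching, and the whitespace trimming are assumptions made for this example; the worker's actual parsing may differ.

```python
import os


def parse_blocked_packages(raw, default=()):
    """Parse a comma-separated package list (hypothetical helper).

    None (variable unset) falls back to `default`; "off" yields an empty set.
    Lowercasing and trimming are assumptions, not documented behavior.
    """
    if raw is None:
        return set(default)
    value = raw.strip()
    if value.lower() == "off":
        return set()
    return {name.strip().lower() for name in value.split(",") if name.strip()}


# The install-time blocklist and the startup-scan blocklist are read separately.
install_blocked = parse_blocked_packages(
    os.environ.get("FETCHML_JUPYTER_BLOCKED_PACKAGES")
)
startup_blocked = parse_blocked_packages(
    os.environ.get("FETCHML_JUPYTER_STARTUP_BLOCKED_PACKAGES")
)
```

With the two exports shown earlier, install_blocked would contain requests, urllib3, and httpx, while startup_blocked would be empty.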

Access Control

# Set workspace permissions
./cli/zig-out/bin/ml jupyter access workspace-name \
  --user=data-scientist \
  --role=editor

# Revoke access
./cli/zig-out/bin/ml jupyter revoke workspace-name data-scientist

Experiment Integration

Architecture

Jupyter Workspace ←→ Workspace Metadata ←→ Experiment Manager
        ↓                      ↓                    ↓
   Notebooks            Link Metadata      Experiment Data
   Scripts              Sync History       Metrics & Results

Linking Workspaces and Experiments

# Link existing workspace to experiment
./cli/zig-out/bin/ml jupyter link workspace-name experiment-id

# Create workspace linked to new experiment
./cli/zig-out/bin/ml jupyter create workspace-name \
  --experiment experiment-id

# Queue experiment from workspace
./cli/zig-out/bin/ml jupyter queue \
  --workspace workspace-name \
  --config experiment-config.yaml

Data Synchronization

Automatic Sync:

Notebook metadata
Experiment results
Configuration files
Resource usage metrics

Manual Sync:

# Sync workspace to experiment
./cli/zig-out/bin/ml jupyter sync workspace-name --to-experiment

# Sync experiment to workspace
./cli/zig-out/bin/ml jupyter sync workspace-name --from-experiment

# Force full sync
./cli/zig-out/bin/ml jupyter sync workspace-name --full

Workspace Metadata

Tracked Information:

Workspace creation and modification dates
Linked experiment IDs
Resource usage history
Package installation records
Notebook execution history

# View workspace metadata
./cli/zig-out/bin/ml jupyter metadata workspace-name

# Export metadata
./cli/zig-out/bin/ml jupyter export workspace-name --format=json

Package Management

Security Features

Trusted Channels (default):

conda-forge - Community-maintained packages
defaults - Anaconda default packages
pytorch - PyTorch ecosystem packages
nvidia - NVIDIA GPU packages

Blocked Packages (security):

requests - HTTP client library
urllib3 - HTTP library
socket - Network sockets
subprocess - Process execution
os.system - System commands

Package Installation

In Jupyter Notebook:

# Install package (checks security)
!pip install numpy pandas scikit-learn

# Install from conda
!conda install -c conda-forge matplotlib seaborn

# Check package status
!pip list

From CLI:

# Install package in workspace
./cli/zig-out/bin/ml jupyter install workspace-name numpy

# Install with version
./cli/zig-out/bin/ml jupyter install workspace-name "pandas==2.0.0"

# Install from conda
./cli/zig-out/bin/ml jupyter install workspace-name matplotlib --conda

# List installed packages
./cli/zig-out/bin/ml jupyter packages workspace-name

Package Approval Workflow

Optional Approval Process:

Request: User requests package installation
Review: Admin reviews package security
Approval: Package added to allowlist
Installation: Package installed in workspace

# Request package (requires approval)
./cli/zig-out/bin/ml jupyter request workspace-name custom-package

# Review requests (admin)
./cli/zig-out/bin/ml jupyter review --pending

# Approve request
./cli/zig-out/bin/ml jupyter approve request-id

# Deny request
./cli/zig-out/bin/ml jupyter deny request-id --reason="Security concern"
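
The four steps above can be read as a small state machine. The sketch below is only an illustration of that reading: the state names, the RequestState enum, and the advance function are hypothetical and do not describe how FetchML stores requests.

```python
from enum import Enum


class RequestState(Enum):
    REQUESTED = "requested"
    APPROVED = "approved"
    DENIED = "denied"
    INSTALLED = "installed"


# Allowed moves: a request is approved or denied; only approved requests install.
_TRANSITIONS = {
    RequestState.REQUESTED: {RequestState.APPROVED, RequestState.DENIED},
    RequestState.APPROVED: {RequestState.INSTALLED},
    RequestState.DENIED: set(),
    RequestState.INSTALLED: set(),
}


def advance(current: RequestState, target: RequestState) -> RequestState:
    """Return `target` if the move is allowed, otherwise raise ValueError."""
    if target not in _TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


state = advance(RequestState.REQUESTED, RequestState.APPROVED)
state = advance(state, RequestState.INSTALLED)
```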

Custom Channel Configuration

# workspace-security.yaml
package_management:
  trusted_channels:
    - conda-forge
    - defaults
    - pytorch
    - nvidia
    - company-internal  # Custom channel
  
  blocked_packages:
    - requests
    - urllib3
    - socket
    - subprocess
    - os.system
  
  approval_required:
    - tensorflow
    - pytorch
    - custom-package
  
  allowlist:
    - numpy
    - pandas
    - scikit-learn
    - matplotlib

Security and Compliance

Workspace Isolation

Network Isolation:

Workspaces run in isolated networks
Controlled outbound internet access
Inter-workspace communication blocked

File System Isolation:

Separate storage volumes per workspace
Controlled file access permissions
Automatic cleanup on workspace deletion

Audit Trail

Tracked Activities:

Package installations and removals
Notebook execution history
Data access patterns
Resource usage metrics
User access logs

# View audit log
./cli/zig-out/bin/ml jupyter audit workspace-name

# Export audit report
./cli/zig-out/bin/ml jupyter audit workspace-name --export=csv

# Security scan
./cli/zig-out/bin/ml jupyter security-scan workspace-name

Compliance Features

Data Protection:

Automatic data encryption
Secure data transfer protocols
GDPR compliance features
Data retention policies

Access Controls:

Role-based permissions
Multi-factor authentication
Session timeout management
IP allowlisting

Advanced Features

Custom Images

# Build custom workspace image
./cli/zig-out/bin/ml jupyter build custom-image \
  --base=jupyter/scipy-notebook \
  --packages="numpy pandas scikit-learn" \
  --gpu-support

# Use custom image
./cli/zig-out/bin/ml jupyter create workspace-name \
  --image=custom-image

Workspace Templates

# data-science-template.yaml
name: data-science-workspace
resources:
  cpu: 8
  memory: 16g
  gpu: 1

packages:
  - numpy
  - pandas
  - scikit-learn
  - matplotlib
  - seaborn
  - jupyterlab

security:
  trusted_channels: ["conda-forge", "defaults"]
  approval_required: []

environment:
  PYTHONPATH: "/workspace"
  JUPYTER_ENABLE_LAB: "yes"

# Create from template
./cli/zig-out/bin/ml jupyter create workspace-name \
  --template=data-science-template

Collaboration Features

Workspace Sharing:

# Share workspace with team
./cli/zig-out/bin/ml jupyter share workspace-name \
  --team=data-science-team \
  --role=collaborator

# Collaborative notebooks
# Multiple users can edit simultaneously
# Real-time cursor tracking
# Comment and review features

Version Control:

# Git integration
./cli/zig-out/bin/ml jupyter git workspace-name init
./cli/zig-out/bin/ml jupyter git workspace-name add .
./cli/zig-out/bin/ml jupyter git workspace-name commit -m "Initial commit"

# Notebook versioning
./cli/zig-out/bin/ml jupyter version workspace-name notebook.ipynb

Monitoring and Troubleshooting

Performance Monitoring

# Workspace resource usage
./cli/zig-out/bin/ml jupyter stats workspace-name

# Real-time monitoring
./cli/zig-out/bin/ml jupyter monitor workspace-name

# Performance report
./cli/zig-out/bin/ml jupyter report workspace-name --format=html

Common Issues

Package Installation Failures:

# Check package security
./cli/zig-out/bin/ml jupyter check-package package-name

# Bypass security (admin only)
./cli/zig-out/bin/ml jupyter install workspace-name package-name --force

# Clear package cache
./cli/zig-out/bin/ml jupyter clear-cache workspace-name

Workspace Access Issues:

# Check workspace status
./cli/zig-out/bin/ml jupyter status workspace-name

# Restart workspace
./cli/zig-out/bin/ml jupyter restart workspace-name

# Reset workspace
./cli/zig-out/bin/ml jupyter reset workspace-name --hard

Performance Issues:

# Check resource limits
./cli/zig-out/bin/ml jupyter limits workspace-name

# Scale resources
./cli/zig-out/bin/ml jupyter scale workspace-name --cpu=8 --memory=16g

# Optimize performance
./cli/zig-out/bin/ml jupyter optimize workspace-name

Best Practices

Workspace Organization

Use Descriptive Names: project-name-environment
Resource Planning: Allocate appropriate CPU/memory
Regular Cleanup: Remove unused workspaces
Version Control: Track important changes

Package Management

Minimal Packages: Install only necessary packages
Version Pinning: Use specific package versions
Security First: Always use trusted channels
Regular Updates: Keep packages updated

Security Practices

Principle of Least Privilege: Minimal required permissions
Regular Audits: Review workspace activities
Data Classification: Handle sensitive data appropriately
Compliance: Follow organizational policies

API Integration

Programmatic Workspace Management

import requests

# Placeholder: replace with the address of your FetchML server.
BASE_URL = 'http://SERVER_HOST:PORT'

# Create workspace
response = requests.post(f'{BASE_URL}/api/v1/jupyter/workspaces', json={
    'name': 'my-workspace',
    'resources': {'cpu': 4, 'memory': '8g'},
    'security': {'trusted_channels': ['conda-forge']}
})
response.raise_for_status()  # Stop early if the server rejected the request

# Install package
requests.post(f'{BASE_URL}/api/v1/jupyter/workspaces/my-workspace/packages', json={
    'package': 'numpy',
    'version': '1.24.0'
})

# Link to experiment
requests.post(f'{BASE_URL}/api/v1/jupyter/workspaces/my-workspace/experiments', json={
    'experiment_id': 'exp-123'
})

Webhooks

# workspace-webhooks.yaml
events:
  - workspace_created
  - package_installed
  - experiment_linked

actions:
  - slack_notification
  - email_alert
  - log_event

WebSocket Protocol

Overview

The Jupyter CLI commands communicate with the FetchML server over a binary WebSocket protocol. The persistent connection avoids setting up a new connection for each command, the binary encoding keeps payloads small, and the server can push real-time updates.

Connection

# WebSocket endpoint
ws://SERVER_HOST:PORT/ws

# TLS-enabled endpoint
wss://SERVER_HOST:PORT/ws

Authentication: The API key is hashed with SHA-256, and the first 16 bytes of the digest are sent with each request.
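
A minimal sketch of that derivation using Python's standard hashlib. It assumes the key is encoded as UTF-8 before hashing and that the 16 bytes are taken from the raw digest rather than from its hex string; neither detail is stated in this guide. The function name api_key_hash is illustrative.

```python
import hashlib


def api_key_hash(api_key: str) -> bytes:
    """Return the first 16 bytes of the SHA-256 digest of the API key."""
    digest = hashlib.sha256(api_key.encode("utf-8")).digest()  # 32 raw bytes
    return digest[:16]


key_hash = api_key_hash("example-api-key")  # placeholder key, not a real credential
```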

Binary Message Format

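Each Jupyter command is encoded as a binary message. In the layouts below, each field is written as name:size, where the size is in bytes and var marks a variable-length field whose length is given by the preceding length field: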

Start Jupyter Service (Opcode: 0x0D):

[opcode:1][api_key_hash:16][name_len:1][name:var][workspace_len:2][workspace:var][password_len:1][password:var]

Stop Jupyter Service (Opcode: 0x0E):

[opcode:1][api_key_hash:16][service_id_len:1][service_id:var]

List Jupyter Services (Opcode: 0x0F):

[opcode:1][api_key_hash:16]
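
The three layouts above can be sketched with Python's struct module. This is an illustration under stated assumptions: multi-byte lengths are packed big-endian, strings are UTF-8, and an omitted password is sent as a zero-length field. The guide does not specify any of these, and the helper names are hypothetical.

```python
import hashlib
import struct

OP_START, OP_STOP, OP_LIST = 0x0D, 0x0E, 0x0F


def _key_hash(api_key: str) -> bytes:
    # First 16 bytes of the SHA-256 digest, as described under Connection.
    return hashlib.sha256(api_key.encode("utf-8")).digest()[:16]


def _field(text: str, len_fmt: str) -> bytes:
    # Length prefix followed by the bytes; struct.error if the length overflows.
    raw = text.encode("utf-8")
    return struct.pack(len_fmt, len(raw)) + raw


def encode_start(api_key: str, name: str, workspace: str, password: str = "") -> bytes:
    return (bytes([OP_START]) + _key_hash(api_key)
            + _field(name, ">B")        # name_len:1
            + _field(workspace, ">H")   # workspace_len:2 (big-endian assumed)
            + _field(password, ">B"))   # password_len:1


def encode_stop(api_key: str, service_id: str) -> bytes:
    return bytes([OP_STOP]) + _key_hash(api_key) + _field(service_id, ">B")


def encode_list(api_key: str) -> bytes:
    return bytes([OP_LIST]) + _key_hash(api_key)


message = encode_start("example-api-key", "my-notebook", "/path/to/workspace")
```

The resulting bytes would be sent as a single binary WebSocket frame; the choice of client library is left open here.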

Response Packets

The server responds with structured response packets:

Success Response:

[packet_type:0x00][timestamp:8][message_len:2][message:var]

Error Response:

[packet_type:0x01][timestamp:8][error_code:1][message_len:2][message:var][details_len:2][details:var]

Data Response (for list command):

[packet_type:0x04][timestamp:8][type_len:2][type:var][payload_len:4][payload:var]
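
A matching decoder for the three packet layouts might look like the sketch below. As with the request side, big-endian integers, an unsigned 64-bit timestamp, and UTF-8 text are assumptions; parse_response and the keys of the returned dict are names invented for this example.

```python
import struct


def parse_response(packet: bytes) -> dict:
    """Decode one response packet into a dict (layout per the section above)."""
    packet_type = packet[0]
    (timestamp,) = struct.unpack_from(">Q", packet, 1)
    offset = 9  # 1 byte packet_type + 8 bytes timestamp

    def read_field(len_fmt: str) -> bytes:
        # Read a length prefix, then that many bytes, advancing the offset.
        nonlocal offset
        (length,) = struct.unpack_from(len_fmt, packet, offset)
        offset += struct.calcsize(len_fmt)
        data = packet[offset:offset + length]
        if len(data) != length:
            raise ValueError("truncated packet")
        offset += length
        return data

    result = {"timestamp": timestamp}
    if packet_type == 0x00:
        result["kind"] = "success"
        result["message"] = read_field(">H").decode("utf-8")
    elif packet_type == 0x01:
        result["kind"] = "error"
        result["error_code"] = packet[offset]
        offset += 1
        result["message"] = read_field(">H").decode("utf-8")
        result["details"] = read_field(">H").decode("utf-8")
    elif packet_type == 0x04:
        result["kind"] = "data"
        result["data_type"] = read_field(">H").decode("utf-8")
        result["payload"] = read_field(">I")  # left as raw bytes
    else:
        raise ValueError(f"unhandled packet type 0x{packet_type:02X}")
    return result
```

The payload of a data response is returned undecoded because its encoding depends on the type field, which this guide does not enumerate.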

CLI Examples

Start Service:

# Basic start
ml jupyter start --name my-notebook --workspace /path/to/workspace

# With password
ml jupyter start --name my-notebook --workspace /path/to/workspace --password mypass

Stop Service:

ml jupyter stop service-id-12345

List Services:

ml jupyter list

Error Codes

Common error codes in binary responses:

0x00: Unknown error
0x01: Invalid request format
0x02: Authentication failed
0x03: Permission denied
0x10: Server overloaded
0x14: Timeout
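
For client code, the codes listed above can be captured in an enum so that error responses print a readable name. The enum member names and the describe_error helper are illustrative; only the numeric values and their meanings come from the list.

```python
from enum import IntEnum


class ErrorCode(IntEnum):
    UNKNOWN = 0x00
    INVALID_REQUEST = 0x01
    AUTHENTICATION_FAILED = 0x02
    PERMISSION_DENIED = 0x03
    SERVER_OVERLOADED = 0x10
    TIMEOUT = 0x14


def describe_error(code: int) -> str:
    """Return the enum name for a listed code, or a hex tag for any other value."""
    try:
        return ErrorCode(code).name
    except ValueError:
        # The list above is described as "common" codes, so others may exist.
        return f"UNLISTED_0x{code:02X}"
```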

WebSocket vs HTTP

Advantages of WebSocket:

✅ Lower latency (persistent connection)
✅ Binary protocol (smaller payloads)
✅ Real-time updates possible
✅ Reduced server load
✅ Single connection for CLI

When to use HTTP:

For programmatic API access
For web-based integrations
When WebSocket is unavailable
