# Jupyter Workflow Guide

This guide covers Jupyter workspace management, experiment integration, and secure package management in FetchML.

## Overview

The Jupyter workflow system provides:
- **Workspace Management**: Isolated development environments
- **Experiment Integration**: Seamless linking with ML experiments
- **Package Management**: Secure package installation from trusted sources
- **Resource Sharing**: Data and context synchronization
- **Security Controls**: Approved channels and package filtering

## Quick Start

### Create Jupyter Workspace

```bash
# Start development stack
make dev-up

# Create workspace
./cli/zig-out/bin/ml jupyter create my-workspace

# Access workspace (`open` is macOS-specific; use `xdg-open` on Linux)
open http://localhost:8888
```

### Link with Experiment

```bash
# Queue experiment from workspace
./cli/zig-out/bin/ml jupyter queue --workspace my-workspace --experiment my-experiment

# Monitor progress
./cli/zig-out/bin/ml status
```

## Workspace Management

### Creating Workspaces

```bash
# Create new workspace
./cli/zig-out/bin/ml jupyter create workspace-name

# Create with specific configuration
./cli/zig-out/bin/ml jupyter create workspace-name \
  --cpu=4 \
  --memory=8g \
  --gpu=1 \
  --image=jupyter/scipy-notebook:latest

# List workspaces
./cli/zig-out/bin/ml jupyter list

# Workspace details
./cli/zig-out/bin/ml jupyter info workspace-name
```

### Workspace Configuration

**Resource Allocation**:
```yaml
# workspace-config.yaml
resources:
  cpu: 4
  memory: 8g
  gpu: 1
  disk: 20g

environment:
  python_version: "3.11"
  jupyter_version: "latest"
  
security:
  trusted_channels: ["conda-forge", "defaults", "pytorch"]
  blocked_packages: ["aiohttp", "telnetlib"]
```

You can also override the blocked package list at runtime using an environment variable on the worker:

```bash
export FETCHML_JUPYTER_BLOCKED_PACKAGES="aiohttp,telnetlib"
```

Some base images (including the default `quay.io/jupyter/base-notebook`) ship with common HTTP client libraries
like `requests`, `urllib3`, and `httpx` preinstalled.

If you want to **block installing** packages like `requests`, `urllib3`, and `httpx` for security reasons but still
use a base image that already includes them, you can disable the **startup image scan** separately:

```bash
# Block installs (user requests)
export FETCHML_JUPYTER_BLOCKED_PACKAGES="requests,urllib3,httpx"

# Allow base images that already contain these packages to start
export FETCHML_JUPYTER_STARTUP_BLOCKED_PACKAGES="off"
```

To keep startup scanning enabled, set `FETCHML_JUPYTER_STARTUP_BLOCKED_PACKAGES` to a comma-separated list of the package names to scan for.
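
As an illustration of how these two variables interact, the following Python sketch parses them. It is not FetchML's worker code: the helper names are hypothetical, and both the fallback to `FETCHML_JUPYTER_BLOCKED_PACKAGES` when the startup variable is unset and the case-insensitive handling of `off` are assumptions.

```python
from typing import Mapping, Optional


def parse_package_list(raw: Optional[str]) -> list[str]:
    """Split a comma-separated package list, dropping blanks and normalizing case."""
    if not raw:
        return []
    return [name.strip().lower() for name in raw.split(",") if name.strip()]


def startup_scan_packages(env: Mapping[str, str]) -> Optional[list[str]]:
    """Return the packages to scan for at startup, or None if scanning is off."""
    raw = env.get("FETCHML_JUPYTER_STARTUP_BLOCKED_PACKAGES")
    if raw is None:
        # Assumption: an unset variable falls back to the install-time list.
        return parse_package_list(env.get("FETCHML_JUPYTER_BLOCKED_PACKAGES"))
    if raw.strip().lower() == "off":
        return None
    return parse_package_list(raw)


env = {
    "FETCHML_JUPYTER_BLOCKED_PACKAGES": "requests,urllib3,httpx",
    "FETCHML_JUPYTER_STARTUP_BLOCKED_PACKAGES": "off",
}
print(parse_package_list(env["FETCHML_JUPYTER_BLOCKED_PACKAGES"]))  # ['requests', 'urllib3', 'httpx']
print(startup_scan_packages(env))  # None
```

Returning `None` for a disabled scan, rather than an empty list, keeps "scan for nothing" and "do not scan" distinguishable to the caller.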


### Access Control

```bash
# Set workspace permissions
./cli/zig-out/bin/ml jupyter access workspace-name \
  --user=data-scientist \
  --role=editor

# Revoke access
./cli/zig-out/bin/ml jupyter revoke workspace-name data-scientist
```

## Experiment Integration

### Architecture

```
Jupyter Workspace ←→ Workspace Metadata ←→ Experiment Manager
        ↓                      ↓                    ↓
   Notebooks            Link Metadata      Experiment Data
   Scripts              Sync History       Metrics & Results
```

### Linking Workspaces and Experiments

```bash
# Link existing workspace to experiment
./cli/zig-out/bin/ml jupyter link workspace-name experiment-id

# Create workspace linked to an experiment
./cli/zig-out/bin/ml jupyter create workspace-name \
  --experiment experiment-id

# Queue experiment from workspace
./cli/zig-out/bin/ml jupyter queue \
  --workspace workspace-name \
  --config experiment-config.yaml
```

### Data Synchronization

**Automatic Sync**:
- Notebook metadata
- Experiment results
- Configuration files
- Resource usage metrics

**Manual Sync**:
```bash
# Sync workspace to experiment
./cli/zig-out/bin/ml jupyter sync workspace-name --to-experiment

# Sync experiment to workspace
./cli/zig-out/bin/ml jupyter sync workspace-name --from-experiment

# Force full sync
./cli/zig-out/bin/ml jupyter sync workspace-name --full
```

### Workspace Metadata

**Tracked Information**:
- Workspace creation and modification dates
- Linked experiment IDs
- Resource usage history
- Package installation records
- Notebook execution history

```bash
# View workspace metadata
./cli/zig-out/bin/ml jupyter metadata workspace-name

# Export metadata
./cli/zig-out/bin/ml jupyter export workspace-name --format=json
```

## Package Management

### Security Features

**Trusted Channels** (default):
- `conda-forge` - Community-maintained packages
- `defaults` - Anaconda default packages
- `pytorch` - PyTorch ecosystem packages
- `nvidia` - NVIDIA GPU packages

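**Blocked Packages** (security):
- `requests` - HTTP client library
- `urllib3` - HTTP library
- `socket` - Network sockets
- `subprocess` - Process execution
- `os.system` - System commands

The effective list is the one set in the workspace configuration or in `FETCHML_JUPYTER_BLOCKED_PACKAGES`. Note that `socket`, `subprocess`, and `os.system` are part of the Python standard library, not installable packages, so blocking them at install time does not remove them from a workspace image.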

### Package Installation

**In Jupyter Notebook**:
```python
# Install package (checks security)
!pip install numpy pandas scikit-learn

# Install from conda
!conda install -c conda-forge matplotlib seaborn

# Check package status
!pip list
```

**From CLI**:
```bash
# Install package in workspace
./cli/zig-out/bin/ml jupyter install workspace-name numpy

# Install with version
./cli/zig-out/bin/ml jupyter install workspace-name "pandas==2.0.0"

# Install from conda
./cli/zig-out/bin/ml jupyter install workspace-name matplotlib --conda

# List installed packages
./cli/zig-out/bin/ml jupyter packages workspace-name
```

### Package Approval Workflow

**Optional Approval Process**:
1. **Request**: User requests package installation
2. **Review**: Admin reviews package security
3. **Approval**: Package added to allowlist
4. **Installation**: Package installed in workspace

```bash
# Request package (requires approval)
./cli/zig-out/bin/ml jupyter request workspace-name custom-package

# Review requests (admin)
./cli/zig-out/bin/ml jupyter review --pending

# Approve request
./cli/zig-out/bin/ml jupyter approve request-id

# Deny request
./cli/zig-out/bin/ml jupyter deny request-id --reason="Security concern"
```
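
The approval process above can be read as a small state machine. The Python sketch below models it. The state names, the `DENIED` terminal state (inferred from the `deny` command), and the `advance` helper are illustrative assumptions, not FetchML's internal representation.

```python
from enum import Enum


class RequestState(Enum):
    REQUESTED = "requested"
    IN_REVIEW = "in_review"
    APPROVED = "approved"
    DENIED = "denied"
    INSTALLED = "installed"


# Allowed transitions, following Request -> Review -> Approval -> Installation.
TRANSITIONS = {
    RequestState.REQUESTED: {RequestState.IN_REVIEW},
    RequestState.IN_REVIEW: {RequestState.APPROVED, RequestState.DENIED},
    RequestState.APPROVED: {RequestState.INSTALLED},
    RequestState.DENIED: set(),
    RequestState.INSTALLED: set(),
}


def advance(current: RequestState, target: RequestState) -> RequestState:
    """Move a package request to the target state, rejecting illegal jumps."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target


state = RequestState.REQUESTED
for step in (RequestState.IN_REVIEW, RequestState.APPROVED, RequestState.INSTALLED):
    state = advance(state, step)
print(state.value)  # installed
```

Encoding the transitions as data makes it explicit that a denied request can never reach installation without a new request.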

### Custom Channel Configuration

```yaml
# workspace-security.yaml
package_management:
  trusted_channels:
    - conda-forge
    - defaults
    - pytorch
    - nvidia
    - company-internal  # Custom channel
  
  blocked_packages:
    - requests
    - urllib3
    - socket
    - subprocess
    - os.system
  
  approval_required:
    - tensorflow
    - pytorch
    - custom-package
  
  allowlist:
    - numpy
    - pandas
    - scikit-learn
    - matplotlib
```
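
To show how the four lists in this configuration might combine, the Python sketch below classifies an install request. The precedence (blocked, then approval required, then allowlist) and the `unlisted` outcome are assumptions made for illustration. The source does not specify how FetchML resolves overlaps, and `base_name` and `decide` are hypothetical helpers.

```python
import re

# Mirrors the package lists from workspace-security.yaml above.
POLICY = {
    "blocked_packages": ["requests", "urllib3", "socket", "subprocess", "os.system"],
    "approval_required": ["tensorflow", "pytorch", "custom-package"],
    "allowlist": ["numpy", "pandas", "scikit-learn", "matplotlib"],
}


def base_name(requirement: str) -> str:
    """Strip extras and version specifiers: 'pandas==2.0.0' -> 'pandas'."""
    return re.split(r"[\[<>=!~; ]", requirement.strip(), maxsplit=1)[0].lower()


def decide(requirement: str, policy: dict) -> str:
    """Classify a requirement. Assumed precedence: blocked > approval > allowlist."""
    name = base_name(requirement)
    if name in policy.get("blocked_packages", []):
        return "blocked"
    if name in policy.get("approval_required", []):
        return "needs_approval"
    if name in policy.get("allowlist", []):
        return "allowed"
    return "unlisted"


for req in ("pandas==2.0.0", "requests", "tensorflow>=2.15", "left-pad"):
    print(req, "->", decide(req, POLICY))
```

Checking the blocked list first means a package cannot become installable just because it also appears in a less restrictive list.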

## Security and Compliance

### Workspace Isolation

**Network Isolation**:
- Workspaces run in isolated networks
- Controlled outbound internet access
- Inter-workspace communication blocked

**File System Isolation**:
- Separate storage volumes per workspace
- Controlled file access permissions
- Automatic cleanup on workspace deletion

### Audit Trail

**Tracked Activities**:
- Package installations and removals
- Notebook execution history
- Data access patterns
- Resource usage metrics
- User access logs

```bash
# View audit log
./cli/zig-out/bin/ml jupyter audit workspace-name

# Export audit report
./cli/zig-out/bin/ml jupyter audit workspace-name --export=csv

# Security scan
./cli/zig-out/bin/ml jupyter security-scan workspace-name
```

### Compliance Features

**Data Protection**:
- Automatic data encryption
- Secure data transfer protocols
- GDPR compliance features
- Data retention policies

**Access Controls**:
- Role-based permissions
- Multi-factor authentication
- Session timeout management
- IP allowlisting

## Advanced Features

### Custom Images

```bash
# Build custom workspace image
./cli/zig-out/bin/ml jupyter build custom-image \
  --base=jupyter/scipy-notebook \
  --packages="numpy pandas scikit-learn" \
  --gpu-support

# Use custom image
./cli/zig-out/bin/ml jupyter create workspace-name \
  --image=custom-image
```

### Workspace Templates

```yaml
# data-science-template.yaml
name: data-science-workspace
resources:
  cpu: 8
  memory: 16g
  gpu: 1

packages:
  - numpy
  - pandas
  - scikit-learn
  - matplotlib
  - seaborn
  - jupyterlab

security:
  trusted_channels: ["conda-forge", "defaults"]
  approval_required: []

environment:
  PYTHONPATH: "/workspace"
  JUPYTER_ENABLE_LAB: "yes"
```

```bash
# Create from template
./cli/zig-out/bin/ml jupyter create workspace-name \
  --template=data-science-template
```

### Collaboration Features

**Workspace Sharing**:
```bash
# Share workspace with team
./cli/zig-out/bin/ml jupyter share workspace-name \
  --team=data-science-team \
  --role=collaborator
```

**Collaborative Notebooks**:
- Multiple users can edit simultaneously
- Real-time cursor tracking
- Comment and review features

**Version Control**:
```bash
# Git integration
./cli/zig-out/bin/ml jupyter git workspace-name init
./cli/zig-out/bin/ml jupyter git workspace-name add .
./cli/zig-out/bin/ml jupyter git workspace-name commit -m "Initial commit"

# Notebook versioning
./cli/zig-out/bin/ml jupyter version workspace-name notebook.ipynb
```

## Monitoring and Troubleshooting

### Performance Monitoring

```bash
# Workspace resource usage
./cli/zig-out/bin/ml jupyter stats workspace-name

# Real-time monitoring
./cli/zig-out/bin/ml jupyter monitor workspace-name

# Performance report
./cli/zig-out/bin/ml jupyter report workspace-name --format=html
```

### Common Issues

**Package Installation Failures**:
```bash
# Check package security
./cli/zig-out/bin/ml jupyter check-package package-name

# Bypass security (admin only)
./cli/zig-out/bin/ml jupyter install workspace-name package-name --force

# Clear package cache
./cli/zig-out/bin/ml jupyter clear-cache workspace-name
```

**Workspace Access Issues**:
```bash
# Check workspace status
./cli/zig-out/bin/ml jupyter status workspace-name

# Restart workspace
./cli/zig-out/bin/ml jupyter restart workspace-name

# Reset workspace
./cli/zig-out/bin/ml jupyter reset workspace-name --hard
```

**Performance Issues**:
```bash
# Check resource limits
./cli/zig-out/bin/ml jupyter limits workspace-name

# Scale resources
./cli/zig-out/bin/ml jupyter scale workspace-name --cpu=8 --memory=16g

# Optimize performance
./cli/zig-out/bin/ml jupyter optimize workspace-name
```

## Best Practices

### Workspace Organization

1. **Use Descriptive Names**: `project-name-environment`
2. **Resource Planning**: Allocate appropriate CPU/memory
3. **Regular Cleanup**: Remove unused workspaces
4. **Version Control**: Track important changes

### Package Management

1. **Minimal Packages**: Install only necessary packages
2. **Version Pinning**: Use specific package versions
3. **Security First**: Always use trusted channels
4. **Regular Updates**: Keep packages updated
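
As a small aid for the version pinning practice, the Python sketch below flags requirements that are not pinned to an exact version. The `unpinned` helper and its regular expression are illustrative and not part of FetchML. The pattern only accepts the `name==version` form and treats wildcards such as `1.*` as unpinned.

```python
import re

# name, optional [extras], then an exact '==' version without wildcards
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]+\])?==[^=*\s]+$")


def unpinned(requirements: list[str]) -> list[str]:
    """Return the requirements that are not pinned to one exact version."""
    return [req for req in requirements if not PINNED.match(req.strip())]


print(unpinned(["pandas==2.0.0", "numpy", "scipy>=1.10"]))  # ['numpy', 'scipy>=1.10']
```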

### Security Practices

1. **Principle of Least Privilege**: Minimal required permissions
2. **Regular Audits**: Review workspace activities
3. **Data Classification**: Handle sensitive data appropriately
4. **Compliance**: Follow organizational policies

## API Integration

### Programmatic Workspace Management

```python
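import requests

# Base URL of the FetchML server; replace SERVER_HOST and PORT with your deployment's values.
# requests needs an absolute URL, so a bare '/api/v1/...' path would raise MissingSchema.
BASE_URL = 'http://SERVER_HOST:PORT'

# Create workspace
response = requests.post(f'{BASE_URL}/api/v1/jupyter/workspaces', json={
    'name': 'my-workspace',
    'resources': {'cpu': 4, 'memory': '8g'},
    'security': {'trusted_channels': ['conda-forge']}
})
response.raise_for_status()  # fail early on an HTTP error status

# Install package
requests.post(f'{BASE_URL}/api/v1/jupyter/workspaces/my-workspace/packages', json={
    'package': 'numpy',
    'version': '1.24.0'
}).raise_for_status()

# Link to experiment
requests.post(f'{BASE_URL}/api/v1/jupyter/workspaces/my-workspace/experiments', json={
    'experiment_id': 'exp-123'
}).raise_for_status()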
```

### Webhooks

```yaml
# workspace-webhooks.yaml
events:
  - workspace_created
  - package_installed
  - experiment_linked

actions:
  - slack_notification
  - email_alert
  - log_event
```

## WebSocket Protocol

### Overview

The Jupyter CLI commands communicate with the FetchML server over a binary WebSocket protocol. The persistent connection avoids setting up a new connection for every command, the binary framing keeps payloads small, and the open channel allows the server to push real-time updates.

### Connection

```bash
# WebSocket endpoint
ws://SERVER_HOST:PORT/ws

# TLS-enabled endpoint
wss://SERVER_HOST:PORT/ws
```

**Authentication**: The API key is hashed with SHA-256, and the first 16 bytes of the digest are sent with each request.
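
A minimal Python sketch of that derivation follows. It assumes the key is encoded as UTF-8 before hashing and that the raw digest bytes (not the hex string) are truncated. Neither detail is stated here, and `api_key_hash` is a hypothetical helper name.

```python
import hashlib


def api_key_hash(api_key: str) -> bytes:
    """Return the first 16 bytes of the SHA-256 digest of the API key."""
    # Assumption: UTF-8 encoding of the key, truncation of the raw digest.
    return hashlib.sha256(api_key.encode("utf-8")).digest()[:16]


key_hash = api_key_hash("example-api-key")  # placeholder key, not a real credential
print(len(key_hash))  # 16
```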

### Binary Message Format

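All Jupyter commands use the same binary framing. In the layouts below, each field is written as `[name:size]` with the size in bytes; `var` marks a variable-length field whose byte length is given by the preceding `_len` field: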

**Start Jupyter Service** (Opcode: 0x0D):
```
[opcode:1][api_key_hash:16][name_len:1][name:var][workspace_len:2][workspace:var][password_len:1][password:var]
```

**Stop Jupyter Service** (Opcode: 0x0E):
```
[opcode:1][api_key_hash:16][service_id_len:1][service_id:var]
```

**List Jupyter Services** (Opcode: 0x0F):
```
[opcode:1][api_key_hash:16]
```

### Response Packets

The server responds with structured response packets:

**Success Response**:
```
[packet_type:0x00][timestamp:8][message_len:2][message:var]
```

**Error Response**:
```
[packet_type:0x01][timestamp:8][error_code:1][message_len:2][message:var][details_len:2][details:var]
```

**Data Response** (for list command):
```
[packet_type:0x04][timestamp:8][type_len:2][type:var][payload_len:4][payload:var]
```
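
A client could decode these three packet types as sketched below. The packet types and field sizes are taken from the layouts above. Big-endian integers, an unsigned 64-bit timestamp (its unit is not stated here), UTF-8 text fields, and the returned dictionary shape are assumptions.

```python
import struct

HEADER = struct.Struct(">BQ")  # packet_type (1 byte) + timestamp (8 bytes)


def _read_field(buf: bytes, offset: int, len_fmt: str) -> tuple[bytes, int]:
    """Read one length-prefixed field; return its bytes and the next offset."""
    size = struct.calcsize(len_fmt)
    if offset + size > len(buf):
        raise ValueError("truncated length prefix")
    (length,) = struct.unpack_from(len_fmt, buf, offset)
    start, end = offset + size, offset + size + length
    if end > len(buf):
        raise ValueError("truncated field")
    return buf[start:end], end


def parse_response(buf: bytes) -> dict:
    """Decode a success (0x00), error (0x01), or data (0x04) packet."""
    if len(buf) < HEADER.size:
        raise ValueError("packet shorter than header")
    packet_type, timestamp = HEADER.unpack_from(buf, 0)
    offset = HEADER.size
    if packet_type == 0x00:
        message, offset = _read_field(buf, offset, ">H")
        return {"type": "success", "timestamp": timestamp, "message": message.decode("utf-8")}
    if packet_type == 0x01:
        if offset >= len(buf):
            raise ValueError("missing error code")
        error_code = buf[offset]
        message, offset = _read_field(buf, offset + 1, ">H")
        details, offset = _read_field(buf, offset, ">H")
        return {"type": "error", "timestamp": timestamp, "error_code": error_code,
                "message": message.decode("utf-8"), "details": details.decode("utf-8")}
    if packet_type == 0x04:
        data_type, offset = _read_field(buf, offset, ">H")
        payload, offset = _read_field(buf, offset, ">I")
        # The payload is returned as raw bytes; its encoding depends on data_type.
        return {"type": "data", "timestamp": timestamp,
                "data_type": data_type.decode("utf-8"), "payload": payload}
    raise ValueError(f"unknown packet type 0x{packet_type:02x}")


sample = b"\x00" + struct.pack(">Q", 1700000000) + struct.pack(">H", 2) + b"ok"
print(parse_response(sample)["message"])  # ok
```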

### CLI Examples

**Start Service**:
```bash
# Basic start
ml jupyter start --name my-notebook --workspace /path/to/workspace

# With password
ml jupyter start --name my-notebook --workspace /path/to/workspace --password mypass
```

**Stop Service**:
```bash
ml jupyter stop service-id-12345
```

**List Services**:
```bash
ml jupyter list
```

### Error Codes

Common error codes in binary responses:

- `0x00`: Unknown error
- `0x01`: Invalid request format
- `0x02`: Authentication failed
- `0x03`: Permission denied
- `0x10`: Server overloaded
- `0x14`: Timeout
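
On the client side, these codes can be mapped to names with an enumeration, as in the Python sketch below. The numeric values come from the list above. The member names, the `describe` helper, and the choice to treat `0x10` and `0x14` as retryable are assumptions made for illustration.

```python
from enum import IntEnum


class ErrorCode(IntEnum):
    UNKNOWN = 0x00
    INVALID_REQUEST = 0x01
    AUTHENTICATION_FAILED = 0x02
    PERMISSION_DENIED = 0x03
    SERVER_OVERLOADED = 0x10
    TIMEOUT = 0x14


# Assumption: only transient server-side conditions are worth retrying.
RETRYABLE = frozenset({ErrorCode.SERVER_OVERLOADED, ErrorCode.TIMEOUT})


def describe(code: int) -> str:
    """Return a readable label for an error code byte."""
    try:
        return ErrorCode(code).name.replace("_", " ").lower()
    except ValueError:
        return f"unrecognized error code 0x{code:02x}"


def is_retryable(code: int) -> bool:
    return code in RETRYABLE


print(describe(0x02))      # authentication failed
print(is_retryable(0x10))  # True
```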

### WebSocket vs HTTP

**Advantages of WebSocket**:
- ✅ Lower latency (persistent connection)
- ✅ Binary protocol (smaller payloads)
- ✅ Real-time updates possible
- ✅ Reduced server load
- ✅ Single connection for CLI

**When to use HTTP**:
- For programmatic API access
- For web-based integrations
- When WebSocket is unavailable

## See Also

- **[Testing Guide](testing.md)** - Testing Jupyter workflows
- **[Deployment Guide](deployment.md)** - Production deployment
- **[Security Guide](security.md)** - Security best practices
- **[API Reference](api-key-process.md)** - API documentation
- **[CLI Reference](cli-reference.md)** - Command-line tools