# Jupyter Workspace and Experiment Integration

This guide describes the integration between Jupyter workspaces and FetchML experiments, enabling seamless resource chaining and data synchronization.

## Overview

The Jupyter-experiment integration allows you to:

- Link Jupyter workspaces with specific experiments
- Automatically track experiment metadata in workspaces
- Queue experiments directly from Jupyter workspaces
- Synchronize data between workspaces and experiments
- Maintain resource sharing and context across development and production workflows

## Architecture

### Components

1. **Workspace Metadata Manager** - Tracks relationships between workspaces and experiments
2. **Service Manager Integration** - Links Jupyter services with experiment context
3. **CLI Commands** - Provides user-facing integration commands
4. **API Endpoints** - Enables programmatic workspace-experiment management

### Data Flow

```
Jupyter Workspace  ←→  Workspace Metadata  ←→  Experiment Manager
        ↓                     ↓                        ↓
    Notebooks            Link Metadata            Experiment Data
    Scripts              Sync History             Metrics & Results
    Requirements         Auto-sync Config         Job Queue
```

## Quick Start

### 1. Create a Jupyter workspace

```bash
# Create a new workspace
mkdir my_experiment_workspace
cd my_experiment_workspace

# Add notebooks and scripts
# (See examples/jupyter_experiment_integration.py for sample setup)
```

### 2. Start Jupyter service

```bash
# Start Jupyter with the workspace
ml jupyter start --workspace ./my_experiment_workspace --name my_experiment

# Access at http://localhost:8888
```

### 3. Link workspace with experiment

```bash
# Create the experiment
ml experiment create --name "my_experiment" --description "Test experiment"

# Link the workspace with the experiment
ml jupyter experiment link --workspace ./my_experiment_workspace --experiment <experiment_id>
```

### 4. Work in Jupyter

- Open notebooks in the browser
- Develop and test code interactively
- Use MLflow for experiment tracking
- Save results and models

### 5. Queue for production

```bash
# Queue an experiment from the workspace
ml jupyter experiment queue --workspace ./my_experiment_workspace --script experiment.py --name "production_run"

# Monitor progress
ml status
ml monitor
```

### 6. Sync data

```bash
# Push workspace changes to the experiment
ml jupyter experiment sync --workspace ./my_experiment_workspace --direction push

# Pull experiment results into the workspace
ml jupyter experiment sync --workspace ./my_experiment_workspace --direction pull
```

## CLI Commands

### `ml jupyter experiment link`

Link a Jupyter workspace with an experiment.

```bash
ml jupyter experiment link --workspace <workspace_path> --experiment <experiment_id>
```

**Options:**

- `--workspace`: Path to Jupyter workspace (default: `./workspace`)
- `--experiment`: Experiment ID to link with

**Creates:**

- `.jupyter_experiment.json` metadata file in the workspace
- Link record in the workspace metadata manager
- Association between the workspace and the experiment

### `ml jupyter experiment queue`

Queue an experiment from a linked workspace.

```bash
ml jupyter experiment queue --workspace <workspace_path> --script <script.py> --name <job_name>
```

**Options:**

- `--workspace`: Path to workspace (default: `./workspace`)
- `--script`: Python script to execute
- `--name`: Name for the queued job

**Behavior:**

- Detects the linked experiment automatically
- Passes experiment context to the job queue
- Uses workspace resources and configuration

### `ml jupyter experiment sync`

Synchronize data between a workspace and an experiment.

```bash
ml jupyter experiment sync --workspace <workspace_path> --direction <pull|push>
```

**Options:**

- `--workspace`: Path to workspace (default: `./workspace`)
- `--direction`: Sync direction (`pull` or `push`)

**Sync Types:**

- **Pull**: Download experiment metrics, results, and data to the workspace
- **Push**: Upload workspace notebooks, scripts, and results to the experiment

### `ml jupyter experiment status`

Show experiment status for a workspace.
```bash
ml jupyter experiment status [workspace_path]
```

**Displays:**

- Linked experiment information
- Last sync time
- Experiment metadata
- Service association

## API Endpoints

### `/api/jupyter/experiments/link`

**Method:** POST

Link a workspace with an experiment.

```json
{
  "workspace": "/path/to/workspace",
  "experiment_id": "experiment_123",
  "service_id": "jupyter-service-456"
}
```

**Response:**

```json
{
  "status": "linked",
  "data": {
    "workspace_path": "/path/to/workspace",
    "experiment_id": "experiment_123",
    "linked_at": "2023-12-06T10:30:00Z",
    "sync_direction": "bidirectional"
  }
}
```

### `/api/jupyter/experiments/sync`

**Method:** POST

Synchronize a workspace with an experiment.

```json
{
  "workspace": "/path/to/workspace",
  "experiment_id": "experiment_123",
  "direction": "push",
  "sync_type": "all"
}
```

**Response:**

```json
{
  "workspace": "/path/to/workspace",
  "experiment_id": "experiment_123",
  "direction": "push",
  "sync_type": "all",
  "synced_at": "2023-12-06T10:35:00Z",
  "status": "completed"
}
```

### `/api/jupyter/services`

**Methods:** GET, POST, DELETE

Manage Jupyter services.

- **GET:** List all services
- **POST:** Start a new service
- **DELETE:** Stop a service

## Workspace Metadata

### `.jupyter_experiment.json`

Each linked workspace contains a metadata file:

```json
{
  "experiment_id": "experiment_123",
  "service_id": "jupyter-service-456",
  "linked_at": 1701864600,
  "last_sync": 1701865200,
  "sync_direction": "bidirectional",
  "auto_sync": false,
  "jupyter_integration": true,
  "workspace_path": "/path/to/workspace",
  "tags": ["development", "ml-experiment"]
}
```

### Metadata Manager

The workspace metadata manager maintains:

- Workspace-experiment relationships
- Sync history and timestamps
- Auto-sync configuration
- Tags and additional metadata
- Service associations

## Best Practices

### Workspace Organization

1. **One workspace per experiment** - Keep workspaces focused on specific experiments
2. **Use descriptive names** - Name workspaces and services clearly
3. **Version control** - Track workspace changes with git
4. **Clean separation** - Separate data, code, and results

### Experiment Development Workflow

1. **Create workspace** with notebooks and scripts
2. **Link with experiment** for tracking
3. **Develop interactively** in Jupyter
4. **Test locally** with sample data
5. **Queue for production** when ready
6. **Monitor results** and iterate

### Data Management

1. **Use requirements.txt** for dependencies
2. **Store data separately** from notebooks
3. **Use MLflow** for experiment tracking
4. **Sync regularly** to preserve work
5. **Clean up** old workspaces

### Resource Management

1. **Monitor service usage** with `ml jupyter list`
2. **Stop unused services** with `ml jupyter stop`
3. **Use resource limits** in configuration
4. **Enable auto-sync** for automated workflows

## Troubleshooting

### Common Issues

**Workspace not linked:**

```bash
Error: No experiment link found in workspace
```

**Solution:** Run `ml jupyter experiment link` first.

**Service not found:**

```bash
Error: Service not found
```

**Solution:** Check the service name with `ml jupyter list`.

**Sync failed:**

```bash
Error: Failed to sync workspace
```

**Solution:** Check workspace permissions and verify that the experiment exists.

### Debug Commands

```bash
# Check workspace metadata
cat ./workspace/.jupyter_experiment.json

# List all services
ml jupyter list

# Check experiment status
ml jupyter experiment status

# View service logs
podman logs <container_name>
```

### Recovery

**Lost workspace link:**

1. Find the experiment ID with `ml experiment list`
2. Re-link with `ml jupyter experiment link`
3. Sync data with `ml jupyter experiment sync --direction pull`

**Service stuck:**

1. Stop with `ml jupyter stop <service_name>`
2. Check the logs for errors
3. Restart with `ml jupyter start`

## Examples

### Complete Workflow

```bash
# 1. Setup workspace
mkdir my_ml_project
cd my_ml_project
echo "numpy>=1.20.0" > requirements.txt
echo "mlflow>=1.20.0" >> requirements.txt
# 2. Start Jupyter
ml jupyter start --workspace . --name my_project

# 3. Create experiment
ml experiment create --name "my_project" --description "ML project experiment"

# 4. Link workspace
ml jupyter experiment link --workspace . --experiment <experiment_id>

# 5. Work in Jupyter (browser)
#    - Create notebooks
#    - Write experiment scripts
#    - Test locally

# 6. Queue for production
ml jupyter experiment queue --workspace . --script train_model.py --name "production_run"

# 7. Monitor
ml status
ml monitor

# 8. Sync results
ml jupyter experiment sync --workspace . --direction pull

# 9. Cleanup
ml jupyter stop my_project
```

### Python Integration

```python
import requests

# Link workspace
response = requests.post('http://localhost:9101/api/jupyter/experiments/link', json={
    'workspace': '/path/to/workspace',
    'experiment_id': 'experiment_123'
})

# Sync workspace
response = requests.post('http://localhost:9101/api/jupyter/experiments/sync', json={
    'workspace': '/path/to/workspace',
    'experiment_id': 'experiment_123',
    'direction': 'push',
    'sync_type': 'all'
})
```

## Configuration

### Service Configuration

Jupyter services can be configured with experiment-specific settings:

```yaml
service:
  default_resources:
    memory_limit: "8G"
    cpu_limit: "2"
    gpu_access: false
  max_services: 5
  auto_sync_interval: "30m"
```

### Workspace Settings

Workspace metadata supports custom configuration:

```json
{
  "auto_sync": true,
  "sync_interval": "15m",
  "sync_direction": "bidirectional",
  "tags": ["development", "production"],
  "additional_data": {
    "environment": "test",
    "team": "ml-team"
  }
}
```

## Migration Guide

### From Standalone Jupyter

1. **Create workspace** from existing notebooks
2. **Link with experiment** using the new commands
3. **Update scripts** to use experiment context
4. **Migrate data** to experiment storage
5. **Update workflows** to use the integration

### From Job Queue Only

1. **Create workspace** for development
2. **Link with existing experiments**
3. **Add interactive development** phase
4. **Implement sync workflows**
5. **Update CI/CD pipelines**

## Future Enhancements

Planned improvements:

- **Auto-sync with file watching**
- **Workspace templates**
- **Collaborative workspaces**
- **Advanced resource sharing**
- **Git integration**
- **Docker compose support**
- **Kubernetes integration**
- **Advanced monitoring**
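Of the planned improvements above, auto-sync with file watching can already be prototyped against the existing metadata: the `last_sync` Unix timestamp in `.jupyter_experiment.json` is enough to detect which workspace files changed since the last sync. The sketch below is a minimal, stdlib-only illustration; `changed_since_last_sync` is a hypothetical helper, not part of the current CLI or API.

```python
import json
import time
from pathlib import Path

def changed_since_last_sync(workspace: str) -> list:
    """Return workspace files modified after the last recorded sync.

    Reads the `last_sync` Unix timestamp from .jupyter_experiment.json
    (see the Workspace Metadata section) and compares it against each
    file's mtime. Hypothetical helper for a file-watching auto-sync loop.
    """
    ws = Path(workspace)
    meta = json.loads((ws / ".jupyter_experiment.json").read_text())
    last_sync = meta.get("last_sync", 0)
    return [
        p for p in sorted(ws.rglob("*"))
        if p.is_file()
        and p.name != ".jupyter_experiment.json"  # ignore the metadata file itself
        and p.stat().st_mtime > last_sync
    ]

if __name__ == "__main__":
    import tempfile
    # Demo: a workspace whose notebook changed after last_sync an hour ago.
    with tempfile.TemporaryDirectory() as d:
        Path(d, ".jupyter_experiment.json").write_text(
            json.dumps({"experiment_id": "experiment_123",
                        "last_sync": time.time() - 3600})
        )
        Path(d, "notebook.ipynb").write_text("{}")  # mtime is "now"
        print([p.name for p in changed_since_last_sync(d)])
```

A watcher built on this could poll periodically and, whenever the list is non-empty, trigger `ml jupyter experiment sync --direction push` or POST to `/api/jupyter/experiments/sync`.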