Add Jupyter workflow documentation to docs/

- Create comprehensive Jupyter Workflow Integration guide
- Include ML tools usage examples (MLflow, Streamlit, Dash, Panel, Bokeh)
- Add CLI integration instructions and troubleshooting
- Update docs index to reference new workflow guide
- Provide complete setup and usage instructions
Jeremie Fraeys 2025-12-06 16:02:49 -05:00
parent 7312451cfe
commit 9dd4261873
2 changed files with 137 additions and 0 deletions


@@ -45,6 +45,7 @@ make quick-start
- [**Architecture**](architecture.md) - System architecture and design
- [**CLI Reference**](cli-reference.md) - Command-line interface documentation
- [**Testing Guide**](testing.md) - Testing procedures and guidelines
- [**Jupyter Workflow**](jupyter-workflow.md) - CLI and Jupyter integration
- [**Queue System**](queue.md) - Job queue implementation
### 🏭 Production Deployment


@@ -0,0 +1,136 @@
# Jupyter Workflow Integration
## Overview
This guide shows how to use the FetchML CLI alongside Jupyter notebooks to run data science experiments with the ML tools pre-installed in the container.
## Quick Start
### 1. Build the ML Tools Container
```bash
cd podman
podman build -f ml-tools-runner.podfile -t ml-tools-runner .
```
### 2. Start Jupyter Server
```bash
# Using the launcher script
./jupyter_launcher.sh
# Or manually
podman run -d -p 8889:8888 --name ml-jupyter \
-v "$(pwd)/workspace:/workspace:Z" \
--user root --entrypoint bash localhost/ml-tools-runner \
-c "mkdir -p /home/mlrunner/.local/share/jupyter/runtime && \
chown -R mlrunner:mlrunner /home/mlrunner && \
su - mlrunner -c 'conda run -n ml_env jupyter notebook \
--no-browser --ip=0.0.0.0 --port=8888 \
--NotebookApp.token= --NotebookApp.password= --allow-root'"
```
### 3. Access Jupyter
Open http://localhost:8889 in your browser.
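The container can take a few seconds to start serving. As a minimal sketch (not part of the launcher script), the helper below polls the URL from the step above until something answers. The function name `wait_for_server` and its default timings are illustrative assumptions; it uses only the Python standard library.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str, timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll `url` until it returns any HTTP response or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status.
            return True
        except (urllib.error.URLError, OSError):
            # Nothing is listening yet, or the connection was refused or reset.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example (hypothetical usage): wait_for_server("http://localhost:8889")
```

A `False` result suggests checking the container logs before retrying.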
## Available ML Tools
The container includes these pre-installed tools:
- **MLflow 3.7.0** - Experiment tracking and model registry
- **Streamlit 1.52.1** - Interactive web apps
- **Dash 3.3.0** - Plotly-based dashboards
- **Panel 1.8.4** - Data apps and dashboards
- **Bokeh 3.8.1** - Interactive visualizations
- **WandB 0.23.1** - Experiment tracking (requires API key)
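To confirm which versions are actually present in a given image, a notebook cell along these lines may help. It is a sketch that assumes the tools are installed under their usual PyPI distribution names; `installed_versions` is a hypothetical helper, not part of FetchML.

```python
from importlib import metadata

# Assumed PyPI distribution names for the tools listed above.
TOOLS = ["mlflow", "streamlit", "dash", "panel", "bokeh", "wandb"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(TOOLS).items():
    print(f"{name:<10} {version or 'not installed'}")
```

Reading package metadata avoids importing each library, so the check stays fast even for heavy packages.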
## Using ML Tools in Jupyter
### MLflow Example
```python
import mlflow
# Start tracking
with mlflow.start_run() as run:
    mlflow.log_param("model", "random_forest")
    mlflow.log_metric("accuracy", 0.95)
    print(f"Run ID: {run.info.run_id}")
```
### Streamlit Example
```python
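# Save as app.py and launch with `streamlit run app.py`;
# Streamlit apps do not render inside a notebook cell.
import streamlit as st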
st.title("My ML App")
st.write("Interactive dashboard with real-time updates")
```
### Dash Example
```python
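# dcc and html ship inside the dash package since Dash 2
from dash import Dash, dcc, html
app = Dash(__name__)
# A layout must be set before the app can serve a page
app.layout = html.Div([html.H1("My ML Dashboard")])
# app.run replaces the older run_server
app.run(debug=True, host='0.0.0.0', port=8050)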
```
## CLI Integration
### Sync Projects
```bash
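# From another terminal, at the repository root
# (the subshell leaves the working directory unchanged for the next command)
(cd cli && ./zig-out/bin/ml sync ./my_project --queue)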
# Check status
./cli/zig-out/bin/ml status
```
### Monitor Jobs
```bash
# Monitor running experiments
./cli/zig-out/bin/ml monitor
# View experiment logs
./cli/zig-out/bin/ml experiment log my_experiment
```
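The same commands can be scripted from the host. The sketch below wraps the `sync --queue` and `status` invocations shown above with `subprocess`. It assumes it is run from the repository root, that the binary lives at `cli/zig-out/bin/ml`, and that the CLI exits non-zero on failure; the helper names `ml_command`, `run_ml`, and `sync_and_report` are hypothetical.

```python
import subprocess
from pathlib import Path

# Path to the CLI binary relative to the repository root (assumed layout).
ML_BIN = Path("cli/zig-out/bin/ml")


def ml_command(*args: str) -> list[str]:
    """Build the argument list for one `ml` invocation."""
    return [str(ML_BIN), *args]


def run_ml(*args: str) -> subprocess.CompletedProcess:
    """Run the CLI and capture its output; the caller inspects the exit code."""
    return subprocess.run(ml_command(*args), capture_output=True, text=True, check=False)


def sync_and_report(project_dir: str) -> str:
    """Queue a project with `sync --queue`, then return the output of `status`."""
    sync = run_ml("sync", project_dir, "--queue")
    if sync.returncode != 0:
        # Assumes the CLI signals failure through a non-zero exit code.
        raise RuntimeError(f"sync failed: {sync.stderr.strip()}")
    return run_ml("status").stdout
```

Passing arguments as a list rather than a shell string avoids quoting problems with project paths that contain spaces.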
## Workflow Example
1. **Start Jupyter**: Run the launcher script
2. **Create Notebook**: Use the sample templates in `workspace/notebooks/`
3. **Run Experiments**: Use ML tools for tracking and visualization
4. **Sync with CLI**: Use CLI commands to manage experiments
5. **Monitor Progress**: Track jobs from terminal while working in Jupyter
## Security Features
- Container isolation with Podman
- Network access limited to localhost
- Pre-approved ML tools only
- Non-root user execution
- Resource limits enforced
## Troubleshooting
### Jupyter Won't Start
```bash
# Check container logs
podman logs ml-jupyter
# Remove the old container (forcing a stop if it is still running) and relaunch
podman rm -f ml-jupyter && ./jupyter_launcher.sh
```
### ML Tools Not Available
```bash
# Test tools in container
podman exec ml-jupyter conda run -n ml_env python -c "import mlflow; print(mlflow.__version__)"
```
### CLI Connection Issues
```bash
# Check CLI status
./cli/zig-out/bin/ml status
# Test sync without server
./cli/zig-out/bin/ml sync ./podman/workspace --test
```