From f35762468559e8c38f4da00cf2bc71908e47b578 Mon Sep 17 00:00:00 2001 From: Jeremie Fraeys Date: Wed, 18 Feb 2026 21:28:25 -0500 Subject: [PATCH] docs: Update CHANGELOG and add feature documentation Update documentation for new features: - Add CHANGELOG entries for research features and privacy enhancements - Update README with new CLI commands and security features - Add privacy-security.md documentation for PII detection - Add research-features.md for narrative and outcome tracking --- CHANGELOG.md | 31 ++++ README.md | 11 ++ docs/src/privacy-security.md | 320 ++++++++++++++++++++++++++++++++++ docs/src/research-features.md | 320 ++++++++++++++++++++++++++++++++++ 4 files changed, 682 insertions(+) create mode 100644 docs/src/privacy-security.md create mode 100644 docs/src/research-features.md diff --git a/CHANGELOG.md b/CHANGELOG.md index 7da9c4e..34c6e0f 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,36 @@ ## [Unreleased] +### Added - CSV Export Features (2026-02-18) +- CLI: `ml compare --csv` - Export run comparisons as CSV with actual run IDs as column headers +- CLI: `ml find --csv` - Export search results as CSV for spreadsheet analysis +- CLI: `ml dataset verify --csv` - Export dataset verification metrics as CSV +- Shell: Updated bash/zsh completions with --csv flags for compare, find commands + +### Added - Phase 3 Features (2026-02-18) +- CLI: `ml requeue --with-changes` - Iterative experimentation with config overrides (--lr=0.002, etc.) 
+- CLI: `ml requeue --inherit-narrative` - Copy hypothesis/context from parent run +- CLI: `ml requeue --inherit-config` - Copy metadata from parent run +- CLI: `ml requeue --parent` - Link as child run for provenance tracking +- CLI: `ml dataset verify` - Fast dataset checksum validation +- CLI: `ml logs --follow` - Real-time log streaming via WebSocket +- API/WebSocket: Add opcodes for compare (0x30), find (0x31), export (0x32), set outcome (0x33) + +### Added - Phase 2 Features (2026-02-18) +- CLI: `ml compare` - Diff two runs showing narrative/metadata/metrics differences +- CLI: `ml find` - Search experiments by tags, outcome, dataset, experiment-group, author +- CLI: `ml export --anonymize` - Export bundles with path/IP/username redaction +- CLI: `ml export --anonymize-level` - 'metadata-only' or 'full' anonymization +- CLI: `ml outcome set` - Post-run outcome tracking (validates/refutes/inconclusive/partial) +- CLI: Error suggestions with Levenshtein distance for typos +- Shell: Updated bash/zsh completions for all new commands +- Tests: E2E tests for compare, find, export, requeue changes + +### Added - Phase 0 Features (2026-02-18) +- CLI: Queue-time narrative flags (--hypothesis, --context, --intent, --expected-outcome, --experiment-group, --tags) +- CLI: Enhanced `ml status` output with queue position [pos N] and priority (P:N) +- CLI: `ml narrative set` command for setting run narrative fields +- Shell: Updated completions with new commands and flags + ### Security - Native: fix buffer overflow vulnerabilities in `dataset_hash` (replaced `strcpy` with `strncpy` + null termination) - Native: fix unsafe `memcpy` in `queue_index` priority queue (added explicit null terminators for string fields) diff --git a/README.md b/README.md index 6e2f3e7..4d490f4 100644 --- a/README.md +++ b/README.md @@ -100,6 +100,15 @@ ml queue my-job ml cancel my-job ml dataset list ml monitor # SSH to run TUI remotely + +# Research features (see docs/src/research-features.md) 
+ml queue train.py --hypothesis "LR scaling..." --tags ablation +ml outcome set run_abc --outcome validates --summary "Accuracy +2%" +ml find --outcome validates --tag lr-test +ml compare run_abc run_def +ml privacy set run_abc --level team +ml export run_abc --anonymize +ml dataset verify /path/to/data ``` ## Phase 1 (V1) notes @@ -150,6 +159,8 @@ See `docs/` for detailed guides: - `docs/src/zig-cli.md` – CLI reference - `docs/src/quick-start.md` – Full setup guide - `docs/src/deployment.md` – Production deployment +- `docs/src/research-features.md` – Research workflow features (narrative capture, outcomes, search) +- `docs/src/privacy-security.md` – Privacy levels, PII detection, anonymized export ## Source code diff --git a/docs/src/privacy-security.md b/docs/src/privacy-security.md new file mode 100644 index 0000000..d08a0d9 --- /dev/null +++ b/docs/src/privacy-security.md @@ -0,0 +1,320 @@ +# Privacy & Security + +FetchML includes privacy-conscious features for research environments handling sensitive data. + +--- + +## Privacy Levels + +Control experiment visibility with four privacy levels. 
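Conceptually, each level reduces to a visibility check against the viewer's identity and team membership. The sketch below is illustrative only: the function, its signature, and the decision logic are assumptions based on this page's manifest example, not FetchML's actual code.

```python
# Hypothetical visibility check for the four privacy levels.
# Manifest field names follow the "Privacy in Manifest" example on
# this page; the logic itself is an assumption, not FetchML's code.
def is_visible(manifest: dict, viewer: str, viewer_teams: set) -> bool:
    privacy = manifest.get("privacy", {})
    level = privacy.get("level", "private")  # private is the default
    if level in ("public", "anonymized"):
        return True  # visible to all authenticated users
    if level == "team":
        # team members plus the owner can see team-level runs
        return privacy.get("team") in viewer_teams or privacy.get("owner") == viewer
    return privacy.get("owner") == viewer  # private: owner only
```

Note that `anonymized` widens *who* can see a run but not *what* they see; PII stripping is handled separately at export time.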
+ +### Available Levels + +| Level | Visibility | Use Case | +|-------|-----------|----------| +| `private` | Owner only (default) | Sensitive/unpublished research | +| `team` | Same team members | Collaborative team projects | +| `public` | All authenticated users | Open research, shared datasets | +| `anonymized` | All users with PII stripped | Public release, papers | + +### Setting Privacy + +```bash +# Make experiment private (default) +ml privacy set run_abc --level private + +# Share with team +ml privacy set run_abc --level team --team vision-research + +# Make public within organization +ml privacy set run_abc --level public + +# Prepare for anonymized export +ml privacy set run_abc --level anonymized +``` + +### Privacy in Manifest + +Privacy settings are stored in the experiment manifest: + +```json +{ + "privacy": { + "level": "team", + "team": "vision-research", + "owner": "researcher@lab.edu" + } +} +``` + +--- + +## PII Detection + +Automatically detect potentially identifying information in experiment metadata. + +### What Gets Detected + +- **Email addresses** - `user@example.com` +- **IP addresses** - `192.168.1.1`, `10.0.0.5` +- **Phone numbers** - Basic pattern matching +- **SSN patterns** - `123-45-6789` + +### Using Privacy Scan + +When adding annotations with sensitive context: + +```bash +# Scan for PII before storing +ml annotate run_abc \ + --note "Contact at user@example.com for questions" \ + --privacy-scan + +# Output: +# Warning: Potential PII detected: +# - email: 'user@example.com' +# Use --force to store anyway, or edit your note. +``` + +### Override Warnings + +If PII is intentional and acceptable: + +```bash +ml annotate run_abc \ + --note "Contact at user@example.com" \ + --privacy-scan \ + --force +``` + +### Redacting PII + +For anonymized exports, PII is automatically redacted: + +```bash +ml export run_abc --anonymize +``` + +Redacted content becomes: `[EMAIL-1]`, `[IP-1]`, etc. 
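The scan-and-redact flow can be pictured as simple pattern matching with numbered placeholders. The patterns and placeholder format below follow the examples on this page, but the code itself is an illustrative sketch, not FetchML's implementation:

```python
import re

# Simplified versions of the PII classes listed above (email, IP
# address, SSN); illustrative only, not FetchML's actual rules.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str):
    """Replace each distinct PII match with a numbered placeholder
    like [EMAIL-1], and report what was found."""
    findings = []
    for kind, pattern in PII_PATTERNS.items():
        seen = {}  # distinct match -> placeholder, for consistency
        def sub(m, kind=kind, seen=seen):
            if m.group(0) not in seen:
                seen[m.group(0)] = f"[{kind}-{len(seen) + 1}]"
                findings.append(f"{kind.lower()}: '{m.group(0)}'")
            return seen[m.group(0)]
        text = pattern.sub(sub, text)
    return text, findings
```

A `--privacy-scan` without `--force` would report the findings and refuse to store; an anonymized export would keep the redacted text.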
+ +--- + +## Anonymized Export + +Export experiments for external sharing without leaking sensitive information. + +### Basic Anonymization + +```bash +ml export run_abc --bundle run_abc.tar.gz --anonymize +``` + +### Anonymization Levels + +**Metadata-only** (default): +- Strips internal paths: `/nas/private/data` → `/datasets/data` +- Replaces internal IPs: `10.0.0.5` → `[INTERNAL-1]` +- Hashes email addresses: `user@lab.edu` → `[RESEARCHER-A]` +- Keeps experiment structure and metrics + +**Full**: +- Everything in metadata-only, plus: +- Removes logs entirely +- Removes annotations +- Redacts all PII from notes + +```bash +# Full anonymization +ml export run_abc --anonymize --anonymize-level full +``` + +### What Gets Anonymized + +| Original | Anonymized | Notes | +|----------|------------|-------| +| `/home/user/data` | `/workspace/data` | Paths generalized | +| `/nas/private/lab` | `/datasets/lab` | Internal mounts hidden | +| `user@lab.edu` | `[RESEARCHER-A]` | Consistent per user | +| `10.0.0.5` | `[INTERNAL-1]` | IP ranges replaced | +| `john@example.com` | `[EMAIL-1]` | PII redacted | + +### Export Verification + +Review what's in the export: + +```bash +# Export and list contents +ml export run_abc --anonymize -o /tmp/run_abc.tar.gz +tar tzf /tmp/run_abc.tar.gz | head -20 +``` + +--- + +## Dataset Identity & Checksums + +Verify dataset integrity with SHA256 checksums. + +### Computing Checksums + +Datasets are automatically checksummed when registered: + +```bash +ml dataset register /path/to/dataset --name my-dataset +# Computes SHA256 of all files in dataset +``` + +### Verifying Datasets + +```bash +# Verify dataset integrity +ml dataset verify /path/to/my-dataset + +# Output: +# ✓ Dataset checksum verified +# Expected: sha256:abc123... +# Actual: sha256:abc123... 
+``` + +### Checksum in Manifest + +```json +{ + "datasets": [{ + "name": "imagenet-train", + "checksum": "sha256:def456...", + "sample_count": 1281167 + }] +} +``` + +--- + +## Security Best Practices + +### 1. Default to Private + +Keep experiments private until ready to share: + +```bash +# Private by default +ml queue train.py --hypothesis "..." + +# Later, when ready to share +ml privacy set run_abc --level team --team my-team +``` + +### 2. Scan Before Sharing + +Always use `--privacy-scan` when adding notes that might contain PII: + +```bash +ml annotate run_abc --note "..." --privacy-scan +``` + +### 3. Anonymize for External Release + +Before exporting for papers or public release: + +```bash +ml export run_abc --anonymize --anonymize-level full +``` + +### 4. Verify Dataset Integrity + +Regularly verify datasets, especially shared ones: + +```bash +ml dataset verify /path/to/shared/dataset +``` + +### 5. Use Team Privacy for Collaboration + +Share with specific teams rather than making public: + +```bash +ml privacy set run_abc --level team --team ml-group +``` + +--- + +## Compliance Considerations + +### GDPR / Research Ethics + +| Requirement | FetchML Support | Status | +|-------------|-----------------|--------| +| Right to access | `ml export` creates data bundles | ✅ | +| Right to erasure | Delete command (future) | ⏳ | +| Data minimization | Narrative fields collect only necessary data | ✅ | +| PII detection | `ml annotate --privacy-scan` | ✅ | +| Anonymization | `ml export --anonymize` | ✅ | + +### Handling Sensitive Data + +For experiments with sensitive data: + +1. **Keep private**: Use `--level private` +2. **PII scan all annotations**: Always use `--privacy-scan` +3. **Anonymize before export**: Use `--anonymize-level full` +4. 
**Verify team membership**: Before sharing at `--level team` + +--- + +## Configuration + +### Worker Privacy Settings + +Configure privacy defaults in worker config: + +```yaml +privacy: + default_level: private + enforce_teams: true + audit_access: true +``` + +### API Server Privacy + +Enable privacy enforcement: + +```yaml +security: + privacy: + enabled: true + default_level: private + audit_access: true +``` + +--- + +## Troubleshooting + +### PII Scan False Positives + +Some valid text may trigger PII warnings: + +```bash +# Example: "batch@32" looks like email +ml annotate run_abc --note "Use batch@32 for training" --privacy-scan +# Warning triggers, use --force if intended +``` + +### Privacy Changes Not Applied + +- Verify you own the experiment +- Check server supports privacy enforcement +- Try with explicit base path: `--base /path/to/experiments` + +### Export Not Anonymized + +- Ensure `--anonymize` flag is set +- Check `--anonymize-level` is correct (metadata-only vs full) +- Verify manifest contains privacy data + +--- + +## See Also + +- `docs/src/research-features.md` - Research workflow features +- `docs/src/deployment.md` - Production deployment with privacy +- `docs/src/quick-start.md` - Getting started guide diff --git a/docs/src/research-features.md b/docs/src/research-features.md new file mode 100644 index 0000000..7405b5e --- /dev/null +++ b/docs/src/research-features.md @@ -0,0 +1,320 @@ +# Research Features + +FetchML includes research-focused features for experiment tracking, knowledge capture, and collaboration. + +--- + +## Queue-Time Narrative Capture + +Document your hypothesis and intent when queuing experiments. This creates provenance for your research and helps others understand your work. + +### Basic Usage + +```bash +ml queue train.py \ + --hypothesis "Linear LR scaling improves convergence" \ + --context "Following up on Smith et al. 
2023" \ + --intent "Test batch=64 with 2x LR" \ + --expected-outcome "Same accuracy, 50% less time" \ + --experiment-group "lr-scaling-study" \ + --tags "ablation,learning-rate,batch-size" +``` + +### Available Flags + +| Flag | Description | Example | +|------|-------------|---------| +| `--hypothesis` | What you expect to happen | "LR scaling improves convergence" | +| `--context` | Background and motivation | "Following paper XYZ" | +| `--intent` | What you're testing | "Test batch=64 with 2x LR" | +| `--expected-outcome` | Predicted results | "Same accuracy, 50% less time" | +| `--experiment-group` | Group related experiments | "batch-scaling" | +| `--tags` | Comma-separated tags | "ablation,lr-test" | + +### Viewing Narrative + +After queuing, view the narrative with: + +```bash +ml info run_abc +``` + +--- + +## Post-Run Outcome Capture + +Record findings after experiments complete. This preserves institutional knowledge and helps track what worked. + +### Setting Outcomes + +```bash +ml outcome set run_abc \ + --outcome validates \ + --summary "Accuracy improved 2.3% with 2x learning rate" \ + --learning "Linear scaling works for batch sizes 32-128" \ + --learning "GPU utilization increased from 60% to 95%" \ + --next-step "Try batch=96 with gradient accumulation" \ + --validation-status "cross-validated" +``` + +### Outcome Types + +- **validates** - Hypothesis confirmed +- **refutes** - Hypothesis rejected +- **inconclusive** - Results unclear +- **partial** - Partial confirmation + +### Repeatable Fields + +Use multiple flags to capture multiple learnings or next steps: + +```bash +ml outcome set run_abc \ + --learning "Finding 1" \ + --learning "Finding 2" \ + --learning "Finding 3" \ + --next-step "Follow-up experiment A" \ + --next-step "Follow-up experiment B" +``` + +--- + +## Experiment Search & Discovery + +Find past experiments with powerful filters and export results. 
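Conceptually, `ml find` filters stored experiment manifests against the given criteria. The sketch below covers a subset of the flags; the manifest field names are assumptions for illustration, not FetchML's actual query code:

```python
def find_runs(manifests, tag=None, outcome=None, group=None):
    """Illustrative filter matching --tag/--outcome/--experiment-group.
    A run must satisfy every criterion that was supplied."""
    results = []
    for m in manifests:
        if tag is not None and tag not in m.get("tags", []):
            continue
        if outcome is not None and m.get("outcome") != outcome:
            continue
        if group is not None and m.get("experiment_group") != group:
            continue
        results.append(m)
    return results
```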
+ +### Basic Search + +```bash +# Search by tags +ml find --tag ablation + +# Search by outcome +ml find --outcome validates + +# Search by experiment group +ml find --experiment-group lr-scaling +``` + +### Combined Filters + +```bash +# Multiple criteria +ml find \ + --tag ablation \ + --outcome validates \ + --dataset imagenet \ + --after 2024-01-01 \ + --before 2024-03-01 + +# With limit +ml find --tag production --limit 50 +``` + +### Export Results + +```bash +# JSON output for programmatic use +ml find --outcome validates --json > results.json + +# CSV for analysis in spreadsheet tools +ml find --experiment-group lr-study --csv > study_results.csv +``` + +### CSV Output Format + +The CSV includes columns: +- `id` - Run identifier +- `job_name` - Job name +- `outcome` - validates/refutes/inconclusive/partial +- `status` - running/finished/failed +- `experiment_group` - Group name +- `tags` - Comma-separated tags +- `hypothesis` - Narrative hypothesis + +--- + +## Experiment Comparison + +Compare two or more experiments side-by-side. + +### Basic Comparison + +```bash +ml compare run_abc run_def +``` + +### Output Formats + +```bash +# Human-readable (default) +ml compare run_abc run_def + +# JSON for programmatic analysis +ml compare run_abc run_def --json + +# CSV for spreadsheet analysis +ml compare run_abc run_def --csv + +# Show all fields (including unchanged) +ml compare run_abc run_def --all +``` + +### What Gets Compared + +- **Narrative fields** - Hypothesis, context, intent differences +- **Metadata** - Batch size, learning rate, epochs, model, dataset +- **Metrics** - Accuracy, loss, training time with deltas +- **Outcomes** - validates vs refutes, etc. + +--- + +## Dataset Verification + +Verify dataset integrity with SHA256 checksums. 
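A dataset-level digest can be built by hashing every file in a deterministic order and folding the per-file paths and contents into a single SHA256. How FetchML actually combines files is not specified here, so treat this as a minimal sketch of the idea:

```python
import hashlib
from pathlib import Path

def dataset_checksum(root: str) -> str:
    """Fold every file under root (sorted for determinism) into one
    SHA256 digest; includes relative paths so renames are detected.
    Illustrative sketch, not FetchML's exact scheme."""
    h = hashlib.sha256()
    base = Path(root)
    for path in sorted(p for p in base.rglob("*") if p.is_file()):
        h.update(str(path.relative_to(base)).encode())
        h.update(path.read_bytes())
    return "sha256:" + h.hexdigest()
```

Because the digest covers both paths and bytes, verification flags added, removed, renamed, or modified files alike.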
+ +### Basic Verification + +```bash +ml dataset verify /path/to/dataset +``` + +### Registration with Checksums + +When registering datasets, checksums are automatically computed: + +```bash +ml dataset register /path/to/imagenet --name imagenet-train +``` + +### View Dataset Info + +```bash +ml dataset info imagenet-train +``` + +This shows: +- Dataset name and path +- SHA256 checksums +- Sample count +- Privacy level (if set) + +--- + +## Best Practices + +### 1. Always Document Hypothesis + +```bash +ml queue train.py \ + --hypothesis "Data augmentation reduces overfitting" \ + --experiment-group "regularization-study" +``` + +### 2. Capture Outcomes Promptly + +Record findings while they're fresh: + +```bash +ml outcome set run_abc \ + --outcome validates \ + --summary "Augmentation improved validation accuracy by 3%" \ + --learning "Rotation=15 worked best" +``` + +### 3. Use Consistent Tags + +Establish tag conventions for your team: +- `ablation` - Ablation studies +- `baseline` - Baseline experiments +- `production` - Production-ready models +- `exploratory` - Initial exploration + +### 4. Export for Analysis + +Regularly export experiment data for analysis: + +```bash +ml find --experiment-group my-study --csv > my_study.csv +``` + +### 5. 
Compare Systematically + +Use comparison to understand what changed: + +```bash +# Compare to baseline +ml compare baseline_run current_run + +# Compare successful runs +ml compare run_abc run_def run_ghi +``` + +--- + +## Integration with Research Workflow + +### Iterative Experiments + +```bash +# Run baseline +ml queue train.py --hypothesis "Baseline is strong" --tags baseline + +# Run variant A +ml queue train_v2.py \ + --hypothesis "V2 improves on baseline" \ + --tags ablation,v2 \ + --experiment-group v2-study + +# Compare results +ml compare baseline_run v2_run + +# Document outcome +ml outcome set v2_run --outcome validates --summary "V2 improved 2.3%" +``` + +### Ablation Studies + +```bash +# Full model +ml queue train.py --hypothesis "Full model works best" --tags ablation,full + +# Without component A +ml queue train_no_a.py --hypothesis "Component A is critical" --tags ablation,no-a + +# Without component B +ml queue train_no_b.py --hypothesis "Component B adds value" --tags ablation,no-b + +# Search all ablations +ml find --tag ablation --csv > ablations.csv +``` + +--- + +## Troubleshooting + +### Search Returns No Results + +- Check your filters aren't too restrictive +- Try broadening date ranges +- Verify tags are spelled correctly + +### Outcome Not Saved + +- Ensure run ID exists: `ml info run_abc` +- Check you have permission to modify the run +- Try with explicit base path: `ml outcome set run_abc ... --base /path/to/experiments` + +### Comparison Shows No Differences + +- Use `--all` flag to show unchanged fields +- Verify you're comparing different runs +- Check that runs have narrative data + +--- + +## See Also + +- `docs/src/privacy-security.md` - Privacy levels and PII detection +- `docs/src/quick-start.md` - Full setup guide +- `docs/src/zig-cli.md` - Complete CLI reference