# Research Features

FetchML includes research-focused features for experiment tracking, knowledge capture, and collaboration.


## Queue-Time Narrative Capture

Document your hypothesis and intent when queuing experiments. This creates provenance for your research and helps others understand your work.

### Basic Usage

```bash
ml queue train.py \
  --hypothesis "Linear LR scaling improves convergence" \
  --context "Following up on Smith et al. 2023" \
  --intent "Test batch=64 with 2x LR" \
  --expected-outcome "Same accuracy, 50% less time" \
  --experiment-group "lr-scaling-study" \
  --tags "ablation,learning-rate,batch-size"
```

### Available Flags

| Flag | Description | Example |
|------|-------------|---------|
| `--hypothesis` | What you expect to happen | "LR scaling improves convergence" |
| `--context` | Background and motivation | "Following paper XYZ" |
| `--intent` | What you're testing | "Test batch=64 with 2x LR" |
| `--expected-outcome` | Predicted results | "Same accuracy, 50% less time" |
| `--experiment-group` | Group related experiments | "batch-scaling" |
| `--tags` | Comma-separated tags | "ablation,lr-test" |

### Viewing the Narrative

After queuing, view the narrative with:

```bash
ml info run_abc
```

## Post-Run Outcome Capture

Record findings after experiments complete. This preserves institutional knowledge and helps track what worked.

### Setting Outcomes

```bash
ml outcome set run_abc \
  --outcome validates \
  --summary "Accuracy improved 2.3% with 2x learning rate" \
  --learning "Linear scaling works for batch sizes 32-128" \
  --learning "GPU utilization increased from 60% to 95%" \
  --next-step "Try batch=96 with gradient accumulation" \
  --validation-status "cross-validated"
```

### Outcome Types

- `validates` - Hypothesis confirmed
- `refutes` - Hypothesis rejected
- `inconclusive` - Results unclear
- `partial` - Partial confirmation

### Repeatable Fields

Pass `--learning` and `--next-step` multiple times to capture several learnings or follow-ups:

```bash
ml outcome set run_abc \
  --learning "Finding 1" \
  --learning "Finding 2" \
  --learning "Finding 3" \
  --next-step "Follow-up experiment A" \
  --next-step "Follow-up experiment B"
```

## Experiment Search & Discovery

Find past experiments with powerful filters and export the results.

```bash
# Search by tags
ml find --tag ablation

# Search by outcome
ml find --outcome validates

# Search by experiment group
ml find --experiment-group lr-scaling
```

### Combined Filters

```bash
# Multiple criteria
ml find \
  --tag ablation \
  --outcome validates \
  --dataset imagenet \
  --after 2024-01-01 \
  --before 2024-03-01

# With limit
ml find --tag production --limit 50
```

### Export Results

```bash
# JSON output for programmatic use
ml find --outcome validates --json > results.json

# CSV for analysis in spreadsheet tools
ml find --experiment-group lr-study --csv > study_results.csv
```

### CSV Output Format

The CSV includes the following columns:

- `id` - Run identifier
- `job_name` - Job name
- `outcome` - `validates`/`refutes`/`inconclusive`/`partial`
- `status` - `running`/`finished`/`failed`
- `experiment_group` - Group name
- `tags` - Comma-separated tags
- `hypothesis` - Narrative hypothesis
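For downstream analysis, the exported CSV can be loaded with Python's standard `csv` module. A minimal sketch, using hypothetical sample rows that mirror the documented columns (a real file would come from the `--csv` export):

```python
import csv
import io

# Hypothetical sample mirroring the documented CSV columns; in practice
# you would open the file written by `ml find --csv`.
sample = """id,job_name,outcome,status,experiment_group,tags,hypothesis
run_abc,train,validates,finished,lr-scaling-study,"ablation,learning-rate",Linear LR scaling improves convergence
run_def,train,refutes,finished,lr-scaling-study,"ablation,learning-rate",Doubling batch size is free
"""

rows = list(csv.DictReader(io.StringIO(sample)))

# Count runs per outcome to summarize a study at a glance.
by_outcome = {}
for row in rows:
    by_outcome[row["outcome"]] = by_outcome.get(row["outcome"], 0) + 1

print(by_outcome)  # {'validates': 1, 'refutes': 1}
```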

## Experiment Comparison

Compare two or more experiments side by side.

### Basic Comparison

```bash
ml compare run_abc run_def
```

### Output Formats

```bash
# Human-readable (default)
ml compare run_abc run_def

# JSON for programmatic analysis
ml compare run_abc run_def --json

# CSV for spreadsheet analysis
ml compare run_abc run_def --csv

# Show all fields (including unchanged)
ml compare run_abc run_def --all
```

### What Gets Compared

- Narrative fields - hypothesis, context, and intent differences
- Metadata - batch size, learning rate, epochs, model, dataset
- Metrics - accuracy, loss, and training time, with deltas
- Outcomes - `validates` vs `refutes`, etc.
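The metric deltas amount to per-metric differences between the two runs. A minimal sketch of that computation in Python, using hypothetical metric dicts (not FetchML's actual comparison schema):

```python
# Hypothetical metric dicts for two runs; FetchML's real JSON layout may differ.
baseline = {"accuracy": 0.912, "loss": 0.31, "train_time_s": 5400}
variant = {"accuracy": 0.935, "loss": 0.27, "train_time_s": 2700}

# Delta for every metric present in both runs, variant minus baseline.
deltas = {k: round(variant[k] - baseline[k], 6)
          for k in baseline.keys() & variant.keys()}

for name in sorted(deltas):
    print(f"{name}: {deltas[name]:+g}")
```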

## Dataset Verification

Verify dataset integrity with SHA256 checksums.

### Basic Verification

```bash
ml dataset verify /path/to/dataset
```
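Verification comes down to recomputing SHA256 digests for the dataset's files and comparing them against the registered values. A minimal sketch of the digest computation in Python (the helper names here are illustrative, not part of FetchML):

```python
import hashlib
from pathlib import Path

def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA256 so large dataset shards fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def checksum_dataset(root: Path) -> dict[str, str]:
    """Map each file's relative path to its SHA256 hex digest."""
    return {str(p.relative_to(root)): sha256_file(p)
            for p in sorted(root.rglob("*")) if p.is_file()}
```

Comparing two such mappings (stored vs. recomputed) flags any file that was added, removed, or modified since registration.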

### Registration with Checksums

When registering datasets, checksums are automatically computed:

```bash
ml dataset register /path/to/imagenet --name imagenet-train
```

### View Dataset Info

```bash
ml dataset info imagenet-train
```

This shows:

- Dataset name and path
- SHA256 checksums
- Sample count
- Privacy level (if set)

## Best Practices

### 1. Always Document the Hypothesis

```bash
ml queue train.py \
  --hypothesis "Data augmentation reduces overfitting" \
  --experiment-group "regularization-study"
```

### 2. Capture Outcomes Promptly

Record findings while they're fresh:

```bash
ml outcome set run_abc \
  --outcome validates \
  --summary "Augmentation improved validation accuracy by 3%" \
  --learning "Rotation=15 worked best"
```

### 3. Use Consistent Tags

Establish tag conventions for your team:

- `ablation` - Ablation studies
- `baseline` - Baseline experiments
- `production` - Production-ready models
- `exploratory` - Initial exploration

### 4. Export for Analysis

Regularly export experiment data for analysis:

```bash
ml find --experiment-group my-study --csv > my_study.csv
```

### 5. Compare Systematically

Use comparison to understand what changed:

```bash
# Compare to baseline
ml compare baseline_run current_run

# Compare successful runs
ml compare run_abc run_def run_ghi
```

## Integration with Research Workflow

### Iterative Experiments

```bash
# Run baseline
ml queue train.py --hypothesis "Baseline is strong" --tags baseline

# Run variant A
ml queue train_v2.py \
  --hypothesis "V2 improves on baseline" \
  --tags ablation,v2 \
  --experiment-group v2-study

# Compare results
ml compare baseline_run v2_run

# Document outcome
ml outcome set v2_run --outcome validates --summary "V2 improved 2.3%"
```

### Ablation Studies

```bash
# Full model
ml queue train.py --hypothesis "Full model works best" --tags ablation,full

# Without component A
ml queue train_no_a.py --hypothesis "Component A is critical" --tags ablation,no-a

# Without component B
ml queue train_no_b.py --hypothesis "Component B adds value" --tags ablation,no-b

# Search all ablations
ml find --tag ablation --csv > ablations.csv
```

## Troubleshooting

### Search Returns No Results

- Check that your filters aren't too restrictive
- Try broadening date ranges
- Verify tags are spelled correctly

### Outcome Not Saved

- Ensure the run ID exists: `ml info run_abc`
- Check that you have permission to modify the run
- Try an explicit base path: `ml outcome set run_abc ... --base /path/to/experiments`

### Comparison Shows No Differences

- Use the `--all` flag to show unchanged fields
- Verify you're comparing different runs
- Check that the runs have narrative data

## See Also

- `docs/src/privacy-security.md` - Privacy levels and PII detection
- `docs/src/quick-start.md` - Full setup guide
- `docs/src/zig-cli.md` - Complete CLI reference