# Research Features

FetchML includes research-focused features for experiment tracking, knowledge capture, and collaboration.


## Queue-Time Narrative Capture

Document your hypothesis and intent when queuing experiments. This creates provenance for your research and helps others understand your work.

### Basic Usage

```bash
ml queue train.py \
  --hypothesis "Linear LR scaling improves convergence" \
  --context "Following up on Smith et al. 2023" \
  --intent "Test batch=64 with 2x LR" \
  --expected-outcome "Same accuracy, 50% less time" \
  --experiment-group "lr-scaling-study" \
  --tags "ablation,learning-rate,batch-size"
```

### Available Flags

| Flag | Description | Example |
|------|-------------|---------|
| `--hypothesis` | What you expect to happen | "LR scaling improves convergence" |
| `--context` | Background and motivation | "Following paper XYZ" |
| `--intent` | What you're testing | "Test batch=64 with 2x LR" |
| `--expected-outcome` | Predicted results | "Same accuracy, 50% less time" |
| `--experiment-group` | Group related experiments | "batch-scaling" |
| `--tags` | Comma-separated tags | "ablation,lr-test" |

### Viewing the Narrative

After queuing, view the narrative with:

```bash
ml info run_abc
```

## Post-Run Outcome Capture

Record findings after experiments complete. This preserves institutional knowledge and helps track what worked.

### Setting Outcomes

```bash
ml outcome set run_abc \
  --outcome validates \
  --summary "Accuracy improved 2.3% with 2x learning rate" \
  --learning "Linear scaling works for batch sizes 32-128" \
  --learning "GPU utilization increased from 60% to 95%" \
  --next-step "Try batch=96 with gradient accumulation" \
  --validation-status "cross-validated"
```

### Outcome Types

- `validates` - Hypothesis confirmed
- `refutes` - Hypothesis rejected
- `inconclusive` - Results unclear
- `partial` - Partial confirmation

### Repeatable Fields

Pass `--learning` and `--next-step` multiple times to capture several learnings or follow-ups:

```bash
ml outcome set run_abc \
  --learning "Finding 1" \
  --learning "Finding 2" \
  --learning "Finding 3" \
  --next-step "Follow-up experiment A" \
  --next-step "Follow-up experiment B"
```

## Experiment Search & Discovery

Find past experiments with powerful filters and export the results.

```bash
# Search by tags
ml find --tag ablation

# Search by outcome
ml find --outcome validates

# Search by experiment group
ml find --experiment-group lr-scaling
```

### Combined Filters

```bash
# Multiple criteria
ml find \
  --tag ablation \
  --outcome validates \
  --dataset imagenet \
  --after 2024-01-01 \
  --before 2024-03-01

# With limit
ml find --tag production --limit 50
```

### Export Results

```bash
# JSON output for programmatic use
ml find --outcome validates --json > results.json

# CSV for analysis in spreadsheet tools
ml find --experiment-group lr-study --csv > study_results.csv
```

### CSV Output Format

The CSV includes the following columns:

- `id` - Run identifier
- `job_name` - Job name
- `outcome` - `validates`/`refutes`/`inconclusive`/`partial`
- `status` - `running`/`finished`/`failed`
- `experiment_group` - Group name
- `tags` - Comma-separated tags
- `hypothesis` - Narrative hypothesis
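For downstream analysis, the exported CSV can be loaded with Python's standard `csv` module. A minimal sketch, using hypothetical sample rows that mirror the documented columns (a real file would come from the `--csv` export):

```python
import csv
import io

# Hypothetical sample mirroring the documented CSV columns; in practice
# you would open the file written by `ml find --csv`.
sample = """id,job_name,outcome,status,experiment_group,tags,hypothesis
run_abc,train,validates,finished,lr-scaling-study,"ablation,learning-rate",Linear LR scaling improves convergence
run_def,train,refutes,finished,lr-scaling-study,"ablation,learning-rate",Doubling batch size is free
"""

rows = list(csv.DictReader(io.StringIO(sample)))

# Count runs per outcome to summarize a study at a glance.
by_outcome = {}
for row in rows:
    by_outcome[row["outcome"]] = by_outcome.get(row["outcome"], 0) + 1

print(by_outcome)  # {'validates': 1, 'refutes': 1}
```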

## Experiment Comparison

Compare two or more experiments side by side.

### Basic Comparison

```bash
ml compare run_abc run_def
```

### Output Formats

```bash
# Human-readable (default)
ml compare run_abc run_def

# JSON for programmatic analysis
ml compare run_abc run_def --json

# CSV for spreadsheet analysis
ml compare run_abc run_def --csv

# Show all fields (including unchanged)
ml compare run_abc run_def --all
```

### What Gets Compared

- Narrative fields - hypothesis, context, and intent differences
- Metadata - batch size, learning rate, epochs, model, dataset
- Metrics - accuracy, loss, and training time, with deltas
- Outcomes - `validates` vs `refutes`, etc.
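The metric deltas amount to per-metric differences between the two runs. A minimal sketch of that computation in Python, using hypothetical metric dicts (not FetchML's actual comparison schema):

```python
# Hypothetical metric dicts for two runs; FetchML's real JSON layout may differ.
baseline = {"accuracy": 0.912, "loss": 0.31, "train_time_s": 5400}
variant = {"accuracy": 0.935, "loss": 0.27, "train_time_s": 2700}

# Delta for every metric present in both runs, variant minus baseline.
deltas = {k: round(variant[k] - baseline[k], 6)
          for k in baseline.keys() & variant.keys()}

for name in sorted(deltas):
    print(f"{name}: {deltas[name]:+g}")
```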

## Dataset Verification

Verify dataset integrity with SHA256 checksums.

### Basic Verification

```bash
ml dataset verify /path/to/dataset
```
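Verification comes down to recomputing SHA256 digests for the dataset's files and comparing them against the registered values. A minimal sketch of the digest computation in Python (the helper names here are illustrative, not part of FetchML):

```python
import hashlib
from pathlib import Path

def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA256 so large dataset shards fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def checksum_dataset(root: Path) -> dict[str, str]:
    """Map each file's relative path to its SHA256 hex digest."""
    return {str(p.relative_to(root)): sha256_file(p)
            for p in sorted(root.rglob("*")) if p.is_file()}
```

Comparing two such mappings (stored vs. recomputed) flags any file that was added, removed, or modified since registration.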

### Registration with Checksums

When registering datasets, checksums are automatically computed:

```bash
ml dataset register /path/to/imagenet --name imagenet-train
```

### View Dataset Info

```bash
ml dataset info imagenet-train
```

This shows:

- Dataset name and path
- SHA256 checksums
- Sample count
- Privacy level (if set)

## Best Practices

### 1. Always Document the Hypothesis

```bash
ml queue train.py \
  --hypothesis "Data augmentation reduces overfitting" \
  --experiment-group "regularization-study"
```

### 2. Capture Outcomes Promptly

Record findings while they're fresh:

```bash
ml outcome set run_abc \
  --outcome validates \
  --summary "Augmentation improved validation accuracy by 3%" \
  --learning "Rotation=15 worked best"
```

### 3. Use Consistent Tags

Establish tag conventions for your team:

- `ablation` - Ablation studies
- `baseline` - Baseline experiments
- `production` - Production-ready models
- `exploratory` - Initial exploration

### 4. Export for Analysis

Regularly export experiment data for analysis:

```bash
ml find --experiment-group my-study --csv > my_study.csv
```

### 5. Compare Systematically

Use comparison to understand what changed:

```bash
# Compare to baseline
ml compare baseline_run current_run

# Compare successful runs
ml compare run_abc run_def run_ghi
```

## Integration with Research Workflow

### Iterative Experiments

```bash
# Run baseline
ml queue train.py --hypothesis "Baseline is strong" --tags baseline

# Run variant A
ml queue train_v2.py \
  --hypothesis "V2 improves on baseline" \
  --tags ablation,v2 \
  --experiment-group v2-study

# Compare results
ml compare baseline_run v2_run

# Document outcome
ml outcome set v2_run --outcome validates --summary "V2 improved 2.3%"
```

### Ablation Studies

```bash
# Full model
ml queue train.py --hypothesis "Full model works best" --tags ablation,full

# Without component A
ml queue train_no_a.py --hypothesis "Component A is critical" --tags ablation,no-a

# Without component B
ml queue train_no_b.py --hypothesis "Component B adds value" --tags ablation,no-b

# Search all ablations
ml find --tag ablation --csv > ablations.csv
```

## Troubleshooting

### Search Returns No Results

- Check that your filters aren't too restrictive
- Try broadening date ranges
- Verify tags are spelled correctly

### Outcome Not Saved

- Ensure the run ID exists: `ml info run_abc`
- Check that you have permission to modify the run
- Try an explicit base path: `ml outcome set run_abc ... --base /path/to/experiments`

### Comparison Shows No Differences

- Use the `--all` flag to show unchanged fields
- Verify you're comparing different runs
- Check that the runs have narrative data

## See Also

- `docs/src/privacy-security.md` - Privacy levels and PII detection
- `docs/src/quick-start.md` - Full setup guide
- `docs/src/zig-cli.md` - Complete CLI reference