# Privacy & Security

FetchML includes privacy-conscious features for research environments handling sensitive data.

---

## Privacy Levels

Control experiment visibility with four privacy levels.

### Available Levels

| Level | Visibility | Use Case |
|-------|-----------|----------|
| `private` | Owner only (default) | Sensitive/unpublished research |
| `team` | Same team members | Collaborative team projects |
| `public` | All authenticated users | Open research, shared datasets |
| `anonymized` | All users with PII stripped | Public release, papers |

### Setting Privacy

```bash
# Make experiment private (default)
ml privacy set run_abc --level private

# Share with team
ml privacy set run_abc --level team --team vision-research

# Make public within organization
ml privacy set run_abc --level public

# Prepare for anonymized export
ml privacy set run_abc --level anonymized
```

### Privacy in Manifest

Privacy settings are stored in the experiment manifest:

```json
{
  "privacy": {
    "level": "team",
    "team": "vision-research",
    "owner": "researcher@lab.edu"
  }
}
```

---

## PII Detection

Automatically detect potentially identifying information in experiment metadata.

### What Gets Detected

- **Email addresses** - `user@example.com`
- **IP addresses** - `192.168.1.1`, `10.0.0.5`
- **Phone numbers** - Basic pattern matching
- **SSN patterns** - `123-45-6789`

### Using Privacy Scan

When adding annotations with sensitive context:

```bash
# Scan for PII before storing
ml annotate run_abc \
  --note "Contact at user@example.com for questions" \
  --privacy-scan

# Output:
# Warning: Potential PII detected:
#   - email: 'user@example.com'
# Use --force to store anyway, or edit your note.
```

### Override Warnings

If PII is intentional and acceptable:

```bash
ml annotate run_abc \
  --note "Contact at user@example.com" \
  --privacy-scan \
  --force
```

### Redacting PII

For anonymized exports, PII is automatically redacted:

```bash
ml export run_abc --anonymize
```

Redacted content becomes `[EMAIL-1]`, `[IP-1]`, etc.

---

## Anonymized Export

Export experiments for external sharing without leaking sensitive information.

### Basic Anonymization

```bash
ml export run_abc --bundle run_abc.tar.gz --anonymize
```

### Anonymization Levels

**Metadata-only** (default):

- Strips internal paths: `/nas/private/data` → `/datasets/data`
- Replaces internal IPs: `10.0.0.5` → `[INTERNAL-1]`
- Hashes email addresses: `user@lab.edu` → `[RESEARCHER-A]`
- Keeps experiment structure and metrics

**Full**:

- Everything in metadata-only, plus:
  - Removes logs entirely
  - Removes annotations
  - Redacts all PII from notes

```bash
# Full anonymization
ml export run_abc --anonymize --anonymize-level full
```

### What Gets Anonymized

| Original | Anonymized | Notes |
|----------|------------|-------|
| `/home/user/data` | `/workspace/data` | Paths generalized |
| `/nas/private/lab` | `/datasets/lab` | Internal mounts hidden |
| `user@lab.edu` | `[RESEARCHER-A]` | Consistent per user |
| `10.0.0.5` | `[INTERNAL-1]` | IP ranges replaced |
| `john@example.com` | `[EMAIL-1]` | PII redacted |

### Export Verification

Review what's in the export:

```bash
# Export and list contents
ml export run_abc --anonymize -o /tmp/run_abc.tar.gz
tar tzf /tmp/run_abc.tar.gz | head -20
```

---

## Dataset Identity & Checksums

Verify dataset integrity with SHA256 checksums.
### Computing Checksums

Datasets are automatically checksummed when registered:

```bash
ml dataset register /path/to/dataset --name my-dataset
# Computes SHA256 of all files in dataset
```

### Verifying Datasets

```bash
# Verify dataset integrity
ml dataset verify /path/to/my-dataset

# Output:
# ✓ Dataset checksum verified
#   Expected: sha256:abc123...
#   Actual:   sha256:abc123...
```

### Checksum in Manifest

```json
{
  "datasets": [{
    "name": "imagenet-train",
    "checksum": "sha256:def456...",
    "sample_count": 1281167
  }]
}
```

---

## Security Best Practices

### 1. Default to Private

Keep experiments private until ready to share:

```bash
# Private by default
ml queue train.py --hypothesis "..."

# Later, when ready to share
ml privacy set run_abc --level team --team my-team
```

### 2. Scan Before Sharing

Always use `--privacy-scan` when adding notes that might contain PII:

```bash
ml annotate run_abc --note "..." --privacy-scan
```

### 3. Anonymize for External Release

Before exporting for papers or public release:

```bash
ml export run_abc --anonymize --anonymize-level full
```

### 4. Verify Dataset Integrity

Regularly verify datasets, especially shared ones:

```bash
ml dataset verify /path/to/shared/dataset
```

### 5. Use Team Privacy for Collaboration

Share with specific teams rather than making public:

```bash
ml privacy set run_abc --level team --team ml-group
```

---

## Compliance Considerations

### GDPR / Research Ethics

| Requirement | FetchML Support | Status |
|-------------|-----------------|--------|
| Right to access | `ml export` creates data bundles | ✅ |
| Right to erasure | Delete command (future) | ⏳ |
| Data minimization | Narrative fields collect only necessary data | ✅ |
| PII detection | `ml annotate --privacy-scan` | ✅ |
| Anonymization | `ml export --anonymize` | ✅ |

### Handling Sensitive Data

For experiments with sensitive data:

1. **Keep private**: Use `--level private`
2. **PII scan all annotations**: Always use `--privacy-scan`
3. **Anonymize before export**: Use `--anonymize-level full`
4. **Verify team membership**: Confirm membership before sharing at `--level team`

---

## Configuration

### Worker Privacy Settings

Configure privacy defaults in worker config:

```yaml
privacy:
  default_level: private
  enforce_teams: true
  audit_access: true
```

### API Server Privacy

Enable privacy enforcement:

```yaml
security:
  privacy:
    enabled: true
    default_level: private
    audit_access: true
```

---

## Troubleshooting

### PII Scan False Positives

Some valid text may trigger PII warnings:

```bash
# Example: "batch@32" looks like an email address
ml annotate run_abc --note "Use batch@32 for training" --privacy-scan
# Warning triggers; use --force if intended
```

### Privacy Changes Not Applied

- Verify you own the experiment
- Check server supports privacy enforcement
- Try with explicit base path: `--base /path/to/experiments`

### Export Not Anonymized

- Ensure `--anonymize` flag is set
- Check `--anonymize-level` is correct (metadata-only vs full)
- Verify manifest contains privacy data

---

## See Also

- `docs/src/research-features.md` - Research workflow features
- `docs/src/deployment.md` - Production deployment with privacy
- `docs/src/quick-start.md` - Getting started guide
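The false-positive behavior described under Troubleshooting comes from loose pattern matching. The patterns below are illustrative, not FetchML's actual detector; they show why `batch@32` can be flagged by a scanner that matches any `@`-joined token, while a stricter pattern requiring a dotted domain would not flag it:

```python
import re

# Deliberately loose pattern: any non-space run joined by "@",
# similar in spirit to scanners that flag "batch@32".
LOOSE_EMAIL = re.compile(r"\S+@\S+")

# Stricter pattern: requires a dotted domain, which "batch@32" lacks.
STRICT_EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

note = "Use batch@32 for training; contact user@example.com"
print(LOOSE_EMAIL.findall(note))   # flags both tokens
print(STRICT_EMAIL.findall(note))  # flags only the real address
```

A loose pattern trades precision for recall, which is the safer default for a privacy scanner; hence the `--force` escape hatch for intentional cases.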