Jeremie Fraeys f357624685
docs: Update CHANGELOG and add feature documentation
Update documentation for new features:
- Add CHANGELOG entries for research features and privacy enhancements
- Update README with new CLI commands and security features
- Add privacy-security.md documentation for PII detection
- Add research-features.md for narrative and outcome tracking
2026-02-18 21:28:25 -05:00


# FetchML
A lightweight ML experiment platform with a tiny Zig CLI and a Go backend. Designed for homelabs and small teams.
## Installation (recommended)
FetchML publishes pre-built release artifacts (CLI + Go services) on GitHub Releases.
For a one-shot check (recommended for most users), run:
```bash
./scripts/verify_release.sh --dir . --repo <org>/<repo>
```
To verify manually instead:
1) Download the archive for your platform
2) Verify `checksums.txt` signature (recommended)
The release includes a signed `checksums.txt` plus:
- `checksums.txt.sig`
- `checksums.txt.cert`
Verify the signature (keyless Sigstore) using cosign:
```bash
cosign verify-blob \
--certificate checksums.txt.cert \
--signature checksums.txt.sig \
--certificate-identity-regexp "^https://github.com/jfraeysd/fetch_ml/.forgejo/workflows/release-mirror.yml@refs/tags/v.*$" \
--certificate-oidc-issuer https://token.actions.githubusercontent.com \
checksums.txt
```
3) Verify the SHA256 checksum against `checksums.txt`
4) Extract and install
Example (CLI on Linux x86_64):
```bash
# Download
curl -fsSLO https://github.com/jfraeysd/fetch_ml/releases/download/<tag>/ml-linux-x86_64.tar.gz
curl -fsSLO https://github.com/jfraeysd/fetch_ml/releases/download/<tag>/checksums.txt
curl -fsSLO https://github.com/jfraeysd/fetch_ml/releases/download/<tag>/checksums.txt.sig
curl -fsSLO https://github.com/jfraeysd/fetch_ml/releases/download/<tag>/checksums.txt.cert
# Verify
cosign verify-blob \
--certificate checksums.txt.cert \
--signature checksums.txt.sig \
--certificate-identity-regexp "^https://github.com/jfraeysd/fetch_ml/.forgejo/workflows/release-mirror.yml@refs/tags/v.*$" \
--certificate-oidc-issuer https://token.actions.githubusercontent.com \
checksums.txt
sha256sum -c --ignore-missing checksums.txt
# Install
tar -xzf ml-linux-x86_64.tar.gz
chmod +x ml-linux-x86_64
sudo mv ml-linux-x86_64 /usr/local/bin/ml
ml --help
```
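If GNU `sha256sum` (and its `--ignore-missing` flag) is unavailable, for example on stock macOS, step 3 can also be done one archive at a time. The helper below is a minimal sketch, not a script shipped with FetchML: `verify_one` is a hypothetical name, and it assumes `checksums.txt` uses the usual `<sha256>  <filename>` layout written by `sha256sum`. It selects only the line for the named archive and checks it with `sha256sum`, falling back to Perl's `shasum -a 256`.

```bash
# Hypothetical helper (not part of FetchML): verify a single archive
# against checksums.txt. Assumes "<sha256>  <filename>" lines.
verify_one() {
  local archive="$1" sums="${2:-checksums.txt}" line
  # Keep only the entry whose filename column exactly matches the archive
  # ("*name" is how binary-mode entries are written).
  line="$(awk -v f="$archive" '$2 == f || $2 == ("*" f)' "$sums")"
  if [ -z "$line" ]; then
    echo "no checksum entry for $archive in $sums" >&2
    return 1
  fi
  if command -v sha256sum >/dev/null 2>&1; then
    printf '%s\n' "$line" | sha256sum -c        # GNU coreutils (Linux)
  else
    printf '%s\n' "$line" | shasum -a 256 -c    # Perl shasum (macOS)
  fi
}
```

Usage: `verify_one ml-linux-x86_64.tar.gz` prints `ml-linux-x86_64.tar.gz: OK` and returns 0 on success, and returns non-zero on a mismatch or a missing entry. Checking the signature of `checksums.txt` first (step 2) is still what makes this comparison meaningful.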
## Quick start
```bash
# Clone and run (dev)
git clone <SOURCE_REPO_URL> fetch_ml
cd fetch_ml
make dev-up
# Or build the CLI locally
cd cli && make all
./zig-out/bin/ml --help
```
## What you get
- **Zig CLI** (`ml`): Tiny, fast local client. Uses `~/.ml/config.toml` and `FETCH_ML_CLI_*` env vars.
- **Go backends**: API server, worker, and a TUI for richer remote features.
- **TUI over SSH**: `ml monitor` launches the TUI on the server, keeping the local CLI minimal.
- **CI/CD**: Cross-platform builds with `zig build-exe` and Go releases.
## CLI usage
```bash
# Configure (create the config directory first if it does not exist)
mkdir -p ~/.ml
cat > ~/.ml/config.toml <<EOF
worker_host = "127.0.0.1"
worker_user = "dev_user"
worker_base = "/tmp/ml-experiments"
worker_port = 22
api_key = "your-api-key"
EOF
# Core commands
ml status
ml queue my-job
ml cancel my-job
ml dataset list
ml monitor # SSH to run TUI remotely
# Research features (see docs/src/research-features.md)
ml queue train.py --hypothesis "LR scaling..." --tags ablation
ml outcome set run_abc --outcome validates --summary "Accuracy +2%"
ml find --outcome validates --tag lr-test
ml compare run_abc run_def
ml privacy set run_abc --level team
ml export run_abc --anonymize
ml dataset verify /path/to/data
```
## Phase 1 (V1) notes
- **Task schema** supports optional `snapshot_id` (opaque identifier) and `dataset_specs` (structured dataset inputs). If `dataset_specs` is present it takes precedence over legacy `datasets` / `--datasets` args.
- **Snapshot restore (S1)** stages the verified snapshot referenced by `snapshot_id` into each task workspace and exposes it via `FETCH_ML_SNAPSHOT_DIR` and `FETCH_ML_SNAPSHOT_ID`. If `snapshot_store.enabled: true` is set in the worker config, the worker pulls `<prefix>/<snapshot_id>.tar.gz` from an S3-compatible store (e.g. MinIO), verifies it against `snapshot_sha256`, and caches it under `data_dir/snapshots/sha256/<snapshot_sha256>`.
- **Prewarm (best-effort)** can fetch datasets for the next queued task while another task is running. Prewarm state is surfaced in `ml status --json` under the optional `prewarm` field.
- **Env prewarm (best-effort)** can build a warmed Podman image keyed by `deps_manifest_sha256` and reuse it for later tasks.
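Because a restored snapshot is exposed to the task through environment variables, a task script can prefer it when present and fall back to a local path otherwise. The function below is a hypothetical sketch: the name `snapshot_root` and the `./data` fallback are illustrative assumptions, not FetchML behavior. It relies only on the `FETCH_ML_SNAPSHOT_DIR` and `FETCH_ML_SNAPSHOT_ID` variables described in the S1 note.

```bash
# Hypothetical task-side helper: print the directory a task should read
# data from. Uses the staged snapshot when the worker exposed one,
# otherwise the fallback given as $1 (default ./data, an assumption).
snapshot_root() {
  local fallback="${1:-./data}"
  if [ -n "${FETCH_ML_SNAPSHOT_DIR:-}" ] && [ -d "$FETCH_ML_SNAPSHOT_DIR" ]; then
    # Log to stderr so stdout carries only the path.
    echo "using snapshot ${FETCH_ML_SNAPSHOT_ID:-<unknown>} at $FETCH_ML_SNAPSHOT_DIR" >&2
    printf '%s\n' "$FETCH_ML_SNAPSHOT_DIR"
  else
    printf '%s\n' "$fallback"
  fi
}
```

Usage: `data_dir="$(snapshot_root)"`, then pass `$data_dir` to your training code however it expects its input path. Keeping the log line on stderr means the command substitution captures only the directory.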
## Changelog
See `CHANGELOG.md`.
## Build
### Native C++ Libraries (Optional)
FetchML includes optional C++ native libraries for performance. See `docs/src/native-libraries.md` for detailed build instructions.
Quick start:
```bash
make native-build # Build native libs
make native-smoke # Run smoke test
export FETCHML_NATIVE_LIBS=1 # Enable at runtime
```
### Standard Build
```bash
# CLI (Zig), run inside cli/
cd cli && make all   # release-small
make tiny            # extra-small
make fast            # release-fast
cd ..
# Go backends, run from the repository root
make cross-platform  # builds for Linux/macOS/Windows
```
## Deploy
- **Dev**: `docker-compose up -d`
- **Prod**: Use the provided systemd units or containers on Rocky Linux.
## Docs
See `docs/` for detailed guides:
- `docs/src/native-libraries.md`: Native C++ libraries (build, test, deploy)
- `docs/src/zig-cli.md`: CLI reference
- `docs/src/quick-start.md`: Full setup guide
- `docs/src/deployment.md`: Production deployment
- `docs/src/research-features.md`: Research workflow features (narrative capture, outcomes, search)
- `docs/src/privacy-security.md`: Privacy levels, PII detection, anonymized export
## Source code
The FetchML source code is intentionally not hosted on GitHub.
The canonical source repository is available at: `<SOURCE_REPO_URL>`.
## License
FetchML is source-available for transparency and auditability. It is not open-source.
See `LICENSE`.