# FetchML

A lightweight ML experiment platform with a tiny Zig CLI and a Go backend. Designed for homelabs and small teams.
## Installation (recommended)

FetchML publishes pre-built release artifacts (CLI + Go services) on GitHub Releases.

For a one-shot check (recommended for most users), run:

```bash
./scripts/verify_release.sh --dir . --repo <org>/<repo>
```
To verify and install manually instead:

1) Download the right archive for your platform

2) Verify `checksums.txt` signature (recommended)

The release includes a signed `checksums.txt` plus:

- `checksums.txt.sig`
- `checksums.txt.cert`

Verify the signature (keyless Sigstore) using cosign:

```bash
cosign verify-blob \
  --certificate checksums.txt.cert \
  --signature checksums.txt.sig \
  --certificate-identity-regexp "^https://github.com/jfraeysd/fetch_ml/.forgejo/workflows/release-mirror.yml@refs/tags/v.*$" \
  --certificate-oidc-issuer https://token.actions.githubusercontent.com \
  checksums.txt
```

3) Verify the SHA256 checksum against `checksums.txt`

4) Extract and install

Example (CLI on Linux x86_64):

```bash
# Download
curl -fsSLO https://github.com/jfraeysd/fetch_ml/releases/download/<tag>/ml-linux-x86_64.tar.gz
curl -fsSLO https://github.com/jfraeysd/fetch_ml/releases/download/<tag>/checksums.txt
curl -fsSLO https://github.com/jfraeysd/fetch_ml/releases/download/<tag>/checksums.txt.sig
curl -fsSLO https://github.com/jfraeysd/fetch_ml/releases/download/<tag>/checksums.txt.cert

# Verify
cosign verify-blob \
  --certificate checksums.txt.cert \
  --signature checksums.txt.sig \
  --certificate-identity-regexp "^https://github.com/jfraeysd/fetch_ml/.forgejo/workflows/release-mirror.yml@refs/tags/v.*$" \
  --certificate-oidc-issuer https://token.actions.githubusercontent.com \
  checksums.txt
sha256sum -c --ignore-missing checksums.txt

# Install
tar -xzf ml-linux-x86_64.tar.gz
chmod +x ml-linux-x86_64
sudo mv ml-linux-x86_64 /usr/local/bin/ml

ml --help
```
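
The checksum step can also be done for a single file without `--ignore-missing`, which not every `sha256sum` implementation supports. The following is a minimal sketch, assuming `checksums.txt` uses the usual `<hash>  <filename>` layout that `sha256sum` emits and that file names contain no spaces. The helper name `verify_against_checksums` is hypothetical and not part of the release tooling.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify one file against a sha256sum-style checksums file.
# Returns 0 on match, 1 on mismatch, 2 if the file has no entry.
verify_against_checksums() {
  local file="$1" sums="$2" expected actual

  # Look up the expected hash by file name (second column; a leading '*' marks binary mode).
  expected="$(awk -v f="$(basename "$file")" \
    '{ name = $2; sub(/^\*/, "", name) } name == f { print $1; exit }' "$sums")"
  if [ -z "$expected" ]; then
    echo "no entry for $file in $sums" >&2
    return 2
  fi

  # Prefer sha256sum; fall back to shasum, which is common on macOS.
  if command -v sha256sum >/dev/null 2>&1; then
    actual="$(sha256sum "$file" | awk '{ print $1 }')"
  else
    actual="$(shasum -a 256 "$file" | awk '{ print $1 }')"
  fi

  [ "$expected" = "$actual" ]
}
```

For example, `verify_against_checksums ml-linux-x86_64.tar.gz checksums.txt && echo "checksum OK"`. This only checks the hash; it does not replace the `cosign verify-blob` step, which is what establishes that `checksums.txt` itself is authentic.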
## Quick start

```bash
# Clone and run (dev)
git clone <your-repo>
cd fetch_ml
make dev-up

# Or build the CLI locally
cd cli && make all
./zig-out/bin/ml --help
```
## What you get

- **Zig CLI** (`ml`): Tiny, fast local client. Uses `~/.ml/config.toml` and `FETCH_ML_CLI_*` env vars.
- **Go backends**: API server, worker, and a TUI for richer remote features.
- **TUI over SSH**: `ml monitor` launches the TUI on the server, keeping the local CLI minimal.
- **CI/CD**: Cross‑platform builds with `zig build-exe` and Go releases.
## CLI usage

```bash
# Configure (create the config directory first; it may not exist yet)
mkdir -p ~/.ml
cat > ~/.ml/config.toml <<EOF
worker_host = "127.0.0.1"
worker_user = "dev_user"
worker_base = "/tmp/ml-experiments"
worker_port = 22
api_key = "your-api-key"
EOF

# Core commands
ml status
ml queue my-job
ml cancel my-job
ml dataset list
ml monitor  # SSH to run TUI remotely

# Research features (see docs/src/research-features.md)
ml queue train.py --hypothesis "LR scaling..." --tags ablation
ml outcome set run_abc --outcome validates --summary "Accuracy +2%"
ml find --outcome validates --tag lr-test
ml compare run_abc run_def
ml privacy set run_abc --level team
ml export run_abc --anonymize
ml dataset verify /path/to/data
```
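
Because `config.toml` holds an API key, it may be worth creating the file with owner-only permissions. The following is a minimal sketch using the same keys as the example above. The helper name `write_ml_config` and the choice of mode `600` are assumptions, not something the CLI is documented to require.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write the example config so only the owner can read it.
# Optional first argument overrides the target directory (default: ~/.ml).
write_ml_config() {
  local dir="${1:-$HOME/.ml}"
  mkdir -p "$dir"

  # A subshell keeps the restrictive umask from leaking into the caller.
  (
    umask 077
    cat > "$dir/config.toml" <<'EOF'
worker_host = "127.0.0.1"
worker_user = "dev_user"
worker_base = "/tmp/ml-experiments"
worker_port = 22
api_key = "your-api-key"
EOF
  )

  # umask only affects newly created files, so tighten an existing file too.
  chmod 600 "$dir/config.toml"
}
```

Calling `write_ml_config` with no arguments writes `~/.ml/config.toml`; replace the placeholder values before using the CLI.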
## Phase 1 (V1) notes

- **Task schema** supports optional `snapshot_id` (opaque identifier) and `dataset_specs` (structured dataset inputs). If `dataset_specs` is present, it takes precedence over legacy `datasets` / `--datasets` args.
- **Snapshot restore (S1)** stages the verified snapshot for `snapshot_id` into each task workspace and exposes it via `FETCH_ML_SNAPSHOT_DIR` and `FETCH_ML_SNAPSHOT_ID`. If `snapshot_store.enabled: true` is set in the worker config, the worker will pull `<prefix>/<snapshot_id>.tar.gz` from an S3-compatible store (e.g., MinIO), verify `snapshot_sha256`, and cache it under `data_dir/snapshots/sha256/<snapshot_sha256>`.
- **Prewarm (best-effort)** can fetch datasets for the next queued task while another task is running. Prewarm state is surfaced in `ml status --json` under the optional `prewarm` field.
- **Env prewarm (best-effort)** can build a warmed Podman image keyed by `deps_manifest_sha256` and reuse it for later tasks.
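
A task script can pick up a staged snapshot through the two environment variables named above. The following is a minimal sketch, assuming `FETCH_ML_SNAPSHOT_DIR` points at a directory in the task workspace and that it is unset or empty when no snapshot was requested (the notes above do not say either way). `describe_snapshot` is a hypothetical helper, not part of FetchML.

```bash
#!/usr/bin/env bash
# Hypothetical helper for use inside a task script.
# Prints where the snapshot was staged, or returns 1 if none is available.
describe_snapshot() {
  if [ -n "${FETCH_ML_SNAPSHOT_DIR:-}" ] && [ -d "$FETCH_ML_SNAPSHOT_DIR" ]; then
    echo "snapshot ${FETCH_ML_SNAPSHOT_ID:-unknown} staged at $FETCH_ML_SNAPSHOT_DIR"
  else
    echo "no snapshot staged"
    return 1
  fi
}
```

A task could call `describe_snapshot || exit 1` before training starts, so a missing snapshot fails early instead of partway through a run.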
## Changelog

See `CHANGELOG.md`.
## Build

### Native C++ Libraries (Optional)

FetchML includes optional C++ native libraries for performance. See `docs/src/native-libraries.md` for detailed build instructions.

Quick start:

```bash
make native-build             # Build native libs
make native-smoke             # Run smoke test
export FETCHML_NATIVE_LIBS=1  # Enable at runtime
```
### Standard Build

```bash
# CLI (Zig)
cd cli && make all  # release-small
make tiny           # extra-small
make fast           # release-fast

# Go backends
make cross-platform  # builds for Linux/macOS/Windows
```
## Deploy

- **Dev**: `docker-compose up -d`
- **Prod**: Use the provided systemd units or containers on Rocky Linux.
## Docs

See `docs/` for detailed guides:

- `docs/src/native-libraries.md` – Native C++ libraries (build, test, deploy)
- `docs/src/zig-cli.md` – CLI reference
- `docs/src/quick-start.md` – Full setup guide
- `docs/src/deployment.md` – Production deployment
- `docs/src/research-features.md` – Research workflow features (narrative capture, outcomes, search)
- `docs/src/privacy-security.md` – Privacy levels, PII detection, anonymized export
## Source code

The FetchML source code is intentionally not hosted on GitHub.

The canonical source repository is available at: `<SOURCE_REPO_URL>`.
## License

FetchML is source-available for transparency and auditability. It is not open-source.

See `LICENSE`.