Jeremie Fraeys 7cd86fb88a feat: add new API handlers, build scripts, and ADRs	2026-03-04 13:24:27 -05:00
- Introduce audit, plugin, and scheduler API handlers
- Add spec_embed.go for OpenAPI spec embedding
- Create modular build scripts (cli, go, native, cross-platform)
- Add deployment cleanup and health-check utilities
- New ADRs: hot reload, audit store, SSE updates, RBAC, caching, offline mode, KMS regions, tenant offboarding
- Add KMS configuration schema and worker variants
- Include KMS benchmark tests
.forgejo/workflows	feat: add new API handlers, build scripts, and ADRs	2026-03-04 13:24:27 -05:00
.gitea	chore: update dependencies and remove obsolete compose files	2026-03-04 13:23:52 -05:00
api	api: regenerate OpenAPI types and server code	2026-03-04 13:23:34 -05:00
build	chore(deploy): update deployment configs and TUI for scheduler	2026-02-26 12:08:31 -05:00
cli	cli: update Zig CLI build and native hash integration	2026-03-04 13:23:30 -05:00
cmd	security: improve audit, crypto, and config handling	2026-03-04 13:23:42 -05:00
configs	feat: add new API handlers, build scripts, and ADRs	2026-03-04 13:24:27 -05:00
db	feat: add GitHub workflows and development tooling	2025-12-04 16:56:25 -05:00
deployments	chore: update dependencies and remove obsolete compose files	2026-03-04 13:23:52 -05:00
docs	feat: add new API handlers, build scripts, and ADRs	2026-03-04 13:24:27 -05:00
examples	Slim and secure: move scripts, clean configs, remove secrets	2025-12-07 13:57:51 -05:00
internal	feat: add new API handlers, build scripts, and ADRs	2026-03-04 13:24:27 -05:00
monitoring	docs: update documentation for streamlined Makefile	2026-03-04 13:22:29 -05:00
native	chore(tools): update scripts, native libs, and documentation	2026-02-26 12:08:58 -05:00
podman	refactor: reorganize podman directory structure	2026-02-18 16:40:46 -05:00
redis	Slim and secure: move scripts, clean configs, remove secrets	2025-12-07 13:57:51 -05:00
scripts	feat: add new API handlers, build scripts, and ADRs	2026-03-04 13:24:27 -05:00
tests	feat: add new API handlers, build scripts, and ADRs	2026-03-04 13:24:27 -05:00
tools	docs: update documentation for streamlined Makefile	2026-03-04 13:22:29 -05:00
.dockerignore	chore(repo): add dockerignore, changelog, and ignore local artifacts	2026-01-05 12:30:57 -05:00
.env.example	chore(repo): add dockerignore, changelog, and ignore local artifacts	2026-01-05 12:30:57 -05:00
.flake8	feat: initialize FetchML ML platform with core project structure	2025-12-04 16:52:09 -05:00
.gitignore	chore(cleanup): remove obsolete files and update .gitignore	2026-02-26 12:09:18 -05:00
.golangci.yml	ci: align workflows, build scripts, and docs with current architecture	2026-01-05 12:34:23 -05:00
.golintrc	Fix multi-user authentication and clean up debug code	2025-12-06 12:35:32 -05:00
CHANGELOG.md	docs: update coverage map and development documentation	2026-02-23 20:26:13 -05:00
DEVELOPMENT.md	docs: update documentation for streamlined Makefile	2026-03-04 13:22:29 -05:00
go.mod	chore: update dependencies and remove obsolete compose files	2026-03-04 13:23:52 -05:00
go.sum	chore: update dependencies and remove obsolete compose files	2026-03-04 13:23:52 -05:00
LICENSE	ci: align workflows, build scripts, and docs with current architecture	2026-01-05 12:34:23 -05:00
Makefile	build: streamline Makefile from 1000+ to ~730 lines	2026-03-04 13:22:21 -05:00
pyproject.toml	feat: initialize FetchML ML platform with core project structure	2025-12-04 16:52:09 -05:00
README.md	docs: update coverage map and development documentation	2026-02-23 20:26:13 -05:00
SECURITY.md	feat: update CLI, TUI, and security documentation	2026-02-19 15:35:05 -05:00

README.md

FetchML

A lightweight ML experiment platform with a tiny Zig CLI and a Go backend. Designed for homelabs and small teams.

Installation (recommended)

FetchML publishes pre-built release artifacts (CLI + Go services) on GitHub Releases.

If you prefer a one-shot check (recommended for most users), you can use:

./scripts/verify_release.sh --dir . --repo <org>/<repo>

Download the right archive for your platform
Verify checksums.txt signature (recommended)

The release includes a signed checksums.txt plus:

checksums.txt.sig
checksums.txt.cert

Verify the signature (keyless Sigstore) using cosign:

cosign verify-blob \
  --certificate checksums.txt.cert \
  --signature checksums.txt.sig \
  --certificate-identity-regexp "^https://github.com/jfraeysd/fetch_ml/.forgejo/workflows/release-mirror.yml@refs/tags/v.*$" \
  --certificate-oidc-issuer https://token.actions.githubusercontent.com \
  checksums.txt

Verify the SHA256 checksum against checksums.txt
Extract and install

Example (CLI on Linux x86_64):

# Download
curl -fsSLO https://github.com/jfraeysd/fetch_ml/releases/download/<tag>/ml-linux-x86_64.tar.gz
curl -fsSLO https://github.com/jfraeysd/fetch_ml/releases/download/<tag>/checksums.txt
curl -fsSLO https://github.com/jfraeysd/fetch_ml/releases/download/<tag>/checksums.txt.sig
curl -fsSLO https://github.com/jfraeysd/fetch_ml/releases/download/<tag>/checksums.txt.cert

# Verify
cosign verify-blob \
  --certificate checksums.txt.cert \
  --signature checksums.txt.sig \
  --certificate-identity-regexp "^https://github.com/jfraeysd/fetch_ml/.forgejo/workflows/release-mirror.yml@refs/tags/v.*$" \
  --certificate-oidc-issuer https://token.actions.githubusercontent.com \
  checksums.txt
sha256sum -c --ignore-missing checksums.txt

# Install
tar -xzf ml-linux-x86_64.tar.gz
chmod +x ml-linux-x86_64
sudo mv ml-linux-x86_64 /usr/local/bin/ml

ml --help

Quick start

# Clone and run (dev)
git clone <your-repo>
cd fetch_ml
make dev-up

# Or build the CLI locally
cd cli && make all
./zig-out/bin/ml --help

What you get

Zig CLI (ml): Tiny, fast local client. Uses ~/.ml/config.toml and FETCH_ML_CLI_* env vars.
Go backends: API server, worker, and a TUI for richer remote features.
TUI over SSH: ml monitor launches the TUI on the server, keeping the local CLI minimal.
CI/CD: Cross‑platform builds with zig build-exe and Go releases.

Testing & Security

FetchML's tests cover all 49 tracked security and reproducibility requirements (100% requirement coverage, 49/49):

Unit tests: 150+ tests covering security, reproducibility, and core functionality
Property-based tests: gopter-based invariant verification
Integration tests: Cross-tenant isolation, audit verification, PHI redaction
Fault injection: Prepared tests for toxiproxy integration
Custom lint analyzers: fetchml-vet enforces security at compile time

See docs/TEST_COVERAGE_MAP.md for detailed coverage tracking and DEVELOPMENT.md for testing guidelines.

CLI usage

# Configure (create the config directory first if it does not exist)
mkdir -p ~/.ml
cat > ~/.ml/config.toml <<EOF
worker_host = "127.0.0.1"
worker_user = "dev_user"
worker_base = "/tmp/ml-experiments"
worker_port = 22
api_key = "your-api-key"
EOF

# Core commands
ml status
ml queue my-job
ml cancel my-job
ml dataset list
ml monitor  # SSH to run TUI remotely

# Research features (see docs/src/research-features.md)
ml queue train.py --hypothesis "LR scaling..." --tags ablation
ml outcome set run_abc --outcome validates --summary "Accuracy +2%"
ml find --outcome validates --tag lr-test
ml compare run_abc run_def
ml privacy set run_abc --level team
ml export run_abc --anonymize
ml dataset verify /path/to/data

Phase 1 (V1) notes

Task schema supports optional snapshot_id (opaque identifier) and dataset_specs (structured dataset inputs). If dataset_specs is present it takes precedence over legacy datasets / --datasets args.
Snapshot restore (S1) stages verified snapshot_id into each task workspace and exposes it via FETCH_ML_SNAPSHOT_DIR and FETCH_ML_SNAPSHOT_ID. If snapshot_store.enabled: true in the worker config, the worker will pull <prefix>/<snapshot_id>.tar.gz from an S3-compatible store (e.g. MinIO), verify snapshot_sha256, and cache it under data_dir/snapshots/sha256/<snapshot_sha256>.
Prewarm (best-effort) can fetch datasets for the next queued task while another task is running. Prewarm state is surfaced in ml status --json under the optional prewarm field.
Env prewarm (best-effort) can build a warmed Podman image keyed by deps_manifest_sha256 and reuse it for later tasks.

Changelog

See CHANGELOG.md.

Build

Native C++ Libraries (Optional)

FetchML includes optional C++ native libraries for performance. See docs/src/native-libraries.md for detailed build instructions.

Quick start:

make native-build           # Build native libs
make native-smoke           # Run smoke test
go build -tags native_libs  # Enable native libraries
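
As background on how a build tag such as native_libs typically selects an implementation, here is a minimal sketch of a pure-Go fallback file. The file layout, the nativeEnabled constant, and the hashBytes helper are hypothetical and chosen only for illustration; the actual FetchML sources may be organised differently. A counterpart file constrained by //go:build native_libs would provide the same symbols backed by the C++ libraries.

```go
//go:build !native_libs

// Pure-Go fallback: compiled only when the native_libs tag is absent.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// nativeEnabled reports which implementation was compiled in
// (hypothetical name; the native variant would set it to true).
const nativeEnabled = false

// hashBytes is a hypothetical helper. A native variant with the same
// signature would live in a file marked //go:build native_libs.
func hashBytes(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println("native:", nativeEnabled)
	fmt.Println(hashBytes([]byte("abc")))
}
```

With this pattern, a plain go build selects the fallback, while go build -tags native_libs excludes this file in favour of its native counterpart.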

Standard Build

# CLI (Zig)
cd cli && make all      # release-small
make tiny              # extra-small
make fast              # release-fast

# Go backends (run from the repository root, not from cli/)
make cross-platform    # builds for Linux/macOS/Windows

Deploy

Dev: docker-compose up -d
Prod: Use the provided systemd units or containers on Rocky Linux.

Docs

See docs/ for detailed guides:

docs/src/native-libraries.md – Native C++ libraries (build, test, deploy)
docs/src/zig-cli.md – CLI reference
docs/src/quick-start.md – Full setup guide
docs/src/deployment.md – Production deployment
docs/src/research-features.md – Research workflow features (narrative capture, outcomes, search)
docs/src/privacy-security.md – Privacy levels, PII detection, anonymized export

CLI Architecture (2026-02)

The Zig CLI has been refactored for improved maintainability:

Modular 3-layer architecture: core/ (foundation), local/ and server/ (mode-specific), commands/ (routers)
Unified context: core.context.Context handles mode detection, output formatting, and dispatch
Code reduction: experiment.zig reduced from 836 to 348 lines (58% reduction)
Bug fixes: Resolved 15+ compilation errors across multiple commands

See cli/README.md for detailed architecture documentation.

Source code

The FetchML source code is intentionally not hosted on GitHub.

The canonical source repository is available at: <SOURCE_REPO_URL>.

License

FetchML is source-available for transparency and auditability. It is not open-source.

See LICENSE.
