fetch_ml/AGENTS.md
Jeremie Fraeys 6646f3a382
ci(docker): add test workflow and container architecture docs
- Create docker-tests.yml for merge-to-main CI pipeline
- Add mock GPU test matrix (NVIDIA, Metal, CPU-only)
- Add AGENTS.md with container architecture rules:
  * Docker for CI/CD testing and deployments
  * Podman for ML experiment isolation only
- Update .gitignore to track AGENTS.md
2026-03-12 14:05:53 -04:00

AGENTS.md - FetchML

Architecture

┌─────────┐     ┌─────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
│   CLI   │────▶│   API   │────▶│ Scheduler│────▶│  Worker  │────▶│ Storage  │
│  (Zig)  │◄────│(Go/HTTP)│◄────│  (Go)    │◄────│  (Go)    │◄────│ (MinIO)  │
└─────────┘     └─────────┘     └──────────┘     └──────────┘     └──────────┘
                                     │
                                     ▼
                              ┌──────────┐
                              │   Redis  │
                              │  (Queue) │
                              └──────────┘

CLI ↔ Server: HTTP (default) or a Unix socket (local only). The execution_mode config setting is either direct (bypasses the scheduler) or queue (full flow through the scheduler). Auth: API key sent in a request header.
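
A minimal Go sketch of the two settings described above, for orientation only. The type and function names are illustrative, and the header name X-API-Key is an assumption: the text only says the key travels in a header, not which one.

```go
package main

import (
	"fmt"
	"net/http"
)

// ExecutionMode mirrors the two execution_mode values named in the text.
type ExecutionMode string

const (
	ModeDirect ExecutionMode = "direct" // bypasses the scheduler
	ModeQueue  ExecutionMode = "queue"  // full flow through the scheduler
)

// ParseExecutionMode rejects anything other than the two documented values.
func ParseExecutionMode(s string) (ExecutionMode, error) {
	switch m := ExecutionMode(s); m {
	case ModeDirect, ModeQueue:
		return m, nil
	default:
		return "", fmt.Errorf("unknown execution_mode %q (want direct or queue)", s)
	}
}

// apiKeyHeader is a hypothetical header name, not taken from the project.
const apiKeyHeader = "X-API-Key"

// NewRequest builds a request that carries the API key in a header.
func NewRequest(method, url, apiKey string) (*http.Request, error) {
	req, err := http.NewRequest(method, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set(apiKeyHeader, apiKey)
	return req, nil
}

func main() {
	mode, err := ParseExecutionMode("queue")
	fmt.Println(mode, err)

	// The URL and key below are placeholders for illustration.
	req, err := NewRequest(http.MethodGet, "http://localhost:8080/placeholder", "example-key")
	if err != nil {
		panic(err)
	}
	fmt.Println("key header set:", req.Header.Get(apiKeyHeader) != "")
}
```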


Container Architecture

Docker - Used for:

  • CI/CD testing pipelines (.forgejo/workflows/docker-tests.yml)
  • Application deployments (staging/production)
  • Build environments

Podman - Used for:

  • ML experiment isolation only
  • Running untrusted/3rd party ML workloads
  • Rootless container execution for security

Rule: Never use Podman for CI testing or deployments. Never use Docker for experiment isolation.


Critical Invariants

Audit Log — never break these

  • Append-only — entries are never modified or deleted
  • Hash chain — every entry includes SHA256 of the previous entry
  • All mutations to tasks/groups/tokens must produce an audit entry
  • Write the audit entry before the storage write, so that a partial failure still leaves an audit record
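
The append-only and hash-chain rules can be sketched in Go as follows. This is an in-memory illustration under stated assumptions, not the project's audit code: the Entry fields, the Log type, and the exact bytes fed to SHA-256 are all hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Entry is a hypothetical audit record; the field names are illustrative.
type Entry struct {
	Action   string
	PrevHash string // hex SHA-256 of the previous entry ("" for the first)
}

// Hash covers both the payload and the link to the previous entry.
func (e Entry) Hash() string {
	sum := sha256.Sum256([]byte(e.PrevHash + "\x00" + e.Action))
	return hex.EncodeToString(sum[:])
}

// Log exposes Append only; there is no update or delete method.
type Log struct{ entries []Entry }

// Append links the new entry to the hash of the current last entry.
func (l *Log) Append(action string) Entry {
	prev := ""
	if n := len(l.entries); n > 0 {
		prev = l.entries[n-1].Hash()
	}
	e := Entry{Action: action, PrevHash: prev}
	l.entries = append(l.entries, e)
	return e
}

// Verify walks the chain and returns the index of the first entry whose
// stored PrevHash does not match its predecessor, or -1 if the chain is intact.
func (l *Log) Verify() int {
	prev := ""
	for i, e := range l.entries {
		if e.PrevHash != prev {
			return i
		}
		prev = e.Hash()
	}
	return -1
}

func main() {
	var l Log
	l.Append("task.create id=1")
	l.Append("task.update id=1")
	l.Append("token.revoke id=7")
	fmt.Println("intact:", l.Verify() == -1)

	// Modifying an earlier entry breaks the link stored in the next one.
	l.entries[0].Action = "task.create id=2"
	fmt.Println("first broken link at index:", l.Verify())
}
```

Because each entry stores the hash of its predecessor, rewriting or deleting any entry invalidates every link after it, which is what makes tampering detectable.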

Auth

  • TokenFromContext(ctx) is the only authorised way to extract auth in handlers
  • Group visibility enforced at DB query level — never filter in application code
  • API keys hashed with bcrypt before storage — never log raw keys

Storage

  • All DB access through repository types in internal/db/repository/
  • Transactions via WithTx(ctx, db, func(tx *sql.Tx) error) — never manage tx manually
  • Migrations: additive only. New columns must be nullable or have defaults. Never drop a column directly: mark it deprecated first and remove it in a later migration

CGO / Native Libs

Use -tags native_libs when building with C++ extensions. This has broken twice — always check build tags when touching GPU detection or native code.
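
As a reminder of how Go build tags gate files, here is a hedged sketch of a pure-Go fallback file. The names nativeLibsEnabled and detectGPU are hypothetical; only the native_libs tag comes from the text above. A sibling file guarded by //go:build native_libs would hold the cgo-backed versions.

```go
//go:build !native_libs

// This file is compiled only when the native_libs tag is absent.
package main

import "fmt"

// nativeLibsEnabled is a hypothetical flag. A sibling file guarded by
// "//go:build native_libs" would declare it as true.
const nativeLibsEnabled = false

// detectGPU is a hypothetical fallback used when the C++ extensions
// are not compiled in.
func detectGPU() string {
	return "cpu-only (built without -tags native_libs)"
}

func main() {
	fmt.Println("native libs:", nativeLibsEnabled)
	fmt.Println("gpu:", detectGPU())
}
```

Building with go build -tags native_libs would exclude this file and select the tagged sibling instead, so a missing or misspelled tag silently falls back to the stub rather than failing the build.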


Build Commands

make build              # all components
make dev                # fast, no LTO
make prod               # production-optimized
make prod-with-native   # production + C++ libs
make cross-platform     # Linux/macOS/Windows

cd cli && make dev      # Zig: fast compile + format
cd cli && make prod     # Zig: release=fast, LTO
cd cli && make debug    # Zig: no optimizations
cd cli && zig build test

Test Commands

make test               # all tests (Docker)
make test-unit
make test-integration
make test-e2e
make test-coverage

go test -v ./path/to/package -run TestName
go test -race ./path/to/package/...
LOG_LEVEL=debug go test -v ./path/to/package
FETCH_ML_E2E_PODMAN=1 go test ./tests/e2e/...

Lint / Security

make lint
make security-scan
make configlint
make openapi-validate
go vet ./...
cd cli && zig fmt .

Legacy Go: modernize only when already touching existing code

| Legacy                   | Modern                |
|--------------------------|-----------------------|
| interface{}              | any                   |
| for i := 0; i < n; i++   | for i := range items  |
| []byte(fmt.Sprintf(...)) | fmt.Appendf(nil, ...) |
| sort.Slice with closure  | slices.Sort(x)        |
| Manual contains loop     | slices.Contains       |
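
The modern forms side by side, as a small runnable sketch. The function names and sample data are illustrative. Note that slices.Sort applies to ordered element types; a custom ordering would need slices.SortFunc instead.

```go
package main

import (
	"fmt"
	"slices"
)

// typeName accepts any, the alias that replaces interface{}.
func typeName(v any) string {
	return fmt.Sprintf("%T", v)
}

// summarize sorts a copy of items and formats them into a byte slice.
func summarize(items []string) ([]byte, bool) {
	sorted := slices.Clone(items)
	slices.Sort(sorted) // replaces sort.Slice with a closure

	// fmt.Appendf writes into a byte slice without an intermediate string.
	buf := fmt.Appendf(nil, "%d items:", len(sorted))

	// Ranging over the slice replaces the three-clause index loop.
	for i := range sorted {
		buf = fmt.Appendf(buf, " %s", sorted[i])
	}

	// slices.Contains replaces a manual search loop.
	return buf, slices.Contains(sorted, "redis")
}

func main() {
	buf, hasRedis := summarize([]string{"redis", "minio", "api"})
	fmt.Println(string(buf), hasRedis, typeName(buf))
}
```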

Dependencies

  • Go 1.25+, Zig 0.15+, Python 3.11+
  • Redis (integration tests), Docker/Podman (container tests)