Add Known Limitations section to AGENTS.md documenting:
- AMD GPU not implemented (use NVIDIA, Apple Silicon, or CPU)
- 100+ node gang allocation stress testing not yet implemented
- Podman-in-Docker CI requires privileged mode, not yet automated
- Error handling patterns for unimplemented features
- Container usage rules (Docker for testing/deployments, Podman for experiments)
- Error codes table (NOT_IMPLEMENTED, NOT_FOUND, INVALID_CONFIGURATION)

Update testing documentation to reflect new test locations:
- Unit tests moved from tests/unit/ to internal/ (Go convention)
- Update all test file path references in security testing docs
AGENTS.md - FetchML
Architecture
┌─────────┐     ┌─────────┐     ┌──────────┐     ┌─────────┐     ┌──────────┐
│   CLI   │────▶│   API   │────▶│ Scheduler│────▶│  Worker │────▶│  Storage │
│  (Zig)  │◄────│(Go/HTTP)│◄────│   (Go)   │◄────│  (Go)   │◄────│  (MinIO) │
└─────────┘     └─────────┘     └──────────┘     └─────────┘     └──────────┘
│
▼
┌──────────┐
│ Redis │
│ (Queue) │
└──────────┘
CLI ↔ Server: HTTP (default) or Unix socket (local). execution_mode config:
direct (bypass scheduler) or queue (full flow). Auth via API key in header.
Container Architecture
Docker - Used for:
- CI/CD testing pipelines (.forgejo/workflows/docker-tests.yml)
- Application deployments (staging/production)
- Build environments
Podman - Used for:
- ML experiment isolation only
- Running untrusted/3rd party ML workloads
- Rootless container execution for security
Rule: Never use Podman for CI testing or deployments. Never use Docker for experiment isolation.
Critical Invariants
Audit Log — never break these
- Append-only — entries are never modified or deleted
- Hash chain — every entry includes SHA256 of the previous entry
- All mutations to tasks/groups/tokens must produce an audit entry
- Write the audit entry before the storage write — partial failures must be audited
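The audit invariants above can be sketched as a small in-memory hash chain. This is an illustration only: the `Entry` fields, `hashEntry`, `Log`, `Mutate`, and `Verify` are hypothetical names, and the project's real audit schema and hashing input may differ.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Entry is a hypothetical audit record; the real schema may differ.
type Entry struct {
	Action   string // what happened, e.g. "task.create"
	PrevHash string // hex SHA-256 of the previous entry ("" for the first)
	Hash     string // hex SHA-256 over PrevHash and Action
}

// hashEntry computes the hash that links an entry to its predecessor.
func hashEntry(prevHash, action string) string {
	sum := sha256.Sum256([]byte(prevHash + "\n" + action))
	return hex.EncodeToString(sum[:])
}

// Log is append-only: it exposes Append but no update or delete.
type Log struct{ entries []Entry }

// Append adds a new entry chained to the current tail.
func (l *Log) Append(action string) Entry {
	prev := ""
	if n := len(l.entries); n > 0 {
		prev = l.entries[n-1].Hash
	}
	e := Entry{Action: action, PrevHash: prev, Hash: hashEntry(prev, action)}
	l.entries = append(l.entries, e)
	return e
}

// Mutate records the audit entry first, then performs the storage write,
// so a failed write still leaves a trace in the log.
func (l *Log) Mutate(action string, write func() error) error {
	l.Append(action)
	return write()
}

// Verify walks the chain and reports the first entry whose links do not match.
func Verify(entries []Entry) error {
	prev := ""
	for i, e := range entries {
		if e.PrevHash != prev || e.Hash != hashEntry(prev, e.Action) {
			return fmt.Errorf("audit chain broken at entry %d", i)
		}
		prev = e.Hash
	}
	return nil
}

func main() {
	var auditLog Log
	auditLog.Append("task.create")
	_ = auditLog.Mutate("task.update", func() error { return nil })
	fmt.Println(Verify(auditLog.entries)) // <nil>

	auditLog.entries[0].Action = "task.delete" // simulate tampering
	fmt.Println(Verify(auditLog.entries))      // audit chain broken at entry 0
}
```

Because each hash covers the previous hash, editing or deleting any earlier entry invalidates every entry after it, which is what makes the append-only rule checkable.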
Auth
- TokenFromContext(ctx) is the only authorised way to extract auth in handlers
- Group visibility enforced at DB query level; never filter in application code
- API keys hashed with bcrypt before storage; never log raw keys
Storage
- All DB access through repository types in internal/db/repository/
- Transactions via WithTx(ctx, db, func(tx *sql.Tx) error); never manage tx manually
- Migrations: additive only. New columns must be nullable or have defaults; never drop columns (mark deprecated, remove later)
CGO / Native Libs
Use -tags native_libs when building with C++ extensions. This has broken twice —
always check build tags when touching GPU detection or native code.
Build Commands
make build # all components
make dev # fast, no LTO
make prod # production-optimized
make prod-with-native # production + C++ libs
make cross-platform # Linux/macOS/Windows
cd cli && make dev # Zig: fast compile + format
cd cli && make prod # Zig: release=fast, LTO
cd cli && make debug # Zig: no optimizations
cd cli && zig build test
Test Commands
make test # all tests (Docker)
make test-unit
make test-integration
make test-e2e
make test-coverage
go test -v ./path/to/package -run TestName
go test -race ./path/to/package/...
LOG_LEVEL=debug go test -v ./path/to/package
FETCH_ML_E2E_PODMAN=1 go test ./tests/e2e/...
Lint / Security
make lint
make security-scan
make configlint
make openapi-validate
go vet ./...
cd cli && zig fmt .
Legacy Go — modernize when touching existing code only
| Legacy | Modern |
|---|---|
| interface{} | any |
| for i := 0; i < n; i++ | for i := range items |
| []byte(fmt.Sprintf(...)) | fmt.Appendf(nil, ...) |
| sort.Slice with closure | slices.Sort(x) |
| Manual contains loop | slices.Contains |
Dependencies
- Go 1.25+, Zig 0.15+, Python 3.11+
- Redis (integration tests), Docker/Podman (container tests)
Known Limitations
See docs/known-limitations.md for full details.
Key items:
- AMD GPU: Not implemented. Use NVIDIA, Apple Silicon, or CPU. Mock available for testing.
- 100+ node gang allocation: Stress testing not yet implemented.
- Podman-in-Docker CI: Requires privileged mode, not yet automated.
Error Handling:
// For unimplemented features:
return apierrors.NewNotImplemented("feature name")
// Validation:
if err := detectionResult.Validate(); err != nil {
return err // Clear error message for user
}
Container Rule Reminder:
- Docker = testing & deployments
- Podman = experiment isolation only
Error Codes
| Code | HTTP Status | Use Case |
|---|---|---|
| NOT_IMPLEMENTED | 501 | Feature planned but not available |
| NOT_FOUND | 404 | Resource doesn't exist |
| INVALID_CONFIGURATION | 400 | Bad config (e.g., AMD GPU in production) |