Add Known Limitations section to AGENTS.md documenting:
- AMD GPU not implemented (use NVIDIA, Apple Silicon, or CPU)
- 100+ node gang allocation stress testing not yet implemented
- Podman-in-Docker CI requires privileged mode, not yet automated
- Error handling patterns for unimplemented features
- Container usage rules (Docker for testing/deployments, Podman for experiments)
- Error codes table (NOT_IMPLEMENTED, NOT_FOUND, INVALID_CONFIGURATION)

Update testing documentation to reflect new test locations:
- Unit tests moved from tests/unit/ to internal/ (Go convention)
- Update all test file path references in security testing docs

# AGENTS.md - FetchML

## Architecture

```
┌─────────┐     ┌─────────┐     ┌──────────┐     ┌─────────┐     ┌──────────┐
│   CLI   │────▶│   API   │────▶│ Scheduler│────▶│ Worker  │────▶│ Storage  │
│  (Zig)  │◄────│(Go/HTTP)│◄────│   (Go)   │◄────│  (Go)   │◄────│ (MinIO)  │
└─────────┘     └─────────┘     └──────────┘     └─────────┘     └──────────┘
                                     │
                                     ▼
                                ┌──────────┐
                                │  Redis   │
                                │ (Queue)  │
                                └──────────┘
```

**CLI ↔ Server**: HTTP (default) or Unix socket (local). `execution_mode` config:
`direct` (bypass scheduler) or `queue` (full flow). Auth via API key in header.

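As a rough illustration of the two `execution_mode` values and header-based auth described above, here is a minimal Go sketch. The type and function names, the `X-API-Key` header name, and the example URL are all assumptions made for illustration; none of them are taken from the FetchML code.

```go
package main

import (
	"fmt"
	"net/http"
)

// ExecutionMode mirrors the two documented values of the execution_mode setting.
type ExecutionMode string

const (
	ModeDirect ExecutionMode = "direct" // bypass the scheduler
	ModeQueue  ExecutionMode = "queue"  // full flow through the scheduler
)

// ParseExecutionMode rejects anything other than the two documented values.
func ParseExecutionMode(s string) (ExecutionMode, error) {
	switch m := ExecutionMode(s); m {
	case ModeDirect, ModeQueue:
		return m, nil
	default:
		return "", fmt.Errorf("invalid execution_mode %q: want %q or %q", s, ModeDirect, ModeQueue)
	}
}

// NewAuthedRequest builds a request that carries the API key in a header.
// The header name "X-API-Key" is an assumption, not FetchML's actual header.
func NewAuthedRequest(method, url, apiKey string) (*http.Request, error) {
	req, err := http.NewRequest(method, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("X-API-Key", apiKey)
	return req, nil
}

func main() {
	mode, err := ParseExecutionMode("queue")
	if err != nil {
		panic(err)
	}
	// Hypothetical local endpoint, used only to show request construction.
	req, err := NewAuthedRequest(http.MethodGet, "http://localhost:8080/health", "example-key")
	if err != nil {
		panic(err)
	}
	fmt.Println(mode, req.Header.Get("X-API-Key") != "")
}
```

Rejecting unknown modes at parse time keeps a typo in the config from silently falling back to one of the two flows.
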
---

## Container Architecture

**Docker** - Used for:
- CI/CD testing pipelines (`.forgejo/workflows/docker-tests.yml`)
- Application deployments (staging/production)
- Build environments

**Podman** - Used for:
- ML experiment isolation only
- Running untrusted/3rd party ML workloads
- Rootless container execution for security

**Rule**: Never use Podman for CI testing or deployments. Never use Docker for experiment isolation.

---

## Critical Invariants

### Audit Log — never break these

- **Append-only** — entries are never modified or deleted
- **Hash chain** — every entry includes SHA256 of the previous entry
- **All mutations** to tasks/groups/tokens must produce an audit entry
- Write the audit entry before the storage write — partial failures must be audited

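The audit invariants above can be sketched as a small in-memory hash chain. This is an illustration only: `Entry`, `Log`, `Mutate`, and the hashed fields are hypothetical, and the real audit log presumably persists entries and hashes more than an action string.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// Entry is a hypothetical audit record; the field names are illustrative.
type Entry struct {
	Action   string
	PrevHash string // hex SHA-256 of the previous entry, "" for the first
}

// Hash returns the hex SHA-256 over the previous hash and the action.
func (e Entry) Hash() string {
	sum := sha256.Sum256([]byte(e.PrevHash + "\n" + e.Action))
	return hex.EncodeToString(sum[:])
}

// Log exposes only Append, so callers cannot modify or delete entries through it.
type Log struct {
	entries []Entry
}

// Append links the new entry to the hash of the current tail.
func (l *Log) Append(action string) {
	prev := ""
	if n := len(l.entries); n > 0 {
		prev = l.entries[n-1].Hash()
	}
	l.entries = append(l.entries, Entry{Action: action, PrevHash: prev})
}

// Verify walks the chain and reports the first entry whose link is broken.
func (l *Log) Verify() error {
	prev := ""
	for i, e := range l.entries {
		if e.PrevHash != prev {
			return fmt.Errorf("audit chain broken at entry %d", i)
		}
		prev = e.Hash()
	}
	return nil
}

// Mutate writes the audit entry before the storage write, and records a
// second entry if the write fails, so partial failures are audited too.
func Mutate(l *Log, action string, write func() error) error {
	l.Append(action)
	if err := write(); err != nil {
		l.Append(action + ": failed")
		return err
	}
	return nil
}

func main() {
	var l Log
	_ = Mutate(&l, "task.create", func() error { return nil })
	err := Mutate(&l, "task.delete", func() error { return errors.New("disk full") })
	fmt.Println(len(l.entries), err, l.Verify())
}
```

Because each entry stores the hash of its predecessor, changing any entry other than the tail invalidates the link held by the entry after it, which `Verify` detects.
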
### Auth

- `TokenFromContext(ctx)` is the only authorised way to extract auth in handlers
- Group visibility enforced at DB query level — never filter in application code
- API keys hashed with bcrypt before storage — never log raw keys

### Storage

- All DB access through repository types in `internal/db/repository/`
- Transactions via `WithTx(ctx, db, func(tx *sql.Tx) error)` — never manage tx manually
- Migrations: additive only — new columns must be nullable or have defaults,
  never drop columns (mark deprecated, remove later)

### CGO / Native Libs

Use `-tags native_libs` when building with C++ extensions. This has broken twice —
always check build tags when touching GPU detection or native code.

---

## Build Commands

```bash
make build              # all components
make dev                # fast, no LTO
make prod               # production-optimized
make prod-with-native   # production + C++ libs
make cross-platform     # Linux/macOS/Windows

cd cli && make dev      # Zig: fast compile + format
cd cli && make prod     # Zig: release=fast, LTO
cd cli && make debug    # Zig: no optimizations
cd cli && zig build test
```

## Test Commands

```bash
make test               # all tests (Docker)
make test-unit
make test-integration
make test-e2e
make test-coverage

go test -v ./path/to/package -run TestName
go test -race ./path/to/package/...
LOG_LEVEL=debug go test -v ./path/to/package
FETCH_ML_E2E_PODMAN=1 go test ./tests/e2e/...
```

## Lint / Security

```bash
make lint
make security-scan
make configlint
make openapi-validate
go vet ./...
cd cli && zig fmt .
```

---

## Legacy Go — modernize when touching existing code only

| Legacy                     | Modern                  |
| -------------------------- | ----------------------- |
| `interface{}`              | `any`                   |
| `for i := 0; i < n; i++`   | `for i := range n`      |
| `[]byte(fmt.Sprintf(...))` | `fmt.Appendf(nil, ...)` |
| `sort.Slice` with closure  | `slices.Sort(x)`        |
| Manual contains loop       | `slices.Contains`       |

---

## Dependencies

- Go 1.25+, Zig 0.15+, Python 3.11+
- Redis (integration tests), Docker/Podman (container tests)

---

## Known Limitations

See `docs/known-limitations.md` for full details.

**Key items**:
- **AMD GPU**: Not implemented. Use NVIDIA, Apple Silicon, or CPU. Mock available for testing.
- **100+ node gang allocation**: Stress testing not yet implemented.
- **Podman-in-Docker CI**: Requires privileged mode, not yet automated.

**Error Handling**:
```go
// For unimplemented features:
return apierrors.NewNotImplemented("feature name")

// Validation:
if err := detectionResult.Validate(); err != nil {
    return err // Clear error message for user
}
```

**Container Rule Reminder**:
- Docker = testing & deployments
- Podman = experiment isolation only

---

## Error Codes

| Code | HTTP Status | Use Case |
|------|-------------|----------|
| `NOT_IMPLEMENTED` | 501 | Feature planned but not available |
| `NOT_FOUND` | 404 | Resource doesn't exist |
| `INVALID_CONFIGURATION` | 400 | Bad config (e.g., AMD GPU in production) |
