fetch_ml/docs/src/native-libraries.md
Jeremie Fraeys d6265df0bd
docs: update all documentation to use build tags instead of deprecated env var
- README.md: Replace FETCHML_NATIVE_LIBS with -tags native_libs
- docs/src/native-libraries.md: Update all examples to use build tags
- .forgejo/workflows/ci-native.yml: Use -tags native_libs in all test steps
- Remove deprecated FETCHML_NATIVE_LIBS=1/0 env var references
2026-02-21 15:11:27 -05:00

3.2 KiB

Native C++ Libraries

FetchML includes optional C++ native libraries for performance-critical operations. These libraries are loaded dynamically via cgo and provide significant syscall reduction compared to pure Go implementations.

Overview

Library Purpose Syscall Reduction
dataset_hash mmap + SIMD SHA256 hashing 78%
queue_index Binary index format 96%
artifact_scanner Fast directory traversal 87%
streaming_io Parallel gzip extraction 95%

Requirements

  • CMake 3.15+
  • C++17 compiler
  • zlib

Building

Development Build

make native-build

Production Optimized

make native-release  # -O3 optimized

Debug with ASan

make native-debug    # AddressSanitizer enabled

Smoke Test

make native-smoke    # C++ tests + Go integration

Enabling Native Libraries

Build with the native_libs tag:

go build -tags native_libs ./...

Or run tests with native libraries:

go test -tags native_libs ./tests/...

Deployment

Ship the native libraries alongside your Go binaries:

  • Linux: lib*.so files
  • macOS: lib*.dylib files

The libraries are loaded dynamically via cgo. If not found, FetchML automatically falls back to pure Go implementations.

Building with Native Support

make prod-with-native    # Copies .so/.dylib files to bin/

Architecture

Library Structure

native/
├── common/          # Shared utilities (mmap, thread pool, arena)
├── queue_index/     # Storage, heap, priority queue
└── dataset_hash/    # Crypto, I/O, threading

Security Boundaries

All native libraries implement input validation at C API boundaries:

  • Path validation: Rejects traversal sequences (..) and null bytes
  • Buffer safety: strncpy with explicit null termination
  • Mmap limits: 100MB cap prevents unbounded memory exposure
  • Atomic writes: Temp file + rename ensures data integrity

Testing

C++ Unit Tests

cd native/build && ctest --output-on-failure

Go Integration Tests

# Run with native libraries
go test -tags native_libs ./tests/benchmarks/...
go test -tags native_libs ./tests/e2e/...

# Run without native libraries (Go fallback)
go test ./tests/benchmarks/...

ASan Build

cmake .. -DENABLE_ASAN=ON
make
ASAN_OPTIONS=detect_leaks=1 ./test_executable

Performance Validation

Run benchmarks to verify native libraries outperform pure Go:

# Go implementation
go test -bench=. ./tests/benchmarks/

# Native implementation
go test -tags native_libs -bench=. ./tests/benchmarks/

Troubleshooting

Library not found

Ensure the native libraries are in the library search path:

# Linux
export LD_LIBRARY_PATH=/path/to/native/build:$LD_LIBRARY_PATH

# macOS
export DYLD_LIBRARY_PATH=/path/to/native/build:$DYLD_LIBRARY_PATH

Build errors

Common issues:

  1. Missing cmake: Install with apt-get install cmake or brew install cmake
  2. Missing C++ compiler: Install build-essential (Linux) or Xcode (macOS)
  3. Missing zlib: Install zlib1g-dev (Linux) or it's built-in (macOS)