fetch_ml/docs/src/native-libraries.md
Jeremie Fraeys 6cc02b5efc
docs: add native libraries documentation and smoke tests
- Add comprehensive native-libraries.md documentation
- Add smoke-test-native.sh for testing native library builds
- Document build process, architecture, and testing strategy
2026-02-16 20:38:46 -05:00

Native C++ Libraries

FetchML includes optional C++ native libraries for performance-critical operations. They are loaded dynamically via cgo and reduce syscalls by 78% to 96%, depending on the library, compared with the pure Go implementations.

Overview

Library            Purpose                       Syscall Reduction
dataset_hash       mmap + SIMD SHA256 hashing    78%
queue_index        Binary index format           96%
artifact_scanner   Fast directory traversal      87%
streaming_io       Parallel gzip extraction      95%

Requirements

  • CMake 3.15+
  • C++17 compiler
  • zlib

Building

Development Build

make native-build

Production Optimized

make native-release  # -O3 optimized

Debug with ASan

make native-debug    # AddressSanitizer enabled

Smoke Test

make native-smoke    # C++ tests + Go integration

Enabling at Runtime

export FETCHML_NATIVE_LIBS=1
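
As a rough illustration of how a Go program could interpret this variable, the sketch below treats only the value 1 as "enabled", which matches the FETCHML_NATIVE_LIBS=1 and FETCHML_NATIVE_LIBS=0 usage elsewhere in this document. The helper name nativeLibsEnabled is hypothetical, and FetchML's actual parsing may accept other values.

```go
package main

import (
	"fmt"
	"os"
)

// nativeLibsEnabled reports whether a FETCHML_NATIVE_LIBS value requests the
// native code path. Assumption: only "1" enables it; the real parser in
// FetchML may be more permissive.
func nativeLibsEnabled(value string) bool {
	return value == "1"
}

func main() {
	if nativeLibsEnabled(os.Getenv("FETCHML_NATIVE_LIBS")) {
		fmt.Println("native libraries requested")
	} else {
		fmt.Println("using pure Go implementations")
	}
}
```

Keeping the check in a function that takes the value as a string, rather than reading the environment directly, makes it easy to test without mutating process state.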

Deployment

Ship the native libraries alongside your Go binaries:

  • Linux: lib*.so files
  • macOS: lib*.dylib files

The libraries are loaded dynamically via cgo. If not found, FetchML automatically falls back to pure Go implementations.
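
The fallback behavior can be pictured with a small Go sketch. Everything here is illustrative: hashFunc, errNativeUnavailable, nativeHash and hashWithFallback are hypothetical names, and nativeHash is a stand-in that always reports the library as missing rather than a real cgo call.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// hashFunc hashes a byte slice and returns a hex-encoded digest.
type hashFunc func(data []byte) (string, error)

// errNativeUnavailable is a hypothetical sentinel meaning the native library
// could not be used.
var errNativeUnavailable = errors.New("native library not available")

// nativeHash stands in for a cgo-backed implementation. In this sketch it
// always reports that the library is unavailable.
func nativeHash(data []byte) (string, error) {
	return "", errNativeUnavailable
}

// pureGoHash is the portable implementation based on the standard library.
func pureGoHash(data []byte) (string, error) {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:]), nil
}

// hashWithFallback tries the native path first and uses the fallback only
// when the native path reports that it is unavailable. Other errors are
// returned to the caller unchanged.
func hashWithFallback(native, fallback hashFunc, data []byte) (string, error) {
	digest, err := native(data)
	if errors.Is(err, errNativeUnavailable) {
		return fallback(data)
	}
	return digest, err
}

func main() {
	digest, err := hashWithFallback(nativeHash, pureGoHash, []byte("abc"))
	if err != nil {
		panic(err)
	}
	fmt.Println(digest)
}
```

Falling back only on the "unavailable" sentinel keeps genuine native errors visible instead of silently masking them with the slower path.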

Building with Native Support

make prod-with-native    # Copies .so/.dylib files to bin/

Architecture

Library Structure

native/
├── common/          # Shared utilities (mmap, thread pool, arena)
├── queue_index/     # Storage, heap, priority queue
└── dataset_hash/    # Crypto, I/O, threading

Security Boundaries

All native libraries implement input validation at C API boundaries:

  • Path validation: Rejects traversal sequences (..) and null bytes
  • Buffer safety: strncpy with explicit null termination
  • Mmap limits: 100MB cap prevents unbounded memory exposure
  • Atomic writes: Temp file + rename ensures data integrity
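
The checks above live in the C++ code, but two of them are easy to picture in Go. The sketch below is an assumption-laden illustration, not the libraries' implementation: validatePath and writeFileAtomic are hypothetical names, and treating ".." as a whole path component is one possible reading of "traversal sequences".

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// validatePath rejects empty paths, paths containing a null byte, and paths
// with a ".." component. Names such as "a..b" are allowed because ".." is
// only matched as a complete component.
func validatePath(p string) error {
	if p == "" {
		return errors.New("empty path")
	}
	if strings.IndexByte(p, 0) >= 0 {
		return errors.New("path contains a null byte")
	}
	for _, part := range strings.Split(filepath.ToSlash(p), "/") {
		if part == ".." {
			return errors.New("path contains a traversal sequence")
		}
	}
	return nil
}

// writeFileAtomic writes data to a temporary file in the target directory and
// renames it over the destination, so readers see either the old file or the
// complete new one.
func writeFileAtomic(path string, data []byte) error {
	if err := validatePath(path); err != nil {
		return err
	}
	tmp, err := os.CreateTemp(filepath.Dir(path), ".tmp-*")
	if err != nil {
		return err
	}
	// Clean up the temp file on failure; after a successful rename the file
	// no longer exists under this name and the error is ignored.
	defer os.Remove(tmp.Name())
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

func main() {
	dir, err := os.MkdirTemp("", "fetchml-sketch")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	target := filepath.Join(dir, "index.bin")
	if err := writeFileAtomic(target, []byte("payload")); err != nil {
		panic(err)
	}
	fmt.Println("wrote", target)
	fmt.Println("traversal check:", validatePath("../etc/passwd"))
}
```

The temporary file is created in the same directory as the destination because a rename is only atomic within a single filesystem.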

Testing

C++ Unit Tests

cd native/build && ctest --output-on-failure

Go Integration Tests

FETCHML_NATIVE_LIBS=1 go test ./tests/benchmarks/...
FETCHML_NATIVE_LIBS=1 go test ./tests/e2e/...

ASan Build

# Run from the build directory (e.g. native/build)
cmake .. -DENABLE_ASAN=ON
make
ASAN_OPTIONS=detect_leaks=1 ./test_executable

Performance Validation

Run benchmarks to verify native libraries outperform pure Go:

# Go implementation
FETCHML_NATIVE_LIBS=0 go test -bench=. ./tests/benchmarks/

# Native implementation
FETCHML_NATIVE_LIBS=1 go test -bench=. ./tests/benchmarks/
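
For orientation, a pure Go baseline measurement might look roughly like the sketch below. It is not taken from tests/benchmarks: benchmarkPureGoHash is a hypothetical helper that hashes an in-memory payload with the standard library, and it uses testing.Benchmark so it can run as an ordinary program.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"testing"
)

// sink keeps the digest live so the hashing work cannot be optimized away.
var sink [sha256.Size]byte

// benchmarkPureGoHash measures SHA-256 over an in-memory payload of the given
// size. It is a stand-in baseline, not FetchML's actual benchmark.
func benchmarkPureGoHash(size int) testing.BenchmarkResult {
	payload := make([]byte, size)
	return testing.Benchmark(func(b *testing.B) {
		b.SetBytes(int64(size)) // lets the result report throughput
		for i := 0; i < b.N; i++ {
			sink = sha256.Sum256(payload)
		}
	})
}

func main() {
	res := benchmarkPureGoHash(1 << 20) // 1 MiB payload
	fmt.Printf("%d iterations, %d ns/op\n", res.N, res.NsPerOp())
}
```

Comparing the ns/op figures from the two runs above on the same machine and payload is what shows whether the native path is actually faster.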

Troubleshooting

Library not found

Ensure the native libraries are in the library search path:

# Linux
export LD_LIBRARY_PATH=/path/to/native/build:$LD_LIBRARY_PATH

# macOS
export DYLD_LIBRARY_PATH=/path/to/native/build:$DYLD_LIBRARY_PATH
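
To confirm that the search path actually contains the libraries, a small diagnostic along these lines can help. It is a sketch under assumptions: searchPathSettings and findLibraries are hypothetical helpers, and the file patterns follow the lib*.so / lib*.dylib naming listed under Deployment.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"runtime"
)

// searchPathSettings returns the library search path variable and the shared
// library file pattern for the given GOOS value.
func searchPathSettings(goos string) (envVar, pattern string) {
	if goos == "darwin" {
		return "DYLD_LIBRARY_PATH", "lib*.dylib"
	}
	return "LD_LIBRARY_PATH", "lib*.so"
}

// findLibraries returns every file matching pattern in each directory of the
// list-separated searchPath. Empty entries and glob errors are skipped.
func findLibraries(searchPath, pattern string) []string {
	var found []string
	for _, dir := range filepath.SplitList(searchPath) {
		if dir == "" {
			continue
		}
		matches, err := filepath.Glob(filepath.Join(dir, pattern))
		if err != nil {
			continue
		}
		found = append(found, matches...)
	}
	return found
}

func main() {
	envVar, pattern := searchPathSettings(runtime.GOOS)
	libs := findLibraries(os.Getenv(envVar), pattern)
	if len(libs) == 0 {
		fmt.Printf("no %s files found via %s\n", pattern, envVar)
		return
	}
	for _, lib := range libs {
		fmt.Println(lib)
	}
}
```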

Build errors

Common issues:

  1. Missing cmake: Install with apt-get install cmake or brew install cmake
  2. Missing C++ compiler: Install build-essential (Linux) or Xcode (macOS)
  3. Missing zlib: Install zlib1g-dev (Linux); on macOS, zlib ships with the system