fetch_ml/native/tests/fuzz/fuzz_index_storage.cpp
Jeremie Fraeys 7efe8bbfbf
native: security hardening, research trustworthiness, and CVE mitigations
Security Fixes:
- CVE-2024-45339: Add O_EXCL flag to temp file creation in storage_write_entries()
  Prevents symlink attacks on predictable .tmp file paths
- CVE-2025-47290: Use openat_nofollow() in storage_open()
  Closes TOCTOU race condition via path_sanitizer infrastructure
- CVE-2025-0838: Add MAX_BATCH_SIZE=10000 to add_tasks()
  Prevents integer overflow in batch operations
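The O_EXCL change for CVE-2024-45339 follows a common pattern: create the temporary file exclusively, write and flush it, then rename() it over the destination. The sketch below is a minimal illustration under assumptions, not the library's storage_write_entries(). The helper name `write_file_atomically` and the `<name>.tmp` naming scheme are hypothetical.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdio>   // std::rename
#include <string>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper (not the library's storage_write_entries()): replace
// dir/name with `len` bytes from `buf` via a temporary file.
bool write_file_atomically(const std::string& dir, const std::string& name,
                           const void* buf, std::size_t len) {
    const std::string dst = dir + "/" + name;
    const std::string tmp = dst + ".tmp";

    // O_CREAT | O_EXCL fails with EEXIST if anything already exists at `tmp`,
    // including a symlink planted by an attacker; the link is never followed.
    int fd = ::open(tmp.c_str(), O_WRONLY | O_CREAT | O_EXCL | O_CLOEXEC, 0640);
    if (fd < 0) return false;  // do not touch a file we did not create

    const char* p = static_cast<const char*>(buf);
    std::size_t left = len;
    bool ok = true;
    while (ok && left > 0) {
        ssize_t n = ::write(fd, p, left);
        if (n < 0) {
            if (errno != EINTR) ok = false;  // retry only on EINTR
            continue;
        }
        p += n;
        left -= static_cast<std::size_t>(n);
    }
    if (ok && ::fsync(fd) != 0) ok = false;  // flush data before publishing it
    if (::close(fd) != 0) ok = false;

    // rename() replaces dst atomically: readers see the old or the new file.
    if (ok && std::rename(tmp.c_str(), dst.c_str()) != 0) ok = false;
    if (!ok) ::unlink(tmp.c_str());  // only reached after we created tmp ourselves
    return ok;
}
```

Because this sketch fails closed, a stale `.tmp` left behind by a crash also blocks later writes until it is removed; how the real storage layer handles that case is not shown here.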

Research Trustworthiness (dataset_hash):
- Deterministic file ordering: std::sort after collect_files()
- Recursive directory traversal: depth-limited with cycle detection
- Documented exclusions: hidden files and special files noted in API

Bug Fixes:
- R1: storage_init path validation for non-existent directories
- R2: safe_strncpy return value check before strcat
- R3: parallel_hash 256-file cap replaced with std::vector
- R4: wire qi_compact_index/qi_rebuild_index stubs
- R5: CompletionLatch race condition fix (hold mutex during decrement)
- R6: ARMv8 SHA256 transform fix (save abcd_pre before vsha256hq_u32)
- R7: fuzz_index_storage header format fix
- R8: enforce null termination in add_tasks/update_tasks
- R9: use 64 bytes (not 65) in combined hash to exclude null terminator
- R10: status field persistence in save()
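Holding the mutex while decrementing (R5) guards against a typical latch race, the lost wakeup: if the counter is decremented outside the lock, a waiter can see a non-zero count, and the last worker can then decrement and notify before the waiter actually blocks, so the notification is missed. A minimal sketch of a latch built that way, assuming only `std::mutex` and `std::condition_variable`; `CompletionLatchSketch` and `sum_with_workers` are hypothetical names, not the library's CompletionLatch.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical latch: the decrement and the zero check happen under the same
// mutex that wait() holds while testing its predicate, so a wakeup cannot slip
// in between the waiter's check and its block.
class CompletionLatchSketch {
public:
    explicit CompletionLatchSketch(std::size_t count) : count_(count) {}

    void count_down() {
        std::lock_guard<std::mutex> lock(mu_);
        if (count_ > 0 && --count_ == 0) {
            cv_.notify_all();  // last worker wakes every waiter
        }
    }

    void wait() {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return count_ == 0; });
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::size_t count_;
};

// Usage: fan out n workers that each add their index to a shared sum, then
// block until all of them have counted down.
long sum_with_workers(int n) {
    CompletionLatchSketch latch(static_cast<std::size_t>(n));
    std::atomic<long> sum{0};
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i) {
        workers.emplace_back([&latch, &sum, i] {
            sum += i;
            latch.count_down();
        });
    }
    latch.wait();  // every worker has added its value once this returns
    long result = sum.load();
    for (auto& t : workers) t.join();
    return result;
}
```

Code that can target C++20 has `std::latch` in `<latch>`, which provides count_down() and wait() directly.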

New Tests:
- test_recursive_dataset.cpp: Verify deterministic recursive hashing
- test_storage_symlink_resistance.cpp: Verify CVE-2024-45339 fix
- test_queue_index_batch_limit.cpp: Verify CVE-2025-0838 fix
- test_sha256_arm_kat.cpp: ARMv8 known-answer tests
- test_storage_init_new_dir.cpp: R1 verification
- test_parallel_hash_large_dir.cpp: R3 verification
- test_queue_index_compact.cpp: R4 verification
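The batch limit added for CVE-2025-0838, which test_queue_index_batch_limit.cpp exercises, amounts to validating the element count before any size arithmetic or allocation. A minimal sketch under assumptions: only the limit of 10000 comes from the commit message; `TaskRecordSketch` and `checked_batch_bytes` are hypothetical stand-ins for the real task record and add_tasks() logic.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// MAX_BATCH_SIZE mirrors the commit message; the record layout is hypothetical.
constexpr std::size_t MAX_BATCH_SIZE = 10000;

struct TaskRecordSketch {
    char id[64];
    char status[16];
    std::int64_t priority;
};

// Validates a batch before any allocation or copy. On success stores the
// batch's byte size in *out_bytes; on failure leaves it untouched.
bool checked_batch_bytes(std::size_t count, std::size_t* out_bytes) {
    if (count > MAX_BATCH_SIZE) return false;  // reject oversized batches up front
    // Redundant while the cap is small, but keeps the multiplication from
    // overflowing if the cap is ever raised.
    if (count > std::numeric_limits<std::size_t>::max() / sizeof(TaskRecordSketch)) {
        return false;
    }
    *out_bytes = count * sizeof(TaskRecordSketch);
    return true;
}
```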

All 8 native tests passing. Library ready for research lab deployment.
2026-02-21 13:33:45 -05:00


// fuzz_index_storage.cpp - libFuzzer harness for index storage
// Tests parsing of arbitrary index.bin content
#include <cerrno>
#include <cstdint>
#include <cstddef>
#include <cstdio>
#include <cstdlib>   // mkdtemp
#include <cstring>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>

// Include the storage implementation
#include "../../queue_index/storage/index_storage.h"

// Write all `len` bytes, retrying on EINTR and short writes.
static bool write_all(int fd, const void* buf, size_t len) {
    const uint8_t* p = static_cast<const uint8_t*>(buf);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            return false;
        }
        p += n;
        len -= static_cast<size_t>(n);
    }
    return true;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
    // Create a fresh temporary directory for this input
    char tmpdir[] = "/tmp/fuzz_idx_XXXXXX";
    if (!mkdtemp(tmpdir)) {
        return 0;
    }

    // Write fuzz data as index.bin
    char path[256];
    snprintf(path, sizeof(path), "%s/index.bin", tmpdir);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0640);
    if (fd < 0) {
        rmdir(tmpdir);
        return 0;
    }

    bool ok = true;
    // Inputs shorter than a full header get a minimal valid header
    // prepended so the parser gets past the header check.
    if (size < sizeof(FileHeader)) {
        FileHeader header{};
        memcpy(header.magic, "FQI1", 4);
        header.version = CURRENT_VERSION;
        header.entry_count = 0;
        memset(header.reserved, 0, sizeof(header.reserved));
        memset(header.padding, 0, sizeof(header.padding));
        ok = write_all(fd, &header, sizeof(header));
    }
    if (ok && size > 0) {
        ok = write_all(fd, data, size);
    }
    close(fd);

    // Try to open and read the storage (skipped if the file was not fully written)
    if (ok) {
        IndexStorage storage;
        if (storage_init(&storage, tmpdir)) {
            if (storage_open(&storage)) {
                // Reading entries is where parsing bugs could be triggered
                DiskEntry entries[16];
                size_t count = 0;
                storage_read_entries(&storage, entries, 16, &count);
                storage_close(&storage);
            }
            storage_cleanup(&storage);
        }
    }

    // Cleanup
    unlink(path);
    rmdir(tmpdir);
    return 0;  // Non-crashing input
}