fetch_ml/native/dataset_hash/crypto/sha256_x86.cpp
Jeremie Fraeys 7efe8bbfbf
native: security hardening, research trustworthiness, and CVE mitigations
Security Fixes:
- CVE-2024-45339: Add O_EXCL flag to temp file creation in storage_write_entries()
  Prevents symlink attacks on predictable .tmp file paths
- CVE-2025-47290: Use openat_nofollow() in storage_open()
  Closes TOCTOU race condition via path_sanitizer infrastructure
- CVE-2025-0838: Add MAX_BATCH_SIZE=10000 to add_tasks()
  Prevents integer overflow in batch operations
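
The O_EXCL pattern named in the CVE-2024-45339 entry can be sketched as follows, assuming a POSIX system. `create_temp_exclusive` is a hypothetical helper written for illustration; it is not the project's storage_write_entries().

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <string>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper (illustration only; not the project's storage_write_entries()).
// Creates `path` for writing only if nothing exists there yet.
// With O_CREAT | O_EXCL, open() fails with EEXIST when the path already exists,
// and POSIX specifies the same failure when the path is a symbolic link,
// so a symlink planted at a predictable .tmp path is never followed.
// Returns a file descriptor, or -1 with errno set.
int create_temp_exclusive(const std::string& path) {
    assert(!path.empty());
    return ::open(path.c_str(),
                  O_WRONLY | O_CREAT | O_EXCL | O_CLOEXEC,
                  S_IRUSR | S_IWUSR);
}
```

A caller would treat a return value of -1 with errno == EEXIST as "someone else owns this path" and abort the write rather than retry on the same name.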

Research Trustworthiness (dataset_hash):
- Deterministic file ordering: std::sort after collect_files()
- Recursive directory traversal: depth-limited with cycle detection
- Documented exclusions: hidden files and special files noted in API
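
A rough sketch of the ordering and traversal rules above, assuming C++17 std::filesystem. `collect_files_sorted` and `touch` are hypothetical names, not the project's collect_files(). The sketch avoids cycles by not following directory symlinks, which is a simpler stand-in for the explicit cycle detection the commit describes.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper (illustration only; not the project's collect_files()).
// Lists regular files under `root` as sorted relative paths.
//  - max_depth: how many directory levels below `root` may be entered (0 = none).
//  - Hidden entries (names starting with '.') and non-regular files are skipped.
//  - Directory symlinks are not followed (the iterator's default behavior).
std::vector<std::string> collect_files_sorted(const fs::path& root, int max_depth) {
    assert(max_depth >= 0);
    std::vector<std::string> files;
    std::error_code ec;
    fs::recursive_directory_iterator it(root, ec);
    const fs::recursive_directory_iterator end;
    for (; !ec && it != end; it.increment(ec)) {
        const std::string name = it->path().filename().string();
        const bool hidden = !name.empty() && name[0] == '.';
        if (hidden || it.depth() >= max_depth) {
            it.disable_recursion_pending();  // do not descend into this entry
        }
        if (hidden) continue;
        std::error_code status_ec;
        const fs::file_status st = it->symlink_status(status_ec);
        if (!status_ec && fs::is_regular_file(st)) {
            files.push_back(it->path().lexically_relative(root).generic_string());
        }
    }
    // Directory iteration order is unspecified, so sort before hashing.
    std::sort(files.begin(), files.end());
    return files;
}

// Small helper for trying the function out: creates a one-byte file.
void touch(const fs::path& p) {
    std::ofstream out(p);
    out << "x";
}
```

Sorting the relative paths (rather than absolute ones) keeps the order, and therefore any hash computed over it, independent of where the dataset directory lives on disk.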

Bug Fixes:
- R1: storage_init path validation for non-existent directories
- R2: safe_strncpy return value check before strcat
- R3: parallel_hash 256-file cap replaced with std::vector
- R4: wire qi_compact_index/qi_rebuild_index stubs
- R5: CompletionLatch race condition fix (hold mutex during decrement)
- R6: ARMv8 SHA256 transform fix (save abcd_pre before vsha256hq_u32)
- R7: fuzz_index_storage header format fix
- R8: enforce null termination in add_tasks/update_tasks
- R9: use 64 bytes (not 65) in combined hash to exclude null terminator
- R10: status field persistence in save()
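
For R5, the idea of holding the mutex during the decrement can be sketched as below. `CompletionLatchSketch` is a hypothetical class; the project's CompletionLatch may be structured differently. If the counter were decremented outside the lock, a waiter could test the predicate, see a non-zero count, and then miss the notification that arrives before it blocks.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical latch (illustration only; not the project's CompletionLatch).
class CompletionLatchSketch {
public:
    explicit CompletionLatchSketch(std::size_t count) : remaining_(count) {}

    // Decrement and notify under the same mutex the waiter uses, so the
    // waiter cannot observe a stale count between its check and its sleep.
    void count_down() {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(remaining_ > 0);
        if (--remaining_ == 0) {
            cv_.notify_all();
        }
    }

    // Blocks until the count reaches zero; returns immediately if it already has.
    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return remaining_ == 0; });
    }

    std::size_t remaining() {
        std::lock_guard<std::mutex> lock(mutex_);
        return remaining_;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::size_t remaining_;
};
```

In a worker pool, each task would call count_down() when it finishes and the submitting thread would call wait() once after enqueueing all tasks.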

New Tests:
- test_recursive_dataset.cpp: Verify deterministic recursive hashing
- test_storage_symlink_resistance.cpp: Verify CVE-2024-45339 fix
- test_queue_index_batch_limit.cpp: Verify CVE-2025-0838 fix
- test_sha256_arm_kat.cpp: ARMv8 known-answer tests
- test_storage_init_new_dir.cpp: R1 verification
- test_parallel_hash_large_dir.cpp: R3 verification
- test_queue_index_compact.cpp: R4 verification

All 8 native tests passing. Library ready for research lab deployment.
2026-02-21 13:33:45 -05:00

47 lines
1.5 KiB
C++

#include "sha256_base.h"
// Intel SHA-NI (SHA Extensions) implementation
#if defined(__x86_64__) || defined(_M_X64)
#if defined(__GNUC__) || defined(__clang__)
#include <cpuid.h>      // GCC/Clang only; MSVC provides __cpuidex in <intrin.h>
#endif
#include <immintrin.h>  // SHA-NI intrinsics

// TODO: Full SHA-NI implementation using:
//   _mm_sha256msg1_epu32, _mm_sha256msg2_epu32 for the message schedule
//   _mm_sha256rnds2_epu32 for the rounds
// Until then, transform_sha_ni() only forwards to the generic transform.
static void transform_sha_ni(uint32_t* state, const uint8_t* data) {
    // Placeholder: no SHA-NI instructions are used yet.
    transform_generic(state, data);
}
TransformFunc detect_x86_transform(void) {
    // Return nullptr until a real SHA-NI implementation exists.
    // The placeholder transform_sha_ni() only calls transform_generic(),
    // so selecting it would report "SHA-NI" while running the generic code.
    // The intrinsics needed are listed in the TODO above transform_sha_ni().
    (void)transform_sha_ni;  // Suppress unused function warning
    return nullptr;

    /* Enable once transform_sha_ni() is real:
    unsigned int eax, ebx, ecx, edx;
    // Leaf 7 is indexed by the subleaf in ECX, so pass subleaf 0 explicitly.
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        if (ebx & (1u << 29)) {  // CPUID.(EAX=7,ECX=0):EBX bit 29 = SHA
            return transform_sha_ni;
        }
    }
    return nullptr;
    */
}
#else // No x86 support
TransformFunc detect_x86_transform(void) { return nullptr; }
#endif
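
The runtime check that the disabled detection path relies on can be tried in isolation. This is a minimal sketch, assuming GCC or Clang on x86; `cpu_has_sha_ni` is a hypothetical name and is not part of this file. A complete dispatcher would likely also confirm the SSE levels that its intrinsics depend on.

```cpp
#include <cassert>

#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>

// Hypothetical helper (illustration only).
// Returns true when CPUID.(EAX=7,ECX=0):EBX bit 29 (SHA extensions) is set.
bool cpu_has_sha_ni() {
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    // __get_cpuid_count() returns 0 when leaf 7 is not supported by the CPU.
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        return false;
    }
    return (ebx & (1u << 29)) != 0;
}
#else
// Non-x86 targets or other compilers: report no SHA-NI support.
bool cpu_has_sha_ni() { return false; }
#endif
```

Keeping the feature check separate from the transform selection makes it possible to log which path was chosen without claiming SHA-NI while the generic code is running.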