# Consistency Test Fixtures

This directory contains canonical test fixtures for cross-implementation consistency testing. Each implementation (native C++, Go, Zig) must produce identical outputs for these fixtures.

## Algorithm Specification

### Dataset Hash Algorithm v1

1. Recursively collect all regular files (not symlinks, not directories)
2. Skip hidden files (names starting with '.')
3. Sort file paths lexicographically (full relative paths)
4. For each file:
   - Compute SHA256 of file contents
   - Convert to lowercase hex (64 chars)
5. Combine: SHA256(concatenation of all file hashes in sorted order)
6. Return lowercase hex (64 chars)

**Empty directory**: Returns SHA256 of empty string: `e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855`

### Directory Structure

```
dataset_hash/
├── 01_empty_dir/          # Empty directory
├── 02_single_file/        # One file with "hello world"
├── 03_nested/             # Nested directories
├── 04_special_chars/      # Files with spaces and unicode
└── expected_hashes.json   # All expected outputs
```

## Adding New Fixtures

1. Create directory with `input/` subdirectory
2. Add files to `input/`
3. Compute expected hash using reference implementation
4. Add entry to `expected_hashes.json`
5. Document any special considerations in `README.md`
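
For orientation, the six steps of Dataset Hash Algorithm v1 can be sketched in Python. This is a non-normative illustration, not one of the reference implementations (C++, Go, Zig), and the function name `dataset_hash` is hypothetical. It rests on several assumptions the specification does not spell out: only file names are tested for a leading `.` (files inside hidden directories are still collected), relative paths use `/` as the separator, sorting compares the UTF-8 bytes of each path, and the values concatenated in step 5 are the 64-character hex strings rather than raw digest bytes.

```python
import hashlib
import os


def dataset_hash(root: str) -> str:
    """Sketch of Dataset Hash Algorithm v1 under the assumptions stated above."""
    rel_paths = []
    # Step 1: walk the tree without following directory symlinks.
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            # Step 2: skip hidden files (assumption: file name only).
            if name.startswith("."):
                continue
            full = os.path.join(dirpath, name)
            # Keep regular files only; symlinks to files are excluded.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            rel_paths.append(rel)

    # Step 3: sort full relative paths (assumption: by UTF-8 bytes).
    rel_paths.sort(key=lambda p: p.encode("utf-8", "surrogateescape"))

    combined = hashlib.sha256()
    for rel in rel_paths:
        # Step 4: SHA256 of the file contents as lowercase hex (64 chars).
        with open(os.path.join(root, *rel.split("/")), "rb") as fh:
            file_hex = hashlib.sha256(fh.read()).hexdigest()
        # Step 5: feed the hex strings in sorted order (assumption: ASCII hex).
        combined.update(file_hex.encode("ascii"))

    # Step 6: lowercase hex of the combined digest. With no files collected,
    # this is the SHA256 of the empty string.
    return combined.hexdigest()
```

Calling `dataset_hash` on an empty directory yields the empty-string digest quoted in the specification. If any assumption above differs from the reference implementation, the outputs for `03_nested/` and `04_special_chars/` are the most likely to diverge.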