# Consistency Test Fixtures
This directory contains canonical test fixtures for cross-implementation consistency testing.
Each implementation (native C++, Go, Zig) must produce identical outputs for these fixtures.
## Algorithm Specification

### Dataset Hash Algorithm v1
1. Recursively collect all regular files (not symlinks, not directories).
2. Skip hidden files (names starting with `.`).
3. Sort the file paths lexicographically (full relative paths).
4. For each file, compute the SHA256 of its contents and encode it as lowercase hex (64 characters).
5. Combine: compute the SHA256 of the concatenation of all per-file hashes, in sorted order.
6. Return the result as lowercase hex (64 characters).
Empty directory: returns the SHA256 of the empty string:

```
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
```
## Directory Structure
```
dataset_hash/
├── 01_empty_dir/          # Empty directory
├── 02_single_file/        # One file with "hello world"
├── 03_nested/             # Nested directories
├── 04_special_chars/      # Files with spaces and unicode
└── expected_hashes.json   # All expected outputs
```
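The schema of `expected_hashes.json` is not specified above; a minimal sketch, assuming it simply maps each fixture directory name to its final hex digest (the field layout is an assumption):

```json
{
  "01_empty_dir": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
}
```

One entry per fixture directory; the value shown for the empty directory is the empty-string SHA256 stated in the specification above.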
## Adding New Fixtures
1. Create a directory with an `input/` subdirectory.
2. Add files to `input/`.
3. Compute the expected hash using the reference implementation.
4. Add an entry to `expected_hashes.json`.
5. Document any special considerations in `README.md`.