History

Jeremie Fraeys a239f3a14f test(consistency): add dataset hash consistency test suite Add cross-implementation consistency tests for dataset hash functionality: ## Test Fixtures - Single file, nested directories, and multiple file test cases - Expected hashes in JSON format for validation ## Test Infrastructure - harness.go: Common test utilities and reference implementation runner - dataset_hash_test.go: Consistency test cases comparing implementations - cmd/update.go: Tool to regenerate expected hashes from reference ## Purpose Ensures hash implementations (Go, C++, Zig) produce identical results across all supported platforms and implementations.	2026-03-05 14:41:14 -05:00
..
dataset_hash	test(consistency): add dataset hash consistency test suite	2026-03-05 14:41:14 -05:00
README.md	test(consistency): add dataset hash consistency test suite	2026-03-05 14:41:14 -05:00

test(consistency): add dataset hash consistency test suite

Add cross-implementation consistency tests for dataset hash functionality:

## Test Fixtures
- Single file, nested directories, and multiple file test cases
- Expected hashes in JSON format for validation

## Test Infrastructure
- harness.go: Common test utilities and reference implementation runner
- dataset_hash_test.go: Consistency test cases comparing implementations
- cmd/update.go: Tool to regenerate expected hashes from reference

## Purpose
Ensures hash implementations (Go, C++, Zig) produce identical results
across all supported platforms and implementations.

2026-03-05 14:41:14 -05:00

dataset_hash

test(consistency): add dataset hash consistency test suite

2026-03-05 14:41:14 -05:00

README.md

test(consistency): add dataset hash consistency test suite

2026-03-05 14:41:14 -05:00

README.md

Consistency Test Fixtures

This directory contains canonical test fixtures for cross-implementation consistency testing.

Each implementation (native C++, Go, Zig) must produce identical outputs for these fixtures.

Algorithm Specification

Dataset Hash Algorithm v1

Recursively collect all regular files (not symlinks, not directories)
Skip hidden files (names starting with '.')
Sort file paths lexicographically (full relative paths)
For each file:
- Compute SHA256 of file contents
- Convert to lowercase hex (64 chars)
Combine: SHA256(concatenation of all file hashes in sorted order)
Return lowercase hex (64 chars)

Empty directory: Returns SHA256 of empty string: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

Directory Structure

dataset_hash/
├── 01_empty_dir/         # Empty directory
├── 02_single_file/         # One file with "hello world"
├── 03_nested/              # Nested directories
├── 04_special_chars/       # Files with spaces and unicode
└── expected_hashes.json    # All expected outputs

Adding New Fixtures

Create directory with input/ subdirectory
Add files to input/
Compute expected hash using reference implementation
Add entry to expected_hashes.json
Document any special considerations in README.md