A high-performance command-line tool for comparing folders and generating detailed reports on file differences. This tool supports cross-platform operation (Windows, Linux, and macOS) and can efficiently handle large volumes of files and large file sizes.
- Compare files in two folders for content differences
- Detect and report three types of inconsistencies:
  - Files with different content
  - Missing files (present in folder1 but not in folder2)
  - Extra files (present in folder2 but not in folder1)
- Cross-platform support (Windows, Linux, and macOS)
- Uses a chunked hash comparison algorithm to handle large files efficiently (up to tens of GB)
- Multi-process parallel processing for improved performance
- File and folder ignore patterns
- Detailed reports in both text and HTML formats with GitHub and PyPI links
- Interactive progress bar with ETA display
- Handles tens or hundreds of thousands of files efficiently
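
The three inconsistency types listed above map naturally onto set operations over relative file paths. The following is a minimal sketch of that idea, not the tool's actual implementation; the function names `list_relative_files` and `classify` are hypothetical and chosen only for illustration.

```python
import os


def list_relative_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            found.add(os.path.relpath(full_path, root))
    return found


def classify(folder1, folder2):
    """Split relative paths into missing, extra, and common sets."""
    files1 = list_relative_files(folder1)
    files2 = list_relative_files(folder2)
    missing = files1 - files2  # present in folder1 but not in folder2
    extra = files2 - files1    # present in folder2 but not in folder1
    common = files1 & files2   # candidates for a content comparison
    return missing, extra, common
```

Only the paths in `common` need their contents compared, which keeps the expensive work limited to files that exist on both sides.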

Install from PyPI:

    pip install hpfc-tool

Note: The package name is `hpfc-tool` and the command-line tool is `hpfc`.

Install from source:

    git clone https://github.com/ethan-li/hpfc.git
    cd hpfc
    pip install .

Basic usage:

    hpfc folder1 folder2

Usage with options:

    hpfc folder1 folder2 [options]

Options:

- `-c`, `--chunk-size`: Chunk size in bytes for comparing large files (default: 8MB)
- `-w`, `--workers`: Number of worker processes for parallel processing (default: CPU count)
- `-i`, `--ignore`: Patterns to ignore (can specify multiple)
- `-o`, `--output`: Save report to specified file (default: console output)
- `--html`: Generate an HTML report instead of text
- `--no-progress`: Disable progress bar display
- `-v`, `--version`: Show version information

Compare two folders:

    hpfc /path/to/folder1 /path/to/folder2

Ignore specific files or folders:

    hpfc /path/to/folder1 /path/to/folder2 --ignore ".git" "*.log" "temp"

Adjust chunk size for large file comparison:

    hpfc /path/to/folder1 /path/to/folder2 --chunk-size 16777216 # 16MB

Specify number of worker processes:

    hpfc /path/to/folder1 /path/to/folder2 --workers 4

Save report to file:

    hpfc /path/to/folder1 /path/to/folder2 --output report.txt

Generate HTML report:

    hpfc /path/to/folder1 /path/to/folder2 --html --output report.html

Exit codes:

- `0`: All files are identical
- `1`: There are different files, missing files, extra files, or error files
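
Because the command reports its result through the exit status (0 when all files are identical, 1 otherwise), a script can act on the comparison without parsing the report. The sketch below assumes only that behaviour; the helper name `folders_identical` is hypothetical, and any non-zero status is treated as "not identical".

```python
import subprocess


def folders_identical(command):
    """Run a comparison command and map its exit code to a boolean.

    Returns True only for exit code 0; every other code counts as a mismatch.
    """
    result = subprocess.run(command)
    return result.returncode == 0


# Hypothetical usage, assuming hpfc is installed and on PATH:
# identical = folders_identical(
#     ["hpfc", "/path/to/folder1", "/path/to/folder2", "--no-progress"]
# )
```

Passing the command as a list avoids shell quoting problems with paths that contain spaces.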

Performance notes:

- For large files (larger than 8MB), the tool compares chunked hashes instead of loading whole files, so memory use stays bounded
- For small files, direct content comparison is used, which is generally faster than hash calculation
- The tool uses multi-process parallel processing for file comparison to utilize multi-core CPUs
- Cheapest check first: files are compared by size, and contents are compared only when the sizes match
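
The comparison order described above (size check, then direct comparison for small files, then chunked hashing for large ones) can be sketched as follows. This is an illustration under stated assumptions rather than the tool's code: SHA-256 is an assumed hash function, the large-file threshold is assumed to equal the chunk size, and the names `hash_in_chunks` and `files_match` are hypothetical.

```python
import hashlib
import os

CHUNK_SIZE = 8 * 1024 * 1024  # 8MB, matching the documented default chunk size


def hash_in_chunks(path, chunk_size=CHUNK_SIZE):
    """Hash a file chunk by chunk so it is never fully loaded into memory."""
    digest = hashlib.sha256()  # assumption: the README does not name the hash
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_match(path1, path2, chunk_size=CHUNK_SIZE):
    """Return True if both files have identical content."""
    size1 = os.path.getsize(path1)
    size2 = os.path.getsize(path2)
    if size1 != size2:
        return False  # different sizes can never have equal content
    if size1 <= chunk_size:
        # Small files: read both fully and compare the bytes directly.
        with open(path1, "rb") as f1, open(path2, "rb") as f2:
            return f1.read() == f2.read()
    # Large files: compare digests computed one chunk at a time.
    return hash_in_chunks(path1, chunk_size) == hash_in_chunks(path2, chunk_size)
```

The size check costs a single metadata lookup per file, so most differing files are rejected before any content is read.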

Run the tests:

    python -m unittest discover -s tests

To skip large file tests (which may take some time):

    SKIP_LARGE_FILE_TEST=1 python -m unittest discover -s tests

Project structure:

    hpfc-tool/
    ├── src/
    │   └── hpfc/
    │       ├── __init__.py    # Package initialization
    │       ├── core.py        # Core comparison functionality
    │       └── cli.py         # Command-line interface
    ├── tests/
    │   └── test_hpfc.py       # Test cases
    ├── setup.py               # Package setup
    └── README.md              # Documentation

Notes:

- This tool is designed specifically for large numbers of files and for very large files
- Binary and text files are compared in the same way (exact content comparison)
- Ignore patterns use simple substring matching, not wildcards or regular expressions, so characters such as `*` are matched literally
- HTML reports provide an interactive and visual representation of comparison results
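
To make the substring-matching note concrete, here is a minimal sketch of how such a check behaves. The function name `is_ignored` is hypothetical, and matching against the relative path (rather than only the file name) is an assumption.

```python
def is_ignored(relative_path, patterns):
    """Return True if any pattern occurs as a plain substring of the path."""
    return any(pattern in relative_path for pattern in patterns)


# ".git" matches because it appears verbatim in the path.
print(is_ignored(".git/config", [".git"]))
# "*.log" is not expanded as a wildcard, so it only matches a literal "*.log".
print(is_ignored("logs/app.log", ["*.log"]))
# "temp" also matches unrelated names that merely contain those letters.
print(is_ignored("attempt.txt", ["temp"]))
```

Under this reading, a pattern such as ".log" would be needed to skip log files, and short patterns can match more paths than intended.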

Authors:

- Ethan Li - Initial work and maintenance