1. Introduction
This document describes how to use the reconciii command-line tool to compare two CSV files and generate a report of missing records and field-level discrepancies between them. The tool is useful for validating data, finding inconsistencies in data pipelines, and checking data consistency across multiple sources.
2. Prerequisites
- Python 3.6+
- click package installed (pip install click)
- Pandas library installed (pip install pandas)
- Levenshtein library installed (pip install Levenshtein)
2.1 Installing
pip3 install reconciii
3. Running the Tool
3.1 Required arguments:
- -s or --source: Path to the source CSV file.
- -t or --target: Path to the target CSV file.
- -o or --output: Path to the output CSV file containing the reconciliation report (default: "reconciliation_report.csv").
3.2 Optional arguments:
- -c or --comparison-columns: Comma-separated list of columns to compare for discrepancies (default: all columns except "ID").
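The default described for -c can be illustrated with a short sketch. This is not the tool's actual code: the function name resolve_comparison_columns and its parameters are hypothetical, and it assumes the option value arrives as a single comma-separated string, as in the example usage.

```python
import csv
import io


def resolve_comparison_columns(header, requested=None, id_column="ID"):
    """Return the list of columns to compare (hypothetical helper).

    header:    column names read from the CSV file
    requested: raw comma-separated string as it might be passed to -c,
               or None when the option is omitted
    """
    if requested:
        columns = [name.strip() for name in requested.split(",") if name.strip()]
        unknown = [name for name in columns if name not in header]
        if unknown:
            raise ValueError(f"columns not found in header: {unknown}")
        return columns
    # Documented default: every column except the identifier column.
    return [name for name in header if name != id_column]


# Read only the header row of a small in-memory CSV file.
header = next(csv.reader(io.StringIO("ID,name,email\n1,Ann,ann@example.com\n")))

print(resolve_comparison_columns(header))           # ['name', 'email']
print(resolve_comparison_columns(header, "email"))  # ['email']
```

Rejecting unknown column names early is a design choice of this sketch; the tool itself may handle them differently.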
3.3 Example usage:
reconcile -s data/source.csv -t data/target.csv -o report.csv -c column1,column2
This command compares the source file data/source.csv with the target file data/target.csv and writes a reconciliation report named report.csv. It checks columns column1 and column2 for discrepancies, in addition to the default "ID" column.
4. Output Report
The generated report will be a CSV file with the following columns:
- Type: Indicates the type of discrepancy (missing in source, missing in target, field discrepancy).
- Record Identifier: ID of the record where the discrepancy was found.
- Field: (optional) Specific field where the discrepancy was found (applicable for field discrepancies).
- Source Value: Value of the field in the source file.
- Target Value: Value of the field in the target file.
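To make the report layout concrete, the following is a minimal sketch of how rows with these five columns could be produced. It is an illustration under stated assumptions, not the tool's implementation: it uses only the standard csv module, matches records on a column named "ID", and compares values as exact strings, ignoring any fuzzy matching the tool may perform. The function names load_by_id and reconcile are hypothetical.

```python
import csv
import io
import sys

REPORT_COLUMNS = ["Type", "Record Identifier", "Field", "Source Value", "Target Value"]


def load_by_id(handle, id_column="ID"):
    """Index the rows of a CSV file by their identifier column."""
    return {row[id_column]: row for row in csv.DictReader(handle)}


def reconcile(source, target, columns):
    """Build report rows from two {id: row} mappings (exact-match sketch)."""
    rows = []
    # Records present on one side only.
    for record_id in sorted(source.keys() - target.keys()):
        rows.append(["missing in target", record_id, "", "", ""])
    for record_id in sorted(target.keys() - source.keys()):
        rows.append(["missing in source", record_id, "", "", ""])
    # Records present on both sides: compare the selected columns.
    for record_id in sorted(source.keys() & target.keys()):
        for column in columns:
            source_value = source[record_id][column]
            target_value = target[record_id][column]
            if source_value != target_value:
                rows.append(["field discrepancy", record_id, column,
                             source_value, target_value])
    return rows


# Small in-memory stand-ins for the source and target files.
source = load_by_id(io.StringIO("ID,name\n1,Ann\n2,Bob\n3,Cy\n"))
target = load_by_id(io.StringIO("ID,name\n1,Ann\n2,Rob\n4,Dee\n"))

rows = reconcile(source, target, columns=["name"])

writer = csv.writer(sys.stdout)
writer.writerow(REPORT_COLUMNS)
writer.writerows(rows)
```

With this sample data the report contains one row of each type: record 3 is missing in the target, record 4 is missing in the source, and record 2 has a discrepancy in the name field (Bob versus Rob).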