Samples for common OCR file formats (hOCR, ABBYY, ALTO)
This repository contains sample files for common OCR file formats in a standardized directory layout, so that tools can be tested against them more easily.
This is a sister project to ocr-fileformat, which contains code for format validation and inter-format transformation.
Samples are placed in ./samples/<format>/<format-version>/<sample>.
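As a rough illustration of how a tool might consume this layout, the following Python sketch walks the three directory levels and yields one record per sample. The names `Sample` and `iter_samples` are hypothetical, and the sketch assumes that any file ending in `.yml` is metadata rather than a sample.

```python
from pathlib import Path
from typing import Iterator, NamedTuple


class Sample(NamedTuple):
    fmt: str       # the <format> directory name
    version: str   # the <format-version> directory name
    path: Path     # path to the sample file itself


def iter_samples(root: Path) -> Iterator[Sample]:
    """Yield every sample found under <root>/<format>/<format-version>/."""
    for path in sorted(root.glob("*/*/*")):
        # Assumption: files ending in .yml are metadata, not samples.
        if not path.is_file() or path.suffix == ".yml":
            continue
        version_dir = path.parent
        yield Sample(version_dir.parent.name, version_dir.name, path)
```

Calling `iter_samples(Path("./samples"))` from the repository root would then enumerate the samples for all formats and versions.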
Metadata about a sample is stored in ./samples/<format>/<format-version>/<sample>.yml, a YAML document with the following keys:
engine: Engine that created this document
page-side: left or right
view-url: URL to the landing page
page-number: Numeric page number
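A minimal Python sketch of reading such a metadata file is shown below. It is not a YAML parser: it only handles flat `key: value` lines, which is an assumption about how the metadata files look, and a real tool would more likely use a YAML library such as PyYAML's `yaml.safe_load`. The function name `parse_metadata` is hypothetical, and the sketch treats every key as optional because the description above does not say which ones are required.

```python
# The four keys documented above.
KNOWN_KEYS = {"engine", "page-side", "view-url", "page-number"}


def parse_metadata(text: str) -> dict:
    """Parse flat `key: value` lines and check them against the documented keys."""
    meta = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and the YAML document marker.
        if not line or line.startswith("#") or line == "---":
            continue
        # Split on the first colon only, so URLs in the value stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a key: value line: {line!r}")
        key = key.strip()
        value = value.strip().strip("'\"")
        if key not in KNOWN_KEYS:
            raise ValueError(f"unknown metadata key: {key!r}")
        meta[key] = value
    if "page-side" in meta and meta["page-side"] not in ("left", "right"):
        raise ValueError("page-side must be 'left' or 'right'")
    if "page-number" in meta:
        # int() raises ValueError if the page number is not numeric.
        meta["page-number"] = int(meta["page-number"])
    return meta


# Example input with made-up values.
example = """\
engine: tesseract
page-side: left
view-url: https://example.org/page/1
page-number: 12
"""
meta = parse_metadata(example)
```

Rejecting unknown keys is a design choice of this sketch, meant to catch typos such as `page_side`; a more lenient reader could simply ignore them.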