A Python-based pipeline for processing images of flipcharts, performing OCR (Optical Character Recognition) using NVIDIA OCR services, optionally uploading to NVCF, annotating OCR results, and generating a combined PDF. Designed for quick processing, annotation, and archival of flipchart content.
- Load images from a directory (supports `.jpg` or custom extensions).
- Image preprocessing and optimization.
- Upload images to NVIDIA NVCF (optional).
- OCR detection with German language support.
- Annotate PDF with OCR results (bounding boxes and text).
- Combine multiple images into a single PDF.
- Command-line interface with flexible input/output options.
- Workflow monitoring with worker threads for asynchronous processing.
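
As an illustration of the first feature, here is a minimal sketch of collecting images by extension. The function name `find_images` is hypothetical and is not taken from `pipeline/file_loader.py`; the case-insensitive match and the sorted order are assumptions as well.

```python
from pathlib import Path


def find_images(directory: str, extension: str = ".jpg") -> list[Path]:
    """Return the files in `directory` whose suffix matches `extension`, sorted."""
    # Accept both "jpg" and ".jpg"; compare case-insensitively so ".JPG" also matches.
    suffix = extension if extension.startswith(".") else f".{extension}"
    return sorted(
        path
        for path in Path(directory).iterdir()
        if path.is_file() and path.suffix.lower() == suffix.lower()
    )
```

Sorting keeps the page order of the combined PDF predictable, which is why the sketch does not rely on the directory listing order.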
- Python 3.11+
- Libraries: opencv-python, numpy, requests, python-dotenv, PyPDF2>=3.0.0, reportlab, Pillow
- NVIDIA OCR API key
- Clone the repository into a `flipchart-pipeline` directory:

      git clone https://github.com/bumbleflies/protocol.git flipchart-pipeline
      cd flipchart-pipeline

- Create a virtual environment and install dependencies:

      python -m venv venv
      source venv/bin/activate   # Linux/macOS
      venv\Scripts\activate      # Windows
      pip install -r requirements.txt

- Create a `.env` file with your NVIDIA API key (optional for OCR):

      cp .env.example .env
      # Edit .env and add your NVIDIA API key

  Get your API key from: https://build.nvidia.com/explore/discover

Note: If no API key is provided, the pipeline automatically skips OCR and only combines the images into a PDF.
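
The fallback described in the note could be decided roughly as follows. This is a sketch under assumptions: the variable name `NVIDIA_API_KEY` and the function `ocr_enabled` are hypothetical (check `.env.example` for the real variable name), and the project's actual check may differ.

```python
import os
from collections.abc import Mapping

API_KEY_VAR = "NVIDIA_API_KEY"  # hypothetical name; see .env.example for the real one


def ocr_enabled(environ: Mapping[str, str] = os.environ, no_ocr: bool = False) -> bool:
    """Return True only if OCR was not disabled and a non-empty API key is set."""
    if no_ocr:
        return False
    # A missing, empty, or whitespace-only key counts as "no key".
    return bool(environ.get(API_KEY_VAR, "").strip())
```

With python-dotenv installed, calling `load_dotenv()` (from the `dotenv` package) before this check copies the values from `.env` into `os.environ`.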
Run the pipeline from the command line:

    python main.py -i /path/to/images -e .jpg -o output.pdf

| Flag | Description | Default |
|---|---|---|
| `-i`, `--input` | Input directory containing images | `.` (current directory) |
| `-e`, `--extension` | File extension to process | `.jpg` |
| `-o`, `--output` | Output PDF filename | `combined-YYYYMMDD-HHMMSS.pdf` |
| `--no-ocr` | Skip OCR step (images only) | `False` |
| `--config` | Path to YAML configuration file | `pipeline_config.yaml` |
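
For readers who want to see how these flags map onto code, the sketch below rebuilds the table with `argparse`. It is not the parser from `main.py`; the helper names `build_parser` and `parse_args` are assumptions, and only the flags and defaults come from the table.

```python
import argparse
from datetime import datetime


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Combine flipchart images into a PDF.")
    parser.add_argument("-i", "--input", default=".",
                        help="Input directory containing images")
    parser.add_argument("-e", "--extension", default=".jpg",
                        help="File extension to process")
    parser.add_argument("-o", "--output", default=None,
                        help="Output PDF filename")
    parser.add_argument("--no-ocr", action="store_true",
                        help="Skip OCR step (images only)")
    parser.add_argument("--config", default="pipeline_config.yaml",
                        help="Path to YAML configuration file")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    if args.output is None:
        # Timestamped default following the combined-YYYYMMDD-HHMMSS.pdf pattern.
        args.output = datetime.now().strftime("combined-%Y%m%d-%H%M%S.pdf")
    return args
```

The output default is resolved after parsing so that the timestamp reflects the moment of the run and not the moment the parser was built.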
- Place flipchart images in a folder (`images/`).
- Run the pipeline:

      python main.py -i images/ -o flipcharts.pdf

- The pipeline will:
  - Optimize images
  - Upload (if configured)
  - Run OCR
  - Annotate detected text
  - Produce `flipcharts.pdf` with OCR annotations

flipchart-pipeline/
├─ main.py # Entry point
├─ pipeline/ # Core workflow classes
│ ├─ file_loader.py
│ ├─ worker.py
│ └─ monitor.py
├─ tasks/ # Task modules
│ ├─ image_optimization.py
│ ├─ upload_task.py
│ ├─ ocr_task.py
│ └─ save_pdf.py
├─ .env # Environment variables (API keys)
├─ requirements.txt
└─ README.md
- All sensitive information (API keys) should be stored in `.env`.
- OCR language is set to German by default; this can be configured in `OCRTask`.
- Worker threads allow asynchronous image processing for large datasets.
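
To make the worker-thread idea concrete, here is a generic sketch built on the standard library's `queue` and `threading` modules. It does not reproduce `pipeline/worker.py`; `run_workers` and its parameters are hypothetical, and error handling is left out for brevity.

```python
import queue
import threading

_STOP = object()  # sentinel that tells a worker to exit


def run_workers(items, process, num_workers=4):
    """Apply `process` to every item on worker threads; results keep input order."""
    items = list(items)
    tasks = queue.Queue()
    results = [None] * len(items)

    def worker():
        while True:
            job = tasks.get()
            if job is _STOP:
                break
            index, item = job
            results[index] = process(item)  # each index is written by one thread only

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for job in enumerate(items):
        tasks.put(job)
    for _ in threads:
        tasks.put(_STOP)  # one sentinel per worker
    for thread in threads:
        thread.join()
    return results
```

Threads suit this workload when the per-image step is dominated by waiting on the network, as uploads and remote OCR calls typically are.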
The codebase has been refactored to follow SOLID principles! 🎉
Key improvements:
- ✅ Abstract base classes for task processors (LSP compliance)
- ✅ Provider abstraction for OCR services (DIP compliance)
- ✅ Task registry system (OCP compliance)
- ✅ Configuration-based pipeline with dependency injection
- ✅ 40 tests with 50% code coverage
- ✅ Backward compatible with existing usage
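
The first three improvements can be pictured with a small sketch. All names here (`TaskProcessor`, `OCRProvider`, `register_task`, `OcrStep`, `FakeProvider`) are hypothetical and chosen for illustration; the real class names and signatures are documented in REFACTORING_SUMMARY.md and CLAUDE.md.

```python
from abc import ABC, abstractmethod

TASK_REGISTRY: dict[str, type] = {}


def register_task(name: str):
    """Class decorator: new tasks are added without editing existing code (OCP)."""
    def decorator(cls):
        TASK_REGISTRY[name] = cls
        return cls
    return decorator


class TaskProcessor(ABC):
    """Common interface, so any task can stand in for another (LSP)."""

    @abstractmethod
    def process(self, item: dict) -> dict: ...


class OCRProvider(ABC):
    """Abstraction that the OCR step depends on instead of a concrete service (DIP)."""

    @abstractmethod
    def detect(self, image_path: str) -> list[str]: ...


class FakeProvider(OCRProvider):
    """Stand-in provider for the example; it performs no real OCR."""

    def detect(self, image_path: str) -> list[str]:
        return [f"text from {image_path}"]


@register_task("ocr")
class OcrStep(TaskProcessor):
    def __init__(self, provider: OCRProvider):
        self.provider = provider  # injected dependency

    def process(self, item: dict) -> dict:
        return {**item, "ocr": self.provider.detect(item["path"])}


step = TASK_REGISTRY["ocr"](FakeProvider())
result = step.process({"path": "a.jpg"})
```

Because the step only sees the `OCRProvider` interface, a test double like `FakeProvider` can replace the remote service without touching the step itself.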
New features:

    # Use YAML configuration for flexible pipelines
    python main.py --config pipeline_config.yaml

Documentation:
- REFACTORING_SUMMARY.md - Complete refactoring details
- CLAUDE.md - Architecture documentation
- TEST_RESULTS.md - Test coverage and results
- SEMANTIC_RELEASE.md - Automated versioning and releases
- tests/README.md - Testing guide
Running tests:

    pip install -r requirements.txt
    pip install -r test_requirements.txt
    pytest tests/ -v

Releases: This project uses Python Semantic Release for automated versioning. Use Conventional Commits for automatic version bumps:
- `feat:` → Minor version bump (0.x.0)
- `fix:` → Patch version bump (0.0.x)
- `feat!:` or `BREAKING CHANGE:` → Major version bump (x.0.0)
See SEMANTIC_RELEASE.md for detailed instructions.
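
The commit-to-version mapping above can be expressed as a short function. This is a simplified sketch of the rule as stated here, not the logic of Python Semantic Release itself, whose behaviour depends on its configuration; `bump_type` and `next_version` are hypothetical names.

```python
import re
from typing import Optional


def bump_type(commit_message: str) -> Optional[str]:
    """Map a Conventional Commit message to "major", "minor", "patch", or None."""
    header = commit_message.splitlines()[0] if commit_message else ""
    # type, optional (scope), optional "!" marker, then a colon
    match = re.match(r"^(\w+)(\([^)]*\))?(!)?:", header)
    if "BREAKING CHANGE:" in commit_message or (match and match.group(3)):
        return "major"
    if match is None:
        return None
    return {"feat": "minor", "fix": "patch"}.get(match.group(1))


def next_version(current: str, commit_message: str) -> str:
    """Return the version that follows `current` for the given commit message."""
    major, minor, patch = (int(part) for part in current.split("."))
    bump = bump_type(commit_message)
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    if bump == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return current  # e.g. docs: or chore: commits leave the version unchanged
```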
MIT License © bumbleflies UG