pdf-extractor

Here are 91 public repositories matching this topic...

torakiki / pdfsam

PDFsam, a desktop application to split, merge, mix, rotate PDF files and extract pages

java pdf javafx extract split merge rotate splitter combine pdf-manipulation pdf-merge pdf-extractor pdf-split pdf-rotate pdf-mix split-pdf merge-pdf merger pdf-combiner

Updated Dec 8, 2025
Java

UglyToad / PdfPig

Read and extract text and other content from PDFs in C# (port of PDFBox)

pdf csharp pdfbox netstandard pdf-files pdf-document pdf-generation hocr document-analysis pdf-extractor alto-xml page-xml layout-analysis pdf-document-processor

Updated Dec 7, 2025
C#

GowenGit / docnet

DocNET is a fast PDF editing and reading library for modern .NET applications

pdf csharp jpeg pdf-converter netcore netstandard pdf-files pdf-document pdf-conversion pdf-extractor pdf-document-processor

Updated May 13, 2024
C#

DocumindHQ / documind

Open-source platform for extracting structured data from documents using AI.

open-source pdf parser ocr ai pdf-converter developer-tools extract-data document-analysis pdf-extractor document-extraction llms pdf-extractor-llm

Updated May 15, 2025
JavaScript

pdftables / python-pdftables-api

Python library to interact with https://pdftables.com API

pdf pdf-converter pdf-conversion pdf-to-excel pdftables pdf-extractor pdftables-api

Updated Nov 21, 2025
Python

asepmaulanaismail / pdf-to-txt-python

Simple PDF-to-text conversion with Python using PDFtk and PyPDF2

python pdf python3 text-extraction pdf-to-text pypdf2 pdftk pdf-extractor

Updated Oct 1, 2023
Python

Siltaar / doc_crawler.py

Explore a website recursively and download all the wanted documents (PDF, ODT…)

crawler downloader web-crawler recursive file-download pdf-extractor web-crawler-python

Updated Jun 24, 2021

Madgrades / madgrades-extractor

UW-Madison course and grade distribution data extraction tool.

csv sql database java-8 uw-madison pdf-extractor

Updated Nov 21, 2025
Java

NotYuSheng / OmniPDF

OmniPDF is a PDF analyzer that supports translation, summarization, captioning, and conversational querying through Retrieval-Augmented Generation (RAG).

docker kubernetes redis metadata microservice docker-compose helm production s3 crc image-captioning pdf-extractor fastapi pdf-table-extraction streamlit chromadb pdf-translator pdf-image-extractor docling

Updated Oct 7, 2025
Python

saiedislamshuvo / pdf-splitter-tool-react

This is a simple ReactJS project that allows you to split a PDF file into separate pages, each page with a given name.

reactjs pdf-extractor

Updated Apr 24, 2023
CSS

xiaoyao9184 / docker-marker

Docker implementation of Marker, the PDF-to-Markdown converter

ocr docker-image marker pdf-extractor cuda-support markdown-export

Updated Oct 1, 2025
Python

sensein / GrobidArticleExtractor

python deep-learning pdf-extractor scientific-literature

Updated Jun 25, 2025
Python

SR-Sujon / llamachirp

Engage in dynamic conversations with PDFs to extract and comprehend information using locally hosted LLM variants of Ollama by integrating RAG.

open-source chatbot pdf-extractor rag llm ollama

Updated May 7, 2024
Python

talrand / DocnetExtended

DocNetExtended is a small extension library built upon the DocNet library, designed to extract text in a readable order from PDFs

pdf csharp netstandard pdf-extractor docnet

Updated Nov 12, 2021
C#

gimpscape / gimpscape-ppa

Gimpscape Repository for Debian Based Distributions

repository custom extractor ppa inkscape pdf-extractor

Updated Mar 26, 2022
Shell

XFY9326 / MinerU-VLM-App

MinerU 2.0 VLM web application

python gradio pdf-extractor mineru

Updated Jun 27, 2025
JavaScript

Hymian7 / PDFtkSharp

C# Wrapper around PDFLabs PDFtk Server CLI

cli pdf wrapper pdf-merge pdf-extractor pdf-merger pdf-merge-api

Updated Jul 19, 2022
C#

homfarnam / pdf-to-image-telegram-bot

PDF to Image Converter - A simple tool to convert PDFs to images in Telegram

nodejs javascript telegram telegram-bot pdf-extractor gramjs

Updated Oct 20, 2022
JavaScript

H-Software224 / khuthon_2024

Let's go khuthon in 2024!

nlp pdf-extractor

Updated Dec 27, 2024
Jupyter Notebook

sfkbstnc / pdf-extractor-cli

A professional, modular, and open-source Python command-line tool to extract data from PDFs (including plain text, tables, images, and OCR content) using best-in-class libraries like PyMuPDF, pdfplumber, and pytesseract.

python pdf pdf-viewer pdf-extractor python-ocr pdf-ocr-extraction python-pdf

Updated May 10, 2025
Python