PDFsam, a desktop application to split, merge, mix, rotate PDF files and extract pages
Updated Dec 8, 2025 - Java
Read and extract text and other content from PDFs in C# (port of PDFBox)
DocNET is a fast PDF editing and reading library for modern .NET applications
Open-source platform for extracting structured data from documents using AI.
Python library to interact with the https://pdftables.com API
Simple PDF-to-text conversion with Python using PDFtk and PyPDF2
Explore a website recursively and download all the wanted documents (PDF, ODT…)
UW-Madison course and grade distribution data extraction tool.
OmniPDF is a PDF analyzer that offers translation, summarization, captioning, and conversational querying through Retrieval-Augmented Generation (RAG).
This is a simple ReactJS project that allows you to split a PDF file into separate pages, each page with a given name.
Docker implementation of the Marker PDF-to-Markdown converter
Engage in dynamic conversations with PDFs to extract and comprehend information using LLMs hosted locally with Ollama, integrated with RAG.
DocNetExtended is a small extension library built upon the DocNet library, designed to extract text in a readable order from PDFs
Gimpscape Repository for Debian Based Distributions
C# Wrapper around PDFLabs PDFtk Server CLI
PDF to Image Converter - A simple tool to convert PDFs to images in Telegram
A professional, modular, and open-source Python command-line tool to extract data from PDFs — including plain text, tables, images, and OCR content — using best-in-class libraries like PyMuPDF, pdfplumber, and pytesseract.