- Create a folder named pdf in pdfReader/ and put PDF documents in ./pdf/
- Run python pdfReader.py
- The PDF documents will be converted into txt files in ./txt/
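
The conversion step above can be pictured roughly as follows. This is only a sketch of the general idea, not the actual contents of pdfReader.py: it assumes pdfminer's pdf2txt.py script is on the PATH after installation and uses its -o option to write the output file. The function names `plan_conversions` and `convert_all` and the `run` parameter are hypothetical, introduced here for illustration.

```python
import os
import subprocess


def plan_conversions(pdf_dir="pdf", txt_dir="txt"):
    """Return (pdf_path, txt_path) pairs for every .pdf file in pdf_dir."""
    pairs = []
    for name in sorted(os.listdir(pdf_dir)):
        base, ext = os.path.splitext(name)
        if ext.lower() == ".pdf":
            pairs.append((os.path.join(pdf_dir, name),
                          os.path.join(txt_dir, base + ".txt")))
    return pairs


def convert_all(pdf_dir="pdf", txt_dir="txt", run=subprocess.check_call):
    """Convert each PDF in pdf_dir to a text file in txt_dir.

    Assumption: pdf2txt.py (installed with pdfminer) is on the PATH.
    `run` is a hypothetical hook so the command can be swapped or inspected.
    """
    if not os.path.isdir(txt_dir):
        os.makedirs(txt_dir)
    pairs = plan_conversions(pdf_dir, txt_dir)
    for pdf_path, txt_path in pairs:
        # pdf2txt.py -o OUTPUT INPUT writes the extracted text to OUTPUT.
        run(["pdf2txt.py", "-o", txt_path, pdf_path])
    return pairs
```

Calling `convert_all()` from inside pdfReader/ would read ./pdf/ and write ./txt/. Passing a different `run` callable (for example `print`) shows the commands without executing them.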
- Webpage: https://euske.github.io/pdfminer/
- Download (PyPI): https://pypi.python.org/pypi/pdfminer/
- Demo WebApp: http://pdf2html.tabesugi.net:8080/
- Install Python 2.6 or newer. (For Python 3 support, see pdfminer.six.)
- Download the source code.
- Unpack it.
- Run setup.py:
$ python setup.py install
- Do the following test:
$ pdf2txt.py samples/simple1.pdf
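
In addition to the pdf2txt.py test above, a quick way to confirm the install from Python is to check that the pdfminer package can be imported. This is a minimal sketch; the helper name `pdfminer_version` is hypothetical, and it falls back to "unknown" in case the installed package does not expose a `__version__` attribute.

```python
import importlib


def pdfminer_version():
    """Return pdfminer's version string, or None if it is not installed."""
    try:
        module = importlib.import_module("pdfminer")
    except ImportError:
        return None
    # Fall back to "unknown" if the package has no __version__ attribute.
    return str(getattr(module, "__version__", "unknown"))


version = pdfminer_version()
if version is None:
    print("pdfminer is not installed; run: python setup.py install")
else:
    print("pdfminer version: " + version)
```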
Parse English and Chinese Papers
- Webpage:
- Python 2.7.9
- Python 3.4.3