A pure python-based utility to extract text from docx files.
The code is taken and adapted from python-docx. In addition, it can extract text from headers, footers and hyperlinks.
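For readers curious how header, footer and hyperlink text can be reached at all, here is a minimal standard-library sketch of the general approach. It is not docx2txt's actual implementation. It assumes the usual .docx layout: a ZIP archive whose body lives in word/document.xml, with headers and footers in word/headerN.xml and word/footerN.xml. It only joins the text of <w:t> runs, one line per paragraph, and ignores tabs, line breaks, table layout and images. The function names docx_to_text and part_to_text are hypothetical.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by <w:p> (paragraph) and <w:t> (text) elements.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def part_to_text(xml_bytes):
    """Return the text of one XML part, one line per <w:p> paragraph (hypothetical helper)."""
    root = ET.fromstring(xml_bytes)
    lines = []
    for para in root.iter(W_NS + "p"):
        # iter() also descends into <w:hyperlink>, so link text is included.
        runs = [node.text for node in para.iter(W_NS + "t") if node.text]
        lines.append("".join(runs))
    return "\n".join(lines)


def docx_to_text(source):
    """Collect text from headers, the main body and footers, in that order.

    `source` may be a file path or a binary file-like object.
    Assumes the standard part names; this is a sketch, not docx2txt's code.
    """
    with zipfile.ZipFile(source) as zf:
        names = zf.namelist()
        headers = sorted(n for n in names if re.fullmatch(r"word/header\d*\.xml", n))
        footers = sorted(n for n in names if re.fullmatch(r"word/footer\d*\.xml", n))
        parts = headers + ["word/document.xml"] + footers
        return "\n".join(part_to_text(zf.read(name)) for name in parts)
```

Hyperlink text needs no special handling here: the runs inside a <w:hyperlink> element are nested within the enclosing paragraph, so walking all <w:t> descendants of each paragraph picks them up.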
Installation:
pip install docx2txt

Usage:
a. From command line:
docx2txt file.docx

b. From python:
import docx2txt
text = docx2txt.process("file.docx")

Bugs: report to ankush dot shah dot nitk at gmail dot com