textract

Package | Documents | Python 3.7+ | Intermediate

Extract text from many document formats (PDF, DOCX, PPTX, and others) through a single function call.

Quick Info

Documentation
Official Docs
Python Version
3.7+
Dependencies
chardet, argcomplete, beautifulsoup4, xlrd, six, SpeechRecognition, pdfminer.six, docx2txt, python-pptx, EbookLib
Install
pip install textract
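After installing, it can help to confirm that the Python dependencies listed above are importable. The sketch below is an illustration, not part of textract. It assumes our own mapping from each PyPI distribution name to its import name (for example, `beautifulsoup4` is imported as `bs4`). It only checks that each module can be found, not that it imports cleanly. It also does not cover the external programs textract calls for some formats.

```python
import importlib.util
from typing import Dict

# Our own mapping from the distribution names listed above to the
# module names used in `import` statements (an assumption, not textract API).
DEPENDENCY_MODULES = {
    "chardet": "chardet",
    "argcomplete": "argcomplete",
    "beautifulsoup4": "bs4",
    "xlrd": "xlrd",
    "six": "six",
    "SpeechRecognition": "speech_recognition",
    "pdfminer.six": "pdfminer",
    "docx2txt": "docx2txt",
    "python-pptx": "pptx",
    "EbookLib": "ebooklib",
}


def check_dependencies(modules: Dict[str, str] = DEPENDENCY_MODULES) -> Dict[str, bool]:
    """Return {distribution name: True if its top-level module can be found}."""
    # find_spec returns None (rather than raising) for a missing top-level module
    return {dist: importlib.util.find_spec(mod) is not None for dist, mod in modules.items()}


if __name__ == "__main__":
    for dist, ok in check_dependencies().items():
        print(f"{dist:20} {'ok' if ok else 'MISSING'}")
```

Running the file as a script prints one line per dependency. A `MISSING` entry usually means the package was installed into a different environment than the one running Python.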


Quick Example

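```python
# Install: pip install textract
import textract

# process() chooses a parser from the file extension and returns bytes
text = textract.process("example.pdf")  # placeholder path; point this at a real file
print(text.decode("utf-8"))
```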

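textract is a third-party package, not part of the Python standard library. `textract.process()` picks a parser based on the file's extension. Several parsers call external command-line programs (for example, `pdftotext` for PDFs and `tesseract` for images), and these must be installed separately from the pip package.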
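Because `textract.process()` selects a parser from the file extension and returns bytes, a batch job typically filters paths by suffix and decodes each result. The helper below is a hypothetical sketch, not part of textract. The name `extract_all` and the `EXTENSIONS` set are our own, and the set is only an illustrative subset of the formats textract handles. textract is imported lazily so that the helper can be exercised with a stub `process` function.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, Optional

# Illustrative subset of extensions (an assumption); textract supports more.
EXTENSIONS = {".pdf", ".docx", ".pptx", ".xlsx", ".epub", ".txt"}


def extract_all(
    paths: Iterable[str],
    process: Optional[Callable[[str], bytes]] = None,
    encoding: str = "utf-8",
) -> Dict[str, str]:
    """Extract text from each file with a listed extension, keyed by path.

    `process` defaults to textract.process, which returns bytes
    (UTF-8 by default).
    """
    if process is None:
        import textract  # lazy import so tests can pass a stub instead
        process = textract.process
    texts: Dict[str, str] = {}
    for p in paths:
        path = Path(p)
        if path.suffix.lower() not in EXTENSIONS:
            continue  # skip extensions outside the illustrative list above
        raw = process(str(path))
        texts[str(path)] = raw.decode(encoding, errors="replace")
    return texts
```

For a directory of documents, a call such as `extract_all(str(p) for p in Path("docs").iterdir())` would return a dict of path to text. The `errors="replace"` choice keeps one badly encoded file from aborting the whole batch.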


Tags

package, documents, file-format, office