Metadata-Version: 2.1
Name: ocrpy
Version: 0.3.7
Summary: Unified interface to Google Vision, AWS Textract, Azure, Tesseract OCR and EasyOCR tools.
Project-URL: Source, https://github.com/maxent-ai/ocrpy
Author-email: Maxentlabs <maxentlabsai@gmail.com>
License: MIT
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Intended Audience :: Healthcare Industry
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: Legal Industry
Classifier: Intended Audience :: Other Audience
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Telecommunications Industry
Classifier: License :: OSI Approved :: Apache Software License
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Image Recognition
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.7
Requires-Dist: attrs==21.4.0
Requires-Dist: beautifulsoup4==4.9.1
Requires-Dist: boto3==1.19.7
Requires-Dist: cloudpathlib==0.9.0
Requires-Dist: google-cloud-vision==1.0.0
Requires-Dist: numpy==1.21.1
Requires-Dist: opencv-python==4.1.2.30
Requires-Dist: pandas==1.3.3
Requires-Dist: pdf2image==1.14.0
Requires-Dist: pytesseract==0.3.6
Requires-Dist: python-dotenv==0.17.1
Requires-Dist: tqdm==4.64.0
Requires-Dist: transformers==4.20.1
Description-Content-Type: text/markdown

# ocrpy
[![Downloads](https://static.pepy.tech/personalized-badge/ocrpy?period=total&units=abbreviation&left_color=black&right_color=blue&left_text=Downloads)](https://pepy.tech/project/ocrpy)
![contributors](https://img.shields.io/github/contributors/maxent-ai/ocrpy?color=blue)
![PyPi](https://img.shields.io/pypi/v/ocrpy?color=blue)
![tag](https://img.shields.io/github/v/tag/maxent-ai/ocrpy)
![mit-license](https://img.shields.io/github/license/maxent-ai/ocrpy?color=blue)

Unified interface to Google Vision, AWS Textract, Azure and Tesseract OCR tools.
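To illustrate what a "unified interface" over several OCR engines typically looks like, here is a minimal sketch of a backend registry: every engine sits behind one common `parse` method, and a string name (similar to the `parser_backend='aws-textract'` argument in the sample usage) selects the implementation. This is not ocrpy's internal code. `OcrBackend`, `get_backend`, the `"tesseract"` key and both fake backends are hypothetical stand-ins that return canned text instead of calling a real OCR service.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class OcrBackend(ABC):
    """Hypothetical common interface that every OCR backend implements."""

    @abstractmethod
    def parse(self, image_bytes: bytes) -> List[str]:
        """Return the lines of text recognised in an image."""


class FakeTextractBackend(OcrBackend):
    # Stand-in for a real AWS Textract call; echoes the input as text.
    def parse(self, image_bytes: bytes) -> List[str]:
        return ["textract: " + image_bytes.decode("utf-8")]


class FakeTesseractBackend(OcrBackend):
    # Stand-in for a real Tesseract call; echoes the input as text.
    def parse(self, image_bytes: bytes) -> List[str]:
        return ["tesseract: " + image_bytes.decode("utf-8")]


# Registry mapping a backend name to its class. Only "aws-textract"
# appears in the README; "tesseract" is a hypothetical key.
BACKENDS: Dict[str, Type[OcrBackend]] = {
    "aws-textract": FakeTextractBackend,
    "tesseract": FakeTesseractBackend,
}


def get_backend(name: str) -> OcrBackend:
    """Instantiate the backend registered under `name`."""
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError(
            f"unknown backend {name!r}; choose from {sorted(BACKENDS)}"
        ) from None


if __name__ == "__main__":
    backend = get_backend("aws-textract")
    print(backend.parse(b"hello"))  # ['textract: hello']
```

Because callers only depend on `OcrBackend.parse`, switching engines becomes a one-string change rather than a rewrite of the calling code.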


## Installation

```shell
pip install ocrpy
```

## Sample Usage

```python
from ocrpy import TextOcrPipeline

# Run a pipeline defined in a YAML config file.
ocr_pipeline = TextOcrPipeline.from_config("ocrpy_config.yaml")
ocr_pipeline.process()

# Alternatively, configure the pipeline directly in code:
pipeline = TextOcrPipeline(source_dir='s3://document_bucket/',
                           destination_dir="gs://processed_document_bucket/outputs/",
                           parser_backend='aws-textract',
                           credentials={"AWS": "path/to/aws-credentials.env/file",
                                        "GCP": "path/to/gcp-credentials.json/file"})
pipeline.process()
```
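In the example above, the source lives in S3 and the destination in Google Cloud Storage, so credentials for both clouds are passed in. The sketch below is a hypothetical pre-flight check, not part of ocrpy: it classifies each path by URI scheme with the standard library's `urllib.parse.urlparse` and reports which keys of the `credentials` dict (`"AWS"`, `"GCP"`, as in the sample) the storage paths would need. The helper names `storage_scheme` and `required_credentials` are illustrative assumptions, and the check ignores any credentials the OCR backend itself may require.

```python
from typing import Set
from urllib.parse import urlparse

# Hypothetical mapping from URI scheme to the credentials key used in the README sample.
CREDENTIAL_KEY = {"s3": "AWS", "gs": "GCP"}


def storage_scheme(path: str) -> str:
    """Classify a source/destination path as 's3', 'gs' or 'local'."""
    scheme = urlparse(path).scheme
    # A path with no scheme, e.g. "data/docs/", is treated as a local directory.
    if scheme == "":
        return "local"
    if scheme in CREDENTIAL_KEY:
        return scheme
    raise ValueError(f"unsupported storage scheme: {scheme!r}")


def required_credentials(*paths: str) -> Set[str]:
    """Return the credentials keys needed to access the given storage paths."""
    keys = set()
    for path in paths:
        scheme = storage_scheme(path)
        if scheme in CREDENTIAL_KEY:
            keys.add(CREDENTIAL_KEY[scheme])
    return keys


if __name__ == "__main__":
    needed = required_credentials(
        "s3://document_bucket/", "gs://processed_document_bucket/outputs/"
    )
    print(sorted(needed))  # ['AWS', 'GCP']
```

Running a check like this before `pipeline.process()` surfaces a missing credentials entry up front rather than partway through a batch.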

