Metadata-Version: 2.1
Name: tiny-blocks
Version: 0.1.13
Summary: Tiny Block Operations for Data Pipelines
Home-page: https://github.com/pyprogrammerblog/tiny-blocks
License: LICENSE
Author: Jose Vazquez
Author-email: josevazjim88@gmail.com
Requires-Python: >=3.9,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: License :: Other/Proprietary License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.9
Requires-Dist: PyMySQL (>=1.0.2,<2.0.0)
Requires-Dist: SQLAlchemy (>=1.4.39,<2.0.0)
Requires-Dist: boto3 (>=1.24.43,<2.0.0)
Requires-Dist: cryptography (>=37.0.4,<38.0.0)
Requires-Dist: cx-Oracle (>=8.3.0,<9.0.0)
Requires-Dist: kafka-python (>=2.0.2,<3.0.0)
Requires-Dist: minio (>=7.1.11,<8.0.0)
Requires-Dist: pandas (>=1.4.3,<2.0.0)
Requires-Dist: psycopg2 (>=2.9.3,<3.0.0)
Requires-Dist: pydantic (>=1.9.1,<2.0.0)
Project-URL: Documentation, https://tiny-blocks.readthedocs.io/en/latest/
Description-Content-Type: text/markdown

tiny-blocks
=============

[![Documentation Status](https://readthedocs.org/projects/tiny-blocks/badge/?version=latest)](https://tiny-blocks.readthedocs.io/en/latest/?badge=latest)
[![License-MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/pyprogrammerblog/tiny-blocks/blob/master/LICENSE)
[![GitHub Actions](https://github.com/pyprogrammerblog/tiny-blocks/workflows/CI/badge.svg/)](https://github.com/pyprogrammerblog/tiny-blocks/workflows/CI/badge.svg/)
[![PyPI version](https://badge.fury.io/py/tiny-blocks.svg)](https://badge.fury.io/py/tiny-blocks)

Tiny Blocks to build large and complex ETL data pipelines!

Tiny-Blocks is a library for **data engineering** operations.
Each **pipeline** is made of **tiny-blocks** glued together with the `>>` operator.
This library relies on a fundamental streaming abstraction consisting of three
parts: **extract**, **transform**, and **load**. You can view a pipeline
as an extraction, followed by zero or more transformations, followed by a load (the sink).
Visually, this looks like:

```
extract -> transform1 -> transform2 -> ... -> transformN -> load
```
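To illustrate how blocks can be glued with `>>`, here is a minimal sketch built on Python's `__rshift__` hook. It is not the tiny-blocks implementation: `Stream`, `transform`, `load`, and `fill_missing` are hypothetical names, and plain lists of dicts stand in for DataFrame chunks.

```python
from typing import Callable, Iterable, List

Chunk = List[dict]  # stand-in for a DataFrame chunk (assumption for this sketch)


class Stream:
    """Hypothetical wrapper around a lazy iterator of chunks."""

    def __init__(self, chunks: Iterable[Chunk]):
        self.chunks = iter(chunks)

    def __rshift__(self, step: Callable):
        # `stream >> step` hands the lazy chunk iterator to the next step.
        return step(self.chunks)


def transform(func: Callable[[Chunk], Chunk]) -> Callable:
    """Wrap a per-chunk function as a step that returns a new Stream."""
    def step(chunks):
        return Stream(func(chunk) for chunk in chunks)
    return step


def load(sink: list) -> Callable:
    """Terminal step: drain the stream into `sink`, return the row count."""
    def step(chunks):
        count = 0
        for chunk in chunks:
            sink.extend(chunk)
            count += len(chunk)
        return count
    return step


def fill_missing(chunk: Chunk) -> Chunk:
    # Replace None values with a placeholder, row by row.
    return [{k: ("n/a" if v is None else v) for k, v in row.items()} for row in chunk]


source = Stream([[{"name": "a"}, {"name": None}], [{"name": "c"}]])
sink: list = []
rows_written = source >> transform(fill_missing) >> load(sink)
```

Because `>>` is left-associative, the expression is evaluated one block at a time from left to right, and nothing is read from the source until the final load step starts iterating.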

You can also fan in (merge several sources) and fan out (feed several sinks) for more complex operations.

```
extract1 -> transform1 -> |-> transform2 -> ... -> | -> transformN -> load1
extract2 ---------------> |                        | -> load2
```
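The diagram above can be approximated with the standard library alone. This is only a sketch of the idea, not how tiny-blocks implements it: `fan_in` and `fan_out` are hypothetical helpers, small lists stand in for DataFrame chunks, and fan-in is assumed here to mean simple concatenation of streams.

```python
import itertools
from typing import Iterable, Iterator, Tuple


def fan_in(*streams: Iterable) -> Iterator:
    # Assumption: fan-in concatenates the streams one after another.
    return itertools.chain(*streams)


def fan_out(stream: Iterable, n: int = 2) -> Tuple[Iterator, ...]:
    # itertools.tee gives n independent iterators over the same chunks.
    return itertools.tee(stream, n)


extract1 = iter([[1, 2], [3]])
extract2 = iter([[4, 5]])

merged = fan_in(extract1, extract2)
branch_a, branch_b = fan_out(merged)

# Two hypothetical sinks consuming the same stream of chunks.
load1 = [value for chunk in branch_a for value in chunk]
load2 = [sum(chunk) for chunk in branch_b]
```

Note that `itertools.tee` buffers chunks that one branch has consumed and the other has not, so a branch that lags far behind increases memory use.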

Tiny-Blocks uses **generators** to stream data. Each **chunk** is a **Pandas DataFrame**.
The `chunksize` (buffer size) is adjustable per pipeline.
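The chunked streaming described above can be sketched with plain pandas and generators. This is an illustration under stated assumptions, not the library's code: `extract` and `fill_na` are hypothetical helper names, and the CSV content is made up for the example.

```python
import io
from typing import Iterator

import pandas as pd

# Made-up sample data with missing names in rows 2 and 4.
CSV_TEXT = "id,name\n1,ana\n2,\n3,luis\n4,\n5,eva\n"


def extract(buffer, chunksize: int) -> Iterator[pd.DataFrame]:
    # With `chunksize`, pandas.read_csv yields DataFrames lazily
    # instead of loading the whole file at once.
    yield from pd.read_csv(buffer, chunksize=chunksize)


def fill_na(chunks: Iterator[pd.DataFrame], value) -> Iterator[pd.DataFrame]:
    # Transform one chunk at a time; only one chunk is in memory per step.
    for chunk in chunks:
        yield chunk.fillna(value)


stream = fill_na(extract(io.StringIO(CSV_TEXT), chunksize=2), value="unknown")
chunks = list(stream)

chunk_sizes = [len(chunk) for chunk in chunks]
names = [name for chunk in chunks for name in chunk["name"]]
```

With five rows and `chunksize=2`, the stream produces three DataFrames, the last one holding the single remaining row.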

Installation
-------------

Install it using `pip`:

```shell
pip install tiny-blocks
```

Basic usage
---------------

```python
from tiny_blocks.extract import FromCSV
from tiny_blocks.transform import Fillna
from tiny_blocks.load import ToSQL

# ETL Blocks
from_csv = FromCSV(path='/path/to/source.csv')
fill_na = Fillna(value="Hola Mundo")
to_sql = ToSQL(dsn_conn='postgresql+psycopg2://...', table_name="sink")

# Pipeline
from_csv >> fill_na >> to_sql
```

Examples
----------------------

For more complex examples, see the
[examples notebook](https://github.com/pyprogrammerblog/tiny-blocks/blob/master/notebooks/Examples.ipynb).


Documentation
--------------

The full documentation is available on [Read the Docs](https://tiny-blocks.readthedocs.io/en/latest/).

