Metadata-Version: 2.1
Name: openfe
Version: 0.0.4
Summary: OpenFE: automated feature generation beyond expert-level performance
Home-page: https://github.com/ZhangTP1996/OpenFE
Author: Tianping Zhang
Author-email: ztp18@mails.tsinghua.edu.cn
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE

<div id="top" align="center">

<img src=https://github.com/ZhangTP1996/OpenFE/blob/master/doc/logo/openfe.svg width=300 />

OpenFE: automated feature generation beyond expert-level performance
-----------------------------
<h3> |<a href="https://arxiv.org/abs/2211.12507"> Paper </a> | 
<a href="https://openfe-document.readthedocs.io/en/latest/"> Documentation </a> | 
<a href="https://github.com/ZhangTP1996/OpenFE/tree/master/examples"> Examples </a> |  </h3>

</div>

OpenFE is a powerful framework for automated feature generation in tabular data. 
OpenFE is easy to use, effective, and efficient, with the following advantages:
- OpenFE covers 23 operators summarized from numerous Kaggle competitions to generate candidate features.
- OpenFE supports binary classification, multi-class classification, and regression tasks.
- OpenFE is accurate in retrieving effective candidate features for improving the learning performance of both GBDT and neural networks.
- OpenFE is efficient and supports parallel computing.

For further details, please refer to [our paper](https://arxiv.org/abs/2211.12507). Extensive comparison experiments
on public datasets show that OpenFE outperforms existing feature generation methods on both effectiveness and efficiency.
Moreover, we validate OpenFE on the [IEEE-CIS Fraud Detection](https://www.kaggle.com/competitions/ieee-fraud-detection)
Kaggle competition, and show that a simple XGBoost model with features generated by OpenFE
beats 99.3% of the 6,351 participating data science teams. The features generated by OpenFE result in a larger
performance improvement than the features provided by the first-place team in the competition.

Get Started and Documentation
-----------------------------

**Installation**

It is recommended to use **pip** for installation.

```bash
pip install openfe
```

Please do not use **conda install openfe** for installation.
It installs a different, unrelated Python package with the same name.

**A Quick Example**

It takes only four lines of code to generate features with OpenFE. First, we generate features by OpenFE.
Next, we augment the train and test data with the generated features.

```python
from openfe import openfe, transform

ofe = openfe()
# Generate new features; train_x is a pandas DataFrame of features and
# train_y holds the corresponding labels.
features = ofe.fit(data=train_x, label=train_y, n_jobs=n_jobs)
# Augment the train and test data with the generated features.
train_x, test_x = transform(train_x, test_x, features, n_jobs=n_jobs)
```
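The snippet above assumes `train_x`, `train_y`, and `test_x` already exist. As a self-contained sketch (the synthetic data, column names, and sizes below are our own illustrative choices, not part of OpenFE), the full workflow might look like:

```python
import numpy as np
import pandas as pd

# Build a small synthetic dataset; OpenFE expects the features as a
# pandas DataFrame and the label as a pandas DataFrame/Series.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({"a": rng.normal(size=n), "b": rng.normal(size=n)})
y = pd.DataFrame({"label": X["a"] * X["b"] + rng.normal(scale=0.1, size=n)})

# Simple train/test split by position.
train_x, test_x = X.iloc[:150].copy(), X.iloc[150:].copy()
train_y = y.iloc[:150]

try:
    from openfe import openfe, transform

    ofe = openfe()
    features = ofe.fit(data=train_x, label=train_y, n_jobs=1)
    train_x, test_x = transform(train_x, test_x, features, n_jobs=1)
    print(f"augmented train data has {train_x.shape[1]} columns")
except Exception:
    # Hedged: openfe may be unavailable, or feature search may not
    # succeed on a tiny synthetic sample; the data preparation above
    # still shows the expected input format.
    print("openfe unavailable; showing data preparation only")
```

Note that `transform` returns new DataFrames with the generated feature columns appended, so the row counts of the train and test splits are unchanged.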

We provide an example using the standard california_housing dataset in 
[this link](<https://github.com/ZhangTP1996/OpenFE/blob/master/examples/california_housing.py>). 
A more complicated example demonstrating that OpenFE can outperform machine learning experts in the IEEE-CIS Fraud Detection 
Kaggle competition is provided in [this link](<https://github.com/ZhangTP1996/OpenFE/blob/master/examples/IEEE-CIS-Fraud-Detection/main.py>).
Users can also refer to our [documentation](https://openfe-document.readthedocs.io/en/latest/) for more advanced usage of OpenFE and FAQ about feature generation.
