# -*- coding: utf-8 -*-
from setuptools import setup

packages = \
['neattext',
 'neattext.explainer',
 'neattext.functions',
 'neattext.pattern_data']

package_data = \
{'': ['*']}

setup_kwargs = {
    'name': 'neattext',
    'version': '0.1.0',
    'description': 'Neattext - a simple NLP package for cleaning text',
    'long_description': '# neattext\nNeatText: a simple NLP package for cleaning textual data and text preprocessing\n\n[![Build Status](https://travis-ci.org/Jcharis/neattext.svg?branch=master)](https://travis-ci.org/Jcharis/neattext)\n\n[![GitHub license](https://img.shields.io/github/license/Jcharis/neattext)](https://github.com/Jcharis/neattext/blob/master/LICENSE)\n\n#### Problem\n+ Cleaning of unstructured text data\n+ Reducing noise [special characters, stopwords]\n+ Reducing the need to rewrite the same code for text preprocessing\n\n#### Solution\n+ Convert well-known solutions for cleaning text into a reusable package\n\n\n#### Installation\n```bash\npip install neattext\n```\n\n### Usage\n+ The OOP Way(Object Oriented Way)\n+ NeatText offers 4 main classes for working with text data\n\t- TextFrame : a frame-like object for cleaning text\n\t- TextCleaner: remove or replace specifics\n\t- TextExtractor: extract specific items (emails, urls, emojis, etc.) from text\n\t- TextMetrics: word stats and metrics\n\n### Overall Components of NeatText\n![](images/neattext_features_jcharistech.png)\n\n### Using TextFrame\n+ Keeps the text as a `TextFrame` object. This allows us to do more with our text. 
\n+ It inherits the benefits of the TextCleaner and the TextMetrics out of the box with some additional features for handling text data.\n+ This is the simplest way to preprocess text with this library; alternatively, you can use the other classes too.\n\n\n```python\n>>> import neattext as nt \n>>> mytext = "This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊."\n>>> docx = nt.TextFrame(text=mytext)\n>>> docx.text \n"This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊."\n>>>\n>>> docx.describe()\nKey      Value          \nLength  : 73             \nvowels  : 21             \nconsonants: 34             \nstopwords: 4              \npunctuations: 8              \nspecial_char: 8              \ntokens(whitespace): 10             \ntokens(words): 14             \n>>> \n>>> docx.length\n73\n>>> # Scan Percentage of Noise(Unclean data) in text\n>>> docx.noise_scan()\n{\'text_noise\': 19.17808219178082, \'text_length\': 73, \'noise_count\': 14}\n>>> \n>>> docx.head(16)\n\'This is the mail\'\n>>> docx.tail()\n>>> docx.count_vowels()\n>>> docx.count_stopwords()\n>>> docx.count_consonants()\n>>> docx.nlongest()\n>>> docx.nshortest()\n>>> docx.readability()\n```\n#### Basic NLP Task (Tokenization,Ngram,Text Generation)\n```python\n>>> docx.word_tokens()\n>>>\n>>> docx.sent_tokens()\n>>>\n>>> docx.term_freq()\n>>>\n>>> docx.bow()\n```\n#### Basic Text Preprocessing\n```python\n>>> docx.normalize()\n\'this is the mail example@gmail.com ,our website is https://example.com 😊.\'\n>>> docx.normalize(level=\'deep\')\n\'this is the mail examplegmailcom our website is httpsexamplecom \'\n\n>>> docx.remove_puncts()\n>>> docx.remove_stopwords()\n>>> docx.remove_html_tags()\n>>> docx.remove_special_characters()\n>>> docx.remove_emojis()\n>>> docx.fix_contractions()\n```\n\n##### Handling Files with NeatText\n+ Read txt file directly into TextFrame\n```python\n>>> import neattext as nt \n>>> docx_df = nt.read_txt(\'file.txt\')\n```\n+ 
Alternatively, you can instantiate a TextFrame and read a text file into it\n```python\n>>> import neattext as nt \n>>> docx_df = nt.TextFrame().read_txt(\'file.txt\')\n```\n\n##### Chaining Methods on TextFrame\n```python\n>>> t1 = "This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊 and it will cost $100 to subscribe."\n>>> docx = nt.TextFrame(t1)\n>>> result = docx.remove_emails().remove_urls().remove_emojis()\n>>> print(result)\n\'This is the mail  ,our WEBSITE is   and it will cost $100 to subscribe.\'\n```\n\n\n\n#### Clean Text\n+ Clean text by removing emails,numbers,stopwords,emojis,etc\n+ A simplified function for cleaning text: specify what to clean from a text as True/False\n```python\n>>> from neattext.functions import clean_text\n>>> \n>>> mytext = "This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊."\n>>> \n>>> clean_text(mytext)\n\'mail example@gmail.com ,our website https://example.com .\'\n```\n+ You can remove punctuations,stopwords,urls,emojis,multiple_whitespaces,etc by setting them to True.\n\n+ You can choose to remove or keep punctuations by setting `puncts` to True or False respectively\n\n```python\n>>> clean_text(mytext,puncts=True)\n\'mail example@gmailcom website https://examplecom \'\n>>> \n>>> clean_text(mytext,puncts=False)\n\'mail example@gmail.com ,our website https://example.com .\'\n>>> \n>>> clean_text(mytext,puncts=False,stopwords=False)\n\'this is the mail example@gmail.com ,our website is https://example.com .\'\n>>> \n```\n+ You can also remove the other unneeded items accordingly\n```python\n>>> clean_text(mytext,stopwords=False)\n\'this is the mail example@gmail.com ,our website is https://example.com .\'\n>>>\n>>> clean_text(mytext,urls=False)\n\'mail example@gmail.com ,our website https://example.com .\'\n>>> \n>>> clean_text(mytext,urls=True)\n\'mail example@gmail.com ,our website .\'\n>>> \n\n```\n\n#### Removing Punctuations [A Very Common Text Preprocessing Step]\n+ You can remove 
the most common punctuations such as full stops, commas, exclamation marks and question marks by setting `most_common=True`, which is the default\n+ Alternatively, you can remove all known punctuations from a text by setting `most_common=False`.\n```python\n>>> import neattext as nt \n>>> mytext = "This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊. Please don\'t forget the email when you enter !!!!!"\n>>> docx = nt.TextFrame(mytext)\n>>> docx.remove_puncts()\nTextFrame(text="This is the mail example@gmailcom our WEBSITE is https://examplecom 😊 Please dont forget the email when you enter ")\n\n>>> docx.remove_puncts(most_common=False)\nTextFrame(text="This is the mail examplegmailcom our WEBSITE is httpsexamplecom 😊 Please dont forget the email when you enter ")\n```\n\n#### Removing Stopwords [A Very Common Text Preprocessing Step]\n+ You can remove stopwords from a text by specifying the language. The default language is English.\n+ Supported languages include English(en), Spanish(es), French(fr), Russian(ru), Yoruba(yo) and German(de)\n\n```python\n>>> import neattext as nt \n>>> mytext = "This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊. Please don\'t forget the email when you enter !!!!!"\n>>> docx = nt.TextFrame(mytext)\n>>> docx.remove_stopwords(lang=\'en\')\nTextFrame(text="mail example@gmail.com ,our WEBSITE https://example.com 😊. 
forget email enter !!!!!")\n```\n\n\n#### Remove Emails,Numbers,Phone Numbers,Dates,Btc Address,VisaCard Address,etc \n```python\n>>> print(docx.remove_emails())\n\'This is the mail  ,our WEBSITE is https://example.com 😊.\'\n>>>\n>>> print(docx.remove_stopwords())\n\'This mail example@gmail.com ,our WEBSITE https://example.com 😊.\'\n>>>\n>>> print(docx.remove_numbers())\n>>> docx.remove_phone_numbers()\n>>> docx.remove_btc_address()\n```\n\n\n#### Remove Special Characters\n```python\n>>> docx.remove_special_characters()\n```\n\n#### Remove Emojis\n```python\n>>> print(docx.remove_emojis())\n\'This is the mail example@gmail.com ,our WEBSITE is https://example.com .\'\n```\n\n\n#### Remove Custom Pattern\n+ If you cannot find what you need in the built-in functions, you can specify your own custom pattern using the `remove_custom_pattern()` function\n```python\n>>> import neattext.functions as nfx \n>>> ex = "Last !RT tweeter multiple &#7777"\n>>> \n>>> nfx.remove_custom_pattern(ex,r\'&#\\d+\')\n\'Last !RT tweeter multiple  \'\n```\n\n#### Replace Emails,Numbers,Phone Numbers\n```python\n>>> docx.replace_emails()\n>>> docx.replace_numbers()\n>>> docx.replace_phone_numbers()\n```\n\n#### Chain Multiple Methods\n```python\n>>> from neattext import TextCleaner\n>>> t1 = "This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊 and it will cost $100 to subscribe."\n>>> docx = TextCleaner(t1)\n>>> result = docx.remove_emails().remove_urls().remove_emojis()\n>>> print(result)\n\'This is the mail  ,our WEBSITE is   and it will cost $100 to subscribe.\'\n\n```\n\n### Using TextExtractor\n+ To Extract emails,phone numbers,numbers,urls,emojis from text\n```python\n>>> from neattext import TextExtractor\n>>> docx = TextExtractor()\n>>> docx.text = "This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊."\n>>> docx.extract_emails()\n[\'example@gmail.com\']\n>>>\n>>> docx.extract_emojis()\n[\'😊\']\n```\n\n\n### Using TextMetrics\n+ To Find the Words Stats 
such as counts of vowels,consonants,stopwords,word-stats\n```python\n>>> from neattext import TextMetrics\n>>> docx = TextMetrics()\n>>> docx.text = "This is the mail example@gmail.com ,our WEBSITE is https://example.com 😊."\n>>> docx.count_vowels()\n>>> docx.count_consonants()\n>>> docx.count_stopwords()\n>>> docx.word_stats()\n```\n\n### Usage \n+ The MOP Way(Method/Function Oriented Way)\n\n```python\n>>> from neattext.functions import clean_text,extract_emails\n>>> t1 = "This is the mail example@gmail.com ,our WEBSITE is https://example.com ."\n>>> clean_text(t1,puncts=True,stopwords=True)\n\'this mail examplegmailcom website httpsexamplecom\'\n>>> extract_emails(t1)\n[\'example@gmail.com\']\n```\n\n+ Alternatively, you can also use this approach\n```python\n>>> import neattext.functions as nfx \n>>> t1 = "This is the mail example@gmail.com ,our WEBSITE is https://example.com ."\n>>> nfx.clean_text(t1,puncts=True,stopwords=True)\n\'this mail examplegmailcom website httpsexamplecom\'\n>>> nfx.extract_emails(t1)\n[\'example@gmail.com\']\n```\n\n### Explainer\n+ Explain an emoji or the unicode for an emoji \n\t- emoji_explainer()\n\t- emojify()\n\t- unicode_2_emoji()\n\n\n```python\n>>> from neattext.explainer import emojify\n>>> emojify(\'Smiley\')\n\'😃\'\n```\n\n```python\n>>> from neattext.explainer import emoji_explainer\n>>> emoji_explainer(\'😃\')\n\'SMILING FACE WITH OPEN MOUTH\'\n```\n\n```python\n>>> from neattext.explainer import unicode_2_emoji\n>>> unicode_2_emoji(\'0x1f49b\')\n\'FLUSHED FACE\'\n```\n\n\n\n### Documentation\nPlease read the [documentation](https://github.com/Jcharis/neattext/wiki) for more information on what neattext does and how to use it for your needs.\n\n### More Features To Add\n+ basic nlp task\n+ currency normalizer\n\n#### Acknowledgements\n+ Inspired by packages like `clean-text` from Johannes Filter and `textify` by JCharisTech\n\n\n#### NB\n+ Contributions are welcome\n+ If you notice a bug, please let us know.\n+ 
Thanks a lot\n\n\n#### By\n+ Jesse E.Agbe(JCharis)\n+ Jesus Saves @JCharisTech\n\n\n\n',
    'author': 'Jesse E.Agbe(JCharis)',
    'author_email': 'jcharistech@gmail.com',
    'maintainer': None,
    'maintainer_email': None,
    'url': 'https://github.com/Jcharis/neattext',
    'packages': packages,
    'package_data': package_data,
    'python_requires': '>=3.3,<4.0',
}


setup(**setup_kwargs)
