# -*- coding: utf-8 -*-
from setuptools import setup

package_dir = \
{'': 'src'}

packages = \
['cogclassifier', 'cogclassifier.script']

package_data = \
{'': ['*'], 'cogclassifier': ['bin/Darwin/*', 'bin/Linux/*', 'bin/Windows/*']}

install_requires = \
['altair>=4.2.0,<5.0.0', 'pandas>=1.4.1,<2.0.0', 'requests>=2.27.1,<3.0.0']

entry_points = \
{'console_scripts': ['COGclassifier = cogclassifier.cogclassifier:main',
                     'plot_cog_classifier_barchart = '
                     'cogclassifier.script.plot_cog_classifier_barchart:main',
                     'plot_cog_classifier_piechart = '
                     'cogclassifier.script.plot_cog_classifier_piechart:main']}

setup_kwargs = {
    'name': 'cogclassifier',
    'version': '1.0.4',
    'description': 'A tool for classifying prokaryote protein sequences into COG functional category',
    'long_description': '# COGclassifier\n\n![Python3](https://img.shields.io/badge/Language-Python3-steelblue)\n![OS](https://img.shields.io/badge/OS-Windows_|_Mac_|_Linux-steelblue)\n![License](https://img.shields.io/badge/License-MIT-steelblue)\n[![Latest PyPI version](https://img.shields.io/pypi/v/cogclassifier.svg)](https://pypi.python.org/pypi/cogclassifier)\n[![Bioconda](https://img.shields.io/conda/vn/bioconda/cogclassifier.svg?color=green)](https://anaconda.org/bioconda/cogclassifier)  \n![CI workflow](https://github.com/moshi4/COGclassifier/actions/workflows/ci.yml/badge.svg)\n[![codecov](https://codecov.io/gh/moshi4/COGclassifier/branch/main/graph/badge.svg?token=F7O5HA2J3G)](https://codecov.io/gh/moshi4/COGclassifier)\n\n## Table of Contents\n\n- [Overview](#overview)\n- [Installation](#installation)\n- [Workflow](#workflow)\n- [Usage](#usage)\n- [Output Contents](#output-contents)\n- [Customize Charts](#customize-charts)\n\n## Overview\n\nCOG(Cluster of Orthologous Genes) is a database that plays an important role in the annotation, classification, and analysis of microbial gene function.\nFunctional annotation, classification, and analysis of each gene in newly sequenced bacterial genomes using the COG database is a common task.\nHowever, there was no COG functional classification command line software that is easy-to-use and capable of producing publication-ready figures.\nTherefore, I developed COGclassifier to fill this need.\nCOGclassifier can automatically perform the processes from searching query sequences into the COG database, to annotation and classification of gene functions, to generation of publication-ready figures (See figure below).\n\n![ecoli_barchart_fig](https://raw.githubusercontent.com/moshi4/COGclassifier/main/images/ecoli/classifier_count_barchart.png)  \nFig.1: Barchart of COG funcitional category classification result for E.coli\n\n![ecoli_piechart_sort_fig](https://raw.githubusercontent.com/moshi4/COGclassifier/main/images/ecoli/classifier_count_piechart_sort.png)  \nFig.2: Piechart of COG funcitional category classification result for E.coli\n\n## Installation\n\nCOGclassifier is implemented in Python3.\nRPS-BLAST(v2.13.0) is bundled in COGclassifier ([src/cogclassifier/bin](https://github.com/moshi4/COGclassifier/tree/main/src/cogclassifier/bin)).  \n\n**Install bioconda package:**\n\n    conda install -c bioconda -c conda-forge cogclassifier\n\n**Install PyPI stable package:**\n\n    pip install cogclassifier\n\n**Install latest development package:**\n\n    pip install git+https://github.com/moshi4/COGclassifier.git\n\n## Workflow\n\nDescription of COGclassifier\'s automated workflow.\n\n### 1. Download COG & CDD resources\n\nDownload 4 required COG & CDD files from FTP site.\n\n- `fun-20.tab` (<https://ftp.ncbi.nih.gov/pub/COG/COG2020/data/fun-20.tab>)  \n    Descriptions of COG functional categories.  \n\n    <details>\n    <summary>Show more information</summary>\n\n    > Tab-delimited plain text file with descriptions of COG functional categories  \n    > Columns:  \n    >  \n    > 1\\. Functional category ID (one letter)  \n    > 2\\. Hexadecimal RGB color associated with the functional category  \n    > 3\\. Functional category description  \n    > Each line corresponds to one functional category. The order of the categories is meaningful (reflects a hierarchy of functions; determines the order of display)  \n    >\n    > (From <https://ftp.ncbi.nih.gov/pub/COG/COG2020/data/Readme.2020-11-25.txt>)\n\n    </details>\n\n- `cog-20.def.tab` (<https://ftp.ncbi.nih.gov/pub/COG/COG2020/data/cog-20.def.tab>)  \n    COG descriptions such as \'COG ID\', \'COG functional category\', \'COG name\', etc...  \n\n    <details>\n    <summary>Show more information</summary>\n\n    > Tab-delimited plain text file with COG descriptions  \n    > Columns:  \n    >  \n    > 1\\. COG ID  \n    > 2\\. COG functional category (could include multiple letters in the order of importance)  \n    > 3\\. COG name  \n    > 4\\. Gene associated with the COG (optional)  \n    > 5\\. Functional pathway associated with the COG (optional)  \n    > 6\\. PubMed ID, associated with the COG (multiple entries are semicolon-separated; optional)  \n    > 7\\. PDB ID of the structure associated with the COG (multiple entries are semicolon-separated; optional)  \n    > Each line corresponds to one COG. The order of the COGs is arbitrary (displayed in the lexicographic order)  \n    >\n    > (From <https://ftp.ncbi.nih.gov/pub/COG/COG2020/data/Readme.2020-11-25.txt>)\n\n    </details>\n\n- `cddid.tbl.gz` (<https://ftp.ncbi.nih.gov/pub/mmdb/cdd/>)  \n    Summary information about the CD(Conserved Domain) model.  \n\n    <details>\n    <summary>Show more information</summary>\n\n    >"cddid.tbl.gz" contains summary information about the CD models in this\n    >distribution, which are part of the default "cdd" search database and are\n    >indexed in NCBI\'s Entrez database. This is a tab-delimited text file, with a\n    >single row per CD model and the following columns:  \n    >  \n    >PSSM-Id (unique numerical identifier)  \n    >CD accession (starting with \'cd\', \'pfam\', \'smart\', \'COG\', \'PRK\' or "CHL\')  \n    >CD "short name"  \n    >CD description  \n    >PSSM-Length (number of columns, the size of the search model)  \n    >\n    > (From <https://ftp.ncbi.nih.gov/pub/mmdb/cdd/README>)\n\n    </details>\n\n- `Cog_LE.tar.gz` (<https://ftp.ncbi.nih.gov/pub/mmdb/cdd/little_endian/>)  \n    COG database, a part of CDD(Conserved Domain Database), for RPS-BLAST search.  \n\n### 2. RPS-BLAST search against COG database\n\nRun query sequences RPS-BLAST against COG database [Default: E-value = 1e-2].\nBest-hit (=lowest e-value) blast results are extracted and used in next functional classification step.\n\n### 3. Classify query sequences into COG functional category\n\nFrom best-hit results, extract relationship between query sequences and COG functional category as described below.\n\n1. Best-hit results -> CDD ID\n2. CDD ID -> COG ID (From `cddid.tbl`)\n3. COG ID -> COG Functional Category Letter (From `cog-20.def.tab`)\n4. COG Functional Category Letter -> COG Functional Category Definition (From `fun-20.tab`)\n\n> :warning:\n> If functional category with multiple letters exists, first letter is treated as functional category\n> (e.g. COG4862 has multiple letters `KTN`. A letter `K` is treated as functional category).\n\nUsing the above information, the number of query sequences classified into each COG functional category is calculated and\nfunctional annotation and classification results are output.\n\n## Usage\n\n### Basic Command\n\n    COGclassifier -i [query protein fasta file] -o [output directory]\n\n### Options\n\n    -h, --help            show this help message and exit\n    -i , --infile         Input query protein fasta file\n    -o , --outdir         Output directory\n    -d , --download_dir   Download COG & CDD resources directory (Default: \'~/.cache/cogclassifier\')\n    -t , --thread_num     RPS-BLAST num_thread parameter (Default: MaxThread - 1)\n    -e , --evalue         RPS-BLAST e-value parameter (Default: 0.01)\n    -v, --version         Print version information\n\n### Example Command\n\nClassify E.coli protein sequences into COG functional category ([ecoli.faa](https://github.com/moshi4/COGclassifier/blob/main/example/input/ecoli.faa?raw=true)):  \n\n    COGclassifier -i ./example/input/ecoli.faa -o ./ecoli_cog_classifier\n\n### Example API\n\n```python\nfrom cogclassifier import cogclassifier\n\nquery_fasta_file = "./example/input/ecoli.faa"\noutdir = "./ecoli_cog_classifier"\ncogclassifier.run(query_fasta_file, outdir)\n```\n\n## Output Contents\n\nCOGclassifier outputs 4 result text files, 3 html format chart files.  \n\n- **`rpsblast_result.tsv`** ([example](https://github.com/moshi4/COGclassifier/blob/main/example/output/mycoplasma_cog_classifier/rpsblast_result.tsv))  \n  RPS-BLAST against COG database result (format = `outfmt 6`).  \n\n- **`classifier_result.tsv`** ([example](https://github.com/moshi4/COGclassifier/blob/main/example/output/mycoplasma_cog_classifier/classifier_result.tsv))  \n  Query sequences classified into COG functional category result.  \n  This file contains all classified query sequences and associated COG information.  \n\n    <details>\n    <summary>Table of detailed tsv format information (9 columns)</summary>\n\n    | Columns          | Contents                               | Example Value                       |\n    | ---------------- | -------------------------------------- | ----------------------------------- |\n    | QUERY_ID         | Query ID                               | NP_414544.1                         |\n    | COG_ID           | COG ID of RPS-BLAST top hit result     | COG0083                             |\n    | CDD_ID           | CDD ID of RPS-BLAST top hit result     | 223161                              |\n    | EVALUE           | RPS-BLAST top hit evalue               | 2.5e-150                            |\n    | IDENTITY         | RPS-BLAST top hit identity             | 45.806                              |\n    | GENE_NAME        | Abbreviated gene name                  | ThrB                                |\n    | COG_NAME         | COG gene name                          | Homoserine kinase                   |\n    | COG_LETTER       | Letter of COG functional category      | E                                   |\n    | COG_DESCRIPTION  | Description of COG functional category | Amino acid transport and metabolism |\n\n    </details>\n\n- **`classifier_count.tsv`** ([example](https://github.com/moshi4/COGclassifier/blob/main/example/output/ecoli_cog_classifier/classifier_count.tsv))  \n  Count classified sequences per COG functional category result.  \n\n    <details>\n    <summary>Table of detailed tsv format information (4 columns)</summary>\n\n    | Columns     | Contents                                | Example Value                                   |\n    | ------------| --------------------------------------- | ----------------------------------------------- |\n    | LETTER      | Letter of COG functional category       | J                                               |\n    | COUNT       | Count of COG classified sequence        | 259                                             |\n    | COLOR       | Symbol color of COG functional category | #FCCCFC                                         |\n    | DESCRIPTION | Description of COG functional category  | Translation, ribosomal structure and biogenesis |\n\n    </details>\n\n- **`classifier_stats.txt`** ([example](https://github.com/moshi4/COGclassifier/blob/main/example/output/ecoli_cog_classifier/classifier_stats.txt))  \n  The percentages of the classified sequences are described as example below.  \n  > 86.35% (3575 / 4140) sequences classified into COG functional category.\n\n- **`classifier_count_barchart.html`**  \n  Barchart of COG funcitional category classification result.  \n  COGclassifier uses [`Altair`](https://altair-viz.github.io/) visualization library for plotting html format charts.  \n  In web browser, Altair charts interactively display tooltips and can export image as PNG or SVG format.\n\n  ![classifier_count_barchart](https://raw.githubusercontent.com/moshi4/COGclassifier/main/images/vega-lite_functionality.png)\n\n- **`classifier_count_piechart.html`**  \n  Piechart of COG funcitional category classification result.  \n  Functional category with percentages less than 1% don\'t display letter on piechart.  \n\n  ![classifier_count_piechart](https://raw.githubusercontent.com/moshi4/COGclassifier/main/images/ecoli/classifier_count_piechart.png)\n\n- **`classifier_count_piechart_sort.html`**  \n  Piechart with descending sort by count.  \n  Functional category with percentages less than 1% don\'t display letter on piechart.  \n\n  ![classifier_count_piechart](https://raw.githubusercontent.com/moshi4/COGclassifier/main/images/ecoli/classifier_count_piechart_sort.png)\n\n## Customize Charts\n\nCOGclassifier also provides barchart & piechart plotting scripts to customize charts appearence.\nEach script can plot the following feature charts from `classifier_count.tsv`. See wiki for details.\n\n- Features of **plot_cog_classifier_barchart** script ([wiki](https://github.com/moshi4/COGclassifier/wiki/Customize-Barchart))  \n  - Adjust figure width, height, barwidth\n  - Plot charts with percentage style instead of count number style\n  - Fix maximum value of Y-axis  \n  - Descending sort by count number or not  \n  - Plot charts from user-customized \'classifier_count.tsv\'\n\n- Features of **plot_cog_classifier_piechart** script ([wiki](https://github.com/moshi4/COGclassifier/wiki/Customize-Piechart))  \n  - Adjust figure width, height\n  - Descending sort by count number or not\n  - Show letter on piechart or not\n  - Plot charts from user-customized \'classifier_count.tsv\'\n',
    'author': 'moshi',
    'author_email': None,
    'maintainer': None,
    'maintainer_email': None,
    'url': 'https://github.com/moshi4/COGclassifier/',
    'package_dir': package_dir,
    'packages': packages,
    'package_data': package_data,
    'install_requires': install_requires,
    'entry_points': entry_points,
    'python_requires': '>=3.8,<4.0',
}


setup(**setup_kwargs)
