Metadata-Version: 2.1
Name: mumerge
Version: 0.0.4
Summary: A tool for combining bed regions from multiple bed files in a probabilistically principled manner.
Home-page: https://github.com/jtstanley/mumerge
Author: Jacob T. Stanley
Author-email: jacob.stanley@colorado.edu
License: BSD (3-clause)
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.6
Description-Content-Type: text/x-rst
License-File: LICENSE
License-File: AUTHORS.rst

=======
muMerge
=======

A tool for combining bed regions from multiple bed files in a probabilistically principled manner.

Installation
============
To use ``mumerge``, it is recommended to install it within a virtual environment or with a package manager, e.g. ``venv`` or ``conda``.

Via ``pip``
-----------
The simplest way to install ``mumerge`` within your virtual environment is with ``pip``. If you have multiple versions of Python installed, make sure ``python`` refers to the one you intend to use (Python >= 3.6). ``mumerge`` can then be installed with one of the following commands.

From PyPI:
::

    $ python -m pip install mumerge


From GitHub:
::

    $ python -m pip install git+https://github.com/Dowell-Lab/mumerge

If successful, ``mumerge`` should now be callable from the command line.

To upgrade from a previous version of ``mumerge`` to the latest one, add ``--upgrade`` to either of the ``pip`` commands above.

Via ``git clone``
-----------------
Alternatively, you can download ``mumerge`` and all supporting files by cloning the GitHub repository to your local machine using ``git``:
::

    $ git clone https://github.com/Dowell-Lab/mumerge.git

If you clone the repository, you may want to add the ``mumerge/mumerge`` directory to your ``PATH`` environment variable (how to do this depends on your platform/OS) so that you can run ``mumerge`` directly from the command line.

Dependencies
============
* Python\>=3.6 https://www.python.org/downloads/
* NumPy https://numpy.org/
* bedtools https://bedtools.readthedocs.io/en/latest/content/installation.html

Bedtools
--------
muMerge relies on ``bedtools`` to group together the bed regions from the input bed files that muMerge will then combine probabilistically. This grouping is done with the ``bedtools merge`` command. A ``bedtools`` binary is included as part of the package, located at ``/mumerge/bin/bedtools``.
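
To illustrate the kind of grouping ``bedtools merge`` performs (by default it joins overlapping and book-ended intervals), here is a minimal pure-Python sketch. It is not muMerge's code: ``merge_regions`` and ``drop_singletons`` are hypothetical names, and tracking which samples contribute to each group is only one plausible way to picture the ``-r/--remove_singletons`` option described under Usage.
::

    from typing import Dict, List, Set, Tuple

    # A region is (chrom, start, end) in 0-based, half-open BED coordinates.
    Region = Tuple[str, int, int]
    Group = Tuple[str, int, int, Set[str]]


    def merge_regions(samples: Dict[str, List[Region]]) -> List[Group]:
        """Hypothetical illustration, not muMerge's implementation.

        Pools regions from all samples and merges overlapping or book-ended
        ones, returning (chrom, start, end, contributing_sample_ids).
        """
        # Pool every region, remembering which sample it came from.
        pooled = [(chrom, start, end, sid)
                  for sid, regions in samples.items()
                  for chrom, start, end in regions]
        # Sort by chromosome, then coordinates, as bedtools merge expects sorted input.
        pooled.sort(key=lambda r: (r[0], r[1], r[2]))

        merged: List[Group] = []
        for chrom, start, end, sid in pooled:
            if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
                # Overlapping (start < end) or book-ended (start == end): extend group.
                last_chrom, last_start, last_end, sids = merged[-1]
                sids.add(sid)
                merged[-1] = (last_chrom, last_start, max(last_end, end), sids)
            else:
                merged.append((chrom, start, end, {sid}))
        return merged


    def drop_singletons(groups: List[Group]) -> List[Group]:
        """Keep only groups supported by regions from more than one sample."""
        return [g for g in groups if len(g[3]) > 1]

For example, pooling ``s1: chr1:100-200, chr1:500-600`` with ``s2: chr1:150-250, chr2:10-20`` yields three groups: ``chr1:100-250`` (from ``s1`` and ``s2``), ``chr1:500-600`` and ``chr2:10-20``. Only the first survives ``drop_singletons``.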

Running demo
============
To demonstrate the functionality of muMerge, a simple example including BED files and an input file is included in the package.

Usage
=====

For general usage, use the help command:
::

    $ mumerge -h

This returns the usage message and the options needed to run muMerge:
::

    usage: mumerge.py [-h] [-H] [-i INPUT] [-o OUTPUT] [-w WIDTH] [-m MERGED] [-r] [-v]

    Merges region calls (mu) generated by Tfit, or other peak calling functions across
    multiple samples and replicates.

    optional arguments:
      -h, --help            show this help message and exit
      -H, --HELP            Verbose help info about the input format.
      -i INPUT, --input INPUT
                            Input file (full path) containing bedfiles, sample ID's and
                            replicate grouping names (tab delimited). Each sample on separate
                            line. First line header, equal to '#file<TAB>sampid<TAB>group',
                            required. 'file' must be full path. 'sampid' can be any string.
                            'group' can be string or integer. See '-H' help flag for more
                            information.
      -o OUTPUT, --output OUTPUT
                            Output file basename (full path, sans extension). WARNING:
                            will overwrite any existing file)
      -w WIDTH, --width WIDTH
                            The ratio of a the sigma for the corresponding probabilty
                            distribution to the bed region (half-width) --- sigma:half-bed
                            (default: 1). The choice for this parameter will depend on the
                            data type as well as how bed regions were inferred from the
                            expression data.
      -m MERGED, --merged MERGED
                            Sorted bedfile (full path) containing the regions over which
                            to combine the sample bedfiles. If not specified, mumerge will
                            generate one directly from the sample bedfiles.
      -r, --remove_singletons
                            Remove calls not present in more than 1 sample
      -v, --verbose         Verbose printing during processing.
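
The ``-w/--width`` option can be made concrete with a small worked sketch. Per the help text, ``sigma`` is ``width`` times the half-width of a bed region; the sketch additionally assumes a normal distribution centered on the region midpoint, which is an assumption for illustration rather than documented muMerge behavior. ``region_to_gaussian`` and ``gaussian_pdf`` are hypothetical names.
::

    import math


    def region_to_gaussian(start: int, end: int, width: float = 1.0):
        """Return (mu, sigma) for a bed region under the sigma:half-bed ratio.

        Assumption (not from muMerge's source): mu is the region midpoint.
        From the -w help text: sigma = width * half-width.
        """
        half_width = (end - start) / 2.0
        mu = start + half_width
        sigma = width * half_width
        return mu, sigma


    def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
        """Normal probability density at x (sigma must be positive)."""
        if sigma <= 0:
            raise ValueError("sigma must be positive (zero-width region?)")
        z = (x - mu) / sigma
        return math.exp(-0.5 * z ** 2) / (sigma * math.sqrt(2 * math.pi))

For example, ``region_to_gaussian(1000, 1400)`` returns ``(1200.0, 200.0)``, and ``width=0.5`` shrinks ``sigma`` to ``100.0``, i.e. a narrower distribution around the same region.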

Input file
----------
The ``<INPUT>`` file is a tab-delimited text file containing the paths to the BED files to be merged, along with a sample name and a condition/replicate group for each sample. In the example below, there are four samples in two groups (``control`` and ``treatment``).
::

    #file   sampid  group
    /path/to/sample1.bed    sample1 control
    /path/to/sample2.bed    sample2 control
    /path/to/sample3.bed    sample3 treatment
    /path/to/sample4.bed    sample4 treatment
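
The input file can also be generated programmatically. The sketch below writes a tab-delimited file with the required ``#file<TAB>sampid<TAB>group`` header and builds a ``mumerge`` call using only the ``-i`` and ``-o`` options (plus any extra flags) listed above. It assumes ``mumerge`` is on your ``PATH``; the helper names ``write_input_file``, ``build_command`` and ``run_mumerge`` and all paths are hypothetical placeholders.
::

    import csv
    import subprocess
    from pathlib import Path


    def write_input_file(path, samples):
        """Write a muMerge input file (hypothetical helper).

        samples: iterable of (bed_path, sampid, group) tuples.
        """
        with open(path, "w", newline="") as fh:
            writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
            writer.writerow(["#file", "sampid", "group"])  # required header line
            for bed_path, sampid, group in samples:
                # 'file' must be a full path, so resolve it to an absolute path.
                writer.writerow([str(Path(bed_path).resolve()), sampid, group])


    def build_command(input_path, output_basename, extra_args=()):
        """Assemble the mumerge command line; -o takes a basename without extension."""
        return ["mumerge",
                "-i", str(Path(input_path).resolve()),
                "-o", str(Path(output_basename).resolve()),
                *extra_args]


    def run_mumerge(input_path, output_basename, extra_args=()):
        """Run mumerge, raising CalledProcessError on a non-zero exit status."""
        subprocess.run(build_command(input_path, output_basename, extra_args), check=True)

A call such as ``run_mumerge("mumerge_input.txt", "results/project_id", ("-r", "-v"))`` would then remove singletons and print verbose progress. Remember that ``-o`` overwrites existing output files.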

The same format description is returned by the ``-H`` flag, i.e. running ``mumerge -H`` prints the following:
::

    Input file containing bedfiles, sample ID's, and replicate groupings. Input
    file (indicated by the '-i' flag) should be of the following (tab delimited)
    format:

    #file   sampid  group
    /full/file/path/filename1.bed   sampid1 A
    /full/file/path/filename2.bed   sampid2 B
    ...

    Header line indicated by '#' character must be included and fields must
    follow the same order as non-header lines. The order of subsequent lines does
    matter. 'group' identifiers should group files that are technical/biological
    replicates. Different experimental conditions should recieve different 'group'
    identifiers. The 'group' identifier can be of type 'int' or 'str'. If 'sampid'
    is not specified, then default sample ID's will be used.
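
The rules above can be summarized as a small reader that groups replicate files by their ``group`` identifier. This is a hypothetical sketch, not muMerge's own parser: it maps columns by the names in the ``#`` header, requires ``file`` (as a full path) and ``group``, and substitutes a placeholder ID of the form ``sampleN`` when ``sampid`` is missing; muMerge's real default sample IDs may differ.
::

    import os
    from collections import OrderedDict


    def read_input_file(path):
        """Parse an input file into {group: [(bed_path, sampid), ...]}.

        Hypothetical reader for illustration; muMerge's own parser may differ.
        """
        groups = OrderedDict()
        with open(path) as fh:
            header = fh.readline().rstrip("\n")
            if not header.startswith("#"):
                raise ValueError("first line must be a header starting with '#'")
            # Column names come from the header; rows must follow the same order.
            columns = header.lstrip("#").split("\t")
            for lineno, line in enumerate(fh, start=2):
                if not line.strip():
                    continue  # ignore blank lines
                row = dict(zip(columns, line.rstrip("\n").split("\t")))
                bed_path, group = row.get("file"), row.get("group")
                if not bed_path or not group:
                    raise ValueError("line {}: 'file' and 'group' are required".format(lineno))
                if not os.path.isabs(bed_path):
                    raise ValueError("line {}: 'file' must be a full path".format(lineno))
                # Placeholder ID when 'sampid' is missing (assumed format, not muMerge's).
                sampid = row.get("sampid") or "sample{}".format(lineno - 1)
                # Files sharing a 'group' identifier are treated as replicates.
                groups.setdefault(group, []).append((bed_path, sampid))
        return groups

Note that ``group`` values are read as strings, so an integer identifier such as ``1`` is kept as ``"1"``.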

Output files
------------
muMerge returns the merged regions in BED format (``project_id_MUMERGE.bed``), where ``project_id`` is the output basename given with ``-o``. It also writes a log file (``project_id.log``) summarizing the run, along with intermediate files (``project_id_MISCALLS.bed`` and ``project_id_BEDTOOLS_MERGE.bed``).
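
A quick way to locate and inspect these outputs is sketched below. It assumes the ``<basename>_MUMERGE.bed`` style naming described above, with ``<basename>`` being the ``-o`` value, and reads only the standard first three BED columns; ``expected_outputs`` and ``summarize_bed`` are hypothetical helper names.
::

    from statistics import mean


    def expected_outputs(basename):
        """Map each output described above to its path (naming is an assumption)."""
        return {
            "merged": basename + "_MUMERGE.bed",
            "log": basename + ".log",
            "miscalls": basename + "_MISCALLS.bed",
            "bedtools_merge": basename + "_BEDTOOLS_MERGE.bed",
        }


    def summarize_bed(path):
        """Return (number of regions, mean region width) from BED columns 2 and 3."""
        widths = []
        with open(path) as fh:
            for line in fh:
                # Skip blank, comment, and track/browser header lines.
                if not line.strip() or line.startswith(("#", "track", "browser")):
                    continue
                fields = line.rstrip("\n").split("\t")
                widths.append(int(fields[2]) - int(fields[1]))
        return len(widths), (mean(widths) if widths else 0.0)

For instance, ``summarize_bed(expected_outputs("results/project_id")["merged"])`` would report how many merged regions muMerge produced and their average width.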

Runtime
-------
The overall run time depends on the number of input BED files and regions being merged. In one test case, where 8 samples (~30,000 regions) in 6 condition groups were merged, muMerge took about 12 minutes on a MacBook Pro (Intel Core i9, 2.3 GHz) running macOS 10.14.6.

Cite
====
Please cite the following article if you use muMerge: `Transcription factor enrichment analysis (TFEA) quantifies the activity of multiple transcription factors from a single experiment <https://doi.org/10.1038/s42003-021-02153-7>`_

BibTeX citation:

::

    @article{rubin2021transcription,
      title={Transcription factor enrichment analysis (TFEA) quantifies the activity of multiple transcription factors from a single experiment},
      author={Rubin, Jonathan D and Stanley, Jacob T and Sigauke, Rutendo F and Levandowski, Cecilia B and Maas, Zachary L and Westfall, Jessica and Taatjes, Dylan J and Dowell, Robin D},
      journal={Communications biology},
      volume={4},
      number={1},
      pages={1--15},
      year={2021},
      publisher={Nature Publishing Group}
    }


