Metadata-Version: 2.1
Name: topometry
Version: 0.0.2.6
Summary: Fast, accurate learning of data topology with self-adaptive metrics, graphs and layouts
Home-page: https://github.com/davisidarta/topometry
Author: Davi Sidarta-Oliveira
Author-email: davisidarta@fcm.unicamp.br
License: UNKNOWN
Project-URL: Bug Tracker, https://github.com/davisidarta/topometry/issues
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE

[![Latest PyPI version](https://img.shields.io/pypi/v/topometry.svg)](https://pypi.org/project/topometry/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Documentation Status](https://readthedocs.org/projects/topometry/badge/?version=latest)](https://topometry.readthedocs.io/en/latest/?badge=latest)
[![Twitter](https://img.shields.io/twitter/url/https/twitter.com/DaviSidarta.svg?label=Follow%20%40davisidarta&style=social)](https://twitter.com/davisidarta)


## TopOMetry - Topologically Optimized geoMetry

**Table of Contents**

- [What is TopOMetry and why it is useful](#topological-metrics-basis-graphs-and-layouts)
- [Installation](#installation-and-dependencies)
- [Quick-start](#quick-start)
- [Tutorials](#tutorials-and-examples)

## Topological metrics, basis, graphs and layouts

TopOMetry is a high-level Python library for exploring data topology.
It learns topological metrics, dimensionality-reduced bases and graphs from data, and
visualizes them with different layout optimization algorithms. The main objective is to approximate
the [Laplace-Beltrami Operator](https://en.wikipedia.org/wiki/Laplace%E2%80%93Beltrami_operator), a natural way to describe
data geometry and its high-dimensional topology.
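
As a rough illustration of what approximating the Laplace-Beltrami Operator from samples can look like, the sketch below builds a density-normalized diffusion operator in the style of classic Diffusion Maps. It is not TopOMetry's implementation: the function name `diffusion_operator`, the fixed bandwidth `epsilon` and the toy circle data are all assumptions made for this example.

```python
import numpy as np

def diffusion_operator(X, epsilon=1.0, alpha=1.0):
    """Row-stochastic diffusion operator with density normalization.

    Hypothetical helper for illustration; not part of TopOMetry.
    """
    # Pairwise squared Euclidean distances (clipped at 0 for numerical safety)
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # Gaussian kernel with a fixed bandwidth
    K = np.exp(-d2 / epsilon)
    # Kernel density estimate at each sample
    q = K.sum(axis=1)
    # alpha = 1 divides out the sampling density, so the operator
    # reflects geometry rather than how densely the data was sampled
    K_alpha = K / np.outer(q, q) ** alpha
    # Markov (row) normalization
    return K_alpha / K_alpha.sum(axis=1, keepdims=True)

# Toy data: 200 points sampled on a circle
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
X = np.column_stack([np.cos(theta), np.sin(theta)])

epsilon = 0.1
P = diffusion_operator(X, epsilon=epsilon)
# Graph Laplacian estimate built from the diffusion operator
L = (np.eye(len(X)) - P) / epsilon
```

With `alpha=1`, the matrix `L` is the kind of graph Laplacian that, for many samples and a small bandwidth, is expected to approach the Laplace-Beltrami Operator up to a constant factor and sign convention.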

TopOMetry is designed to handle large-scale data matrices containing
extreme topological diversity, such as those
generated from [single-cell omics](https://en.wikipedia.org/wiki/Single_cell_sequencing), and can be used to perform topology-preserving
visualizations.

TopOMetry's main class is the ``TopOGraph`` object. In a ``TopOGraph``, topological metrics are recovered with diffusion
harmonics or Continuous-k-Nearest-Neighbors, and used to obtain topological bases (multiscale Diffusion Maps and/or
diffuse or continuous versions of Laplacian Eigenmaps).
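
For intuition about Continuous-k-Nearest-Neighbors (CkNN), here is a minimal dense sketch of the usual CkNN rule: two samples are connected when their distance is smaller than `delta` times the geometric mean of their k-th nearest-neighbor distances. The function name `cknn_adjacency` and its defaults are assumptions for this example, and the brute-force distance matrix only suits small data; TopOMetry's own implementation may differ.

```python
import numpy as np

def cknn_adjacency(X, k=5, delta=1.0):
    """Boolean CkNN adjacency matrix (hypothetical helper, dense, O(n^2))."""
    # Pairwise Euclidean distances
    diff = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # Distance to the k-th nearest neighbor (column 0 is the point itself)
    d_k = np.sort(d, axis=1)[:, k]
    # Connect i and j when d(i, j) < delta * sqrt(d_k(i) * d_k(j))
    A = d < delta * np.sqrt(np.outer(d_k, d_k))
    # Drop self-loops
    np.fill_diagonal(A, False)
    return A

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
A = cknn_adjacency(X, k=5, delta=1.0)
```

Because the threshold scales with the local neighbor distance of both endpoints, the resulting graph is symmetric and adapts to regions of different sampling density.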

On top of these bases, new graphs can be learned using k-nearest-neighbors
graphs or additional topological operators. The learned metrics, bases and graphs are stored as different attributes of the
``TopOGraph`` object.

Finally, different visualizations of the learned topology can be optimized with ``pyMDE`` by solving a
[Minimum-Distortion Embedding](https://github.com/cvxgrp/pymde) problem. TopOMetry also implements an adapted, non-uniform
version of the seminal [Uniform Manifold Approximation and Projection (UMAP)](https://github.com/lmcinnes/umap)
for graph layout optimization (we call it MAP for short).

Alternatively, you can use TopOMetry to add topological information to your favorite workflow
by computing k-nearest-neighbors on its dimensionality-reduced basis instead of on PCA components.
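
A minimal sketch of that last idea, using scikit-learn: the array `basis` below is a random stand-in for a learned basis with shape `(n_samples, n_components)`. How to retrieve the real basis from a fitted ``TopOGraph`` is not shown here, so treat the variable as a placeholder.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

# Stand-in for a learned topological basis (assumption: a dense array of
# shape (n_samples, n_components)); replace with the real basis
rng = np.random.default_rng(0)
basis = rng.normal(size=(300, 10))

# k-nearest-neighbors graph computed on the basis rather than on PCA scores
knn = kneighbors_graph(basis, n_neighbors=15, mode="connectivity",
                       include_self=False)
```

The result is a sparse `(n_samples, n_samples)` matrix that can be passed to downstream tools that accept a precomputed neighborhood graph.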

## Installation and dependencies

TopOMetry relies on the existing libraries listed below for its scalability and flexibility. It is implemented in Python and builds high-level models
that inherit from the [scikit-learn](https://github.com/scikit-learn/scikit-learn)
``BaseEstimator``, making it easy to apply and combine with different workflows in virtually any domain.


* [scikit-learn](https://github.com/scikit-learn/scikit-learn) - for general algorithms
* [ANNOY](https://github.com/spotify/annoy) - for optimized neighbor index search
* [nmslib](https://github.com/nmslib/nmslib) - for fast and accurate k-nearest-neighbors
* [kneed](https://github.com/arvkevi/kneed) - for finding nice cutoffs
* [pyMDE](https://github.com/cvxgrp/pymde) - for optimizing layouts

Prior to installing TopOMetry, make sure you have [cmake](https://cmake.org/), [scikit-build](https://scikit-build.readthedocs.io/en/latest/) and [setuptools](https://setuptools.readthedocs.io/en/latest/) available on your system. On Debian-based Linux distributions:
```
sudo apt-get install cmake
pip3 install scikit-build setuptools
```
TopOMetry uses either NMSlib or HNSWlib for fast approximate nearest-neighbor search across different
distance metrics. By default, it uses NMSlib. If your CPU supports advanced instructions, we recommend building
nmslib from source separately for better performance:
```
pip3 install --no-binary :all: nmslib
```
Alternatively, you can use HNSWlib as the k-nearest-neighbor search backend:
```
pip3 install hnswlib
```

Then, you can install TopOMetry and its other requirements with pip:
```
pip3 install numpy pandas scipy numba torch matplotlib scikit-learn kneed pymde
```
```
pip3 install topometry
```
Alternatively, clone this repo and build from source:
```
git clone https://github.com/davisidarta/topometry
cd topometry
pip3 install .
```
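
After installing, a quick way to see which of the packages mentioned above can be found by your Python environment is the small check below. It is only a convenience sketch: the helper name `check_modules` is made up for this example, and the list uses import names rather than PyPI names (scikit-learn imports as `sklearn`, and TopOMetry is imported as `topo` in the quick-start).

```python
from importlib.util import find_spec

# Import names of the packages mentioned in this section
MODULES = ["numpy", "pandas", "scipy", "numba", "torch", "matplotlib",
           "sklearn", "kneed", "pymde", "nmslib", "hnswlib", "annoy", "topo"]

def check_modules(names):
    """Return {module_name: True/False} without importing the modules."""
    return {name: find_spec(name) is not None for name in names}

status = check_modules(MODULES)
for name, found in status.items():
    print(f"{name:12s} {'found' if found else 'MISSING'}")
```

Only one of `nmslib` and `hnswlib` needs to be present, depending on the backend you intend to use.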
## Quick-start

From a large data matrix ``data`` (np.ndarray, pd.DataFrame or sp.csr_matrix), you can set up a ``TopOGraph`` with default parameters:

```
import topo as tp

# Learn topological metrics and basis from data. The default is to use diffusion harmonics.
tg = tp.ml.TopOGraph()
tg = tg.fit(data)

```
Note: `topo.ml` is the high-level model module which contains the `TopOGraph` class.
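
The snippet above assumes that ``data`` already exists. If you just want something to experiment with, a toy matrix can be generated with NumPy as sketched below; the helper `make_toy_data` and its parameters are invented for this example and are not part of TopOMetry.

```python
import numpy as np

def make_toy_data(n_samples=500, n_features=50, n_clusters=3, seed=0):
    """Gaussian blobs in a high-dimensional space (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Well-separated cluster centers
    centers = rng.normal(scale=5.0, size=(n_clusters, n_features))
    # Assign each sample to a cluster and add unit-variance noise
    labels = rng.integers(0, n_clusters, size=n_samples)
    data = centers[labels] + rng.normal(size=(n_samples, n_features))
    return data, labels

data, labels = make_toy_data()
print(data.shape)  # (500, 50)
```

The resulting ``data`` array has samples as rows and features as columns, and ``labels`` can be used later to color a layout.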

After learning a topological basis, we can access the topological metrics and bases stored in the ``TopOGraph`` object, and build different
topological graphs.

```
# Learn a topological graph. Again, the default is to use diffusion harmonics.
tgraph = tg.transform(data)
```

Then, it is possible to optimize the topological graph layout. The first option is to do so with
our adaptation of UMAP (MAP), which will minimize the cross-entropy between the topological basis
and its graph:

```
# Graph layout optimization with MAP
map_emb, aux = tg.MAP()
```
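
To make the cross-entropy idea more concrete, the sketch below evaluates the standard UMAP-style fuzzy cross-entropy between a matrix of graph weights `P` and a candidate layout `Y`. It assumes the usual UMAP low-dimensional similarity `1 / (1 + a * d^(2b))`; the function name, the values of `a` and `b`, and the toy data are assumptions for illustration, and the exact objective used by MAP may differ.

```python
import numpy as np

def fuzzy_cross_entropy(P, Y, a=1.0, b=1.0, eps=1e-12):
    """UMAP-style cross-entropy between graph weights P and a layout Y."""
    # Pairwise squared distances in the layout
    diff = Y[:, None, :] - Y[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    # Low-dimensional similarity: 1 / (1 + a * d^(2b))
    Q = 1.0 / (1.0 + a * d2 ** b)
    # Each unordered pair is counted once; clipping avoids log(0)
    iu = np.triu_indices(len(Y), k=1)
    p = np.clip(P[iu], eps, 1.0 - eps)
    q = np.clip(Q[iu], eps, 1.0 - eps)
    # Attractive term plus repulsive term
    return float(np.sum(p * np.log(p / q)
                        + (1.0 - p) * np.log((1.0 - p) / (1.0 - q))))

rng = np.random.default_rng(0)
Y_good = rng.normal(size=(50, 2))
Y_bad = rng.normal(size=(50, 2))

# Toy graph weights that are perfectly explained by Y_good
d2_good = ((Y_good[:, None, :] - Y_good[None, :, :]) ** 2).sum(axis=-1)
P = 1.0 / (1.0 + d2_good)

loss_good = fuzzy_cross_entropy(P, Y_good)
loss_bad = fuzzy_cross_entropy(P, Y_bad)
```

A layout that reproduces the graph weights yields a loss near zero, while an unrelated layout yields a larger value; layout optimization searches for coordinates that drive this quantity down.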

The second, and arguably more interesting, option is to use pyMDE to find a Minimum Distortion Embedding. TopOMetry implements some
custom MDE problems within the TopOGraph model:

```
# Set up MDE problem
mde = tg.MDE()
mde_emb = mde.embed()
```

## Tutorials and examples

See the documentation at [ReadTheDocs](https://topometry.readthedocs.io/en/latest/) for tutorials and examples.

## Contributing

Contributions are very welcome! If you're interested in adding a new feature, just let me know in the Issues section.

## License

[MIT License](https://github.com/davisidarta/topometry/blob/master/LICENSE)

Copyright (c) 2021 Davi Sidarta-Oliveira, davisidarta(at)gmail.com

 


