Metadata-Version: 2.1
Name: extcats
Version: 2.4.3
Summary: Tools to organize and query astronomical catalogs
Home-page: https://github.com/AmpelProject/extcats
License: MIT
Author: Matteo Giomi
Author-email: matteo.giomi@desy.de
Maintainer: Jakob van Santen
Maintainer-email: jakob.van.santen@desy.de
Requires-Python: >=3.8,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Provides-Extra: ingest
Requires-Dist: astropy (>=4.2,<6)
Requires-Dist: healpy (>=1.14.0,<2.0.0)
Requires-Dist: pandas (>=1.2,<2.0); extra == "ingest"
Requires-Dist: pymongo (>=3.7,<5.0)
Requires-Dist: tqdm (>=4.58.0,<5.0.0); extra == "ingest"
Project-URL: Repository, https://github.com/AmpelProject/extcats
Description-Content-Type: text/x-rst

*******
extcats
*******

.. image:: https://coveralls.io/repos/github/AmpelProject/extcats/badge.svg?branch=master
   :target: https://coveralls.io/github/AmpelProject/extcats?branch=master

Tools to organize and query astronomical catalogs
#################################################


This module provides classes to import astronomical catalogs into
a **MongoDB** database, and to efficiently query this database for
positional matches.


Description:
############

The two main classes of this module are:

    - **CatalogPusher**: processes the raw files containing the catalog sources and creates a database. See the *insert_example* notebook for more details and usage instructions.

    - **CatalogQuery**: performs queries on the catalogs. See *query_example* for examples and benchmarks.

Supported queries include:

 - all the sources within a certain distance of a given position.
 - the closest source to a given position.
 - binary search: report whether any source lies around the position.
 - user-defined queries.

The first item in the list above (a cone search around the target) provides the building block for the other two types of position-based queries. The code supports three types of basic
cone-search queries, depending on the indexing strategy of the database:

    - using **HEALPix**: if the catalog sources have been assigned a HEALPix index (using `healpy <https://healpy.readthedocs.io/en/latest/#>`_).
     
    - using **GeoJSON** (or 'legacy coordinates'): if the catalog documents have the 
      position arranged in one of these two formats (`example 
      <https://docs.mongodb.com/manual/geospatial-queries/>`_), the query is based on
      the ``$geoWithin`` and ``$centerSphere`` mongo operators.
    
    - **raw**: this method uses the ``$where`` keyword to evaluate, on each document, a JavaScript
      function computing the angular distance between the source and the target. This method
      does not require any additional field to be added to the catalog but has, in general,
      poorer performance than the methods above.
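
As a rough illustration of the GeoJSON method, the sketch below builds a filter document based on ``$geoWithin`` and ``$centerSphere``. The helper name ``cone_search_filter``, the field name ``pos`` and the ``ra - 180`` longitude convention are assumptions made for this example, not necessarily what extcats does internally. MongoDB expects GeoJSON longitudes in the range [-180, 180] and the ``$centerSphere`` radius in radians.

.. code-block:: python

    import math

    def cone_search_filter(ra_deg, dec_deg, radius_arcsec, pos_key="pos"):
        """Build a MongoDB filter selecting documents inside a cone."""
        # GeoJSON longitudes must lie in [-180, 180]; shifting RA by 180 deg
        # is one possible convention (an assumption of this sketch).
        lon = ra_deg - 180.0
        # $centerSphere takes the search radius in radians.
        radius_rad = math.radians(radius_arcsec / 3600.0)
        return {
            pos_key: {"$geoWithin": {"$centerSphere": [[lon, dec_deg], radius_rad]}}
        }

    # Filter for a 10 arcsec cone around RA = 210.5 deg, Dec = -12.3 deg.
    query = cone_search_filter(ra_deg=210.5, dec_deg=-12.3, radius_arcsec=10.0)

The resulting dictionary could be passed to ``find`` on a ``pymongo`` collection, provided the documents store their position under the chosen key with the same longitude convention.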
      
All the core functions are defined in the ``catquery_utils`` module. In all cases, the
results of the queries are returned as ``astropy.table.Table`` objects.
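
To illustrate how the closest-source and binary searches can be built on top of a cone search, here is a minimal in-memory sketch. It works on a plain list of dictionaries rather than on a database, and all function names (``angular_distance_arcsec``, ``find_within``, ``find_closest``, ``has_match``) are hypothetical and not part of the extcats API. The haversine formula is one common choice for the angular distance.

.. code-block:: python

    import math

    def angular_distance_arcsec(ra1, dec1, ra2, dec2):
        """Great-circle distance between two positions (degrees in, arcsec out)."""
        ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
        # Haversine formula, numerically stable for small separations.
        h = (math.sin((dec2 - dec1) / 2) ** 2
             + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
        return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0

    def find_within(sources, ra, dec, radius_arcsec):
        """Cone search: return (distance, source) pairs inside the radius."""
        matches = []
        for src in sources:
            dist = angular_distance_arcsec(ra, dec, src["ra"], src["dec"])
            if dist <= radius_arcsec:
                matches.append((dist, src))
        return matches

    def find_closest(sources, ra, dec, radius_arcsec):
        """Closest source: the cone-search match with the smallest distance."""
        matches = find_within(sources, ra, dec, radius_arcsec)
        return min(matches, key=lambda m: m[0]) if matches else None

    def has_match(sources, ra, dec, radius_arcsec):
        """Binary search: True if the cone search returns anything."""
        return bool(find_within(sources, ra, dec, radius_arcsec))

    # Toy catalog used only for this example.
    sources = [
        {"name": "a", "ra": 10.0, "dec": 20.0},
        {"name": "b", "ra": 10.001, "dec": 20.0},
        {"name": "c", "ra": 50.0, "dec": -5.0},
    ]
    closest = find_closest(sources, ra=10.0004, dec=20.0, radius_arcsec=5.0)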


Notes on indexing and query performance:
----------------------------------------

The recommended method to index and query catalogs is based on the GeoJSON coordinate type.
See the *insert_example* notebook for how this can be implemented.


Performant queries require the database indexes to reside in RAM. The indexes are
efficiently compressed by MongoDB's default storage engine (WiredTiger); however, there is little
redundant (and hence compressible) information in accurately measured coordinate pairs.
As a consequence, GeoJSON-type indexes tend to require a fair amount of free memory (of
the order of 40 MB for 2M entries). For large catalogs (and/or limited RAM), indexing on
coordinates might not be feasible. In this case, the HEALPix-based indexing should
be used. As (possibly) many sources share the same HEALPix index, compression is
more effective at moderating RAM usage.
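
How many sources end up sharing a HEALPix index depends on the pixel size, which is set by the resolution order. The short calculation below, using only the standard HEALPix relations (``nside = 2**order``, ``npix = 12 * nside**2``), gives a feeling for the numbers. The orders shown are arbitrary examples and the function names are hypothetical.

.. code-block:: python

    import math

    def healpix_npix(order):
        """Number of pixels on the sky at a given HEALPix order."""
        nside = 2 ** order
        return 12 * nside ** 2

    def healpix_resolution_arcsec(order):
        """Approximate pixel size: square root of the pixel area, in arcsec."""
        pixel_area_sr = 4.0 * math.pi / healpix_npix(order)
        return math.degrees(math.sqrt(pixel_area_sr)) * 3600.0

    # Coarser orders give larger pixels, so more sources share each index value.
    for order in (10, 13, 16):
        print(order, healpix_npix(order), round(healpix_resolution_arcsec(order), 2))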

Installation:
^^^^^^^^^^^^^

The easiest way to install the Python library is with pip:
::
    
    pip install extcats

If you want to modify ``extcats`` itself, you'll need an editable installation.
After cloning this Git repository:
::
   
    poetry install

Useful links:
-------------

 - `mongodb installation <https://docs.mongodb.com/manual/administration/install-community/>`_
 - `healpy <https://healpy.readthedocs.io/en/latest/#>`_
 - `astropy <http://www.astropy.org/>`_

