Metadata-Version: 2.1
Name: swh.indexer
Version: 0.4.2
Summary: Software Heritage Content Indexer
Home-page: https://forge.softwareheritage.org/diffusion/78/
Author: Software Heritage developers
Author-email: swh-devel@inria.fr
License: UNKNOWN
Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest
Project-URL: Funding, https://www.softwareheritage.org/donate
Project-URL: Source, https://forge.softwareheritage.org/source/swh-indexer
Project-URL: Documentation, https://docs.softwareheritage.org/devel/swh-indexer/
Description: swh-indexer
        ============
        
        Tools to compute multiple indexes on SWH's raw contents:
        - content:
          - mimetype
          - ctags
          - language
          - fossology-license
          - metadata
        - revision:
          - metadata
        
        An indexer is in charge of:
        - looking up objects
        - extracting information from those objects
        - storing that information in the swh-indexer db
        
        There are multiple indexers working on different object types:
          - content indexer: works with content sha1 hashes
          - revision indexer: works with revision sha1 hashes
          - origin indexer: works with origin identifiers
        
        Indexation procedure:
        - receive a batch of ids
        - retrieve the associated data, depending on the object type
        - compute an index for each object
        - store the result in swh's storage
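        
        The procedure above can be sketched as follows. This is a minimal illustration under stated assumptions, not swh-indexer's actual API: `index_batch`, `guess_encoding`, and the in-memory dicts standing in for the object storage and the index store are all hypothetical names.
        
        ```python
        import hashlib
        from typing import Callable, Dict, Iterable, List
        
        
        def guess_encoding(data: bytes) -> dict:
            """Toy index function (hypothetical): does the content decode as UTF-8?"""
            try:
                data.decode("utf-8")
                return {"encoding": "utf-8"}
            except UnicodeDecodeError:
                return {"encoding": "binary"}
        
        
        def index_batch(
            ids: Iterable[str],
            objects: Dict[str, bytes],
            results: Dict[str, dict],
            compute: Callable[[bytes], dict],
        ) -> List[str]:
            """Run the four steps for one batch and return the ids that were indexed."""
            indexed = []
            for sha1 in ids:                   # 1. receive a batch of ids
                data = objects.get(sha1)       # 2. retrieve the associated data
                if data is None:
                    continue                   # unknown id: nothing to index
                results[sha1] = compute(data)  # 3. compute the index, 4. store it
                indexed.append(sha1)
            return indexed
        
        
        # Hypothetical in-memory stand-ins for the object storage and the index store.
        raw = [b"print('hello')\n", b"\xff\xfe\x00binary"]
        objects = {hashlib.sha1(d).hexdigest(): d for d in raw}
        results: Dict[str, dict] = {}
        done = index_batch(list(objects) + ["0" * 40], objects, results, guess_encoding)
        ```
        
        In this sketch, ids with no associated data are skipped rather than reported as errors; that is a simplification chosen for brevity, not a statement about how the real indexers behave.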
        
        Current content indexers:
        
        - mimetype (queue swh_indexer_content_mimetype): detect the encoding
          and mimetype
        
        - language (queue swh_indexer_content_language): detect the
          programming language
        
        - ctags (queue swh_indexer_content_ctags): compute tags information
        
        - fossology-license (queue swh_indexer_fossology_license): compute the
          license
        
        - metadata: translate a file into a translated_metadata dict
        
        Current revision indexers:
        
        - metadata: detects files containing metadata, then either retrieves their
          translated_metadata from the content_metadata table in storage or runs the
          content indexer to translate the files.
        
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 5 - Production/Stable
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Provides-Extra: testing
