Metadata-Version: 2.1
Name: anda
Version: 0.0.6
Summary: A package collecting various functions to work with ancient Mediterranean datasets (textual, spatial, etc.)
Home-page: https://github.com/sdam-au/anda
Author: Vojtech Kase
Author-email: vojtech.kase@gmail.com
License: UNKNOWN
Description: # anda
        
        
        ```bash
        pip install anda
        ```
        
        
        
        This is a Python package for collecting, manipulating, and visualizing various ancient Mediterranean data. It focuses on their temporal, textual, and spatial aspects. It is structured into several gradually evolving submodules, namely `gr`, `imda`, `concs`, and `textnet`.
        
        ## anda.gr
        
        ```python
        from anda import gr
        ```
        
        This module is dedicated to preprocessing ancient Greek textual data. It contains functions for lemmatization, POS tagging, and translation. It relies heavily on the Morpheus dictionary.
        
        ### Lemmatization
        
        The minimal use case is lemmatizing an individual word. You can ask either for the first lemma only (`return_first_lemma()`) or for all possibilities (`return_all_unique_lemmata()`). In most cases, the outcome is the same:
        
        ```python
        gr.return_first_lemma("ἐπιστήμην")
        > 'ἐπιστήμη'
        
        gr.return_all_unique_lemmata("ἐπιστήμην")
        > 'ἐπιστήμη'
        ```
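        
        The two functions only differ for ambiguous word forms. The sketch below illustrates the idea in plain Python; it is not the package's code. It assumes, hypothetically, that each word form maps to a list of analysis dictionaries whose `"l"` key holds the lemma (as in the `morphological_analysis()` output further below), and that multiple lemmata are joined with `/` (an assumption based on outputs such as `'ἄνω/ἀνίημι'`). The toy dictionary and the helper names `first_lemma` and `unique_lemmata` are made up for the example.
        
        ```python
        # Hypothetical toy dictionary: word form -> list of analyses.
        # The entries are illustrative only, not real Morpheus data.
        toy_dict = {
            "ἐπιστήμην": [{"l": "ἐπιστήμη"}],
            "form_x": [{"l": "lemma_a"}, {"l": "lemma_b"}, {"l": "lemma_a"}],
        }
        
        
        def first_lemma(word, dictionary):
            """Return the lemma of the first analysis, or the word itself if unknown."""
            analyses = dictionary.get(word)
            return analyses[0]["l"] if analyses else word
        
        
        def unique_lemmata(word, dictionary):
            """Return all distinct lemmata in first-seen order, joined by '/' (assumed format)."""
            analyses = dictionary.get(word)
            if not analyses:
                return word
            seen = dict.fromkeys(a["l"] for a in analyses)  # dict keeps insertion order
            return "/".join(seen)
        
        
        print(first_lemma("ἐπιστήμην", toy_dict))     # ἐπιστήμη
        print(unique_lemmata("ἐπιστήμην", toy_dict))  # ἐπιστήμη
        print(first_lemma("form_x", toy_dict))        # lemma_a
        print(unique_lemmata("form_x", toy_dict))     # lemma_a/lemma_b
        ```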
        
        Built on top of these are the functions `lemmatize_string()` and `get_lemmatized_sentences()`. Both accept a string of any length. The first returns a flat list of lemmata; the second returns a list of lemmatized sentences, i.e. one list of lemmata per sentence.
        
        ```python
        string = "Πρότασις μὲν οὖν ἐστὶ λόγος καταφατικὸς ἢ ἀποφατικὸς τινὸς κατά τινος. Οὗτος δὲ ἢ καθόλου ἢ ἐν μέρει ἢ ἀδιόριστος. Λέγω δὲ καθόλου μὲν τὸ παντὶ ἢ μηδενὶ ὑπάρχειν, ἐν μέρει δὲ τὸ τινὶ ἢ μὴ τινὶ ἢ μὴ παντὶ ὑπάρχειν, ἀδιόριστον δὲ τὸ ὑπάρχειν ἢ μὴ ὑπάρχειν ἄνευ τοῦ καθόλου, ἢ κατὰ μέρος, οἷον τὸ τῶν ἐναντίων εἶναι τὴν αὐτὴν ἐπιστήμην ἢ τὸ τὴν ἡδονὴν μὴ εἶναι ἀγαθόν."
        
        gr.lemmatize_string(string)
        > ['πρότασις', 'λόγος', 'καταφατικός', 'ἀποφατικός', 'καθόλου', 'μέρος', 'ἀδιόριστος', 'λέγω', 'καθόλου', 'πᾶς', 'μηδείς', 'ὑπάρχω', 'μέρος', 'πᾶς', 'ὑπάρχω', 'ἀδιόριστον', 'ὑπάρχω', 'ὑπάρχω', 'ἄνευ', 'καθόλου', 'μέρος', 'οἷος', 'ἐναντίος', 'αὐτην', 'ἐπιστήμη', 'ἡδονην', 'ἀγαθός']
        
        gr.get_lemmatized_sentences(string)
        > [['πρότασις', 'λόγος', 'καταφατικός', 'ἀποφατικός'], ['καθόλου', 'μέρος', 'ἀδιόριστος'], ['λέγω', 'καθόλου', 'πᾶς', 'μηδείς', 'ὑπάρχω', 'μέρος', 'πᾶς', 'ὑπάρχω', 'ἀδιόριστον', 'ὑπάρχω', 'ὑπάρχω', 'ἄνευ', 'καθόλου', 'μέρος', 'οἷος', 'ἐναντίος', 'αὐτην', 'ἐπιστήμη', 'ἡδονην', 'ἀγαθός']]
        ```
        
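        All lemmatization functions can be further parametrized by several arguments:
        
        * `all_lemmata` (default `False`): if `True`, all unique lemmata of each word are returned (via `return_all_unique_lemmata()`) instead of only the first one
        * `filter_by_postag` (default `None`): e.g. `["n","a","v"]` returns only nouns ("n"), adjectives ("a") and verbs ("v")
        * `involve_unknown`: if `False`, only words found in the dictionary are returned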
        
        Thus, you can run:
        
        ```python
        lemmatized_sentences = gr.get_lemmatized_sentences(string, all_lemmata=False, filter_by_postag=["n","a","v"], involve_unknown=False)
        print(lemmatized_sentences)
        > [['λόγος'], ['μέρος'], ['πᾶς', 'μηδείς', 'ὑπάρχω', 'μέρος', 'πᾶς', 'ὑπάρχω', 'ὑπάρχω', 'ὑπάρχω', 'ἄνω/ἀνίημι', 'μέρος', 'οἷος', 'ἐναντίος', 'ἐπιστήμη', 'ἀγαθός']]
        ```
        
        
        
        `get_lemmatized_sentences(string, all_lemmata=False, filter_by_postag=None, involve_unknown=False)` receives raw Greek text of any kind and length as its input. The input is processed by a series of nested functions, each of which can also be used independently:
        
        (1) `get_sentences()` splits the string into sentences on common sentence separators.
        
        (2) `lemmatize_string(sentence)` first calls `tokenize_string()`, which performs basic cleaning and stopword filtering on the sentence and returns a list of words. Each word of the tokenized sentence is then passed either to `return_first_lemma()` or to `return_all_unique_lemmata()`, depending on the value of the parameter `all_lemmata=` (`False` by default).
        
        (3) `return_all_unique_lemmata()` looks the word up in `morpheus_dict` and returns all unique lemmata found among its values.
        
        (4) The parameter `filter_by_postag=` (default `None`) makes it possible to keep only selected word types, based on the first character of the part-of-speech tag stored under the key `"p"`. Thus, to keep only nouns, adjectives, and verbs, set `filter_by_postag=["n", "a", "v"]`.
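        
        The nested pipeline described above can be sketched in plain Python. This is a simplified, self-contained illustration, not the package's implementation: `split_sentences`, `tokenize`, `lemmatize` and `lemmatize_sentences` are hypothetical stand-ins for `get_sentences()`, `tokenize_string()`, `lemmatize_string()` and `get_lemmatized_sentences()`, and the toy dictionary, stopword list, tag values and splitting regex are all assumptions. It always picks the first lemma (i.e. `all_lemmata=False`); only the filtering on the first character of the `"p"` tag follows the description above directly.
        
        ```python
        import re
        import unicodedata
        
        
        def nfc(text):
            # Normalize to precomposed Unicode so accented forms compare reliably.
            return unicodedata.normalize("NFC", text)
        
        
        # Hypothetical toy stand-ins, NOT the package's real data:
        # word form -> list of analyses with lemma ("l") and POS tag ("p").
        TOY_DICT = {
            nfc("λόγος"): [{"l": "λόγος", "p": "n-s---mn-"}],
            nfc("οὖν"): [{"l": "οὖν", "p": "g--------"}],
            nfc("ἐστί"): [{"l": "εἰμί", "p": "v3spia---"}],
        }
        STOPWORDS = {nfc("καί"), nfc("δέ")}
        
        
        def split_sentences(text):
            # Split on ".", ";" and the raised dot (U+00B7, the NFC form of the ano teleia).
            parts = re.split(r"[.;\u00b7]", nfc(text))
            return [p.strip() for p in parts if p.strip()]
        
        
        def tokenize(sentence):
            # Basic cleaning: lowercase, keep word characters, drop stopwords.
            words = re.findall(r"\w+", nfc(sentence).lower())
            return [w for w in words if w not in STOPWORDS]
        
        
        def lemmatize(sentence, filter_by_postag=None, involve_unknown=False):
            lemmata = []
            for word in tokenize(sentence):
                analyses = TOY_DICT.get(word)
                if analyses is None:
                    # Unknown word: keep the raw form only if requested.
                    if involve_unknown:
                        lemmata.append(word)
                    continue
                if filter_by_postag is not None:
                    # Keep analyses whose tag starts with one of the requested POS letters.
                    analyses = [a for a in analyses if a["p"][0] in filter_by_postag]
                if analyses:
                    lemmata.append(analyses[0]["l"])  # first lemma only
            return lemmata
        
        
        def lemmatize_sentences(text, filter_by_postag=None, involve_unknown=False):
            return [lemmatize(s, filter_by_postag, involve_unknown)
                    for s in split_sentences(text)]
        
        
        text = "Λόγος οὖν ἐστί. Καί ἄγνωστον λόγος."
        print(lemmatize_sentences(text))
        # [['λόγος', 'οὖν', 'εἰμί'], ['λόγος']]
        print(lemmatize_sentences(text, filter_by_postag=["n", "a", "v"]))
        # [['λόγος', 'εἰμί'], ['λόγος']]
        ```
        
        Normalizing to NFC first keeps lookups stable when the same accented letter is encoded in different ways, which is common in digitized Greek texts.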
        
        ### Translation
        
        Besides lemmatization, there is also a series of translation functions, such as `return_all_unique_translations(word, filter_by_postag=None, involve_unknown=False)`, which works with any word form, and `lemma_translator(word)`, for cases where we already have a lemma.
        
        ```python
        gr.return_all_unique_translations("ὑπάρχειν", filter_by_postag=None, involve_unknown=False)
        > 'to begin, make a beginning'
        
        gr.lemma_translator("λόγος")
        > 'the word'
        ```
        
        ### Morphological analysis
        
        You can also run a morphological analysis of a string (here the `string` defined above). It returns a list of dictionaries, one per word:
        
        ```python
        gr.morphological_analysis(string)[1:4]
        > [{'i': '564347',
          'f': 'μέν',
          'b': 'μεν',
          'l': 'μέν',
          'e': 'μεν',
          'p': 'g--------',
          'd': '20753',
          's': 'on the one hand, on the other hand',
          'a': None},
         {'i': '642363',
          'f': 'οὖν',
          'b': 'ουν',
          'l': 'οὖν',
          'e': 'ουν',
          'p': 'g--------',
          'd': '23870',
          's': 'really, at all events',
          'a': None},
         {'i': '264221',
          'f': 'ἐστί',
          'b': 'εστι',
          'l': 'εἰμί',
          'e': 'ειμι',
          'p': 'v3spia---',
          'd': '9722',
          's': 'I have',
          'a': None}]
        ```
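        
        The returned dictionaries can be post-processed with plain Python. Judging from the output above, `"f"` holds the word form, `"l"` the lemma and `"p"` a positional part-of-speech tag; the sketch below assumes that reading, and also assumes (following the Perseus/AGDT tagset) that `"g"` in the first position marks a particle. It turns the three analyses shown above into (form, lemma, part of speech) rows and counts the parts of speech. The `POS_NAMES` mapping is a hypothetical, partial helper, not part of `anda`.
        
        ```python
        from collections import Counter
        
        # The three analyses shown above, trimmed to the form, lemma and tag keys.
        analyses = [
            {"f": "μέν", "l": "μέν", "p": "g--------"},
            {"f": "οὖν", "l": "οὖν", "p": "g--------"},
            {"f": "ἐστί", "l": "εἰμί", "p": "v3spia---"},
        ]
        
        # Partial, assumed mapping of the tag's first character to a part of speech.
        POS_NAMES = {"n": "noun", "a": "adjective", "v": "verb", "g": "particle"}
        
        rows = [(a["f"], a["l"], POS_NAMES.get(a["p"][0], "other")) for a in analyses]
        for form, lemma, pos in rows:
            print(form, lemma, pos, sep="\t")
        
        pos_counts = Counter(pos for _, _, pos in rows)
        print(pos_counts)  # Counter({'particle': 2, 'verb': 1})
        ```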
        
        ## imda
        
        This module is intended for importing various ancient Mediterranean resources. Most of them will be imported directly from open third-party online resources, while some have been preprocessed as part of the SDAM project.
        
        Ideally, it will work like this:
        
        ```python
        imda.list_datasets()
        > ['roman_provinces_117', 'EDH', 'roman_cities_hanson', 'orbis_network']
        ```
        
        And:
        
        ```python
        rp = imda.import_dataset("roman_provinces_117", "gdf")
        type(rp)
        > <class 'geopandas.geodataframe.GeoDataFrame'>
        ```
        
        
        
        ## concs
        
        This module contains functions for working
        
        ## textnet
        
        This module contains functions for generating, analyzing and visualizing word co-occurrence networks. It has been designed especially for working with textual data in ancient Greek. 
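        
        To illustrate what a word co-occurrence network is, the following sketch counts, for every unordered pair of lemmata, the number of sentences in which both occur. It is a minimal plain-Python example, not `textnet`'s actual implementation; using the sentence as the co-occurrence window and ignoring repeats within a sentence are assumptions. The input is the filtered `get_lemmatized_sentences()` output shown earlier, and the helper names `cooccurrence_edges` and `weight` are hypothetical.
        
        ```python
        from collections import Counter
        from itertools import combinations
        
        # Filtered output of get_lemmatized_sentences(...) shown earlier.
        sentences = [
            ["λόγος"],
            ["μέρος"],
            ["πᾶς", "μηδείς", "ὑπάρχω", "μέρος", "πᾶς", "ὑπάρχω", "ὑπάρχω", "ὑπάρχω",
             "ἄνω/ἀνίημι", "μέρος", "οἷος", "ἐναντίος", "ἐπιστήμη", "ἀγαθός"],
        ]
        
        
        def cooccurrence_edges(sentences):
            """Count, for each unordered lemma pair, the sentences containing both."""
            edges = Counter()
            for sentence in sentences:
                unique = sorted(set(sentence))  # ignore repeats within a sentence
                for pair in combinations(unique, 2):  # pairs come out in sorted order
                    edges[pair] += 1
            return edges
        
        
        def weight(edges, a, b):
            # Look up an edge regardless of the order of its endpoints (0 if absent).
            return edges[tuple(sorted((a, b)))]
        
        
        edges = cooccurrence_edges(sentences)
        print(len(edges))                        # 36
        print(weight(edges, "πᾶς", "μέρος"))     # 1
        ```
        
        The resulting weighted edge list could then be loaded into a graph library such as NetworkX for analysis and visualization.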
        
        ## Version history
        
        * 0.0.5 - Greek dictionaries included within the package
        * 0.0.5 - experimenting with data inclusion
        * 0.0.4 - docs
        
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
Classifier: Operating System :: OS Independent
Requires-Python: >=3.4
Description-Content-Type: text/markdown
