Metadata-Version: 1.0
Name: sempler
Version: 0.1.1
Summary: Sample from general structural causal models (SCMs)
Home-page: http://pypi.python.org/pypi/sempler/
Author: Juan L Gamella
Author-email: juangamella@gmail.com
License: LICENSE.txt
Description: Sempler: Sampling from general structural causal models (SCMs)
        ==============================================================
        
        Visit https://github.com/juangamella/sempler for the full DOCs.
        
        Disclaimer: This package is still at its infancy and the API could be
        subject to change. Use at your own risk, but also know that feedback is
        very welcome :)
        
        Two main classes are provided:
        
        -  ``sempler.ANM``: to define and sample from general additive noise
           SCMs. Any assignment function is possible, as are the noise
           distributions.
        -  ``sempler.LGANM``: to define and sample from a linear model with
           Gaussian additive noise (i.e. a Gaussian Bayesian network).
        
        Both classes define a ``sample`` function which generates samples from
        the SCM, in the observational setting or under interventions.
        
        Additionally, ``sempler.LGANM`` allows sampling "in the population
        setting", i.e. by returning a symbolic gaussian distribution,
        ``sempler.NormalDistribution``, defined by its mean and covariance,
        which allows for manipulation such as conditioning, marginalization and
        regression in the population setting.
        
        ANMs - General Additive Noise Models
        ------------------------------------
        
        The ANM class allows to define and sample from general additive noise
        models. Any assignment function is possible, as are the noise
        distributions.
        
        ANMs are defined by providing the following arguments:
        
        1. ``A`` (``np.array``): a connectivity matrix, representing the
           underlying DAG
        
        2. ``assignments`` (``list``): the functional assignments, i.e. a list
           with a function per variable in the SCM, which takes as many
           arguments as parents (incoming edges) of the variable and returns a
           single (numerical) value. For variables which are source nodes in the
           graph, ``None`` is used.
        
        3. ``noise_distributions`` (``list``): the noise distributions of each
           variable, i.e. a list with a function per variable which can be
           called with a single (int) parameter n and returns n samples. Any
           distribution is possible (even arbitrary deterministic ones); see
           sempler.noise for common ones (uniform, gaussian, laplace, ...).
        
        The parameters of the ANM follow this *functional* approach to give you
        maximum flexibility. For more standard, linear SCMs with gaussian noise,
        it is easier to use the LGANM class.
        
        Sampling
        ~~~~~~~~
        
        Samples are generated by calling the ``sample`` function, with
        parameters:
        
        -  ``n`` (``int``): the number of samples
        -  ``do\_interventions`` (``dict``, optional): a dictionary containing
           the distribution functions (see ``sempler.noise``) from which to
           generate samples for each intervened variable
        -  ``shift\_interventions`` (``dict``, optional): a dictionary
           containing the distribution functions (see ``sempler.noise``) from
           which to generate the noise which is added to each intervened
           variable
        -  ``random\_state`` (``int``, optional): seed for the random state
           generator
        
        An example: creating an ANM with standard Gaussian noise and linear and
        non-linear assignments, and sampling from it.
        
        .. code:: python
        
            import sempler
            import sempler.noise as noise
            import numpy as np
        
            # Connectivity matrix
            A = np.array([[0, 0, 0, 1, 0],
              [0, 0, 1, 0, 0],
              [0, 0, 0, 1, 0],
              [0, 0, 0, 0, 1],
              [0, 0, 0, 0, 0]])
        
            # Noise distributions (see sempler.noise)
            noise_distributions = [noise.normal(0,1)] * 5
        
            # Variable assignments
            functions = [None, None, np.sin, lambda x: np.exp(x[:,0]) + 2*x[:,1], lambda x: 2*x]
        
            # All together
            anm = sempler.ANM(A, functions, noise_distributions)
        
            # Sampling from the observational setting
            samples = anm.sample(100)
        
            # Sampling under a shift intervention on variable 1
            samples = anm.sample(100, shift_interventions = {1: noise.normal(0,1)})
        
        LGANMs - Linear Gaussian Additive Noise Models
        ----------------------------------------------
        
        The ``sempler.LGANM`` class defines linear models with Gaussian additive
        noise (i.e. a Gaussian Bayesian networks).
        
        LGANMs are defined by providing the following arguments:
        
        -  ``W`` (``np.array``): weighted connectivity matrix representing the
           DAG
        -  ``variances`` (``np.array`` or ``tuple``): the variances of the noise
           terms. Can be either a vector of variances or a tuple indicating a
           range for their uniform sampling.
        -  ``means`` (``np.array`` or ``tuple``, optional): the means of the
           noise terms. Either a vector of means or a tuple indicating the range
           for uniform sampling. If left unspecified all means are set to zero.
        
        Sampling
        ~~~~~~~~
        
        Sampling is again done by calling the ``sample`` function, with
        parameters:
        
        -  ``n`` (``int``, optinal): the number of samples. Ignored if
           ``population`` is True, defaults to 100.
        -  ``population`` (``bool``, optional): If set to True, parameter ``n``
           is ignored and ``sample`` returns a ``sempler.NormalDistribution``
           object, which is a symbolic gaussian distribution (see below).
        -  ``do\_interventions`` (``dict``, optional): Dictionary with keys
           being the targets of the interventions and values being either a
           number (the variable is deterministically set to this value) or a
           tuple with the mean and variance of the normal distribution from
           which to sample the variable.
        -  ``shift\_interventions`` (``dict``, optional): Dictionary with keys
           being the targets of the interventions and values being either a
           number (which is then added to the variable) or a tuple with the mean
           and variance of the normal distribution from which to sample added
           noise.
        
        An example: creating a LGANM with noise means and variances sampled
        uniformly from [0,1], and sampling from it.
        
        .. code:: python
        
            import sempler
            import numpy as np
        
            # Connectivity matrix
            W = np.array([[0, 0, 0, 0.1, 0],
                          [0, 0, 2.1, 0, 0],
                          [0, 0, 0, 3.2, 0],
                          [0, 0, 0, 0, 5.0],
                          [0, 0, 0, 0, 0]])
        
            # All together
            lganm = sempler.LGANM(W, (0,1), (0,1))
        
            # Sampling from the observational setting
            samples = lganm.sample(100)
        
            # Sampling under a shift intervention on variable 1 with standard gaussian noise
            samples = lganm.sample(100, shift_interventions = {1: (0,1)})
        
            # Sampling the observational environment in the "population setting"
            distribution = lganm.sample(population = True)
        
        Symbolic Normal Distribution
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        
        The ``sempler.NormalDistribution`` class allows for symbolic
        representation of a multivariate normal distribution, and is returned
        when calling ``LGANM.sample`` with ``population=True``.
        
        An example:
        
        .. code:: python
        
            import numpy as np
            import sempler
        
            # Define by mean and covariance
            mean = np.array([1,2,3])
            covariance = np.array([[1, 2, 4], [2, 6, 5], [4, 5, 1]])
            distribution = sempler.NormalDistribution(mean, covariance)
        
            # Marginal distribution of X0 and X1 (also a NormalDistribution object)
            marginal = distribution.marginal([0, 1])
        
            # Conditional distribution of X2 on X1=1 (also a NormalDistribution object)
            conditional = distribution.conditional(2,1,1)
        
            # Regress X0 on X1 and X2 in the population setting (no estimation errors)
            (coefs, intercept) = distribution.regress(0, [1,2])
        
        
Platform: UNKNOWN
