Metadata-Version: 1.0
Name: deep_rl
Version: 0.3.6
Summary: UNKNOWN
Home-page: UNKNOWN
Author: Jonáš Kulhánek
Author-email: jonas.kulhanek@live.com
License: MIT License
Description: # Deep RL PyTorch
        [![https://www.singularity-hub.org/static/img/hosted-singularity--hub-%23e32929.svg](https://www.singularity-hub.org/static/img/hosted-singularity--hub-%23e32929.svg)](https://singularity-hub.org/collections/2581)
        
        This repo contains implementations of popular Deep RL algorithms. It also provides a unified interface for training and evaluation, with unified model saving and visualization. It can serve as a good starting point when implementing a new RL algorithm in PyTorch.
        
        ## Getting started
        If you want to base your algorithm on this repository, start by installing it as a package:
        ```
        pip install git+https://github.com/jkulhanek/deep-rl-pytorch.git
        ```
        
        If you want to run the attached experiments yourself, feel free to clone this repository:
        ```
        git clone https://github.com/jkulhanek/deep-rl-pytorch.git
        ```
        
        All dependencies are prepared in a Docker container. If you have nvidia-docker enabled, you can use this image. To pull and start the image, run:
        
        ```
        docker run --runtime=nvidia --net=host -it kulhanek/deep-rl-pytorch:latest bash
        ```
        
        From there, you can either clone your own repository containing your experiments or clone this one.
        
        ## Concepts
        All algorithms are implemented as base classes. In your experiment, you need to subclass those base classes. The `deep_rl.core.AbstractTrainer` class is the base of all trainers, and all algorithms inherit from it. Each trainer can be wrapped in several wrappers (classes extending `deep_rl.core.AbstractWrapper`), which take care of saving, logging, terminating the experiment, etc. All experiments should be registered using the `@deep_rl.register_trainer` decorator, which then wraps the trainer with the default wrappers; this behavior can be controlled by passing arguments to the decorator. Any registered trainer (experiment) can be run by calling `deep_rl.make_trainer(<name>).run()`, as in the sketch below.
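        
        A minimal sketch of this workflow; the experiment name and the bare decorator form are illustrative assumptions, not signatures verified against this repository:
        
        ```
        import deep_rl
        from deep_rl.a2c import A2CTrainer
        
        @deep_rl.register_trainer  # wraps the trainer with the default wrappers
        class MyExperiment(A2CTrainer):
            # Experiment-specific configuration (model, environment,
            # hyperparameters) would go here.
            pass
        
        if __name__ == '__main__':
            # 'my-experiment' is a placeholder for the registered name.
            deep_rl.make_trainer('my-experiment').run()
        ```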
        
        ## Implemented algorithms
        ### A2C
        A2C is a synchronous, deterministic variant of Asynchronous Advantage Actor Critic (A3C) [2] which, according to OpenAI [1], gives equal performance. It is, however, more efficient in its GPU utilization.
        
        Start your experiment by subclassing `deep_rl.a2c.A2CTrainer`.
        Several models are included in `deep_rl.a2c.model`. You may want to use at least some of the helper modules contained in this package when designing your own experiment.
        
        In most of the models, weights are initialized according to [3], i.e. with orthogonal initialization, as sketched below.
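        
        Since PyTorch ships an orthogonal initializer, a minimal self-contained sketch of this scheme might look as follows; the gain value and the toy model are assumptions for illustration, not this repository's models:
        
        ```
        import torch.nn as nn
        
        def init_orthogonal(module, gain=1.0):
            # Initialize weight matrices orthogonally [3] and biases to zero.
            if isinstance(module, (nn.Linear, nn.Conv2d)):
                nn.init.orthogonal_(module.weight, gain=gain)
                if module.bias is not None:
                    nn.init.zeros_(module.bias)
        
        model = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
        model.apply(init_orthogonal)  # applies the function to every submodule
        ```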
        
        ### Asynchronous Advantage Actor Critic (A3C) [2]
        This implementation uses multiprocessing. It comes with two optimizers: RMSprop and Adam. A conceptual sketch of the worker pattern is shown below.
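        
        The sketch below illustrates the general A3C actor-learner pattern with `torch.multiprocessing` (a shared model and per-worker rollouts); it shows the idea only and is not this repository's implementation:
        
        ```
        import torch.multiprocessing as mp
        import torch.nn as nn
        import torch.optim as optim
        
        def worker(shared_model):
            # Each worker keeps a local copy that it periodically syncs
            # with the shared parameters; rollout collection and gradient
            # updates would happen here.
            local_model = nn.Linear(4, 2)
            local_model.load_state_dict(shared_model.state_dict())
        
        if __name__ == '__main__':
            shared_model = nn.Linear(4, 2)
            shared_model.share_memory()  # expose parameters to all workers
            # The implementation offers RMSprop and Adam; the learning rate
            # below is an assumption.
            optimizer = optim.RMSprop(shared_model.parameters(), lr=7e-4)
            processes = [mp.Process(target=worker, args=(shared_model,)) for _ in range(4)]
            for p in processes:
                p.start()
            for p in processes:
                p.join()
        ```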
        
        ### Actor Critic using Kronecker-Factored Trust Region (ACKTR) [1]
        This is an improvement of A2C that uses Kronecker-factored approximate curvature for trust-region updates, as described in [1].
        
        ## Experiments
        > Coming soon
        
        ## Requirements
        These packages must be installed before using the framework for your own algorithm:
        - OpenAI baselines (can be installed by running `pip install git+https://github.com/openai/baselines.git`)
        - PyTorch
        - Visdom (`pip install visdom`)
        - Gym (`pip install gym`)
        - Matplotlib (`pip install matplotlib`)
        
        These packages must be installed prior to running the attached experiments:
        - DeepMind Lab
        - Gym[atari]
        
        ## Sources
        This repository is based on the work of several other authors, to whom we would like to express our thanks.
        - https://github.com/openai/baselines/tree/master/baselines
        - https://github.com/ikostrikov/pytorch-a2c-ppo-acktr/tree/master/a2c_ppo_acktr
        - https://github.com/miyosuda/unreal
        - https://github.com/openai/gym
        
        ## References
        [1] Wu, Y., Mansimov, E., Grosse, R.B., Liao, S. and Ba, J., 2017. Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation. In Advances in Neural Information Processing Systems (pp. 5279-5288).
        
        [2] Mnih, V., Badia, A.P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D. and Kavukcuoglu, K., 2016. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning (pp. 1928-1937).
        
        [3] Saxe, A.M., McClelland, J.L. and Ganguli, S., 2013. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. arXiv preprint arXiv:1312.6120.
        
        
Platform: UNKNOWN
