40 Matching Annotations
  1. Mar 2023
    1. asks for the Minecraft domain.

      They demonstrate the model on a "minecraft-like" domain (introduced earlier by someone else) where there are resources in the world and the agent has tasks.

  2. Feb 2023
    1. Definition 3.2 (simple reward machine).

      The MDP does not change; its dynamics are the same with or without the RM, just as they are with or without a standard reward model. Additionally, the rewards from the RM can be non-Markovian with respect to the MDP because they inherently carry a kind of memory of where you've been, limited to the agent's "movement" (almost "in its mind") along the goals for this task.

    2. We then show that an RM can be interpreted as specifying a single reward function over a larger state space, and consider types of reward functions that can be expressed using RMs

      So by specifying a reward machine you are augmenting the state space of the MDP with higher level goals/subgoals/concepts that provide structure about what is good and what isn't.
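      The idea above can be sketched in code. This is a minimal illustration of a simple reward machine, not the paper's implementation: a finite-state machine whose transitions fire on high-level events (assumed to be emitted by some labeling function over the environment), producing rewards that are non-Markovian in the MDP state alone but Markovian in the augmented pair (s, u). All names here are hypothetical.

      ```python
      class SimpleRewardMachine:
          """Finite-state machine over high-level events, yielding rewards."""

          def __init__(self, transitions, u0, terminal):
              # transitions: {(u, event): (u_next, reward)}
              self.transitions = transitions
              self.u0 = u0
              self.terminal = terminal
              self.u = u0

          def reset(self):
              self.u = self.u0

          def step(self, event):
              """Advance the machine on an observed event; return the reward."""
              # Unmatched events leave the machine state unchanged, reward 0.
              self.u, reward = self.transitions.get((self.u, event), (self.u, 0.0))
              return reward

          def done(self):
              return self.u in self.terminal


      # Example task in the spirit of the Minecraft-like domain:
      # "get wood, then go to the workbench".
      rm = SimpleRewardMachine(
          transitions={
              ("u0", "got_wood"): ("u1", 0.0),
              ("u1", "at_workbench"): ("u_done", 1.0),
          },
          u0="u0",
          terminal={"u_done"},
      )

      rm.step("at_workbench")  # reward 0.0: visiting the workbench first earns nothing
      rm.step("got_wood")      # reward 0.0: machine advances to u1
      rm.step("at_workbench")  # reward 1.0: task complete
      ```

      The agent would learn over the product state (s, u), which is exactly the "larger state space" the quoted passage describes.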

    3. However, an agent that had access to the specification of the reward function might be able to use such information to learn optimal policies faster.

      Fascinating idea. Why not? Why are we really hiding the reward function from the agent?

    4. Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning

      [Icarte, JAIR, 2022] "Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning"

    1. Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning

      [Icarte, PMLR, 2018] "Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning"

  3. Dec 2022
    1. Liang, Machado, Talvitie, Bowling - AAMAS 2016 "State of the Art Control of Atari Games Using Shallow Reinforcement Learning"

      A great paper showing how to think differently about the latest advances in Deep RL. All is not always what it seems!

    1. "Decision Transformer: Reinforcement Learning via Sequence Modeling" (Chen, NeurIPS, 2021)

      Quickly became a very influential paper, with a new idea of how to learn generative models of action prediction using SARSA-style training on demonstration trajectories. There is no optimization of actions or rewards; instead, the target reward is an input.
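      One concrete piece of the setup can be sketched: Decision Transformer conditions each timestep on the return-to-go (the undiscounted sum of future rewards), feeding triples of (return-to-go, state, action) tokens to a causal transformer. The helper below computes returns-to-go for an offline trajectory; the function name is my own, not the paper's code.

      ```python
      def returns_to_go(rewards):
          """Suffix sums: R_t = r_t + r_{t+1} + ... (undiscounted, as in the paper)."""
          rtg, total = [], 0.0
          for r in reversed(rewards):
              total += r
              rtg.append(total)
          return list(reversed(rtg))

      # Offline trajectory rewards -> per-timestep conditioning targets.
      print(returns_to_go([1.0, 0.0, 2.0]))  # [3.0, 2.0, 2.0]
      ```

      At evaluation time the first return-to-go token is set to a desired target return, and it is decremented by each observed reward, which is the sense in which "target reward is an input".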

  4. Sep 2022
    1. We study whether sequence modeling can perform policy optimization by evaluating Decision Transformer on offline RL benchmarks
    1. AAAI 2022 paper: "Decentralized Mean Field Games". Happy to discuss online.

      S. Ganapathi Subramanian, M. Taylor, M. Crowley, and P. Poupart, "Decentralized mean field games," in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-2022), vol. 36, pp. 9439–9447, February 2022.

  5. Jul 2022
  6. Jun 2022
  7. May 2022
    1. Question: What happened to Eligibility Traces in the Deep RL era? This paper highlights some of the reasons they are not used widely and proposes a way they could still be effective.
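      For reference, this is the mechanism the question is about: a tabular TD(λ) sketch with accumulating eligibility traces, in the standard textbook form (not this paper's proposal). The trace lets a single TD error assign credit backwards along the recently visited states.

      ```python
      def td_lambda_update(V, trajectory, alpha=0.1, gamma=0.9, lam=0.8):
          """One episode of tabular TD(lambda) given (s, r, s_next) steps."""
          e = {s: 0.0 for s in V}  # eligibility trace per state
          for s, r, s_next in trajectory:
              delta = r + gamma * V[s_next] - V[s]  # TD error at this step
              e[s] += 1.0                           # accumulating trace
              for x in V:
                  V[x] += alpha * delta * e[x]      # credit all eligible states
                  e[x] *= gamma * lam               # decay traces
          return V

      # Tiny chain A -> B -> T with reward 1.0 on the final step.
      V = {"A": 0.0, "B": 0.0, "T": 0.0}
      V = td_lambda_update(V, [("A", 0.0, "B"), ("B", 1.0, "T")])
      # The final reward's TD error also updates V["A"] through its decayed trace.
      ```

      The per-state trace dictionary is exactly what becomes awkward with deep function approximation, where one trace per parameter interacts poorly with minibatched, off-policy replay training.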

  8. Mar 2022
  9. Jan 2022
  10. Jul 2021
  11. Jun 2021
  12. Mar 2021
    1. Using chemicals to improve our economy of attention and become emotionally "fitter" is an option that penetrated public consciousness some time ago.

      Same is true of reinforcement learning algorithms.

  13. Sep 2020
  14. Jul 2020
  15. May 2020
  16. Apr 2020
  17. Mar 2019
  18. Feb 2019
    1. We present MILABOT: a deep reinforcement learning chatbot developed by the Montreal Institute for Learning Algorithms (MILA) for the Amazon Alexa Prize competition. MILABOT is capable of conversing with humans on popular small talk topics through both speech and text. The system consists of an ensemble of natural language generation and retrieval models, including template-based models, bag-of-words models, sequence-to-sequence neural network and latent variable neural network models. By applying reinforcement learning to crowdsourced data and real-world user interactions, the system has been trained to select an appropriate response from the models in its ensemble. The system has been evaluated through A/B testing with real-world users, where it performed significantly better than many competing systems. Due to its machine learning architecture, the system is likely to improve with additional data.
  19. Jul 2016
    1. Think of all the hard work and the sweat you put in to the things that you're proudest of.

      Always feels good to say, "I worked out today!"