- Mar 2023
-
arxiv.org
-
asks for the Minecraft domain.
They demonstrate the model on a "Minecraft-like" domain (introduced earlier by other authors) where there are resources in the world and the agent has tasks to complete.
-
- Feb 2023
-
arxiv.org
-
Definition 3.2 (simple reward machine).
The MDP does not change; its dynamics are the same with or without the RM, just as they are with or without a standard reward model. Additionally, the rewards from the RM can be non-Markovian with respect to the MDP because the machine inherently has a kind of memory of where you've been, limited to the agent's "movement" (almost "in its mind") along the goals for this task.
-
We then show that an RM can be interpreted as specifying a single reward function over a larger state space, and consider types of reward functions that can be expressed using RMs
So by specifying a reward machine you are augmenting the state space of the MDP with higher level goals/subgoals/concepts that provide structure about what is good and what isn't.
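This idea can be sketched as a tiny state machine sitting beside the MDP. A minimal sketch, assuming a hypothetical two-subgoal task ("get wood, then make a plank"); the state names and events here are invented for illustration, not from the paper:

```python
# A minimal sketch of a simple reward machine: a finite-state machine whose
# states track progress through subgoals, separate from the MDP's own state.
class SimpleRewardMachine:
    def __init__(self, transitions, initial_state):
        # transitions: {(rm_state, event): (next_rm_state, reward)}
        self.transitions = transitions
        self.state = initial_state

    def step(self, event):
        """Advance the machine on a high-level event; return the reward.

        Events the machine does not care about leave it in place with zero
        reward. The reward is non-Markovian in the MDP state alone: it
        depends on where the machine "is" along the task.
        """
        if (self.state, event) in self.transitions:
            self.state, reward = self.transitions[(self.state, event)]
            return reward
        return 0.0

# Hypothetical task: reward only when wood is collected first, then a plank
# is made -- making a plank before collecting wood earns nothing.
rm = SimpleRewardMachine(
    transitions={
        ("u0", "got_wood"): ("u1", 0.0),
        ("u1", "made_plank"): ("u2", 1.0),
    },
    initial_state="u0",
)
```

The product of the RM state and the MDP state is exactly the "larger state space" over which the reward becomes an ordinary Markovian reward function.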
-
However, an agent that had access to the specification of the reward function might be able to use such information to learn optimal policies faster.
A fascinating idea, and why not? Why do we really hide the reward function from the agent?
-
Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning
[Icarte, JAIR, 2022] "Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning"
-
-
proceedings.mlr.press
-
Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning
[Icarte, PMLR, 2018] "Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning"
-
-
arxiv.org
-
[Kapturowski, DeepMind, Sep 2022] "Human-level Atari 200x Faster"
Improving on the 2020 Agent57 results to make the agent much more efficient.
-
- Dec 2022
-
arxiv.org
-
Lee et al. - NeurIPS 2022 "Multi-Game Decision Transformers"
-
-
www.fandm.edu
-
Liang, Machado, Talvitie, Bowling - AAMAS 2016 "State of the Art Control of Atari Games Using Shallow Reinforcement Learning"
A great paper showing how to think differently about the latest advances in deep RL. All is not always as it seems!
-
-
arxiv.org
-
[Neumann, Gros, NeurIPS, 2022] - "Scaling Laws for a Multi-Agent Reinforcement Learning Model"
-
-
arxiv.org
-
"Decision Transformer: Reinforcement Learning via Sequence Modeling" (Chen, NeurIPS, 2021)
Quickly became a very influential paper, with a new idea for learning generative models of action prediction by training on trajectory sequences of returns, states, and actions from demonstrations. There is no explicit optimization of actions or rewards; instead, the target return is provided as an input.
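The framing above can be sketched by showing how a trajectory becomes a token sequence. A minimal sketch with a hypothetical three-step trajectory; the helper names (`returns_to_go`, `to_sequence`) are invented for illustration, and a real model would embed these tokens and predict actions autoregressively with a causal transformer:

```python
# Sketch of Decision Transformer's input construction: each timestep
# contributes (return-to-go, state, action) tokens.
def returns_to_go(rewards):
    """Suffix sums of the reward sequence: R_t = r_t + r_{t+1} + ... + r_T."""
    rtg, running = [], 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    return list(reversed(rtg))

def to_sequence(states, actions, rewards):
    """Interleave (return-to-go, state, action) tokens per timestep."""
    rtg = returns_to_go(rewards)
    seq = []
    for R, s, a in zip(rtg, states, actions):
        seq.extend([("rtg", R), ("state", s), ("action", a)])
    return seq

# Hypothetical 3-step trajectory with a single terminal reward of 1.
seq = to_sequence(states=[0, 1, 2], actions=["a", "b", "c"], rewards=[0.0, 0.0, 1.0])
```

At inference time, the desired target return is simply written into the first return-to-go slot, which is what makes the target reward "an input" rather than an optimization objective.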
-
- Sep 2022
-
arxiv.org
-
We study whether sequence modeling can perform policy optimization by evaluating Decision Transformer on offline RL benchmarks
-
-
arxiv.org
-
AAAI 2022 paper: "Decentralized Mean Field Games". Happy to discuss online.
S. Ganapathi Subramanian, M. Taylor, M. Crowley, and P. Poupart, "Decentralized mean field games," in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-2022), vol. 36, pp. 9439–9447, February 2022.
-
- Jul 2022
-
ieeexplore.ieee.org
-
A recent overview of RL methods used for autonomous driving.
-
- Jun 2022
-
assets.pubpub.org
-
Discussion of:
Bellinger C, Drozdyuk A, Crowley M, Tamblyn I. Balancing Information with Observation Costs in Deep Reinforcement Learning. Proceedings of the Canadian Conference on Artificial Intelligence [Internet]. 2022 May 27; Available from: https://caiac.pubpub.org/pub/0jmy7gpd
-
- May 2022
-
www.ncbi.nlm.nih.gov
-
Another piece to the "what can we do with eligibility traces" puzzle for Deep RL.
-
-
arxiv.org
-
Question: What happened to Eligibility Traces in the Deep RL era? This paper highlights some of the reasons they are not used widely and proposes a way they could still be effective.
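For context, the classical mechanism in question can be sketched in a few lines. A minimal tabular TD(lambda) sketch with accumulating eligibility traces; the tiny chain MDP and the function name are hypothetical, invented for illustration:

```python
# Tabular TD(lambda) with accumulating eligibility traces: the trace vector
# e spreads each TD error backward to recently visited states.
def td_lambda(episodes, n_states, alpha=0.1, gamma=0.9, lam=0.8):
    """Estimate V from episodes of (state, reward, next_state) transitions.

    next_state is None for a terminal transition.
    """
    V = [0.0] * n_states
    for episode in episodes:
        e = [0.0] * n_states          # eligibility trace per state
        for s, r, s_next in episode:
            target = r + gamma * (V[s_next] if s_next is not None else 0.0)
            delta = target - V[s]     # one-step TD error
            e[s] += 1.0               # accumulating trace on the visited state
            for i in range(n_states):
                V[i] += alpha * delta * e[i]
                e[i] *= gamma * lam   # decay all traces each step
    return V

# Hypothetical 3-state chain: 0 -> 1 -> terminal, reward 1 on the last step.
V = td_lambda([[(0, 0.0, 1), (1, 1.0, None)]] * 50, n_states=3)
```

The per-state trace vector is exactly what becomes awkward with deep function approximation, where "per state" no longer exists and the trace must live on network parameters instead.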
-
-
-
arxiv.org
-
Hypothesis page to discuss this high-level description of DeepMind's new Gato framework.
-
- Mar 2022
-
arxiv.org
-
The paper that introduced the MineRL challenge dataset.
-
- Jan 2022
-
www.grandin.com
-
reinforcement
"Reinforcement means to the act of reinforcing."
-
- Jul 2021
-
psyarxiv.com
-
Palminteri, S. (2021). Choice-confirmation bias and gradual perseveration in human reinforcement learning [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/dpqj6
-
- Jun 2021
-
-
Chadi, M.-A., & Mousannif, H. (2021). Reinforcement Learning Based Decision Support Tool For Epidemic Control [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/tcr8s
-
- Mar 2021
-
www.opendemocracy.net
-
Using chemicals to improve our economy of attention and become emotionally "fitter" is an option that penetrated public consciousness some time ago.
The same is true of reinforcement learning algorithms.
-
- Sep 2020
-
-
Ozaita, J., Baronchelli, A., & Sánchez, A. (2020). The emergence of segregation: From observable markers to group specific norms. ArXiv:2009.05354 [Physics, q-Bio]. http://arxiv.org/abs/2009.05354
-
-
journals.sagepub.com
-
Ludwig, V. U., Brown, K. W., & Brewer, J. A. (2020). Self-Regulation Without Force: Can Awareness Leverage Reward to Drive Behavior Change? Perspectives on Psychological Science, 1745691620931460. https://doi.org/10.1177/1745691620931460
-
- Jul 2020
-
-
Harvey, A., Armstrong, C. C., Callaway, C. A., Gumport, N. B., & Gasperetti, C. E. (2020). COVID-19 Prevention via the Science of Habit Formation [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/57jyg
-
- May 2020
-
-
Radulescu, A., Holmes, K., & Niv, Y. (2020). On the convergent validity of risk sensitivity measures [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/qdhx4
-
-
-
psyarxiv.com
-
Hertz, U. (2020). Cognitive learning processes account for asymmetries in adaptations to new social norms [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/7thku
-
-
-
Liu, L., Wang, X., Tang, S., & Zheng, Z. (2020). Complex social contagion induces bistability on multiplex networks. ArXiv:2005.00664 [Physics]. http://arxiv.org/abs/2005.00664
-
- Apr 2020
-
-
Ting, C., Palminteri, S., Lebreton, M., & Engelmann, J. B. (2020, March 25). The elusive effects of incidental anxiety on reinforcement-learning. https://doi.org/10.31234/osf.io/7d4tc
-
- Mar 2019
-
cjc.ict.ac.cn
-
A survey of deep reinforcement learning
-
-
-
github.com
-
reinforcement-learning code and paper tutorials
-
- Feb 2019
-
gitee.com
-
We present MILABOT: a deep reinforcement learning chatbot developed by the Montreal Institute for Learning Algorithms (MILA) for the Amazon Alexa Prize competition. MILABOT is capable of conversing with humans on popular small talk topics through both speech and text. The system consists of an ensemble of natural language generation and retrieval models, including template-based models, bag-of-words models, sequence-to-sequence neural network and latent variable neural network models. By applying reinforcement learning to crowdsourced data and real-world user interactions, the system has been trained to select an appropriate response from the models in its ensemble. The system has been evaluated through A/B testing with real-world users, where it performed significantly better than many competing systems. Due to its machine learning architecture, the system is likely to improve with additional data
-
- Jul 2016
-
thesocialwrite.com
-
Think of all the hard work and the sweat you put into the things that you're proudest of.
Always feels good to say, "I worked out today!"
-