asks for the Minecraft domain.
They demonstrate the model on a "minecraft-like" domain (introduced earlier by someone else) where there are resources in the world and the agent has tasks.
Definition 3.2 (simple reward machine).
The MDP does not change; its dynamics are the same with or without the RM, just as they are with or without a standard reward model. Additionally, the rewards from the RM can be non-Markovian with respect to the MDP because they inherently have a kind of memory of where you've been, limited to the agent's "movement" (almost "in its mind") along the goals for this task.
We then show that an RM can be interpreted as specifying a single reward function over a larger state space, and consider types of reward functions that can be expressed using RMs
So by specifying a reward machine you are augmenting the state space of the MDP with higher level goals/subgoals/concepts that provide structure about what is good and what isn't.
However, an agent that had access to the specification of the reward function might be able to use such information to learn optimal policies faster.
Fascinating idea, why not? Why are we hiding the reward function from the agent, really?
U is a finite set of states,
Apply a set of logical rules to the state space to obtain a finite set of states.
state-reward function,
The reward is a constant number assigned to each such set of states.
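To make the definition concrete, here is a minimal sketch of a simple reward machine in Python for a "get wood, then reach the workbench" task in the Minecraft-like domain. The class, event names, and reward values are illustrative assumptions, not the authors' code; the point is that the RM state u is the extra memory that makes the reward non-Markovian with respect to the MDP alone.

```python
# A minimal sketch of a simple reward machine: finite states U,
# transitions triggered by logical events detected in the environment,
# and a constant reward attached to each transition.

class SimpleRewardMachine:
    def __init__(self, states, initial_state, transitions):
        # states: finite set U of RM states
        # transitions: dict mapping (u, event) -> (u_next, reward)
        self.states = states
        self.u = initial_state
        self.transitions = transitions

    def step(self, event):
        """Advance the machine on a detected event; return the reward."""
        if (self.u, event) in self.transitions:
            self.u, reward = self.transitions[(self.u, event)]
            return reward
        return 0.0  # no transition fires; stay put with zero reward

# Example task: "get wood, then reach the workbench".
rm = SimpleRewardMachine(
    states={"u0", "u1", "u_done"},
    initial_state="u0",
    transitions={
        ("u0", "got_wood"): ("u1", 0.0),
        ("u1", "at_workbench"): ("u_done", 1.0),
    },
)

print(rm.step("at_workbench"))  # 0.0 -- out of order, nothing happens
print(rm.step("got_wood"))      # 0.0 -- subgoal reached, RM advances
print(rm.step("at_workbench"))  # 1.0 -- task complete
```

An RL agent would then learn over the augmented state (s, u), which is exactly the larger state space the quote above describes.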
Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning
[Icarte, JAIR, 2022] "Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning"
Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning
[Icarte, PMLR, 2018] "Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning"
Bell's theorem is about correlations (joint probabilities) of stochastic real variables and therefore does not apply to quantum theory, which neither describes stochastic motion nor uses real-valued observables
Strong statement, what do people think about this? Is it accepted by anyone or dismissed?
"Finding Optimal Solutions to Rubik's Cub e Using Pattern Databases" by Richard E. Korf, AAAI 1997.
The famous "Korf Algorithm" for finding the optimal solution to any Rubik's Cube state.
make up facts less often
but not "never"
On prompts submitted by our customers to the API,
really? so that's how they make money.
Question: what kind of bias does this introduce into the model?
Blog post from OpenAI in Jan 2022 explaining some of the approaches they use to train, refine, and tune their LLM for particular tasks. This was all a precursor to the ChatGPT system we now see.
"Talking About Large Language Models" by Murray Shanahan
[Nam, NeurIPS, 2022] "Reinforcement Learning with State Observation Costs in Action-Contingent Noiselessly Observable Markov Decision Processes"
[Lee, NeurIPS, 2022] "Multi-Game Decision Transformers"
[Neumann, Gros, NeurIPS, 2022] "Scaling Laws for a Multi-Agent Reinforcement Learning Model"
10K
Kind of ambiguous to use 10K when one of the most important variables is K.
An embedding for each timestep is learned and added to each token – note this is different than the standard positional embedding used by transformers, as one timestep corresponds to three tokens
one timestep corresponds to three tokens
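A minimal PyTorch sketch of how that might look; the dimensions and tensor names are illustrative assumptions, not the paper's code. The key point is that one learned embedding per timestep is shared by the three tokens (return-to-go, state, action) of that timestep, instead of one embedding per token position.

```python
import torch
import torch.nn as nn

# Illustrative dimensions (assumptions, not the paper's values).
batch, seq_len, d_model, max_timestep = 2, 10, 128, 1000

# One learned embedding per *timestep*, shared by the three tokens
# (return-to-go, state, action) emitted at that timestep.
timestep_emb = nn.Embedding(max_timestep, d_model)

# Pretend these are the already-embedded token streams, one per modality.
rtg_tok = torch.randn(batch, seq_len, d_model)
state_tok = torch.randn(batch, seq_len, d_model)
action_tok = torch.randn(batch, seq_len, d_model)

t = torch.arange(seq_len).expand(batch, seq_len)  # timestep indices
te = timestep_emb(t)                              # (batch, seq_len, d_model)

# Add the same timestep embedding to all three tokens of each timestep,
# then interleave as (rtg_0, s_0, a_0, rtg_1, s_1, a_1, ...).
tokens = torch.stack([rtg_tok + te, state_tok + te, action_tok + te], dim=2)
tokens = tokens.reshape(batch, 3 * seq_len, d_model)
print(tokens.shape)  # torch.Size([2, 30, 128])
```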
we propose the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output. The Transformer allows for significantly more parallelization
Using the attention mechanism to determine global dependencies between input and output instead of using recurrent links to past states. This is the essence of their new idea.
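For reference, a minimal NumPy sketch of the scaled dot-product attention at the heart of that idea; this is the standard textbook formulation with illustrative shapes, not the paper's full multi-head implementation.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Standard attention: softmax(Q K^T / sqrt(d_k)) V.
    Every output position can attend to every input position,
    so dependencies are global rather than passed step-by-step
    through recurrent links."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V

# Illustrative shapes: 5 query positions, 7 key/value positions, d_k = 64.
Q = np.random.randn(5, 64)
K = np.random.randn(7, 64)
V = np.random.randn(7, 64)
print(scaled_dot_product_attention(Q, K, V).shape)  # (5, 64)
```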
"Burn Severity in Canada's Mountain National Parks: Patterns, Drivers, and Predictions" Weiwei Wang, Xianli Wang, et al Geophysical Research Letters
"On the Opportunities and Risks of Foundation Models" This is a large report by the Center for Research on Foundation Models at Stanford. They are creating and promoting the use of these models and trying to coin this name for them. They are also simply called large pre-trained models. So take it with a grain of salt, but also it has a lot of information about what they are, why they work so well in some domains and how they are changing the nature of ML research and application.
We study whether sequence modeling can perform policy optimization by evaluating Decision Transformer on offline RL benchmarks
AAAI 2022 paper: "Decentralized Mean Field Games". Happy to discuss online.
S. Ganapathi Subramanian, M. Taylor, M. Crowley, and P. Poupart, "Decentralized mean field games," in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-2022), vol. 36, pp. 9439–9447, February 2022.
A recent overview of RL methods used for autonomous driving.
As a baseline model we took the feature representation from a large pre-trained CNN such as ResNet50: we used the model with its final dense layer removed, in place of our convolutional layers. We had predicted that this would get us some performance, but would inherently be worse, since we had fixed some of our trainable parameters.
They didn't try to train the CNN from scratch.
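A minimal Keras sketch of that kind of baseline, assuming an ImageNet-pretrained ResNet50 with the classification head removed and frozen; the input size and task head are illustrative assumptions.

```python
import tensorflow as tf

# Load ResNet50 pre-trained on ImageNet, without its final dense
# (classification) layer, and freeze it so it acts as a fixed
# feature extractor in place of trainable convolutional layers.
backbone = tf.keras.applications.ResNet50(
    include_top=False, weights="imagenet", pooling="avg",
    input_shape=(224, 224, 3),
)
backbone.trainable = False

# Illustrative task head (the number of classes is an assumption).
model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```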
Discussion on this paper:
Bellinger C, Drozdyuk A, Crowley M, Tamblyn I. Balancing Information with Observation Costs in Deep Reinforcement Learning. Proceedings of the Canadian Conference on Artificial Intelligence [Internet]. 2022 May 27; Available from: https://caiac.pubpub.org/pub/0jmy7gpd
Interesting-sounding high-level paper about the limits and constraints on general intelligence, and how this might relate to the struggles AI/ML research has had historically.
Another piece to the "what can we do with eligibility traces" puzzle for Deep RL.
Question: What happened to Eligibility Traces in the Deep RL era? This paper highlights some of the reasons they are not used widely and proposes a way they could still be effective.
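For context, a minimal sketch of tabular TD(λ) with accumulating eligibility traces, the mechanism the question is about; this is the classic textbook version (the toy chain environment is an illustrative assumption), not the paper's proposal. The per-step trace update is the part that fits awkwardly with replay buffers and minibatched deep RL training.

```python
import numpy as np

def td_lambda(step, n_states, alpha=0.1, gamma=0.99, lam=0.9, episodes=100):
    """Tabular TD(lambda) with accumulating eligibility traces.
    `step(s)` -> (next_state, reward, done) is an assumed environment
    interface, used only for illustration."""
    V = np.zeros(n_states)
    for _ in range(episodes):
        e = np.zeros(n_states)       # one eligibility trace per state
        s, done = 0, False
        while not done:
            s2, r, done = step(s)
            delta = r + gamma * V[s2] * (not done) - V[s]  # TD error
            e[s] += 1.0              # mark s as recently visited
            V += alpha * delta * e   # credit flows to all traced states
            e *= gamma * lam         # decay all traces each step
            s = s2
    return V

# Toy 5-state chain: always move right, reward 1 on reaching the end.
V = td_lambda(lambda s: (s + 1, float(s + 1 == 4), s + 1 == 4), n_states=5)
print(V.round(2))
```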
Hypothesis page to discuss this high level description of DeepMind's new Gato framework.
The paper that introduced the MineRL challenge dataset.
Weak supervision also objectively identifies relevant morphological features from the tissue microenvironment without any a priori knowledge or subjective annotation. In three separate analyses, we showed that our models can identify well-known morphological features and accordingly, has the capability of identifying new morphological features of diagnostic, prognostic, and therapeutic relevance.
Their target images are very large and there is a known (supervised) label for the entire image, but no labels for parts of an image (e.g., where is the tumor exactly?). So the powerful property of their method is the ability to learn, on its own, which parts of the image relate to the label.
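That setup is essentially multiple-instance learning: a bag of patches with one bag-level label. Below is a minimal PyTorch sketch of attention-based MIL pooling (in the style of Ilse et al.), as a generic illustration of the idea rather than this paper's exact architecture; the attention weights are what let the model point at the parts of the image that drove the label.

```python
import torch
import torch.nn as nn

class AttentionMILPooling(nn.Module):
    """Generic attention-based MIL pooling: score each patch embedding,
    softmax the scores into weights, and classify the weighted average.
    High-weight patches are the regions the model found relevant to the
    slide-level label."""
    def __init__(self, d_in=512, d_hidden=128, n_classes=2):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Linear(d_in, d_hidden), nn.Tanh(), nn.Linear(d_hidden, 1)
        )
        self.classifier = nn.Linear(d_in, n_classes)

    def forward(self, patches):                       # (n_patches, d_in)
        w = torch.softmax(self.attn(patches), dim=0)  # (n_patches, 1)
        slide = (w * patches).sum(dim=0)              # weighted average
        return self.classifier(slide), w  # logits + per-patch relevance

# One "slide" of 1000 patch embeddings (illustrative numbers).
model = AttentionMILPooling()
logits, weights = model(torch.randn(1000, 512))
print(logits.shape, weights.shape)  # torch.Size([2]) torch.Size([1000, 1])
```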
The Canadian experiment has been built, in large part, around the American experiment: They have the melting pot, we have the cultural mosaic; they have the free market, we have sensible regulation; they have “life, liberty and the pursuit of happiness,” we have “peace, order and good government.”
I agree with this.
Northrop Frye once defined a Canadian as “an American who rejects the Revolution.”
I see what he means, but I wouldn't go this far. Canadians do have a separate cultural identity. It is defined by its lack of definition and certainty, in contrast to American certainty. This is why it is more resilient. It cannot have certainty because our nation was founded on the "two solitudes" of French and English, Catholic and Protestant, and also on the very different, though equally destructive, relationship of the European colonizers with the Indigenous Peoples of Canada.
A flaw lurked right at the core of the experiment, as flaws so often do in works of ambitious genius.
The flaw was an assumption that everyone had the nation's best interests at heart, that they all wanted the same thing deep down.
Difference is the core of the American experience. Difference is its genius. There has never been a country so comfortable with difference, so full of difference.
Diversity is Strength. This is really one of their founding principles, even in its hypocrisy. For them the diversity was in religious faith and ways of thinking, but did not include gender, ethnicity, or anything else. In time this changed, and it is the only reason America has done so well.
Such a map, plus the universal property of A, is in fact enough to reconstruct the entire Turing structure of C.
The minimum necessary to construct a Turing machine.
(not necessarily extensional, only intensional)
What's the difference?