46 Matching Annotations
  1. Last 7 days
    1. U is a finite set of states,

      Apply a set of logical rules to the state space to obtain a finite set of states.

    2. state-reward function,

Reward is a constant number assigned to each set of states.

3. Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning

      [Icarte, JAIR, 2022] "Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning"
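
      The structure described above can be sketched as a tiny finite automaton. This is an illustrative toy (the "coffee delivery" task, state names `u0`/`u1`/`u2`, and propositions are made up for this sketch, not taken from the paper's code):

      ```python
      # Hedged sketch of a reward machine: a finite automaton over abstract
      # propositions whose transitions carry constant rewards.
      class RewardMachine:
          def __init__(self, transitions, initial_state):
              # transitions: {(u, proposition): (u_next, reward)}
              self.transitions = transitions
              self.u = initial_state

          def step(self, proposition):
              """Advance the machine on an observed proposition; return reward."""
              self.u, reward = self.transitions[(self.u, proposition)]
              return reward

      # Toy task: get coffee ('c'), then deliver it to the office ('o').
      rm = RewardMachine(
          transitions={
              ("u0", "c"): ("u1", 0.0),  # picked up coffee
              ("u0", "o"): ("u0", 0.0),  # office without coffee: nothing
              ("u1", "o"): ("u2", 1.0),  # delivered coffee: reward
              ("u1", "c"): ("u1", 0.0),
          },
          initial_state="u0",
      )
      rewards = [rm.step(p) for p in ["o", "c", "o"]]  # reward only at the end
      ```

      Because the machine's state is finite and observable, the agent can learn a separate policy or value function per machine state, which is the structure the paper exploits.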

  2. Jan 2023
  3. www.cs.princeton.edu
    1. "Finding Optimal Solutions to Rubik's Cub e Using Pattern Databases" by Richard E. Korf, AAAI 1997.

      The famous "Korf Algorithm" for finding the optimal solution to any Rubik's Cube state.

    1. make up facts less often

      but not "never"

    2. On prompts submitted by our customers to the API

      really? so that's how they make money.

      Question: what kind of bias does this introduce into the model?

      • which topics and questions get trained on?
      • what is the goal of training? truth? clickability?
    3. Blog post from OpenAI in Jan 2022 explaining some of the approaches they use to train and tune their LLMs for particular tasks. This was all a precursor to the ChatGPT system we now see.

    1. Feng, 2022. "Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis"

      Shared and found via Gowthami Somepalli (@gowthami@sigmoid.social on Mastodon): StructureDiffusion improves the compositional generation capabilities of text-to-image #diffusion models by modifying the text guidance using a constituency tree or a scene graph.

  4. Dec 2022
    1. Liang, Machado, Talvitie, Bowling - AAMAS 2016 "State of the Art Control of Atari Games Using Shallow Reinforcement Learning"

      A great paper showing how to think differently about the latest advances in Deep RL. All is not always what it seems!

    1. "Decision Transformer: Reinforcement Learning via Sequence Modeling" (Chen, NeurIPS, 2021)

      This quickly became a very influential paper, with a new idea for learning generative models of action prediction, trained on (return, state, action) sequences from demonstration trajectories. There is no optimization of actions or rewards; instead, the target return is given as an input.
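
      The return-conditioning trick can be sketched like this (my reading of the paper's setup, not the authors' code; the helper names are made up):

      ```python
      # Decision Transformer re-writes each trajectory as a sequence of
      # (return-to-go, state, action) triples, so a sequence model learns
      # p(action | target return, history) with no explicit reward optimization.
      def returns_to_go(rewards):
          """Suffix sums: the return still to be collected at each timestep."""
          rtg, total = [], 0.0
          for r in reversed(rewards):
              total += r
              rtg.append(total)
          return rtg[::-1]

      def to_tokens(states, actions, rewards):
          rtg = returns_to_go(rewards)
          # One timestep becomes three tokens: (R_t, s_t, a_t).
          return [(rtg[t], states[t], actions[t]) for t in range(len(states))]

      tokens = to_tokens(states=["s0", "s1", "s2"],
                         actions=["a0", "a1", "a2"],
                         rewards=[1.0, 0.0, 2.0])
      ```

      At test time, you feed in the return you *want* as the first token of the triple and let the model generate actions.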

  5. Nov 2022
    1. 10K

      Kind of ambiguous to use 10K when one of the most important variables is K.

    2. An embedding for each timestep is learned and added to each token – note this is different than the standard positional embedding used by transformers, as one timestep corresponds to three tokens

      one timestep corresponds to three tokens
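
      Concretely, the quoted detail amounts to a broadcast over the three tokens of each timestep (an illustrative NumPy sketch with made-up dimensions, not the authors' code):

      ```python
      import numpy as np

      # One learned embedding per *timestep*, shared by all three tokens
      # (return, state, action) of that timestep — unlike the usual
      # per-token positional embedding.
      rng = np.random.default_rng(0)
      T, d = 4, 8                                 # timesteps, embedding width
      timestep_emb = rng.normal(size=(T, d))      # one row per timestep
      token_emb = rng.normal(size=(T, 3, d))      # (R_t, s_t, a_t) embeddings

      # Broadcast: the same timestep embedding is added to its three tokens.
      x = token_emb + timestep_emb[:, None, :]
      ```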

    1. we propose the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output. The Transformer allows for significantly more parallelization

      Using the attention mechanism to determine global dependencies between input and output instead of using recurrent links to past states. This is the essence of their new idea.
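
      That core mechanism is small enough to sketch directly (a minimal NumPy version of scaled dot-product attention, not the paper's full multi-head model; shapes are made up):

      ```python
      import numpy as np

      # Scaled dot-product attention: every position's output is a weighted
      # average of all values, with weights from query-key similarity — so
      # any position can depend on any other in one step (no recurrence).
      def attention(Q, K, V):
          scores = Q @ K.T / np.sqrt(K.shape[-1])         # all-pairs similarities
          weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
          weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
          return weights @ V, weights

      rng = np.random.default_rng(0)
      Q = rng.normal(size=(5, 16))
      K = rng.normal(size=(5, 16))
      V = rng.normal(size=(5, 16))
      out, w = attention(Q, K, V)  # w[i, j] > 0 for every pair: global dependencies
      ```

      Because all positions are processed by the same matrix products, the whole sequence parallelizes trivially, which is the other half of their claim.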

    1. "On the Opportunities and Risks of Foundation Models" This is a large report by the Center for Research on Foundation Models at Stanford. They are creating and promoting the use of these models and trying to coin this name for them. They are also simply called large pre-trained models. So take it with a grain of salt, but also it has a lot of information about what they are, why they work so well in some domains and how they are changing the nature of ML research and application.

    1. Using adversarial deep learning approaches to get a better correction for causal inference from observational data.

    2. Kallus, N. (2020). DeepMatch: Balancing deep covariate representations for causal inference using adversarial training. In I. H. Daumé, & A. Singh (Eds.), Proceedings of the 37th international conference on machine learning. In Proceedings of Machine Learning Research: vol. 119 (pp. 5067–5077). PMLR

    1. Bias-variance trade-off

      The Bias - Variance Tradeoff!
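
      For reference, the standard decomposition behind the tradeoff (textbook form, squared-error loss):

      ```latex
      \mathbb{E}\left[(y - \hat{f}(x))^2\right]
        = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{bias}^2}
        + \underbrace{\mathbb{E}\left[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\right]}_{\text{variance}}
        + \sigma^2
      ```

      where $\sigma^2$ is irreducible noise: more flexible estimators shrink the bias term but grow the variance term.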

    2. (Cousineau, Verter, Murphy and Pineau, 2023) "Estimating causal effects with optimization-based methods: A review and empirical comparison"

    1. Matching: This approach seeks to replicate a balanced experimental design using observational data by finding close matches between pairs or groups of units and separating out the ones that received a specified treatment from those that did not, thus defining the control groups.

      Matching approach to dealing with sampling bias. Basically, use some intrinsic (or other) metric about the situations to cluster them so that "similar" situations are dealt with similarly. Then analysis is carried out on those clusters. The number of clusters has to be defined; some method like k-means is often used. Depends a lot on the similarity metric, the clustering approach, and other assumptions.
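
      One simple member of this family can be sketched directly (hedged illustration of nearest-neighbour matching; the Euclidean metric, toy data, and the +2.0 "treatment effect" are all made up for the sketch):

      ```python
      import numpy as np

      # Nearest-neighbour matching: pair each treated unit with its closest
      # control unit in covariate space, then average the outcome differences
      # — a crude estimate of the average treatment effect on the treated.
      def matched_effect(X_treated, y_treated, X_control, y_control):
          diffs = []
          for x, y in zip(X_treated, y_treated):
              d = np.linalg.norm(X_control - x, axis=1)  # distance to all controls
              diffs.append(y - y_control[d.argmin()])    # match on nearest control
          return float(np.mean(diffs))

      # Toy data: the covariates drive the outcome; treatment adds ~2.0.
      rng = np.random.default_rng(0)
      Xc = rng.uniform(0, 1, size=(200, 2)); yc = Xc.sum(axis=1)
      Xt = rng.uniform(0, 1, size=(50, 2));  yt = Xt.sum(axis=1) + 2.0
      att = matched_effect(Xt, yt, Xc, yc)   # recovers roughly 2.0
      ```

      As the note says, everything hinges on the metric and on the assumption that the covariates capture all confounding.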

    2. To avoid such bias, a fundamental aspect in the research design of studies of causal inference is the identification strategy: a clear definition of the sources of variation in the data that can be used to estimate the causal effect of interest.

      To avoid making false conclusions, studies must identify all the sources of variation. Is this even possible in most cases?

    3. Terwiesch, 2022 - "A Review of Empirical Operations Management over the Last Two Decades" Listed as an important review of methods for addressing biases in operations management by explicitly addressing causality.

  6. Sep 2022
    1. We study whether sequence modeling can perform policy optimization by evaluating Decision Transformer on offline RL benchmarks
    1. AAAI 2022 Paper : Decentralized Mean Field Games Happy to discuss online.

      S. Ganapathi Subramanian, M. Taylor, M. Crowley, and P. Poupart, "Decentralized mean field games," in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-2022), vol. 36, pp. 9439–9447, February 2022.

  7. Jul 2022
    1. As a baseline model we took the feature representation from a large pre-trained CNN such as ResNet50, by using the model and excluding the final dense layer, and using this in place of our convolution layers. We had predicted that this would likely get us some performance, but would inherently be worse, since we had fixed some of our trainable parameters.

      They didn't try to train the CNN from scratch.

  8. Jun 2022
    1. Discussion of the paper:

      Ghojogh B, Ghodsi A, Karray F, Crowley M. Theoretical Connection between Locally Linear Embedding, Factor Analysis, and Probabilistic PCA. Proceedings of the Canadian Conference on Artificial Intelligence [Internet]. 2022 May 27; Available from: https://caiac.pubpub.org/pub/7eqtuyyc

  9. May 2022
    1. Question: What happened to Eligibility Traces in the Deep RL era? This paper highlights some of the reasons they are not used widely and proposes a way they could still be effective.
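
      For context, eligibility traces in tabular TD(λ) look like this (textbook sketch of the accumulating-trace version, not the paper's proposed method; the two-state chain is a made-up example):

      ```python
      # TD(lambda) with accumulating eligibility traces: each TD error updates
      # not just the current state but all recently visited states, with
      # credit decaying by gamma * lambda per step.
      def td_lambda(episodes, n_states, alpha=0.1, gamma=0.9, lam=0.8):
          """episodes: list of [(state, reward, next_state or None), ...]."""
          V = [0.0] * n_states
          for episode in episodes:
              z = [0.0] * n_states                  # eligibility trace per state
              for s, r, s_next in episode:
                  target = r + (gamma * V[s_next] if s_next is not None else 0.0)
                  delta = target - V[s]
                  z[s] += 1.0                       # bump the visited state's trace
                  for i in range(n_states):
                      V[i] += alpha * delta * z[i]  # credit recently visited states
                      z[i] *= gamma * lam           # decay all traces
          return V

      # Two-state chain: s0 -> s1 -> terminal, reward 1.0 at the end.
      V = td_lambda([[(0, 0.0, 1), (1, 1.0, None)]] * 50, n_states=2)
      ```

      The per-state trace vector is exactly what becomes awkward with deep function approximators, which is part of why the paper's question arises.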

  10. Mar 2022
    1. Weak supervision also objectively identifies relevant morphological features from the tissue microenvironment without any a priori knowledge or subjective annotation. In three separate analyses, we showed that our models can identify well-known morphological features and accordingly, has the capability of identifying new morphological features of diagnostic, prognostic, and therapeutic relevance.

      Their target images are very large and there is a known (supervised) label for the entire image, but no labels for parts of an image (e.g. where is the tumor exactly?). So the powerful property of their method is the ability to learn, on its own, which parts of the image relate to the label.

  11. Jan 2022
    1. The Canadian experiment has been built, in large part, around the American experiment: They have the melting pot, we have the cultural mosaic; they have the free market, we have sensible regulation; they have “life, liberty and the pursuit of happiness,” we have “peace, order and good government.”

      I agree with this.

    2. Northrop Frye once defined a Canadian as “an American who rejects the Revolution.”

      I see what he means but I wouldn't go this far. Canadians do have a separate cultural identity. It is defined by its lack of definition and certainty, in contrast to American certainty. This is why it is more resilient. It cannot have certainty because our nation was founded on the "two solitudes" of French and English, Catholic and Protestant, and also the very different, though equally destructive, relationship of the European colonizers with the Indigenous Peoples of Canada.

    3. A flaw lurked right at the core of the experiment, as flaws so often do in works of ambitious genius.

      The flaw was an assumption that everyone had the nation's best interests at heart, that they all wanted the same thing deep down.

    4. Difference is the core of the American experience. Difference is its genius. There has never been a country so comfortable with difference, so full of difference.

      Diversity is Strength. This is really one of their founding principles, even in its hypocrisy. For them the diversity was in religious faith and ways of thinking but did not include gender, ethnicity or anything else. In time this changed and it is the only reason America has done so well.

  12. Jul 2021
    1. Such a map, plus the universal property of $A$, is in fact enough to reconstruct the entire Turing structure of $\mathsf{C}$.

      The minimal necessary to construct a Turing machine

    2. not necessarily extensional, only intensional)

      What's the difference?