Hypothesis

29 Matching Annotations

Feb 2025
epoch.ai epoch.ai

How has DeepSeek improved the Transformer architecture?

1
1. mark.crowley 02 Feb 2025
  
  in Public
  
  Detailed explanation of what DeepSeek model is doing differently to improve performance and training time over ChatGPT.
  
  large-language-models transformers deepseek chat-gpt
Visit annotations in context

Tags

transformers

large-language-models

deepseek

chat-gpt

Annotators

mark.crowley

URL

epoch.ai/gradient-updates/how-has-deepseek-improved-the-transformer-architecture
Aug 2024
proceedings.neurips.cc proceedings.neurips.cc

NeurIPS-2021-direct-multi-view-multi-person-3d-pose-estimation-Paper.pdf

1
1. mark.crowley 15 Aug 2024
  
  in Public
  
  MvP : "Direct Multi-view Multi-person 3D Pose Estimation" Tao Wang, Jianfeng Zhang, Yujun Cai, Shuicheng Yan, Jiashi Feng
  
  Influential paper on learning consistent skeletal models of human pose from multiview images
  
  transformers pedestrian-detection multi-view attention projective-attention consistency-learning
Visit annotations in context

Tags

attention

pedestrian-detection

transformers

multi-view

projective-attention

consistency-learning

Annotators

mark.crowley

URL

proceedings.neurips.cc/paper_files/paper/2021/file/6da9003b743b65f4c0ccd295cc484e57-Paper.pdf
openaccess.thecvf.com openaccess.thecvf.com

Multiple View Geometry Transformers for 3D Human Pose Estimation

1
1. mark.crowley 15 Aug 2024
  
  in Public
  
  Really interesting and innovative method for using multiview perspective data to learn human pose and pedestrian detection.
  
  transformers pedestrian-detection hierarchical-modelling consistency-learning
Visit annotations in context

Tags

transformers

hierarchical-modelling

pedestrian-detection

consistency-learning

Annotators

mark.crowley

URL

openaccess.thecvf.com/content/CVPR2024/papers/Liao_Multiple_View_Geometry_Transformers_for_3D_Human_Pose_Estimation_CVPR_2024_paper.pdf
Jan 2024
arxiv.org arxiv.org

2401.05566.pdf

1
1. mark.crowley 26 Jan 2024
  
  in Public
  
  Hubinger, et. al. "SLEEPER AGENTS: TRAINING DECEPTIVE LLMS THAT PERSIST THROUGH SAFETY TRAINING". Arxiv: 2401.05566v3. Jan 17, 2024.
  
  Very disturbing and interesting results from team of researchers from Anthropic and elsewhere.
  
  large-language-models transformers rdgrp rdgrp-w24
Visit annotations in context

Tags

transformers

large-language-models

rdgrp-w24

rdgrp

Annotators

mark.crowley

URL

arxiv.org/pdf/2401.05566
cdn.openai.com cdn.openai.com

gpt-4-system-card.pdf

1
1. mark.crowley 06 Jan 2024
  
  in Public
  
  GPT-4 System CardOpenAIMarch 23, 2023
  
  chat-gpt large-language-models openai system-cards transformers toread reading_group_crowley
Visit annotations in context

Tags

large-language-models

toread

system-cards

chat-gpt

openai

transformers

reading_group_crowley

Annotators

mark.crowley

URL

cdn.openai.com/papers/gpt-4-system-card.pdf
Nov 2023
proceedings.mlr.press proceedings.mlr.press

janner22a.pdf

1
1. mark.crowley 24 Nov 2023
  
  in Public
  
  Reading this one on Nov 27, 2023 for the reading group.
  
  rdgrp-f23 reinforcement-learning transformers
Visit annotations in context

Tags

transformers

reinforcement-learning

rdgrp-f23

Annotators

mark.crowley

URL

proceedings.mlr.press/v162/janner22a/janner22a.pdf
proceedings.neurips.cc proceedings.neurips.cc

NeurIPS-2021-offline-reinforcement-learning-as-one-big-sequence-modeling-problem-Paper.pdf

1
1. mark.crowley 24 Nov 2023
  
  in Public
  
  Reading this one on Nov 27, 2023 for the reading group.
  
  rdgrp-f23 reinforcement-learning transformers
Visit annotations in context

Tags

transformers

reinforcement-learning

rdgrp-f23

Annotators

mark.crowley

URL

proceedings.neurips.cc/paper_files/paper/2021/file/099fe6b0b444c23836c4a5d07346082b-Paper.pdf
Oct 2023
arxiv.org arxiv.org

RoBERTa: A Robustly Optimized BERT Pretraining Approach

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  Introduction of the RoBERTa improved analysis and training approach to BERT NLP models.
  
  large-language-models nlp transformers rdgrp-s23 reading_group_crowley
Visit annotations in context

Tags

large-language-models

rdgrp-s23

nlp

transformers

reading_group_crowley

Annotators

mark.crowley

URL

arxiv.org/pdf/1907.11692
arxiv.org arxiv.org

2106.01345.pdf

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  (Chen, NeurIPS, 2021) Che1, Lu, Rajeswaran, Lee, Grover, Laskin, Abbeel, Srinivas, and Mordatch. "Decision Transformer: Reinforcement Learning via Sequence Modeling". Arxiv preprint rXiv:2106.01345v2, June, 2021.
  
  Quickly a very influential paper with a new idea of how to learn generative models of action prediction using SARSA training from demonstration trajectories. No optimization of actions or rewards, but target reward is an input.
  
  reinforcement-learning transformers generative-models minecraft minerl rdgrp-f23 reading_group_crowley
Visit annotations in context

Tags

minecraft

generative-models

minerl

reinforcement-learning

transformers

reading_group_crowley

rdgrp-f23

Annotators

mark.crowley

URL

arxiv.org/pdf/2106.01345
arxiv.org arxiv.org

2308.13067.pdf

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  Zecevic, Willig, Singh Dhami and Kersting. "Causal Parrots: Large Language Models May Talk Causality But Are Not Causal". In Transactions on Machine Learning Research, Aug, 2023.
  
  transformers large-language-models nlp reading_group_crowley rdgrp-f23
Visit annotations in context

Tags

large-language-models

nlp

rdgrp-f23

transformers

reading_group_crowley

Annotators

mark.crowley

URL

arxiv.org/pdf/2308.13067.pdf
arxiv.org arxiv.org

2212.05032.pdf

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  Feng, 2022. "Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis"
  
  Shared and found via: Gowthami Somepalli @gowthami@sigmoid.social Mastodon > Gowthami Somepalli @gowthami StructureDiffusion: Improve the compositional generation capabilities of text-to-image #diffusion models by modifying the text guidance by using a constituency tree or a scene graph.
  
  chatgpt large-language-models nlp transformers ece657a
Visit annotations in context

Tags

large-language-models

nlp

chatgpt

transformers

ece657a

Annotators

mark.crowley

URL

arxiv.org/pdf/2212.05032.pdf
cdn.openai.com cdn.openai.com

Language Models are Unsupervised Multitask Learners

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  GPT-2 Introduction paper
  
  Language Models are Unsupervised Multitask Learners A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, (2019).
  
  large-language-models nlp machine-learning transformers gpt reading_group_crowley rdgrp-s23
Visit annotations in context

Tags

large-language-models

nlp

rdgrp-s23

transformers

gpt

machine-learning

reading_group_crowley

Annotators

mark.crowley

URL

cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
arxiv.org arxiv.org

1706.03762.pdf

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  "Attention is All You Need" Foundational paper introducing the Transformer Architecture.
  
  transformers reading_group_crowley rdgrp-s23 large-language-models nlp
Visit annotations in context

Tags

large-language-models

rdgrp-s23

nlp

transformers

reading_group_crowley

Annotators

mark.crowley

URL

arxiv.org/pdf/1706.03762
papers.nips.cc papers.nips.cc

NeurIPS-2020-language-models-are-few-shot-learners-Paper.pdf

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  GPT-3 introduction paper
  
  large-language-models nlp machine-learning transformers gpt reading_group_crowley rdgrp-s23
Visit annotations in context

Tags

large-language-models

nlp

rdgrp-s23

transformers

gpt

machine-learning

reading_group_crowley

Annotators

mark.crowley

URL

papers.nips.cc/paper_files/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf
arxiv.org arxiv.org

2105.03322.pdf

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  "Are Pre-trained Convolutions Better than Pre-trained Transformers?"
  
  transformers deep-learning nlp large-language-models reading_group_crowley rdgrp-s23
Visit annotations in context

Tags

deep-learning

large-language-models

nlp

rdgrp-s23

transformers

reading_group_crowley

Annotators

mark.crowley

URL

arxiv.org/pdf/2105.03322.pdf
arxiv.org arxiv.org

2201.08239.pdf

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  LaMDA: Language Models for Dialog Application
  
  "LaMDA: Language Models for Dialog Application" Meta's introduction of LaMDA v1 Large Language Model.
  
  transformers reading_group_crowley rdgrp-s23 large-language-models nlp
Visit annotations in context

Tags

large-language-models

rdgrp-s23

nlp

transformers

reading_group_crowley

Annotators

mark.crowley

URL

arxiv.org/pdf/2201.08239.pdf
osf.io osf.io

Attention Mechanism, Transformers, BERT, and GPT: Tutorial and Survey

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  Benyamin GhojoghAli Ghodsi. "Attention Mechanism, Transformers, BERT, and GPT: Tutorial and Survey"
  
  reading_group_crowley transformers reading_group_crowley rdgrp-s23 nlp large-language-models
Visit annotations in context

Tags

large-language-models

rdgrp-s23

nlp

transformers

reading_group_crowley

Annotators

mark.crowley

URL

osf.io/m6gcn/
typeshare.co typeshare.co

Under the Hood: How to Use ChatGPT's Attention Mechanism for Better Prompts

1
1. chrisaldrich 22 Oct 2023
  
  in Public
  
  https://typeshare.co/go-go-golems/posts/under-the-hood-how-to-use-chatgpts-attention-mechanism-for-better-prompts
  
  read ChatGPT prompts attention mechanism transformers artificial intelligence for writing auto-regression
Visit annotations in context

Tags

auto-regression

prompts

attention mechanism

ChatGPT

artificial intelligence for writing

transformers

read

Annotators

chrisaldrich

URL

typeshare.co/go-go-golems/posts/under-the-hood-how-to-use-chatgpts-attention-mechanism-for-better-prompts
Jul 2023
arxiv.org arxiv.org

2307.09288.pdf

1
1. mark.crowley 19 Jul 2023
  
  in Public
  
  LLAMA 2 Release Paper
  
  large-language-models transformers
Visit annotations in context

Tags

transformers

large-language-models

Annotators

mark.crowley

URL

arxiv.org/pdf/2307.09288
Jun 2023
papers.nips.cc papers.nips.cc

NeurIPS-2020-language-models-are-few-shot-learners-Paper.pdf

1
1. mark.crowley 28 Jun 2023
  
  in Public
  
  We use the same model and architecture as GPT-2
  
  What do they mean by "model" here? If they have retrained on more data, with a slightly different architecture, then the model weights after training must be different.
  
  machine-learning transformers gpt ml-practice
Visit annotations in context

Tags

transformers

gpt

machine-learning

ml-practice

Annotators

mark.crowley

URL

papers.nips.cc/paper_files/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf
jmlr.org jmlr.org

20-074.pdf

2
1. mark.crowley 06 Jun 2023
  
  in Public
  
  introducing a unified framework that converts all text-basedlanguage problems into a text-to-text format
  
  this is their goal, to have a single model, including hyperparameters and setup, that can be used for any NLP task.
  
  nlp transformers
2. mark.crowley 06 Jun 2023
  
  in Public
  
  Paper introducing the T5 Text-to-Text transformer mdoel from google. (Raffel, JMLR, 2020)
  
  transformers nlp
Visit annotations in context

Tags

transformers

nlp

Annotators

mark.crowley

URL

jmlr.org/papers/volume21/20-074/20-074.pdf
Apr 2023
srush.github.io srush.github.io

The Annotated S4

1
1. mark.crowley 18 Apr 2023
  
  in Public
  
  The Annotated S4 Efficiently Modeling Long Sequences with Structured State Spaces Albert Gu, Karan Goel, and Christopher Ré.
  
  A new approach to transformers
  
  transformers large-language-models
Visit annotations in context

Tags

transformers

large-language-models

Annotators

mark.crowley

URL

srush.github.io/annotated-s4/
arxiv.org arxiv.org

Efficiently Modeling Long Sequences with Structured State Spaces

1
1. mark.crowley 18 Apr 2023
  
  in Public
  
  Efficiently Modeling Long Sequences with Structured State SpacesAlbert Gu, Karan Goel, and Christopher R ́eDepartment of Computer Science, Stanford University
  
  transformers large-language-models
Visit annotations in context

Tags

transformers

large-language-models

Annotators

mark.crowley

URL

arxiv.org/pdf/2111.00396
Jan 2023
inst-fs-iad-prod.inscloudgate.net inst-fs-iad-prod.inscloudgate.net

Untitled document

1
1. mark.crowley 10 Jan 2023
  
  in Public
  
  "Talking About Large Language Models" by Murray Shanahan
  
  nlp large-language-models deep-learning transformers
Visit annotations in context

Tags

deep-learning

large-language-models

transformers

nlp

Annotators

mark.crowley

URL

inst-fs-iad-prod.inscloudgate.net/files/4b2a700d-1125-444d-bd72-8045fe274f37/Shanahan2023.pdf
Dec 2022
arxiv.org arxiv.org

2205.15241.pdf

1
1. mark.crowley 13 Dec 2022
  
  in Public
  
  Lee et. al. - NeurIPS 2022 "Multi-Game Decision Transformers"
  
  reinforcement-learning transformers transfer-learning conf-neurips-2022 proj-minerl
Visit annotations in context

Tags

reinforcement-learning

transformers

proj-minerl

conf-neurips-2022

transfer-learning

Annotators

mark.crowley

URL

arxiv.org/pdf/2205.15241.pdf
Nov 2022
arxiv.org arxiv.org

1706.03762.pdf

1
1. mark.crowley 22 Nov 2022
  
  in Public
  
  we propose the Transformer, a model architecture eschewing recurrence and insteadrelying entirely on an attention mechanism to draw global dependencies between input and output.The Transformer allows for significantly more parallelization a
  
  Using the attention mechanism to determine global dependencies between input and output instead of using recurrent links to past states. This is the essence of their new idea.
  
  transformers attention-mechanism
Visit annotations in context

Tags

transformers

attention-mechanism

Annotators

mark.crowley

URL

arxiv.org/pdf/1706.03762
Sep 2022
arxiv.org arxiv.org

2106.01345.pdf

1
1. mark.crowley 27 Sep 2022
  
  in Public
  
  We study whether sequence modelingcan perform policy optimization by evaluating Decision Transformer on offline RL benchmarks
  
  transformers offline-learning reinforcement-learning
Visit annotations in context

Tags

transformers

offline-learning

reinforcement-learning

Annotators

mark.crowley

URL

arxiv.org/pdf/2106.01345
Feb 2022
www.supercoloring.com www.supercoloring.com

Transformers Toys | Free Printable Papercraft Templates

1
1. Onjanirina 20 Feb 2022
  
  in Public
  
  Paper Transformer Toys Templates
  
  transformers
Visit annotations in context

Tags

transformers

Annotators

Onjanirina

URL

supercoloring.com/paper-crafts/paper-toys/tv-series-and-movies-characters/transformers-toys