Hypothesis

59 Matching Annotations

Apr 2023
towardsdatascience.com towardsdatascience.com

Please Stop Drawing Neural Networks Wrong

1
1. mshook 25 Apr 2023
  
  in Public
  
  Now we are getting somewhere. At this point, we also see that the dimensions of W and b for each layer are specified by the dimensions of the inputs and the number of nodes in each layer. Let’s clean up the above diagram by not labeling every w and b value individually.
  
  nn ml visualization howto playground
Visit annotations in context

Tags

ml

howto

visualization

nn

playground

Annotators

mshook

URL

towardsdatascience.com/please-stop-drawing-neural-networks-wrong-ffd02b67ad77
Feb 2023
arxiv.org arxiv.org

2202.05262.pdf

1
1. mshook 01 Feb 2023
  
  in Public
  
  the Elhage et al.(2021) study showing an information-copying role for self-attention.
  
  It turns out Meng does refer to induction heads, just not by name.
  
  induction attention ml nn transformer
Visit annotations in context

Tags

ml

induction

nn

transformer

attention

Annotators

mshook

URL

arxiv.org/pdf/2202.05262
Jan 2023
www.cs.toronto.edu www.cs.toronto.edu

FFA13.pdf

1
1. mshook 10 Jan 2023
  
  in Public
  
  e twoareas in which the forward-forward algorithm may be superior to backpropagation are as a model oflearning in cortex and as a way of making use of very low-power analog hardware without resortingto reinforcement learning(Jabri and Flower, 1992).
  
  ffa nn ml brain
Visit annotations in context

Tags

ml

brain

ffa

nn

Annotators

mshook

URL

cs.toronto.edu/~hinton/FFA13.pdf
Dec 2022
rewriting.csail.mit.edu rewriting.csail.mit.edu

Rewriting a Deep Generative Model

1
1. mshook 24 Dec 2022
  
  in Public
  
  Our method is based on the hypothesis that the weights of a generator act as Optimal Linear Associative Memory (OLAM). OLAM is a classic single-layer neural data structure for memorizing associations that was described by Teuvo Kohonen and James A Anderson (independently) in the 1970s. In our case, we hypothesize that within a large modern multilayer convolutional network, the each individual layer plays the role of an OLAM that stores a set of rules that associates keys, which denote meaningful context, with values, which determine output.
  
  nn ml memory gpt
Visit annotations in context

Tags

ml

gpt

nn

memory

Annotators

mshook

URL

rewriting.csail.mit.edu/
Sep 2022
transformer-circuits.pub transformer-circuits.pub

Toy Models of Superposition

1
1. mshook 16 Sep 2022
  
  in Public
  
  Consider a toy model where we train an embedding of five features of varying importanceWhere “importance” is a scalar multiplier on mean squared error loss. in two dimensions, add a ReLU afterwards for filtering, and vary the sparsity of the features.
  
  colah autoencoder toy model ml nn
Visit annotations in context

Tags

autoencoder

ml

model

nn

toy

colah

Annotators

mshook

URL

transformer-circuits.pub/2022/toy_model/index.html
pyimagesearch.com pyimagesearch.com

A Deep Dive into Transformers with TensorFlow and Keras: Part 1 - PyImageSearch

2
1. mshook 05 Sep 2022
  
  in Public
  
  Now, the progression of NLP, as discussed, tells a story. We begin with tokens and then build representations of these tokens. We use these representations to find similarities between tokens and embed them in a high-dimensional space. The same embeddings are also passed into sequential models that can process sequential data. Those models are used to build context and, through an ingenious way, attend to parts of the input sentence that are useful to the output sentence in translation.
  
  transformer concise explanation paragraph ml nlp nn dimension
2. mshook 05 Sep 2022
  
  in Public
  
  Data, matrix multiplications, repeated and scaled with non-linear switches. Maybe that simplifies things a lot, but even today, most architectures boil down to these principles. Even the most complex systems, ideas, and papers can be boiled down to just that:
  
  ml nn minimal explanation
Visit annotations in context

Tags

concise

ml

nlp

nn

dimension

paragraph

minimal

transformer

explanation

Annotators

mshook

URL

pyimagesearch.com/2022/09/05/a-deep-dive-into-transformers-with-tensorflow-and-keras-part-1/
Jun 2022
direct.mit.edu direct.mit.edu

Human Language Understanding & Reasoning

1
1. mshook 14 Jun 2022
  
  in Public
  
  The dominant idea is one of attention, by which a representation at a position is computed as a weighted combination of representations from other positions. A common self-supervision objective in a transformer model is to mask out occasional words in a text. The model works out what word used to be there. It does this by calculating from each word position (including mask positions) vectors that represent a query, key, and value at that position. The query at a position is compared with the value at every position to calculate how much attention to pay to each position; based on this, a weighted average of the values at all positions is calculated. This operation is repeated many times at each level of the transformer neural net, and the resulting value is further manipulated through a fully connected neural net layer and through use of normalization layers and residual connections to produce a new vector for each word. This whole process is repeated many times, giving extra layers of depth to the transformer neural net. At the end, the representation above a mask position should capture the word that was there in the original text: for instance, committee as illustrated in Figure 1.
  
  transformer explanation attention qkv ml nn nlp language gpt good
Visit annotations in context

Tags

ml

nlp

nn

good

qkv

language

gpt

transformer

attention

explanation

Annotators

mshook

URL

direct.mit.edu/daed/article/151/2/127/110621/Human-Language-Understanding-amp-Reasoning
e2eml.school e2eml.school

Transformers from Scratch

1
1. mshook 09 Jun 2022
  
  in Public
  
  This trick of using a one-hot vector to pull out a particular row of a matrix is at the core of how transformers work.
  
  Matrix multiplication as table lookup
  
  transformer language nlp ml nn explanation matrix
Visit annotations in context

Tags

ml

language

nlp

nn

transformer

matrix

explanation

Annotators

mshook

URL

e2eml.school/transformers.html
May 2022
www.pnas.org www.pnas.org

The neural architecture of language: Integrative modeling converges on predictive processing | Proceedings of the National Academy of Sciences

1
1. mshook 30 May 2022
  
  in Public
  
  Given the complexities of the brain’s structure and the functions it performs, any one of these models is surely oversimplified and ultimately wrong—at best, an approximation of some aspects of what the brain does. However, some models are less wrong than others, and consistent trends in performance across models can reveal not just which model best fits the brain but also which properties of a model underlie its fit to the brain, thus yielding critical insights that transcend what any single model can tell us.
  
  brain nn ml comparison
Visit annotations in context

Tags

ml

brain

comparison

nn

Annotators

mshook

URL

pnas.org/doi/10.1073/pnas.2105646118
www.gwern.net www.gwern.net

1988-lang.pdf

1
1. mshook 18 May 2022
  
  in Public
  
  Such a highly non-linear problem would clearly benefitfrom the computational power of many layers. Unfortu-nately, back-propagation learning generally slows downby an order of magnitude every time a layer is added toa network.
  
  The problem in 1988
  
  1988 ml nn nonlinear spiral dataset c vintage
Visit annotations in context

Tags

ml

nn

vintage

c

spiral

1988

dataset

nonlinear

Annotators

mshook

URL

gwern.net/docs/ai/1988-lang.pdf
Apr 2022
arxiv.org arxiv.org

Understanding Neural Networks Through Deep Visualization

1
1. mshook 29 Apr 2022
  
  in Public
  
  Ourpre-trained network is nearly identical to the “AlexNet”architecture (Krizhevsky et al., 2012), but with local re-ponse normalization layers after pooling layers following(Jia et al., 2014). It was trained with the Caffe frameworkon the ImageNet 2012 dataset (Deng et al., 2009)
  
  ml nn visualization alexnet
Visit annotations in context

Tags

ml

visualization

nn

alexnet

Annotators

mshook

URL

arxiv.org/pdf/1506.06579.pdf
cs231n.github.io cs231n.github.io

CS231n Convolutional Neural Networks for Visual Recognition

1
1. mshook 20 Apr 2022
  
  in Public
  
  Example 1. For example, suppose that the input volume has size [32x32x3], (e.g. an RGB CIFAR-10 image). If the receptive field (or the filter size) is 5x5, then each neuron in the Conv Layer will have weights to a [5x5x3] region in the input volume, for a total of 5*5*3 = 75 weights (and +1 bias parameter). Notice that the extent of the connectivity along the depth axis must be 3, since this is the depth of the input volume. Example 2. Suppose an input volume had size [16x16x20]. Then using an example receptive field size of 3x3, every neuron in the Conv Layer would now have a total of 3*3*20 = 180 connections to the input volume. Notice that, again, the connectivity is local in 2D space (e.g. 3x3), but full along the input depth (20).
  
  These two examples are the first two layers of Andrej Karpathy's wonderful working ConvNetJS CIFAR-10 demo here
  
  explanation cnn convolution nn ml image classificaiton
Visit annotations in context

Tags

ml

convolution

cnn

nn

image

classificaiton

explanation

Annotators

mshook

URL

cs231n.github.io/convolutional-networks/
cs.stanford.edu cs.stanford.edu

ConvNetJS CIFAR-10 demo

1
1. mshook 20 Apr 2022
  
  in Public
  
  input (32x32x3)max activation: 0.5, min: -0.5max gradient: 1.08696, min: -1.53051Activations:Activation Gradients:Weights:Weight Gradients:conv (32x32x16)filter size 5x5x3, stride 1max activation: 3.75919, min: -4.48241max gradient: 0.36571, min: -0.33032parameters: 16x5x5x3+16 = 1216
  
  The dimensions of these first two layers are explained here
  
  example explanation dimension nn ml cnn convolution tensor
Visit annotations in context

Tags

ml

convolution

nn

dimension

tensor

cnn

example

explanation

Annotators

mshook

URL

cs.stanford.edu/people/karpathy/convnetjs/demo/cifar10.html
codelabs.developers.google.com codelabs.developers.google.com

TensorFlow.js: Make your own "Teachable Machine" using transfer learning with TensorFlow.js | Google Codelabs

1
1. mshook 19 Apr 2022
  
  in Public
  
  Here the lower level layers are frozen and are not trained, only the new classification head will update itself to learn from the features provided from the pre-trained chopped up model on the left.
  
  ml nn transferlearning cnn explanation
Visit annotations in context

Tags

ml

cnn

nn

transferlearning

explanation

Annotators

mshook

URL

codelabs.developers.google.com/tensorflowjs-transfer-learning-teachable-machine
distill.pub distill.pub

Feature Visualization

1
1. mshook 02 Apr 2022
  
  in Public
  
  Starting from random noise, we optimize an image to activate a particular neuron (layer mixed4a, unit 11).
  
  And then we use that image as a kind of variable name to refer to the neuron in a way that more helpful than the the layer number and neuron index within the layer. This explanation is via one of Chris Olah's YouTube videos (https://www.youtube.com/watch?v=gXsKyZ_Y_i8)
  
  ml feature visualization colah good nn cnn inception interpretability
Visit annotations in context

Tags

ml

inception

nn

good

feature

visualization

cnn

interpretability

colah

Annotators

mshook

URL

distill.pub/2017/feature-visualization
Mar 2022
quillette.com quillette.com

The Case Against the Case Against AI

1
1. mshook 07 Mar 2022
  
  in Public
  
  A special quality of humans, not shared by evolution or, as yet, by machines, is our ability to recognize gaps in our understanding and to take joy in the process of filling them in. It is a beautiful thing to experience the mysterious, and powerful, too.
  
  joy mystery ml human nn
Visit annotations in context

Tags

ml

joy

nn

human

mystery

Annotators

mshook

URL

quillette.com/2022/01/07/the-case-against-the-case-against-ai/
Feb 2022
neuralnetworksanddeeplearning.com neuralnetworksanddeeplearning.com

Neural Networks and Deep Learning

1
1. mshook 05 Feb 2022
  
  in Public
  
  Somewhat confusingly, and for historical reasons, such multiple layer networks are sometimes called multilayer perceptrons or MLPs, despite being made up of sigmoid neurons, not perceptrons. I'm not going to use the MLP terminology in this book, since I think it's confusing, but wanted to warn you of its existence.
  
  mlp nn ml perceptron
Visit annotations in context

Tags

ml

mlp

perceptron

nn

Annotators

mshook

URL

neuralnetworksanddeeplearning.com/chap1.html
Dec 2021
www.nature.com www.nature.com

Toward a universal decoder of linguistic meaning from brain activation

1
1. mshook 28 Dec 2021
  
  in Public
  
  To test whether these distributed representations of meaning are neurally plausible, a number of studies have attempted to learn a mapping between particular semantic dimensions and patterns of brain activation
  
  brain neuron nn map concept dimention language
Visit annotations in context

Tags

neuron

brain

dimention

language

concept

nn

map

Annotators

mshook

URL

nature.com/articles/s41467-018-03068-4.pdf
cloud.google.com cloud.google.com

Understanding neural networks with TensorFlow Playground | Google Cloud Blog

1
1. mshook 11 Dec 2021
  
  in Public
  
  the only thing an artificial neuron can do: classify a data point into one of two kinds by examining input values with weights and bias.
  
  How does this relate to "weighted sum shows similarity between the weights and the inputs"?
  
  nn ml hyperplane classification neuron
Visit annotations in context

Tags

ml

neuron

hyperplane

nn

classification

Annotators

mshook

URL

cloud.google.com/blog/products/ai-machine-learning/understanding-neural-networks-with-tensorflow-playground
medium.com medium.com

Adventures of a Tensorflow.js n00b: Part I: Having a bad idea

1
1. mshook 02 Dec 2021
  
  in Public
  
  I’m particularly interested in two questions: First, just how weird is machine learning? Second, what sorts of choices do developers make as they shape a project?
  
  joho ml nn why question
Visit annotations in context

Tags

ml

nn

question

joho

why

Annotators

mshook

URL

medium.com/@tensorflow/adventures-of-a-tensorflow-js-n00b-part-i-having-a-bad-idea-25dc7f9ddc8e
Nov 2021
www.cell.com www.cell.com

Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks

1
1. mshook 29 Nov 2021
  
  in Public
  
  ey use local computations to interpolate over task-rele-vant manifolds in a high-dimensional parameter space.
  
  evolution function nn ml manifold pdf body
Visit annotations in context

Tags

ml

manifold

evolution

nn

body

pdf

function

Annotators

mshook

URL

cell.com/neuron/pdf/S0896-6273(19)31044-X.pdf
e2eml.school e2eml.school

Transformers from Scratch

3
1. mshook 27 Nov 2021
  
  in Public
  
  Now that we've made peace with the concepts of projections (matrix multiplications)
  
  Projections are matrix multiplications.Why didn't you sayso? spatial and channel projections in the gated gmlp
  
  quesiton matrix multiplication algorithm nn ml
2. mshook 27 Nov 2021
  
  in Public
  
  Computers are especially good at matrix multiplications. There is an entire industry around building computer hardware specifically for fast matrix multiplications. Any computation that can be expressed as a matrix multiplication can be made shockingly efficient.
  
  ml nn matrix multiplication algorithm why
3. mshook 26 Nov 2021
  
  in Public
  
  The selective-second-order-with-skips model is a useful way to think about what transformers do, at least in the decoder side. It captures, to a first approximation, what generative language models like OpenAI's GPT-3 are doing.
  
  transformer attention ml good explanation nn qkv
Visit annotations in context

Tags

ml

nn

good

why

qkv

transformer

attention

quesiton

matrix

explanation

multiplication

algorithm

Annotators

mshook

URL

e2eml.school/transformers.html
distill.pub distill.pub

Visualizing Neural Networks with the Grand Tour

1
1. mshook 23 Nov 2021
  
  in Public
  
  The following figure presents a simple functional diagram of the neural network we will use throughout the article. The neural network is a sequence of linear (both convolutional A convolution calculates weighted sums of regions in the input. In neural networks, the learnable weights in convolutional layers are referred to as the kernel. For example Image credit to https://towardsdatascience.com/gentle-dive-into-math-behind-convolutional-neural-networks-79a07dd44cf9. See also Convolution arithmetic. and fully-connected A fully-connected layer computes output neurons as weighted sum of input neurons. In matrix form, it is a matrix that linearly transforms the input vector into the output vector. ), max-pooling, and ReLU First introduced by Nair and Hinton, ReLU calculates f(x)=max(0,x)f(x)=max(0,x)f(x)=max(0,x) for each entry in a vector input. Graphically, it is a hinge at the origin: Image credit to https://pytorch.org/docs/stable/nn.html#relu layers, culminating in a softmax Softmax function calculates S(yi)=eyiΣj=1NeyjS(y_i)=\frac{e^{y_i}}{\Sigma_{j=1}^{N} e^{y_j}}S(yi)=Σj=1Neyjeyi for each entry (yiy_iyi) in a vector input (yyy). For example, Image credit to https://ljvmiranda921.github.io/notebook/2017/08/13/softmax-and-the-negative-log-likelihood/ layer.
  
  This is a great visualization of MNIST hidden layers.
  
  mnist distill ml nn visualization good demo
Visit annotations in context

Tags

ml

mnist

demo

visualization

nn

distill

good

Annotators

mshook

URL

distill.pub/2020/grand-tour
towardsdatascience.com towardsdatascience.com

Transformers Explained Visually — Not just how, but Why they work so well

1
1. mshook 20 Nov 2021
  
  in Public
  
  The Query word can be interpreted as the word for which we are calculating Attention. The Key and Value word is the word to which we are paying attention ie. how relevant is that word to the Query word.
  
  Finally
  
  transformer query key value qkv attention ml nn good
Visit annotations in context

Tags

ml

key

value

nn

good

qkv

transformer

attention

query

Annotators

mshook

URL

towardsdatascience.com/transformers-explained-visually-not-just-how-but-why-they-work-so-well-d840bd61a9d3
www.lesswrong.com www.lesswrong.com

interpreting GPT: the logit lens - LessWrong

1
1. mshook 20 Nov 2021
  
  in Public
  
  Other work on interpreting transformer internals has focused mostly on what the attention is looking at. The logit lens focuses on what GPT "believes" after each step of processing, rather than how it updates that belief inside the step.
  
  gpt how ml nn transformer belief attention
Visit annotations in context

Tags

ml

how

nn

belief

gpt

attention

transformer

Annotators

mshook

URL

lesswrong.com/posts/AcKRB8wDpdaN6v6ru/interpreting-gpt-the-logit-lens
distill.pub distill.pub

The Building Blocks of Interpretability

1
1. mshook 17 Nov 2021
  
  in Public
  
  The cube of activations that a neural network for computer vision develops at each hidden layer. Different slices of the cube allow us to target the activations of individual neurons, spatial positions, or channels.
  
  This is first explanation of
  
  colah ml nn image gmlp
Visit annotations in context

Tags

ml

gmlp

nn

image

colah

Annotators

mshook

URL

distill.pub/2018/building-blocks
towardsdatascience.com towardsdatascience.com

A Deep Dive Into the Transformer Architecture — The Development of Transformer Models

2
1. mshook 17 Nov 2021
  
  in Public
  
  The attention layer (W in the diagram) computes three vectors based on the input, termed key, query, and value.
  
  Could you be more specific?
  
  attention how ml nn transformer
2. mshook 17 Nov 2021
  
  in Public
  
  Attention is a means of selectively weighting different elements in input data, so that they will have an adjusted impact on the hidden states of downstream layers.
  
  ml nn transformer attention good explanation
Visit annotations in context

Tags

ml

nn

good

how

explanation

attention

transformer

Annotators

mshook

URL

towardsdatascience.com/a-deep-dive-into-the-transformer-architecture-the-development-of-transformer-models-acbdf7ca34e0
www.pnas.org www.pnas.org

The neural architecture of language: Integrative modeling converges on predictive processing

1
1. mshook 10 Nov 2021
  
  in Public
  
  These findings provide strong evidence for a classic hypothesis about the computations underlying human language understanding, that the brain’s language system is optimized for predictive processing in the service of meaning extraction
  
  language meaning nn ml brain gpt
Visit annotations in context

Tags

ml

brain

language

nn

gpt

meaning

Annotators

mshook

URL

pnas.org/content/118/45/e2105646118
towardsdatascience.com towardsdatascience.com

Illustrated Guide to LSTM’s and GRU’s: A step by step explanation

1
1. mshook 05 Nov 2021
  
  in Public
  
  To review, the Forget gate decides what is relevant to keep from prior steps. The input gate decides what information is relevant to add from the current step. The output gate determines what the next hidden state should be.Code DemoFor those of you who understand better through seeing the code, here is an example using python pseudo code.
  
  lstm rnn nn ml code good
Visit annotations in context

Tags

ml

rnn

code

nn

good

lstm

Annotators

mshook

URL

towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21
Oct 2021
colah.github.io colah.github.io

Visualizing Representations: Deep Learning and Human Beings - colah's blog

2
1. mshook 31 Oct 2021
  
  in Public
  
  This approach, visualizing high-dimensional representations using dimensionality reduction, is an extremely broadly applicable technique for inspecting models in deep learning.
  
  dimension ml colah nn
2. mshook 31 Oct 2021
  
  in Public
  
  These layers warp and reshape the data to make it easier to classify.
  
  nn layer ml example colah
Visit annotations in context

Tags

ml

nn

dimension

layer

example

colah

Annotators

mshook

URL

colah.github.io/posts/2015-01-Visualizing-Representations/
cloud.google.com cloud.google.com

Understanding neural networks with TensorFlow Playground | Google Cloud Blog

1
1. mshook 16 Oct 2021
  
  in Public
  
  Even with this very primitive single neuron, you can achieve 90% accuracy when recognizing a handwritten text image1. To recognize all the digits from 0 to 9, you would need just ten neurons to recognize them with 92% accuracy.
  
  And here is a Google Colab notebook that demonstrates that
  
  mnist nn ml linear dimension colab
Visit annotations in context

Tags

ml

linear

mnist

colab

nn

dimension

Annotators

mshook

URL

cloud.google.com/blog/products/ai-machine-learning/understanding-neural-networks-with-tensorflow-playground
aegeorge42.github.io aegeorge42.github.io

Neural Networks from Scratch

1
1. kael 11 Oct 2021
  
  in Public
  
  nn
Visit annotations in context

Tags

nn

Annotators

kael

URL

aegeorge42.github.io/
Sep 2021
arxiv.org arxiv.org

2107.06762.pdf

3
1. mshook 25 Sep 2021
  
  in Public
  
  The models are developed in Python [46], using the Keras [47] and Tensorflow [48] libraries. Detailson the code and dependencies to run the experiments are listed in a Readme file available togetherwith the code in the Supplemental Material.
  
  I have not found the code or Readme file
  
  python celegans nn model tensorflow keras
2. mshook 25 Sep 2021
  
  in Public
  
  These results nonetheless show that it could be feasible to develop recurrent neural network modelsable to infer input-output behaviours of real biological systems, enabling researchers to advance theirunderstanding of these systems even in the absence of detailed level of connectivity.
  
  Too strong a claim?
  
  nn celegans rnn
3. mshook 25 Sep 2021
  
  in Public
  
  We show that GRU models with a hidden layersize of 4 units are able to accurately reproduce with high accuracy the system’sresponse to very different stimuli.
  
  celegans rnn nn
Visit annotations in context

Tags

rnn

nn

keras

python

tensorflow

model

celegans

Annotators

mshook

URL

arxiv.org/pdf/2107.06762.pdf
arxiv.org arxiv.org

Speech Recognition: Key Word Spotting through Image Recognition

1
1. mshook 17 Sep 2021
  
  in Public
  
  Humans perform a version of this task when interpretinghard-to-understand speech, such as an accent which is particularlyfast or slurred, or a sentence in a language we do not know verywell—we do not necessarily hear every single word that is said,but we pick up on salient key words and contextualize the rest tounderstand the sentence.
  
  Boy, don't they
  
  asr speech nn ml human language recognition understanding meaning
Visit annotations in context

Tags

ml

recognition

speech

asr

nn

meaning

language

human

understanding

Annotators

mshook

URL

arxiv.org/pdf/1803.03759.pdf
www.ccom.ucsd.edu www.ccom.ucsd.edu

MNIST Digit Playground

1
1. mshook 13 Sep 2021
  
  in Public
  
  A neural network will predict your digit in the blue square above. Your image is 784 pixels (= 28 rows by 28 columns with black=1 and white=0). Those 784 features get fed into a 3 layer neural network; Input:784 - AvgPool:196 - Dense:100 - Softmax:10.
  
  image recognition demo javascript nn ml
Visit annotations in context

Tags

ml

recognition

demo

javascript

nn

image

Annotators

mshook

URL

ccom.ucsd.edu/~cdeotte/programs/MNIST.html
www.isca-speech.org www.isca-speech.org

Automatic Speech Recognition of Disordered Speech: Personalized Models Outperforming Human Listeners on Short Phrases

1
1. mshook 09 Sep 2021
  
  in Public
  
  Personalized ASR models. For each of the 432 participants with disordered speech, we create a personalized ASR model (SI-2) from their own recordings. Our fine-tuning procedure was optimized for our adaptation process, where we only have between ¼ and 2 h of data per speaker. We found that updating only the first five encoder layers (versus the complete model) worked best and successfully prevented overfitting [10]
  
  speech recognition nn google euphonia ml model
Visit annotations in context

Tags

ml

recognition

speech

model

nn

euphonia

google

Annotators

mshook

URL

isca-speech.org/archive/pdfs/interspeech_2021/green21_interspeech.pdf
jalammar.github.io jalammar.github.io

A Visual and Interactive Guide to the Basics of Neural Networks

1
1. mshook 06 Sep 2021
  
  in Public
  
  So whenever you hear of someone “training” a neural network, it just means finding the weights we use to calculate the prediction.
  
  jay nn ml
Visit annotations in context

Tags

ml

nn

jay

Annotators

mshook

URL

jalammar.github.io/visual-interactive-guide-basics-neural-networks/
Aug 2021
stats.stackexchange.com stats.stackexchange.com

What exactly are keys, queries, and values in attention mechanisms?

3
1. mshook 28 Aug 2021
  
  in Public
  
  I'm going to try provide an English text example. The following is based solely on my intuitive understanding of the paper 'Attention is all you need'.
  
  This is also good
  
  attention key value query nn ml
2. mshook 28 Aug 2021
  
  in Public
  
  For the word q that your eyes see in the given sentence, what is the most related word k in the sentence to understand what q is about?
  
  attention key value query nn ml
3. mshook 28 Aug 2021
  
  in Public
  
  So basically: q = the vector representing a word K and V = your memory, thus all the words that have been generated before. Note that K and V can be the same (but don't have to). So what you do with attention is that you take your current query (word in most cases) and look in your memory for similar keys. To come up with a distribution of relevant words, the softmax function is then used.
  
  attention key value query nn ml
Visit annotations in context

Tags

ml

value

key

nn

attention

query

Annotators

mshook

URL

stats.stackexchange.com/questions/421935/what-exactly-are-keys-queries-and-values-in-attention-mechanisms
colah.github.io colah.github.io

Deep Learning, NLP, and Representations - colah's blog

2
1. mshook 07 Aug 2021
  
  in Public
  
  A neural network with a hidden layer has universality: given enough hidden units, it can approximate any function. This is a frequently quoted – and even more frequently, misunderstood and applied – theorem. It’s true, essentially, because the hidden layer can be used as a lookup table.
  
  lookup nn ml turing complete 2021 august
2. mshook 06 Aug 2021
  
  in Public
  
  Recursive Neural Networks
  
  rnn colah ml nn
Visit annotations in context

Tags

ml

rnn

nn

lookup

complete

turing

2021

colah

august

Annotators

mshook

URL

colah.github.io/posts/2014-07-NLP-RNNs-Representations/
arxiv.org arxiv.org

Big Bird: Transformers for Longer Sequences

1
1. mshook 06 Aug 2021
  
  in Public
  
  We show that BigBird is a universal approximator of sequence functions and is Turing complete,
  
  turing complete machine nlp transformer ml nn attention august 2021
Visit annotations in context

Tags

ml

nlp

nn

complete

machine

turing

2021

transformer

attention

august

Annotators

mshook

URL

arxiv.org/abs/2007.14062
Jul 2021
www.codemotion.com www.codemotion.com

BERT: how Google changed NLP - Codemotion Magazine

1
1. mshook 30 Jul 2021
  
  in Public
  
  hyper-parameters, i.e., parameters external to the model, such as the learning rate, the batch size, the number of epochs.
  
  parameter code ml nn bert model
Visit annotations in context

Tags

ml

model

nn

code

bert

parameter

Annotators

mshook

URL

codemotion.com/magazine/dev-hub/machine-learning-dev/bert-how-google-changed-nlp-and-how-to-benefit-from-this/
colah.github.io colah.github.io

Neural Networks, Types, and Functional Programming -- colah's blog

1
1. mshook 25 Jul 2021
  
  in Public
  
  Using multiple copies of a neuron in different places is the neural network equivalent of using functions. Because there is less to learn, the model learns more quickly and learns a better model. This technique – the technical name for it is ‘weight tying’ – is essential to the phenomenal results we’ve recently seen from deep learning.
  
  ml fp functional nn 2021 july
Visit annotations in context

Tags

ml

fp

functional

july

nn

2021

Annotators

mshook

URL

colah.github.io/posts/2015-09-NN-Types-FP/
www.baeldung.com www.baeldung.com

Euclidean Distance vs Cosine Similarity | Baeldung on Computer Science

1
1. mshook 16 Jul 2021
  
  in Public
  
  Vectors with a small Euclidean distance from one another are located in the same region of a vector space. Vectors with a high cosine similarity are located in the same general direction from the origin.
  
  ml nn embeddings distance angle cosine comparison explanation
Visit annotations in context

Tags

ml

embeddings

nn

cosine

comparison

angle

distance

explanation

Annotators

mshook

URL

baeldung.com/cs/euclidean-distance-vs-cosine-similarity
iamtrask.github.io iamtrask.github.io

A Neural Network in 11 lines of Python (Part 1) - i am trask

2
1. mshook 15 Jul 2021
  
  in Public
  
  If you're serious about neural networks, I have one recommendation. Try to rebuild this network from memory.
  
  ml nn howto learn recommendation good
2. mshook 09 Jul 2021
  
  in Public
  
  If you're serious about neural networks, I have one recommendation. Try to rebuild this network from memory.
  
  nn howto learn ml python
Visit annotations in context

Tags

ml

howto

learn

nn

good

python

recommendation

Annotators

mshook

URL

iamtrask.github.io/2015/07/12/basic-python-network/
mlech26l.github.io mlech26l.github.io

The wormnet project - Part 1

1
1. mshook 05 Jul 2021
  
  in Public
  
  In our research, i.e., the wormnet project, we try to build machine learning models motivated by the C. elegans nervous system. By doing so, we have to pay a cost, as we constrain ourselves to such models in contrast to standard artificial neural networks, whose modeling space is purely constraint by memory and compute limitations. However, there are potentially some advantages and benefits we gain. Our objective is to better understand what’s necessary for effective neural information processing to emerge.
  
  celegans nn neuron ml biology
Visit annotations in context

Tags

ml

neuron

biology

nn

celegans

Annotators

mshook

URL

mlech26l.github.io/pages/2020/09/14/wormnet1.html
Jun 2021
cloud.google.com cloud.google.com

Understanding neural networks with TensorFlow Playground | Google Cloud Blog

1
1. mshook 17 Jun 2021
  
  in Public
  
  This dataset can not be classified by a single neuron, as the two groups of data points can't be divided by a single line.
  
  nn explanation ml tensorflow
Visit annotations in context

Tags

ml

explanation

tensorflow

nn

Annotators

mshook

URL

cloud.google.com/blog/products/ai-machine-learning/understanding-neural-networks-with-tensorflow-playground
Jun 2015
Local file Local file

Mongolisches Reich

1
1. consta37 27 Jun 2015
  
  in Public
  
  aren miteinander aber
  
  nn
  
  nn
Tags

nn

Annotators

consta37
Jan 2015
cs231n.github.io cs231n.github.io

CS231n Convolutional Neural Networks for Visual Recognition

1
1. alshedivat 11 Jan 2015
  
  in Public
  
  k - Nearest Neighbor Classifier
  
  Is there a probabilistic interpretation of k-NN? Say, something like "k-NN is equivalent to [a probabilistic model] under the following conditions on the data and the k."
  
  k-NN probabilistic model
Visit annotations in context

Tags

k-NN

probabilistic model

Annotators

alshedivat

URL

cs231n.github.io/classification/

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators