- Sep 2024
-
moodle.lynchburg.edu
-
what can be more instructive than the leadership of a group within a group?
ML: That is why sports teams have captains at certain positions on the field.
-
But the hushing of the criticism of honest opponents is a dangerous thing. It leads some of the best of the critics to unfortunate silence and paralysis of effort, and others to burst into speech so passionately and intemperately as to lose listeners.
ML: Just like in life today, some people are willing to speak out for what they believe in even if it means losing support from others, while others choose to remain silent in times like that.
-
-
www.datacamp.com
-
How deep learning differs from traditional machine learning While machine learning has been a transformative technology in its own right, deep learning takes it a step further by automating many of the tasks that typically require human expertise. Deep learning is essentially a specialized subset of machine learning, distinguished by its use of neural networks with three or more layers. These neural networks attempt to simulate the behavior of the human brain—albeit far from matching its ability—in order to "learn" from large amounts of data. You can explore machine learning vs deep learning in more detail in a separate post.
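To make the "three or more layers" point concrete, here is a minimal sketch of a small deep network in Keras (my own illustration, not from the quoted post; the layer sizes are arbitrary):

```python
from tensorflow import keras

# A minimal "deep" network: three stacked weighted layers.
model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(784,)),  # hidden layer 1
    keras.layers.Dense(64, activation="relu"),                      # hidden layer 2
    keras.layers.Dense(10, activation="softmax"),                   # output layer
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```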
-
- Jul 2024
-
www.datacenterdynamics.com
-
“For our customer base, there's a lot of folks who say ‘I don't actually need the newest B100 or B200,’” Erb says. “They don’t need to train the models in four days, they’re okay doing it in two weeks for a quarter of the cost. We actually still have Maxwell-generation GPUs [first released in 2014] that are running in production. That said, we are investing heavily in the next generation.”
What would the energy cost of the two be, compared like this?
-
- May 2024
-
Local file
-
normalized difference vegetation index (NDVI)
The Normalized Difference Vegetation Index (NDVI) is a metric widely used in remote sensing to quantify the vegetation in a given area from satellite or aircraft imagery. The index is based on how plants reflect light at different wavelengths.
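The standard formula is NDVI = (NIR − Red) / (NIR + Red), which ranges from −1 to 1. A minimal sketch in NumPy (the band values here are made-up stand-ins for real satellite data):

```python
import numpy as np

# Hypothetical reflectance bands; real values would come from satellite imagery.
nir = np.array([[0.6, 0.7], [0.5, 0.8]])   # near-infrared band
red = np.array([[0.1, 0.2], [0.3, 0.1]])   # visible red band

ndvi = (nir - red) / (nir + red)           # healthy vegetation -> values near 1
print(ndvi)
```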
-
- Apr 2024
-
zenodo.org
-
Machine learning is acknowledged to have originated with the work of McCulloch and Pitts (1943). They recognised that brain signals are digital in nature, more specifically binary signals. According to Chakraborty and Joseph (2017), each ML system comprises five components: (1) a problem, (2) a data source, (3) a model, (4) an optimization algorithm and (5) validation and testing. ML is best suited for situations that require extracting patterns from noisy data or sensory perception—or a data-up approach.
“Benford’s Law” is one of the simplest ways to detect fraud. It is accomplished by running an analysis on the first digits in a given set of data. A predictable distribution of first digits will exist in a set of “real” data. Benford’s Law has existed since the late 1800s. AI is beneficial here because ML algorith....
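A minimal sketch of that first-digit check (my own illustration): Benford's Law predicts that leading digit d occurs with probability log10(1 + 1/d), so observed frequencies can be compared against that distribution.

```python
import math
from collections import Counter

def first_digit(x):
    # strip any leading zeros and the decimal point, keep the first digit
    s = str(abs(x)).lstrip("0.")
    return int(s[0])

def benford_check(values):
    counts = Counter(first_digit(v) for v in values if v != 0)
    n = sum(counts.values())
    for d in range(1, 10):
        expected = math.log10(1 + 1 / d)       # Benford's predicted frequency
        observed = counts.get(d, 0) / n
        print(f"digit {d}: expected {expected:.3f}, observed {observed:.3f}")

# Toy input; in practice this would be invoice amounts, populations, etc.
benford_check([123, 1780, 2.4, 345, 190, 1100, 95, 3000, 18, 52])
```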
-
- Mar 2024
-
-
Abstract
Conclusion: the prediction results are better than MOST (the MO-based estimates systematically underestimate the magnitude of turbulent fluxes; the model improves agreement with observations and reduces the overall deviation from observed fluxes), with some generalization across sites. Limitations: mass fluxes are not included, the predictions still need improvement, results are anomalous depending on stability, generalization across seasons is limited, and variables that are not easy to obtain were used (a minimal observation set should be identified).
-
- Feb 2024
-
txt.cohere.com
-
Constructing Prompts for the Command Model: techniques for constructing prompts for the Command model.
-
-
docs.cohere.com
-
Now, let’s modify the prompt by adding a few examples of how we expect the output to be.

```python
user_input = "Send a message to Alison to ask if she can pick me up tonight to go to the concert together"

prompt = f"""Turn the following message to a virtual assistant into the correct action:

Message: Ask my aunt if she can go to the JDRF Walk with me October 6th
Action: can you go to the jdrf walk with me october 6th

Message: Ask Eliza what should I bring to the wedding tomorrow
Action: what should I bring to the wedding tomorrow

Message: Send message to supervisor that I am sick and will not be in today
Action: I am sick and will not be in today

Message: {user_input}"""

response = generate_text(prompt, temp=0)
print(response)
```

This time, the style of the response is exactly how we want it: "Can you pick me up tonight to go to the concert together?"
-
But we can also get the model to generate responses in a certain format. Let’s look at a couple of them: markdown tables
-
And here’s the same request to the model, this time with the product description added as context.

```python
context = """Think back to the last time you were working without any distractions in the office. That's right...I bet it's been a while. \
With the newly improved CO-1T noise-cancelling Bluetooth headphones, you can work in peace all day. Designed in partnership with \
software developers who work around the mayhem of tech startups, these headphones are finally the break you've been waiting for. With \
fast charging capacity and wireless Bluetooth connectivity, the CO-1T is the easy breezy way to get through your day without being \
overwhelmed by the chaos of the world."""

user_input = "What are the key features of the CO-1T wireless headphone"

prompt = f"""{context}
Given the information above, answer this question: {user_input}"""

response = generate_text(prompt, temp=0)
print(response)
```

Now, the model accurately lists the features of the product. The answer is: "The CO-1T wireless headphones are designed to be noise-canceling and Bluetooth-enabled. They are also designed to be fast charging and have wireless Bluetooth connectivity."
-
While LLMs excel in text generation tasks, they struggle in context-aware scenarios. Here’s an example. If you were to ask the model for the top qualities to look for in wireless headphones, it will duly generate a solid list of points. But if you were to ask it for the top qualities of the CO-1T headphone, it will not be able to provide an accurate response because it doesn’t know about it (CO-1T is a hypothetical product we just made up for illustration purposes). In real applications, being able to add context to a prompt is key because this is what enables personalized generative AI for a team or company. It makes many use cases possible, such as intelligent assistants, customer support, and productivity tools, that retrieve the right information from a wide range of sources and add it to the prompt.
-
We set a default temperature value of 0, which nudges the response to be more predictable and less random. Throughout this chapter, you’ll see different temperature values being used in different situations. Increasing the temperature value tells the model to generate less predictable responses and instead be more “creative.”
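For context, the snippets above assume a generate_text helper. A minimal sketch of what it might look like with the Cohere Python SDK (the wiring is my assumption, not taken from the quoted page; the key is a placeholder):

```python
import cohere

co = cohere.Client("YOUR_API_KEY")  # hypothetical placeholder key

def generate_text(prompt, temp=0):
    # temp=0 keeps the output predictable; higher values are more "creative"
    response = co.generate(prompt=prompt, temperature=temp)
    return response.generations[0].text

print(generate_text("Write a one-line greeting.", temp=0.9))
```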
-
- Oct 2023
-
media.proquest.com
-
racialized social hierarchies, thus facilitating domination and exploitation.
We also talked about that in Traditions and Revolutions
-
- Sep 2023
-
moodle.lynchburg.edu
-
they do not know enough about the topic at hand or because, they say, they simply are not “smart enough.”
I find myself saying this sometimes too
-
BLEND THE AUTHOR’S WORDS WITH YOUR OWN
Important to use a quotation for the essay we have to write
-
VERBS FOR MAKING A CLAIM
This will be very useful for writing my essays
-
This ability to enter complex, many-sided conversations has taken on a special urgency in today’s polarized red state / blue state America,
Politics in the United States
-
Letter from Birmingham Jail,
Learned in AP Gov, and important document in the Civil Rights Movement
-
If you have been taught to write a traditional five-paragraph essay, for example, you have learned how to develop a thesis and support it with evidence.
What I was taught throughout my high school years
-
Less experienced writers, by contrast, are often unfamiliar with these basic moves and unsure how to make them in their own writing.
More reading means more experience, and better writing.
-
STATE YOUR OWN IDEAS AS A RESPONSE TO OTHERS
I think this is a good way for essays to be written, and it seems like I write a lot of essays that require this format.
-
once you mastered it you no longer had to give much conscious thought to the various moves that go into doing it.
This is very true in my life, a lot of things come as second nature such as brushing my teeth and driving.
-
-
moodle.lynchburg.edu
-
“Why don’t you do something about it?”
I think this goes to a lot of things in life. A lot of people say this and say that, but none of them ever do anything
-
-
moodle.lynchburg.edu
-
for their knowledge of their own ignorance.
Rare ability to be aware of being ignorant, and we all struggle with it.
-
The father was a quiet, simple soul, calmly ignorant, with no touch of vulgarity. The mother was different,—strong, bustling, and energetic, with a quick, restless tongue, and an ambition to live “like folks.”
This is very similar to my household, but I know this is not the norm in most.
-
-
moodle.lynchburg.edu
-
it is easier to do ill than well in the world
I think this relates to everyone's life. It's a lot harder to do the right thing that can help the world than it is to do the easy thing that hurts it.
-
- Jun 2023
-
-
We use the same model and architecture as GPT-2
What do they mean by "model" here? If they have retrained on more data, with a slightly different architecture, then the model weights after training must be different.
-
- May 2023
-
developers.google.com
- Apr 2023
-
towardsdatascience.com
-
Now we are getting somewhere. At this point, we also see that the dimensions of W and b for each layer are specified by the dimensions of the inputs and the number of nodes in each layer. Let’s clean up the above diagram by not labeling every w and b value individually.
-
-
machinelearningmastery.com
-
- The Delta Method, from the field of nonlinear regression.
- The Bayesian Method, from Bayesian modeling and statistics.
- The Mean-Variance Estimation Method, using estimated statistics.
- The Bootstrap Method, using data resampling and developing an ensemble of models.
Four methods to compute prediction intervals.
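As a concrete illustration of the last item, a minimal bootstrap sketch with scikit-learn (my own example, not from the article): fit an ensemble on resampled data and read an interval off the empirical percentiles of its predictions. Note this mostly captures model uncertainty; a full prediction interval would also add the residual noise.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)

# Bootstrap: fit one model per resample of the training data.
preds = []
for _ in range(200):
    idx = rng.integers(0, len(X), size=len(X))   # sample rows with replacement
    model = DecisionTreeRegressor(max_depth=4).fit(X[idx], y[idx])
    preds.append(model.predict([[1.0]])[0])

lo, hi = np.percentile(preds, [2.5, 97.5])       # 95% interval at x = 1.0
print(f"interval at x=1.0: [{lo:.2f}, {hi:.2f}]")
```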
-
-
www.sciencedirect.com
-
A novel method for estimating prediction uncertainty using machine learning techniques is presented. Uncertainty is expressed in the form of the two quantiles (constituting the prediction interval) of the underlying distribution of prediction errors. The idea is to partition the input space into different zones or clusters having similar model errors using fuzzy c-means clustering. The prediction interval is constructed for each cluster on the basis of empirical distributions of the errors associated with all instances belonging to the cluster under consideration and propagated from each cluster to the examples according to their membership grades in each cluster. Then a regression model is built for in-sample data using computed prediction limits as targets, and finally, this model is applied to estimate the prediction intervals (limits) for out-of-sample data. The method was tested on artificial and real hydrologic data sets using various machine learning techniques. Preliminary results show that the method is superior to other methods estimating the prediction interval. A new method for evaluating performance for estimating prediction interval is proposed as well.
Prediction intervals using quantiles. Use clustering.
-
- Feb 2023
-
arxiv.org
-
the Elhage et al. (2021) study showing an information-copying role for self-attention.
It turns out Meng does refer to induction heads, just not by name.
-
- Jan 2023
-
ar5iv.labs.arxiv.org
-
This input embedding is the initial value of the residual stream, which all attention layers and MLPs read from and write to.
-
-
www.cs.toronto.edu
-
The two areas in which the forward-forward algorithm may be superior to backpropagation are as a model of learning in cortex and as a way of making use of very low-power analog hardware without resorting to reinforcement learning (Jabri and Flower, 1992).
-
- Dec 2022
-
rewriting.csail.mit.edu
-
Our method is based on the hypothesis that the weights of a generator act as Optimal Linear Associative Memory (OLAM). OLAM is a classic single-layer neural data structure for memorizing associations that was described by Teuvo Kohonen and James A. Anderson (independently) in the 1970s. In our case, we hypothesize that within a large modern multilayer convolutional network, each individual layer plays the role of an OLAM that stores a set of rules that associates keys, which denote meaningful context, with values, which determine output.
-
-
www.zhihu.com
-
What can the OCaml language do?
-
-
www.technologyreview.com
-
AI training data is filled with racist stereotypes, pornography, and explicit images of rape, researchers Abeba Birhane, Vinay Uday Prabhu, and Emmanuel Kahembwe found after analyzing a data set similar to the one used to build Stable Diffusion.
That is horrifying. You'd think that the authors would attempt to remove or filter this kind of material. There are, after all, models out there that are trained to find it. It makes me wonder what awful stuff is in the GPT-3 dataset too.
-
-
arxiv.org
-
We test this hypothesis by training a predicted compute-optimal model, Chinchilla, that uses the same compute budget as Gopher but with 70B parameters and 4× more data. Chinchilla uniformly and significantly outperforms Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and Megatron-Turing NLG (530B) on a large range of downstream evaluation tasks. This also means that Chinchilla uses substantially less compute for fine-tuning and inference, greatly facilitating downstream usage. As a highlight, Chinchilla reaches a state-of-the-art average accuracy of 67.5% on the MMLU benchmark, greater than a 7% improvement over Gopher
By using more data on a smaller language model the authors were able to achieve better performance than with the larger models - this reduces the cost of using the model for inference.
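A rough sanity check on the "same compute budget" claim (my addition, using the common approximation that training cost is about $6ND$ FLOPs for $N$ parameters and $D$ tokens):

$$6 \cdot (70\mathrm{B}) \cdot (4D) = 6 \cdot (280\mathrm{B}) \cdot D,$$

so quartering the parameter count while quadrupling the data leaves the training budget unchanged, while inference cost, which scales with $N$, drops by roughly 4×.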
-
- Nov 2022
-
steadyhq.com
-
curation filters on the recipient side, but then e-mail spam would also be a solved problem, and I don't see that happening just yet.
Are there projects that train models on collected spam e-mails?
-
-
community.interledger.org
-
🌟 Highlight words as they are spoken (karaoke anybody?). 🌟 Navigate video by clicking on words. 🌟 Share snippets of text (with video attached!). 🌟 Repurpose by remixing using the text as a base and reference.
If I understand it correctly, with hyperaudio one can also create a transcription of somebody else's video or audio when it is embedded.
In that case, if you add to hyperaudio the annotation capability of hypothes.is or docdrop, the vision outlined in the article on the Global Knowledge Graph is already a reality.
Tags
- open
- graph
- docdrop
- repurposing
- global
- sharing
- translate
- annotation
- speech
- captions
- creative
- learning
- open source
- audio
- ML
- navigation
- wordpress
- hyperaudio
- language
- speech to text
- commons
- mobile
- video
- plugin
- conference
- lite
- speech2text
- roam
- translation
- monetization
- transcript
- timing
- knowledge
- simultaneous
- remixing
- web monetization
- interactive
-
-
www.exponentialview.co
-
“The metaphor is that the machine understands what I’m saying and so I’m going to interpret the machine’s responses in that context.”
Interesting metaphor for why humans are happy to trust outputs from generative models
-
-
postgresml.org
-
Scaling PostgresML to 1 Million Requests per Second
-
- Sep 2022
-
transformer-circuits.pub
-
Consider a toy model where we train an embedding of five features of varying importance (where “importance” is a scalar multiplier on the mean squared error loss) in two dimensions, add a ReLU afterwards for filtering, and vary the sparsity of the features.
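A minimal sketch of that toy setup in PyTorch (my own reconstruction of the quoted description; the hyperparameters are arbitrary):

```python
import torch

# Five features of decaying importance, embedded into two dimensions,
# with a ReLU on the reconstruction and a sparsity knob on the inputs.
n_features, n_hidden = 5, 2
importance = 0.9 ** torch.arange(n_features)       # scalar loss weights
W = torch.randn(n_hidden, n_features, requires_grad=True)
b = torch.zeros(n_features, requires_grad=True)
opt = torch.optim.Adam([W, b], lr=1e-2)

sparsity = 0.9  # probability that any given feature is zero
for step in range(5000):
    x = torch.rand(1024, n_features)
    x = x * (torch.rand(1024, n_features) > sparsity)  # sparsify features
    h = x @ W.T                       # embed 5 features into 2 dimensions
    x_hat = torch.relu(h @ W + b)     # ReLU "filter" on the way back out
    loss = (importance * (x - x_hat) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```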
-
-
moodle.lynchburg.edu
-
The present generation of Southerners are not responsible for the past
We can't judge or blame people based off of their ancestors' actions. In high school, I always hated that everyone knew my older siblings because it often felt like my future was already written for me even though I had not even experienced it myself yet.
-
Haytian revolt
We briefly touched on this in Traditions/Revolutions, and I know we will learn more about it later on in the course.
-
his educational programme was un-necessarily narrow.
When I was first annotating "The Education of the Negro," I also found Washington's idea of teaching industrial education singularly focused. However, towards the end of his article he made me come around to the idea because it seemed like a good way to instill a desire in students to work for themselves instead of someone else.
-
the Free Negroes from 1830 up to war-time had striven to build industrial schools, and the American Missionary Association had from the first taught various trades; and Price and others had sought a way of honorable alliance with the best of the Southerners. But Mr. Washington first indissolubly linked these things; he put enthusiasm, unlimited energy, and perfect faith into his programme, and changed it from a by-path into a veritable Way of Life
ML: He was not the first to come up with the idea, obviously, but he put a face on it. It seems like people, myself included, have a much easier time following something if there is a person in charge of it for them to follow.
-
-
moodle.lynchburg.edu
-
Our schools teach everybody a little of almost everything, but, in my opinion, they teach very few children just what they ought to know in order to make their way successfully in life. They do not put into their hands the tools they are best fitted to use, and hence so many failures. Many a mother and sister have worked and slaved, living upon scanty food, in order to give a son and brother a ’liberal education,’ and in doing this have built up a barrier between the boy and the work he was fitted to do. Let me say to you that all honest work is honorable work. If the labor is manual, and seems common, you will have all the more chance to be thinking of other things, or of work that is higher and brings better pay, and to work out in your minds better and higher duties and responsibilities for yourselves, and for thinking of ways by which you can help others as well as yourselves, and bring them up to your own higher level.
I still see this in our school systems today, especially in certain classes where you feel like you are never going to use anything that you have learned in the real world.
-
Our schools teach everybody a little of almost everything, but, in my opinion, they teach very few children just what they ought to know in order to make their way successfully in life.
When I was in high school, my mom would always say that they don't teach us some of the most important life skills in class. She was always ranting about how we should have to take a finance class to prepare for adulthood.
-
Our schools teach everybody a little of almost everything, but, in my opinion, they teach very few children just what they ought to know in order to make their way successfully in life.
This is still accurate for schools today. For example, in middle school we had 8 classes a day, 45 minutes each, for one semester. Even though we had class every day, it was far too little time to actually learn a full subject. The teacher had to just give us a little bit of information on each topic we were supposed to cover.
-
-
moodle.lynchburg.edu
-
Uncle Bird had a small, rough farm, all woods and hills, miles from the big road; but he was full of tales
My uncles are also full of tales that they like to share with everyone whenever they have the chance.
-
willow
I named my Jeep Willow.
-
-
pyimagesearch.com
-
Now, the progression of NLP, as discussed, tells a story. We begin with tokens and then build representations of these tokens. We use these representations to find similarities between tokens and embed them in a high-dimensional space. The same embeddings are also passed into sequential models that can process sequential data. Those models are used to build context and, through an ingenious way, attend to parts of the input sentence that are useful to the output sentence in translation.
-
Data, matrix multiplications, repeated and scaled with non-linear switches. Maybe that simplifies things a lot, but even today, most architectures boil down to these principles. Even the most complex systems, ideas, and papers can be boiled down to just that:
-
- Aug 2022
-
-
Summarization of Methods for Smart Contract Vulnerabilities Detection
Great reference table for SC vulnerabilities detection.
-
-
towardsdatascience.com
-
graphs
Latest developments in deep learning on graphs.
-
- Jul 2022
-
blogs.microsoft.com
-
Z-code models to improve common language understanding tasks such as named entity recognition, text summarization, custom text classification and key phrase extraction across its Azure AI services. But this is the first time a company has publicly demonstrated that it can use this new class of Mixture of Experts models to power machine translation products.
This is what Z-code actually is and what makes it special.
-
have developed called Z-code, which offer the kind of performance and quality benefits that other large-scale language models have but can be run much more efficiently.
can do the same but much faster
-
- Jun 2022
-
direct.mit.edu
-
The dominant idea is one of attention, by which a representation at a position is computed as a weighted combination of representations from other positions. A common self-supervision objective in a transformer model is to mask out occasional words in a text. The model works out what word used to be there. It does this by calculating from each word position (including mask positions) vectors that represent a query, key, and value at that position. The query at a position is compared with the key at every position to calculate how much attention to pay to each position; based on this, a weighted average of the values at all positions is calculated. This operation is repeated many times at each level of the transformer neural net, and the resulting value is further manipulated through a fully connected neural net layer and through use of normalization layers and residual connections to produce a new vector for each word. This whole process is repeated many times, giving extra layers of depth to the transformer neural net. At the end, the representation above a mask position should capture the word that was there in the original text: for instance, committee as illustrated in Figure 1.
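A minimal single-head sketch of that query/key/value computation in NumPy (my own illustration; it omits the masking, multi-head, feed-forward, normalization, and residual parts described above):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

T, d = 6, 16                                  # positions, vector size
rng = np.random.default_rng(0)
X = rng.normal(size=(T, d))                   # one representation per position

Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv              # query, key, value per position

scores = Q @ K.T / np.sqrt(d)                 # compare each query with every key
A = softmax(scores)                           # how much attention to pay where
out = A @ V                                   # weighted average of the values
```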
-
-
e2eml.school
-
This trick of using a one-hot vector to pull out a particular row of a matrix is at the core of how transformers work.
Matrix multiplication as table lookup
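The trick in a few lines of NumPy:

```python
import numpy as np

E = np.arange(12).reshape(4, 3)   # a 4-row "table" of 3-d vectors
one_hot = np.array([0, 0, 1, 0])  # a one-hot vector selecting row 2
print(one_hot @ E)                # [6 7 8], identical to E[2]
```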
-
- May 2022
-
www.pnas.org
-
Given the complexities of the brain’s structure and the functions it performs, any one of these models is surely oversimplified and ultimately wrong—at best, an approximation of some aspects of what the brain does. However, some models are less wrong than others, and consistent trends in performance across models can reveal not just which model best fits the brain but also which properties of a model underlie its fit to the brain, thus yielding critical insights that transcend what any single model can tell us.
-
-
www.gwern.net
-
Such a highly non-linear problem would clearly benefit from the computational power of many layers. Unfortunately, back-propagation learning generally slows down by an order of magnitude every time a layer is added to a network.
The problem in 1988
-
-
colab.research.google.com
-
The source sequence will be passed to the TransformerEncoder, which will produce a new representation of it. This new representation will then be passed to the TransformerDecoder, together with the target sequence so far (target words 0 to N). The TransformerDecoder will then seek to predict the next words in the target sequence (N+1 and beyond).
-
- Apr 2022
-
-
Our pre-trained network is nearly identical to the “AlexNet” architecture (Krizhevsky et al., 2012), but with local response normalization layers after pooling layers, following (Jia et al., 2014). It was trained with the Caffe framework on the ImageNet 2012 dataset (Deng et al., 2009)
-
-
cs231n.github.io
-
Example 1. For example, suppose that the input volume has size [32x32x3], (e.g. an RGB CIFAR-10 image). If the receptive field (or the filter size) is 5x5, then each neuron in the Conv Layer will have weights to a [5x5x3] region in the input volume, for a total of 5*5*3 = 75 weights (and +1 bias parameter). Notice that the extent of the connectivity along the depth axis must be 3, since this is the depth of the input volume. Example 2. Suppose an input volume had size [16x16x20]. Then using an example receptive field size of 3x3, every neuron in the Conv Layer would now have a total of 3*3*20 = 180 connections to the input volume. Notice that, again, the connectivity is local in 2D space (e.g. 3x3), but full along the input depth (20).
These two examples are the first two layers of Andrej Karpathy's wonderful working ConvNetJS CIFAR-10 demo here
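Both counts are easy to verify in code; a small sketch in PyTorch (the 16-filter choice matches the ConvNetJS demo quoted below):

```python
import torch.nn as nn

# Example 1: 5x5 filters over a 3-channel input, with 16 filters.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=5)
per_filter = 5 * 5 * 3                       # 75 weights (+1 bias) per neuron
total = sum(p.numel() for p in conv.parameters())
print(per_filter, total)                     # 75, 16*75 + 16 = 1216
```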
-
-
cs.stanford.edu
-
input (32x32x3): max activation 0.5, min -0.5; max gradient 1.08696, min -1.53051
conv (32x32x16): filter size 5x5x3, stride 1; max activation 3.75919, min -4.48241; parameters: 16x5x5x3+16 = 1216
The dimensions of these first two layers are explained here
-
-
codelabs.developers.google.com
-
Here the lower level layers are frozen and are not trained, only the new classification head will update itself to learn from the features provided from the pre-trained chopped up model on the left.
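A minimal sketch of that pattern in Keras (the base model choice and sizes are my own, for illustration):

```python
from tensorflow import keras

# Pre-trained, "chopped up" base: convolutional layers without the old head.
base = keras.applications.MobileNetV2(input_shape=(160, 160, 3),
                                      include_top=False, pooling="avg")
base.trainable = False  # freeze the lower layers; only the new head will learn

model = keras.Sequential([
    base,
    keras.layers.Dense(2, activation="softmax"),  # new classification head
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```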
-
-
distill.pub
-
Starting from random noise, we optimize an image to activate a particular neuron (layer mixed4a, unit 11).
And then we use that image as a kind of variable name to refer to the neuron, in a way that is more helpful than the layer number and neuron index within the layer. This explanation is via one of Chris Olah's YouTube videos (https://www.youtube.com/watch?v=gXsKyZ_Y_i8).
-
- Mar 2022
-
quillette.com
-
A special quality of humans, not shared by evolution or, as yet, by machines, is our ability to recognize gaps in our understanding and to take joy in the process of filling them in. It is a beautiful thing to experience the mysterious, and powerful, too.
-
- Feb 2022
-
www.sigs-datacom.de
-
Relational machine learning methods, which by exploiting the graph structure often deliver models of better quality.
Relational machine learning approach
-
In many applications, however, it is necessary not only to provide data in high quality and semantically enriched form, but also to generate new knowledge from existing information. For this we use machine learning.
Combination with ML approaches to generate new knowledge
-
-
neuralnetworksanddeeplearning.com
-
Somewhat confusingly, and for historical reasons, such multiple layer networks are sometimes called multilayer perceptrons or MLPs, despite being made up of sigmoid neurons, not perceptrons. I'm not going to use the MLP terminology in this book, since I think it's confusing, but wanted to warn you of its existence.
-
-
docs.microsoft.com
-
Model deployment in Azure ML
-
- Dec 2021
-
cloud.google.com
-
the only thing an artificial neuron can do: classify a data point into one of two kinds by examining input values with weights and bias.
How does this relate to "weighted sum shows similarity between the weights and the inputs"?
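One way to connect the two views, as a tiny sketch (my own illustration): the weighted sum w·x is largest when x points in the same direction as w, and thresholding that sum is exactly what splits the input space into two classes with a hyperplane.

```python
import numpy as np

def neuron(x, w, b):
    # the dot product measures how well x lines up with w;
    # thresholding at 0 splits the space along the hyperplane w.x + b = 0
    return 1 if np.dot(w, x) + b > 0 else 0

w, b = np.array([1.0, 1.0]), -1.0
print(neuron(np.array([2.0, 1.0]), w, b))    # 1: same side as w
print(neuron(np.array([-1.0, -1.0]), w, b))  # 0: opposite side
```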
-
-
towardsdatascience.com
-
The transformer model introduces the idea that, instead of adding another complex mechanism (attention) to an already complex Seq2Seq model, we can simplify the solution by forgetting about everything else and just focusing on attention.
-
-
-
I’m particularly interested in two questions: First, just how weird is machine learning? Second, what sorts of choices do developers make as they shape a project?
-
- Nov 2021
-
www.cell.com
-
They use local computations to interpolate over task-relevant manifolds in a high-dimensional parameter space.
-
-
e2eml.school
-
Now that we've made peace with the concepts of projections (matrix multiplications)
Projections are matrix multiplications. Why didn't you say so? Spatial and channel projections in the gated gMLP.
-
Computers are especially good at matrix multiplications. There is an entire industry around building computer hardware specifically for fast matrix multiplications. Any computation that can be expressed as a matrix multiplication can be made shockingly efficient.
-
The selective-second-order-with-skips model is a useful way to think about what transformers do, at least in the decoder side. It captures, to a first approximation, what generative language models like OpenAI's GPT-3 are doing.
-
-
www.tensorflow.org
-
You'll use a (70%, 20%, 10%) split for the training, validation, and test sets. Note the data is not being randomly shuffled before splitting. This is for two reasons: It ensures that chopping the data into windows of consecutive samples is still possible. It ensures that the validation/test results are more realistic, being evaluated on the data collected after the model was trained.
Train, Validation, Test: 0.7, 0.2, 0.1
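In code, this matches the split described (a sketch; df is a stand-in for the real time-ordered DataFrame):

```python
import pandas as pd

# stand-in for the real time-ordered dataset
df = pd.DataFrame({"T (degC)": range(100)})

n = len(df)
train_df = df[0:int(n * 0.7)]              # first 70%
val_df = df[int(n * 0.7):int(n * 0.9)]     # next 20%
test_df = df[int(n * 0.9):]                # most recent 10%, never shuffled
```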
-
-
distill.pub
-
The following figure presents a simple functional diagram of the neural network we will use throughout the article. The neural network is a sequence of linear (both convolutional and fully-connected), max-pooling, and ReLU layers, culminating in a softmax layer. A convolution calculates weighted sums of regions in the input; in neural networks, the learnable weights in convolutional layers are referred to as the kernel (see also Convolution arithmetic). A fully-connected layer computes output neurons as weighted sums of input neurons; in matrix form, it is a matrix that linearly transforms the input vector into the output vector. ReLU, first introduced by Nair and Hinton, calculates $f(x) = \max(0, x)$ for each entry in a vector input; graphically, it is a hinge at the origin. The softmax function calculates $S(y_i) = \frac{e^{y_i}}{\sum_{j=1}^{N} e^{y_j}}$ for each entry $y_i$ in a vector input $y$.
This is a great visualization of MNIST hidden layers.
-
-
towardsdatascience.com
-
The Query word can be interpreted as the word for which we are calculating Attention. The Key and Value word is the word to which we are paying attention, i.e., how relevant that word is to the Query word.
Finally
-
-
www.lesswrong.com
-
Other work on interpreting transformer internals has focused mostly on what the attention is looking at. The logit lens focuses on what GPT "believes" after each step of processing, rather than how it updates that belief inside the step.
-
-
distill.pub
-
The cube of activations that a neural network for computer vision develops at each hidden layer. Different slices of the cube allow us to target the activations of individual neurons, spatial positions, or channels.
This is first explanation of
Tags
Annotators
URL
-
-
towardsdatascience.com
-
The attention layer (W in the diagram) computes three vectors based on the input, termed key, query, and value.
Could you be more specific?
-
Attention is a means of selectively weighting different elements in input data, so that they will have an adjusted impact on the hidden states of downstream layers.
-
-
www.pnas.org
-
These findings provide strong evidence for a classic hypothesis about the computations underlying human language understanding, that the brain’s language system is optimized for predictive processing in the service of meaning extraction
-
-
towardsdatascience.com
-
To review, the Forget gate decides what is relevant to keep from prior steps. The input gate decides what information is relevant to add from the current step. The output gate determines what the next hidden state should be. Code demo: for those of you who understand better through seeing the code, here is an example using Python pseudo code.
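The pseudo code itself is not captured in the quote, so here is a hedged reconstruction of a single LSTM step (my own sketch, not the article's exact code):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])   # forget gate
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])   # input gate
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])   # output gate
    c_cand = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])
    c_t = f * c_prev + i * c_cand    # keep what is relevant, add what is new
    h_t = o * np.tanh(c_t)           # the output gate shapes the hidden state
    return h_t, c_t

# Tiny usage example with random parameters.
rng = np.random.default_rng(0)
n_in, n_hid = 4, 8
W = {k: rng.normal(size=(n_hid, n_in)) for k in "fioc"}
U = {k: rng.normal(size=(n_hid, n_hid)) for k in "fioc"}
b = {k: np.zeros(n_hid) for k in "fioc"}
h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_step(rng.normal(size=n_in), h, c, W, U, b)
```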
-
- Oct 2021
-
colah.github.io
-
This approach, visualizing high-dimensional representations using dimensionality reduction, is an extremely broadly applicable technique for inspecting models in deep learning.
-
These layers warp and reshape the data to make it easier to classify.
-
-
cloud.google.com
-
Even with this very primitive single neuron, you can achieve 90% accuracy when recognizing a handwritten text image. To recognize all the digits from 0 to 9, you would need just ten neurons to recognize them with 92% accuracy.
And here is a Google Colab notebook that demonstrates that
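A sketch that reproduces the ten-neuron experiment with scikit-learn (multinomial logistic regression is exactly ten linear "neurons" combined with softmax; in my experience the accuracy lands near the quoted 92%, but treat the number as indicative):

```python
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression

X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X = X / 255.0  # scale pixels to [0, 1]

# One linear "neuron" per digit class, combined with softmax.
clf = LogisticRegression(max_iter=100).fit(X[:60000], y[:60000])
print(clf.score(X[60000:], y[60000:]))  # roughly 0.92
```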
-
- Sep 2021
-
-
Humans perform a version of this task when interpreting hard-to-understand speech, such as an accent which is particularly fast or slurred, or a sentence in a language we do not know very well—we do not necessarily hear every single word that is said, but we pick up on salient key words and contextualize the rest to understand the sentence.
Boy, don't they
-
-
www.ccom.ucsd.edu
-
A neural network will predict your digit in the blue square above. Your image is 784 pixels (= 28 rows by 28 columns with black=1 and white=0). Those 784 features get fed into a 3 layer neural network; Input:784 - AvgPool:196 - Dense:100 - Softmax:10.
-
-
www.isca-speech.org
-
Personalized ASR models. For each of the 432 participants with disordered speech, we create a personalized ASR model (SI-2) from their own recordings. Our fine-tuning procedure was optimized for our adaptation process, where we only have between ¼ and 2 h of data per speaker. We found that updating only the first five encoder layers (versus the complete model) worked best and successfully prevented overfitting [10]
-
-
jalammar.github.io
-
So whenever you hear of someone “training” a neural network, it just means finding the weights we use to calculate the prediction.
-
- Aug 2021
-
stats.stackexchange.com
-
I'm going to try provide an English text example. The following is based solely on my intuitive understanding of the paper 'Attention is all you need'.
This is also good
-
For the word q that your eyes see in the given sentence, what is the most related word k in the sentence to understand what q is about?
-
So basically: q = the vector representing a word K and V = your memory, thus all the words that have been generated before. Note that K and V can be the same (but don't have to). So what you do with attention is that you take your current query (word in most cases) and look in your memory for similar keys. To come up with a distribution of relevant words, the softmax function is then used.
-
-
ericssonlearning.percipio.com
-
Here is a list of some open data available online. You can find a more complete list and details of the open data available online in Appendix B.
DataHub (http://datahub.io/dataset)
World Health Organization (http://www.who.int/research/en/)
European Union Open Data Portal (http://open-data.europa.eu/en/data/)
Amazon Web Service public datasets (http://aws.amazon.com/datasets)
Facebook Graph (http://developers.facebook.com/docs/graph-api)
Healthdata.gov (http://www.healthdata.gov)
Google Trends (http://www.google.com/trends/explore)
Google Finance (https://www.google.com/finance)
Google Books Ngrams (http://storage.googleapis.com/books/ngrams/books/datasetsv2.html)
Machine Learning Repository (http://archive.ics.uci.edu/ml/)
As an idea of open data sources available online, you can look at the LOD cloud diagram (http://lod-cloud.net ), which displays the connections of the data link among several open data sources currently available on the network (see Figure 1-3).
-
-
colah.github.io
-
A neural network with a hidden layer has universality: given enough hidden units, it can approximate any function. This is a frequently quoted – and even more frequently, misunderstood and applied – theorem. It’s true, essentially, because the hidden layer can be used as a lookup table.
-
Recursive Neural Networks
-
-
-
-
mccormickml.com
-
The second-to-last layer is what Han settled on as a reasonable sweet-spot.
Pretty arbitrary choice
-
-
arxiv.org
-
We show that BigBird is a universal approximator of sequence functions and is Turing complete,
-
- Jul 2021
-
www.codemotion.com
-
hyper-parameters, i.e., parameters external to the model, such as the learning rate, the batch size, the number of epochs.
-
-
jalammar.github.io
-
In the language of Interpretable Machine Learning (IML) literature like Molnar et al.[20], input saliency is a method that explains individual predictions.
-
-
colah.github.io
-
Using multiple copies of a neuron in different places is the neural network equivalent of using functions. Because there is less to learn, the model learns more quickly and learns a better model. This technique – the technical name for it is ‘weight tying’ – is essential to the phenomenal results we’ve recently seen from deep learning.
-
-
www.baeldung.com
-
Vectors with a small Euclidean distance from one another are located in the same region of a vector space. Vectors with a high cosine similarity are located in the same general direction from the origin.
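A two-vector illustration in NumPy: b points in the same direction as a but sits farther from the origin, so the cosine similarity is 1 while the Euclidean distance is not small.

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([2.0, 4.0])   # same direction as a, farther from the origin

euclidean = np.linalg.norm(a - b)
cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(euclidean, cosine)   # distance ~2.24, cosine similarity 1.0
```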
-
-
iamtrask.github.io
-
If you're serious about neural networks, I have one recommendation. Try to rebuild this network from memory.
-
-
mlech26l.github.io
-
In our research, i.e., the wormnet project, we try to build machine learning models motivated by the C. elegans nervous system. By doing so, we have to pay a cost, as we constrain ourselves to such models in contrast to standard artificial neural networks, whose modeling space is purely constraint by memory and compute limitations. However, there are potentially some advantages and benefits we gain. Our objective is to better understand what’s necessary for effective neural information processing to emerge.
-
-
aylien.com
-
Recommendations:
- DON'T use shifted PPMI with SVD.
- DON'T use SVD "correctly", i.e. without eigenvector weighting (performance drops 15 points compared to eigenvalue weighting with p = 0.5).
- DO use PPMI and SVD with short contexts (window size of 2).
- DO use many negative samples with SGNS.
- DO always use context distribution smoothing (raise the unigram distribution to the power of α = 0.75) for all methods.
- DO use SGNS as a baseline (robust, fast and cheap to train).
- DO try adding context vectors in SGNS and GloVe.
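For reference, a minimal gensim configuration following these recommendations (sentences is a toy stand-in for a real tokenized corpus; the parameter names are gensim's, the values are the paper's):

```python
from gensim.models import Word2Vec

sentences = [["the", "cat", "sat"], ["the", "dog", "barked"]]  # toy corpus

# SGNS baseline: small window, many negative samples,
# context-distribution smoothing with exponent 0.75.
model = Word2Vec(sentences, sg=1, window=2, negative=15,
                 ns_exponent=0.75, min_count=1)
```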
-
- Jun 2021
-
towardsdatascience.com
-
2D Vectors in space. Image by Author
A good image for cosine similarity.
-
-
www.incompleteideas.net
-
One thing that should be learned from the bitter lesson is the great power of general purpose methods, of methods that continue to scale with increased computation even as the available computation becomes very great. The two methods that seem to scale arbitrarily in this way are search and learning
This is a big lesson. As a field, we still have not thoroughly learned it, as we are continuing to make the same kind of mistakes. To see this, and to effectively resist it, we have to understand the appeal of these mistakes. We have to learn the bitter lesson that building in how we think we think does not work in the long run. The bitter lesson is based on the historical observations that 1) AI researchers have often tried to build knowledge into their agents, 2) this always helps in the short term, and is personally satisfying to the researcher, but 3) in the long run it plateaus and even inhibits further progress, and 4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning. The eventual success is tinged with bitterness, and often incompletely digested, because it is success over a favored, human-centric approach.
-
-
cloud.google.com
-
"dividing n-dimensional space with a hyperplane."
-
This dataset can not be classified by a single neuron, as the two groups of data points can't be divided by a single line.
-
- Apr 2021
-
-
Machine learning app development has been gaining traction among companies from all over the world. When dealing with this part of machine learning application development, you need to remember that machine learning can recognize only the patterns it has seen before. Therefore, the data is crucial for your objectives. If you’ve ever wondered how to build a machine learning app, this article will answer your question.
-
-
towardsdatascience.com
-
Machine learning is an extension of linear regression in a few ways. Firstly is that modern ML
Machine learning is an extension of the linear model that deals with much more complicated situations, where we take several different inputs and get outputs.
-
-
www.infoq.com
-
survival prediction of colorectal cancer is formulated as a multi-class classification problem
-
- Nov 2020
-
blog.csdn.net
-
$\pi_k$ can be regarded as the weight of each mixture component $\mathcal{N}(\boldsymbol{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$.
Some books call this the "responsibility".
-
- Oct 2020
-
-
LEGO
The author compares deep learning to LEGO bricks.
-
-
-
Data Augmentation
Commonly used to increase the amount of data.
-
- May 2020
-
www.javatpoint.com
-
Machine learning has a limited scope
-
AI is a bigger concept to create intelligent machines that can simulate human thinking capability and behavior, whereas, machine learning is an application or subset of AI that allows machines to learn from data without being programmed explicitly
-
-
expertsystem.com
-
Machine learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed
-
- Apr 2020
-
keras.io
-
Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research. Use Keras if you need a deep learning library that: Allows for easy and fast prototyping (through user friendliness, modularity, and extensibility). Supports both convolutional networks and recurrent networks, as well as combinations of the two. Runs seamlessly on CPU and GPU. Read the documentation at Keras.io. Keras is compatible with: Python 2.7-3.6.
-
- Jan 2020
-
pubs.aeaweb.org
-
Suppose the algorithm chooses a tree that splits on education but not on age. Conditional on this tree, the estimated coefficients are consistent. But that does not imply that treatment effects do not also vary by age, as education may well covary with age; on other draws of the data, in fact, the same procedure could have chosen a tree that split on age instead
a caveat
-
These heterogeneous treatment effects can be used to assign treatments; Misra and Dubé (2016) illustrate this on the problem of price targeting, applying Bayesian regularized methods to a large-scale experiment where prices were randomly assigned
todo -- look into the implication for treatment assignment with heterogeneity
-
Chernozhukov, Chetverikov, Demirer, Duflo, Hansen, and Newey (2016) take care of high-dimensional controls in treatment effect estimation by solving two simultaneous prediction problems, one in the outcome and one in the treatment equation.
this seems similar to my idea of regularizing on only a subset of the variables
-
These same techniques applied here result in split-sample instrumental variables (Angrist and Krueger 1995) and “jackknife” instrumental variables
some classical solutions to IV bias are akin to ML solutions
-
Understood this way, the finite-sample biases in instrumental variables are a consequence of overfitting.
traditional 'finite sample bias of IV' is really overfitting
-
Even when we are interested in a parameter β ˆ, the tool we use to recover that parameter may contain (often implicitly) a prediction component. Take the case of linear instrumental variables understood as a two-stage procedure: first regress x = γ′z + δ on the instrument z, then regress y = β′x + ε on the fitted values x ˆ. The first stage is typically handled as an estimation step. But this is effectively a prediction task: only the predictions x ˆ enter the second stage; the coefficients in the first stage are merely a means to these fitted values.
first stage of IV -- handled as an estimation problem, but really it's a prediction problem!
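A minimal simulation of that two-stage reading (my own sketch; the data-generating process is made up just to show the point that only the fitted values x̂ enter stage two):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 1000
z = rng.normal(size=(n, 3))                  # instruments
u = rng.normal(size=n)                       # unobserved confounder
x = z @ np.array([1.0, 0.5, -0.5]) + u + rng.normal(size=n)  # endogenous regressor
y = 2.0 * x + u + rng.normal(size=n)         # true effect of x on y is 2.0

# Stage 1 is a pure prediction task: only x_hat matters downstream.
x_hat = LinearRegression().fit(z, x).predict(z)
# Stage 2: regress y on the fitted values.
beta = LinearRegression().fit(x_hat.reshape(-1, 1), y).coef_[0]
print(beta)  # close to 2.0, despite the confounding through u
```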
-