- Mar 2025
-
proceedings.mlr.press
-
Examples of mistakes where we can use attention to gain intuition into what the model saw.
Perhaps the best use of this approach is finding mistakes, or understanding why a model does badly on certain data instances.
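A minimal sketch of what this looks like in practice, assuming a Hugging Face sequence-classification model; the checkpoint name and example sentence are placeholders, and averaging the last layer's heads is just one simple way to summarize attention:

```python
# Pull attention weights out of a classifier to see which tokens a
# (possibly wrong) prediction attended to. Checkpoint/sentence are placeholders.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

inputs = tokenizer("The plot was thin but the acting saved it.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions holds one tensor per layer, shaped (batch, heads, seq, seq).
# Average the last layer's heads and read attention from the [CLS] position,
# a rough hint at which tokens drove the prediction.
last_layer = out.attentions[-1].mean(dim=1)[0]        # (seq, seq)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for tok, w in sorted(zip(tokens, last_layer[0].tolist()), key=lambda p: -p[1])[:5]:
    print(f"{tok:>12}  {w:.3f}")
```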
-
- Aug 2023
-
arxiv.org
-
Title: Delays, Detours, and Forks in the Road: Latent State Models of Training Dynamics
Authors: Michael Y. Hu, Angelica Chen, Naomi Saphra, Kyunghyun Cho
Note: This paper seems cool, using older, interpretable machine learning models (graphical models) to understand what is going on inside a deep neural network.
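A rough sketch of the core idea as I read it: fit a latent-state model over per-checkpoint training metrics so that discrete states play the role of training "phases". The three-state Gaussian HMM and the synthetic metric trajectory below are my simplifications, not the paper's exact model:

```python
# Fit a Gaussian HMM over (checkpoints, metrics) so latent states mark
# training phases. Metric values are synthetic placeholders.
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)
# Fake trajectory over 120 checkpoints: loss and gradient norm.
loss = np.concatenate([rng.normal(4.0, 0.10, 40),   # plateau ("delay")
                       rng.normal(2.0, 0.30, 40),   # rapid descent
                       rng.normal(0.5, 0.05, 40)])  # convergence
grad = np.concatenate([rng.normal(1.0, 0.10, 40),
                       rng.normal(3.0, 0.50, 40),
                       rng.normal(0.2, 0.05, 40)])
X = np.column_stack([loss, grad])                   # (checkpoints, metrics)

hmm = GaussianHMM(n_components=3, covariance_type="diag",
                  n_iter=100, random_state=0)
hmm.fit(X)
states = hmm.predict(X)   # one latent phase label per checkpoint
print(states)             # delays and detours appear as runs of a state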
-
- Feb 2023
-
clementneo.com
-
The code to reproduce our results can be found here.
-
- Jan 2023
-
ar5iv.labs.arxiv.org
-
This input embedding is the initial value of the residual stream, which all attention layers and MLPs read from and write to.
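A minimal sketch of that picture in PyTorch; the dimensions and pre-LayerNorm layout are generic stand-ins, not any specific model's implementation:

```python
# The residual stream as a running tensor: the embedding initializes it,
# and every attention layer and MLP reads it and additively writes back.
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, resid: torch.Tensor) -> torch.Tensor:
        x = self.ln1(resid)
        attn_out, _ = self.attn(x, x, x)   # read from the stream...
        resid = resid + attn_out           # ...write back by addition
        resid = resid + self.mlp(self.ln2(resid))
        return resid

embed = nn.Embedding(50257, 512)
tokens = torch.randint(0, 50257, (1, 8))
resid = embed(tokens)                      # initial value of the residual stream
for block in [Block(512, 8) for _ in range(2)]:
    resid = block(resid)                   # stream accumulates each layer's writes
```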
-
- Apr 2022
-
distill.pub
-
Starting from random noise, we optimize an image to activate a particular neuron (layer mixed4a, unit 11).
And then we use that image as a kind of variable name to refer to the neuron, in a way that is more helpful than the layer number and neuron index within the layer. This explanation is from one of Chris Olah's YouTube videos (https://www.youtube.com/watch?v=gXsKyZ_Y_i8).
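A rough sketch of that optimization loop in PyTorch; torchvision's GoogLeNet stands in for the Inception model (its inception4a module is only an analogue of mixed4a), and real feature visualization adds regularizers and transformations that are omitted here:

```python
# Activation maximization: gradient-ascend an image (initialized as noise)
# on one channel's mean activation. Layer and unit choices are illustrative.
import torch
from torchvision.models import googlenet, GoogLeNet_Weights

model = googlenet(weights=GoogLeNet_Weights.DEFAULT).eval()
for p in model.parameters():
    p.requires_grad_(False)

acts = {}
model.inception4a.register_forward_hook(
    lambda module, inp, out: acts.update(feat=out))

img = torch.randn(1, 3, 224, 224, requires_grad=True)
opt = torch.optim.Adam([img], lr=0.05)
unit = 11
for _ in range(200):
    opt.zero_grad()
    model(img)
    loss = -acts["feat"][0, unit].mean()   # maximize the unit's activation
    loss.backward()
    opt.step()
# img now crudely shows what the unit responds to; Distill's versions add
# transformation robustness and decorrelated parameterizations for clarity.
```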
-
- Jun 2020
-
psyarxiv.com
-
Moreau, David, and Kristina Wiebels. ‘Assessing Change in Intervention Research: The Benefits of Composite Outcomes’, 2 June 2020. https://doi.org/10.31234/osf.io/t9hw3.
-
- Jun 2019
-
towardsdatascience.com
-
To interpret a model, we require the following insights (see the sketch below):
  - Features in the model which are most important.
  - For any single prediction from a model, the effect of each feature in the data on that particular prediction.
  - The effect of each feature over a large number of possible predictions.
Machine learning interpretability
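A minimal sketch of those three levels with scikit-learn; a linear model is used so that per-prediction effects are exactly coefficient times (feature minus its mean), and the dataset is just an illustration, not the article's example:

```python
# Three interpretation levels on one model: global importance, the effect of
# each feature on a single prediction, and effects across many predictions.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.inspection import permutation_importance
from sklearn.linear_model import Ridge

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = Ridge().fit(X, y)

# 1. Most important features: score drop when each column is shuffled.
imp = permutation_importance(model, X, y, n_repeats=10, random_state=0)
print(dict(zip(X.columns, imp.importances_mean.round(3))))

# 2. One prediction: each feature's additive contribution for a single row.
row = X.iloc[0].values
print(dict(zip(X.columns, (model.coef_ * (row - X.mean().values)).round(2))))

# 3. Many predictions: mean absolute contribution across the dataset.
contrib = model.coef_ * (X.values - X.mean().values)
print(dict(zip(X.columns, np.abs(contrib).mean(axis=0).round(2))))
```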
-