- Mar 2025
proceedings.mlr.press
Examples of mistakes where we can use attention to gain intuition into what the model saw.
Perhaps the best use of this approach is for looking for mistakes or understanding why a model does badly on certain data instances.
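A minimal sketch of this kind of inspection, assuming the model exposes raw attention scores per token. The tokens, scores, and the `top_attended` helper are made up for illustration and do not come from the paper.

```python
import math

def softmax(scores):
    # Numerically stable softmax: turns raw scores into weights that sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_attended(tokens, scores, k=3):
    # Rank tokens by attention weight, highest first.
    weights = softmax(scores)
    ranked = sorted(zip(tokens, weights), key=lambda tw: tw[1], reverse=True)
    return ranked[:k]

# Hypothetical misclassified example with made-up attention scores.
tokens = ["the", "plot", "was", "not", "good"]
scores = [0.1, 0.4, 0.0, 0.3, 2.5]
top = top_attended(tokens, scores, k=2)
```

If a model labelled this sentence as positive, a ranking like this would suggest it attended mostly to "good" and gave little weight to "not". That is only intuition about what the model saw, not proof of why it failed.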
- Aug 2023
arxiv.org
Title: Delays, Detours, and Forks in the Road: Latent State Models of Training Dynamics
Authors: Michael Y. Hu, Angelica Chen, Naomi Saphra, Kyunghyun Cho
Note: This paper seems cool: it uses older, interpretable machine learning models (graphical models) to understand what is going on inside a deep neural network.
- Feb 2023
clementneo.com
The code to reproduce our results can be found here.
- Jan 2023
ar5iv.labs.arxiv.org
This input embedding is the initial value of the residual stream, which all attention layers and MLPs read from and write to.
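The read-and-write pattern can be sketched as below. This is a toy illustration: `toy_attention` and `toy_mlp` are hypothetical stand-ins rather than real transformer sublayers, and layer normalization is left out.

```python
# Toy residual stream: each sublayer reads the stream and adds its output back.
# The sublayers are hypothetical stand-ins, not real attention or MLP blocks.

def toy_attention(stream):
    # Stand-in for attention: every position reads the mean over all positions.
    n = len(stream)
    d = len(stream[0])
    mean = [sum(row[j] for row in stream) / n for j in range(d)]
    return [mean[:] for _ in stream]

def toy_mlp(stream):
    # Stand-in for an MLP: a position-wise ReLU.
    return [[max(0.0, v) for v in row] for row in stream]

def add(a, b):
    # Element-wise sum of two (positions x dimensions) nested lists.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def run_residual_stream(embedding, n_layers=2):
    stream = [row[:] for row in embedding]  # initial value is the input embedding
    writes = []
    for _ in range(n_layers):
        for sublayer in (toy_attention, toy_mlp):
            out = sublayer(stream)      # read from the stream
            writes.append(out)
            stream = add(stream, out)   # write back by addition
    return stream, writes

embedding = [[1.0, -2.0], [0.5, 0.0], [-1.0, 3.0]]
final_stream, writes = run_residual_stream(embedding)
```

Because every write is an addition, the final stream equals the embedding plus the sum of all sublayer outputs, which is what lets each component's contribution be examined separately.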
- Apr 2022
distill.pub
Starting from random noise, we optimize an image to activate a particular neuron (layer mixed4a, unit 11).
We then use that image as a kind of variable name to refer to the neuron, in a way that is more helpful than the layer number and the neuron's index within the layer. This explanation comes from one of Chris Olah's YouTube videos (https://www.youtube.com/watch?v=gXsKyZ_Y_i8).
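A rough sketch of the optimization idea: start from noise and run gradient ascent on the input so that one unit's activation goes up. The "neuron" here is a hypothetical fixed linear unit over four pixels, standing in for a unit such as mixed4a:11 in a trained network; real feature visualization differentiates through the whole network and usually adds regularization.

```python
import math
import random

# Hypothetical toy "neuron": a fixed linear unit over a 4-pixel image.
WEIGHTS = [0.8, -0.5, 0.3, -0.9]
BIAS = 0.1

def image_from(z):
    # Squash unconstrained parameters into pixel values between -1 and 1.
    return [math.tanh(v) for v in z]

def activation(z):
    return sum(w * p for w, p in zip(WEIGHTS, image_from(z))) + BIAS

def gradient(z):
    # d(activation)/dz_i = w_i * (1 - tanh(z_i)^2)
    return [w * (1.0 - math.tanh(v) ** 2) for w, v in zip(WEIGHTS, z)]

def visualize(steps=200, lr=0.5, seed=0):
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 0.1) for _ in WEIGHTS]  # start from random noise
    start = activation(z)
    for _ in range(steps):
        # Gradient ascent: move the image parameters to increase the activation.
        z = [v + lr * g for v, g in zip(z, gradient(z))]
    return image_from(z), start, activation(z)

image, start_act, final_act = visualize()
```

The resulting `image` is the input this toy unit responds to most strongly, which is the sense in which an optimized image can stand in as a name for the neuron.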
- Jun 2020
psyarxiv.com
Moreau, David, and Kristina Wiebels. ‘Assessing Change in Intervention Research: The Benefits of Composite Outcomes’, 2 June 2020. https://doi.org/10.31234/osf.io/t9hw3.
- Jun 2019
towardsdatascience.com
To interpret a model, we require the following insights: (1) which features in the model are most important; (2) for any single prediction from a model, the effect of each feature in the data on that particular prediction; (3) the effect of each feature over a large number of possible predictions.
Machine learning interpretability
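A small sketch of two of these insights, assuming a toy linear model. The model, the data, and the helper names are hypothetical, and the article may use different techniques: permutation importance is used here for global importance, and a reset-to-baseline difference for the effect on a single prediction.

```python
import random

# Hypothetical toy model: feature 0 matters most, feature 2 is ignored.
def model(row):
    return 3.0 * row[0] + 1.0 * row[1] + 0.0 * row[2]

def mse(rows, targets):
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(rows, targets, feature, rng):
    # Global importance: how much the error grows when one column is shuffled.
    column = [r[feature] for r in rows]
    rng.shuffle(column)
    shuffled = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(rows, column)]
    return mse(shuffled, targets) - mse(rows, targets)

def local_effects(row, baseline):
    # Per-prediction effect: change in output when one feature is reset to a baseline.
    effects = []
    for i in range(len(row)):
        reset = row[:i] + [baseline[i]] + row[i + 1:]
        effects.append(model(row) - model(reset))
    return effects

rng = random.Random(0)
rows = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
targets = [model(r) for r in rows]
baseline = [sum(r[i] for r in rows) / len(rows) for i in range(3)]

importances = [permutation_importance(rows, targets, i, rng) for i in range(3)]
effects = local_effects(rows[0], baseline)
```

Averaging `local_effects` over many rows would give a rough view of the third insight, the effect of each feature across a large number of predictions.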