Since we are using a generalized linear model for prediction, a linear mapping is enough to allow us to see how we are learning representations around the decision boundary. Since these are high dimensional vectors, we reduce them to 2D via principal component analysis
Yeah baby