  1. Dec 2024
    1. overfitting.

      (TLDR: The question I am asking in this post can be summarised as follows. Imagine you are emailing a machine learning expert (who is also an expert in deduction) about an ML experiment you have run, in which you suspect your model is overfitting. A priori, they know nothing about what you are doing. What is the minimum amount of information they need in order to be able to respond 'yes, you are overfitting'?)


      Assume she might say this because the average test error is higher than the average training error, which is typically the case for overfitting models. But we have seen in lectures that this can also be the case for underfitting models, e.g. the loss vs K (complexity) graph from lecture 5.

      We tell someone the mean train and test errors for one fitted model, and we tell them that the mean test error is higher than the mean training error. The question asks us to explain why, from this data alone, one cannot infer that the model is overfitting.
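
      One way to see why the gap alone is uninformative is the short derivation below. It is only a sketch under assumptions that the question does not state: the model $\hat{f}_S$ is obtained by minimising the average training loss over some model class $\mathcal{F}$, the training sample $S$ of size $n$ and the test points are drawn i.i.d. from the same distribution, and the minimisers exist. The symbols $\hat{R}_S$, $R$, $\ell$ and $f^\star$ are notation introduced here, not taken from the lectures.

```latex
\hat{R}_S(f) = \frac{1}{n}\sum_{i=1}^{n} \ell\bigl(f(x_i), y_i\bigr), \qquad
R(f) = \mathbb{E}_{(x,y)}\bigl[\ell(f(x), y)\bigr], \qquad
\hat{f}_S \in \arg\min_{f \in \mathcal{F}} \hat{R}_S(f).

\text{For any fixed } f \in \mathcal{F}: \quad
\hat{R}_S(\hat{f}_S) \le \hat{R}_S(f)
\;\Longrightarrow\;
\mathbb{E}_S\bigl[\hat{R}_S(\hat{f}_S)\bigr] \le \mathbb{E}_S\bigl[\hat{R}_S(f)\bigr] = R(f).

\text{Taking } f = f^\star \in \arg\min_{f \in \mathcal{F}} R(f): \quad
\mathbb{E}_S\bigl[\hat{R}_S(\hat{f}_S)\bigr] \;\le\; R(f^\star) \;\le\; \mathbb{E}_S\bigl[R(\hat{f}_S)\bigr].
```

      Nothing in this argument depends on how complex $\mathcal{F}$ is, so under these assumptions the expected training error sits at or below the expected test error even for a very simple class, such as a straight line fitted to curved data. That is consistent with the point that a positive train/test gap can appear for underfitting models too.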

      This question has highlighted some gaps in my knowledge about what overfitting actually means, and about what information we would need to give someone about the results of an experiment for them to be able to correctly deduce 'ah yes, your model is overfitting'.

      I have listed my thoughts below in terms of different 'further information' we could tell them (but I am not entirely sure about the answer, hence this post):

      The main thoughts I have are: (i) We give them more granular information about the train and test errors (rather than just the means): could we tell them the train and test loss for every point? Now they know the variation as well as the mean of the train and test loss. Can they deduce overfitting from this? (I don't think so, but I'm not sure. Maybe low variation in the train error is indicative of overfitting?)

      (ii) Do we need to tell them what model we have actually learned (i.e. the parameters we have learned), not just the errors on the training and test sets? Intuitively I think this would be sufficient: if it were me, I would want to draw the function and see 'how wiggly it is' (i.e. whether it extrapolates complex patterns that the training data doesn't show). If it is very wiggly, I would say yes, it seems your function is overfitting (but maybe I can't actually make that statement).

      (iii) Maybe telling them about just one model isn't actually enough: we have to tell them the results for a less complex model as well (i.e. models from different experiments). If you told them that the test error was lower on a less complex model, then I think they could confidently say 'yes, your model is overfitting'.
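
      To make option (iii) concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration and not part of the original question: the sine-shaped ground truth, the noise level, the sample sizes, and the polynomial degrees 1, 3 and 12 standing in for "too simple", "about right" and "too complex". It fits each degree on many fresh training sets and reports the mean train and test MSE, so that the train/test gap can be compared with the test error across complexities.

```python
import numpy as np
from numpy.polynomial import Polynomial


def true_fn(x):
    # Hypothetical ground-truth signal, chosen only for illustration.
    return np.sin(2 * np.pi * x)


def sample(n, rng, noise_sd=0.3):
    # Draw n noisy observations with inputs uniform on [0, 1].
    x = rng.uniform(0.0, 1.0, size=n)
    y = true_fn(x) + rng.normal(0.0, noise_sd, size=n)
    return x, y


def mse(model, x, y):
    # Mean squared error of a fitted polynomial on the points (x, y).
    return float(np.mean((model(x) - y) ** 2))


def mean_errors(degrees=(1, 3, 12), n_train=30, n_test=2000,
                n_trials=500, seed=0):
    # Average train/test MSE over many independent experiments.
    rng = np.random.default_rng(seed)
    train = {d: [] for d in degrees}
    test = {d: [] for d in degrees}
    for _ in range(n_trials):
        x_tr, y_tr = sample(n_train, rng)
        x_te, y_te = sample(n_test, rng)
        for d in degrees:
            # Least-squares polynomial fit of degree d on the training set.
            model = Polynomial.fit(x_tr, y_tr, deg=d)
            train[d].append(mse(model, x_tr, y_tr))
            test[d].append(mse(model, x_te, y_te))
    return {d: (float(np.mean(train[d])), float(np.mean(test[d])))
            for d in degrees}


results = mean_errors()
for d, (tr, te) in results.items():
    print(f"degree {d:2d}: mean train MSE = {tr:.3f}, mean test MSE = {te:.3f}")
```

      Under these assumptions one would expect the mean test error to exceed the mean train error for all three degrees, including the underfitting straight line, while the test error is lowest at the intermediate degree. In this toy setting the gap on its own does not separate the underfitting model from the overfitting one, whereas the comparison across complexities does, which is in line with the intuition in (iii).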

      I am a bit confused here, so any help would be fantastic.