20 Matching Annotations
  1. Feb 2025
    1. Minor leagues performance. Right now, the model is limited to MLB performance. Adding minor leagues performance is a good indicator of a player’s future performance in the MLB, so spending time on including minor leagues performance would also improve the model. Level adjustment. As we add minor leagues performance, we probably want to condition on strength of competition (batters get worse in the minors, but pitchers do too).

      These are definitely very important - and also very tricky to do right; we'll need to put a lot of thought into how to handle these!

    2. Contact rate

      For both contact rate and chase rate - I understand this, but again, let's please not explore incorporating fixed effects at any point during this exercise until we feel phenomenal about everything else.

    3. Explore a Heckman selection model to account for survivor bias

      I think doing something LIKE the Heckman selection model would be good, but actually think it should be a little different than that model. We can talk a little more about this offline, but doing something in-model to account for survivor bias would be fantastic.
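
      To make the offline conversation concrete, here's a rough sketch of the classic two-stage Heckman correction on synthetic data (linear outcome for simplicity - our binomial K% case would need the adapted, in-model version, which matches the "a little different" point; all names and numbers here are made up):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(42)
n = 5000

# Synthetic data. z is the "exclusion" variable: it affects whether a player
# survives into the sample but not the outcome itself.
x = rng.normal(size=n)
z = rng.normal(size=n)
u = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=n)

selected = (0.2 + 1.0 * z + 1.0 * x + u[:, 0]) > 0   # survival indicator
y = 1.0 + 2.0 * x + u[:, 1]                          # outcome (true slope = 2)

# Stage 1: probit model for P(selected | z, x).
def probit_nll(b):
    idx = b[0] + b[1] * z + b[2] * x
    return -np.sum(np.where(selected, norm.logcdf(idx), norm.logcdf(-idx)))

g = minimize(probit_nll, np.zeros(3)).x

# Stage 2: OLS on survivors only, augmented with the inverse Mills ratio,
# which soaks up the selection-induced correlation in the errors.
idx = g[0] + g[1] * z[selected] + g[2] * x[selected]
mills = norm.pdf(idx) / norm.cdf(idx)
X = np.column_stack([np.ones(mills.size), x[selected], mills])
beta = np.linalg.lstsq(X, y[selected], rcond=None)[0]

naive = np.linalg.lstsq(X[:, :2], y[selected], rcond=None)[0]
print(f"naive slope {naive[1]:.2f}, corrected slope {beta[1]:.2f}")  # corrected should be near 2
```

      The key ingredient is an exclusion variable that predicts survival but not the outcome - finding our equivalent of z is the hard part.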

    4. Reduce the RMSE by 52% compared to the naive approach of just using the previous year’s strikeout rates; reduce the log loss by 26%; are roughly equivalent to each other in terms of predictive accuracy

      Assuming we're good to go with the sample size stuff - this is great!!
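
      For my own sanity, the comparison I'm picturing is something like this (toy numbers and assumed variable meanings - just to pin down how the reduction percentages are computed):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Placeholder data: latent 2024 K rates, a stand-in for last year's observed
# rates (the naive baseline), and a stand-in for the model's predictions.
true_rate = rng.beta(8, 25, size=n)
naive = np.clip(true_rate + rng.normal(0, 0.06, n), 0.01, 0.6)
model = np.clip(true_rate + rng.normal(0, 0.03, n), 0.01, 0.6)

pa = 400
so = rng.binomial(pa, true_rate)        # observed 2024 strikeouts

def rmse(pred):
    return np.sqrt(np.mean((pred - so / pa) ** 2))

def log_loss(pred):
    # Average per-PA binomial log loss of the predicted K probability.
    return -np.mean(so * np.log(pred) + (pa - so) * np.log(1 - pred)) / pa

print(f"RMSE reduction:     {1 - rmse(model) / rmse(naive):.0%}")
print(f"log-loss reduction: {1 - log_loss(model) / log_loss(naive):.0%}")
```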

    5. the simple model is on par with the aging model, but both models are much better than the naive model.

      Great! One big question though.

      Who are you including in this evaluation? Anyone who had at least __ PA in 2023 AND 2024, or just anyone who had at least 1 PA in each?

      If we aren't doing it already, can we re-calculate comparing players who had at least 50 PA in both 2023 and 2024?
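
      Something like this is what I have in mind for the filter (column names are guesses, toy data):

```python
import pandas as pd

# Toy stand-in for the projection data; real column names may differ.
df = pd.DataFrame({
    "player_id": [1, 1, 2, 2, 3, 3, 4],
    "season":    [2023, 2024, 2023, 2024, 2023, 2024, 2024],
    "pa":        [600, 550, 30, 40, 200, 10, 500],
})

# Keep only players with at least 50 PA in BOTH 2023 and 2024.
wide = df.pivot(index="player_id", columns="season", values="pa")
keep = wide.index[(wide.get(2023, 0) >= 50) & (wide.get(2024, 0) >= 50)]
eval_df = df[df["player_id"].isin(keep)]
print(keep.tolist())  # only player 1 clears 50 PA in both seasons
```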

    6. the grey triangle and associated error bar – the grey horizontal line – are far from the vertical dashed line, which represents a difference of 0 between the two models

      Can we get something to explain this before we see the plots? I'm also still not 100% sure I understand this, so we can talk about this in person!

    7. Because the model is able to compensate for the lack of flexibility by increasing the observational noise in the likelihood (light blue HDIs).

      Agreed - however, I don't think that the model's predictions are "good" necessarily. I think the model is correctly identifying that there is a TON of uncertainty around these estimates. We want a model that (1) produces better point estimate predictions with (2) smaller posterior variances.

    8. we see that the blue lines and HDIs are too stiff and don’t follow the data well. This is because the latent rate is not flexible enough: the model is lacking informative covariates (e.g contact rate, chase rate, etc.), and because splines are struggling,

      I agree that this is struggling: the blue lines and HDIs are too stiff. I also agree that this is probably because the latent rate is not flexible enough and because the splines are struggling.

      I actually disagree that this is because the model is lacking informative covariates. That would certainly improve predictive power of the model! However, I think this is the least important of the three things that you've diagnosed in this setting. We can (and should be able to) build a perfectly fine projection system WITHOUT fixed effects like chase rate.

      Long term, when we're projecting different outputs, we will definitely want to include fixed effects (i.e. FB velo when projecting FB pitch grade). However, my strong prior is that these are the last improvements we should incorporate because they are (1) the trickiest to do well, and (2) move the needle primarily at the edges.

    9. calibration looks good!

      This doesn't feel like it's horrendous, but it doesn't really feel great either.

      Would be interested in any methods you have in mind for calibrating Bayesian models, beyond something like the typical point estimate calibrations applied to all posterior samples.
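
      One generic option I'd consider is a PIT/coverage check on the posterior predictive rather than on point estimates - roughly this (toy data, not our model):

```python
import numpy as np

rng = np.random.default_rng(1)
n_players, n_draws = 300, 2000

# Toy posterior-predictive draws of K% for each player, plus "observed" rates
# generated from the same distribution, so the model is calibrated by construction.
true_rate = rng.beta(8, 25, size=n_players)
draws = rng.normal(true_rate, 0.03, size=(n_draws, n_players))
observed = rng.normal(true_rate, 0.03, size=n_players)

# PIT: where does each observation fall within its own predictive distribution?
pit = (draws < observed).mean(axis=0)

# For a calibrated model the PIT values are ~Uniform(0, 1):
# each decile should contain roughly 10% of players.
counts, _ = np.histogram(pit, bins=10, range=(0, 1))
print(counts / n_players)
```

      If the PIT histogram is U-shaped, the model is overconfident (intervals too narrow); a hump in the middle means it is underconfident.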

    10. random samples of players:

      Some of the posterior predictive intervals are just crazy wide for some batters. Is this because their sample sizes are so low? Would probably make sense to only show posterior predictive samples for guys who have some reasonable number of PA (maybe at least 50).

    11. This calibration looks very good: no obvious under- or over-fitting, nor clear L-shaped patterns.

      I don't know if I agree with this; it does look like there are some calibration issues. I also imagine the plot would look worse if the x and y axes had the same scales.

      One question - is this run for ALL batter/seasons in the training set? Would be interested in what this looks like if we restrict the population to player/seasons with some sample size threshold (maybe \(PA > 50\)). Don't know if that's the right way to evaluate the model, but just something I'm curious about. My prior is that it would make the calibration look even worse, since the model will be more confident about those players' true talent, and their larger sample sizes reduce the expected noise.

      Regardless - we need to think more about what this is telling us. In my mind, it's saying that the model is overconfident. It's estimating true talent too close to the observed values in some cases (too much coverage of low probabilities), and that's likely what's hurting the top end as well (not enough coverage of high probabilities).

    12. the strikeout rate that would be observed if we had infinite plate appearances.

      This is an interesting interpretation. I usually think of it as "the batter's true underlying strikeout rate ability". Not sure if you think that's less appropriate phrasing - interested in your thoughts

    13. The behavior after ~30 years shouldn’t be taken too seriously, as it’s based on a very small sample of players (hence the wide HDI), and is a good illustration of the survivor bias (only the best batters survive to that age). Recall that the model is not adjusting for this bias in any way for now.

      Completely agree that we need to figure out how to adjust for survivor bias.

      I don't agree with the statement:

      "The behavior after ~30 years shouldn't be taken too seriously"

      I get the sentiment - don't take the spline estimates too seriously - but this is where we need to be as accurate as possible. We want to be able to consider long-term contract valuation, and we can't do that without having good projections for batters into their late 30s/early 40s.

    14. With our inferred value of 0.38, about 94% of players will have rates between 14% and 40%, which aligns well with domain knowledge: elite contact hitters like Luis Arraez have ~12% strikeout rates while high-strikeout power hitters like Joey Gallo have rates around 35-40%. Our inferred sigma_batters therefore tells us that learning about all the batters (especially those with lots of plate appearances) is helpful to project other batters’ strikeout rates.

      Really enjoyed this explanation! Thank you for diving into this! Very intuitive.
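
      Quick sanity check of the quoted interval (the logit-scale mean here is my back-solved guess from the 14%-40% endpoints, not a fitted value):

```python
from scipy.special import expit, logit
from scipy.stats import norm

sigma = 0.38
# Assumed population mean on the logit scale, back-solved from 14% and 40%.
mu = 0.5 * (logit(0.14) + logit(0.40))

z = norm.ppf(0.97)  # central 94% interval is mu +/- z * sigma
lo, hi = expit(mu - z * sigma), expit(mu + z * sigma)
print(f"{lo:.0%} - {hi:.0%}")  # recovers roughly 14% - 40%
```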

    15. The rank plots show that the chains have converged well (we’re looking for the chains to be well mixed, and the rank plots to be uniform):

      Could we also see some traceplots here?

      I'm not super familiar with rank plots, and the scales here are hard to visualize well. For example, Chain 2 for sigma_matchup actually has some spots that don't look uniform, but I don't have good intuition for what a bad plot looks like.
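
      For reference, the rank statistic itself is easy to compute by hand, which is how I built intuition for it (and ArviZ's az.plot_trace(idata) would cover the traceplot request). A toy numpy sketch with well-mixed chains:

```python
import numpy as np

rng = np.random.default_rng(2)
n_chains, n_draws = 4, 1000

# Toy posterior draws; well-mixed chains all target the same distribution.
draws = rng.normal(size=(n_chains, n_draws))

# Rank every draw against the POOLED sample, then histogram the ranks per chain.
ranks = draws.flatten().argsort().argsort().reshape(n_chains, n_draws)
for c in range(n_chains):
    counts, _ = np.histogram(ranks[c], bins=20, range=(0, n_chains * n_draws))
    print(f"chain {c}: {counts}")  # each bin near n_draws / 20 = 50 if well mixed
```

      A stuck or poorly-mixed chain piles its ranks into a few bins instead of spreading roughly evenly; some wiggle is expected just from binomial noise in the bin counts, which may be all that Chain 2 is showing.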

    16. Interestingly though, the handicap is higher for lefties than righties: lefty batters are more likely to strikeout when facing a lefty pitcher than righty batters are likely to strikeout when facing a righty pitcher. Similarly, opposite-hand matchups tend to work out better for righty batters than lefty batters.

      Agreed that this is interesting - things would probably change a little bit if we could adjust for quality of pitcher.

      Either way though, another interesting thing coming out of here is the difference in platoon effects by batter handedness: the LL/LR gap is bigger than the RR/RL gap.

    17. Mathematically, the model is:

      I can't quite tell from the formula here. Are you essentially saying, in Wilkinson notation, the following:

      SO ~ (1 | batter) + (1 | matchup)

      or

      SO ~ matchup + (1 | batter)

      Additionally, is there a global intercept in the model? Don't think it's needed but just wanted to confirm.
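
      For concreteness, the two parameterizations I'm distinguishing, in my own notation (not necessarily theirs):

```latex
% (1 | batter) + (1 | matchup): both terms partially pooled
\operatorname{logit}(p_{ij}) = \mu + \alpha_{b[i]} + \gamma_{m[j]}, \qquad
\alpha_b \sim \mathcal{N}(0, \sigma_{\text{batters}}), \quad
\gamma_m \sim \mathcal{N}(0, \sigma_{\text{matchup}})

% matchup + (1 | batter): matchup as an unpooled (fixed) effect
\operatorname{logit}(p_{ij}) = \mu + \beta_{m[j]} + \alpha_{b[i]}, \qquad
\alpha_b \sim \mathcal{N}(0, \sigma_{\text{batters}}), \quad
\beta_m \ \text{with a flat or weak prior}
```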
