12 Matching Annotations
  1. Nov 2021
    1. (I don’t know what ‘kriging’ means; does anyone else understand it?)

      so my understanding is that it's a general term for when you model both the mean (w/ something more than just an intercept) and covariance terms of a multivariate normal in a GP setting -- it's usually used in spatial autocorrelation models, but I've heard it used for temporal autocorrelation settings too. AFAIK, though, complicated covariance functions and complicated mean functions are non-identifiable, so you have to pick one or the other (some might even say that trying to put a trend on the mean is non-identifiable with the covariance function -- which it sort of is, in that a GP can pick up on a trend in-sample no problem -- but I think if there is a trend, identifying it can help tighten up variance, so it's worth trying to include + it's helpful for interpretability).
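      Here's a tiny base-R sketch of what "modelling both pieces" looks like -- made-up numbers and an arbitrary squared-exponential kernel, purely for intuition: the mean carries the trend, the covariance carries the wiggle.

      ```r
      # One draw from a GP with a linear-trend mean and a squared-exponential covariance
      # (toy 'kriging-flavored' setup; slope and lengthscale are arbitrary).
      set.seed(1)
      x  <- seq(0, 10, length.out = 100)
      mu <- 1 + 0.5 * x                                                # mean function: intercept + trend
      K  <- outer(x, x, function(a, b) exp(-(a - b)^2 / (2 * 1.5^2)))  # SE kernel, lengthscale 1.5
      K  <- K + diag(1e-6, length(x))                                  # jitter for numerical stability
      f  <- mu + t(chol(K)) %*% rnorm(length(x))                       # one draw from N(mu, K)
      plot(x, f, type = "l", ylab = "f(x)")
      lines(x, mu, lty = 2)                                            # dashed line = the trend in the mean
      ```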

    2. Can you explain more? And does this allow for flexible autocorrelation? Do we need to ‘test if it’s a random walk’

      yep, there are lots of possible autocorrelation functions, and you can additively or multiplicatively compose them to represent multiple simultaneous dynamics, e.g. see https://www.cs.toronto.edu/~duvenaud/cookbook/ (and the little sketch at the end of this reply)

      here's a nice little graphical primer on GPs that I like: https://distill.pub/2019/visual-exploration-gaussian-processes/ or here for more squiggly animations: http://www.infinitecuriosity.org/vizgp/

      I don't think there's much formal testing, per se, but you can do model comparison with or without different components as 'tests' for their inclusion (and maybe in an initial exploratory data viz setting just plot the output of acf() in R a la https://www.datacamp.com/community/tutorials/autocorrelation-r)
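      To make the composing-kernels point concrete, here's a small sketch (base R, arbitrary kernel choices of mine, not anything verbatim from the cookbook): a linear kernel added to a squared-exponential-times-periodic kernel gives "trend + locally periodic wiggles".

      ```r
      # Composing covariance functions: sums and elementwise products of kernels are still kernels.
      se  <- function(a, b) exp(-(a - b)^2 / (2 * 1^2))            # squared-exponential, lengthscale 1
      per <- function(a, b) exp(-2 * sin(pi * abs(a - b) / 1)^2)   # periodic kernel, period 1
      lin <- function(a, b) a * b                                  # linear kernel

      x <- seq(0, 6, length.out = 150)
      K <- 0.1 * outer(x, x, lin) +              # slowly growing trend component
           outer(x, x, se) * outer(x, x, per)    # locally periodic component (product of kernels)
      K <- K + diag(1e-6, length(x))             # jitter for numerical stability
      set.seed(2)
      f <- t(chol(K)) %*% rnorm(length(x))       # one prior draw from the composite GP
      plot(x, f, type = "l")
      ```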

    3. A shock or permanent shift after the WSJ feature need not have been caused by it. The question is ‘how unusual is such a shock, in the context of a long time series of shocks’? And how long is this series, anyways?)

      yup, this would be folded into the dispersion term + inherent variability of the Poisson distribution, in this framework. It might be that a slight jostle is perfectly consistent with the variance you already see in the Poisson (or any extra variance implied by overdispersion)
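      Quick toy illustration of what I mean (the numbers are invented, not the actual series): simulate a no-shock Poisson series and see how often ordinary noise alone produces week-to-week jumps as big as some hypothetical post-feature jump.

      ```r
      # Is a 'shock' distinguishable from ordinary Poisson noise? Toy check with made-up numbers.
      set.seed(3)
      weeks <- rpois(104, lambda = 20)          # two years of weekly counts, no shock at all
      observed_jump <- 8                        # hypothetical jump seen right after the feature
      mean(abs(diff(weeks)) >= observed_jump)   # share of ordinary week-to-week changes at least that big
      ```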

    4. : Poisson because it’s ‘arrival of events’. Why a ‘log link’?)

      eh, more Poisson because it's count data, i.e. non-negative integer data with no (or large / unknown) upper bound, and that's the go-to for that haha. And the log link also because it's the canonical link function for the rate param (to keep everything positive, since rates are strictly positive). But it is good to think about the sorts of processes that can give rise to different distributions!
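      In R that's just the default Poisson GLM; a minimal sketch with simulated data (the variable names here are mine, not DR's):

      ```r
      # Poisson regression with the canonical log link: log(E[count]) = b0 + b1 * post_feature.
      set.seed(4)
      post_feature <- rep(0:1, each = 52)                        # indicator for after the WSJ piece
      count <- rpois(104, lambda = exp(3 + 0.1 * post_feature))  # simulated weekly counts
      fit <- glm(count ~ post_feature, family = poisson(link = "log"))
      summary(fit)      # coefficients are on the log-rate scale
      exp(coef(fit))    # exponentiate to get multiplicative rate ratios
      ```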

  2. Aug 2021
    1. But the major question I have is sort of ‘what approach will work to meaningfully assess the evidence for and against the null hypothesis.’ “Against” – this is what the standard p-value NHST presents. “For the null”… seems like it might work in a setup involving a prior putting positive probability mass on a point, but I’m not quite there yet.

      in a Bayesian framework you're less focused on rejecting a null model by seeing if some test statistic falls in the tails of its sampling distribution under the null, and more on comparing different models you've specified, one of which you're free to call a "null" model. So in the end you can say "the alternative model is favored by the data with probability > 0.999" or something, or else just look at the relevant parameter in the alternative model and say that 99.999 percent of its posterior mass falls above 0
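      Mechanically, that last kind of statement is just a proportion of posterior draws; here's a sketch with fake draws standing in for real MCMC output:

      ```r
      # Summarizing evidence from posterior draws of the effect in the 'alternative' model.
      set.seed(5)
      beta_draws <- rnorm(4000, mean = 0.3, sd = 0.1)   # placeholder for real posterior samples
      mean(beta_draws > 0)                              # posterior Pr(effect > 0)
      quantile(beta_draws, c(0.025, 0.975))             # a 95% credible interval, for reporting
      ```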

    2. DR: By ‘conventional likelihood’ are you referring to ‘the likelihood of the data given a specific parameter’ … the thing that maximum likelihood procedures will express as a function, and then (take the log and) try to find the highest value of?

      yup! aka the likelihood term in the numerator of Bayes' theorem, \(Pr(X|\theta)\)
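      A tiny concrete version of "express the log-likelihood as a function of \(\theta\) and maximize it", for Poisson data (toy numbers of mine):

      ```r
      # The log-likelihood as a function of the rate parameter, and its maximizer (the MLE).
      set.seed(6)
      y <- rpois(50, lambda = 7)
      loglik <- function(lambda) sum(dpois(y, lambda, log = TRUE))     # log Pr(X | theta)
      optimize(loglik, interval = c(0.01, 50), maximum = TRUE)$maximum
      mean(y)   # for the Poisson, the MLE is just the sample mean -- should match
      ```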

    3. But if we were to use a ‘point mass prior only’ (e.g., we put \(P=1\) on \(B=0\)) for both H0 and HA, and H0 were that \(B=0\) and HA that \(B \neq 0\), this would not make sense. The prior must be something such as \(Pr(B=0)=1/2\), \(Pr(B=x \neq 0) \sim N(0,1)/2\). Is that a reasonable way to frame it?

      yeah, you could have a mixture ('spike-and-slab') prior like this, but fitting it would be obnoxious because you'd need a jump move to move between the two components. Might be easier to just equivalently fit the two separate models and update discrete uniform model priors, though then you run into issues with marginal likelihood estimation, which can also be tricky!
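      Just to picture the mixture prior itself (the 50/50 weights and the N(0, 1) slab are arbitrary here), draws from a spike-and-slab prior look like this:

      ```r
      # Draws from a spike-and-slab prior: Pr(B = 0) = 1/2, otherwise B ~ N(0, 1).
      set.seed(7)
      n <- 10000
      spike <- rbinom(n, 1, 0.5)             # 1 = 'spike' component (exactly zero)
      B <- ifelse(spike == 1, 0, rnorm(n))   # 0 with prob 1/2, else the 'slab'
      mean(B == 0)                           # ~0.5 of the prior mass sits exactly at zero
      hist(B[B != 0], breaks = 50)           # the slab part
      ```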

    4. DR: Can we replace ‘models’ in the statement above with ‘range of parameters given weight in the null and alternative hypotheses?’

      yep, in the sense of having different priors. A model with a point mass prior on 0 isn't much different from a model with e.g. a laplace(0, 10000000) prior or whatever

    5. DR: OK, but how would these be different under H0 and HA… these both ‘use the same prior’ I presume. Is my take above approximately correct? (Can you correct it if not?)

      you'd have different priors / models that you're comparing, here

    6. So with my second interpretation above we have “Marginal likelihood under H0” + “Marginal likelihood under HA”: \(L_0 + L_A = 1\)

      nah, there's no guarantee like that here -- the marginal likelihood can be anything in (0, inf), and the marginal log-likelihood anything in (-inf, inf). Also you can have as many models as you want here, and if working in the BF world just take the model with the best marginal likelihood and compare it to the second- and third-best, etc.
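      Here's the arithmetic with some made-up log marginal likelihoods, to show what you do get to normalize (posterior model probabilities, given prior model weights) vs. what you don't (the marginal likelihoods themselves):

      ```r
      # From (log) marginal likelihoods to Bayes factors and posterior model probabilities.
      log_ml <- c(H0 = -105.2, HA = -102.9, HB = -110.4)   # hypothetical values for three models
      bf_A0  <- exp(log_ml["HA"] - log_ml["H0"])           # Bayes factor of HA vs H0
      # With equal prior probability on each model, posterior model probabilities are:
      post_prob <- exp(log_ml - max(log_ml)) / sum(exp(log_ml - max(log_ml)))
      bf_A0
      post_prob   # these sum to 1; the raw marginal likelihoods themselves do not
      ```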

    7. perhaps this works if we use a prior that puts some strictly positive probability on the exact effect \(B=0\) (say, \(Pr(B=0)=1/2\)) as well as a non-degenerate distribution of probabilities over the other effect sizes (say, normal/2)

      you could do this, but it would seem more straightforward to compare a model with \(Pr(B=0) = 1\) vs. \(Pr(B \neq 0) = 1\), with the former dropping the parameter from the model specification, and the latter just putting any continuous prior on B, rather than a point mass on some other value
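      One cheap way to run that comparison without full marginal-likelihood machinery -- my shortcut here, not something from the post -- is the BIC approximation to the Bayes factor: fit the model with and without B and compare.

      ```r
      # Crude BF via BIC: exp((BIC_null - BIC_alt) / 2) approximates the Bayes factor for the model with B.
      set.seed(8)
      x <- rnorm(200)
      y <- rpois(200, lambda = exp(1 + 0.3 * x))
      fit_null <- glm(y ~ 1, family = poisson)   # drops B entirely, i.e. Pr(B = 0) = 1
      fit_alt  <- glm(y ~ x, family = poisson)   # keeps B in the model
      exp((BIC(fit_null) - BIC(fit_alt)) / 2)    # rough Bayes factor favoring the model with B
      ```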

    8. One way of operationalizing the null-hypothesis is by setting a null region, such that an effect that falls within this interval would be practically equivalent to the null (Kruschke, 2010). In our case, that means defining a range of effects we would consider equal to the drug having no effect at all.

      yeah, you can set a few different priors and then use BFs etc. to compare them -- one that's strongly informative around 0 (or truncated to this "null region"), one that's more weakly informative, and one with a point mass at 0. Alternatively, just use your weakly informative prior and ask how much of the posterior mass falls outside some bounds around 0, or is to one side of 0, etc. (see the little sketch at the end of this reply).

      Should also note that since a model here includes its prior, model comparison can sorta be thought of as prior comparison. But you're in pretty grave danger of overfitting if you use BFs to just search through different priors, because the best possible marginal likelihood is the one you get when the prior is a point mass on the MLE.
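      The null-region / one-sided summaries mentioned above are again just counts over posterior draws (toy draws and an arbitrary "practically zero" interval below):

      ```r
      # ROPE-style summary: share of posterior mass inside a 'practically equivalent to zero' interval.
      set.seed(9)
      beta_draws <- rnorm(4000, mean = 0.15, sd = 0.08)   # stand-in for real posterior samples
      rope <- c(-0.1, 0.1)                                # what we'd treat as 'no effect', chosen in context
      mean(beta_draws > rope[1] & beta_draws < rope[2])   # posterior Pr(effect is practically zero)
      mean(beta_draws > 0)                                # posterior Pr(effect is positive)
      ```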