  1. Apr 2024
    1. The expectation over a sample is interpreted as a sample estimate of the mean, $m_V := E[V \mid S] = \int_S V \, d\Pr_V$, where the sample variance is $s_V := E[V^2 \mid S] - E[V \mid S]^2$. Sample estimates of parameters will always be denoted as English letters. The covariance between two random variables X and Y is the expected value of the product of the differences from the mean for each respective variable and can be expressed as $\sigma_{XY} := E[XY] - E[X]E[Y]$. The sample covariance can similarly be defined using conditional expectations,

      I don't think this is quite correct as written. These are sample quantities, so the integrals are going to be with respect to the "empirical" measure, not the "population" measure on V.
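      To make the point concrete, here is a minimal Python sketch, assuming a finite sample stored as a list: each sample quantity is an integral against the empirical measure (mass 1/n on each observation) rather than against the population measure Pr_V. The helper names are hypothetical, and the variance uses divisor n so that it matches E[V^2 | S] - E[V | S]^2 rather than the unbiased n - 1 version.

```python
from fractions import Fraction

def empirical_measure(sample):
    """Empirical measure P_n as {value: mass}; each observation adds mass 1/n (ties accumulate)."""
    n = len(sample)
    weights = {}
    for v in sample:
        weights[v] = weights.get(v, Fraction(0)) + Fraction(1, n)
    return weights

def integrate(g, measure):
    """Integral of g against a discrete measure given as {point: mass}."""
    return sum(mass * g(point) for point, mass in measure.items())

def sample_mean(sample):
    # m_V = E[V | S]: integrate v against the empirical measure, not Pr_V
    return integrate(lambda v: v, empirical_measure(sample))

def sample_variance(sample):
    # s_V = E[V^2 | S] - E[V | S]^2 (divisor n)
    p_n = empirical_measure(sample)
    return integrate(lambda v: v * v, p_n) - integrate(lambda v: v, p_n) ** 2

def sample_covariance(xs, ys):
    # Empirical measure on the observed pairs (x_i, y_i), each with mass 1/n
    p_n = empirical_measure(list(zip(xs, ys)))
    exy = integrate(lambda pair: pair[0] * pair[1], p_n)
    ex = integrate(lambda pair: pair[0], p_n)
    ey = integrate(lambda pair: pair[1], p_n)
    return exy - ex * ey
```

      With integer data the results are exact fractions; for example, `sample_variance([1, 2, 3, 4])` gives `Fraction(5, 4)`.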

    2. ed similarly, $\Pr_{XY}(X, Y \in B) = \iint_{B \subset \mathbb{R}^2} f_{XY}(x, y) \, dx \, dy$.

      Same problem here. B is not an arbitrary Borel set of R^2. It's an element of the product sigma-algebra generated by the power set of {0,1} and the Borel sets of R.

      Actually, I guess you can get away with just taking a Borel set in R^2, but that might be confusing for people since most of them are "silly" in this context. I don't know, something to think about.

      This is actually what I did with RVVMs. Since every subset of {0,1} is a Borel set on R (the power set on {0,1} is exactly the trace of the Borel sigma-algebra on {0,1}), you can still define a Bernoulli measure on the Borel sets on R. It's just that it's overkill. But the convenience is that you can then always talk about Borel sets and not have to switch sigma-algebras. You let the distributions or random variables inform which Borel sets are actually meaningful. Pros and cons to each approach here.
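      One way to see this convenience is a minimal Python sketch, assuming Borel sets are represented by their indicator functions (a hypothetical encoding, since Borel sets cannot be enumerated). The Bernoulli(p) measure is then defined on every Borel set of R, and Pr(B) depends on B only through B ∩ {0, 1}.

```python
from typing import Callable

# Hypothetical encoding: a Borel set B of R is given by its indicator x -> (x in B).
BorelSet = Callable[[float], bool]

def bernoulli_measure(p: float) -> Callable[[BorelSet], float]:
    """Bernoulli(p) measure defined on all Borel sets of R.

    Only the points 0 and 1 carry mass, so Pr(B) depends on B only through
    B ∩ {0, 1}; every other feature of B is irrelevant.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")

    def measure(indicator: BorelSet) -> float:
        # Mass 1 - p at 0 and mass p at 1; bools act as 0/1 in the products.
        return (1.0 - p) * indicator(0.0) + p * indicator(1.0)

    return measure
```

      For example, with `pr = bernoulli_measure(0.3)`, the sets {x : x > 0.5} and {1} both get probability 0.3, while the interval (0.2, 0.8) gets 0, which is the sense in which most Borel sets are "silly" here.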

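    3. $f_{GY}$ that satisfies, $\Pr_{XY}(X, Y \in \mathbb{R}^2) = \iint_{\mathbb{R}^2} f_{XY}(x, y) \, dx \, dy = 1$. $\Pr_{GY}(G, Y \in \mathbb{R}^2) = \iint_{\mathbb{R}^2} f_{GY}(g, y) \, dg \, dy = 1$. The probability of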

      This second equation isn't correct. You have written double Lebesgue/Riemann integrals, but G is not a continuous random variable with a PDF. You can write Pr_{GY} as a sum over G \in {0,1} of an integral over Y in R (and note that the integrand is not a PDF). Or, since you move to distributional integration notation in the next subsection, this may be a good time to properly introduce it. You can just write Pr_{GY} as an integral against dPr_{GY} with integrand 1. You can then define this as the "sum-integral", i.e., a double integral with respect to a measure on the product space {0,1} × R that has neither a PDF nor a PMF.
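      A minimal numerical sketch of the "sum-integral", assuming (for illustration only) G ~ Bernoulli(p) and Y | G = g ~ Normal(mu_g, sigma^2). The parameter values and function names are hypothetical. The probability of a set is computed as a sum over g of an ordinary integral over y, and the integrand f_GY is neither a PDF on R^2 nor a PMF.

```python
import math

def normal_pdf(y: float, mu: float, sigma: float) -> float:
    """Density of Normal(mu, sigma^2) at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Hypothetical parameters, for illustration only.
P_G1 = 0.3          # Pr(G = 1)
MUS = (0.0, 1.0)    # mean of Y given G = 0 and given G = 1
SIGMA = 1.0         # common standard deviation of Y given G

def f_gy(g: int, y: float) -> float:
    """Pr(G = g) * f_{Y|G}(y | g): neither a PDF on R^2 nor a PMF."""
    pr_g = P_G1 if g == 1 else 1.0 - P_G1
    return pr_g * normal_pdf(y, MUS[g], SIGMA)

def trapezoid(h, a: float, b: float, n: int = 20_000) -> float:
    """Composite trapezoidal rule for the integral of h over [a, b]."""
    step = (b - a) / n
    total = 0.5 * (h(a) + h(b))
    for i in range(1, n):
        total += h(a + i * step)
    return total * step

def sum_integral(g_values, a: float = -12.0, b: float = 12.0) -> float:
    """Pr_GY(G in g_values, a <= Y <= b): a sum over g of an integral over y."""
    return sum(trapezoid(lambda y: f_gy(g, y), a, b) for g in g_values)
```

      `sum_integral((0, 1))` is numerically 1, while `sum_integral((1,), -12.0, 0.0)` approximates Pr(G = 1, Y <= 0). A double Lebesgue integral over R^2 cannot play this role: the set {0, 1} × R has two-dimensional Lebesgue measure zero, so any such integral over it vanishes.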

    4. joint probability measure to the real-valued product space, $(X, Y) : (\Omega, \mathcal{F}_\Omega, \Pr_{XY}) \to (\mathbb{R}^2, \mathcal{B}_{\mathbb{R}^2})$.

      Do you want products in the domain of these functions as well? If not, then you probably want to point out to the reader that since Omega is arbitrary, it is necessarily different from what it was in the previous subsections when you were using it as a sample space for a single random variable.
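      For the no-product option, a minimal sketch, assuming a finite Ω with hypothetical outcomes w1, w2, w3 and made-up values: X and Y are both functions on the same Ω, and the joint distribution on R^2 is the pushforward of the measure on Ω through the pair (X, Y). The marginals are pushforwards through X or Y alone.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical finite sample space with a probability measure on its points.
omega_prob = {"w1": Fraction(1, 4), "w2": Fraction(1, 4), "w3": Fraction(1, 2)}

# Two random variables defined on the same Omega (no product structure needed).
X = {"w1": 0.0, "w2": 1.0, "w3": 1.0}
Y = {"w1": 2.0, "w2": 2.0, "w3": 5.0}

def pushforward(prob, *rvs):
    """Distribution of (rv_1, ..., rv_k): mass of each value tuple = Pr({w : tuple(w) = value})."""
    dist = defaultdict(Fraction)
    for w, mass in prob.items():
        dist[tuple(rv[w] for rv in rvs)] += mass
    return dict(dist)
```

      Here `pushforward(omega_prob, X, Y)` gives the joint law of (X, Y), and `pushforward(omega_prob, X)` its X-marginal.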

    5. distribution

      Omit. More precisely, repeat in words what you say in the symbols: the probability that G is 0 or 1 can be explicitly written out as....

    6. Measurement errors for Bernoulli random variables are usually referred to as misclassifications (see chapter on group misclassification).

      It may be worthwhile to add a line or two here pointing out that for Bernoulli errors there will be a mathematically necessary dependence between G and its error term E^G = G-tilde - G, whereas we will usually assume independence between X and its error E^X, and between Y and its error E^Y.
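      The necessary dependence can be checked by enumeration. A minimal sketch, assuming a hypothetical misclassification model parameterized by sensitivity Pr(G-tilde = 1 | G = 1) and specificity Pr(G-tilde = 0 | G = 0), with 0 < Pr(G = 1) < 1, and writing the error as E^G = G-tilde - G by analogy with Equation 3.1. When G = 0 the error can only be 0 or +1; when G = 1 it can only be -1 or 0. So its conditional law given G is the same for both values of G only when there is no misclassification at all.

```python
from fractions import Fraction

def error_given_truth(sensitivity, specificity):
    """Conditional law of E^G = G_tilde - G given G, as {g: {e: prob}}.

    Hypothetical model: sensitivity = Pr(G_tilde = 1 | G = 1),
    specificity = Pr(G_tilde = 0 | G = 0).
    """
    return {
        0: {0: specificity, 1: 1 - specificity},    # G = 0, so E^G is 0 or +1
        1: {0: sensitivity, -1: 1 - sensitivity},   # G = 1, so E^G is -1 or 0
    }

def drop_zero_mass(dist):
    """Keep only the error values that have positive probability."""
    return {e: m for e, m in dist.items() if m != 0}

def error_independent_of_truth(sensitivity, specificity):
    """With 0 < Pr(G = 1) < 1, E^G is independent of G iff both conditional laws agree."""
    cond = error_given_truth(sensitivity, specificity)
    return drop_zero_mass(cond[0]) == drop_zero_mass(cond[1])
```

      For example, `error_independent_of_truth(Fraction(9, 10), Fraction(4, 5))` is `False`, and only `error_independent_of_truth(1, 1)` (perfect classification) is `True`.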

    7. Figure 3.4: Measurement error. Each of the components of Equation 3.1 shown with respect to the real line.

      Could make the figure clearer if you add grey dotted lines from the Y-tildes down to the R-axis.

    8. we will assume that the dependent variable of interest Y stays constant across all outcomes for an individual such that

      Point out that this is the assumption from classical test theory (CTT)? And that it is often criticized?

    9. ritten as $\tilde{Y}(\ell^{-1}(\psi))$.

      This is where that last piece is needed. If Y-tilde isn't measurable with respect to the same sigma-algebra that Y is, then there is no guarantee that this fibre will be a measurable set.
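      A minimal finite sketch, assuming a hypothetical Ω of four outcomes, a map ℓ: Ω → Ψ sending outcomes to two made-up individuals, and σ-algebras generated by finite partitions. It computes the fibre ℓ^{-1}(ψ) and the image Ỹ(ℓ^{-1}(ψ)), and shows that whether the fibre belongs to the σ-algebra depends on which σ-algebra is in use.

```python
from itertools import combinations

# Hypothetical finite setting, for illustration only.
ell = {"w1": "ann", "w2": "ann", "w3": "bob", "w4": "bob"}   # ell: Omega -> Psi
y_tilde = {"w1": 4.8, "w2": 5.3, "w3": 6.1, "w4": 5.9}      # proxy values on Omega

def sigma_algebra_from_partition(blocks):
    """All unions of the blocks of a finite partition of Omega (a sigma-algebra)."""
    sets = set()
    for r in range(len(blocks) + 1):
        for combo in combinations(blocks, r):
            sets.add(frozenset().union(*combo))
    return sets

def fibre(ell, psi):
    """ell^{-1}(psi): the outcomes that ell assigns to individual psi."""
    return frozenset(w for w, who in ell.items() if who == psi)

def image(f, subset):
    """f(A) = {f(w) : w in A}, e.g. Y_tilde(ell^{-1}(psi))."""
    return {f[w] for w in subset}

# Two different sigma-algebras on the same Omega.
F_a = sigma_algebra_from_partition([frozenset({"w1", "w2"}), frozenset({"w3"}), frozenset({"w4"})])
F_b = sigma_algebra_from_partition([frozenset({"w1"}), frozenset({"w2", "w3"}), frozenset({"w4"})])
```

      The fibre of "ann" is {w1, w2}: it lies in `F_a` but not in `F_b`, so the same set-theoretic object is measurable or not depending on the σ-algebra fixed at the start.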

    10. measurement $\tilde{Y}$. We can specify an algebraic structure relating the proxy $\tilde{Y}$ to the true value $Y$,
        $$\tilde{Y} = Y + E^Y \qquad (3.1)$$
        where $E^Y$ is the measurement error term, defined as the difference between the error-prone proxy and the true value, $E^Y = \tilde{Y} - Y$.

      You need additional structure here; in particular, that at least one of Y-tilde or E^Y is appropriately measurable. Otherwise it is possible for this algebraic relation to hold pointwise while the terms are measurable with respect to different sigma-algebras; i.e., the sum of two functions can be measurable with respect to a sigma-algebra for which neither summand is. It is probably simplest here to just say that all the terms are assumed to be measurable with respect to the same sigma-algebra.
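      The warning can be checked on a two-point example. A minimal sketch, assuming a hypothetical Ω = {a, b} with the trivial σ-algebra {∅, Ω}: Y and E^Y each separate a from b, so neither is measurable, yet their pointwise sum Ỹ = Y + E^Y is constant and therefore measurable. On a finite Ω, a real function is measurable exactly when each of its level sets is in the σ-algebra.

```python
# Hypothetical two-point sample space with the trivial sigma-algebra {∅, Omega}.
omega = ("a", "b")
trivial = {frozenset(), frozenset(omega)}

def is_measurable(f, sigma_algebra, omega):
    """A real function on a finite Omega is measurable iff every level set f^{-1}({v}) is in the sigma-algebra."""
    return all(
        frozenset(w for w in omega if f[w] == v) in sigma_algebra
        for v in set(f.values())
    )

y = {"a": 0.0, "b": 1.0}                        # "true value": separates a from b
e_y = {"a": 1.0, "b": 0.0}                      # "error term": also separates a from b
y_tilde = {w: y[w] + e_y[w] for w in omega}     # pointwise sum: constant 1.0
```

      Assuming all three terms are measurable with respect to the same σ-algebra rules this situation out.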

    11. closed under countably many set operations

      This is vague. What set operations? You can be explicit and list the axioms, and then give a bit of intuition for why they are reasonable (e.g., so that we can always talk about combining sets and seeing what they have in common or not), or you can just not bother with being explicit about the axioms and instead say that they are closed under some simple set operations so that we can talk about combining sets and seeing what they have in common, etc.
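      If the axioms are listed explicitly, they are easy to check mechanically. A minimal Python sketch for a finite sample space (the function name is hypothetical): the collection must contain Ω, be closed under complements, and be closed under countable unions, which on a finite set reduces to pairwise unions. Closure under intersections then follows from De Morgan's laws.

```python
from itertools import combinations

def is_sigma_algebra(omega, collection):
    """Check the sigma-algebra axioms for a collection of subsets of a finite set omega.

    On a finite set, closure under countable unions reduces to closure under
    pairwise unions, so these three checks are exhaustive.
    """
    omega = frozenset(omega)
    sets = {frozenset(s) for s in collection}
    if omega not in sets:                                    # contains the whole space
        return False
    if any((omega - s) not in sets for s in sets):           # closed under complements
        return False
    return all((a | b) in sets for a, b in combinations(sets, 2))  # closed under unions
```

      For example, on Ω = {1, 2, 3} the collection {∅, {1}, {2, 3}, Ω} passes, while {∅, {1}, {2}, Ω} fails because the complement {2, 3} is missing.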

    12. The set $\Omega$ is the sample space (i.e., the universe of possible outcomes) where each $\omega \in \Omega$ is a sample unit. The σ-field $\mathcal{F}_\Omega$ is a collection of measurable subsets of $\Omega$ that is closed under countably many set operations. Let $(\Psi, \mathcal{F}_\Psi)$ also be a measurable space where the set $\Psi$ is the population of interest (i.e., the target population) and each $\psi \in \Psi$ is an experimental object of study such as an individual person or animal

      I might consider reversing the order of these items. It is much more natural for people to think of a "sample space" as the set of experimental objects. So start with that. Then, with that notation and concept established, introduce the "universe of possible outcomes" sample space.

      Aside: I know it's common phrasing, but I've never liked the language "universe of possible outcomes." It can be misleading. This comes from my physics training, though. Really, the sample space Omega is "possibility space" or "all parallel worlds", etc. It's just a set, but there isn't any distribution or function attached to it yet, so it's always been confusing to me to talk about "outcomes": outcomes of what? You're likely better off just leaving the language as is, especially since it's so common, but I just wanted to point out the ambiguity.