166 Matching Annotations
  1. Mar 2023
    1. I've mastered being trans and all *I've* learned about gender is that it's kind of stupid (sometimes bad stupid, like hegemony, and sometimes good stupid, like monster movies)

      Imogen Binnie

  2. Feb 2023
    1. Spinning is creating an environment of increasing innocence. Innocence does not consist in simply "not harming." This is the fallacy of ideologies of nonviolence. Powerful innocence is seeking and naming the deep mysteries of interconnectedness. It is not mere helping, defending, healing, or "preventive medicine." It must be nothing less than successive acts of transcendence and Gyn/Ecological creation. In this creation, the beginning is not "the Word." The beginning is hearing. Hags hear forth new words and new patterns of relating. Such hearing forth is behind, before, and after the phallocratic "creation." [Pp. 413-14]

      The innocence of Astraea.

  3. Feb 2022
    1. After fluorescence compensation, some cell populations will have low means and include events with negative data values

      Because compensation attempts to remove the background autofluorescence, if a cell is negative for the marker the immunofluorescence binds to and also exhibits less autofluorescence than average, it will have a negative value after compensation (and this population will have a low mean both before and after compensation).
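      Toy numbers for this reasoning (a minimal sketch, assuming compensation amounts to subtracting the average background; the values are invented, not from the paper):

      ```python
      # Marker-negative cells whose raw signal is dimmer than the average
      # background land below zero once that background is subtracted.
      avg_background = 100.0
      raw = [90.0, 100.0, 130.0]            # marker-negative cells
      compensated = [c - avg_background for c in raw]
      print(compensated)                    # [-10.0, 0.0, 30.0]
      # The population mean is low both before (106.7) and after (6.7).
      ```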


  4. Jan 2022
    1. Log-Normal Distribution of Single Molecule Fluorescence Bursts in Micro/Nano-Fluidic Channels

      answers why the distribution of fluorescence in flow cytometry data tends to be log-normal
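      The usual mechanism, sketched with toy numbers (this is the generic multiplicative-noise argument, not the paper's specific model): a signal built from many independent multiplicative factors is approximately normal in log space.

      ```python
      import numpy as np
      from scipy.stats import skew

      rng = np.random.default_rng(0)
      factors = rng.uniform(0.9, 1.1, size=(100_000, 50))
      signal = 1000 * factors.prod(axis=1)       # many small multiplicative effects
      print(skew(signal), skew(np.log(signal)))  # raw is right-skewed; log is near 0
      ```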

    1. I shall largely speak of mice, but my thoughts are on man, on healing, on life and its evolution. Threatening life and evolution are the two deaths, death of the spirit and death of the body. Evolution, in terms of ancient wisdom, is the acquisition of access to the tree of life. This takes us back to the white first horse of the Apocalypse which with its rider set out to conquer the forces that threaten the spirit with death. Further in Revelation (ii.7) we note: 'To him who conquers I will grant to eat of the tree of life, which is in the paradise of God' and further on (Rev. xxii.2): 'The leaves of the tree were for the healing of nations.' This takes us to the fourth horse of the Apocalypse (Rev. vi.7): 'I saw ... a pale horse, and its rider's name was Death, and Hades followed him; and they were given power over a fourth of the earth, to kill with the sword and with famine and with pestilence and by wild beasts of the earth' (italics mine). This second death has gradually become the predominant concern of modern medicine. And yet there is nothing in the earlier history of medicine, or in the precepts embodied in the Hippocratic Oath, that precludes medicine from being equally concerned with healing the spirit, and healing nations, as with healing the body. Perhaps we might do well to reflect upon another of John's transcriptions (Rev. ii.11): 'He who conquers shall not be hurt by the second death.'

      Wow - I have not read many papers which are so... Biblical.

    1. (Wainwright et al., 2014)

      Shows that GBM relies on immune checkpoint molecules IDO, CTLA-4, and PD-L1. (IDO attracts regulatory T cells; CTLA-4 is expressed by T cells and CD80 on dendritic cells interacts with it during T cell activation; PD-L1 on a tumor binds to PD-1 on a T cell to tell the T cell not to kill the tumor cell.)

    2. Zhou et al., 2015

      GBM secretes periostin, a signaling protein involved in cell adhesion, wound healing, and the endothelial-mesenchymal transition. Overexpression of periostin also recruits immunosuppressive immune cells.

    3. Wainwright et al., 2012

      GBM expresses indoleamine 2,3 dioxygenase (IDO). IDO regulates the function & expansion of regulatory T cells. GBM recruits a ton of regulatory T cells that express GITR and seem to inhibit the immune response.

    4. Crane et al., 2014

      Many different tumors secrete the protein LDH5, which causes many healthy myeloid cells to produce NKG2D ligands, which causes NK cells to down-regulate their NKG2D receptors and become less aggressive toward cells bearing NKG2D ligands, which allows tumors expressing NKG2D ligands to get away scot-free.


  5. Dec 2021
    1. From historic data from June 2021 to October 2021, when Delta was dominant in Scotland, we have estimated that 75% of admissions within 14 days of a positive test were admitted for SARS-CoV-2. This percentage was constant over this five-month period.

      this data is not right-censored

    2. We used S gene status as a surrogate for Delta and Omicron VOCs, with S gene positive status indicating Delta whereas S gene negative indicated Omicron

      appears sensible when Delta & Omicron are the two VOCs. Alpha also has S gene dropout. BA.2 does not have this deletion, but that version of Omicron was not detected in the UK as of 12/7 (https://www.theguardian.com/world/2021/dec/07/scientists-find-stealth-version-of-omicron-not-identifiable-with-pcr-test-covid-variant)

  6. Jul 2021
    1. would exhibit little gender bias because many of its authors are professional journalists

      This seems to represent a misunderstanding of what the embedding represents - news articles just need to quote more librarians who use 'she' and more philosophers who use 'he' to generate this. The writing need not be stereotyped - all that's necessary is for the world to be biased.

    2. However, none of these papers have recognized how blatantly sexist the embeddings are

      Was this really true - especially given that a key example embedding in the word2vec paper is about gender?

    3. remove gender stereotypes, such as the association between the words *receptionist* and *female*, while maintaining desired associations such as between the words *queen* and *female*

      This seems like an ill-specified task? We'll see.

    4. disturbing extent

      "Disturbing" implies some element of surprise, which seems unwarranted. (Doesn't make it less important, but the results aren't at all surprising based on the source texts.)

  7. Jun 2021
  8. May 2021
    1. Evaluation of sentiment transfer is difficult and is still an open research problem (Mir et al., 2019)

      ...because in its full generality measuring sentiment requires a complete understanding of social interaction, and is highly subculturally specific. This is not just an open research problem, it seems impossible without GAI.

    2. BLEU score on the test set which contains 100K parallel sentences.

      BLEU score is an unintuitive metric here - wouldn't some way of measuring how well we discovered the codebook be better?

    3. Note that the loss used in previous work does not include the negative entropy term, \(-H_q\). Our objective results in this additional "regularizer", the negative entropy of the transduction distribution, \(-H_q\). Intuitively, \(-H_q\) helps avoid a peaked transduction distribution

      This is critical.

    4. we introduce further parameter tying between the two directions of transduction: the same encoder is employed for both \(x\) and \(y\), and a domain embedding \(c\) is provided to the same decoder to specify the transfer direction, as shown in Figure 2

      Ooh, this is interesting.

    5. Therefore, we use the same architecture for each inference network as used in the transduction models, and tie their parameters

      This seems like a lot of text on one of the simpler ideas in the paper?

    6. emissions are one-to-one

      Not generally true of an HMM - but maybe this is an assumption that is often made when doing inference? I don't know the usual HMM inference techniques.

    7. Markov assumption on the latent sequence

      But because the sequence is latent and corresponds arbitrarily to outputs, this does not actually seem like a strong independence assumption to me.

    8. \(p(X, \bar{X}, Y, \bar{Y}; \theta_{x|\bar{y}}, \theta_{y|\bar{x}}) = \left(\prod_{i=1}^{m} p(x^{(i)} \mid \bar{y}^{(i)}; \theta_{x|\bar{y}})\, p_{D_2}(\bar{y}^{(i)})\right) \left(\prod_{j=m+1}^{n} p(y^{(j)} \mid \bar{x}^{(j)}; \theta_{y|\bar{x}})\, p_{D_1}(\bar{x}^{(j)})\right)\)  (1)
      • The joint likelihood of the observed and latent sentences,
      • given the parameters of the two transduction distributions
      • ...is given by...
      • the product over the m sentences we observe in the first domain of
      • the probability of each observed sentence given the corresponding latent sentence and the transduction parameters
      • times the probability under the language model that governs the second domain of the latent sentence
      • all multiplied by
      • the product over the n sentences we observe in the second domain of
      • the probability of each observed sentence given the corresponding latent sentence and the transduction parameters
      • times the probability under the language model that governs the first domain of the latent sentence. (See the sketch below.)
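      A minimal sketch of this factorization in code, with stand-in log-probability functions (the real ones would be the transduction models and the domain language models; all names here are illustrative):

      ```python
      import math

      def log_joint(xs, ys_latent, ys, xs_latent,
                    log_p_x_given_y, log_p_y_given_x, log_lm_d1, log_lm_d2):
          """Log of equation (1): sum log p(x_i | ybar_i) + log p_D2(ybar_i)
          over observed domain-1 sentences, plus the symmetric terms for
          observed domain-2 sentences and their domain-1 latents."""
          total = 0.0
          for x, ybar in zip(xs, ys_latent):
              total += log_p_x_given_y(x, ybar) + log_lm_d2(ybar)
          for y, xbar in zip(ys, xs_latent):
              total += log_p_y_given_x(y, xbar) + log_lm_d1(xbar)
          return total

      uniform = lambda *args: math.log(0.5)   # placeholder distributions
      print(log_joint(["x1"], ["y1*"], ["y2"], ["x2*"],
                      uniform, uniform, uniform, uniform))  # 4 * log(0.5)
      ```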
    9. typically enforce overly strong independence assumptions about data to make exact inference tractable

      The three dangerous statistical assumptions:

      • the average is representative
      • we have independence
      • it's pretty much linear
    10. the noisy channel model (Shannon, 1948)

      Here Shannon establishes a relationship between the amount of noise and the maximum transmission efficiency under an error-correcting code that compensates for the noise with high confidence.
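      For reference, the quantitative form of that relationship for the standard binary symmetric channel (a textbook statement, not from this paper): reliable transmission is possible at any rate below the capacity \(C\) and at no rate above it.

      ```latex
      % Capacity of a binary symmetric channel with crossover probability p
      C = 1 - H(p), \qquad H(p) = -p \log_2 p - (1-p) \log_2 (1-p)
      ```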

    11. Style transfer has historically referred to sequence transduction problems that modify superficial properties of text – i.e. style rather than content

      It is not totally clear what is in this category, is it? I wonder if the metaphor of "style" is limiting our imagination for how these models are used?

    12. e.g. the HMM

      Hidden Markov Model.

      A Hidden Markov Model in its full generality assumes very little about the data it generates - I wonder what this means in this context?

    1. \(\mathcal{L}_{\text{MLE}}(\theta) = \sum_{i=1}^{N} \log p_\theta(y^{(i)} \mid x^{(i)})\)

      This is what we're aiming to maximize: the sum of the log likelihoods of the observations under the model parameterized by \(\theta\).

      (Is there a bit of a problem here? In situations where x translates to multiple good y's, the ground truth probability p(y|x) is lower. Are we overweighting situations with fewer right answers?)
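      A numeric companion to that parenthetical (toy numbers of my own): even a perfect model scores lower log-likelihood on inputs with many equally valid outputs, so ambiguous examples contribute larger loss terms.

      ```python
      import math

      # x1 has one valid output; x2 has four, all equally good.
      perfect_p = {"x1": 1.0, "x2": 0.25}
      for x, p in perfect_p.items():
          print(x, math.log(p))   # x1: 0.0, x2: about -1.386
      ```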

    2. which hurts its generalization tounseen inputs, that is known as the “exposure bias” problem

      There's some question about to what extent this is a first-order problem and to what extent this is just difficulty in generalization. See: Quantifying Exposure Bias for Open-Ended Language Generation, He et al. & Generalization in Generation: A Closer Look at Exposure Bias, Schmidt.

    1. Multiplying the output probabilities of this predictor with \(G\)'s original probabilities and then renormalizing yields a model for the desired \(P(X|a)\) via Bayes' Rule.

      Here's the key math. \(P(X|a) = P(a|X) \cdot P(X) / P(a)\). Since we already have the constraint that the probabilities sum to 1, we can consider \(P(a)\) to be just a normalizing factor and we don't actually have to know/derive it.
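      A minimal sketch of this reweighting over a next-token distribution (the arrays and names are illustrative, not the paper's):

      ```python
      import numpy as np

      def reweight(lm_probs, attr_probs):
          """P(x | history, a) is proportional to P(a | history, x) * P(x | history);
          renormalizing plays the role of dividing by P(a), which we never compute."""
          joint = lm_probs * attr_probs
          return joint / joint.sum()

      lm_probs = np.array([0.5, 0.3, 0.2])    # generator G's distribution
      attr_probs = np.array([0.1, 0.9, 0.5])  # predictor's P(a | ..., token)
      print(reweight(lm_probs, attr_probs))   # about [0.12, 0.64, 0.24]
      ```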

  9. Apr 2021
    1. Now if both players knew each other’s cards, they would agree that if the last card is a 3 or 8 of any suit, Bob wins, otherwise Alice wins.

      No - Bob wins if the river is a heart, and loses otherwise.

    1. USING GOSSIPS TO SPREAD INFORMATION: THEORY AND EVIDENCE FROM TWO RANDOMIZED CONTROLLED TRIALS

      Key takeaway: community members can identify people with high centrality in their social network.

    1. [while]

      This excerpt is similarly misleading, with that one "[while]" replacing a paragraph of text. (Again, yes Darwin was sexist, but these quotes misrepresent the text, and Darwin's stances).

    2. The western nations of Europe . . . now so immeasurably surpass their former savage progenitors[that they]stand at the summit of civilization. . . . [T]he civilised races of man will almost certainly exterminate,and replace, the savage races throughout the world.

      I'm not going to defend Charles Darwin and I think we can all agree racism and sexism inhere in his worldview, but the quote is from The Descent of Man, not Origin of Species, and those ellipses are very misleading - they cut out more than 20 pages. As excerpted, it looks like Darwin is advocating genocide, but I don't think anyone would conclude that from the quotes in context.

  10. arxiv.org
    1. Unlike this constant value - which is our expectation if there were no degree correlations - the solid line increases from near 300 for low degree individuals to nearly 820 for individuals with a thousand friends, confirming the network's positive assortativity.

      Would be interesting to analyze the source of this. My first guess would be that it's generational, reflecting different usage patterns, rather than connectedness being "causal" (though I'm not positive I can nail down what the difference is).

    2. A naive approach to counting friends-of-friends

      Interesting - the linear model seems like a more naive approach to me. The approach described here doesn't become sensible until you tie in degree assortativity, which is not that intuitive, and not nearly strong enough to justify \(k^2\).
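      A synthetic look at the quantities in play (networkx is real, but the graph and the choice of "naive" estimates are my own, not the paper's):

      ```python
      import networkx as nx

      G = nx.barabasi_albert_graph(1000, 5, seed=0)
      node = max(G, key=G.degree)                 # a high-degree individual
      k = G.degree(node)
      sum_friend_degrees = sum(G.degree(v) for v in G[node])
      distinct_fofs = len({w for v in G[node] for w in G[v]} - {node})
      print(k, k * k, sum_friend_degrees, distinct_fofs)
      ```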

    3. The second-largest connected component only has just over 2000 individuals

      Who is this second-largest component!?

      And has it been joined to the main component since 2011?

  11. Jul 2020
  12. May 2020
    1. gradient

      General observation about gradient descent unrelated to this paper - if you look at the partial derivative of any particular parameter, it's got two components, which, slightly metaphorically, correspond to how strongly it was activated and how much influence it had over the error. This differs from my introspective feeling about how my own learning works, where the understanding I'm most sure of, the part of my model which is most strongly activated, is not the thing that changes the most. It introspectively (and so unreliably) feels like I'm more likely to try to spin up a patch for the error - take an under-activated part of the model, shift its outbound connections to try to better match the shape of the error, and crank up the weight going into it. Gradient descent sort of only punishes mistakes rather than rewarding growth (though maybe that sense is just a consequence of the arbitrary choice of sign: minimizing loss vs. maximizing its opposite).

      Basically, what if we looked for a highly activated cell that didn't affect the outcome much one way or another and see if we could make it affect the loss more and better? Once a cell gets a tiny weight going into it and tiny weights in its output stream, is there any hope for it to matter? How often does this sort of "self-pruning" zero-ing happen? Are these metaphors at all sensible?
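      A tiny numeric check of the "two components" reading (my own toy example): for a linear layer, the weight gradient is the outer product of the upstream error signal and the input activation, so the strongly-activated input's weight moves the most.

      ```python
      import numpy as np

      x = np.array([2.0, 0.1])       # activations into the layer
      W = np.array([[0.5, -1.0]])    # 1x2 weight matrix
      y = W @ x                      # layer output
      target = np.array([1.0])
      dL_dy = 2 * (y - target)       # gradient of squared error w.r.t. y
      dL_dW = np.outer(dL_dy, x)     # influence times activation
      print(dL_dW)                   # [[-0.4, -0.02]]
      ```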

    2. We will refer toPas the perceptual latent space

      It seems like we could choose an arbitrary layer in the critic and use its input as this latent space - what choices of layer make this "perceptual"?

    3. \(z \sim \mathcal{N}(0, 1)\)

      General GAN question - what happens if we make a different assumption about the z distribution?

      The weights and biases in the first layer can scale and translate this along any dimension into any normal distribution, but what breaks if it's uniform instead? If we make our z distribution multimodal, does that help to learn multimodal domains better? Like ImageNet instead of faces, or non-centered objects, or scenes with more than one thing, all of which StyleGAN has a relatively hard time with?
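      A quick check of the affine claim, plus what changes with a uniform \(z\) (a sketch of my own, not from the paper):

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      z = rng.standard_normal(100_000)
      w, b = 3.0, -2.0                     # first-layer weight and bias
      print((w * z + b).mean(), (w * z + b).std())  # about -2.0 and 3.0

      u = rng.uniform(-1, 1, 100_000)      # a uniform z stays uniform
      print((w * u + b).min(), (w * u + b).max())   # bounded, about [-5, 1]
      ```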

    4. The problem we address, while close to the formulation of the single-image super-resolution problem, is in fact rather different.

      Predicting the highest likelihood higher-resolution image vs. finding a probability distribution of higher-resolution images.

  13. Sep 2019
    1. When we look at humans, we see them as plotters or schemers or competition. But when we look at puppies, or kittens, or other animals, none of that social machinery kicks in. We're able to see them as just creatures, pure and innocent things, exploring an environment they will never fully understand, just following the flow of their lives.

      When I look at myself, when do I apply the schemer paradigm and when the kitten?

    1. Note that the ‘supremum’ in the definition of \(c_{k|\ell}\) is actually a ‘maximum’.

      This seems like a subtle point, and it's not obvious to me at first blush that it should be so. Why couldn't this be irrational, say? There are an infinite number of families in this class of families.

    2. k|ℓ-separated

      Translation: The family is \(k|\ell\)-separated if for any \(\ell\) elements of the ground set, and any \(k-\ell\) other ones, you can find a set in the family that has all of the \(\ell\) elements and none of the \(k-\ell\) other ones.

      Seems like the notation would be better with this interpretation in mind. n|n-separated isn't as obviously silly as 0|n-separated. And then 1|1-separated would translate into the usual definition for separable, instead of 2|1.

    3. As a trivial example, for every finite set \(X\) with \(|X| \ge k\), any union-closed family on \(X\), containing the union-closed family \(\{A \subseteq X : |A| \ge \ell\} \cup \{\emptyset\}\), is \(k|\ell\)-separated.

      If it's got every set of size \(l\), then no matter what \(k\) elements you pick, the set that contains exactly the first \(l\) is in there.

      Is that the trivial example, or the only example? It's not the only example: {01234, 12345, 23450, 34501, 45012, 50123} is 5|4 separated (despite not having all the size 4 sets in it), but not 6|4 separated.
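      A hypothetical brute-force checker for this property, under the reading above (a member must contain the \(\ell\) chosen elements and avoid the \(k-\ell\) others); it confirms the example:

      ```python
      from itertools import combinations

      def is_kl_separated(family, ground, k, l):
          family = [frozenset(s) for s in family]
          for inside in combinations(ground, l):
              rest = [x for x in ground if x not in inside]
              for outside in combinations(rest, k - l):
                  if not any(set(inside) <= s and not (s & set(outside))
                             for s in family):
                      return False
          return True

      ground = range(6)
      family = [{0,1,2,3,4}, {1,2,3,4,5}, {2,3,4,5,0},
                {3,4,5,0,1}, {4,5,0,1,2}, {5,0,1,2,3}]
      print(is_kl_separated(family, ground, 5, 4))  # True
      print(is_kl_separated(family, ground, 6, 4))  # False
      ```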

    1. In epidemiology, A is commonly referred to as exposure or treatment.

      Is it a Harvard convention that A is the treatment variable and Y is the outcome? Or a health care policy convention?

    1. as early as the 1850’s by John Snow

      ...in the course of founding epidemiology by isolating the cause of cholera to contaminated water supplies. (Compared cholera cases in houses supplied by two different water companies, before and after one switched the source of their water.)

  14. Aug 2019
    1. .

      homework question: what makes diff-in-diff different from non-randomized trials (like, when trying to figure out the effect of surgical interventions) more generally?

    2. only permit two treatment histories

      This seems normal, why do we say so? Maybe the usual machinery accounts for people dropping out of the treatment group? Or are we preparing the way for later considerations about when there are a bunch of treatments of different populations that hit at different times?

    3. \(\beta\) coefficient, on the other hand, is specific to the regression estimator

      is it? we could use a different estimator than OLS, like something that tries to compensate for heteroscedasticity (WLS, GLS), and still be looking for \(\beta\).

    4. .

      I can make guesses, but it seems like it might be useful to say why we might choose one or the other, or under what circumstances incorporating the extra data is helpful or not. (It's not obvious to me that more observations necessarily helps - seems like it increases the odds that an exogenous factor is going to hit one group and not the other. And if some of these are ongoing factors and not shocks that dissipate, the performance of each group is decreasingly useful for estimating the other as time goes on.)

    5. .

      Yeah, this chart makes me sure that we're assuming that the treatment has no effect on the untreated group & the lack of treatment has no effect on the treated group.

    6. .

      From context, I'm guessing this paragraph is trying to teach me a difference between the notation used in this exposition & how it might 'usually' be used? I don't understand what new information was introduced.

    7. many statistical methods: parametric, non-parametric and everything in between.

      The distinction between parametric & non-parametric methods is pretty far over my head. Wikipedia gives me a very gross understanding, but how something could be in-between is DEFINITELY over my head.

    8. Instead of a regression coefficient, we can define the target estimand as the difference between potential outcomes under treatment versus no treatment

      Yeah! This seems much more straightforward! Why were we talking about a regression coefficient before? So confused.

    9. notation

      Know it's usual practice to do notation conventions first, but I'd prefer in an expository situation like this to see it defined as it comes up - I think I could pick up the definitions better in context. Would also help because there's more notation along the way than this table defines.

    10. \(Y(t) = (1-A) \cdot Y^0(t) + A \cdot Y^1(t)\)

      1) there must be a simpler way to write this.

      2) couldn't this just be part of the definition of \(Y^{a}\) rather than an additional assumption?

    11. observe the potential outcomes both with treatment and with no treatment, estimating the ATT would be easy. We would simply calculate the difference in these two potential outcomes for each treated unit, and take the average

      I have some fundamentally different understanding about the nature of truth here that's making this hard to read for me - if we were to observe the potential outcomes, how would they still be potential? if we calculate the effect, how is it that we're still estimating the ATT and not calculating its actual value?

    12. \(\text{ATT} \equiv E[Y^1(2) - Y^0(2) \mid A=1]\)

      I know this is not a very complex formula, but it still seems unnecessarily complex for such a simple idea. It also obscures the fact that \(Y^{1}(2) | A=1\) is a fact that we have and \(Y^{0}(2) | A=1\) is an estimation. (right?)

    13. .

      These definitions I think I get, but they don't seem to fit the paragraphs above.

      In the example I'd want the estimand to be "How much did California's inpatient spending change due to the new laws?"

    14. In this example, the target estimand might be a regression coefficient, \(\beta\), that quantifies the differential change in California spending after the new law compared to the change in Nevada spending. We could use the ordinary least squares estimator to get an estimate, \(\hat{\beta}\), from observed data.

      This is sensitive to the time interval around \(T_{0}\) that we choose to include in the analysis, which makes it seem to me like it's not helping to answer the question of how much the intervention helped.
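      A minimal sketch of that regression estimator on made-up spending data (statsmodels is real; the numbers and the 2010 "law change" are invented for illustration):

      ```python
      import numpy as np
      import pandas as pd
      import statsmodels.formula.api as smf

      rng = np.random.default_rng(0)
      df = pd.DataFrame({"state": ["CA"] * 20 + ["NV"] * 20,
                         "year": list(range(2000, 2020)) * 2})
      df["treated"] = (df["state"] == "CA").astype(int)
      df["post"] = (df["year"] >= 2010).astype(int)   # hypothetical law change
      df["spending"] = (100 + 2 * df["treated"] + 5 * df["post"]
                        - 3 * df["treated"] * df["post"]   # true effect: -3
                        + rng.normal(0, 1, len(df)))

      # beta is the coefficient on the interaction: the differential change
      model = smf.ols("spending ~ treated + post + treated:post", data=df).fit()
      print(model.params["treated:post"])   # beta-hat, near -3
      ```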

    15. Finally, we define a method to estimate this using data, such as a linear model fit to inpatient spending in California and Nevada before and after the law change.

      I'm definitely all turned around by now - we estimate a question that's answerable from the data by building a model that approximates the data?

    16. .

      Animation could be clearer here - the nubs on the arrows are a little ugly, and if the treated/control labels & graphs faded out, the brackets wouldn't have to fly so far.

    17. estimand

      This italics looks like estimand is being defined, but it isn't actually defined here, we just get an example. From context and Latin roots, I think it's "the thing we're estimating".

      I'm also not clear how this question is statistical, or more statistical than the previous question - the change in the difference in spending in California vs. Nevada before and after the law changed seems like an observed fact.

    18. estimate causal effects of non-randomized interventions

      Followup homework: diff-in-diff looks closely related to regression discontinuity - can it be viewed as a generalization of it? Are the observations in this explanation transferable?

  15. Dec 2018
    1. Pointer to Erdos-Ko-Rado and an extension by Hilton & Milner - could look at those to get ideas for tools?

      Plus some history that's better covered in the Bruhn & Schaudt survey.

    1. A couple of strengthenings of FUNC by looking at something even more general than a partial order, that seem not to be true (but maybe could be adjusted and re-opened in useful ways).

      Don't think this direction is super likely to give leverage on FUNC because it moves things to a land with verrrry little structure.

    2. Let \(n = \{0, 1, ..., n-1\}\), \(n > 1\), and \(\mathcal{F} = \{F_i \subset n+1,\, i \in n\}\) such that, for any \(i \in n\), \(i \in F_i \subset i+1\), (∗)

      I find this notation very confusing, but breaking it down:

      • we've got a collection \(\mathcal{F}\) of n sets with a ground set 0-n
      • each set contains its label
      • no set contains any elements larger than its label

      The rest of the question seems to imply the way to think about this is as a sort of truth table for an order? We have n items and each \(F_{i}\) tells you which elements are \(\leq i\).

      It's reflexive because each \(F_{i}\) contains \(i\), antisymmetric because no \(F_{i}\) contains anything larger than \(i\), but it's not necessarily transitive. So it's not necessarily a partial order.

      But you can take the partial order given by a lattice (or an intersection-closed family of sets) and represent it with one of these things.
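      A quick sketch of that reading, using divisibility on a small ground set (my own toy example, with \(i+1\) read as \(\{0, ..., i\}\)):

      ```python
      # Represent a relation on {0,...,n-1} by sets F_i = elements j <= i
      # that are "below" i; each F_i contains its label and nothing larger.
      n = 4
      divides = lambda a, b: a != 0 and b % a == 0
      F = {i: {j for j in range(i + 1) if j == i or divides(j, i)}
           for i in range(n)}
      print(F)   # {0: {0}, 1: {1}, 2: {1, 2}, 3: {1, 3}}
      ```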

    1. without loss of generality suppose that \(y\) is the only member of \(g_*^+(B)\) such that \(\{y\}\) is not a component of \(g_*(B)\)

      this seems like we're losing a lot of generality! but "only" isn't necessary for the rest of the argument.

    2. \(\Omega_*(g_*^+(B))\) is complete

      This isn't true under the assumptions we've made - the subcase 2 condition and the case 2 condition together means this isn't complete.

      (Not that I'm clear why its completeness matters here. The previous statement, that everything in the powerset is present, is true, but because \(\Omega\) with each set intersected with some set is also union-closed, not because of completeness.)

    3. CASE I: \(g_*^+(B) \cap A_{k+1} \neq \emptyset\)

      the thrust of case I is correct - if we have a minimum set that overlaps with everything in \(\Omega - A_{k+1}\) and it also overlaps with \(A_{k+1}\), then the minimum set that overlaps with everything in \(\Omega\) is the same size.

    4. This lemma is important because if for some \(B \subseteq U(\Omega)\), \(\Omega_*(B)\) is complete and \(|\Omega_*^+(B)| < \lceil \log_2 |\Omega_*(B)| \rceil\) then some \(x \in B\) belongs to \(\Gamma(\Omega)\). This is due to the fact that if \(|\Omega_*^+(B)| < \lceil \log_2 |\Omega_*(B)| \rceil\) then \(\Omega_*(B)\) contains more components than the number of elements in the power-set \(\wp(\Omega_*^+(B))\).

      This doesn't follow. Each element of \(\Omega_{*}(B)\) is not, in general, distinct.

    5. We seek \(B \subseteq U(\Omega)\) such that \(\Omega_*(B)\) is union-closed but at most one component of \(\Omega_*(B)\) is the empty set.

      Either there's an element or elements that are in every single set in \(\Omega\), or B is the whole universe of \(\Omega\).

    6. 5)

      In general, this is just B (so long as everything in the ground set is in some member of \(\Omega\), which it might as well be, or you could pick a smaller ground set).

    7. \(\lambda(A, B) = 1 \implies |A| = |B|\)

      This doesn't follow from the definition without some additional constraints. Some of the \(A_{i} \setminus C_{i}\) might coincide.

      Ex: \((\emptyset),(1),(2),(12) \) reduces to \((\emptyset),(1)\) if you take \(C_{3}=C_{4}=(12)\).

    8. Let \(A = \{A_1, A_2, ..., A_n\} \in \wp[n]\) and \(B \in \wp[\infty]\). Consider the function \(\lambda: \wp[\infty] \times \wp[\infty] \to \{0, 1\}\) defined as such: \(\lambda(A, B) = 1\) if there exist sets \(C_1, C_2, ..., C_n \subset U(A)\) (not all empty) such that \(B = \{A_i \setminus C_i : A_i \in A,\, 1 \le i \le n\}\); (16) otherwise \(\lambda(A, B) = 0\).

      \(\lambda\) answers: is union-closed family B formed by taking A and taking away elements from some of its sets?
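      A hypothetical brute-force implementation of this reading (taking the paper's \(\subset\) as \(\subseteq\)); it reproduces the earlier counterexample, \(\{\emptyset, \{1\}, \{2\}, \{1,2\}\}\) reducing to \(\{\emptyset, \{1\}\}\):

      ```python
      from itertools import combinations, product

      def powerset(s):
          s = list(s)
          return [frozenset(c) for r in range(len(s) + 1)
                  for c in combinations(s, r)]

      def lam(A, B):
          A = [frozenset(a) for a in A]
          B = {frozenset(b) for b in B}
          U = frozenset().union(*A)
          for Cs in product(powerset(U), repeat=len(A)):
              if any(Cs) and {a - c for a, c in zip(A, Cs)} == B:
                  return 1
          return 0

      print(lam([set(), {1}, {2}, {1, 2}], [set(), {1}]))  # 1
      ```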

    9. Lemma 9. For each \(n \in \mathbb{N}\), \(\Omega^{(1)} \in \wp[n]\) and \(\Omega^{(2)} \in \wp[n+1]\) there exist \(A, B \subset T_\infty\) such that \(\Omega^{(2)} \setminus \{B\} \in \wp[n]\) and \(\Omega^{(1)} \cup \{A\} \in \wp[n+1]\)

      If one union-closed family has one more set than another, then you can transform one into the other by adding or subtracting that set.

    10. Therefore, if we prove the FC for the elements of \(\wp(n, m)\), \(m \ge 1\), we can be sure that we have established the result for all valid finite union-closed collections.

      Or in other words, the labels that we stick on the ground set don't matter.