3,935 Matching Annotations
  1. Last 7 days
    1. Again, p is the probability of seeing results as extreme as (or more extreme than) those actually observed if the null hypothesis were true. So p is computed under the assumption that the null hypothesis is true. Yet it is common for researchers, teachers and even textbooks to think of p as the probability of the null hypothesis being true (or equivalently, of the results being due to chance), an error called the "fallacy of the transposed conditional" (Haller and Krauss, 2002; Cohen, 1994, p.999).

      p-value is misinterpreted and confusing

    1. This assessment raises two issues. First, it is arbitrary. If 10 of the 15 CIs included the predicted values, would the results also support the theory, or instead refute it? If one instead used 99% CIs, would positive results for 12 of the 15 predictions be enough to support the theory? This arbitrariness arises because CIs offer no principled method for generating an inference regarding the theory.

      Estimation is too messy / complex and not clear enough

    1. To illustrate this point Oakes posed a series of true/false questions regarding the interpretation of p-values to seventy experienced researchers and discovered that only two had a sound understanding of the underlying concept of significance [25].

      Sentences where they say people don't really know the statistics, they just apply tests without thought because it's tradition

    2. failure to check assumptions about the data required by particular tests, over-testing and using inappropriate tests

      Sentences where they say people don't really know the statistics, they just apply tests without thought because it's tradition

    3. abusing statistical tests, making illogical arguments as a result of tests, deriving inappropriate conclusions from nonsignificant results, and confusing the size of p-values with effect sizes.

      Sentences where they say people don't really know the statistics, they just apply tests without thought because it's tradition

    4. This approach, fiercely promoted by Fisher in the 1930s [9], has become the gold standard in many disciplines including quantitative evaluations in HCI. However, the approach is rather counter-intuitive; many researchers misinterpret the meaning of the p-value.

      Sentences where they say people don't really know the statistics, they just apply tests without thought because it's tradition

    1. We found that using MINE directly gave identical performance when the task was nontrivial, but became very unstable if the target was easy to predict from the context (e.g., when predicting a single step in the future and the target overlaps with the context).

      all content that points to important caveats and gotchas that I might consider when leaning too heavily on the results of this paper

    2. We note that better results [49, 27] have been published on these target datasets, by transfer learning from a different source task.

      all content that points to important caveats and gotchas that I might consider when leaning too heavily on the results of this paper

    3. We also found that not all the information encoded is linearly accessible. When we used a single hidden layer instead, the accuracy increased from 64.6 to 72.5, which is closer to the accuracy of the fully supervised model.

      all content that points to important caveats and gotchas that I might consider when leaning too heavily on the results of this paper

    4. For lasertag_three_opponents_small, contrastive loss does not help nor hurt. We suspect that this is due to the task design, which does not require memory and thus yields a purely reactive policy.

      all content that points to important caveats and gotchas that I might consider when leaning too heavily on the results of this paper

    5. Although this is a standard transfer learning benchmark, we found that models that learn better relationships in the children's books did not necessarily perform better on the target tasks (which are very different: movie reviews etc).

      all content that points to important caveats and gotchas that I might consider when leaning too heavily on the results of this paper

    6. We found that more advanced sentence encoders did not significantly improve the results, which may be due to the simplicity of the transfer tasks (e.g., in MPQA most datapoints consist of one or a few words), and the fact that bag-of-words models usually perform well on many NLP tasks [48].

      all content that points to important caveats and gotchas that I might consider when leaning too heavily on the results of this paper

    7. It is important to note that the window size (maximum context size for the GRU) has a big impact on the performance, and longer segments would give better results. Our model had a maximum of 20480 timesteps to process, which is slightly longer than a second.

      all content that points to important caveats and gotchas that I might consider when leaning too heavily on the results of this paper

    8. Interestingly, CPCs capture both speaker identity and speech contents, as demonstrated by the good accuracies attained with a simple linear classifier, which also gets close to the oracle, fully supervised networks.

      please point only to the details of the most successful version of this system, especially in tables when there are many options, and also highlight sections that provide supporting context for these conditions, if appropriate

    9. Figure 6 shows that for 4 out of the 5 games performance of the agent improves significantly with the contrastive loss after training on 1 billion frames.

      please point only to the details of the most successful version of this system, especially in tables when there are many options, and also highlight sections that provide supporting context for these conditions, if appropriate

    10. Despite being relatively domain agnostic, CPCs improve upon state-of-the-art by 9% absolute in top-1 accuracy, and 4% absolute in top-5 accuracy.

      please point only to the details of the most successful version of this system, especially in tables when there are many options, and also highlight sections that provide supporting context for these conditions, if appropriate

    11. We also found that not all the information encoded is linearly accessible. When we used a single hidden layer instead, the accuracy increased from 64.6 to 72.5, which is closer to the accuracy of the fully supervised model.

      please point only to the details of the most successful version of this system, especially in tables when there are many options, and also highlight sections that provide supporting context for these conditions, if appropriate

    1. Provide your best guess for the following question, and describe how likely it is that your guess is correct as one of the following expressions: ${EXPRESSION_LIST}. Give ONLY the guess and your confidence, no other words or explanation. For example:\n\nGuess: <most likely guess, as short as possible; not a complete sentence, just the guess!>\nConfidence: <description of confidence, without any extra commentary whatsoever; just a short phrase!>\n\nThe question is: ${THE_QUESTION}

      please find the barebones practical information i need to implement this system or strategy

    2. Provide your ${k} best guesses and the probability that each is correct (0.0 to 1.0) for the following question. Give ONLY the guesses and probabilities, no other words or explanation. For example:\n\nG1: <first most likely guess, as short as possible; not a complete sentence, just the guess!>\n\nP1: <the probability between 0.0 and 1.0 that G1 is correct, without any extra commentary whatsoever; just the probability!>

      please find the barebones practical information i need to implement this system or strategy

    3. Each linguistic likelihood expression is mapped to a probability using responses from a human survey on social media with 123 respondents (Fagen-Ulmschneider, 2023). Ling. 1S-opt. uses a held out set of calibration questions and answers to compute the average accuracy for each likelihood expression, using these 'optimized' values instead.

      please find the barebones practical information i need to implement this system or strategy
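The mapping above can be sketched as a simple lookup table. The expressions and probabilities below are illustrative placeholders, not the actual values from the Fagen-Ulmschneider (2023) survey or from the paper's held-out calibration set:

```python
# Illustrative mapping from linguistic likelihood expressions to probabilities.
# NOTE: these values are placeholder assumptions for the sketch; the real
# numbers would come from the human survey or the 'optimized' calibration set.
EXPRESSION_TO_PROB = {
    "almost no chance": 0.05,
    "unlikely": 0.25,
    "about even": 0.50,
    "likely": 0.75,
    "almost certain": 0.95,
}

def confidence_from_expression(expression: str) -> float:
    """Map a verbalized likelihood expression to a numeric confidence."""
    return EXPRESSION_TO_PROB[expression.strip().lower()]
```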

    4. Finally, our study is limited to short-form question-answering; future work should extend this analysis to longer-form generation settings.

      all content that points to important caveats and gotchas that I might consider when leaning too heavily on the results of this paper

    5. While our work demonstrates a promising new approach to generating calibrated confidences through verbalization, there are limitations that could be addressed in future work. First, our experiments are focused on factual recall-oriented problems, and the extent to which our observations would hold for reasoning-heavy settings is an interesting open question.

      all content that points to important caveats and gotchas that I might consider when leaning too heavily on the results of this paper

    6. the 1-stage and 2-stage verbalized numerical confidence prompts sometimes differ drastically in the calibration of their confidences. How can we reduce the sensitivity of a model's calibration to the prompt?

      all content that points to important caveats and gotchas that I might consider when leaning too heavily on the results of this paper

    7. Provide your best guess and the probability that it is correct (0.0 to 1.0) for the following question. Give ONLY the guess and probability, no other words or explanation. For example:\n\nGuess: <most likely guess, as short as possible; not a complete sentence, just the guess!>\n Probability: <the probability between 0.0 and 1.0 that your guess is correct, without any extra commentary whatsoever; just the probability!>\n\nThe question is: ${THE_QUESTION}

      please find the barebones practical information i need to implement this system or strategy
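A minimal sketch of how the quoted prompt could be assembled and its reply parsed. The template text mirrors the annotation; the helper names and the regex parsing are my own assumptions, not from the paper:

```python
import re

# Template mirroring the quoted 1-stage numerical prompt; the question is
# substituted in for ${THE_QUESTION}.
PROMPT_TEMPLATE = (
    "Provide your best guess and the probability that it is correct (0.0 to 1.0) "
    "for the following question. Give ONLY the guess and probability, no other "
    "words or explanation.\n\nThe question is: {question}"
)

def build_prompt(question: str) -> str:
    """Fill the question into the verbalized-confidence prompt."""
    return PROMPT_TEMPLATE.format(question=question)

def parse_response(text: str):
    """Extract the guess and numeric confidence from a 'Guess:/Probability:' reply."""
    guess = re.search(r"Guess:\s*(.+)", text).group(1).strip()
    prob = float(re.search(r"Probability:\s*([01](?:\.\d+)?)", text).group(1))
    return guess, prob
```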

    8. Provide your best guess for the following question, and describe how likely it is that your guess is correct as one of the following expressions: ${EXPRESSION_LIST}. Give ONLY the guess and your confidence, no other words or explanation.

      please find the barebones practical information i need to implement this system or strategy

    9. To fit the temperature that is used to compute ECE-t and BS-t we split our total data into 5 folds. For each fold, we use it once to fit a temperature and evaluate metrics on the remaining folds. We find that fitting the temperature on 20% of the data yields relatively stable temperatures across folds.

      please find the barebones practical information i need to implement this system or strategy
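The fold-based temperature fitting described above can be sketched roughly as follows. This is an illustrative grid-search implementation under my own assumptions (logit-space scaling, NLL objective), not the authors' code:

```python
import numpy as np

def _scale(p, T):
    # Temperature-scale probabilities in logit space: sigmoid(logit(p) / T).
    p = np.clip(p, 1e-6, 1 - 1e-6)
    z = np.log(p / (1 - p)) / T
    return 1.0 / (1.0 + np.exp(-z))

def _nll(p, y):
    # Negative log-likelihood of binary correctness labels y under confidences p.
    p = np.clip(p, 1e-6, 1 - 1e-6)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def fit_temperature(confs, correct, grid=np.linspace(0.1, 10.0, 200)):
    """Pick the temperature minimizing NLL on the given calibration fold."""
    return min(grid, key=lambda T: _nll(_scale(confs, T), correct))

def crossval_temperatures(confs, correct, n_folds=5, seed=0):
    """Fit a temperature on each of n_folds folds (each ~20% of the data),
    so the remaining folds can be used for ECE-t / BS-t evaluation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(confs))
    folds = np.array_split(idx, n_folds)
    return [fit_temperature(confs[f], correct[f]) for f in folds]
```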

    10. Additionally, the lack of technical details available for many state-of-the-art closed RLHF-LMs may limit our ability to understand what factors enable a model to verbalize well-calibrated confidences and differences in this ability across different models.

      all content that points to important caveats and gotchas that I might consider when leaning too heavily on the results of this paper

    11. With Llama2-70B-Chat, verbalized calibration provides improvement over conditional probabilities across some metrics, but the improvement is much less consistent compared to GPT-* and Claude-*.

      all content that points to important caveats and gotchas that I might consider when leaning too heavily on the results of this paper

    12. The verbal calibration of the open source model Llama-2-70b-chat is generally weaker than that of closed source models but still demonstrates improvement over its conditional probabilities by some metrics, and does so most clearly on TruthfulQA.

      all content that points to important caveats and gotchas that I might consider when leaning too heavily on the results of this paper

    13. Among the methods for verbalizing probabilities directly, we observe that generating and evaluating multiple hypotheses improves calibration (see Figure 1), similarly to humans (Lord et al., 1985), and corroborating a similar finding in LMs (Kadavath et al., 2022).

      please point only to the details of the most successful version of this system, especially in tables when there are many options, and also highlight sections that provide supporting context for these conditions, if appropriate

    1. the psychology research community has been strongly questioning the value of NHST in psychology for some years now [6] and calling for a more meaningful reporting of statistical inference based on effect sizes, confidence intervals and Bayesian reasoning [9].

      Mentioning the problems with p-values

    2. Similarly, if the significance level is set at 0.05, then this is the probability of the data occurring by chance when there is no experimental effect, namely one in twenty times. The more tests that are done on a particular dataset, the more likely it is that some chance variation will be extreme enough to appear significant.

      Mentioning the problems with p-values
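The "one in twenty" point above has a direct arithmetic consequence: with independent tests at alpha = 0.05, the chance of at least one spurious "significant" result is 1 - (1 - alpha)^n, which grows quickly with n. A one-line illustration:

```python
def familywise_error(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

# e.g. 20 tests at alpha = 0.05 give roughly a 64% chance of a spurious hit
```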

    3. Violation of the assumptions of any statistical test can produce p values that bear little relation to the actual probabilities of outcomes and hence comparison to the significance level of 0.05 is meaningless.

      Mentioning the problems with p-values

    4. for an analysis to be sound, it is necessary that in the tests performed the probabilities of outcomes are accurately reflected in the p values produced by the tests. If this is not the case, then the NHST argument form is severely weakened.

      Mentioning the problems with p-values

    5. NHST is the most commonly encountered form of statistical inference and is what is usually associated with producing a null hypothesis, then testing it to give some statistic such as a t value, and then turning the statistic into a p value.

      Mentioning the problems with p-values

    1. The inclusion of counterfactuals often resulted in a substantial increase in precision, indicating that the models were better able to correctly classify relevant instances while reducing false positives.

      statements that draw general conclusions about humans, computers, and/or human-computer interaction based on the results of the specific experiment done in the paper.

    2. Mocha addresses two seemingly contradictory objectives: (1) generating labeled data that diversifies the training dataset to aid the model's learning, and (2) maintaining structural consistency across the batches of data presented to users to support their cognitive processes.

      statements that draw general conclusions about humans, computers, and/or human-computer interaction based on the results of the specific experiment done in the paper.

    3. The results of our study indicate that participants spent significantly less time annotating batches of counterfactuals when they were rendered according to SAT compared to other conditions i.e., supporting the participants' selective focus on the varying phrases, rather than phrases that stay consistent.

      statements that draw general conclusions about humans, computers, and/or human-computer interaction based on the results of the specific experiment done in the paper.

    4. From a cognitive perspective, the theme color aligns with the human's (theorized) structural mapping engine [27] by making relational discrepancies between the original and counterfactual examples more explicit.

      return any single sentence that describes an explicit or implicit connection to theory

    5. The last two prior works also combine Variation Theory (VT) and SAT, as we did (i.e., a corollary of SAT referred to as Analogical Transfer/Learning Theory).

      return any single sentence that describes an explicit or implicit connection to theory

    6. Estes and Hasson [17] argue that while alignable differences can be more straightforward and easier for comparison, non-alignable differences can also provide key information that might otherwise remain overlooked.

      return any single sentence that describes an explicit or implicit connection to theory

    7. This symbiotic relationship stems from the fact that Structural Alignment Theory (SAT) enhances the salience of differences, while the way we used Variation Theory (VT) to generate contradicting examples across the boundaries of labels ensures that these differences are conceptually informative.

      return any single sentence that describes an explicit or implicit connection to theory

    8. Structural Alignment Theory states that humans naturally look for structural mapping between representations of objects to help them understand, compare, and infer relationships between said objects.

      return any single sentence that describes an explicit or implicit connection to theory

    9. According to Variation Theory, learners better understand concepts by observing variations along critical features (dimensions of variation) that define that concept and, separately, observing variations along superficial features that do not define that concept—all while other features, when possible, are held constant.

      return any single sentence that describes an explicit or implicit connection to theory

    10. Mocha exemplified the application of human cognition and concept learning theories in the interactive machine learning pipeline to support the negotiation of conceptual boundaries for bi-directional human-AI alignment.

      statements that draw general conclusions about humans, computers, and/or human-computer interaction based on the results of the specific experiment done in the paper.

    11. This pattern of selective attention suggests that the visual cues provided by Mocha effectively guided participants to focus on more relevant information within the context of unchanged text when making their labeling decisions.

      statements that draw general conclusions about humans, computers, and/or human-computer interaction based on the results of the specific experiment done in the paper.

    12. Overall, the incorporation of counterfactuals has generally improved the models' F1 scores, driven largely by the improvements in precision. This suggests that counterfactuals have effectively improved performance without necessitating a significant trade-off between precision and recall.

      statements that draw general conclusions about humans, computers, and/or human-computer interaction based on the results of the specific experiment done in the paper.

    13. The inclusion of counterfactuals often resulted in a substantial increase in precision, indicating that the models were better able to correctly classify relevant instances while reducing false positives. This improvement suggests that the counterfactuals provided essential information that helped refine the models' decision boundaries.

      statements that draw general conclusions about humans, computers, and/or human-computer interaction based on the results of the specific experiment done in the paper.

    14. By visualizing these consistent pattern rules, users may better understand the behavior of the model through inference projection [26]. This not only boosts the model's performance but also enables participants to validate or correct the model during the interactive training process.

      statements that draw general conclusions about humans, computers, and/or human-computer interaction based on the results of the specific experiment done in the paper.

    15. Thus, the integration of both theories enables users to efficiently process and compare variations, leading to more informed decisions and a clearer understanding of the model's behavior.

      statements that draw general conclusions about humans, computers, and/or human-computer interaction based on the results of the specific experiment done in the paper.

    16. By helping users see alignable differences, SAT-based rendering helps users focus on key variations that are essential to changing the data item's label, making it easier to interpret the effects of changes and their significance.

      statements that draw general conclusions about humans, computers, and/or human-computer interaction based on the results of the specific experiment done in the paper.

    17. We argue that these two theories form a symbiotic relationship (Fig. 6). Variation Theory provides the conceptual basis for generating structurally consistent differences, while Structural Alignment Theory (SAT) enhances the user's ability in recognizing and processing these differences.

      statements that draw general conclusions about humans, computers, and/or human-computer interaction based on the results of the specific experiment done in the paper.

    18. Participants were able to efficiently focus on key differences between the original and counterfactual examples, which facilitated more efficient annotations.

      statements that draw general conclusions about humans, computers, and/or human-computer interaction based on the results of the specific experiment done in the paper.

    19. The results from our user study suggest that both the participants and the model benefited from the Variation Theory (VT)-based counterfactuals and Structural Alignment Theory (SAT)-based rendering.

      statements that draw general conclusions about humans, computers, and/or human-computer interaction based on the results of the specific experiment done in the paper.

    20. Variation Theory provides the conceptual basis for generating structurally consistent differences, while Structural Alignment Theory (SAT) enhances the user's ability in recognizing and processing these differences.

      return any single sentence that describes an explicit or implicit connection to theory

    21. This finding is consistent with previous work that supports users' sense-making of text, e.g., by modulating text saliency. Specifically, Gu et al. [32] and Gero et al. [29] both found improved reading efficiency and comprehension with saliency-modulating text renderings.

      any single sentence that compares and contrasts this work with prior work.

    22. In decision making, SAT argues that people tend to focus on alignable differences—features that can be directly compared—rather than on differences that cannot be easily aligned.

      return any single sentence that describes an explicit or implicit connection to theory

    23. Specifically, we use Variation Theory of learning [44] which states that for learning to occur, some aspects that define the concept being learned must vary while others are held constant.

      return any single sentence that describes an explicit or implicit connection to theory

    24. According to SAT, humans compare two similar entities by trying to find structural alignments between them, and then comparing corresponding elements, with a special focus on differing aligned elements.

      return any single sentence that describes an explicit or implicit connection to theory

    25. VT posits that human learning occurs when learners experience variation across critical and superficial aspects of a concept—through exposure to contrasting examples that systematically vary along different critical and superficial feature dimensions.

      return any single sentence that describes an explicit or implicit connection to theory

    26. To analyze the annotation efficiency, we first conducted a Kruskal-Wallis rank sum test [39] to determine if there were statistically significant differences in annotation time across the three conditions, because our data violated the homogeneity of variances assumption, making non-parametric methods more appropriate.

      return any single sentence that describes data analysis done on data collected by the authors when running human subjects experiments.

    1. Taken together, these findings almost unanimously show that, on average, AI-supported writing decreases but does not eliminate writers' feelings of ownership, underscoring the need for a larger theory of AI participation in the creative process.

      sentence that refers to a theory

    2. This can be understood through the frame of precarious work [5]; as writers feel that their work is increasingly precarious, the power differential between themselves and the organizations seeking to train LLMs grows larger.

      sentence that refers to a theory

    1. The study concluded with a 15-minute semi-structured interview. During the interview, participants saw screenshots from the three conditions and were asked which they preferred and disliked, why, what they wished the interface had, what influenced their skimming, and how they normally skimmed texts.

      sentence describing any interview procedures

    2. We used these mock-ups as design probes [31] to inspire ideation and elicit creative responses. Specifically, we asked participants to compare and contrast alternative mock-ups and reflect on how they could be used or improved to support their known or emerging synthesis and information-foraging goals.

      sentence describing any interview procedures

    3. In the first part of the session, we asked participants about their strategies for selecting publication venues for their manuscript submissions, how they identify and synthesize information from venues, their approaches to writing manuscripts, and finally, the technology they have used to help with these processes, current technology shortcomings, and ideas for addressing these challenges.

      sentence describing any interview procedures

    4. The interview sessions were divided into two parts: an open-ended semi-structured interview about their backgrounds and practices, followed by feedback on a range of mock-ups, including novel reified relationships between analogous sentences in different abstracts (Figure 2).

      sentence describing any interview procedures

    5. In order to determine (1) the context in which we might offer novel views of scientific abstracts and (2) the intelligibility of various novel prototype designs for reifying cross-abstract relationships, we conducted a formative interview study with 12 active researchers (see Appendix A for participant information).

      sentence describing any interview procedures

    6. pre-computing and reifying cross-document analogous relationships make it psychologically possible for users to engage—if they are willing to be guided by it. (Lower NFC users are more likely to fall into this category.)

      statements that draw general conclusions about humans, computers, and/or human-computer interaction based on the results of the specific experiment done in the paper.

    7. Lower NFC participants were generally guided by emergent visual patterns created by the interactions between features, especially blocks of color spanning multiple sentences created when all three features are turned on.

      statements that draw general conclusions about humans, computers, and/or human-computer interaction based on the results of the specific experiment done in the paper.

    8. Dialectical activities cannot be done on a user's behalf by AI; with variation affordances, AI is supporting the user's engagement with the data themselves.

      statements that draw general conclusions about humans, computers, and/or human-computer interaction based on the results of the specific experiment done in the paper.

    9. In this sense, AbstractExplorer enables dialectical activities that users may otherwise have found to be too tedious or difficult to engage with.

      statements that draw general conclusions about humans, computers, and/or human-computer interaction based on the results of the specific experiment done in the paper.

    10. Our work demonstrates that designs informed by Structure-Mapping Theory can support users in navigating, making use of, and engaging with variation present in information.

      statements that draw general conclusions about humans, computers, and/or human-computer interaction based on the results of the specific experiment done in the paper.

    11. We posit that our approach can generalize to other domains such as journalism, code synthesis, and social media analytics where visual alignment of text can enable meaningful comparisons of underlying patterns to identify relational clarity.

      statements that draw general conclusions about humans, computers, and/or human-computer interaction based on the results of the specific experiment done in the paper.

    12. We demonstrate how slicing sentences according to roles and visually aligning them can help readers perceive cross-document relationships in a coherent manner.

      statements that draw general conclusions about humans, computers, and/or human-computer interaction based on the results of the specific experiment done in the paper.

    13. In this work, we introduce a new paradigm for exploring a large corpus of small documents by identifying roles at the phrasal and sentence levels, then slice on, reify, group, and/or align the text itself on those roles, with sentences left intact.

      statements that draw general conclusions about humans, computers, and/or human-computer interaction based on the results of the specific experiment done in the paper.

    14. Like prior Structural Mapping Theory (SMT)-informed work in text corpora representation, AbstractExplorer's features have enabled some users to see more of both the overview and the details at the same time, facilitating abstraction without losing context.

      statements that draw general conclusions about humans, computers, and/or human-computer interaction based on the results of the specific experiment done in the paper.

    15. Interviews were video and audio recorded. We transcribed the audio using OpenAI's Whisper automatic speech recognition system and anonymized the transcript before analysis. We analyzed the interview data using thematic analysis [1]. First, two members of the research team independently coded four (25% of collected data) randomly chosen participant data to generate low-level codes. The inter-coder reliability between the coders was 0.88 using Krippendorff's alpha [37]. The two coders then met together to cross-check, resolve coding conflicts, and consolidate the codes into a codebook across two sessions. Using the codebook, the two coders analyzed six randomly selected participant data each. The research team then met, discussed the analysis outcomes, and finalized themes over three sessions.

      sentence describing how analysis was performed on data collected by the authors of this paper

    16. Our work demonstrates that designs informed by Structure-Mapping Theory can support users in navigating, making use of, and engaging with variation present in information. In this sense, AbstractExplorer enables dialectical activities that users may otherwise have found to be too tedious or difficult to engage with.

      any sentence that describes explicit design implications

    17. In this work, we introduce a new paradigm for exploring a large corpus of small documents by identifying roles at the phrasal and sentence levels, then slice on, reify, group, and/or align the text itself on those roles, with sentences left intact.

      any sentence that describes explicit design implications

    18. Future work could explore more seamless ways of preserving context, such as allowing users to navigate through every sentence of an abstract directly within the Cross-Sentence Relationship pane, fostering a more cohesive understanding of the content.

      any sentence that describes explicit design implications

    19. We posit that our approach can generalize to other domains such as journalism, code synthesis, and social media analytics where visual alignment of text can enable meaningful comparisons of underlying patterns to identify relational clarity.

      any sentence that describes explicit design implications

    1. IRK was supported by funding from the Prins Bernhard Cultuurfonds (The Netherlands). This project was also funded by a Canadian Social Sciences and Humanities Research Council Insight Grant (435-2021-0224), a Social Sciences and Humanities Research Council Partnership Grant (895-2018-1023), and a Canada Research Chair (950-231872) to SMc.

      reference to Montreal the city or any institution or author based there

    2. Part of this research was presented at the Society for Music Perception and Cognition Conference, Portland, Oregon (2022). The authors would like to thank Bennett K. Smith for programming the experimental interface and assisting with the experiment execution on Prolific, and Philippe Macnab-Seguin for creating the chromatic scales for the second experiment.

      reference to Montreal the city or any institution or author based there

    3. Grimaud and Eerola (2022) compared instrument ensembles of strings, woodwinds, and brass in a study where participants either rated the emotions they perceived or manipulated musical parameters to produce a certain emotion. They found that strings were associated with increased anger and fear, woodwinds with decreased anger and fear, and brass with decreased fear, in the cases of both emotion perception and production. For the other emotions (joy, sadness, calmness, power, surprise), however, results were less consistent between perception and production, indicating that the emotion-instrument association may also depend on context of the task.

      makes an explicit connection between a music theory concept and cognition

    4. This research follows a constructionist approach to musical affect (Cespedes-Guevara & Eerola, 2018). That is, although we are interested in the "bottom-up" influence of certain musical features on musical affect, we believe these cannot be adequately evaluated without considering the "top-down" effects of context and individual differences that are present when affects are constructed. The perception or induction of affect does not merely arise in response to a stimulus but is also formed in relation to the individual and the context.

      makes an explicit connection between a music theory concept and cognition


    1. Cognitive surrender. A paper that came out this year asked: if you’re working with AI a lot, and you’re using it as a machine to answer all of your questions, what happens with System 1 and System 2?

      Cognitive surrender: what happens to System 1 and System 2 if you offload to AI to get any answers? (Is this diff from other cognitive tools, like writing and Plato's rejection of it?)

      The paper is https://doi.org/10.31234/osf.io/yk25n_v1 and it posits AI offloading as System 3. That is an interesting perspective. Thinking—Fast, Slow, and Artificial: How AI is Reshaping Human Reasoning and the Rise of Cognitive Surrender by Shaw and Nave, 2026 (in Zotero).

    1. 2. Validator. Another basic role for AI is validating your understanding. To do this, you ask it to review your notes for errors or gaps, do basic fact checking, or critique your reasoning. Again, you can do this via the chat interface, but I also experimented with passing my notes in Obsidian using the Copilot plugin and in Emacs using gptel. Example: After reading The Epic of Gilgamesh, I wrote a note in Obsidian summarizing its plot. When I asked ChatGPT to critique my summary, it pointed out that I’d given the central character a redemption arc that isn’t present in the text. I’m so accustomed to the standard hero’s journey, that I projected it onto the book — and an LLM helped me correct this ‘hallucination.’ Suggested prompt: Here are my notes on [WORK]. What important ideas did I miss or underemphasize? Don’t rewrite my notes — just flag the gaps.

      Role 2: validator of one's understanding, also seen as basic. Might be a good complement to e.g. turning some of my notes into [[Anki]] card decks, or combining them in another way with spaced repetition. [[Spaced repetition 20201012201559]] [[Connecting my PKM to Anki]]

  2. Mar 2026
    1. For the record, my posts aren’t written or conceived with an LLM, although I know an increasing number of people who use one to write a first draft and then edit. I’m not a fan. The whole point of the web — its beauty — is that it’s unrelentingly human and diverse.

      A good case for disfavoring the use of AI/LLMs to write first drafts of blog posts. Implicit I believe is a distinction between using external tools to edit/proofread a human-written draft vs editing/proofreading a machine draft (granting I do not use these tools for either). Related to points I raised in Re; On AI in response to: A Positive Technologist Identity (2/4).

    1. Although there are many idiosyncrasies in what may trigger a person with misophonia, the most common triggers are created by other humans, such as the sound of someone chewing, clearing their throat, tapping their foot, or typing on a keyboard.

      any sentences referring to misophonia verbatim

    2. an fMRI study found that people with misophonia show increased response in the anterior insular cortex (AIC) in response to misophonic sounds, compared to control participants and other unpleasant or neutral sounds (Kumar et al., 2017).

      any sentences referring to misophonia verbatim

    3. Both the subjective judgment of aversiveness and the physiological measure of skin conductance response (SCR) increase when people with misophonia are presented with triggers (Edelstein et al., 2013).

      any sentences referring to misophonia verbatim

    4. The disorder is not yet recognized by the Diagnostic and Statistical Manual − 5th version (DSM-5; American Psychiatric Association, 2013), but there has been an increasing amount of research on the characterization and treatment of misophonia (Vitoratou et al., 2021; see also Brout et al., 2018, for a review).

      any sentences referring to misophonia verbatim

    1. Composers and music researchers had previously analyzed and annotated 65 movements from the Classical, Romantic, and early Modern repertoire in terms of the Taxonomy of Orchestral Grouping Effects (McAdams et al., 2022).

      please find any claims that depend on citations referring to works by any of the present authors

    2. These results confirm with orchestral excerpts the findings of studies on isolated tones with dyads or triads of instruments in which the presence of impulsive instruments reduces the perception of blend (Lembke et al., 2019; Reuter, 1996; Tardieu & McAdams, 2012).

      please find any claims that depend on citations referring to works by any of the present authors

    3. structuring by affecting sequential grouping through the segregation of auditory streams played by different instruments and segmental grouping through timbral contrasts (McAdams et al., 2022).

      please find any claims that depend on citations referring to works by any of the present authors

    4. Several other spectral and spectrotemporal descriptors were found to play a role in blend perception in orchestral works by Fischer et al. (2021). These include spectral flatness and spectral crest (different measures of the degree to which the spectrum is denser or has more emergence of spectral components), and spectral variation (the degree of variation of the spectral shape over time).

      please find any claims that depend on citations referring to works by any of the present authors

    5. Fischer et al. (2021) studied the blends of multi-instrument streams in the context of orchestral stream segregation in predominantly Romantic orchestral excerpts. They found that within-family instrument combinations blended better than between-family combinations. They demonstrated the role played by overlap in timbre correlates of spectral flatness (a measure of the tonalness/noisiness or density of the spectrum), spectral skewness (related to the shape of the spectral envelope), and spectral variation (evolution of the spectral envelope over time), as well as cues derived from the scores such as onset synchrony and the consonance of concurrent pitch relations.

      please find any claims that depend on citations referring to works by any of the present authors

    1. When the sudden drop to a pianissimo occurred towards the ending of the piece, the perceived arousal responses of CHM and WM dropped slightly but rose again immediately to end on a high arousal. These two groups of listeners appear to have anticipated a return to a loud and majestic close and therefore kept their arousal responses higher than those of the NM.

      please highlight anything related to music performance practice

    2. CHM, who are more experienced with the instruments and compositional techniques used in Chinese orchestral music, might have had an idea of which features figure more prominently in the communication of particular intentions, and therefore would have more information available for their judgments.

      please highlight anything related to music performance practice

    3. The perception of affective intentions in music is influenced by the degree of familiarity listeners have with a musical tradition, the content implicated in the music, and the complex sonic environment created by the composer's creation and the musicians' interpretation.

      please highlight anything related to music performance practice

    4. Iqa' (plural iqa'at) is used to describe a rhythmic cycle. Iqa'at are made up of two different basic building blocks, the dum and tak, onomatopoeias derived from the sound produced on membranophones such as the darabuka.

      please highlight anything related to music theory

    5. H5. Being more culturally bound, musical cues that are learned, such as modal structures, metrical relations, and so on, will exert a greater influence on listeners' perceived valence ratings than on their arousal ratings.

      please highlight anything related to music theory

    1. We also ran evaluations of model latency and classification performance under varying false positive rates for the following LLMs by OpenAI: GPT-4o, GPT-4o-mini, and o3-mini.

      sentences describing methods the authors used; one sentence at a time

    2. We ensured each list was 30 items long as our pilot studies suggested this was long enough that manual detection starts to become unwieldy (users need to scroll up and down the document), but short enough that participants could become familiar in a short period.

      sentences describing methods the authors used; one sentence at a time

    3. We adapted two intent specifications from our evals: Mars Game Design Document and Financial Advice AI Agent Memory, as these tasks mapped to the two paradigmatic types covered in Sections 2 and 2.1 (design documents, and AI memory of the user).

      sentences describing methods the authors used; one sentence at a time

    4. We chose OpenAI's ChatGPT Canvas as a baseline for five reasons: (i) it is a popular, commercially available tool, hence it is likely familiar to users; (ii) it provides a document editing view, where users can select text and ask GPT to rewrite it, or chat with an AI to make global edits; (iii) it employs a similar class of model (GPT-4o); (iv) it supports similar editing features as SemanticCommit like inline text selection, conflict highlighting, and a diff view, while adding free-form editing; and (v) similar interfaces like Anthropic Artifacts tended to rewrite the specification entirely, and did not offer Canvas's "diff" view to allow for a fair comparison.

      sentences describing methods the authors used; one sentence at a time