4,425 Matching Annotations
  1. Mar 2026
    1. Our work demonstrates that designs informed by Structure-Mapping Theory can support users in navigating, making use of, and engaging with variation present in information. In this sense, AbstractExplorer enables dialectical activities that users may otherwise have found to be too tedious or difficult to engage with.

      any sentence that describes explicit design implications

    2. In this work, we introduce a new paradigm for exploring a large corpus of small documents by identifying roles at the phrasal and sentence levels, then slicing on, reifying, grouping, and/or aligning the text itself on those roles, with sentences left intact.

      any sentence that describes explicit design implications

    3. Future work could explore more seamless ways of preserving context, such as allowing users to navigate through every sentence of an abstract directly within the Cross-Sentence Relationship pane, fostering a more cohesive understanding of the content.

      any sentence that describes explicit design implications

    4. We posit that our approach can generalize to other domains such as journalism, code synthesis, and social media analytics where visual alignment of text can enable meaningful comparisons of underlying patterns to identify relational clarity.

      any sentence that describes explicit design implications

    5. We consider common sequences of chunk roles to be alignable structures that could be used to support users in identifying structural similarities and differences across sentences in different abstracts, in line with Structure-Mapping Theory [17].

      sentences that mention theory, explicitly or implicitly; one sentence at a time

    6. Like prior Structural Mapping Theory (SMT)-informed work in text corpora representation, AbstractExplorer's features have enabled some users to see more of both the overview and the details at the same time, facilitating abstraction without losing context.

      sentences that mention theory, explicitly or implicitly; one sentence at a time

    7. This ordering prioritizes dominant structural patterns (largest groups first) while exposing fine-grained variations (via length-sorted triplets), mirroring how humans compare sentences, if SMT is an accurate description in this domain of comparative close reading.

      sentences that mention theory, explicitly or implicitly; one sentence at a time

    8. Structural mappings between objects are part of the cognitive process of comparison according to the Structure-Mapping Theory [17], and juxtaposition can facilitate humans in recognizing particular possible structural mappings between objects [75].

      sentences that mention theory, explicitly or implicitly; one sentence at a time

    9. In SMT terminology, rendering and arranging according to corresponding chunks reify "commonalities in structure," while variations within corresponding chunks are "alignable differences" that users are predicted to notice.

      sentences that mention theory, explicitly or implicitly; one sentence at a time

    10. The prior SMT-informed tools in Section 2.3 for both code and natural language corpora suggest that the cognitive process of comparing texts may be no exception to the cognitive processes SMT predicts.

      sentences that mention theory, explicitly or implicitly; one sentence at a time

    11. SMT posits that visual alignment helps people perceive relational similarities and differences more clearly, thereby improving their ability to make meaningful comparisons and understand underlying patterns [28, 38, 47].

      sentences that mention theory, explicitly or implicitly; one sentence at a time

    12. Structural Mapping Theory (SMT) is a long-standing well-vetted theory from Cognitive Science that describes how humans attend to and try to compare objects by finding mental representations of them that can be structurally mapped to each other (analogies).

      sentences that mention theory, explicitly or implicitly; one sentence at a time

    13. This SMT-informed approach, which AbstractExplorer shares, tries to give this mental machinery "a leg up," letting users perhaps skip some steps by accepting reified cross-document relationships identified by the computer.

      sentences that mention theory, explicitly or implicitly; one sentence at a time

    14. The human perceptual, comparative mental machinery that SMT describes is part of what enables humans to form more abstract structured mental models from concrete examples, among other critical knowledge tasks.

      sentences that mention theory, explicitly or implicitly; one sentence at a time

    15. These examples of text-centric lossless techniques do not abstract away or summarize; they strategically re-organize and re-render the existing text to help enhance readers' own perceptual cognition, informed by Structural Mapping Theory (SMT) [17].

      sentences that mention theory, explicitly or implicitly; one sentence at a time

    16. We process this data in a three-stage pipeline (Figure 6). In the first stage, Sentence Segmentation and Categorization, abstracts are split into individual sentences using the NLTK package, and each sentence is classified into one of the five pre-defined aspects as listed in Section 4.1.1. Classification is performed by prompting an LLM (see prompt used in Appendix D.1) with the sentence and its full abstract.

      sentence relating to methodology

    17. Then, we segment sentences within each aspect into grammar-preserving chunks (see prompt used in Appendix D.2). This results in grammatically coherent chunks that are the basis of structure patterns. After identifying chunk boundaries, we again prompt an LLM to generate labels for chunks in a human-in-the-loop approach: starting from an initial set of labels for chunk roles, when a new label is generated, a researcher from the research team examines the new label and merges it with existing labels if appropriate, controlling for the total number of labels.

      sentence relating to methodology

    18. In this study, we allowed participants to experience views of same-aspect sentences (Section 4.1.1) with different combinations of highlighting, ordering, and alignment (as described in Section 4.1.2 and Section 4.1.4) enabled or not, in order to understand which and/or what combinations most effectively supported users' ability to skim and read laterally across documents.

      sentence relating to methodology

    19. Inspired by GP-TSM [24], AbstractExplorer first segments sentences into grammar-preserving chunks—segments that respect grammatical boundaries, i.e., an LLM judges that the sentence can be truncated at that chunk boundary without breaking the grammatical integrity of the preceding text. Each chunk is then classified by an LLM as having one of nine pre-defined roles, each of which has its own assigned color.

      sentence relating to methodology

    20. We conducted a qualitative analysis of user study transcripts and survey responses using a Grounded Theory approach [8]. First, the lead researcher collected a list of participants' behaviors, approaches, reflections on their experience, and feedback about the interface. The researcher then systematically coded this data, revisiting the data multiple times and refining the codes to ensure consistency and coherence. Through this process, high-level themes were identified and organized using affinity diagramming. Once the thematic structure was finalized, the researcher gathered supporting evidence for each theme and synthesized the findings, which were reviewed by the research team to ensure agreement on the results.

      sentence describing how analysis was performed on data collected by the authors of this paper

    21. Activity log data, which revealed how participants actually used the interface, echoed the above findings. According to the log data, participants spent most of their reading time (66.31%) with vertical alignment on the second element in structure pairs, followed by alignment on the first element (29.19%), and left-justified alignment (5.13%). Highlighting usage showed a similar preference: 91.13% of time with all chunks highlighted, 8.25% with partial highlighting, and minimal time (0.63%) without highlights.

      sentence describing how analysis was performed on data collected by the authors of this paper

    22. In this section, we present findings on how AbstractExplorer supports comparative close reading at scale by integrating quantitative survey responses and log data with qualitative analysis of transcripts and open-ended responses. The qualitative analysis process is described in detail in Appendix H.

      sentence describing how analysis was performed on data collected by the authors of this paper

    23. Throughout the two tasks, we also collected detailed interaction logs including counts of user-defined aspects created, duration of highlighting usage, and time allocation across the three possible alignment options.

      sentence describing how analysis was performed on data collected by the authors of this paper

    24. Both gaze data and the semi-structured interviews revealed that lower NFC participants were more willing to be guided by the three features and took advantage of them consciously.

      sentence describing how analysis was performed on data collected by the authors of this paper

    25. Using a two-tailed Mann-Whitney U Test, we found that participants who reported their lowest perceived cognitive load when all three features were enabled had significantly lower NFC than participants who reported their lowest cognitive load level when skimming with no features enabled—in the baseline interface (p=0.03).

      sentence describing how analysis was performed on data collected by the authors of this paper

    26. For simplicity of analysis, we denote participants with NFC scores above the overall participants' median NFC of 5.42 (IQR = 0.583) as higher NFC, and lower NFC otherwise.

      sentence describing how analysis was performed on data collected by the authors of this paper

    27. To contrast participants' gaze patterns in each condition, we used a Tobii Pro Spark eye-tracker placed below the desktop monitor used by all subjects; Tobii Pro Lab software recorded each participant's gaze over time in each condition.

      sentence describing how analysis was performed on data collected by the authors of this paper

    28. We collected 80 sentences from our abstracts dataset labeled by our system as "Methodology/Contribution." Participants viewed the same 80 sentences in each condition—often with a different subset of sentences initially visible due to ordering changes—but only had two minutes to look at them in each condition.

      sentence describing how analysis was performed on data collected by the authors of this paper

    29. After obtaining an expanded set of high-level chunk labels, we assign them to each of the sentence chunks by using LLMs in a multiclass classification few-shot learning task, with the initial labels and assignment as examples (see prompt used in Appendix D.3).

      sentence describing how analysis was performed on data collected by the authors of this paper

    30. Then, we segment sentences within each aspect into grammar-preserving chunks (see prompt used in Appendix D.2). This results in grammatically coherent chunks that are the basis of structure patterns. After identifying chunk boundaries, we again prompt an LLM to generate labels for chunks in a human-in-the-loop approach: starting from an initial set of labels for chunk roles, when a new label is generated, a researcher from the research team examines the new label and merges it with existing labels if appropriate, controlling for the total number of labels.

      sentence describing how analysis was performed on data collected by the authors of this paper

    31. We process this data in a three-stage pipeline (Figure 6). In the first stage, Sentence Segmentation and Categorization, abstracts are split into individual sentences using the NLTK package, and each sentence is classified into one of the five pre-defined aspects as listed in Section 4.1.1. Classification is performed by prompting an LLM (see prompt used in Appendix D.1) with the sentence and its full abstract.

      sentence describing how analysis was performed on data collected by the authors of this paper
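The first pipeline stage quoted above (split each abstract into sentences, then classify each sentence into one pre-defined aspect with the full abstract as context) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the regex splitter is a dependency-free stand-in for the NLTK sentence tokenizer the paper mentions, `toy_classifier` is a hypothetical placeholder for the LLM prompt in Appendix D.1, and the aspect list in the usage example is illustrative only (the paper uses five aspects that are not listed in these highlights).

```python
import re
from typing import Callable, Dict, List

# Hypothetical classifier signature: (sentence, full_abstract, aspects) -> one aspect.
Classifier = Callable[[str, str, List[str]], str]


def split_sentences(abstract: str) -> List[str]:
    """Naive sentence splitter; a stand-in for NLTK's tokenizer (assumption)."""
    parts = re.split(r"(?<=[.!?])\s+", abstract.strip())
    return [p for p in parts if p]


def categorize(abstract: str, aspects: List[str], classify: Classifier) -> Dict[str, List[str]]:
    """Group the sentences of one abstract by the aspect the classifier assigns."""
    grouped: Dict[str, List[str]] = {aspect: [] for aspect in aspects}
    for sentence in split_sentences(abstract):
        # The classifier sees both the sentence and its full abstract.
        label = classify(sentence, abstract, aspects)
        if label not in grouped:
            raise ValueError(f"classifier returned unknown aspect: {label!r}")
        grouped[label].append(sentence)
    return grouped


def toy_classifier(sentence: str, abstract: str, aspects: List[str]) -> str:
    """Keyword rule standing in for the LLM call; purely illustrative."""
    return aspects[1] if "we " in sentence.lower() else aspects[0]


if __name__ == "__main__":
    example = (
        "Reading many abstracts is slow. "
        "We introduce a tool that aligns sentences. "
        "We evaluate it with users."
    )
    # Illustrative two-aspect list; the second label appears in the highlights above.
    result = categorize(example, ["Background", "Methodology/Contribution"], toy_classifier)
    for aspect, sentences in result.items():
        print(aspect, len(sentences))
```

Passing the classifier in as a callable keeps the sketch runnable without any model access; a real LLM-backed function with the same signature could be swapped in.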

    1. Mocha exemplified the application of human cognition and concept learning theories in the interactive machine learning pipeline to support the negotiation of conceptual boundaries for bi-directional human-AI alignment.

      statements that draw general conclusions about humans, computers, and/or human-computer interaction based on the results of the specific experiment done in the paper.

    2. This pattern of selective attention suggests that the visual cues provided by Mocha effectively guided participants to focus on more relevant information within the context of unchanged text when making their labeling decisions.

      statements that draw general conclusions about humans, computers, and/or human-computer interaction based on the results of the specific experiment done in the paper.

    3. Overall, the incorporation of counterfactuals has generally improved the models' F1 scores, driven largely by the improvements in precision. This suggests that counterfactuals have effectively improved performance without necessitating a significant trade-off between precision and recall.

      statements that draw general conclusions about humans, computers, and/or human-computer interaction based on the results of the specific experiment done in the paper.

    4. The inclusion of counterfactuals often resulted in a substantial increase in precision, indicating that the models were better able to correctly classify relevant instances while reducing false positives. This improvement suggests that the counterfactuals provided essential information that helped refine the models' decision boundaries.

      statements that draw general conclusions about humans, computers, and/or human-computer interaction based on the results of the specific experiment done in the paper.

    5. By visualizing these consistent pattern rules, users may better understand the behavior of the model through inference projection [26]. This can not only boost the model's performance but also enable participants to validate or correct the model during the interactive training process.

      statements that draw general conclusions about humans, computers, and/or human-computer interaction based on the results of the specific experiment done in the paper.

    6. Thus, the integration of both theories enables users to efficiently process and compare variations, leading to more informed decisions and a clearer understanding of the model's behavior.

      statements that draw general conclusions about humans, computers, and/or human-computer interaction based on the results of the specific experiment done in the paper.

    7. By helping users see alignable differences, SAT-based rendering helps users focus on key variations that are essential to changing the data item's label, making it easier to interpret the effects of changes and their significance.

      statements that draw general conclusions about humans, computers, and/or human-computer interaction based on the results of the specific experiment done in the paper.

    8. We argue that these two theories form a symbiotic relationship (Fig. 6). Variation Theory provides the conceptual basis for generating structurally consistent differences, while Structural Alignment Theory (SAT) enhances the user's ability to recognize and process these differences.

      statements that draw general conclusions about humans, computers, and/or human-computer interaction based on the results of the specific experiment done in the paper.

    9. Participants were able to efficiently focus on key differences between the original and counterfactual examples, which facilitated more efficient annotations.

      statements that draw general conclusions about humans, computers, and/or human-computer interaction based on the results of the specific experiment done in the paper.

    10. The results from our user study suggest that both the participants and the model benefited from the Variation Theory (VT)-based counterfactuals and Structural Alignment Theory (SAT)-based rendering.

      statements that draw general conclusions about humans, computers, and/or human-computer interaction based on the results of the specific experiment done in the paper.

    11. Variation Theory provides the conceptual basis for generating structurally consistent differences, while Structural Alignment Theory (SAT) enhances the user's ability to recognize and process these differences.

      return any single sentence that describes an explicit or implicit connection to theory

    12. This finding is consistent with previous work that supports users' sense-making of text, e.g., by modulating text saliency. Specifically, Gu et al. [32] and Gero et al. [29] both found improved reading efficiency and comprehension with saliency-modulating text renderings.

      any single sentence that compares and contrasts this work with prior work.

    13. In decision making, SAT argues that people tend to focus on alignable differences—features that can be directly compared—rather than on differences that cannot be easily aligned.

      return any single sentence that describes an explicit or implicit connection to theory

    14. Specifically, we use Variation Theory of learning [44] which states that for learning to occur, some aspects that define the concept being learned must vary while others are held constant.

      return any single sentence that describes an explicit or implicit connection to theory

    15. According to SAT, humans compare two similar entities by trying to find structural alignments between them, and then comparing corresponding elements, with a special focus on differing aligned elements.

      return any single sentence that describes an explicit or implicit connection to theory

    16. VT posits that human learning occurs when learners experience variation across critical and superficial aspects of a concept—through exposure to contrasting examples that systematically vary along different critical and superficial feature dimensions.

      return any single sentence that describes an explicit or implicit connection to theory

    17. To analyze the annotation efficiency, we first conducted a Kruskal-Wallis rank sum test [39] to determine if there were statistically significant differences in annotation time across the three conditions, because our data violated the homogeneity of variances assumption, making non-parametric methods more appropriate.

      return any single sentence that describes data analysis done on data collected by the authors when running human subjects experiments.
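The Kruskal-Wallis rank sum test named in the last highlight compares three or more independent groups using ranks instead of raw values, which is why it does not rely on equal variances. A minimal sketch of the H statistic follows. It assumes no tie correction, and the annotation times in the usage example are invented for illustration rather than taken from the study. In practice, `scipy.stats.kruskal` computes the tie-corrected statistic together with a p-value.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Kruskal-Wallis H statistic without tie correction (simplifying assumption)."""
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted_sum = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted_sum += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n * (n + 1)) * weighted_sum - 3 * (n + 1)


if __name__ == "__main__":
    # Hypothetical annotation times (seconds) for three conditions.
    times = [[31.0, 28.5, 40.2], [22.1, 25.3, 19.8], [35.4, 44.0, 38.7]]
    print(kruskal_wallis_h(times))
```

Under the null hypothesis, H is compared against a chi-squared distribution with (number of groups - 1) degrees of freedom, so three conditions give two degrees of freedom.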

    1. IRK was supported by funding from the Prins Bernhard Cultuurfonds (The Netherlands). This project was also funded by a Canadian Social Sciences and Humanities Research Council Insight Grant (435-2021-0224), a Social Sciences and Humanities Research Council Partnership Grant (895-2018-1023), and a Canada Research Chair (950-231872) to SMc.

      reference to Montreal the city or any institution or author based there

    2. Part of this research was presented at the Society for Music Perception and Cognition Conference, Portland, Oregon (2022). The authors would like to thank Bennett K. Smith for programming the experimental interface and assisting with the experiment execution on Prolific, and Philippe Macnab-Seguin for creating the chromatic scales for the second experiment.

      reference to Montreal the city or any institution or author based there

    3. Grimaud and Eerola (2022) compared instrument ensembles of strings, woodwinds, and brass in a study where participants either rated the emotions they perceived or manipulated musical parameters to produce a certain emotion. They found that strings were associated with increased anger and fear, woodwinds with decreased anger and fear, and brass with decreased fear, in the cases of both emotion perception and production. For the other emotions (joy, sadness, calmness, power, surprise), however, results were less consistent between perception and production, indicating that the emotion-instrument association may also depend on context of the task.

      makes an explicit connection between a music theory concept and cognition

    4. This research follows a constructionist approach to musical affect (Cespedes-Guevara & Eerola, 2018). That is, although we are interested in the "bottom-up" influence of certain musical features on musical affect, we believe these cannot be adequately evaluated without considering the "top-down" effects of context and individual differences that are present when affects are constructed. The perception or induction of affect does not merely arise in response to a stimulus but is also formed in relation to the individual and the context.

      makes an explicit connection between a music theory concept and cognition

    1. Cognitive surrender. A paper that came out this year asked: if you’re working with AI a lot, and you’re using it as a machine to answer all of your questions, what happens with System 1 and System 2?

      Cognitive surrender: what happens to System 1 and System 2 if you offload to AI to get any answers? (Is this diff from other cognitive tools, like writing and Plato's rejection of it?)

      The paper is https://doi.org/10.31234/osf.io/yk25n_v1 and it posits AI offloading as System 3. That is an interesting perspective. Thinking—Fast, Slow, and Artificial: How AI is Reshaping Human Reasoning and the Rise of Cognitive Surrender by Shaw and Nave, 2026. Thinking—Fast, Slow, and Artificial: How AI is Reshaping Human Reasoning and the Rise of Cognitive Surrender in Zotero

    1. 2. Validator: Another basic role for AI is validating your understanding. To do this, you ask it to review your notes for errors or gaps, do basic fact checking, or critique your reasoning. Again, you can do this via the chat interface, but I also experimented with passing my notes in Obsidian using the Copilot plugin and in Emacs using gptel. Example: After reading The Epic of Gilgamesh, I wrote a note in Obsidian summarizing its plot. When I asked ChatGPT to critique my summary, it pointed out that I’d given the central character a redemption arc that isn’t present in the text. I’m so accustomed to the standard hero’s journey that I projected it onto the book, and an LLM helped me correct this ‘hallucination.’ Suggested prompt: Here are my notes on [WORK]. What important ideas did I miss or underemphasize? Don’t rewrite my notes; just flag the gaps.

      Role 2 validator of one's understanding, also seen as basic. Might be a good complement to e.g. turning some of my notes into [[Anki]] card decks or combine in another way w spaced repetition. [[Spaced repetition 20201012201559]] [[Connecting my PKM to Anki]]

    1. For the record, my posts aren’t written or conceived with an LLM, although I know an increasing number of people who use one to write a first draft and then edit. I’m not a fan. The whole point of the web — its beauty — is that it’s unrelentingly human and diverse.

      A good case for disfavoring the use of AI/LLMs to write first drafts of blog posts. Implicit I believe is a distinction between using external tools to edit/proofread a human-written draft vs editing/proofreading a machine draft (granting I do not use these tools for either). Related to points I raised in Re; On AI in response to: A Positive Technologist Identity (2/4).

    1. Although there are many idiosyncrasies in what may trigger a person with misophonia, the most common triggers are created by other humans, such as the sound of someone chewing, clearing their throat, tapping their foot, or typing on a keyboard.

      any sentences referring to misophonia verbatim

    2. an fMRI study found that people with misophonia show increased response in the anterior insular cortex (AIC) in response to misophonic sounds, compared to control participants and other unpleasant or neutral sounds (Kumar et al., 2017).

      any sentences referring to misophonia verbatim

    3. Both the subjective judgment of aversiveness and the physiological measure of skin conductance response (SCR) increase when people with misophonia are presented with triggers (Edelstein et al., 2013).

      any sentences referring to misophonia verbatim

    4. The disorder is not yet recognized by the Diagnostic and Statistical Manual − 5th version (DSM-5; American Psychiatric Association, 2013), but there has been an increasing amount of research on the characterization and treatment of misophonia (Vitoratou et al., 2021; see also Brout et al., 2018, for a review).

      any sentences referring to misophonia verbatim

    1. Composers and music researchers had previously analyzed and annotated 65 movements from the Classical, Romantic, and early Modern repertoire in terms of the Taxonomy of Orchestral Grouping Effects (McAdams et al., 2022).

      please find any claims that depend on citations referring to works by any of the present authors

    2. These results confirm with orchestral excerpts the findings of studies on isolated tones with dyads or triads of instruments in which the presence of impulsive instruments reduces the perception of blend (Lembke et al., 2019; Reuter, 1996; Tardieu & McAdams, 2012).

      please find any claims that depend on citations referring to works by any of the present authors

    3. structuring by affecting sequential grouping through the segregation of auditory streams played by different instruments and segmental grouping through timbral contrasts (McAdams et al., 2022).

      please find any claims that depend on citations referring to works by any of the present authors

    4. Several other spectral and spectrotemporal descriptors were found to play a role in blend perception in orchestral works by Fischer et al. (2021). These include spectral flatness and spectral crest (different measures of the degree to which the spectrum is denser or has more emergence of spectral components), and spectral variation (the degree of variation of the spectral shape over time).

      please find any claims that depend on citations referring to works by any of the present authors

    5. Fischer et al. (2021) studied the blends of multi-instrument streams in the context of orchestral stream segregation in predominantly Romantic orchestral excerpts. They found that within-family instrument combinations blended better than between-family combinations. They demonstrated the role played by overlap in timbre correlates of spectral flatness (a measure of the tonalness/noisiness or density of the spectrum), spectral skewness (related to the shape of the spectral envelope), and spectral variation (evolution of the spectral envelope over time), as well as cues derived from the scores such as onset synchrony and the consonance of concurrent pitch relations.

      please find any claims that depend on citations referring to works by any of the present authors

    1. When the sudden drop to a pianissimo occurred towards the ending of the piece, the perceived arousal responses of CHM and WM dropped slightly but rose again immediately to end on a high arousal. These two groups of listeners appear to have anticipated a return to a loud and majestic close and therefore kept their arousal responses higher than those of the NM.

      please highlight anything related to music performance practice

    2. CHM, who are more experienced with the instruments and compositional techniques used in Chinese orchestral music, might have had an idea of which features figure more prominently in the communication of particular intentions, and therefore would have more information available for their judgments.

      please highlight anything related to music performance practice

    3. The perception of affective intentions in music is influenced by the degree of familiarity listeners have with a musical tradition, the content implicated in the music, and the complex sonic environment created by the composer's creation and the musicians' interpretation.

      please highlight anything related to music performance practice

    4. Iqa' (plural iqa'at) is used to describe a rhythmic cycle. Iqa'at are made up of two different basic building blocks, the dum and tak, onomatopoeias derived from the sound produced on membranophones such as the darabuka.

      please highlight anything related to music theory

    5. H5. Being more culturally bound, musical cues that are learned, such as modal structures, metrical relations, and so on, will exert a greater influence on listeners' perceived valence ratings than on their arousal ratings.

      please highlight anything related to music theory

    1. We also ran evaluations of model latency and classification performance under varying false positive rates for the following LLMs by OpenAI: GPT-4o, GPT-4o-mini, and o3-mini.

      sentences describing methods the authors used; one sentence at a time

    2. We ensured each list was 30 items long as our pilot studies suggested this was long enough that manual detection starts to become unwieldy (users need to scroll up and down the document), but short enough that participants could become familiar in a short period.

      sentences describing methods the authors used; one sentence at a time

    3. We adapted two intent specifications from our evals: Mars Game Design Document and Financial Advice AI Agent Memory, as these tasks mapped to the two paradigmatic types covered in Sections 2 and 2.1 (design documents, and AI memory of the user).

      sentences describing methods the authors used; one sentence at a time

    4. We chose OpenAI's ChatGPT Canvas as a baseline for five reasons: (i) it is a popular, commercially available tool, hence it is likely familiar to users; (ii) it provides a document editing view, where users can select text and ask GPT to rewrite it, or chat with an AI to make global edits; (iii) it employs a similar class of model (GPT-4o); (iv) it supports similar editing features as SemanticCommit like inline text selection, conflict highlighting, and a diff view, while adding free-form editing; and (v) similar interfaces like Anthropic Artifacts tended to rewrite the specification entirely, and did not offer Canvas's "diff" view to allow for a fair comparison.

      sentences describing methods the authors used; one sentence at a time

    5. Our explorations went through substantial iterations and prompt prototyping over a period of eight months, evolving in response to two pilot studies and progressing from a card-based interface to a list of texts.

      sentences describing methods the authors used; one sentence at a time

    6. We iterated on prompts using ChainForge [5] by setting up an evaluation pipeline against our datasets, which allowed us to observe the effects of prompt changes and model choices.

      sentences describing methods the authors used; one sentence at a time

    7. For qualitative analysis, the first author performed open coding on participant responses and audio transcripts to identify themes, which were used to interpret the qualitative results.

      sentences describing methods the authors used; one sentence at a time

    8. In the post-task surveys, we collected self-reported NASA Task Load Index (TLX) scores, Likert-scale ratings for ease of use, and responses on how well the AI helped participants identify, understand, and resolve semantic conflicts.

      sentences describing methods the authors used; one sentence at a time

    9. We run end-to-end on our four eval datasets using GPT-4o and GPT-4o-mini and report the mean ± stddev for accuracy, precision, recall, and F1 scores for the three approaches in Figure 5.

      sentences describing methods the authors used; one sentence at a time

    10. We compare our end-to-end system against two simpler methods: (i) DropAllDocs, which adds all documents to the context for conflict classification; and (ii) InkSync [56] which generates a JSON list of string-replace operations.

      sentences describing methods the authors used; one sentence at a time

    11. Through a within-subjects study with 12 participants comparing SemanticCommit to a chat-with-document baseline (OpenAI Canvas), we find differences in workflow: half of our participants adopted a workflow of impact analysis when using SemanticCommit, where they would first flag conflicts without AI revisions then resolve conflicts locally, despite having access to a global revision feature.

      sentences describing methods the authors used; one sentence at a time

    15. These semantic conflicts require dedicated support to detect, visualize, and resolve. Semantic conflict resolution interfaces must go beyond visualizing what changes were made, to what changes could be made, where they should be made, and what the effects might be. This resembles feedforward: affordances that help the user foresee the impact of an action [67, 93].

      sentences describing connections to theory; one sentence at a time

    16. This reflects the principle of feedforward [67, 93] in communication theory—"a needed prescription or plan for a feedback, to which the actual feedback may or may not confirm" [79]—where a communicator provides "the context of what one was planning to talk about" [64, p. 179-80] in order to "pre-test the impact of [its output]" on the listener [34, p. 65].

      sentences describing connections to theory; one sentence at a time