33 Matching Annotations
  1. Last 7 days
    1. Such goal-driven clustering (Wang et al., 2023) is relevant to many NLP sub-fields, such as topic modeling (Pham et al., 2024), inductive reasoning (Lam et al., 2024), corpus comparison (Zhong et al., 2023), and information retrieval (Ni et al., 2025b). In such tasks, the LLM plays an important role in understanding users' goals and steering / interpreting the clustering accordingly (Zhang et al., 2023; Viswanathan et al., 2024; Movva et al., 2025).

      Find where this paper describes the use of context to inform clustering

    2. Constrained KMeans ensures that all clusters have 10 to 20 samples, so that the input (i.e., each cluster) to DeepSeek-R1 will not be too large or too small, as we empirically find that large clusters (>20) may increase the reasoning burden and lead to hallucination, while small clusters (<5) may generate over-specific edge cases.

      Find where this paper describes the use of context to inform clustering

    3. To avoid over-specific [Case Description] entries that fail to generalize to other samples, we need to aggregate item-level edge cases with similar ambiguity and describe them at a higher level. This is a challenging task requiring (1) covering all item-level edge cases; and (2) strategically finding logical similarities between reasons for ambiguity. Therefore, we employ a SOTA reasoning LLM, DeepSeek-R1 (DeepSeek-AI, 2025), to cluster [Case Description] entries and generate high-level edge cases and handling rules. Specifically, we extract the item-level [Case Description] entries, embed them with semantic embedding models, and cluster them with constrained KMeans (Levy-Kramer, 2018). Each cluster of [Case Description] entries and the corresponding [Action] are fed to DeepSeek-R1 to generate Cluster-wise Edge Cases.

      Find where this paper describes the use of context to inform clustering
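The pipeline quoted above (embed the item-level case descriptions, then cluster with size bounds) can be sketched with a toy size-constrained k-means. This is a simplified stand-in for the k-means-constrained package of Levy-Kramer (2018): it enforces only the upper size bound, uses plain Euclidean distance, and uses greedy assignment rather than the package's min-cost-flow solver.

```python
import math
import random

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans_constrained(points, k, size_max, iters=20, seed=0):
    """Toy size-constrained k-means: standard Lloyd updates, but each
    point is greedily assigned to its nearest centroid that still has
    capacity. (The real k-means-constrained also enforces a lower size
    bound; that is omitted here for brevity.)"""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        counts = [0] * k
        # Assign confidently placed points first, so any capacity
        # overflow displaces only the most ambiguous points.
        order = sorted(range(len(points)),
                       key=lambda i: min(dist(points[i], c) for c in centroids))
        for i in order:
            for j in sorted(range(k), key=lambda c: dist(points[i], centroids[c])):
                if counts[j] < size_max:
                    assign[i], counts[j] = j, counts[j] + 1
                    break
        for j in range(k):  # centroid update
            members = [points[i] for i in range(len(points)) if assign[i] == j]
            if members:
                centroids[j] = [sum(xs) / len(members) for xs in zip(*members)]
    return assign
```

In the paper's setting, `points` would be semantic embeddings of the item-level case descriptions, and each resulting cluster (capped at 20 items) would be passed to DeepSeek-R1.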

    1. Zero-shot prompts are then constructed from the generated meta-information and used for the actual task. This method takes advantage of the self-generated framing of LLMs to successfully carry out a given task.

      Find where this paper describes the use of context to inform clustering

    2. Our method works in a text-to-text format, allowing clustering with specific context. For instance, "I love this movie" and "I hate this movie" express opposite sentiments but belong to the same cluster of movie reviews.

      Find where this paper describes the use of context to inform clustering

    1. When processing a large document collection, an LLM can be used to assess the compatibility of two text passages (Zhang et al., 2023; Viswanathan et al., 2023; Choi and Ferrara, 2024), potentially in a more nuanced way than vector similarity; this problem arises in workflows for matching, routing, clustering, and fact-checking (Charlin and Zemel, 2013; Harman, 1996; and the papers just mentioned).

      Find where this paper describes the use of context to inform clustering
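The pairwise-compatibility use described above amounts to asking an LLM a yes/no question about two passages. A minimal prompt builder is sketched below; the wording is purely illustrative and does not reproduce any of the cited papers' actual prompts.

```python
def compatibility_prompt(passage_a: str, passage_b: str,
                         task: str = "clustering") -> str:
    """Build a yes/no compatibility query for an LLM judge."""
    return (
        f"Task: {task}.\n"
        f"Do the following two passages belong together for this task?\n"
        f"Passage A: {passage_a}\n"
        f"Passage B: {passage_b}\n"
        f"Answer yes or no."
    )
```

Because the judgment is conditioned on the stated task, the same pair of passages can be compatible for routing but incompatible for fact-checking, which is the nuance vector similarity alone misses.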

    1. Step 3 (§3.2.3) leverages the contextual understanding of LLMs to further refine the clusters by removing misclassified points, thereby improving the overall clustering accuracy.

      Find where this paper describes the use of context to inform clustering

    2. Firstly, an instruction inst is crafted to guide the selection process, tailored to the task's context, such as "Select one classification of the banking customer utterances that better corresponds with the query in terms of intent".

      Find where this paper describes the use of context to inform clustering

    3. The motivation for this step is to utilize the advanced contextual analysis capabilities of LLMs to identify and correct misclassified points, thereby improving the overall clustering accuracy.

      Find where this paper describes the use of context to inform clustering

    4. In the second stage, we leverage the advanced text understanding capabilities of LLMs to refine the cluster edges. This involves a soft edge points removal and re-assignment mechanism, where LLMs reassess and reassign edge points based on their semantic context. This step capitalizes on LLMs' ability to comprehend nuanced text relationships, thereby ensuring more accurate and reliable clustering results.

      Find where this paper describes the use of context to inform clustering
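The edge-point removal and re-assignment described in the quotes above needs a way to decide which points sit at cluster boundaries before handing them to the LLM. One simple heuristic (my assumption, not necessarily the paper's criterion) flags points whose nearest and second-nearest centroids are nearly equidistant:

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def edge_points(points, centroids, margin=0.2):
    """Return indices of points whose two closest centroids are almost
    equally far away (distance ratio above 1 - margin). These are the
    ambiguous candidates an LLM would reassess and possibly reassign."""
    flagged = []
    for i, p in enumerate(points):
        d = sorted(dist(p, c) for c in centroids)
        if d[1] == 0 or d[0] / d[1] > 1 - margin:
            flagged.append(i)
    return flagged
```

Only the flagged points need an LLM call, which keeps the refinement stage cheap relative to re-judging the whole dataset.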

    1. They use a BERT encoder (Devlin et al., 2019) to represent the textual context where an entity occurs (called the ''context view''), and a TransE knowledge graph encoder (Bordes et al., 2013) to represent nodes in the open knowledge graph (called the ''fact view'').

      Find where this paper describes the use of context to inform clustering

    2. In the degraded example, we also see that these keyphrases may overly focus on each entity's surface form rather than their textual context. This suggests room for more precise modeling and prompt engineering for leveraging keyphrases for complex documents.

      Find where this paper describes the use of context to inform clustering

    3. We leverage an LLM before clustering by augmenting the textual representation. For each example, we generate keyphrases with an LLM, encode these keyphrases, and add them to the base representation.

      Find where this paper describes the use of context to inform clustering
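The augmentation step in the quotes above (generate keyphrases with an LLM, encode them, and fold them into the base representation) can be sketched as a weighted combination. The `embed` function and the weighted-sum rule are placeholders; the paper's actual encoder and combination may differ.

```python
def augmented_representation(text, keyphrases, embed, weight=0.5):
    """Blend the base text embedding with the mean embedding of the
    LLM-generated keyphrases, steering the vector toward the aspects
    the keyphrases emphasize."""
    base = embed(text)
    if not keyphrases:
        return base
    kp_vecs = [embed(kp) for kp in keyphrases]
    kp_mean = [sum(vals) / len(kp_vecs) for vals in zip(*kp_vecs)]
    return [(1 - weight) * b + weight * m for b, m in zip(base, kp_mean)]
```

Clustering then runs unchanged on the augmented vectors, which is what makes this an LLM-before-clustering method rather than an LLM-in-the-loop one.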

    4. Before any cluster is produced, experts typically know what aspects of each document they wish to capture during clustering. Instead of forcing clustering algorithms to mine such key factors from scratch, it could be valuable to globally highlight these aspects (and thereby specify the task emphases) beforehand. To do so, we use an LLM to make every document's textual representation task-dependent, by enriching and expanding it with evidence relevant to the clustering need.

      Find where this paper describes the use of context to inform clustering

    1. This is particularly relevant in our domain of interest (enterprise AI and specifically customer support), where Process Owners may know which dimensions to prioritize (inclusion, e.g., customer-reported symptom or severity), or know which aspects are irrelevant to their use case (exclusion, e.g., programming language or system). Our methods enable these preferences to be reflected in the resulting clusters.

      Find where this paper describes the use of context to inform clustering

    2. Our use of prefixing is different (and, to our knowledge, novel): rather than aiming to improve performance on a given ground-truth label or metric, we seek to reshape the embedding space to discover alternative ways to cluster the same data. We achieve this by systematically changing the prefix (e.g., "Cluster by topic" vs. "Cluster by sentiment").

      Find where this paper describes the use of context to inform clustering
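The prefixing technique above can be sketched in a few lines. The prefix template is illustrative, and `embed` stands in for whatever instruction-following embedding model is assumed:

```python
def perspective_embeddings(texts, perspective, embed):
    """Embed every document under a clustering-instruction prefix, so
    the same corpus yields a differently shaped embedding space for
    each perspective (e.g. "topic" vs. "sentiment")."""
    return [embed(f"Cluster by {perspective}: {t}") for t in texts]
```

Running an ordinary clustering algorithm over `perspective_embeddings(docs, "topic", embed)` versus `perspective_embeddings(docs, "sentiment", embed)` then discovers alternative groupings of the same data, without any change to the clustering step itself.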

    3. We therefore argue that unsupervised clustering is an ill-defined problem, and that clustering is inherently subjective. In a space of many possible "correct" groupings, a generic clustering algorithm is unlikely to organize data in a way relevant to a user's specific use case.

      Find where this paper describes the use of context to inform clustering

    4. Clustering aims to group similar data points; however, similarity for data points often depends on the analyst's perspective. Consider a dataset of social media posts: a content moderator might want to group posts with similar emotional tones and obscene language indicators, whereas a medical researcher might look for groupings based on mentions of drug use or illnesses, irrespective of sentiment.

      Find where this paper describes the use of context to inform clustering

    5. Clustering is a central component of enterprise process analysis. For example, in IT Service Management (ITSM), common asks by Process Owners (POs) include "What are the most common user complaints?", "What are the underlying causes of problems?", and "Why are tickets rerouted to agents?". The answers POs seek involve grouping complex information into buckets that make sense from one's analysis perspective.

      Find where this paper describes the use of context to inform clustering

    1. When fine-tuning our model for Diplomacy game communications, we shifted from the overly broad AMR 3.0 vocabulary to the tailored Diplomacy-AMR corpus introduced above, reducing irrelevant content and focusing on game-specific nuances.

      Please highlight all phrases that discuss the model design

    2. We utilize a state-of-the-art Sequence-to-Sequence model from the Huggingface transformers library, fine-tuned with the AMR 3.0 dataset, for baseline semantic extraction. This approach facilitates the processing of AMR through amrlib, a Python module tailored for such tasks.

      Please highlight all phrases that discuss the model design

    3. Thankfully, Cicero's architecture uses a conditional language model (Bakhtin et al., 2022, Equation S2, section D.2) that generates its natural language messages given a set of moves.

      Please highlight all phrases that discuss the model design

    4. Our domain-tuned model using Diplomacy-AMR improves SMATCH by 39.1, to 61.9. Adding data augmentation to the model (e.g., knowing the sender of a message is England and the recipient is Germany) improves SMATCH to 64.6. Adding separate encodings for this information further improves SMATCH by 0.8 (65.4). Additionally, we apply data processing to replace (1) pronouns with country names and (2) provinces in abbreviations with full names, which increases SMATCH to 66.6.

      Please highlight all phrases that discuss the model design
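The final data-processing step in the quote above (pronoun to country name, province abbreviation to full name) can be sketched as token replacement. The mapping table here is a tiny hypothetical sample, not the paper's full list, and real messages would need proper tokenization and coreference handling rather than whitespace splitting:

```python
# Hypothetical sample of province abbreviations (the real board has 75).
PROVINCES = {"Bur": "Burgundy", "Mun": "Munich", "Pic": "Picardy"}

def preprocess(message, sender, recipient):
    """Replace first/second-person pronouns with the sender/recipient
    country names and province abbreviations with full names."""
    table = {"I": sender, "me": sender, "you": recipient, **PROVINCES}
    return " ".join(table.get(tok, tok) for tok in message.split())
```

Grounding pronouns this way gives the AMR parser explicit entities to anchor, which is consistent with the SMATCH gain the quote reports for this step.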