Table 3:
Example of more human opener
MemGPT utilizes memory to increase engagement: As seen in Table 3 and Figure 6, MemGPT is able to craft engaging openers that perform similarly to and occasionally exceed the hand-written human openers
GPT can use past interactions to greet you and come up with 'openers' in a much more natural and human way
Table 2: Deep memory retrieval (DMR) performance
This shows that the model excels at retrieving very specific information from much earlier in the conversation
The table compares performance between the GPT-3/4 baselines and MemGPT
Showing 5 of 50 results (page 1/10):
It ranks the results similarly to how embedding-based retrieval usually works
• Consistency - The agent should maintain conversational coherence. New facts, preferences, and events mentioned should align with prior statements from both the user and agent.
• Engagement - The agent should draw on long-term knowledge about the user to personalize responses. Referencing prior conversations makes dialogue more natural and engaging
How they measured performance in testing
Figure 4: An example conversation snippet where MemGPT corrects information about the user by writing to main context (and replacing a section of text in working context).
Example of MemGPT correcting/updating information
SELF-DIRECTED EDITING AND RETRIEVAL
Describes how they emulate OS-style memory management via function calls
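To make that concrete, here's a minimal sketch of what function-call-driven memory editing could look like. The function names, arguments, and dispatcher are my own illustration, not MemGPT's actual API.

```python
# Minimal sketch of self-directed memory editing via function calls.
# Function names and signatures are illustrative, not the paper's exact API.

working_context = {"user_name": "Alice", "favorite_color": "blue"}
archival_storage = []  # stands in for external (out-of-context) storage

def working_context_replace(field: str, new_value: str) -> str:
    """Overwrite a field in working context (main context)."""
    old = working_context.get(field)
    working_context[field] = new_value
    return f"Updated {field}: {old!r} -> {new_value!r}"

def archival_insert(text: str) -> str:
    """Push a memory out to external storage."""
    archival_storage.append(text)
    return f"Stored ({len(archival_storage)} total archival memories)"

FUNCTIONS = {
    "working_context_replace": working_context_replace,
    "archival_insert": archival_insert,
}

def dispatch(call: dict) -> str:
    """Execute a function call emitted by the LLM and return the result string,
    which would be appended back into the context window."""
    return FUNCTIONS[call["name"]](**call["arguments"])

# e.g. the LLM decides the user's stored preference is stale:
print(dispatch({"name": "working_context_replace",
                "arguments": {"field": "favorite_color", "new_value": "green"}}))
```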
Recursive summarization (Wu et al., 2021b) is a simple way to address overflowing context windows, however, recursive summarization is inherently lossy and eventually leads to large holes in the memory of the system
One method used to address the context window issue is recursive summarization, but it is lossy, and the loss compounds with each round of summarization.
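Rough sketch of what recursive summarization looks like in practice, assuming a summarize() helper backed by some LLM; the token counting and budget are made-up placeholders. It also shows why the loss compounds: earlier summaries keep getting re-summarized.

```python
# Minimal sketch of recursive summarization over chat history.

def count_tokens(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer

def summarize(text: str) -> str:
    # placeholder: in practice this would be an LLM call
    return "SUMMARY(" + text[:60] + "...)"

def compress_history(messages: list[str], budget: int = 2000) -> list[str]:
    """Repeatedly fold the oldest half of the history into a summary until the
    whole history fits in the budget. Each pass re-summarizes earlier
    summaries, which is where the compounding information loss comes from."""
    while sum(count_tokens(m) for m in messages) > budget and len(messages) > 1:
        half = max(1, len(messages) // 2)
        summary = summarize("\n".join(messages[:half]))
        messages = [summary] + messages[half:]
    return messages
```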
In LLM-based conversational agents, a significant portion of main context tokens is generally used to hold a 'system message' or 'preprompt' that dictates the nature of the interaction to the system, while the remainder of the tokens can be used to hold conversation data
A lot of this token limit is taken up by system messages, pre-prompts, and other things like historical information.
Table 1:
Maximum number of tokens for each of the popular models
we treat context windows as a constrained memory resource, and design a memory hierarchy for LLMs analogous to memory tiers used in traditional OSes
The context window is treated as RAM while the external context acts as the hard drive, essentially making the LLM its own OS, which they call the 'LLM OS'
MemGPT manages the control flow between the memory management, the LLM processing module, and user. This design allows for repeated context modifications during a single task, allowing the agent to more effectively utilize its limited context.
Does everything in Figure 1 completely autonomously
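A toy sketch of the two-tier idea, with a bounded "main context" standing in for RAM and an unbounded "external context" standing in for disk; the eviction and retrieval logic here is illustrative, not how MemGPT actually pages.

```python
from collections import deque

MAIN_CONTEXT_LIMIT = 8          # pretend the context window holds 8 items
main_context = deque()          # what the LLM actually sees each step
external_context = []           # searchable out-of-context storage

def append_message(msg: str) -> None:
    """Add a message; if the window is full, evict the oldest message to
    external storage (the analogue of paging out to disk)."""
    if len(main_context) >= MAIN_CONTEXT_LIMIT:
        external_context.append(main_context.popleft())
    main_context.append(msg)

def page_in(query: str, k: int = 2) -> list[str]:
    """Return the k most recent evicted messages that mention the query so
    they can be paged back into the window (a real system would rank by
    relevance, e.g. with embeddings)."""
    hits = [m for m in external_context if query.lower() in m.lower()]
    return hits[-k:]
```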
https://memgpt.ai
Has Code, Paper, and Dataset
ABSTRACT
This paper goes over GLORY (Global-LOcal news Recommendation sYstem), which is designed to improve personalized news recommendation systems by combining global and local representations.
Most existing methods focus on extracting semantic information from textual data with NLP techniques. However, these approaches lack a global perspective that can provide additional information about user motivations and behaviors.
GLORY was created to address these issues and evaluation results on two public news datasets show that GLORY outperforms existing approaches by offering more diverse recommendations.
7 LIMITATIONS
Furthermore, this paper explores the limitations of our approach. First, our approach faces an efficiency issue during training as GLORY needs to acquire and process global information from both the global news graph and global entity graph for each sample. Consequently, this leads to increased memory and time requirements during training, compared to using only local information. Secondly, our approach trains and validates using two public news datasets containing click data from limited time periods, this restricts us to use a static global graph and precludes testing on dynamically changing real-world data.
8 CONCLUSION AND FUTURE WORK
Limitations:
Efficiency issue during training: GLORY requires processing global information from both the global news graph and global entity graph for each sample, leading to increased memory and time requirements compared to using only local information.
Limited datasets: the model trains and validates on two public news datasets containing click data from limited time periods, which restricts the model to a static global graph and prevents testing on dynamically changing real-world data.
Conclusion: GLORY is a novel news recommendation system that uses global graphs to improve news modeling. It captures hidden behavior and patterns in users' reading history using the global news graph and uncovers deeply hidden associations between candidate and historical news articles using the global entity graph. By leveraging both global and local information, the model significantly outperforms state-of-the-art models in experiments on two datasets.
Future Work: Explore dynamic global graphs that consider the freshness of news items and user behaviors, making them suitable for real-time online recommendation systems.
5.2 Ablation Study
We aim to enhance our understanding of the effectiveness of each component in GLORY by conducting ablation studies on degraded models. Specifically, we conduct four experiments on entity and news graph: 1) full model; 2) w/o g-news: without global news graph enhanced representation; 3) w/o g-entity: without global entity graph enhanced representation; 4) w/o g-news/entity: without both aforementioned components. In each experiment, all other settings remained unchanged as in the full model, with only one component being either replaced or removed.
The ablation study helps to understand the effectiveness of each component in GLORY by conducting experiments on degraded models.
The main findings are:
The full GLORY model performs the best among all variations. However, each individual component also plays a critical role in achieving optimal results.
The global news graph has the biggest impact, although not by much. Removing it causes a drop in the AUC metric from 68.15 to 67.53, demonstrating the effectiveness of the graph components in enhancing news recommendations. This may be because the global graphs provide hidden information beyond local semantics and can effectively improve the performance of news recommendations.
The study also shows that GGNN outperforms the other graph models.
In summary, the ablation study highlights the importance of global graph components in enhancing news recommendations and demonstrates the superior performance of GGNN as a graph encoder in the GLORY model.
EXPERIMENTAL SETUP
Next they lay out how they evaluated the model, but I'll just skip to the results of the evaluation.
Global-aware Candidate News Encoder
The Global-aware Candidate News Encoder improves candidate news recommendations using a global entity graph. Its components:
• Global Entity Graph: an undirected graph constructed from entities in users' reading histories; edges represent co-occurrences of entities in consecutive news articles.
• Global Entity Encoder: selects the top neighbor entities for each entity in the candidate news based on edge weights, reusing the same entity embedding layer as the local-view entity encoder.
• Candidate News Aggregator: combines the local news representation, local entity representation, and global entity representation for each candidate article using an attention pooling network to learn the candidate news representation (see the sketch below).
• News Recommendation: negative sampling is used during training, estimating a click score for each news item and optimizing the log-likelihood loss of the positive samples over the training dataset.
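Here's a rough PyTorch sketch of the aggregation step, assuming the three views are already computed as vectors; the dimensions and layer sizes are placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn

class CandidateAggregator(nn.Module):
    """Attention-pool the local news, local entity, and global entity vectors
    into a single candidate news representation."""
    def __init__(self, dim: int = 256, hidden: int = 128):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, views):                    # views: (batch, 3, dim)
        w = torch.softmax(self.score(views), dim=1)   # weight per view
        return (w * views).sum(dim=1)                 # (batch, dim) candidate vector

local_news, local_entity, global_entity = (torch.randn(4, 256) for _ in range(3))
candidate_vec = CandidateAggregator()(
    torch.stack([local_news, local_entity, global_entity], dim=1))
```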
User Encoder
The User Encoder uses a multi-head attention layer and an attention pooling layer to learn a user representation from the obtained news embeddings of the user's reading history.
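A minimal PyTorch sketch of that user encoder, assuming precomputed news embeddings for the clicked history; the head count and dimensions are placeholders.

```python
import torch
import torch.nn as nn

class UserEncoder(nn.Module):
    """Multi-head self-attention over clicked-news embeddings, followed by
    attention pooling into a single user vector."""
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.mha = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.att = nn.Sequential(nn.Linear(dim, 128), nn.Tanh(), nn.Linear(128, 1))

    def forward(self, history):                       # history: (batch, n_clicked, dim)
        h, _ = self.mha(history, history, history)    # contextualized news vectors
        w = torch.softmax(self.att(h), dim=1)          # attention pooling weights
        return (w * h).sum(dim=1)                      # (batch, dim) user vector

user_vec = UserEncoder()(torch.randn(4, 30, 256))      # e.g. 30 clicked articles per user
```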
Global News Graph.
This next part describes a method for personalized news recommendations using a Global News Graph and several components to process the information.
There are a few components that make this part up:
• Global News Graph: a directed graph constructed from users' reading histories in the training dataset. Its nodes are news articles and its edges represent the sequence in which users read them.
• Graph Encoder: for each user, a subgraph is extracted from the Global News Graph to capture a global perspective on their interests; Graph Neural Networks (GNNs) encode the subgraph to obtain a global news embedding.
• Gated Graph Neural Network (GGNN): a specific type of GNN that employs a Gated Recurrent Unit (GRU) to capture sequence-based hidden behavior information (a toy sketch follows below).
• Historical News Aggregator: for each historical news article, the model combines the local news representation, local entity representation, and global title representation using an attention pooling network, learning a historical news representation by aggregating the three.
These components work together to create a powerful model for personalized news recommendations that accounts for both local and global information in users' reading histories.
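To get a feel for the GGNN part, here's a very simplified dense-adjacency sketch: node states are updated for a few steps by a GRU cell fed with summed neighbor messages. It's a toy version, not the paper's implementation, and the graph here is random.

```python
import torch
import torch.nn as nn

class TinyGGNN(nn.Module):
    def __init__(self, dim: int = 256, steps: int = 3):
        super().__init__()
        self.steps = steps
        self.msg = nn.Linear(dim, dim)      # transform neighbor states into messages
        self.gru = nn.GRUCell(dim, dim)     # gated update of each node state

    def forward(self, node_feats, adj):     # node_feats: (n, dim), adj: (n, n)
        h = node_feats
        for _ in range(self.steps):
            m = adj @ self.msg(h)           # aggregate messages along edges
            h = self.gru(m, h)              # gated recurrent state update
        return h                            # global-aware node embeddings

n = 12                                                   # nodes in one user's subgraph
adj = (torch.rand(n, n) > 0.7).float()                   # toy adjacency matrix
global_news_embs = TinyGGNN()(torch.randn(n, 256), adj)
```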
Local News Representation.
The local news encoder extracts a local news representation from the news text; here, local news refers to a news article in the user's reading history. This next part lays out the math and how the data is handled: for each news article in the user's history, the model first represents the words as vectors using pre-trained embeddings. To obtain a single fixed-size representation for the entire article, it uses a text attention mechanism, which lets the model focus on the most important words and weigh their contributions accordingly. The model computes an attention score for each word, with higher scores marking more important or relevant words; the score is based on the word's representation and the model's current parameters. The local news representation is then the weighted sum of the word representations, using the attention scores as weights.
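A small PyTorch sketch of that word-level attention, assuming pre-trained word vectors are already available; all sizes are placeholders.

```python
import torch
import torch.nn as nn

class TextAttention(nn.Module):
    """Score each word, softmax the scores, and take the weighted sum of word
    embeddings as the article representation."""
    def __init__(self, dim: int = 300, hidden: int = 128):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, words):                        # words: (batch, n_words, dim)
        a = torch.softmax(self.score(words), dim=1)  # attention weight per word
        return (a * words).sum(dim=1)                # (batch, dim) article vector

title_word_embs = torch.randn(4, 20, 300)             # e.g. pre-trained GloVe vectors
local_news_vec = TextAttention()(title_word_embs)
```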
METHOD
Their new GLORY method first uses a local perspective to learn representations of news text and entities, then uses the global-aware historical news encoder and the global-aware candidate news encoder to give it global context. A concise user encoder is then applied before the final recommendation is made.
Graph-based News Recommendation
There are a number of cool methods here from other papers related to graph-based news recommendations.
NLP has begun to use graph-based techniques to learn news representations. GERL learns text-based news representations from news titles and topics, and uses user-news graphs to learn graph-based news representations. GNewsRec discovers long-term user interests from heterogeneous user-news-topic graphs. User-as-graph aggregates the local news graph formed by a user's reading history with a heterogeneous graph pooling method. DIGAT constructs a semantic augmentation graph from semantically related methods in order to enrich candidate news content.
INTRODUCTION
News Recommendation (NR) is the process of recommending news articles to users by optimizing the accuracy of predicting their relevancy. This type of recommendation is challenging due to its dynamic nature: timeliness and novelty rapidly change the relevancy of news articles. Content-based recommendations employing NLP and ML methods to extract user interests have proven effective here. They typically use previously read news article data to locate new articles to recommend.
Deep learning approaches have gained popularity due to their ability to deal with unstructured text like news content and titles. Research on modeling user preferences and graph-based methods have both seen increases in popularity as well.
These methods mainly focus on individual users and lack a global context. Incorporating data from other users can uncover more implicit hidden user behaviors, as shown in Figure 1. The real challenge is how to properly incorporate this global context into the recommendation system.
They proposed GLORY, or Global-LOcal news Recommendation sYstem, to address these issues. It incorporates historical news interaction data for more in-depth relational information than solely semantic relationships via a global-aware historical news encoder (providing global perspectives via global news graphs). Likewise, it uses a global-aware candidate news encoder to help address data sparsity issues, using a global entity graph to provide better associations for candidate news. (Candidate news refers to the set of news articles or items that can be recommended to a user.)
They then talked about using "the multi-head self-attention mechanism to extract user interests from historical news." I had to break this down to understand. Multi-head self-attention mechanism is a technique used in deep learning models, particularly in the Transformer architecture, to capture complex relationships within a sequence of data. It works by allowing the model to weigh and attend to different parts of the input sequence, capturing dependencies and interactions between different elements. The authors use this to analyze the historical news articles that a user has interacted with or read. By doing so, they can identify patterns and relationships in the user's reading habits, thereby extracting their interests.
From all of this they calculated a matching score from a combination of user and candidate news vectors.
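Sketch of the scoring/training step as it's commonly done in news recommenders, consistent with the negative-sampling setup mentioned earlier: dot-product matching scores between the user vector and 1 positive + K negative candidates, trained with a cross-entropy (log-likelihood) loss. Shapes are placeholders.

```python
import torch
import torch.nn.functional as F

batch, dim, K = 4, 256, 4
user_vec = torch.randn(batch, dim)                 # from the user encoder
candidates = torch.randn(batch, 1 + K, dim)        # 1 positive + K sampled negatives

scores = torch.einsum("bd,bkd->bk", user_vec, candidates)   # matching score per candidate
labels = torch.zeros(batch, dtype=torch.long)                # positive sits at index 0
loss = F.cross_entropy(scores, labels)                       # log-likelihood of the positive
```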
ABSTRACT
The research paper introduces a novel framework called Pairwise Intent Graph Embedding Learning (PING) to tackle the challenges of feature sparsity and interaction sparsity in context-aware recommender systems (CARS). PING efficiently incorporates knowledge graphs into CARS through three key modules: a graph construction module that creates a pairwise intent graph with nodes for users, items, entities, and enhanced intent; a pairwise intent joint graph convolution module that refines feature embeddings using a customized convolution strategy; and a recommendation module that utilizes refined embeddings to improve the performance of downstream recommendation models.
It sometimes produces syntactically invalid or semantically incorrect code, especially for longer or more complex programs
True, but this has been vastly improved for Python now that GPT has a (somewhat limited) Python code interpreter. [Can't make any HTTP calls or access the internet]
GPT-4 can even execute pseudocode, which requires interpreting informal and vague expressions that are not valid in any programming language.
Some have even created their own pseudocode language and then used GPT to convert from it to any language of their choice with minimal errors and post-editing.
RecInDial [244] combines the
Although different, this is similar to our company recommendations based on the knowledge we get from user data. It could potentially be used to create stronger company recommendations
5.4 LLM-augmented KG-to-text Generation
This step could be useful when explaining to people how we got their recommendations: explain the data extracted, then show how we used that information to find additional information via the knowledge graph. "Looks like you are interested in Amazon. Amazon is a company in North America that is in the Logistics, Retail, and Technology industries... From this we were able to find companies like [companies] that you may also be interested in."
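A toy sketch of how such an explanation could be templated from KG facts; the field names, companies, and facts here are all made up for illustration.

```python
def explain(seed_company: str, facts: dict, recommended: list[str]) -> str:
    """Turn a few KG facts about the seed company into a user-facing explanation."""
    return (
        f"Looks like you are interested in {seed_company}. "
        f"{seed_company} is a company in {facts['region']} that is in the "
        f"{', '.join(facts['industries'])} industries. "
        f"From this we were able to find companies like "
        f"{', '.join(recommended)} that you may also be interested in."
    )

print(explain(
    "Amazon",
    {"region": "North America", "industries": ["Logistics", "Retail", "Technology"]},
    ["ExampleCo", "SampleCorp"],   # hypothetical recommendations
))
```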
To further explore advanced LLMs, AutoKG designs several prompts for different KG construction tasks (e.g., entity typing, entity linking, and relation extraction). Then, it adopts the prompt to perform KG construction using ChatGPT and GPT-4.
GREAT FOR PROMPT ENGINEERING!!!
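Here's a hedged sketch of what a relation-extraction prompt for KG construction could look like, in the spirit of AutoKG; the relation list and wording are my own, not the paper's actual prompts, and the model call itself is left out.

```python
# Builds the prompt text only; sending it to ChatGPT/GPT-4 is a separate step.

RELATION_EXTRACTION_PROMPT = """\
You are building a knowledge graph.
Extract (head_entity, relation, tail_entity) triples from the text below.
Only use relations from this list: founded_by, headquartered_in, industry, partner_of.
Return one triple per line in the form: head | relation | tail.

Text:
{text}
"""

def build_prompt(text: str) -> str:
    return RELATION_EXTRACTION_PROMPT.format(text=text)

print(build_prompt("Amazon is headquartered in Seattle and operates in retail."))
```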
PiVE [166] proposes a prompting with iterative verification framework that utilizes a smaller LLM like T5 to correct the errors in KGs generated by a larger LLM (e.g., ChatGPT)
**Need to look into this:** https://arxiv.org/pdf/2305.12392.pdf
5.3.4 End-to-End KG Construction
A better approach to adding regions to the KG from location data. It not only keeps the raw location in the KG but also broadens it step by step (city, state/province, country, region), adding each level to the KG along the way. This could be used to provide closer relations between companies in the same city/state/province/country than those simply in the same region.
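Small sketch of that idea: expand a company's raw location into a chain of broader regions and add every level to the KG as triples. The relation names and hierarchy values are hypothetical.

```python
def location_triples(company: str, hierarchy: list[tuple[str, str]]) -> list[tuple[str, str, str]]:
    """hierarchy is ordered from most specific to broadest, e.g.
    [("city", "Seattle"), ("state", "Washington"), ("country", "USA"), ("region", "North America")]."""
    # link the company to its most specific location
    triples = [(company, "located_in_" + hierarchy[0][0], hierarchy[0][1])]
    # then chain each location level to the next broader one
    for (_, narrower), (level, broader) in zip(hierarchy, hierarchy[1:]):
        triples.append((narrower, "part_of_" + level, broader))
    return triples

for t in location_triples("Amazon", [("city", "Seattle"), ("state", "Washington"),
                                     ("country", "USA"), ("region", "North America")]):
    print(t)
```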
Coreference Resolution (CR)
BERT may be useful for extracting information directly from text without inferring
Entity Linking (EL), also known as entity disambiguation, involves linking entity mentions appearing in the text to their corresponding entities in a knowledge graph. [207]
We talked about this exact thing last week.
Unlike the task-specific methods, GenerativeNER [202] uses a sequence-to-sequence LLM with a pointer mechanism to generate an entity sequence, which is capable of solving all three types of NER sub-tasks.
LOOK INTO
Named Entity Recognition (NER) involves identifying and tagging named entities in text data with their positions and classifications. The named entities include people, organizations, locations, and other types of entities.
Could be used for companies
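Quick sketch of pulling company (ORG) mentions out of text with an off-the-shelf NER model, assuming spaCy and its small English model are installed (pip install spacy && python -m spacy download en_core_web_sm); the input sentence is made up.

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Meeting with Acme Corp and Globex next Tuesday in Toronto.")

# keep only entities tagged as organizations
companies = [ent.text for ent in doc.ents if ent.label_ == "ORG"]
print(companies)
```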
Fig. 21. The general framework of LLM-based KG construction
VERY HELPFUL for showing entity extraction
5.3.1 Entity Discovery
*** Exactly what we are trying to get out of the email and calendar data.
The above methods could effectively fuse knowledge with the textual representations in the large language models. However, real-world knowledge is subject to change and the limitation of these approaches is that they do not permit updates to the incorporated knowledge without retraining the model
Valid concern with training or fine-tuning model on existing KG that is subject to change.
1) LLM-augmented KG embedding includes studies that apply LLMs to enrich representations of KGs by encoding the textual descriptions of entities and relations.
This is currently part of the KG: we use GPT-4 to add descriptions of companies from their about pages.
2) KG-enhanced LLM inference includes research that utilizes KGs during the inference stage of LLMs, which enables LLMs to access the latest knowledge without retraining.
This does seem like a very similar approach to that of my fine-tuning stage so far.
Some researchers have proposed to incorporate KGs into LLMs during the pre-training stage, which can help LLMs learn knowledge from KGs [92], [93]. Other researchers have proposed to incorporate KGs into LLMs during the inference stage.
Really good approaches. More difficult but likely more effective than mine. I would be using KGs in the fine-tuning stage by training it on my KG-based generated prompts and associated information.
2.3.3 Domain-specific Knowledge Graphs
This is what my KG is
1) KG-enhanced LLMs, which incorporate KGs during the pre-training and inference phases of LLMs, or for the purpose of enhancing understanding of the knowledge learned by LLMs; 2) LLM-augmented KGs, that leverage LLMs for different KG tasks such as embedding, completion, construction, graph-to-text generation, and question answering; and 3) Synergized LLMs + KGs, in which LLMs and KGs play equal roles and work in a mutually beneficial way to enhance both LLMs and KGs for bidirectional reasoning driven by both data and knowledge.
• The paper presents a roadmap for the unification of LLMs and KGs, which consists of three general frameworks: 1. KG-enhanced LLMs, which incorporate KGs during the pre-training and inference phases of LLMs or for the purpose of enhancing understanding of the knowledge learned by LLMs. 2. LLM-augmented KGs, which leverage LLMs for different KG tasks such as embedding, completion, construction, graph-to-text generation, and question answering. 3. Synergized LLMs + KGs, in which LLMs and KGs play equal roles and work in a mutually beneficial way to enhance both LLMs and KGs for bidirectional reasoning driven by both data and knowledge.