38 Matching Annotations
  1. Last 7 days
    1. Empirical idealism, as Kant here characterizes it, is the view that all we know immediately (non-inferentially) is the existence of our own minds and our temporally ordered mental states, while we can only infer the existence of objects “outside” us in space. Since the inference from a known effect to an unknown cause is always uncertain, the empirical idealist concludes we cannot know that objects exist outside us in space.
  2. Apr 2024
    1. After 1836 Chaadayev continued to write articles on cultural and political issues "for the desk drawer." Chaadayev defies categorization; he was not a typical Russian Westernizer due to his idiosyncratic interest in religion; nor was he a Slavophile, even though he offered a possible messianic role for Russia in the future. He had no direct followers, aside from his "nephew" and amanuensis, Mikhail Zhikharev, who scrupulously preserved Chaadayev's manuscripts and tried to get some of them published after Chaadayev's death. Chaadayev's lasting heritage was to remind Russian intellectuals to evaluate any of Russia's supposed cultural achievements in comparison with those of the West.
  3. Nov 2023
    1. This illustration shows four alternative ways to nudge an LLM to produce relevant responses:

      * Generic LLM - Use an off-the-shelf model with a basic prompt. The results can be highly variable, as you may experience when asking ChatGPT about niche topics. This is not surprising, because the model hasn’t been exposed to relevant data beyond the small prompt.
      * Prompt engineering - Spend time structuring the prompt so that it packs more information about the desired topic, tone, and structure of the response. If you do this carefully, you can nudge the responses to be more relevant, but this can be quite tedious, and the amount of relevant data input to the model is limited.
      * Instruction-tuned LLM - Continue training the model with your own data, as described in our previous article. You can expose the model to arbitrary amounts of query-response pairs that help steer the model to more relevant responses. A downside is that training requires a few hours of GPU computation, as well as a custom dataset.
      * Fully custom LLM - Train an LLM from scratch. In this case, the LLM can be exposed to only relevant data, so the responses can be arbitrarily relevant. However, training an LLM from scratch takes an enormous amount of compute power and a huge dataset, making this approach practically infeasible for most use cases today.

      RAG adds two further options:

      * RAG with a generic LLM - Insert your dataset in a (vector) database, possibly updating it in real time. At query time, augment the prompt with additional relevant context from the database, which exposes the model to a much larger amount of relevant data, hopefully nudging the model to give a much more relevant response.
      * RAG with an instruction-tuned LLM - Instead of using a generic LLM as in the previous case, you can combine RAG with your custom fine-tuned model for improved relevancy.
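
      A minimal sketch of the "RAG with a generic LLM" option in Python. The in-memory corpus, the keyword-matching retriever, and the prompt template are illustrative placeholders; a real system would query a vector database and send the assembled prompt to an LLM API.

      ```python
      # Hypothetical helper: in a real system this would be a similarity search
      # against your (vector) database rather than keyword matching on a dict.
      def retrieve_context(query: str, top_k: int = 3) -> list[str]:
          corpus = {
              "returns": "Orders can be returned within 30 days with the original receipt.",
              "shipping": "Standard shipping takes 3-5 business days.",
              "warranty": "All devices carry a two-year limited warranty.",
          }
          words = query.lower().split()
          return [text for text in corpus.values()
                  if any(w in text.lower() for w in words)][:top_k]

      def build_rag_prompt(query: str) -> str:
          # Augment the prompt with retrieved context before calling the LLM.
          context = "\n".join(f"- {snippet}" for snippet in retrieve_context(query))
          return ("Answer the question using only the context below.\n"
                  f"Context:\n{context}\n\n"
                  f"Question: {query}\nAnswer:")

      print(build_rag_prompt("How long does shipping take?"))
      # The resulting prompt would then be sent to a generic (or fine-tuned) LLM.
      ```

      The extra knowledge reaches the model through the prompt at query time, not through training, which is what distinguishes the two RAG options from the instruction-tuned and fully custom approaches.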

    2. OUTBNDSweb: Retrieval-Augmented Generation: How to Use Your Data to Guide LLMs, https://outerbounds.com/blog/retrieval-augmented-generation/ (accessed 13 Nov 2023)

    1. Macaulay claimed that his memory was good enough to enable him to write out the whole of Paradise Lost. But when preparing his History of England, he made extensive notes in a multitude of pocketbooks of every shape and colour.

      Thomas Babington Macaulay, 1st Baron Macaulay, PC, FRS, FRSE (25 October 1800 – 28 December 1859) was a British historian and Whig politician, who served as the Secretary at War between 1839 and 1841, and as the Paymaster General between 1846 and 1848. Macaulay's The History of England, which expressed his contention of the superiority of Western European culture and of the inevitability of its sociopolitical progress, is a seminal example of Whig history that remains commended for its prose style.

    1. Fine-tuning takes a pre-trained LLM and further trains the model on a smaller dataset, often with data not previously used to train the LLM, to improve the LLM’s performance for a particular task.

      LLMs can be extended with both RAG and fine-tuning. Fine-tuning is appropriate when you want to customize an LLM to perform well in a particular domain using private data. For example, you can fine-tune an LLM to become better at producing Python programs by further training the LLM on high-quality Python source code.

      In contrast, you should use RAG when you are able to augment your LLM prompt with data that was not known to your LLM at the time of training, such as real-time data, personal (user) data, or context information useful for the prompt.
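
      As a rough sketch of the fine-tuning path (in contrast to RAG), here is a minimal instruction-tuning loop using Hugging Face transformers. The base model (gpt2), the two query-response pairs, and the hyperparameters are placeholder assumptions, not a recommended recipe.

      ```python
      # A minimal fine-tuning sketch; model, data, and settings are illustrative.
      from datasets import Dataset
      from transformers import (AutoModelForCausalLM, AutoTokenizer,
                                DataCollatorForLanguageModeling, Trainer, TrainingArguments)

      model_name = "gpt2"  # assumption: any small causal LM works for the sketch
      tokenizer = AutoTokenizer.from_pretrained(model_name)
      tokenizer.pad_token = tokenizer.eos_token
      model = AutoModelForCausalLM.from_pretrained(model_name)

      # Hypothetical query-response pairs drawn from your private domain data.
      pairs = [
          {"text": "### Question: How do I reset my API key?\n### Answer: Use the account settings page."},
          {"text": "### Question: What is the rate limit?\n### Answer: 100 requests per minute."},
      ]
      ds = Dataset.from_list(pairs).map(
          lambda ex: tokenizer(ex["text"], truncation=True, max_length=128))

      trainer = Trainer(
          model=model,
          args=TrainingArguments(output_dir="ft-out", num_train_epochs=1,
                                 per_device_train_batch_size=1),
          train_dataset=ds,
          data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
      )
      trainer.train()  # hours of GPU time for a realistically sized dataset
      ```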

    2. Vector databases are used to retrieve relevant documents using similarity search. Vector databases can be standalone or embedded with the LLM application (e.g., the Chroma embedded vector database). When structured (tabular) data is needed, an operational data store, such as a feature store, is typically used. Popular examples are Weaviate (a vector database) and Hopsworks (a feature store), both of which provide time-unlimited free tiers.
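
      A minimal sketch of similarity search with the embedded Chroma vector database mentioned above; the collection name, documents, and query are illustrative.

      ```python
      # Embedded (in-process) Chroma: documents are embedded with the default
      # embedding function and retrieved by vector similarity.
      import chromadb

      client = chromadb.Client()
      collection = client.create_collection("docs")
      collection.add(
          ids=["1", "2", "3"],
          documents=[
              "Hopsworks provides a feature store for structured, tabular data.",
              "Weaviate is a standalone vector database.",
              "RAG augments prompts with retrieved context.",
          ],
      )
      results = collection.query(query_texts=["Which store handles tabular data?"],
                                 n_results=1)
      print(results["documents"])  # most similar document(s) to the query
      ```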
    3. RAG LLMs can outperform LLMs without retrieval by a large margin with far fewer parameters; they can update their knowledge by replacing their retrieval corpora, and they provide citations so that users can easily verify and evaluate the predictions.
    4. HopWORKSweb: Retrieval Augmented Generation (RAG) for LLMs, https://www.hopsworks.ai/dictionary/retrieval-augmented-generation-llm (accessed 09 Nov 2023)

    1. The key enablers of this solution are:

      * The embeddings generated with Vertex AI Embeddings for Text
      * Fast and scalable vector search by Vertex AI Vector Search

      An embedding space is a map of meaning: inputs are assigned coordinates in an n-dimensional space such that semantically similar inputs land close together, tying related concepts to one another.

      [Figure: example of a vectorized n-dimensional embedding]

    2. With the embedding API, you can apply the innovation of embeddings, combined with the LLM capability, to various text processing tasks, such as:

      * LLM-enabled Semantic Search: text embeddings can be used to represent both the meaning and intent of a user's query and of documents in the embedding space. Documents whose meaning is similar to the user's query intent are found fast with vector search technology. The model is capable of generating text embeddings that capture the subtle nuances of each sentence and paragraph in the document.
      * LLM-enabled Text Classification: LLM text embeddings can be used for text classification with a deep understanding of different contexts without any training or fine-tuning (so-called zero-shot learning). This wasn't possible with past language models without task-specific training.
      * LLM-enabled Recommendation: text embeddings can be used in recommendation systems as a strong feature for training recommendation models such as the Two-Tower model. The model learns the relationship between the query and candidate embeddings, resulting in a next-gen user experience with semantic product recommendation.
      * LLM-enabled Clustering, Anomaly Detection, Sentiment Analysis, and more can also be handled with the LLM-level deep semantic understanding.
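
      A small sketch of the zero-shot classification idea via embedding similarity. It uses sentence-transformers as a stand-in for Vertex AI Embeddings for Text; the labels and example sentence are illustrative.

      ```python
      # Zero-shot classification: pick the label whose embedding is closest
      # (by cosine similarity) to the embedding of the input text.
      from sentence_transformers import SentenceTransformer, util

      model = SentenceTransformer("all-MiniLM-L6-v2")
      labels = ["billing question", "bug report", "feature request"]
      text = "The app crashes whenever I open the settings page."

      label_emb = model.encode(labels, convert_to_tensor=True)
      text_emb = model.encode(text, convert_to_tensor=True)
      scores = util.cos_sim(text_emb, label_emb)[0]   # similarity to each label
      print(labels[int(scores.argmax())])             # -> "bug report", no task-specific training
      ```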
    3. Grounded to business facts: In this demo, we didn't try to have the LLM memorize the 8 million items with complex and lengthy prompt engineering. Instead, we attached the Stack Overflow dataset to the model as an external memory using vector search, and used no prompt engineering. This means the outputs are all directly "grounded" (connected) to the business facts rather than being artificial output from the LLM. So the demo is ready to be served today as a production service with mission-critical business responsibility. It does not suffer from the limitations of LLM memory or from unexpected LLM behaviors such as hallucinations.
    4. GCloudAIweb: Vertex AI Embeddings for Text: Grounding LLMs made easy, https://cloud.google.com/blog/products/ai-machine-learning/how-to-use-grounding-for-your-llms-with-text-embeddings (accessed 09 Nov 2023)

    1. Preparation Steps:

      * Ingest data into a database. The destination may be an array or a JSON data type.
      * Harmonize data. This is a lightweight data transformation step.
      * Encode data. This step converts the ingested data into embeddings. One option is to use an external API; for example, OpenAI's ADA and sentence_transformer have many pre-trained models to convert unstructured data like images and audio into vectors.
      * Load embedding vectors. Data is moved to a table that mirrors the original table but has an additional column of type 'vector', JSON, or a blob that stores the vectors.
      * Performance tuning. SingleStoreDB provides JSON_ARRAY_PACK, and vectors can be indexed using HNSW as mentioned earlier. This allows parallel scans using SIMD.
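
      A minimal sketch of the encode and load steps, using sentence_transformers for encoding and SQLite with a JSON column as a stand-in for SingleStoreDB; the table and column names are illustrative.

      ```python
      # Encode text into embeddings and load them into a mirror table with a vector column.
      import json
      import sqlite3
      from sentence_transformers import SentenceTransformer

      model = SentenceTransformer("all-MiniLM-L6-v2")
      docs = [(1, "Red running shoes, size 42"), (2, "Blue trail sneakers, waterproof")]

      con = sqlite3.connect("catalog.db")
      con.execute("CREATE TABLE IF NOT EXISTS product_vectors "
                  "(id INTEGER PRIMARY KEY, text TEXT, vector TEXT)")
      for doc_id, text in docs:
          vec = model.encode(text).tolist()            # encode step: text -> embedding
          con.execute("INSERT OR REPLACE INTO product_vectors VALUES (?, ?, ?)",
                      (doc_id, text, json.dumps(vec)))  # load step: vector stored as JSON
      con.commit()
      ```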

    2. In the new AI model, you ingest the data in real time, apply your models by reaching out to one or more GPT services, and act on the data while your users are in the online experience. These GPT models may be used for recommendation, classification, personalization, and other services on real-time data. Recent developments, such as LangChain and AutoGPT, may further disrupt how modern applications are deployed and delivered.
    3. Let’s say, for example, you search for a very specific product on a retailer’s website, and the product is not available. An additional API call to an LLM, passing the request that returned zero results, may produce a list of similar products. This is an example of a vector search, which is also known as a similarity or semantic search.
    4. Modes of Private Data consumption:

      1. Train a custom LLM - requires massive infrastructure, investment, and deep AI skills.
      2. Tune the LLM - utilizes model weights to fine-tune an existing model; a new category of LLMOps; similar issues to #1.
      3. Prompt general-purpose LLMs - uses modeled context input with Retrieval Augmented Generation (Facebook).

      For leveraging prompts, there are two options:

      * Short-term memory for LLMs that use APIs for model inputs
      * Long-term memory for LLMs that persist the model inputs

      Short-term memory is ephemeral, while long-term memory introduces persistence.

    5. Conventional search works on keys. However, when the ask is a natural-language query, that sentence needs to be converted into a structure so that it can be compared with words that have a similar representation. This structure is called an embedding. An embedding uses vectors that assign coordinates into a graph of numbers, like an array. An embedding is high dimensional as it uses many vectors to perform semantic search.

      When a search is made on new text, the model calculates the “distance” between terms. For example, searching for “king” is closer to “man” than to “woman.” This distance is calculated on the “nearest neighbors” using functions like cosine, dot product, and Euclidean distance. This is where “approximate nearest neighbors” (ANN) algorithms are used to reduce the vector search space. A very popular way to index the vector space is through a library called ‘Hierarchical Navigable Small World (HNSW).’ Many vector databases and libraries like FAISS use HNSW to speed up vector search.
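
      A minimal sketch of approximate nearest-neighbor search with an HNSW index in FAISS, as described above; the dimensionality and vectors are synthetic.

      ```python
      # Build an HNSW index over random vectors and run an approximate k-NN query.
      import numpy as np
      import faiss

      d = 64                                                   # embedding dimensionality
      xb = np.random.random((10_000, d)).astype("float32")     # vectors to index
      xq = np.random.random((5, d)).astype("float32")          # query vectors

      index = faiss.IndexHNSWFlat(d, 32)   # 32 neighbors per node in the HNSW graph
      index.add(xb)
      distances, ids = index.search(xq, 3)  # 3 approximate nearest neighbors per query
      print(ids)
      ```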

    6. There are different options for storing and querying vectors for long-term memory in AI search. The options include:

      * Native vector databases - many non-relational DBMSs are adding vector support, such as Elastic; others are Pinecone, Qdrant, etc.
      * SingleStoreDB - supports vector embeddings and semantic search
      * Apache Parquet or CSV columnar data - slow indices if used

    7. AIMONKSweb: How to Use Large Language Models (LLMs) on Private Data: A Data Strategy Guide, https://medium.com/aimonks/how-to-use-large-language-models-llms-on-private-data-a-data-strategy-guide-812cfd7c5c79 (accessed 09 Nov 2023)

    1. Retrieval Augmented Generation (RAG) is a method in natural language processing (NLP) that combines the power of both neural language models and information retrieval methods to generate responses or text that are informed by a large body of knowledge. The concept was introduced by Facebook AI researchers and represents a hybrid approach to incorporating external knowledge into generative models.

      RAG models effectively leverage a large corpus of text data without requiring it to be stored in the parameters of the model. This is achieved by utilizing a retriever-generator framework:

      1. The Retriever component is responsible for finding relevant documents or passages from a large dataset (like Wikipedia or a corpus of scientific articles) that are likely to contain helpful information for generating a response. This retrieval is typically based on vector similarity between the query and the documents in the dataset, often employing techniques like dense passage retrieval (DPR).

      2. The Generator component is a large pre-trained language model (like BART or GPT-2) that generates a response by conditioning on both the input query and the documents retrieved by the retriever. It integrates the information from the external texts to produce more informed, accurate, and contextually relevant text outputs.

      The RAG model performs this process in an end-to-end differentiable manner, meaning it can be trained in a way that updates both the retriever and generator components to minimize the difference between the generated text and the target text. The retriever is typically optimized to select documents that will lead to a correct generation, while the generator is optimized to produce accurate text given the input query and the retrieved documents.
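
      A short sketch of this retriever-generator setup using the Hugging Face implementation of Facebook's RAG (a DPR-based retriever paired with a BART generator). The dummy index keeps the example lightweight, so the generated answers are not meaningful; real use requires loading the full Wikipedia index.

      ```python
      # RAG question answering with a DPR retriever and BART generator.
      from transformers import RagTokenizer, RagRetriever, RagTokenForGeneration

      tokenizer = RagTokenizer.from_pretrained("facebook/rag-token-nq")
      retriever = RagRetriever.from_pretrained(
          "facebook/rag-token-nq", index_name="exact", use_dummy_dataset=True)
      model = RagTokenForGeneration.from_pretrained("facebook/rag-token-nq",
                                                    retriever=retriever)

      inputs = tokenizer("who introduced retrieval augmented generation?",
                         return_tensors="pt")
      generated = model.generate(input_ids=inputs["input_ids"])  # retrieve, then generate
      print(tokenizer.batch_decode(generated, skip_special_tokens=True))
      ```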

      To summarize, RAG allows a generative model to:

      • Access vast amounts of structured or unstructured external data.
      • Answer questions or generate content that requires specific knowledge not contained within the model itself.
      • Benefit from up-to-date and expansive datasets, assuming the retriever's corpus is kept current.

      RAG addresses the limitation of standard language models that must rely solely on their internal parameters for generating text. By augmenting generation with on-the-fly retrieval of relevant context, RAG-equipped models can produce more detailed, accurate, and nuanced outputs, especially for tasks like question answering, fact-checking, and content creation where detailed world knowledge is crucial.

      This technique represents a significant advancement in generative AI, allowing models to provide high-quality outputs without memorizing all the facts internally, but rather by knowing where to find them. (GPT4-0web)

    2. GPT4-0web: What is Retrieval Augmented Generation (RAG)?, https://platform.openai.com/playground?mode=chat&model=gpt-4-1106-preview (accessed 09 Nov 2023)

  4. Sep 2023
    1. followers of Spinoza adopted his definition of ultimate substance as that which can exist and can be conceived only by itself. According to the first principle of his system of pantheistic idealism, God (or Nature or Substance) is the ultimate reality given in human experience.
    2. Historically, answers to this question have fallen between two extremes. On the one hand is the skepticism of the 18th-century empiricist David Hume, who held that the ultimate reality given in experience is the moment-by-moment flow of events in the consciousness of each individual. That concept compresses all of reality into a solipsistic specious present—the momentary sense experience of one isolated percipient.
    3. two basic forms of idealism are metaphysical idealism, which asserts the ideality of reality, and epistemological idealism, which holds that in the knowledge process the mind can grasp only the psychic or that its objects are conditioned by their perceptibility.
    4. idealism, in philosophy, any view that stresses the central role of the ideal or the spiritual in the interpretation of experience. It may hold that the world or reality exists essentially as spirit or consciousness, that abstractions and laws are more fundamental in reality than sensory things, or, at least, that whatever exists is known in dimensions that are chiefly mental—through and as ideas.
    1. it is this architecture, the one which is in the heads of those writing the code, that is the most important. In adopting this decentralised approach, where the practice of architectural decision-making is much more dispersed, this problem is in many ways, mitigated

      Only true in software architecture. But in enterprise architecture, which spans domains, decentralized decisions create fragmentation.

    1. For example, productivity and satisfaction are correlated, and it is possible that satisfaction could serve as a leading indicator for productivity; a decline in satisfaction and engagement could signal upcoming burnout and reduced productivity.

      Certainly not necessarily true - the correlation is mostly heuristic. I can be highly productive but dissatisfied because the productive work doesn't have value.

    2. • Design and coding. Volume or count of design documents and specs, work items, pull requests, commits, and code reviews.
      • Continuous integration and deployment. Count of build, test, deployment/release, and infrastructure utilization.
      • Operational activity. Count or volume of incidents/issues and distribution based on their severities, on-call participation, and incident mitigation.

      Honestly, a well-oiled team with strong collaboration completely outweighs any measured outputs like this. I would never want my engineers faced with performance observability like this.

    3. The SPACE framework provides a way to logically and systematically think about productivity in a much bigger space and to carefully choose balanced metrics linked to goals—and how they may be limited if used alone or in the wrong context.

      Not sure I would classify this as logical, but systematic makes sense - definitely trying to put heuristic dimensions on typically unquantifiable and varied human behaviors. Clearly, this is biased toward process experts and program-managerial personality types who like trying to frame things into organized buckets.

    1. the brain evolved to be uncertainty-averse. When things become less predictable — and therefore less controllable — we experience a strong state of threat. You may already know that threat leads to “fight, freeze, or flight” responses in the brain. You may not know that it also leads to decreases in motivation, focus, agility, cooperative behavior, self-control, sense of purpose and meaning, and overall well-being. In addition, threat creates significant impairments in your working memory: You can’t hold as many ideas in your mind to solve problems, nor can you pull as much information from your long-term memory when you need it.
  5. Aug 2023
    1. Metrics shape behavior, so by adding and valuing just two metrics, you've helped shape a change in your team and organization. This is why it's so important to be sure to pull from multiple dimensions of the framework: it will lead to much better outcomes at both the team and system levels.

      Probably the best statement here - but the assumption that metrics lead to better outcomes may be false.

    2. The framework is meant to help individuals, teams, and organizations identify pertinent metrics that present a holistic picture of productivity; this will lead to more thoughtful discussions about productivity and to the design of more impactful solutions

      I will give the paper credit for thinking about the issue in general.

    3. Having too many metrics may also lead to confusion and lower motivation; not all dimensions need to be included for the framework to be helpful.

      Yes

  6. May 2023
    1. The fact that a team's need for a decision to be taken can be met by themselves also leads to appropriate levels of bias-to-action, with accountability acting as a brake when it's required.

      Totally disagree - haven't seen this in practice