Knowledge Graph and LLM Integration: Benefits & Challenges

What is LLM and Knowledge Graph Integration?

In today’s AI landscape, two key technologies are transforming machine understanding, reasoning, and natural language processing: Large Language Models (LLMs) and Knowledge Graphs (KGs). LLMs, like OpenAI’s GPT series or Meta’s Llama series, have shown incredible potential in generating human-like text, answering complex questions, and creating content across diverse fields. Meanwhile, KGs organize and integrate information in a structured way, allowing machines to understand and infer the relationships between real-world entities. They encode entities (such as people, places, and things) and the relationships between them, making them ideal for tasks such as question answering and information retrieval. Emerging research has demonstrated that the synergy between LLMs and KGs can help us create AI systems that are more contextually aware and accurate. In this article, we explore different methods for integrating the two, showing how this can help you harness the strengths of both.

Source: https://arxiv.org/html/2406.08223v2

Knowledge Graph and LLM Integration Approaches

You can think of the interaction between LLMs and KGs in three primary ways. First, there are Knowledge-Augmented Language Models, where KGs are used to enhance and inform the capabilities of LLMs. Second, there are LLMs-for-KGs approaches, where LLMs are used to strengthen and improve the functionality of KGs. Finally, there are Hybrid Models, where LLMs and KGs work together to achieve more advanced and complex results. Let’s look at all three.

1. Knowledge-Augmented Language Models (KG-Enhanced LLMs)

A direct method to integrate KGs with LLMs is through Knowledge-Augmented Language Models (KALMs). In this approach, you augment your LLM with structured knowledge from a KG, enabling the model to ground its predictions in reliable data. For example, KALMs can significantly improve tasks like Named Entity Recognition (NER) by using the structured information from a KG to accurately identify and classify entities in text. This method lets you combine the generative power of LLMs with the precision of KGs, resulting in a model that is both generative and accurate.

2. LLMs for KGs

Another approach is to use LLMs to simplify the creation of Knowledge Graphs. LLMs can assist in designing the knowledge graph ontology, and you can use them to automate the extraction of entities and relationships from text. Additionally, LLMs help with KG completion by predicting missing components based on existing patterns, as seen with models like KG-BERT. They can also help maintain the accuracy and consistency of your KG by validating and fact-checking information against source corpora.

3. Hybrid Models (LLM-KG Cooperation)

Hybrid models represent a more complex integration, where KGs and LLMs collaborate throughout the process of understanding and generating responses. In these models, KGs are integrated into the LLM’s reasoning process. One such approach is to post-process the output generated by an LLM using a Knowledge Graph, ensuring that the model’s responses align with the structured data in the graph. In this scenario, the KG serves as a validation layer, correcting inconsistencies or inaccuracies that arise from the LLM’s generation process. Alternatively, you can build the AI workflow so that the LLM prompt is created by querying the KG for relevant information. This information is then used to generate a response, which is finally cross-checked against the KG for accuracy.
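As a rough illustration of this hybrid pattern, here is a minimal Python sketch. The graph client, the Cypher query, and the llm_complete callable are hypothetical stand-ins, not a specific FalkorDB or LLM API:

def kg_facts(graph, entity: str) -> list[str]:
    # Fetch grounding facts about the entity from the KG (illustrative Cypher).
    rows = graph.query(
        "MATCH (e {name: $name})-[r]->(o) RETURN type(r) AS rel, o.name AS obj",
        {"name": entity},
    )
    return [f"{entity} {rel} {obj}" for rel, obj in rows]

def grounded_answer(graph, llm_complete, entity: str, question: str) -> str:
    facts = kg_facts(graph, entity)
    prompt = (
        "Answer strictly from these facts:\n"
        + "\n".join(facts)
        + f"\n\nQuestion: {question}"
    )
    draft = llm_complete(prompt)
    # Validation layer: reject drafts that mention capitalized terms the KG
    # never returned (a crude stand-in for entity-level fact checking).
    known = {tok for fact in facts for tok in fact.split()}
    unknown = [t for t in draft.split() if t.istitle() and t not in known]
    return draft if not unknown else "Answer could not be verified against the KG."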
Benefits of Knowledge Graph and LLM Integration

There are numerous benefits to integrating LLMs with Knowledge Graphs. Here are a few.

1. Enhanced Data Management

Integrating KGs with LLMs allows you to manage data more effectively. KGs provide a structured format for organizing information, which LLMs can then access and use to generate informed responses. KGs also allow you to visualize your data, which helps you identify inconsistencies. Few data management systems provide the kind of flexibility and simplicity that KGs offer.

2. Contextual Understanding

By combining the structured knowledge of KGs with the language processing abilities of LLMs, you can achieve deeper contextual understanding in your AI systems. This integration allows your models to use the relationships between different pieces of information and helps you build explainable AI systems.

3. Collaborative Knowledge Building

The KG-LLM integration also helps create systems where KGs and LLMs continuously improve each other. As the LLM processes new information, your pipeline can update the KG with new relationships or facts which, in turn, can be used to improve the LLM’s performance. This adaptive process ensures that your AI systems continually improve and stay up to date.

4. Dynamic Learning

By leveraging the structured knowledge that KGs provide, you can build LLM-powered AI systems in fields such as healthcare or finance, where data is dynamic and constantly evolving. Keeping your KG continuously updated with the latest information ensures that LLMs have access to accurate and relevant context, which enhances their ability to generate precise and contextually appropriate responses.

5. Improved Decision-Making

One of the most significant benefits of integrating KGs with LLMs is the enhancement of decision-making processes. By grounding its decisions in structured, reliable knowledge, your AI system can make more informed and accurate choices, reducing the likelihood of errors and hallucinations and improving overall outcomes. An example of this is a GraphRAG system, which is increasingly being used to augment LLM responses with factual, grounded data that wasn’t part of the model’s training dataset.

Challenges in LLM and Knowledge Graph Integration

1. Alignment and Consistency

One of the main challenges you may face in integrating KGs with LLMs is ensuring alignment and consistency between the two. Since KGs are structured while LLMs are flexible and generative, aligning the outputs of an LLM with the structure and rules of a KG can be difficult. To ensure that both systems work together, you will need a mediator component that is responsible for prompting the LLM, as well as issuing KG queries when the LLM needs additional context.

2. Real-Time Querying

Another challenge is real-time querying. While KGs can provide highly accurate and structured information, querying them in real time can be computationally expensive and

Read More »

Knowledge graph vs vector database: Which one to choose?

Large Language Models (LLMs) are powerful Generative AI models that learn statistical relationships between words, which enables them to generate human-like text, translate languages, write different kinds of creative content, and answer questions in an informative way. Since the Transformer architecture was introduced in the “Attention Is All You Need” paper, we have seen the emergence of increasingly powerful LLMs.

However, LLMs by themselves are not enough, for two key reasons. First, they tend to hallucinate, meaning they can “make up” facts and information that are simply untrue. LLMs work by predicting the next token in a sequence and are inherently probabilistic. This means they can generate factually incorrect statements, especially when prompted on topics outside their training data or when the training data itself is inaccurate. This brings us to the second limitation: companies looking to build AI applications that leverage their internal data cannot rely solely on these models, as they are limited to the data on which they were originally trained.

To bypass these limitations, the performance of an LLM can be augmented by connecting it to an external data source. Here’s how it works: upon receiving a query, relevant information is fetched from the data source and sent to the LLM before response generation. This way, the behavior of the LLM can be ‘grounded,’ while still harnessing its analytical capabilities. This approach is known as a Retrieval Augmented Generation (RAG) system.
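In Python-flavored pseudocode, that basic RAG loop looks something like the sketch below; retrieve and llm are hypothetical stand-ins for your retriever and model client, not a specific library API:

def rag_answer(query: str, retrieve, llm) -> str:
    # 1. Fetch the documents most relevant to the query from the data source.
    context_docs = retrieve(query, top_k=3)
    # 2. Ground the prompt in the retrieved context before generation.
    prompt = (
        "Use only the context below to answer.\n\n"
        "Context:\n" + "\n---\n".join(context_docs)
        + f"\n\nQuestion: {query}\nAnswer:"
    )
    # 3. The LLM generates a response grounded in the supplied context.
    return llm(prompt)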
Making Your Data AI-Ready

Effective AI requires the right data in the right place, and understanding data types is crucial to this process. Here’s what it takes to get data AI-ready and how to handle massive amounts of relevant data:

- Data Types: Identify and classify the different data types your AI will use. Structured data (like databases) and unstructured data (like text and images) need different handling techniques.
- Data Preparation: Clean, normalize, and format your data to ensure it’s usable by AI systems. This involves removing duplicates, correcting errors, and standardizing formats.
- Storage Solutions: Opt for scalable storage solutions that can handle large volumes of data efficiently. Cloud-based storage options offer flexibility and scalability.
- Data Integration: Ensure seamless integration of data from multiple sources. This can involve using APIs, middleware, or data lakes to consolidate information.
- Ongoing Management: Regularly update and maintain your data to keep it relevant and accurate. Implement data governance policies to manage data quality and compliance.

This comprehensive approach ensures that your data is not just available but optimized for AI applications, enhancing both performance and reliability. Whether you’re just beginning your AI journey or are well underway, having AI-ready data is a key component of success.

One of the most powerful aspects of the RAG architecture is its ability to unlock the knowledge stored in unstructured data in addition to structured data. In fact, a significant portion of data globally, estimated at 80% to 90% by various analysts, is unstructured, and this is a huge untapped resource for companies to leverage. This makes LLM-powered RAG applications one of the most powerful approaches in the AI domain.

Knowledge Graphs and Vector Databases are two widely used technologies for building RAG applications. They differ significantly in terms of the underlying abstractions they use to operate and, as a result, offer different capabilities for data querying and extraction. This blog will help you understand when and why to choose either of these technologies so that you can make the most of using AI to understand and leverage your data.

What is a Knowledge Graph?

A Knowledge Graph is a structured representation of information. It organizes data into nodes (entities) and edges (the relationships between them). Here’s a simple example of a Knowledge Graph around the game of football:

- Lionel Messi “plays for” Paris Saint-Germain (PSG)
- Lionel Messi “represents” Argentina
- Cristiano Ronaldo “plays for” Manchester United
- Cristiano Ronaldo “represents” Portugal
- Paris Saint-Germain (PSG) “competes in” UEFA Champions League
- Manchester United “competes in” UEFA Champions League
- Argentina “competes in” FIFA World Cup
- Portugal “competes in” FIFA World Cup
- Mauricio Pochettino “manages” Paris Saint-Germain (PSG)
- Ole Gunnar Solskjær “manages” Manchester United

The names are the entities, and the phrases in inverted commas are the relationships between them. It has been said that the human brain organizes knowledge much like a Knowledge Graph, which is why this way of structuring data is highly human-readable.

Knowledge Graphs are very useful for discovering deep and intricate connections in your data, which enables complex querying capabilities. For example, you could ask: “Find all players who have played under Pep Guardiola at both FC Barcelona and Manchester City, and have also scored in a UEFA Champions League final for either of these clubs.” A Knowledge Graph can handle this query by traversing the relationships between entities such as players, managers, clubs, and match events.

There are several popular Knowledge Graphs that are publicly available:

- Wikidata
- Freebase
- YAGO
- DBpedia

You are, however, not limited to these publicly available Knowledge Graphs when building your applications; any form of unstructured data can be modeled as a Knowledge Graph. For instance, consider a company with a repository of customer service emails. By extracting key entities and relationships from these emails, such as customer names, issues, resolutions, and timestamps, you can create a Knowledge Graph that maps the interactions and solutions. Similarly, you can identify relationships between objects in an image and model these connections as a Knowledge Graph to build image clustering and recognition algorithms.

Knowledge Graphs are stored in specialized databases that support Cypher, a powerful query language designed specifically for interacting with graph databases. The Knowledge Graph database, also known as a graph database, is optimized for traversing and manipulating graph structures, and is the fundamental building block behind Knowledge Graph-powered RAG applications (GraphRAG).

Understanding Depth Parameters in Graph Databases

Depth parameters play a crucial role in navigating and analyzing relationships within graph databases. They help define the extent of traversal allowed from a starting

Read More »

Beyond Rows and Columns: Exploring the Missing Third Dimension

If you are working with data, you might be familiar with the concepts of rows and columns, which are the basic building blocks of most database models. However, there is another dimension that is often overlooked or ignored, and it can offer new possibilities and insights for your data analysis. In this blog post, I will compare three common database storage models: row-based, column-based, and network-based. These models affect how data is stored, accessed, and manipulated in a database system along three dimensions.

The first dimension

The first dimension, and the most common one, is the row-based (aka relational) model, where data is stored in rows and each row represents a record or an entity. For example, a row in a table of customers might contain the name, address, phone number, and email of a single customer. Some examples of row-based databases are MySQL, PostgreSQL, and Oracle. Row-based databases are good for transactional processing, where you need to insert, update, or delete individual records quickly and frequently. They are also good for queries that involve many columns or attributes of a record, such as finding all the customers who live in a certain city and fall within a certain age range. In a broader view, document databases and key-value databases like MongoDB and Redis are also row-based databases, optimized for full-entity retrieval.

The second dimension

The second dimension is the column-based (aka columnar) model, where data is stored in columns and each column represents an attribute or a feature of a record. For example, a column in a table of customers might contain the names of all the customers, another column might contain their addresses, and so on. Some examples of column-based databases are Cassandra, ScyllaDB, and Amazon Redshift. Column-based databases are good for analytical processing, where you need to perform calculations or aggregations on large amounts of data. They are also good for queries that involve few columns or attributes of a record, such as finding the average age of all the customers. In a broader view, time-series databases like InfluxDB and TimescaleDB are also column-based databases, optimized for multi-event aggregations.

The third dimension

The third dimension is the network-based (aka graph) model, where data is stored in records that hold references to other records. For example, a record in a set of customers might contain the name and address of a single customer, as well as a pointer to another record that contains the phone number and email of the same customer. Some examples of graph databases are Neo4j, Amazon Neptune, and FalkorDB.

The missing third dimension

Graph databases are not just another type of database; they are the missing third dimension in the database world. Unlike databases in the first two dimensions, which optimized the way data is stored for their use cases, most graph databases use adjacency lists, which are not efficient for retrieving and traversing cross-entity references (edges). FalkorDB is the first and only database that is optimized for the third dimension, putting edge storage at the center by using GraphBLAS to represent the relationship topology. By replacing adjacency lists with adjacency matrices, FalkorDB ensures that the edge storage is well suited for use cases that need to access edges and traverse the graph.

As you can see, each database storage model has its own advantages and disadvantages depending on the type and purpose of your data.
You should choose the model that best suits your needs and requirements. In some cases, you might even use a hybrid approach that combines different models to optimize your performance and functionality.
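To make the edge-representation difference concrete, here is a small Python sketch; it is a conceptual illustration only, not FalkorDB’s actual GraphBLAS implementation:

import numpy as np

# Adjacency list: per-node edge lists; traversal pointer-chases node by node.
adj_list = {0: [1, 2], 1: [2], 2: [0]}
print(adj_list[0])  # neighbors of node 0 -> [1, 2]

# Adjacency matrix: all edges in one (typically sparse) matrix; a BFS frontier
# expansion becomes a single vector-matrix product over the whole graph.
A = np.zeros((3, 3), dtype=np.int8)
for src, dsts in adj_list.items():
    for dst in dsts:
        A[src, dst] = 1

frontier = np.array([1, 0, 0], dtype=np.int8)  # start the traversal at node 0
next_frontier = frontier @ A                   # nonzero = reachable in one hop
print(np.flatnonzero(next_frontier))           # -> [1 2]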

Read More »

Survey: GraphRAG and Knowledge Graphs for Large Language Models

The seminal paper “Unifying Large Language Models and Knowledge Graphs: A Roadmap”, published on June 14, 2023, presents a comprehensive framework for integrating the emergent capabilities of Large Language Models (LLMs) with the structured knowledge representation of Knowledge Graphs (KGs). Authored by Shirui Pan, Linhao Luo, Yufei Wang, Chen Chen, Jiapu Wang, and Xindong Wu, the paper outlines three general frameworks for this unification: KG-enhanced LLMs, LLM-augmented KGs, and Synergized LLMs + KGs. These frameworks aim to leverage the strengths of both LLMs and KGs to enhance AI’s inferential and interpretative abilities, address the construction and evolution challenges of KGs, and promote bidirectional reasoning driven by data and knowledge. The paper’s roadmap is a forward-looking guide that reviews existing efforts and pinpoints future research directions, marking a pivotal contribution to the field of natural language processing and artificial intelligence.

GraphRAG: A New Frontier for LLMs

GraphRAG stands as a significant advancement in enhancing the capabilities of LLMs, particularly in the context of private datasets. A recent publication from Microsoft Research titled “GraphRAG: Unlocking LLM discovery on narrative private data” introduces GraphRAG as a method to improve question-and-answer performance when analyzing complex information. The technique uses LLM-generated knowledge graphs alongside graph machine learning to perform prompt augmentation at query time, showing substantial improvements over baseline RAG approaches.

Another notable work is detailed in the arXiv paper “From Local to Global: A Graph RAG Approach to Query-Focused Summarization”. This paper proposes a Graph RAG approach that scales with the generality of user questions and the quantity of source text to be indexed. It uses an LLM to build a graph-based text index, which then aids in generating comprehensive and diverse answers for global sensemaking questions.

Knowledge Graphs: Enhancing LLM Precision

Knowledge Graphs serve as a structured representation of knowledge, capturing relationships between entities. They have been increasingly used as context sources for LLMs to produce more precise and relevant outputs. The paper “Knowledge Graphs as Context Sources for LLM-Based Explanations of Learning Recommendations” explores the use of Knowledge Graphs to reduce the risk of model hallucinations and ensure high precision in the context of personalized education. Furthermore, the research titled “Knowledge Graph Large Language Model (KG-LLM) for Link Prediction” investigates the use of Knowledge Graphs for improving the link prediction capabilities of LLMs, showcasing the potential of combining structured knowledge with generative models.

The article “Give Us the Facts: Enhancing Large Language Models with Knowledge Graphs for Fact-aware Language Modeling” discusses the integration of knowledge graphs with large language models to improve their ability to recall facts and provide factually accurate content.
It reviews existing models enhanced with KGs and proposes the development of knowledge graph-enhanced large language models (KGLLMs), which aim to boost the factual reasoning capabilities of LLMs, paving the way for more informed and reliable AI interactions.

The paper “LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities” provides a comprehensive evaluation of large language models like GPT-4 for tasks in knowledge graph construction and reasoning, highlighting their strength in inference over few-shot information extraction. It also introduces AutoKG, a novel multi-agent approach that leverages LLMs and external sources to enhance the construction and reasoning processes within knowledge graphs.

The Synergy of GraphRAG and Knowledge Graphs

The intersection of GraphRAG and Knowledge Graphs with LLMs is a burgeoning field of study that promises to unlock new capabilities for AI systems. By leveraging the structured nature of Knowledge Graphs and the dynamic querying ability of GraphRAG, LLMs can achieve a higher level of understanding and reasoning. This synergy is evident in the paper “LLM-assisted Knowledge Graph Engineering: Experiments with ChatGPT”, which demonstrates how LLMs can assist in the engineering of Knowledge Graphs, leading to more efficient and effective AI solutions.

Conclusion

The integration of GraphRAG and Knowledge Graphs with LLMs is a testament to the ongoing innovation in the field of AI. As researchers continue to explore these technologies, we can expect to see AI systems that not only understand and generate text but also exhibit a deeper level of reasoning and knowledge representation. The surveyed publications provide a glimpse into this exciting future, where AI becomes more intertwined with structured data and complex problem-solving.

This survey provides a snapshot of the current state of research at the intersection of GraphRAG, Knowledge Graphs, and LLMs. For developers and researchers, these advancements offer a wealth of opportunities to enhance the capabilities of your projects and applications. Keep an eye on these developments, as they are likely to influence the next generation of AI technologies significantly.

To read more about this topic, see: RAG Battle: Vector Database Vs Knowledge Graph

Read More »

FalkorDB 4.0 Beta released – Major improvements and Critical bug fixes!

We’re excited to announce that FalkorDB 4.0 Beta is now available for download and testing. FalkorDB is a graph database that builds on the legacy of RedisGraph, which was discontinued by Redis a few months ago. FalkorDB aims to provide a fast, scalable, and reliable graph solution for your data needs.

Try it on Docker:

docker run -it -p 6379:6379 -p 7687:7687 falkordb/falkordb:v4.0.1

Free FalkorDB Cloud: https://app.falkordb.cloud/

The first FalkorDB 4.0.0 beta introduces some major features and enhancements over RedisGraph:

- Vector indexing support: You can now index your graph nodes and edges based on a vector representation.
- Bolt protocol support: You can now connect to FalkorDB using the popular Bolt protocol, which is widely used by Neo4j clients and tools. This makes it easier to migrate from Neo4j to FalkorDB with almost no changes to your code or queries. Note: Bolt over WebSocket is not supported yet.
- Bug fixes and stability improvements: We have fixed several critical bugs that could cause crashes and data loss in RedisGraph, and improved the overall performance and stability of FalkorDB.

Some of the notable critical bug fixes:

- Zero-length traversals are now treated as variable-length traversals, which prevents server crashes and data loss. (#478)
- The properties function now returns a new map instead of modifying the original one, which prevents data corruption and inconsistency. (#462)
- CREATE clauses no longer access their own entities, which prevents server crashes and data loss. (#425)

We encourage all RedisGraph users to test FalkorDB 4.0 as soon as possible, as we plan to fix more critical bugs and release the final version of FalkorDB 4.0 soon. We hope you enjoy using FalkorDB 4.0 Beta, and we welcome your feedback and suggestions. Please feel free to contact us on Discord or GitHub issues/discussions if you have any questions.

Read More »

Released! FalkorDB 4.0-a1 – Vector Search Index & Bolt Protocol

We are thrilled to announce the release of FalkorDB version 4.0.0-a1, a major update that brings two exciting features to our graph database platform. Check out the new version’s Docker container (we plan to release a cloud sandbox soon):

docker run -it -p 6379:6379 -p 7687:7687 falkordb/falkordb:4.0.0-alpha.1

Notice: the examples below are in Java, but if Java is not your cup of tea, you can find the same examples in other languages here: https://github.com/FalkorDB/demos

Vector Index

The first feature is Vector Index support, which allows you to find nodes using vector similarity search. This means you can store and query high-dimensional vectors, such as embeddings or image features, and find the most similar nodes based on cosine similarity or Euclidean distance. This opens up new possibilities for applications such as recommendation systems, natural language processing, computer vision, and RAG (we’ll expand on this in future blogs).

Vector Index example:

package com.falkordb;

import redis.clients.jedis.UnifiedJedis;
import redis.clients.jedis.graph.ResultSet;
import redis.clients.jedis.graph.Record;

public class FalkorDBVectorDemo {
    public static void main(String[] args) {
        try (UnifiedJedis jedis = new UnifiedJedis("redis://localhost:6379")) {
            // Create a vector index on the description field of Character nodes
            jedis.graphQuery("Books",
                "CREATE VECTOR INDEX FOR (c:Character) ON (c.description) " +
                "OPTIONS {dimension:5, similarityFunction:'euclidean'}");

            // Fill the graph with some data on books and characters
            jedis.graphQuery("Books", "CREATE " +
                "(:Character {name:'Bastian Balthazar Bux', description:vecf32([0.1, 0.3, 0.3, 0.4, 0.7])})-[:in]->(book1:Book {name:'The Neverending Story'}), " +
                "(:Character {name:'Atreyu', description:vecf32([0.3, 0.6, 0.2, 0.1, 0.4])})-[:in]->(book1), " +
                "(:Character {name:'Jareth', description:vecf32([0.1, 0.3, 0.1, 0.2, 0.9])})-[:in]->(book2:Book {name:'Labyrinth'}), " +
                "(:Character {name:'Hoggle', description:vecf32([0.3, 0.2, 0.5, 0.7, 0.9])})-[:in]->(book2)");

            // Find the book whose character description is most similar (k=1) to the query vector
            ResultSet result = jedis.graphQuery("Books",
                "CALL db.idx.vector.queryNodes(" +
                "'Character', 'description', 1, vecf32([0.1, 0.4, 0.3, 0.2, 0.7])) " +
                "YIELD entity " +
                "MATCH (entity)-[]->(b:Book) " +
                "RETURN b.name AS name");

            // Print out the name
            for (Record record : result) {
                System.out.println(record.getString("name"));
            }
        }
    }
}

Bolt Protocol

The second feature is support for the Bolt protocol, which allows a seamless transition from Neo4j to FalkorDB. If you are already using Neo4j and want to switch to FalkorDB, you can do so without changing your code or your data model. You can use the same drivers and tools that you are familiar with, and enjoy the benefits of FalkorDB’s scalability, performance, and flexibility.
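As a hedged sketch (not taken from the FalkorDB docs), here is what connecting over Bolt could look like, shown with the standard Neo4j Python driver for brevity; the port and the absence of authentication are assumptions based on the Docker command above:

from neo4j import GraphDatabase

# Point a standard Bolt client at FalkorDB's Bolt port (7687, mapped above).
# Whether credentials are required depends on your server configuration.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=None)

with driver.session() as session:
    result = session.run("MATCH (n) RETURN count(n) AS nodes")
    print(result.single()["nodes"])

driver.close()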
We hope you enjoy the new version of FalkorDB, and we look forward to hearing your feedback. Please let us know if you have any questions or issues on our GitHub discussions or our Discord server. Happy graphing!

Read More »

What is RAG (Retrieval Augmented Generation)?

Large Language Models (LLMs) are powerful tools for natural language processing, capable of generating fluent and coherent text for various tasks. However, LLMs also have some limitations, such as their knowledge base being stale, incomplete, or inaccurate. To overcome these challenges, we can use a technique called Retrieval Augmented Generation (RAG), which allows us to provide LLMs with relevant and up-to-date information from external data sources. In this blog post, I will explain what RAG is, why it is useful, and how to build it using a Vector Database and a Knowledge Graph, a leading combination for RAG. I will also give some examples of use cases that need RAG and how it can improve the quality and accuracy of the generated text.

From: https://gpt-index.readthedocs.io/en/latest/getting_started/concepts.html

What is RAG?

RAG is a process for retrieving information relevant to a task, providing it to the language model along with a prompt, and relying on the model to use this specific information when responding. For example, if we want to generate a summary of a news article, we can use RAG to retrieve related articles or facts from a database and feed them to the LLM as additional context. The LLM can then use this information to generate a more accurate and informative summary.

RAG is different from fine-tuning, which involves training the LLM on new data to adapt it to a specific domain or task. Fine-tuning can be time-consuming and expensive, and it does not offer a significant advantage in many scenarios. RAG, on the other hand, allows us to use the same LLM as a general reasoning and text engine while providing it with the necessary data in real time. This way, we can achieve customized solutions while maintaining data relevance and optimizing costs.

What should a RAG data source provide?

To implement RAG, we need two components: an LLM and a data source. The LLM can be any pretrained model that supports text generation, such as GPT-3 or T5. The data source can be any collection of documents or facts that are relevant to our task or domain. However, not all data sources are equally suitable. Ideally, we want a data source that is:

- Up-to-date: The data should reflect the latest information available on the topic of interest.
- Comprehensive: The data should cover all the aspects and details that are relevant to the task or domain.
- Accurate: The data should be reliable and trustworthy, free of errors or biases.
- Efficient: The data should be easy to access and query, with low latency and high throughput.

How to build RAG using a Vector Database and a Knowledge Graph

One of the leading options for building such a data source is a combination of a Vector Database and a Knowledge Graph. A vector database stores data as vectors, which are numerical representations of objects or concepts. A knowledge graph stores data as nodes and edges, which represent entities and their relationships. By combining these two technologies, we can create a powerful data source that meets all the criteria above. A vector database allows us to store and retrieve data based on similarity or relevance.
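Here is a small, self-contained Python sketch of that similarity lookup; the embedding function is a toy stand-in, not a real model:

import numpy as np

def toy_embed(text: str) -> np.ndarray:
    # Toy pseudo-embedding derived from the text hash, so the sketch runs
    # standalone; a real system would use a trained embedding model.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.random(8)

docs = [
    "COVID-19 vaccines reduce the risk of severe illness.",
    "Graph databases store entities and relationships.",
    "Transformers predict the next token in a sequence.",
]
doc_vecs = [toy_embed(d) for d in docs]

def most_similar(query: str, k: int = 2) -> list[str]:
    q = toy_embed(query)
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    # Rank stored documents by cosine similarity to the query vector.
    ranked = sorted(zip(docs, doc_vecs), key=lambda p: cos(q, p[1]), reverse=True)
    return [d for d, _ in ranked[:k]]

print(most_similar("How do vaccines work?"))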
If we want to find documents related to a given query, for example, we can use a vector database to compare the query vector with the document vectors and return the most similar ones. A vector database also enables fast and scalable queries, as it can leverage efficient indexing and search algorithms.

A knowledge graph allows us to store and retrieve data based on semantics or meaning. For example, if we want to find facts related to a given entity, we can use a knowledge graph to traverse the graph from the entity node and return the connected nodes and edges. A knowledge graph also enables rich and structured queries, as it can leverage logical inference and reasoning.

By combining a vector database and a knowledge graph, we can create a data source that can answer both similarity-based and semantics-based queries. For example, if we want to find information about COVID-19 vaccines, we can use a vector database to find documents that are similar to our query, and then use a knowledge graph to extract facts from those documents. This way, we can obtain both relevant and informative data for our task.

To build RAG using these data sources, we need to follow these steps:

1. Preprocess the data: Transform the raw data (e.g., text documents) into vectors and graphs, using methods such as word, sentence, or document embeddings, entity extraction, and relation extraction.
2. Store the data: Store the vectors and graphs in a vector database and a knowledge graph, respectively.
3. Query the data: Query the data source based on the task or prompt, using methods such as natural language queries, keyword queries, vector queries, or graph queries.
4. Generate the text: Provide the LLM with the query and the retrieved data as context, and ask it to generate a response, using methods such as prompt engineering, few-shot learning, or zero-shot learning.

Examples of use cases that need RAG

RAG can be useful for many use cases that involve text generation, especially when the LLM’s knowledge is insufficient or outdated. Here are some examples:

- Summarization: RAG can help generate summaries of long or complex texts, such as news articles, research papers, or books. We can provide the LLM with additional information from related sources, such as other articles, facts, and opinions. This can help the LLM generate more accurate and informative summaries that capture the main points and perspectives of the text.
- Question answering: RAG can

Read More »

Building a Q&A System

If you are looking for a simple way to build a Q&A system based on your knowledge graph, you should check out LangChain. LangChain allows you to easily query your Knowledge Graph using natural language. In this blog post, I will show you, in five simple steps, how to use LangChain to query a knowledge graph backed by FalkorDB.

1. Installing LangChain

First, you need to install LangChain on your machine. You can download it from the official website or use the command line:

> pip install langchain

2. Starting a FalkorDB server locally

Starting a local FalkorDB is as simple as running a Docker container (the documentation covers other ways to run it):

> docker run -p 6379:6379 -it --rm falkordb/falkordb:latest
6:C 26 Aug 2023 08:36:26.297 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
6:C 26 Aug 2023 08:36:26.297 # Redis version=7.0.12, bits=64, commit=00000000, modified=0, pid=6, just started
...
6:M 26 Aug 2023 08:36:26.322 * <graph> Starting up FalkorDB version 99.99.99.
6:M 26 Aug 2023 08:36:26.324 * <graph> Thread pool created, using 8 threads.
6:M 26 Aug 2023 08:36:26.324 * <graph> Maximum number of OpenMP threads set to 8
6:M 26 Aug 2023 08:36:26.324 * <graph> Query backlog size: 1000
6:M 26 Aug 2023 08:36:26.324 * Module 'graph' loaded from /FalkorDB/bin/linux-x64-release/src/falkordb.so
6:M 26 Aug 2023 08:36:26.324 * Ready to accept connections

Running the demo

The rest of this blog covers the simple steps you can take to get started; you can find the notebook as part of the LlamaIndex repository: falkordb.ipynb

3. Creating a Knowledge Graph

Now, let's create a demo knowledge graph of movies and their leading actors.

from langchain.chat_models import ChatOpenAI
from langchain.graphs import FalkorDBGraph
from langchain.chains import FalkorDBQAChain

graph = FalkorDBGraph(database="movies")

graph.query("""
    CREATE
        (al:Person {name: 'Al Pacino', birthDate: '1940-04-25'}),
        (robert:Person {name: 'Robert De Niro', birthDate: '1943-08-17'}),
        (tom:Person {name: 'Tom Cruise', birthDate: '1962-07-3'}),
        (val:Person {name: 'Val Kilmer', birthDate: '1959-12-31'}),
        (anthony:Person {name: 'Anthony Edwards', birthDate: '1962-7-19'}),
        (meg:Person {name: 'Meg Ryan', birthDate: '1961-11-19'}),
        (god1:Movie {title: 'The Godfather'}),
        (god2:Movie {title: 'The Godfather: Part II'}),
        (god3:Movie {title: 'The Godfather Coda: The Death of Michael Corleone'}),
        (top:Movie {title: 'Top Gun'}),
        (al)-[:ACTED_IN]->(god1),
        (al)-[:ACTED_IN]->(god2),
        (al)-[:ACTED_IN]->(god3),
        (robert)-[:ACTED_IN]->(god2),
        (tom)-[:ACTED_IN]->(top),
        (val)-[:ACTED_IN]->(top),
        (anthony)-[:ACTED_IN]->(top),
        (meg)-[:ACTED_IN]->(top)
""")

4. Creating the FalkorDB QA Chain

The last step before we start querying the graph is setting up LangChain's chain: first we set our OpenAI key, then connect it all using FalkorDBQAChain.

import os
os.environ['OPENAI_API_KEY'] = 'API_KEY_HERE'

graph.refresh_schema()

chain = FalkorDBQAChain.from_llm(
    ChatOpenAI(temperature=0), graph=graph, verbose=True
)

5. Querying the Graph

You are all set; you can start querying the Knowledge Graph. Let's try a couple of questions.
chain.run("Who played in Top Gun?") > Entering new FalkorDBQAChain chain… Generated Cypher: MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WHERE m.title = 'Top Gun' RETURN p.name Full Context: [['Tom Cruise'], ['Val Kilmer'], ['Anthony Edwards'], ['Meg Ryan'], ['Tom Cruise'], ['Val Kilmer'], ['Anthony Edwards'], ['Meg Ryan']] > Finished chain. 'Tom Cruise, Val Kilmer, Anthony Edwards, and Meg Ryan played in Top Gun.' chain.run("Robert De Niro played in which movies?") > Entering new FalkorDBQAChain chain… Generated Cypher: MATCH (p:Person {name: 'Robert De Niro'})-[:ACTED_IN]->(m:Movie) RETURN m.title Full Context: [['The Godfather: Part II'], ['The Godfather: Part II']] > Finished chain. 'Robert De Niro played in "The Godfather: Part II".'

Read More »

Building and Querying a Knowledge Graph

If you are looking for a simple way to create and query a knowledge graph based on your internal documents, you should check out LlamaIndex. LlamaIndex is a tool that allows you to easily build and search a knowledge graph using natural language queries. In this blog post, I will show you, in six simple steps, how to use LlamaIndex to create and explore a knowledge graph backed by FalkorDB.

Installing LlamaIndex

First, you need to install LlamaIndex on your machine. You can download it from the official website or use the command line:

> pip install llama-index

Starting a FalkorDB server locally

Starting a local FalkorDB is as simple as running a Docker container (the documentation covers other ways to run it):

> docker run -p 6379:6379 -it --rm falkordb/falkordb:latest
6:C 26 Aug 2023 08:36:26.297 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
6:C 26 Aug 2023 08:36:26.297 # Redis version=7.0.12, bits=64, commit=00000000, modified=0, pid=6, just started
...
6:M 26 Aug 2023 08:36:26.322 * <graph> Starting up FalkorDB version 99.99.99.
6:M 26 Aug 2023 08:36:26.324 * <graph> Thread pool created, using 8 threads.
6:M 26 Aug 2023 08:36:26.324 * <graph> Maximum number of OpenMP threads set to 8
6:M 26 Aug 2023 08:36:26.324 * <graph> Query backlog size: 1000
6:M 26 Aug 2023 08:36:26.324 * Module 'graph' loaded from /FalkorDB/bin/linux-x64-release/src/falkordb.so
6:M 26 Aug 2023 08:36:26.324 * Ready to accept connections

Running the demo

The rest of this blog covers the simple steps you can take to get started; you can find the notebook as part of the LlamaIndex repository: FalkorDBGraphDemo.ipynb

Set your OpenAI key

Get your OpenAI key from https://platform.openai.com/account/api-keys and set it in the code below:

import os
os.environ["OPENAI_API_KEY"] = "API_KEY_HERE"

Connecting to FalkorDB with FalkorDBGraphStore

Note: you might need to install the Redis Python client if it's missing.

#> pip install redis
from llama_index.graph_stores import FalkorDBGraphStore

graph_store = FalkorDBGraphStore("redis://localhost:6379", decode_responses=True)
#... INFO:numexpr.utils:NumExpr defaulting to 8 threads.

Building the Knowledge Graph

Next, we'll load some sample data using SimpleDirectoryReader:

from llama_index import (
    SimpleDirectoryReader,
    ServiceContext,
    KnowledgeGraphIndex,
)
from llama_index.llms import OpenAI
from IPython.display import Markdown, display

# loading some local documents
documents = SimpleDirectoryReader(
    "../../../../examples/paul_graham_essay/data"
).load_data()

Now all that is left to do is let LlamaIndex use the LLM to generate the Knowledge Graph:

from llama_index.storage.storage_context import StorageContext

# define LLM
llm = OpenAI(temperature=0, model="gpt-3.5-turbo")
service_context = ServiceContext.from_defaults(llm=llm, chunk_size=512)
storage_context = StorageContext.from_defaults(graph_store=graph_store)

# NOTE: can take a while!
index = KnowledgeGraphIndex.from_documents(
    documents,
    max_triplets_per_chunk=2,
    storage_context=storage_context,
    service_context=service_context,
)

Checking behind the scenes I

If you would like to learn more about how the Knowledge Graph is built behind the scenes, you can run the MONITOR command in advance and watch the Cypher commands flowing in.
> redis-cli monitor
127.0.0.1:6379> "GRAPH.QUERY" "falkor"
            "CYPHER subj="we" obj="way to scale startup funding"
            MERGE (n1:`Entity` {id:$subj})
            MERGE (n2:`Entity` {id:$obj})
            MERGE (n1)-[:`STUMBLED_UPON`]->(n2)" "--compact"
127.0.0.1:6379> "GRAPH.QUERY" "falkor"
            "CYPHER subj="startups" obj="isolation"
            MERGE (n1:`Entity` {id:$subj})
            MERGE (n2:`Entity` {id:$obj})
            MERGE (n1)-[:`FACED`]->(n2)" "--compact"
127.0.0.1:6379> "GRAPH.QUERY" "falkor"
            "CYPHER subj="startups" obj="initial set of customers"
            MERGE (n1:`Entity` {id:$subj})
            MERGE (n2:`Entity` {id:$obj})
            MERGE (n1)-[:`GET`]->(n2)" "--compact"

Querying the Knowledge Graph

Now you can easily query the Knowledge Graph using natural language, e.g.:

query_engine = index.as_query_engine(include_text=False, response_mode="tree_summarize")
response = query_engine.query(
    "Tell me more about Interleaf",
)
display(Markdown(f"<b>{response}</b>"))

...
Interleaf is a software company that was founded in 1981. It specialized in developing and selling desktop publishing software. The company's flagship product was called Interleaf, which was a powerful tool for creating and publishing complex documents. Interleaf's software was widely used in industries such as aerospace, defense, and government, where there was a need for creating technical documentation and manuals. The company was acquired by BroadVision in 2000.

Checking behind the scenes II

Once again, if you would like to learn more about how the Knowledge Graph is queried behind the scenes, you can run the MONITOR command in advance and watch the Cypher commands flowing in.

> redis-cli monitor
127.0.0.1:6379> "GRAPH.QUERY" "falkor"
            "CYPHER subjs=["Interleaf"]
            MATCH (n1:Entity)
            WHERE n1.id IN $subjs
            WITH n1
            MATCH p=(n1)-[e*1..2]->(z)
            RETURN p" "--compact"

Read More »

Building & Querying a Knowledge Graph from Unstructured Data

Diffbot, FalkorDB, and LangChain are a great combination for building intelligent applications that can understand and answer questions from unstructured data. Diffbot provides a powerful API that can extract structured data from unstructured documents, such as web pages, PDFs, or emails. With the Diffbot API, you can create a Knowledge Graph that represents the entities and relationships in your documents, and store it in FalkorDB. Then, you can use LangChain to query your Knowledge Graph and get answers to your questions. LangChain can handle complex, natural queries, and return relevant and accurate answers from your Knowledge Graph.

1. Installing LangChain

First, you need to install LangChain and some dependencies on your machine. You can download it from the official website or use the command line:

pip install langchain langchain-experimental openai redis wikipedia

2. Starting a FalkorDB server locally

Starting a local FalkorDB is as simple as running a Docker container (the documentation covers other ways to run it):

> docker run -p 6379:6379 -it --rm falkordb/falkordb:latest
6:C 26 Aug 2023 08:36:26.297 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
6:C 26 Aug 2023 08:36:26.297 # Redis version=7.2.1, bits=64, commit=00000000, modified=0, pid=6, just started
...
6:M 26 Aug 2023 08:36:26.322 * <graph> Starting up FalkorDB version 99.99.99.
6:M 26 Aug 2023 08:36:26.324 * <graph> Thread pool created, using 8 threads.
6:M 26 Aug 2023 08:36:26.324 * <graph> Maximum number of OpenMP threads set to 8
6:M 26 Aug 2023 08:36:26.324 * <graph> Query backlog size: 1000
6:M 26 Aug 2023 08:36:26.324 * Module 'graph' loaded from /FalkorDB/bin/linux-x64-release/src/falkordb.so
6:M 26 Aug 2023 08:36:26.324 * Ready to accept connections

Running the demo

The rest of this blog covers the simple steps you can take to get started; you can also try the Google Colab notebook.

3. Creating a Knowledge Graph

Now, let's create a demo knowledge graph about Warren Buffett using Wikipedia:

from langchain_experimental.graph_transformers.diffbot import DiffbotGraphTransformer
from langchain.document_loaders import WikipediaLoader

diffbot_api_key = "DIFFBOT_API_KEY"
diffbot_nlp = DiffbotGraphTransformer(diffbot_api_key=diffbot_api_key)

query = "Warren Buffett"
raw_documents = WikipediaLoader(query=query).load()
graph_documents = diffbot_nlp.convert_to_graph_documents(raw_documents)

4. Storing the Knowledge Graph in FalkorDB

Next, store the Knowledge Graph in FalkorDB:

from langchain.graphs import FalkorDBGraph

graph = FalkorDBGraph(
    "falkordb",
)
graph.add_graph_documents(graph_documents)
graph.refresh_schema()

5. Querying the Graph

You are all set; you can start querying the Knowledge Graph. Let's try a couple of questions.

%env OPENAI_API_KEY=OPENAI_API_KEY

from langchain.chains import GraphCypherQAChain
from langchain.chat_models import ChatOpenAI

chain = GraphCypherQAChain.from_llm(
    cypher_llm=ChatOpenAI(temperature=0, model_name="gpt-4"),
    qa_llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo"),
    graph=graph, verbose=True,
)

chain.run("Which university did Warren Buffett attend?")

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person {name: "Warren Buffett"})-[:EDUCATED_AT]->(o:Organization) RETURN o.name
Full Context:
[['Woodrow Wilson High School'], ['Alice Deal Junior High School'], ['Columbia Business School'], ['New York Institute of Finance']]
> Finished chain.
'Warren Buffett attended Columbia Business School.'
chain.run("Who is or was working at Berkshire Hathaway?") > Entering new GraphCypherQAChain chain… Generated Cypher: MATCH (p:Person)-[r:EMPLOYEE_OR_MEMBER_OF]->(o:Organization) WHERE o.name = 'Berkshire Hathaway' RETURN p.name Full Context: [['Warren Buffett'], ['Charlie Munger'], ['Howard Buffett'], ['Susan Buffett'], ['Howard'], ['Oliver Chace']] > Finished chain. 'Warren Buffett, Charlie Munger, Howard Buffett, Susan Buffett, Howard, and Oliver Chace are or were working at Berkshire Hathaway.'

Read More »
