The Future of Graph Databases

Welcome to FalkorDB – The Future of Graph Databases

At FalkorDB, we are redefining the boundaries of what's possible with graph databases. Our advanced, ultra-low latency solution is designed to empower your data-driven applications with unparalleled performance, scalability, and ease of use. Whether you're managing complex relationships, conducting deep analytics, or building the next generation of AI-driven applications, FalkorDB is the database you've been waiting for.

Why Choose FalkorDB?

1. Ultra-Low Latency
Experience performance like never before. FalkorDB is up to 200x faster than other graph databases, ensuring that your queries return results in the blink of an eye. Whether you're dealing with millions of nodes or billions, FalkorDB's optimized engine ensures ultra-low latency at every scale.

2. Multi-Graph Support
FalkorDB is the only graph database that fully supports multiple graphs within a single instance:
Multi-Tenancy Ready: Run multiple graphs on the same platform with full isolation, ensuring security and performance across tenants.
Linear Scalability: Easily scale your database across clusters, distributing multiple graphs seamlessly and maintaining consistent performance as your data grows.

3. Built-In Vector Indexing & Full-Text Search
Go beyond simple graph queries. With integrated vector indexing and full-text search, FalkorDB allows you to perform complex searches and similarity matching with ease, all within the same database environment.

4. Full Property Graph with Cypher Support
Leverage the power of property graphs and write expressive queries with full Cypher support. FalkorDB provides a rich set of features for defining, querying, and analyzing graph data, making it easier than ever to uncover insights hidden in your data.

5. High Availability with Live Replication
Never worry about downtime. FalkorDB's high-availability architecture ensures your data is always accessible, with live replication across multiple nodes to prevent any single point of failure.

6. Fully Managed Cloud Support
Deploy your graph database in the cloud with ease. FalkorDB offers fully managed cloud services, taking the hassle out of infrastructure management so you can focus on building great applications.

7. FalkorDB Browser – Graph Visualization Made Easy
Visualize your data with the FalkorDB Browser, a powerful tool that provides an intuitive interface for exploring and interacting with your graphs. Understand complex relationships and uncover patterns with just a few clicks.

8. Language Support for Every Developer
No matter what language you code in, FalkorDB has you covered. We offer comprehensive support for Java, Python, JavaScript, Rust, Go, and more, ensuring seamless integration with your existing tech stack.

Ready to experience the power of FalkorDB? Explore our platform today and see how we're pushing the limits of what's possible with graph databases. Whether you're a developer, data scientist, or enterprise architect, FalkorDB has the tools and performance you need to succeed.
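To illustrate that language support, here is a minimal Python quick-start. This is a sketch, assuming a local FalkorDB instance on the default port and the falkordb client package; the graph name and data are placeholders:

```python
from falkordb import FalkorDB  # pip install falkordb

# Connect to a local FalkorDB instance (default Redis port).
db = FalkorDB(host="localhost", port=6379)

# Each graph lives under its own key, so one instance can serve
# many isolated graphs -- the basis of FalkorDB's multi-tenancy.
g = db.select_graph("quickstart")

# Create two nodes and a relationship, then read them back.
g.query("CREATE (:Person {name: 'Ada'})-[:KNOWS]->(:Person {name: 'Grace'})")
result = g.query("MATCH (a:Person)-[:KNOWS]->(b:Person) RETURN a.name, b.name")
print(result.result_set)  # [['Ada', 'Grace']]
```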

Read More »

Vector Database vs Graph Database: Key Technical Differences

Unstructured data is all the data that isn't organized in a predefined format but is stored in its native form. Due to this lack of organization, it becomes more challenging to sort, extract, and analyze. More than 80% of all enterprise data is unstructured, and this share is growing. This type of data comes from sources such as emails, social media, customer reviews, support queries, and product descriptions, from which businesses seek to extract meaningful insights.

The rapid growth of unstructured data presents both a challenge and an opportunity for businesses. To extract insights from unstructured data, the modern approach pairs large language models (LLMs) with one of two powerful database systems for efficient data retrieval: vector databases or graph databases. Combined with LLMs, these systems enable organizations to structure, search, and analyze unstructured data. Understanding the difference between the two is crucial for developers looking to build modern AI applications or architectures like Retrieval-Augmented Generation (RAG).

In this article, we dive deep into the concepts of vector databases and graph databases, exploring the key differences between them. We also examine their technical advantages, limitations, and use cases to help you make an informed decision when selecting your technology stack.

What is a Vector Database?

Unlike traditional databases that focus on structured data in rows and columns, vector databases handle numerical representations of unstructured data, called embeddings, which are generated by machine learning models known as embedding models. These embeddings capture the semantic meaning (or features) of the underlying data. Vector databases store, index, and retrieve data that has been transformed into these high-dimensional vectors.

You can convert any type of unstructured or high-dimensional data into a vector embedding, whether text, images, audio, or even protein sequences, and this makes vector databases extremely flexible. When data is converted into vector embeddings, data points that are similar to each other are embedded closer together in the embedding space. This enables similarity (or dissimilarity) searches, where you can find similar data using the corresponding vector representations. In that sense, vector databases are search engines designed to efficiently search through a high-dimensional vector space.

For example, in a word embedding space, words with similar meanings, or words often used in similar contexts, sit closer together. The vectors for "cat" and "kitten" would likely be near each other due to their semantic similarity, while "automobile" would be farther from both but positioned close to related words like "car" and "vehicle".

How does this help build retrieval systems in LLM-powered applications? An example is a Vector RAG system, where a user's query is first converted into a vector and then compared against the vector embeddings of existing data in the database. The vectors closest to the query vector are retrieved through a similarity search algorithm, along with the data they represent.
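To make the idea concrete, here is a minimal similarity-search sketch in Python. The embedding values are toy numbers invented purely for illustration; a real system would use vectors produced by an embedding model and an approximate nearest neighbor index rather than a brute-force scan:

```python
import numpy as np

# Toy 3-dimensional "embeddings" -- real models produce hundreds or
# thousands of dimensions. Values are illustrative only.
embeddings = {
    "cat":        np.array([0.90, 0.80, 0.10]),
    "kitten":     np.array([0.85, 0.75, 0.15]),
    "automobile": np.array([0.10, 0.20, 0.90]),
    "car":        np.array([0.15, 0.25, 0.85]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend the user's query embedded to something near "kitten".
query = np.array([0.86, 0.76, 0.14])

# Brute-force similarity search: rank every stored vector against the query.
ranked = sorted(embeddings.items(),
                key=lambda item: cosine_similarity(query, item[1]),
                reverse=True)
print([word for word, _ in ranked])
# ['kitten', 'cat', 'car', 'automobile'] -- semantically close words first
```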
The retrieved data is then presented to the LLM to generate a response for the user.

Vector databases are valuable because they help uncover patterns and relationships between high-dimensional data points. However, they have a significant limitation: interpretability. The high-dimensional nature of vector spaces makes them difficult to visualize and understand. As a result, when a vector search yields incorrect or suboptimal results, it becomes challenging to diagnose and troubleshoot the underlying issues.

What is a Graph Database?

Graph databases work fundamentally differently from vector databases. Rather than using numerical embeddings to represent data, graph databases rely on knowledge graphs to capture the relationships between entities. In a knowledge graph, nodes represent entities, and edges represent the relationships between them. This structure allows for complex queries about relationships and connections, which is invaluable when the links between entities are as important as the entities themselves.

In the context of our earlier example involving "cat," "kitten," "automobile," and "car," each of these concepts would be stored as a node in a knowledge graph. The relationship between "cat" and "kitten" (e.g., "is a type of") would be represented as an edge connecting those two nodes. Similarly, "automobile" and "car" might share an edge representing a "synonym" relationship. This captures the "subject"-"predicate"-"object" triples that form the backbone of knowledge graphs.

Graph databases are ideal when your data contains a high degree of interconnectivity and when understanding these relationships is key to answering business questions. Also, unlike vector embeddings, knowledge graphs stored in a graph database can be easily visualized, which lets you explore intricate relationships within your data.

Modern graph databases support a query language known as Cypher, which allows you to query the knowledge graph and retrieve results. Let's look at how Cypher works using a slightly more complex knowledge graph: a small football graph in which players are connected to the team they play for and to their country of nationality. To build it, you construct the nodes and relationships that represent the different entities and their connections. You can use a graph database like FalkorDB to test the queries below.
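Here is a minimal sketch of those node-creation, relationship-creation, and retrieval queries, run through the FalkorDB Python client against a local instance; the node labels, property names, and relationship types are illustrative assumptions:

```python
from falkordb import FalkorDB  # pip install falkordb

db = FalkorDB(host="localhost", port=6379)  # local instance, default port
g = db.select_graph("football")

# Create the player, team, and country nodes.
g.query("""
CREATE (:Player {name: 'Lamine Yamal'}),
       (:Player {name: 'Pedri'}),
       (:Team {name: 'Barcelona'}),
       (:Country {name: 'Spain'})
""")

# Create the relationships: each player plays for Barcelona and is Spanish.
g.query("""
MATCH (p:Player), (t:Team {name: 'Barcelona'}), (c:Country {name: 'Spain'})
CREATE (p)-[:PLAYS_FOR]->(t), (p)-[:HAS_NATIONALITY]->(c)
""")

# Query: players who play for Barcelona, along with their nationalities.
result = g.query("""
MATCH (c:Country)<-[:HAS_NATIONALITY]-(p:Player)-[:PLAYS_FOR]->(:Team {name: 'Barcelona'})
RETURN p.name AS player, c.name AS nationality
""")
for player, nationality in result.result_set:
    print(player, nationality)
```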
As you can see, Cypher queries are easily readable and self-explanatory. The final query above searches for players who play for Barcelona, along with their nationalities. Here's the example output you will get:

Player         Nationality
Lamine Yamal   Spain
Pedri          Spain

Graph databases are purpose-built to efficiently store, query, and navigate complex knowledge graphs. Designed for handling large-scale knowledge graphs, they offer advanced search and querying capabilities. These databases are especially effective for applications requiring deep relationship analysis, such as GraphRAG systems, where knowledge graphs can be integrated with LLMs.

Key Differences between Vector Database and Graph Database

As we saw above, vector databases are
Read More »

How to Build a Knowledge Graph: A Step-by-Step Guide

Deriving meaningful insights from vast amounts of unstructured data has often been a daunting task. As data volume and variety continue to explode, businesses are increasingly seeking technologies that can effectively capture and interpret the information contained within these datasets to inform strategic decisions. Recent advancements in large language models (LLMs) have opened new avenues for uncovering the meaning behind unstructured data. However, LLMs typically lack long-term memory, necessitating external storage solutions to retain the insights derived from data. One of the most effective methods for achieving this is the knowledge graph.

Knowledge graphs help structure information by capturing relationships between disparate data points. They allow users to integrate data from diverse sources and discover hidden patterns and connections. Recent research has shown that using knowledge graphs in conjunction with LLMs leads to a substantial reduction in LLM 'hallucinations' while improving recall and enabling better performance of AI systems. Due to their flexibility, scalability, and versatility, knowledge graphs are now being used to build AI in several domains, including healthcare, finance, and law.

This article explores the concept of knowledge graphs in detail and offers a step-by-step guide to help you build an effective graph from unstructured datasets.

What is a Knowledge Graph?

A knowledge graph is a structured representation of information that connects entities through meaningful relationships. Entities can be any concept, idea, event, or object, while relationships are the edges that connect these entities meaningfully. For instance, a knowledge graph about Argentina's football team can have "Lionel Messi" and "Argentina Football Team" as distinct entities, with "Team Captain" as their relationship. The graph would then express that Lionel Messi is the captain of Argentina's football team.

Knowledge graphs organize information from unstructured datasets into structured relationships, using nodes (entities) and edges (relationships) to capture data semantics. Since knowledge graph databases like FalkorDB are optimized for graph traversal and querying, you can use them not only to model relationships but also to discover hidden patterns in your data.

More importantly, you can use knowledge graphs in conjunction with LLMs to build advanced AI workflows like GraphRAG. These systems enable enterprises to use unstructured data from the company knowledge base to build LLM-powered AI systems for a wide range of use cases. In such systems, the knowledge graph stores the data and its underlying relationships, while LLMs bring natural language understanding and generation capabilities.

Why Does Your Organization Need a Knowledge Graph?

Organizations today must manage and extract insights from extensive datasets. Traditionally, relational and NoSQL databases were used to store structured data. However, these technologies struggle with unstructured data, such as textual information, which isn't organized in tabular or JSON formats. To address this, vector databases emerged as a solution, representing unstructured data as numerical embeddings. These embeddings, generated by machine learning models, are high-dimensional vectors that capture the features of the underlying data, enabling searchability.

Despite their advantages, vector databases present two main challenges. First, the vector representations are opaque, making them difficult to interpret or debug.
Second, they rely solely on similarity between data points, lacking an understanding of the underlying knowledge within the data. For instance, when large language models (LLMs) use vector databases to retrieve context-relevant information, they convert queries into embeddings. The system then finds vectors in the database that are similar to the query vector, generating responses based on these similarities. However, this process lacks explicit, meaningful relationships, making it unsuitable for scenarios that require deeper knowledge modeling.

This is where knowledge graphs provide a powerful alternative. Knowledge graphs offer explainable, compact representations of data, leveraging the benefits of relational databases while overcoming the limitations of vector databases. They also work effectively with unstructured data.

Consider an example of an e-commerce company analyzing unstructured data, such as customer reviews, support queries, and social media posts. While an AI system using vector databases would focus on semantic similarities, a knowledge graph would map how a user's query relates to products, reviews, transactions, and user personas, offering a more meaningful understanding of the data.

Another example is Google, which has transformed its search capabilities through the effective use of knowledge graphs. These advanced data structures allow the search engine to understand and process queries with a level of sophistication that mimics human understanding. By leveraging knowledge graphs, Google enhances the user experience significantly. When you search for "Paris," for instance, you're not just inundated with links that mention the name. You get insights into its landmarks, historical figures associated with it, and even connections to cultural elements like art or cuisine. This not only makes finding information quicker but also enriches the search experience with layers of context. Through these sophisticated structures, knowledge graphs enable Google to provide search results that are not only relevant but also insightful, transforming the way users interact with information on the internet.

In summary, knowledge graphs can help organizations build AI systems that are:

Explainable: Knowledge graphs provide clear, interpretable relationships between data points, allowing users to understand how information is connected.
Contextual: They model explicit relationships within the data, offering a deeper, context-aware understanding compared to simple vector similarities.
Cross-Domain: Knowledge graphs can integrate diverse data sources (structured, semi-structured, and unstructured) into a unified representation, enabling holistic analysis.
Searchable: By structuring relationships and entities, knowledge graphs facilitate more accurate and meaningful search results beyond pattern matching or vector comparisons.
Scalable: They are capable of scaling with the increasing volume of unstructured data, organizing it into a structured format that's easier to query and analyze.
Able to Handle Complex Queries: Knowledge graphs can answer complex, multi-step queries that require an understanding of hierarchical or multi-level relationships, which relational or vector databases cannot handle as effectively.

Key Features and Components of Knowledge Graphs

The following sections list and explain a few components required for building knowledge graphs. Understanding how they work will help you improve your graph depending on the nature of your data.
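Before looking at each component, here is the earlier Lionel Messi example expressed as a minimal graph, sketched with the FalkorDB Python client; the labels, property names, and relationship type are illustrative assumptions:

```python
from falkordb import FalkorDB  # pip install falkordb

db = FalkorDB(host="localhost", port=6379)
g = db.select_graph("football_kg")

# Two entities (nodes) connected by a meaningful relationship (edge).
g.query("""
CREATE (:Person {name: 'Lionel Messi'})
       -[:TEAM_CAPTAIN]->
       (:Team {name: 'Argentina Football Team'})
""")

# Traverse the relationship to answer: who captains Argentina's team?
result = g.query("""
MATCH (p:Person)-[:TEAM_CAPTAIN]->(:Team {name: 'Argentina Football Team'})
RETURN p.name
""")
print(result.result_set)  # [['Lionel Messi']]
```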
Entities

Entities are discrete, fundamental concepts required for building knowledge graphs. They represent abstract

Read More »

Advanced RAG Techniques: What They Are & How to Use Them

Retrieval-Augmented Generation (RAG) has become a mainstream approach for working with large language models (LLMs) since its introduction in early research. At its core, RAG gathers knowledge from various sources and generates answers using a language model. However, with basic RAG, also known as Naive RAG, you may encounter challenges in obtaining accurate results for complex queries, and you may face slow response times and higher costs when dealing with large datasets. To address these challenges, researchers have developed several advanced RAG techniques. This article provides an overview of these advanced methods to help you achieve better results when Naive RAG falls short.

Understanding Retrieval-Augmented Generation (RAG)

Every RAG application can be broken down into two phases: retrieval and generation. First, RAG retrieves relevant documents or knowledge snippets from external sources, such as knowledge graphs or vector stores, using search and indexing techniques. This retrieved data is then fed into a language model, which generates contextually rich and accurate responses by synthesizing the retrieved information with its pre-trained knowledge.

RAG systems have evolved as requirements have become more complex. You can now classify a RAG system into one of the following categories.

Naive RAG
This is the most basic form of RAG, where the system directly uses the retrieved data as input for the generation model without applying advanced techniques to refine the information. It also doesn't incorporate any enhancements during the generation step.

Modular RAG
This architecture separates the retrieval and generation components into distinct, modular parts. It allows for flexibility in swapping out different retrieval or generation models without disturbing the rest of the system.

Advanced RAG
In advanced RAG, complex techniques like re-ranking, auto-merging, and advanced filtering are used to improve either the retrieval step or the generation step. The goal is to ensure that the most relevant information is retrieved in the shortest time possible.

Advanced RAG techniques improve the efficiency, accuracy, and relevance of information retrieval and subsequent content generation. By applying these methods, you can tackle complex queries, handle diverse data sources, and create more contextually aware AI systems. Let's explore some of these techniques in detail.

Advanced RAG Techniques

In this section, we'll categorize advanced RAG techniques into four areas: Pre-Retrieval and Data-Indexing Techniques, Retrieval Techniques, Post-Retrieval Techniques, and Generation Techniques.

Pre-Retrieval and Data-Indexing Techniques

Pre-retrieval techniques focus on improving the quality of the data in your knowledge graph or vector store before it is searched and retrieved. You can use them wherever cleaning, formatting, and organizing the information is needed. Clean, well-formatted data improves the quality of the data retrieved, which, in turn, influences the final response generated by the LLM. Noisy data, on the other hand, can significantly degrade the quality of the retrieval process, leading to irrelevant or inaccurate responses from the LLM. Below are some of the ways you can pre-process your data.

#1 – Increase Information Density Using LLMs

When working with raw data, you often encounter extraneous information or irrelevant content that can introduce noise into the retrieval process. For example, consider a large dataset of customer support interactions.
It would include lengthy transcripts containing useful insights alongside off-topic or irrelevant content. You would want to increase information density before the data ingestion step to achieve higher-quality retrieval. One approach is to leverage LLMs: they can extract useful information from raw data, summarize overly verbose text, or isolate key facts, thereby increasing the information density. You can then convert this denser, cleaner data into a knowledge graph using Cypher queries, or into embeddings, for more effective retrieval.

#2 – Deduplicate Information in Your Data Index Using LLMs

Data duplication in your dataset can affect retrieval accuracy and response quality, but you can address this with a targeted technique. One approach is to use a clustering algorithm like K-means to group together chunks of data with the same semantic meaning. These clusters can then be merged into single chunks using LLMs, effectively eliminating duplicate information, as shown in the sketch below.

Consider the example of a corporate document repository that contains multiple policy documents related to customers. The same information might be present in the following ways:

Document 1: "Employees must ensure all customer data is stored securely. Customer data should not be shared without consent."
Document 2: "All customer data must be encrypted. Consent is required before sharing."
Document 3: "Ensure customer data is securely stored. Do not share customer data without obtaining consent from the right stakeholders."

The deduplicated text would be:

Consolidated Text: "Customer data must be securely encrypted and stored, and sharing of customer data is prohibited without explicit consent."

Researchers have employed similar techniques to produce high-quality pre-training data for LLMs.
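A minimal sketch of this deduplication step follows. It assumes chunk embeddings are already available (random placeholder vectors stand in here, so the clustering is not actually semantic) and stubs out the LLM consolidation with a hypothetical merge_with_llm helper:

```python
import numpy as np
from sklearn.cluster import KMeans

# The three near-duplicate policy chunks from the example above.
chunks = [
    "Employees must ensure all customer data is stored securely. "
    "Customer data should not be shared without consent.",
    "All customer data must be encrypted. Consent is required before sharing.",
    "Ensure customer data is securely stored. Do not share customer data "
    "without obtaining consent from the right stakeholders.",
]

# Placeholder vectors; a real pipeline would call an embedding model here
# so that semantically similar chunks land close together.
embeddings = np.random.rand(len(chunks), 384)

# Group chunks with (approximately) the same semantic meaning.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)

clusters: dict[int, list[str]] = {}
for chunk, label in zip(chunks, labels):
    clusters.setdefault(int(label), []).append(chunk)

def merge_with_llm(texts: list[str]) -> str:
    # Hypothetical helper: prompt an LLM to consolidate near-duplicate
    # chunks into one dense chunk. Joining is a stand-in for that call.
    return " ".join(texts)

deduplicated = [merge_with_llm(texts) for texts in clusters.values()]
print(deduplicated)
```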
#3 – Improve Retrieval Symmetry With a Hypothetical Question Index

Hypothetical question indexing uses a language model to generate one or more questions for each data chunk stored in the database. These questions can later be used to inform the retrieval step. During retrieval, the user query is semantically matched against all the questions generated by the model. The questions most similar to the user query are retrieved, and the chunk pointing to the most similar question is then passed to the LLM to generate a response. The key to this method is letting the LLM pre-generate the questions and storing them along with the document chunks.

Retrieval Techniques

These techniques involve optimizing the process of retrieving relevant information from the underlying data store. This includes implementing indexing strategies to efficiently organize and store data, utilizing ranking algorithms to prioritize results based on relevance, and applying filtering mechanisms to refine search outputs.

#1 – Optimize Search Queries Using LLMs

This technique restructures the user's query into a format that is more understandable by the LLM and usable by retrievers. Here, you first process the user query through a fine-tuned language model to optimize and structure it. This process removes any irrelevant context and adds necessary metadata, ensuring the query is tailored to the underlying data store. GraphRAG applications already utilize this technique, which
Read More »

What is GraphRAG? Different Types, Limitations, and When to Use

Retrieval-augmented generation (RAG) has emerged as a powerful technique to address key limitations of large language models (LLMs). By augmenting LLM prompts with relevant data retrieved from various sources, RAG ensures that LLM responses are factual, accurate, and free from hallucinations. However, the accuracy of RAG systems relies heavily on their ability to fetch relevant, verifiable information. Naive RAG systems, built on vector store-powered semantic search, often fail to do so, especially with complex queries that require reasoning. Additionally, these systems are opaque and difficult to troubleshoot when errors occur.

In this article, we explore GraphRAG, a superior approach for building RAG systems. GraphRAG is explainable, leverages graph relationships to discover and verify information, and has emerged as a frontier technology in modern AI applications.

Explainability

Knowledge graphs offer a clear advantage in making AI decisions more understandable. By visualizing data as a graph, users can navigate and query information seamlessly. This clarity allows for tracing errors and understanding provenance and confidence levels, critical components in explaining AI decisions. Unlike traditional LLMs, which often provide inscrutable outputs, knowledge graphs illuminate the reasoning logic, ensuring that even complex decisions are comprehensible.

What is GraphRAG?

GraphRAG is a RAG system that combines the strengths of knowledge graphs and large language models (LLMs). In GraphRAG, the knowledge graph serves as a structured repository of factual information, while the LLM acts as the reasoning engine, interpreting user queries, retrieving relevant knowledge from the graph, and generating coherent responses.

Emerging research shows that GraphRAG significantly outperforms vector store-powered RAG systems. Research has also shown that GraphRAG systems not only provide better answers but are also cheaper and more scalable. To understand why, let's look at the underlying mechanics of how knowledge is represented in vector stores versus knowledge graphs.

Understanding RAG: The Foundation of GraphRAG

RAG, a term first coined in a 2020 paper, has become a common architectural pattern for building LLM-powered applications. RAG systems use a retriever module to find relevant information from a knowledge source, such as a database or a knowledge base, and then use a generator module (powered by LLMs) to produce a response based on the retrieved information.

How RAG Works: Retrieval and Generation

During the retrieval step, you find the most relevant information from a knowledge source based on the user's query, typically using techniques like keyword matching or semantic similarity. You then prompt the generator module with this information to generate a response using LLMs.

In semantic similarity, data is represented as numerical vectors generated by AI embedding models, which aim to capture its meaning. The premise is that similar vectors lie closer to each other in vector space. This allows you to use the vector representation of a user query to fetch similar information using an approximate nearest neighbor (ANN) search. Keyword matching is more straightforward: you use exact keyword matches to find information, typically with algorithms like BM25.

Limitations of RAG and How GraphRAG Addresses Them

Naive RAG systems built with keyword or similarity search-based retrieval fail on complex queries that require reasoning.
Here's why. Suppose the user asks: "Who directed the sci-fi movie where the lead actor was also in The Revenant?" A standard RAG system might:

Retrieve documents about The Revenant.
Find information about the cast and crew of The Revenant.
But fail to identify that the lead actor, Leonardo DiCaprio, starred in other movies, and subsequently fail to determine their directors.

Queries like this require the RAG system to reason over structured information instead of relying purely on keyword or semantic search. The process should ideally be:

1. Identify the lead actor.
2. Traverse the actor's movies.
3. Retrieve their directors.

To build systems that can answer such queries, you need a retriever that can reason over information. Enter GraphRAG.

GraphRAG Benefits: What Makes It Unique?

Knowledge graphs capture knowledge through interconnected nodes and edges, representing relationships and information in a structured form. Research has shown that this is similar to how the human brain structures information. Continuing the example above, a GraphRAG system would traverse a graph connecting actors, movies, and directors to arrive at the right answer. The GraphRAG response would then be: "Leonardo DiCaprio, the lead actor in 'The Revenant,' also starred in 'Inception,' directed by Christopher Nolan."
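Here is what that multi-hop traversal might look like in Cypher, sketched with the FalkorDB Python client over a toy graph; the schema (Actor, Movie, Director, ACTED_IN, DIRECTED) is an illustrative assumption:

```python
from falkordb import FalkorDB  # pip install falkordb

db = FalkorDB(host="localhost", port=6379)
g = db.select_graph("movies")

# Illustrative graph: actors, movies, and directors.
g.query("""
CREATE (leo:Actor {name: 'Leonardo DiCaprio'}),
       (rev:Movie {title: 'The Revenant'}),
       (inc:Movie {title: 'Inception', genre: 'sci-fi'}),
       (nolan:Director {name: 'Christopher Nolan'}),
       (leo)-[:ACTED_IN]->(rev),
       (leo)-[:ACTED_IN]->(inc),
       (nolan)-[:DIRECTED]->(inc)
""")

# Multi-hop traversal: from The Revenant to its actor, to the actor's
# other sci-fi movies, and finally to those movies' directors.
result = g.query("""
MATCH (:Movie {title: 'The Revenant'})<-[:ACTED_IN]-(a:Actor)
      -[:ACTED_IN]->(m:Movie {genre: 'sci-fi'})<-[:DIRECTED]-(d:Director)
RETURN d.name AS director, m.title AS movie
""")
print(result.result_set)  # [['Christopher Nolan', 'Inception']]
```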
Complex queries are natural to human interaction. They can arise in myriad domains, from customer chatbots to search engines, or when building AI agents. GraphRAG, therefore, has gained prominence as we build more user-facing AI systems. GraphRAG systems offer numerous benefits over traditional RAG:

Enhanced Knowledge Representation: GraphRAG can capture complex relationships between entities and concepts.
Explainable and Verifiable: GraphRAG allows you to visualize and understand how the system arrived at its response. This helps with debugging when you get incorrect results.
Complex Reasoning: The integration of LLMs enables GraphRAG to better understand the user's query and provide more relevant and coherent responses.
Flexibility in Knowledge Sources: GraphRAG can be adapted to work with various knowledge sources, including structured databases, semi-structured data, and unstructured text.
Scalability and Efficiency: GraphRAG systems, built with fast knowledge graph stores like FalkorDB, can handle large amounts of data and provide quick responses. Researchers found that GraphRAG-based systems required between 26% and 97% fewer tokens for LLM response generation by providing more relevant data.

Common RAG Use Cases and Challenges

Does GraphRAG solve the use cases that typical RAG systems have to handle? Traditional RAG systems have found applications across various domains, including:

Question Answering: Addressing user queries by retrieving relevant information and generating comprehensive answers.
Summarization: Condensing lengthy documents into concise summaries.
Text Generation: Creating different text formats (e.g., product descriptions, social media posts) based on given information.
Recommendation Systems: Providing personalized recommendations based on user preferences and item attributes.

However, these systems often encounter challenges such as:

Inaccurate Retrieval: Vector-based similarity search might retrieve irrelevant or partially relevant documents.
Limited Context Understanding: Difficulty in capturing the full context of a query or document.
Factuality and Hallucination: Potential generation of
Read More »

Ultra-fast, multi-tenant graph database using sparse matrix representations and linear algebra, ideal for highly technical teams that handle complex data in real time, resulting in fewer hallucinations and more accurate responses from LLMs.


Avi Tel-Or, CTO at Intel Ignite Tel-Aviv

I enjoy using FalkorDB in the GraphRAG solution I'm working on.

As a developer, using graphs also gives me better visibility into what the algorithm does, when it fails, and how it could be improved. Doing that with similarity scoring is much less intuitive.

Dec 2, 2024

