Vector Database vs Graph Database: Key Technical Differences


Unstructured data is all the data that isn't organized in a predefined format but is stored in its native form. Due to this lack of organization, it is more challenging to sort, extract, and analyze. More than 80% of all enterprise data is unstructured, and this share is growing. This type of data comes from sources such as emails, social media, customer reviews, support queries, and product descriptions, and businesses seek to extract meaningful insights from it.

The rapid growth of unstructured data presents both a challenge and an opportunity. To extract insights from it, the modern approach leverages large language models (LLMs) together with one of two powerful database systems for efficient data retrieval: vector databases or graph databases. Combined with LLMs, these systems enable organizations to structure, search, and analyze unstructured data. Understanding the difference between the two is crucial for developers looking to build modern AI applications or architectures like Retrieval-Augmented Generation (RAG).

In this article, we dive deep into the concepts of vector databases and graph databases and explore the key differences between them. We also examine their technical advantages, limitations, and use cases to help you make an informed decision when selecting your technology stack.

What is a Vector Database?

Unlike traditional databases that focus on structured data organized in rows and columns, vector databases excel at handling numerical representations of unstructured data, called embeddings, which are generated by machine learning models known as embedding models. These embeddings capture the semantic meaning (or features) of the underlying data. Vector databases store, index, and retrieve data that has been transformed into these high-dimensional vectors.

You can convert any type of unstructured or high-dimensional data into a vector embedding – text, images, audio, or even protein sequences – which makes vector databases extremely flexible. When data is converted into vector embeddings, data points that are similar to each other end up close together in the embedding space. This enables similarity (or dissimilarity) searches, where you find related data through its vector representation. In that sense, vector databases are search engines designed to efficiently search through a high-dimensional vector space.

For example, in a word embedding space, words with similar meanings, or words that are often used in similar contexts, sit close together. The words "cat" and "kitten" would likely be near each other, while "automobile" would be farther away. In contrast, "automobile" might be close to words like "car" and "vehicle". The vector representations of these words might look like this:

"cat": [0.43, -0.22, 0.75, 0.12, …]
"kitten": [0.41, -0.21, 0.76, 0.13, …]
"automobile": [0.01, 0.62, -0.33, 0.94, …]
"car": [0.02, 0.60, -0.30, 0.91, …]

Here, the vectors for "cat" and "kitten" are close to each other due to their semantic similarity, while "automobile" and "car" are farther from them but positioned close to each other.

How does this help build retrieval systems in LLM-powered applications? An example is a Vector RAG system, where a user's query is first converted into a vector and then compared against the vector embeddings of the existing data in the database. The vectors closest to the query vector are retrieved through a similarity search algorithm, along with the data they represent. This retrieved data is then passed to the LLM to generate a response for the user.
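To make the idea of similarity search concrete, here is a minimal sketch in Python using NumPy. It reuses the toy four-dimensional vectors from the example above (real embedding models produce hundreds or thousands of dimensions) and ranks them by cosine similarity against a query vector, which is essentially what a vector database does at scale, using specialized indexes instead of a brute-force loop. The query vector here is made up for illustration; in practice it would come from the same embedding model as the stored data.

```python
import numpy as np

# Toy 4-dimensional embeddings from the example above. Real embeddings
# typically have hundreds or thousands of dimensions.
embeddings = {
    "cat":        np.array([0.43, -0.22,  0.75, 0.12]),
    "kitten":     np.array([0.41, -0.21,  0.76, 0.13]),
    "automobile": np.array([0.01,  0.62, -0.33, 0.94]),
    "car":        np.array([0.02,  0.60, -0.30, 0.91]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 means same direction, -1.0 means opposite."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend this is the embedding of the user's query, e.g. "small cat".
query = np.array([0.40, -0.20, 0.74, 0.10])

# Rank the stored vectors by similarity to the query -- the core of vector search.
ranked = sorted(
    ((word, cosine_similarity(query, vec)) for word, vec in embeddings.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for word, score in ranked:
    print(f"{word}: {score:.3f}")
```

A production vector database replaces the brute-force comparison with approximate nearest neighbor (ANN) indexes such as HNSW so that queries stay fast over millions of vectors; in a Vector RAG pipeline, the top-ranked items are then handed to the LLM as context.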
Vector databases are valuable because they help uncover patterns and relationships between high-dimensional data points. However, they have a significant limitation: interpretability. The high-dimensional nature of vector spaces makes them difficult to visualize and understand. As a result, when a vector search yields incorrect or suboptimal results, it becomes challenging to diagnose and troubleshoot the underlying issues.

What is a Graph Database?

Graph databases work fundamentally differently from vector databases. Rather than using numerical embeddings to represent data, graph databases rely on knowledge graphs to capture the relationships between entities. In a knowledge graph, nodes represent entities and edges represent the relationships between them. This structure allows for complex queries about relationships and connections, which is invaluable when the links between entities are as important as the entities themselves.

In the context of our earlier example involving "cat", "kitten", "automobile", and "car", each of these concepts would be stored as a node in a knowledge graph. The relationship between "cat" and "kitten" (e.g., "is a type of") would be represented as an edge connecting those two nodes. Similarly, "automobile" and "car" might share an edge representing a "synonym" relationship. This captures the subject-predicate-object triples that form the backbone of knowledge graphs.

Nodes: "cat", "kitten", "automobile", "car"
Edges: (kitten)-[:IS_A]->(cat), (automobile)-[:SYNONYM]->(car)

Graph databases are ideal when your data contains a high degree of interconnectivity and when understanding these relationships is key to answering business questions. Also, unlike vector databases, knowledge graphs stored in a graph database can be easily visualized, which lets you explore intricate relationships within your data.

Modern graph databases support a query language known as Cypher, which allows you to query the knowledge graph and retrieve results. Let's look at how Cypher works using a slightly more complex knowledge graph. To create the graph shown in the image above, you need to construct the nodes and relationships that represent the different entities and their connections. You can use a graph database like FalkorDB to test the queries below.

Here's how we create the nodes:

```cypher
// Creating Player nodes
CREATE (:PLAYER {name: 'Pedri'}), (:PLAYER {name: 'Lamine Yamal'});

// Creating Manager node
CREATE (:MANAGER {name: 'Hansi Flick'});

// Creating Team node
CREATE (:TEAM {name: 'Barcelona'});

// Creating League node
CREATE (:LEAGUE {name: 'La Liga'});

// Creating Country node
CREATE (:COUNTRY {name: 'Spain'});

// Creating Stadium node
CREATE (:STADIUM {name: 'Camp Nou'});
```

You can now create the relationships using Cypher in the following way:

```cypher
// Players play for a team
MATCH (p:PLAYER {name: 'Lamine Yamal'}), (t:TEAM {name: 'Barcelona'})
CREATE (p)-[:PLAYS_FOR]->(t);

MATCH (p:PLAYER {name: 'Pedri'}), (t:TEAM {name: 'Barcelona'})
CREATE (p)-[:PLAYS_FOR]->(t);
```
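If you want to run these queries from application code rather than a console, here is a rough sketch in Python. It assumes the FalkorDB Python client (the `falkordb` package) and a FalkorDB instance on localhost:6379; the graph name "football" is arbitrary, and exact method names may differ between client versions.

```python
from falkordb import FalkorDB  # assumes the FalkorDB Python client is installed

# Connect to a local FalkorDB instance and pick a graph (the name is arbitrary).
db = FalkorDB(host="localhost", port=6379)
graph = db.select_graph("football")

# Create the nodes and relationships from the example above.
graph.query("CREATE (:PLAYER {name: 'Pedri'}), (:PLAYER {name: 'Lamine Yamal'})")
graph.query("CREATE (:TEAM {name: 'Barcelona'})")
graph.query(
    "MATCH (p:PLAYER), (t:TEAM {name: 'Barcelona'}) "
    "CREATE (p)-[:PLAYS_FOR]->(t)"
)

# Ask a relationship question: which players play for Barcelona?
result = graph.query(
    "MATCH (p:PLAYER)-[:PLAYS_FOR]->(t:TEAM {name: 'Barcelona'}) "
    "RETURN p.name, t.name"
)
for player, team in result.result_set:
    print(f"{player} plays for {team}")
```

Because the relationships are stored explicitly, a question like "who plays for Barcelona?" becomes a simple graph traversal rather than a similarity search, and multi-hop questions can be answered by chaining relationships in the same MATCH pattern.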

Efficient State Machine Modeling Using FalkorDB


The latest release of FalkorDB, v4.0.5, includes a new ability to easily clone graphs. In this blog post we'll develop a state machine framework in which a machine is represented by a graph. Whenever an FSM (finite state machine) is executed, a copy of the initial graph is created and the execution is bound to that dedicated clone. This approach is extremely flexible: you can easily adjust a machine and the changes will apply seamlessly to all future executions, and when a modification needs to be A/B tested, a clone of the graph is easily made and compared against a baseline.

Let's get started. We'll create a simple machine which:

1. Downloads a source file
2. Counts the number of lines in that file
3. Deletes the file

State Machine Representation

Our graph representation of a state machine is a simple DAG (directed acyclic graph). Nodes represent states; each machine has a START state and an END state in addition to intermediate states. Every state node contains the following attributes:

Cmd – the shell command to run
Description – a short description of the state
Output – the command output (available after execution)
ExitCode – the command exit code (available after execution)

States are connected to one another via a directed NEXT edge.

Running the State Machine

Once we're ready to execute our FSM, a copy of the DAG is created. This is done automatically via the handy new GRAPH.COPY command, as we don't want to taint our machine "template" with execution-specific information. Execution begins at the START state: the runner executes the state's command, and once the command completes, the runner updates the state's Output and ExitCode attributes and proceeds to the next state. This process repeats until the last state has been executed.

Conclusions

With very little effort we've been able to build a simple state machine system that takes advantage of a number of FalkorDB's unique features:

the ability to store, maintain, and switch effortlessly between thousands of different graphs
the new ability to quickly create copies of graphs

The source code for this demo is available on GitHub. Continuing with this demo, we would love to explore an integration with one of the established FSM frameworks; we believe FalkorDB can be an interesting backend for such systems.
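As a closing illustration, here is a rough Python sketch of what such a runner could look like; it is not the demo's actual source code. It assumes the FalkorDB Python client plus redis-py, a FalkorDB instance on localhost:6379, and a hypothetical schema in which each state is a :STATE node carrying a name property alongside the Cmd, Description, Output, and ExitCode attributes described above. The template graph name, the example shell commands, and the URL are placeholders, and client method names may vary between versions.

```python
import subprocess

import redis
from falkordb import FalkorDB

TEMPLATE = "fsm_template"  # hypothetical template graph name

db = FalkorDB(host="localhost", port=6379)
r = redis.Redis(host="localhost", port=6379)  # plain Redis connection for GRAPH.COPY

# Build the template DAG once: START -> download -> count -> delete -> END.
# Commands and URL are placeholders for illustration.
db.select_graph(TEMPLATE).query("""
CREATE (s:STATE {name: 'START', Cmd: ''}),
       (d:STATE {name: 'download', Description: 'Download a source file',
                 Cmd: 'curl -s -o data.txt https://example.com/data.txt'}),
       (c:STATE {name: 'count', Description: 'Count the number of lines in the file',
                 Cmd: 'wc -l data.txt'}),
       (x:STATE {name: 'delete', Description: 'Delete the file',
                 Cmd: 'rm data.txt'}),
       (e:STATE {name: 'END', Cmd: ''}),
       (s)-[:NEXT]->(d)-[:NEXT]->(c)-[:NEXT]->(x)-[:NEXT]->(e)
""")

def run_fsm(run_name: str) -> None:
    # Clone the template so the execution never taints it (GRAPH.COPY, new in v4.0.5).
    r.execute_command("GRAPH.COPY", TEMPLATE, run_name)
    run = db.select_graph(run_name)

    current = "START"
    while True:
        # Read the current state's shell command.
        res = run.query("MATCH (s:STATE {name: $n}) RETURN s.Cmd", {"n": current})
        cmd = res.result_set[0][0]

        if cmd:  # START and END carry no command in this sketch
            proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
            # Record the results on the cloned graph only, never on the template.
            run.query(
                "MATCH (s:STATE {name: $n}) SET s.Output = $out, s.ExitCode = $code",
                {"n": current, "out": proc.stdout, "code": proc.returncode},
            )

        # Follow the NEXT edge; stop once there is no outgoing edge (END reached).
        nxt = run.query(
            "MATCH (:STATE {name: $n})-[:NEXT]->(m:STATE) RETURN m.name",
            {"n": current},
        )
        if not nxt.result_set:
            break
        current = nxt.result_set[0][0]

run_fsm("fsm_run_1")
```

Because each execution owns its clone, you can run the same template many times in parallel, adjust the template without touching past runs, or copy a modified template and compare it side by side with the original for A/B testing.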