Frequently Asked Questions

Knowledge Graph Modeling & Implementation

What criteria should I use to decide whether information should be modeled as a node or an attribute in FalkorDB?

Deciding between modeling information as a node or an attribute in FalkorDB depends on three main factors: (1) Memory efficiency—attributes duplicate string values, but FalkorDB's string interning feature can mitigate this; (2) Traversal requirements—nodes enable outward traversal, while attributes require traversal through the parent node; (3) Query patterns—frequent filtering by a data point suggests node representation, while occasional display suggests attribute representation. Start with the most natural conceptual model and validate against your intended query patterns. [Source]
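As an illustrative sketch (plain Python data structures, not the FalkorDB API), the traversal trade-off can be seen by modeling the same country information both ways:

```python
# Illustrative sketch: the same data modeled two ways, showing why
# query patterns drive the node-vs-attribute decision.

# Attribute model: country stored on each person node.
people_attr = [
    {"name": "Ada", "country": "UK"},
    {"name": "Grace", "country": "USA"},
    {"name": "Alan", "country": "UK"},
]

# Node model: one country node with edges to its residents.
country_nodes = {"UK": ["Ada", "Alan"], "USA": ["Grace"]}

# Filtering by country under the attribute model scans every person...
uk_scan = [p["name"] for p in people_attr if p["country"] == "UK"]

# ...while the node model starts at the country and traverses outward.
uk_hop = country_nodes["UK"]

assert sorted(uk_scan) == sorted(uk_hop) == ["Ada", "Alan"]
```

If most queries start from a country and fan out, the node model avoids the full scan; if the country is only displayed alongside a person, the attribute model is simpler.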

What level of document granularity should I store in FalkorDB for Retrieval-Augmented Generation (RAG) systems?

FalkorDB supports storing sentences, paragraphs, summaries, and full documents as nodes with defined relationships. This enables hierarchical context retrieval—semantic search can identify relevant fragments, and graph traversal can access parent documents or related content. This approach extends LLM context beyond isolated chunks through relationship-based navigation. [Source]
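A minimal sketch of that hierarchy, in plain Python rather than graph queries: sentence, paragraph, and document nodes are linked by "part_of" edges, and a semantic-search hit on a sentence is expanded upward for extra context.

```python
# Illustrative sketch: hierarchical context retrieval over a tiny
# document graph. Node ids and the "part_of" edge name are assumptions.
nodes = {
    "s1": {"kind": "sentence", "text": "FalkorDB stores graphs.", "part_of": "p1"},
    "p1": {"kind": "paragraph", "text": "A paragraph of prose.", "part_of": "d1"},
    "d1": {"kind": "document", "text": "Full document text.", "part_of": None},
}

def expand_context(node_id):
    """Walk part_of edges from a matched fragment up to the root document."""
    chain = []
    while node_id is not None:
        chain.append(node_id)
        node_id = nodes[node_id]["part_of"]
    return chain

# A semantic-search hit on sentence s1 yields the whole hierarchy.
assert expand_context("s1") == ["s1", "p1", "d1"]
```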

What automated ontology enforcement capabilities does FalkorDB provide?

FalkorDB implements two constraint types: unique constraints (ensuring attribute uniqueness within entity types) and exists constraints (mandating required attributes). Schema enforcement for relationship types, edge directions, and new labels remains the user's responsibility through application logic or LLM guidance. Future releases will expand automated enforcement capabilities. [Source]
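Since enforcement beyond these two constraint types is the application's job, a validation pass like the following sketch (plain Python, with illustrative field names) can mirror both checks before data is written to the graph:

```python
# Application-side sketch of the two constraint types: unique (no
# duplicate attribute values within an entity type) and exists
# (required attributes). Attribute names are illustrative.

def check_constraints(nodes, unique=("ssn",), exists=("name",)):
    """Return a list of violation messages for a batch of Person nodes."""
    errors, seen = [], {attr: set() for attr in unique}
    for n in nodes:
        for attr in exists:
            if attr not in n:
                errors.append(f"missing required attribute: {attr}")
        for attr in unique:
            if n.get(attr) in seen[attr]:
                errors.append(f"duplicate {attr}: {n[attr]}")
            seen[attr].add(n.get(attr))
    return errors

people = [{"name": "Ada", "ssn": "111"}, {"ssn": "111"}]
violations = check_constraints(people)
assert "duplicate ssn: 111" in violations
assert "missing required attribute: name" in violations
```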

Should I consolidate multiple domains into a single knowledge graph or separate them in FalkorDB?

FalkorDB supports both approaches. A single-graph architecture enables cross-domain relationship discovery with multiple ontologies describing different graph portions. Alternatively, a multi-graph architecture maintains domain separation within a single database instance, similar to SQL table structures. Your domain interconnection requirements and query patterns should determine the optimal architecture. [Source]

How does ontology scaling work in FalkorDB as my data updates?

Instance updates (adding new entities of existing types) require no ontology changes. Schema updates (adding new entity types) require manual ontology extension. No automated alignment enforcement exists between graph structure and ontology definitions—maintenance remains the developer's responsibility. [Source]
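The two update categories can be sketched as follows (the ontology representation here is an assumption for illustration, not a FalkorDB format):

```python
# Sketch: instance updates leave the ontology untouched; schema updates
# require extending it by hand, since alignment is not enforced.
ontology = {"Car": {"attributes": ["plate"], "relations": ["DROVE_ON"]}}

# Instance update: adding another Car needs no ontology change.
graph_nodes = [{"label": "Car", "plate": "X1"}]
graph_nodes.append({"label": "Car", "plate": "X2"})  # ontology unchanged

# Schema update: introducing a Truck requires a manual extension
# describing its attributes and relationships.
ontology["Truck"] = {"attributes": ["plate", "max_load"],
                     "relations": ["DROVE_ON"]}

assert set(ontology) == {"Car", "Truck"}
```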

How can I improve attribute extraction accuracy for strict ontology schemas in FalkorDB?

To improve attribute extraction accuracy, use few-shot prompting with domain-specific examples, inject hierarchical context for document chunks, define structured output with parameter descriptions and validation rules, and apply JSON schema constraints for extraction consistency. Chunking strategies should preserve semantic context across document segments. [Source]
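One of those techniques, validating structured output against the ontology before writing to the graph, can be sketched in plain Python (the schema layout and field names are illustrative assumptions):

```python
# Sketch: rejecting LLM extraction output that misses required fields
# or introduces attributes and labels outside the ontology.
schema = {
    "Person": {"required": ["name"], "allowed": ["name", "country", "role"]},
}

def validate_extraction(label, attrs):
    """Accept an extraction only if it conforms to the ontology schema."""
    spec = schema.get(label)
    if spec is None:
        return False  # label not defined in the ontology
    if any(f not in attrs for f in spec["required"]):
        return False  # a required attribute is missing
    return all(f in spec["allowed"] for f in attrs)

assert validate_extraction("Person", {"name": "Ada", "country": "UK"})
assert not validate_extraction("Person", {"country": "UK"})   # missing name
assert not validate_extraction("Robot", {"name": "R2"})       # unknown label
```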

Does FalkorDB support graph embedding models or GNNs?

Currently, FalkorDB does not support graph neural networks (GNNs) or graph embedding models. The only supported embeddings are vector embeddings, which can be generated with LLMs or other AI vendors' models and used for semantic search via FalkorDB's semantic index. [Source]


How does string interning in FalkorDB help with memory optimization?

String interning in FalkorDB allows frequently repeated values (such as country names) to be stored only once, preventing redundant storage across millions of nodes. This significantly improves memory efficiency in large-scale knowledge graphs. [Source]
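Python's built-in `sys.intern` illustrates the same idea on a small scale (this is an analogy, not FalkorDB's implementation):

```python
import sys

# Analogy for FalkorDB's string interning: repeated values share one
# canonical stored copy instead of many equal duplicates.
countries = [sys.intern("United Kingdom") for _ in range(1000)]

# Every element is the *same* object, not 1000 separate copies.
assert all(c is countries[0] for c in countries)
```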

What strategies are recommended for entity resolution and deduplication in large-scale knowledge graphs?

Deduplication is a primary challenge in large-scale knowledge graphs. Approaches include deterministic matching (using unique identifiers) and LLM-based similarity assessment to resolve entities that may be represented differently across data sources. [Source]
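The deterministic half of that strategy can be sketched as a merge on a shared unique identifier (source names and fields below are illustrative; LLM-based similarity would handle records lacking such identifiers):

```python
# Sketch of deterministic entity resolution: records from two sources
# merge on a shared unique id; later sources overwrite or fill fields.
source_a = [{"id": "crunchbase:42", "name": "Acme Corp"}]
source_b = [{"id": "crunchbase:42", "name": "Acme Corporation", "hq": "NYC"}]

def resolve(*sources):
    """Collapse records sharing an id into a single merged entity."""
    merged = {}
    for source in sources:
        for rec in source:
            merged.setdefault(rec["id"], {}).update(rec)
    return merged

entities = resolve(source_a, source_b)
assert len(entities) == 1
assert entities["crunchbase:42"]["hq"] == "NYC"
```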

How does the property graph model in FalkorDB support schema flexibility?

The property graph model in FalkorDB allows schema evolution while maintaining existing data, supporting iterative development approaches. This flexibility enables you to adapt your knowledge graph as requirements change without losing historical data. [Source]

What are the benefits of graph-based data retrieval in FalkorDB compared to relational databases?

Graph-based data retrieval in FalkorDB, using edge traversal, provides superior context gathering compared to multi-table relational database joins. This enables more efficient and flexible querying of complex relationships. [Source]

How should I handle multilingual data in knowledge graphs built with FalkorDB?

One approach is to store the original language data (e.g., tweets) and translate the output as needed. This preserves the original context and allows for flexible localization strategies. [Source]

What is the recommended approach for updating ontologies when adding new entity types in FalkorDB?

When adding new entity types, you should manually extend your ontology to capture the new entities, their attributes, and relationships. FalkorDB does not enforce alignment between the underlying graph and the ontology, so this maintenance is the user's responsibility. [Source]

How can I use hierarchical context to improve entity extraction in FalkorDB?

Injecting higher-level context (such as document-level information) into chunked data can help resolve references (like pronouns) and improve the accuracy of entity extraction. This is especially useful when processing large documents for knowledge graph construction. [Source]
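A minimal sketch of that injection step, assuming a document summary is already available (the prompt format is an illustrative assumption):

```python
# Sketch: prepending document-level context to each chunk so the
# extractor can resolve pronouns like "she" back to a named entity.
doc_summary = "Biography of Ada Lovelace, a 19th-century mathematician."
chunks = ["She wrote the first published algorithm."]

def contextualize(summary, chunk):
    """Attach higher-level document context to a chunk before extraction."""
    return f"Document context: {summary}\n\nPassage: {chunk}"

prompt = contextualize(doc_summary, chunks[0])
assert "Ada Lovelace" in prompt and "She wrote" in prompt
```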

What is the role of explicit ontology definition in improving LLM query accuracy in FalkorDB?

Explicit ontology definition constrains LLM interactions with graphs, improving query accuracy by providing clear entity and relationship type boundaries. This helps maintain schema integrity and consistent data extraction. [Source]

How does FalkorDB support both single-graph and multi-graph architectures?

FalkorDB allows you to maintain multiple disjoint graphs within a single database instance, similar to having multiple tables in SQL. This enables you to keep domains separated or join them as needed, with one or multiple ontologies per graph. [Source]

What is the impact of chunking strategies on attribute extraction in FalkorDB?

Chunking strategies must preserve semantic context across document segments to ensure accurate attribute extraction. Providing higher-level context and clear parameter descriptions in structured outputs can help mitigate issues with incomplete or incorrect extraction. [Source]

Features & Capabilities

What are the key features of FalkorDB?

FalkorDB offers ultra-low latency (up to 496x faster than Neo4j), 6x better memory efficiency, support for over 10,000 multi-graphs (tenants), open-source licensing, linear scalability, advanced AI optimization (GraphRAG & agent memory), and flexible cloud/on-prem deployment. It also provides built-in multi-tenancy, regulatory compliance tools, and enhanced user experience through dashboards and custom views. [Source]

Does FalkorDB support multi-tenancy?

Yes, FalkorDB supports robust multi-tenancy in all plans, enabling management of over 10,000 multi-graphs. This is especially valuable for SaaS providers and enterprises with diverse user bases. [Source]

What integrations does FalkorDB offer?

FalkorDB integrates with frameworks such as Graphiti (by ZEP) for AI agent memory, g.v() for knowledge graph visualization, Cognee for mapping knowledge graphs, LangChain and LlamaIndex for LLM integration, and more. FalkorDB is open to new integrations—contact the team to discuss your needs. [Source]

Does FalkorDB provide an API and technical documentation?

Yes, FalkorDB provides a comprehensive API and technical documentation, including setup guides and advanced configuration references. Access the documentation at docs.falkordb.com and the latest release notes on the GitHub Releases Page. [Source]

Is FalkorDB open source?

Yes, FalkorDB is open source, encouraging community collaboration and transparency. This differentiates it from proprietary solutions like AWS Neptune. [Source]

What security and compliance certifications does FalkorDB have?

FalkorDB is SOC 2 Type II compliant, meeting rigorous standards for security, availability, processing integrity, confidentiality, and privacy. This demonstrates FalkorDB's commitment to maintaining the highest standards of security and compliance. [Source]

How does FalkorDB optimize for AI applications?

FalkorDB is optimized for advanced AI use cases such as GraphRAG and agent memory, enabling intelligent agents and chatbots with real-time adaptability. It combines graph traversal with vector search for personalized user experiences and supports agentic AI applications. [Source]

What is the GraphRAG-SDK and how does it help with regulatory compliance?

The GraphRAG-SDK helps organizations stay ahead of financial regulations by mapping regulations to workflows, identifying compliance gaps, and providing actionable recommendations. This is particularly valuable for enterprises in regulated industries. [Source]

How does FalkorDB enhance user experience for data analysis?

FalkorDB enables fast, interactive analysis of complex data through dashboards and custom views, providing a frictionless and differentiated user experience for security, financial, and AI-driven applications. [Source]

Use Cases & Benefits

What are the primary use cases for FalkorDB?

FalkorDB is used for Text2SQL (natural language to SQL queries), security graphs (CNAPP, CSPM & CIEM), GraphRAG (advanced graph-based retrieval), agentic AI & chatbots, fraud detection (real-time pattern detection), and as a high-performance graph database for complex relationships. [Source]

Who can benefit from using FalkorDB?

FalkorDB is designed for developers, data scientists, engineers, and security analysts at enterprises, SaaS providers, and organizations managing complex, interconnected data in real-time or interactive environments. [Source]

What business impact can customers expect from using FalkorDB?

Customers can expect improved scalability and data management, enhanced trust and reliability, reduced alert fatigue in cybersecurity, faster time-to-market, enhanced user experience, regulatory compliance, and support for advanced AI applications. [Source]

What pain points does FalkorDB address for its customers?

FalkorDB addresses pain points such as trust and reliability in LLM-based applications, scalability and data management, alert fatigue in cybersecurity, performance limitations of competitors, interactive data analysis, regulatory compliance, and the need for real-time adaptability in agentic AI and chatbots. [Source]

What industries are represented in FalkorDB's case studies?

FalkorDB's case studies represent industries such as healthcare (AdaptX), media and entertainment (XR.Voyage), and artificial intelligence/ethical AI development (Virtuous AI). [Source]

Can you share specific customer success stories using FalkorDB?

Yes. AdaptX uses FalkorDB to analyze high-dimensional medical data, XR.Voyage overcame scalability challenges in immersive media, and Virtuous AI created a high-performance, multi-modal data store for ethical AI development. Read their stories in the FalkorDB case studies. [Source]

What feedback have customers given about FalkorDB's ease of use?

Customers like AdaptX and 2Arrows have praised FalkorDB for its rapid access to insights, ease of running non-traversal queries, and intuitive design. Anthony Ray, CTO of 2Arrows, called FalkorDB a 'game-changer' for their startup. [Source]

How quickly can I implement FalkorDB and get started?

FalkorDB is built for rapid deployment, enabling teams to go from concept to enterprise-grade solutions in weeks, not months. You can sign up for FalkorDB Cloud, try it for free, run it locally with Docker, or schedule a demo. Comprehensive documentation and community support are available. [Source]

Pricing & Plans

What pricing plans does FalkorDB offer?

FalkorDB offers four pricing plans: FREE (for MVPs with community support), STARTUP (from /1GB/month, includes TLS and automated backups), PRO (from 0/8GB/month, includes cluster deployment and high availability), and ENTERPRISE (custom pricing with enterprise-grade features like VPC, custom backups, and 24/7 support). [Source]

What features are included in the FREE plan?

The FREE plan is designed for building a powerful MVP with community support. It provides access to core FalkorDB features for early-stage projects. [Source]

What features are included in the STARTUP plan?

The STARTUP plan starts from /1GB/month and includes TLS encryption and automated backups, making it suitable for growing teams and production workloads. [Source]

What features are included in the PRO plan?

The PRO plan starts from 0/8GB/month and includes advanced features such as cluster deployment and high availability, making it ideal for mission-critical applications. [Source]

What features are included in the ENTERPRISE plan?

The ENTERPRISE plan offers tailored pricing and includes enterprise-grade features such as VPC, custom backups, and 24/7 support, suitable for large organizations with advanced requirements. [Source]

Competition & Comparison

How does FalkorDB compare to Neo4j?

FalkorDB offers up to 496x faster latency, 6x better memory efficiency, flexible horizontal scaling, and includes multi-tenancy in all plans. Neo4j uses an on-disk storage model, is written in Java, and offers multi-tenancy only in premium plans. [Source]

How does FalkorDB compare to AWS Neptune?

FalkorDB is open source, supports multi-tenancy, provides highly efficient vector search, and offers better latency performance compared to AWS Neptune, which is proprietary, closed-source, and lacks multi-tenancy support. [Source]

How does FalkorDB compare to TigerGraph?

FalkorDB delivers faster latency, more efficient memory usage, and flexible horizontal scaling compared to TigerGraph, which has limited horizontal scaling and moderate memory efficiency. [Source]

How does FalkorDB compare to ArangoDB?

FalkorDB demonstrates superior latency and memory efficiency, with flexible horizontal scaling, compared to ArangoDB, which has limited horizontal scaling and moderate memory efficiency. [Source]

Why should I choose FalkorDB over other graph databases?

FalkorDB stands out for its exceptional performance (up to 496x faster latency), memory efficiency, open-source licensing, multi-tenancy in all plans, advanced AI integration, regulatory compliance tools, and proven customer success in demanding industries. [Source]


Building Knowledge Graphs: Lessons from VCPedia and Fractal KG


Knowledge graphs represent complex relationships as nodes and edges. This workshop demonstrated practical implementations through two systems: VCPedia for venture capital data extraction and Fractal KG for self-organizing graph structures. The following analysis examines technical decisions, implementation patterns, and production considerations based on firsthand development experience.

Insights

Graph Construction Automation: LLMs enable automated entity extraction and relationship mapping from unstructured data, reducing manual effort in knowledge graph construction.
Structured Output Methodology: Converting ontology definitions into structured output formats for LLMs ensures consistent data extraction and maintains schema integrity.
Entity Resolution Strategy: Deduplication represents the primary challenge in large-scale knowledge graphs; approaches include deterministic matching and LLM-based similarity assessment.
Traversal Efficiency: Graph-based data retrieval through edge traversal provides superior context gathering compared to multi-table relational database joins.
Ontology-Driven Accuracy: Explicit ontology definition constrains LLM interactions with graphs, improving query accuracy by providing clear entity and relationship type boundaries.
Memory Optimization: String interning for frequently repeated values (e.g., country names) prevents redundant storage across millions of nodes.
Schema Flexibility: The property graph model allows schema evolution while maintaining existing data, supporting iterative development approaches.

System Architecture: VCPedia Case Study

VCPedia by Yohei Nakajima

Q&A

What criteria determine whether information should be modeled as a node versus an attribute?

Apply three decision criteria: (1) Memory efficiency—attributes duplicate string values across entities, though string interning mitigates this; (2) Traversal requirements—nodes enable outward traversal from the entity, attributes require traversal through the parent node; (3) Query patterns—frequent filtering by the data point suggests node representation, occasional display suggests attribute representation. Start with the natural conceptual model, then validate against intended query patterns.

What granularity of document storage should be implemented in FalkorDB for RAG systems?

All granularities are viable: sentences, paragraphs, summaries, and full documents can coexist as nodes with defined relationships. Graph structure enables hierarchical context retrieval—semantic search identifies relevant fragments, then graph traversal accesses parent documents or related content. This approach extends LLM context beyond isolated chunks through relationship-based navigation.

What automated ontology enforcement capabilities exist in schemaless knowledge graph systems?

FalkorDB implements two constraint types: unique constraints (ensuring attribute uniqueness within entity types) and exists constraints (mandating required attributes). Schema enforcement for relationship types, edge directions, and new labels remains user responsibility through application logic or LLM guidance. Future releases will expand automated enforcement capabilities.

Should multiple domains be consolidated into a single knowledge graph or separated into domain-specific graphs?

Both approaches are supported. Single-graph architecture enables cross-domain relationship discovery with multiple ontologies describing different graph portions. Multi-graph architecture maintains domain separation within a single database instance, similar to SQL table structures. Domain interconnection requirements and query patterns determine optimal architecture.

How should ontologies scale with data updates?

Two update categories require different approaches: (1) Instance updates (new entities of existing types) require no ontology changes; (2) Schema updates (new entity types) require manual ontology extension. No automated alignment enforcement exists between graph structure and ontology definitions—maintenance remains developer responsibility.

How can attribute extraction accuracy be improved for strict ontology schemas?

Implement four optimization techniques: (1) Few-shot prompting with domain-specific examples; (2) Hierarchical context injection for document chunks to resolve pronoun references; (3) Structured output definitions with parameter descriptions and validation rules; (4) JSON schema constraints for extraction consistency. Chunking strategies must preserve semantic context across document segments.

Discussion Transcript

Question 1: Node vs Attribute Design

Question: “What rule or heuristic should one use to determine whether some information should be a node versus an attribute of a node? An individual might be a node in a graph and their name would likely be an attribute, but what about their country? I can see a case for that being either a node or an attribute. Perhaps this is where upfront definition comes into play.”

Roy’s Answer: “It’s a good question, and I think it’s a question that many have yet to ask themselves whenever they’re trying to model their data. There isn’t a clear and correct answer—for some, treating a piece of information as an attribute might be the right way to go; for others, it might make sense to treat this piece of information as a node. My general way of thinking about this is: if you’re trying to visualize data as a graph, whatever comes to mind naturally is the data model that you probably want to start experimenting with. Also, if the data point we’re thinking of seems trivial, like at the beginning of the question, then there’s not much to think about.

Let’s try and think about the country piece of information. A person might be a resident or have a certain nationality, and so you might represent the country as an attribute for every person in your graph. For example, if you’re just maintaining the string of that person, then one thing that comes to mind is memory consumption. If you have millions of different persons, then this small set of strings representing countries would duplicate itself over and over again, so memory might come to mind. Actually, we’ve addressed this in our latest release—we have this feature called string interning where you can specify, ‘Hey, this is a string which comes pretty often; don’t duplicate it, use a unique value of it,’ and everything is shared. So this is one thing to take into consideration.

Another thing is the way in which you’re going about traversing or searching for knowledge. You need to think about the different types of questions that your system is intended to answer. For example, if you want to do traversal from a country outward, it would make sense to represent the country as a node. But if you are more focused on a person—you’re trying to locate a person via their unique identifier or social security number and you just want to see, for example, in which state that person lives—then it might make sense to have this as an attribute. You can still go with a node, but then you would have to traverse from that person to a country node and retrieve the name of the country.

To summarize: go with the most natural way of thinking about the data as a graph and see if the data model that you have chosen fits the different types of queries and questions that you’re intending to answer.”

Yohei’s Answer: “I would say the same thing, especially the last thing you said. For me, I’m not as deep on understanding how to optimize the queries piece, but I think starting from the queries is important. If you’re going to constantly filter your data by country, I would guess that it would be intuitive to turn that into a node. But if it’s something you just want to see when you visit their page as a side note, then you might not need to add that as a node. That’s kind of how I would think about it.”

Question 2: Document Storage Detail

Question: “When working with documents, what level of detail are we saving in FalkorDB? Would you store sentences, paragraphs, or summaries? If only storing summaries, are you also storing entire documents in a separate database?”

Roy’s Answer: “This is a classic RAG question, I would say, and once again there’s no right answer. I think all options are viable, and it really depends on the type of questions and the level of accuracy that you are aiming for. We’ve seen all different options—some people are actually analyzing the documents, for example with LLMs, creating embeddings, extracting sentences, creating summaries, and all of the relationships between the different components: paragraphs, sentences, summaries, entire documents. These can be represented as nodes in a graph.

For example, whenever you’re trying to answer a question via the classical RAG approach where you are creating embeddings for the question and then you’re doing semantic search to get to maybe the summary or maybe to get to the relevant paragraphs, then you can utilize the graph in order to get a hold of the original document. Maybe there’s a hierarchy of documents saying, ‘Oh, this document is a page from a broader corpus,’ or ‘Oh, this is a Wikipedia article that is linked to other Wikipedia pages. Maybe I want to extend my context and get some extra data by traversing the graph.’

So there’s no clear answer. I think you should experiment with it, but the benefits of storing this information in a graph allow you to retrieve additional context, which usually is relevant to extending the context with information that would help your LLM generate the correct answer.”

Question 3: Ontology Enforcement

Question: “If I understand correctly from the earlier slides, despite the knowledge graph system being schemaless, developing an ontology is a useful exercise to help guide the development and maintenance of the model that the system develops. Are there any useful mechanisms for enforcing that ontology within the knowledge graph system automatically?”

Roy’s Answer: “There are some mechanisms, yes, but the overall model that we’re following is known as the property graph model, which in its core is schemaless. The way in which you can start enforcing the schema—the feature that FalkorDB offers—is with constraints. Currently, we have two different constraints: one is unique constraints, and the other one is exists constraints.

The unique constraint makes sure that an attribute is unique throughout an entire entity type. For example, a social security number should be unique for every person. In addition, the exists constraint makes sure that every entity of a certain type—let’s say a country—must have a population attribute. It doesn’t have to be unique, but it must exist. That’s what we currently offer.

It would not enforce, for example, introducing new labels or new relationship types that are not part of the ontology. It would not enforce the type of edges coming in or going out from a certain node type. So for example, if your ontology says that there should not be an edge of type ‘driven to,’ that would not be enforced.

I think that over time, the plan is to introduce schema enforcement mechanisms and features into FalkorDB, but at the moment, the responsibility is for the user to enforce the ontology to make sure that the underlying graph follows the ontology. That could be done either via code that has been written by the user or by guiding an LLM to strictly follow the ontology whenever doing updates, deletions, or ingesting.”

Question 4: Graph Embedding Models

Question: “What graph embedding models are supported, and do they specify node, edge, or graph embeddings?”

Roy’s Answer: “That’s easy—none. We do not have GNN capabilities or graph embeddings. The only type of embeddings that we’re supporting at the moment are vector embeddings, the ones which you can generate with different LLMs or AI vendors. Once you have those vector embeddings, then you can utilize our semantic index to do vector searches. But I know that there’s interest in GNNs for node label prediction, edge prediction, and subgraph classification. Unfortunately, that’s not supported at the moment.”

Question 5: Domain Separation

Question: “Should I use a knowledge graph per domain or a single knowledge graph containing all the domain data?”

Roy’s Answer: “I think that it really depends on the use case, but let me say this: if you do see value in storing all of your information, which might span across multiple domains, in a single graph, you can do that. The nice thing about this is that you can have multiple ontologies all describing different portions of your graph. So you would have one ontology, for example, defining one domain which might be around driving sessions, but your graph expands further—it doesn’t contain only driving sessions with roads and intersections. It might contain other information that is still related to that, let’s say cities and population and universities, whatever have you. But then you can still maintain two different ontologies: one which is focusing on the driving sessions and the other one which focuses on, let’s say, population and employment. But those two all coexist within a single graph.

It is the LLM and the ontology that is represented to it that would know how to filter or focus on the relevant pieces of information within this larger graph. The other option is—if you don’t see value in connecting everything together—another very cool feature of FalkorDB that allows you to maintain multiple graphs in a single database. So similar to SQL where you have multiple tables within a single database, here you can have multiple disjoint graphs all within a single database. So you can keep them separated, you can join them together, you can have one ontology per graph, or you can have multiple ontologies capturing data from subgraphs within a single one.”

Question 6: Multiple Languages

Question: “If VCPedia supported multiple languages, would you add a translation step or store the tweets in their original language?”

Yohei’s Answer: “I would probably store the tweets in the original language and simply translate the output of the site, but that’s just my gut reaction. I don’t really have a strong reason for it.”

Question 7: Ontology Scaling

Question: “How does an ontology scale when my data updates?”

Roy’s Answer: “It really depends on the update. I mean, you can think—for this particular question, you can think about two different types of updates. One: updates that don’t change the ontology, meaning that you’re just adding a new city or a new university or a new person into your graph. That doesn’t change the ontology; it remains the same. The other type of update is if you are adding a new type of entity. So for example, maybe you are adding a truck to your driving session which only had cars in it, but now there is a truck.

We do not enforce alignment between the underlying graph and the ontology. This is something that the user needs to maintain, just like the way in which we’re not enforcing schema. Whenever you’re introducing a new type of entity, you should extend your ontology to capture that. So with this truck example, you should add a new entity to your ontology describing what the truck looks like, meaning the attributes associated with it and the different types of relationships that are interacting with this new truck entity.”

Question 8: Attribute Extraction Improvement

Question: “How can I improve the attribute extraction for each entity from my strict ontology schema for my knowledge graph? For example, I would get non-relevant context assigned to an entity or incomplete extraction. I hypothesize my chunking is wrong or I need to employ more natural language processing steps.”

Yohei’s Answer: “It’s hard to say. I feel like with graphs, it’s hard to solve a problem without knowing the specifics because it is really case-by-case in terms of what data you’re working with. But a few things that popped to mind that have worked for me in different cases:

Few-shot is always helpful—giving a couple of examples. If you’re converting large documents and you need to chunk them, again this is a simple example, but sometimes you need higher-level context of the document for each chunk. A simple example might be: the book starts by using people’s names, but later on it uses ‘he’ or ‘she’ instead. If you grab that paragraph and try to convert it, the ‘he’—the name might not be matching. So you might need to have higher-level context that you feed into the chunk if you’re chunking and then extracting.

And then I found that adding clear descriptions when you use structured output helps a lot—for each parameter you can provide descriptions, and you can provide hard-coded rules on the options. Using structured output, defining the JSON, and improving the prompt there, I found, is also helpful. So those are just some techniques that popped into my mind.”