Building & Querying a Knowledge Graph from Unstructured Data

Together, the Diffbot API, FalkorDB, and LangChain let you build applications that understand unstructured data and answer questions about it.

Diffbot provides an API that extracts structured data from unstructured documents such as web pages, PDFs, and emails. With it, you can build a knowledge graph of the entities and relationships in your documents and store that graph in FalkorDB. You can then use LangChain to query the knowledge graph in natural language: LangChain turns each question into a Cypher query, runs it against FalkorDB, and uses the results to answer.
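To make "entities and relationships" concrete, here is a minimal in-memory sketch of a knowledge graph stored as (subject, relationship, object) triples, indexed so neighbors can be looked up in either direction. The triples are illustrative only; they reuse entity names and relationship types that appear in the query results later in this post. A graph database such as FalkorDB stores and indexes this kind of data for you.

```python
from collections import defaultdict

# Illustrative triples: (subject, relationship type, object).
TRIPLES = [
    ("Warren Buffett", "EDUCATED_AT", "Columbia Business School"),
    ("Warren Buffett", "EMPLOYEE_OR_MEMBER_OF", "Berkshire Hathaway"),
    ("Charlie Munger", "EMPLOYEE_OR_MEMBER_OF", "Berkshire Hathaway"),
]


def build_index(triples):
    """Index edges in both directions, keyed by (entity, relationship type)."""
    outgoing = defaultdict(list)  # (subject, rel) -> [objects]
    incoming = defaultdict(list)  # (object, rel) -> [subjects]
    for subject, rel, obj in triples:
        outgoing[(subject, rel)].append(obj)
        incoming[(obj, rel)].append(subject)
    return outgoing, incoming


outgoing, incoming = build_index(TRIPLES)
# "Which university did Warren Buffett attend?" follows an outgoing edge.
print(outgoing[("Warren Buffett", "EDUCATED_AT")])
# "Who works at Berkshire Hathaway?" follows incoming edges.
print(incoming[("Berkshire Hathaway", "EMPLOYEE_OR_MEMBER_OF")])
```

The two lookups mirror the questions asked of the real graph at the end of this post; in FalkorDB the same traversals are written as Cypher MATCH patterns.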

1. Installing LangChain

First, install LangChain and the other dependencies used in this walkthrough from the command line:

pip install langchain langchain-experimental openai redis wikipedia
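Before moving on, you can confirm that the packages installed correctly. This small sketch uses the standard library's importlib.metadata to report the installed version of each distribution named in the pip command, or None if one is missing.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names from the pip command above.
REQUIRED = ("langchain", "langchain-experimental", "openai", "redis", "wikipedia")


def installed_versions(packages=REQUIRED):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:<25} {ver or 'NOT INSTALLED'}")
```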

2. Starting a FalkorDB server locally

Starting FalkorDB locally is as simple as running its Docker image; see the documentation for other ways to run it:

> docker run -p 6379:6379 -it --rm falkordb/falkordb:latest

6:C 26 Aug 2023 08:36:26.297 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
6:C 26 Aug 2023 08:36:26.297 # Redis version=7.2.1, bits=64, commit=00000000, modified=0, pid=6, just started
...
6:M 26 Aug 2023 08:36:26.322 * <graph> Starting up FalkorDB version 99.99.99.
6:M 26 Aug 2023 08:36:26.324 * <graph> Thread pool created, using 8 threads.
6:M 26 Aug 2023 08:36:26.324 * <graph> Maximum number of OpenMP threads set to 8
6:M 26 Aug 2023 08:36:26.324 * <graph> Query backlog size: 1000
6:M 26 Aug 2023 08:36:26.324 * Module 'graph' loaded from /FalkorDB/bin/linux-x64-release/src/falkordb.so
6:M 26 Aug 2023 08:36:26.324 * Ready to accept connections
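Once the log shows "Ready to accept connections", you can check that the server answers from Python. As the log shows, FalkorDB runs as a Redis module, so it speaks the Redis wire protocol (RESP). This standard-library sketch sends a raw PING and expects +PONG; it assumes the default localhost:6379 mapping from the docker command above and no password. In practice a client library such as redis-py or falkordb handles this for you.

```python
import socket


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode("utf-8")
        parts.append(f"${len(data)}\r\n".encode() + data + b"\r\n")
    return b"".join(parts)


def falkordb_is_up(host: str = "localhost", port: int = 6379, timeout: float = 2.0) -> bool:
    """Return True if the server at host:port replies to PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(encode_command("PING"))
            return sock.recv(64).startswith(b"+PONG")
    except OSError:
        # Connection refused, timeout, unreachable host, and so on.
        return False


if __name__ == "__main__":
    print("FalkorDB reachable:", falkordb_is_up())
```

The same encoder would also frame graph commands such as GRAPH.QUERY, but parsing FalkorDB's replies is best left to a client library.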

Running the demo

The rest of this post walks through the steps to get started. You can also try them in the accompanying Google Colab notebook.

3. Creating a Knowledge Graph

Now, let’s create a demo knowledge graph about Warren Buffett from Wikipedia:

from langchain_experimental.graph_transformers.diffbot import DiffbotGraphTransformer
from langchain.document_loaders import WikipediaLoader

# Replace the placeholder with your own Diffbot API key
diffbot_api_key = "DIFFBOT_API_KEY"
diffbot_nlp = DiffbotGraphTransformer(diffbot_api_key=diffbot_api_key)

# Load the Wikipedia articles that match the query
query = "Warren Buffett"
raw_documents = WikipediaLoader(query=query).load()

# Extract entities and relationships from the articles with Diffbot
graph_documents = diffbot_nlp.convert_to_graph_documents(raw_documents)
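Before writing anything to the database, it can help to see what Diffbot extracted. This sketch counts node labels and relationship types. It assumes, as with LangChain's GraphDocument, that each document exposes nodes (each with a type) and relationships (each with a type). The example_documents function is hypothetical: it builds stand-ins so the sketch runs without an API key; in the notebook you would pass graph_documents instead.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_graph_documents(graph_documents):
    """Count node labels and relationship types across graph documents.

    Relies only on `.nodes`, `.relationships`, and each item's `.type`,
    so it accepts LangChain GraphDocument objects or simple stand-ins.
    """
    node_types = Counter()
    rel_types = Counter()
    for doc in graph_documents:
        node_types.update(node.type for node in doc.nodes)
        rel_types.update(rel.type for rel in doc.relationships)
    return node_types, rel_types


def example_documents():
    """Hypothetical stand-ins shaped like GraphDocument, for trying the helper."""
    buffett = SimpleNamespace(id="Warren Buffett", type="Person")
    berkshire = SimpleNamespace(id="Berkshire Hathaway", type="Organization")
    columbia = SimpleNamespace(id="Columbia Business School", type="Organization")
    return [
        SimpleNamespace(
            nodes=[buffett, berkshire, columbia],
            relationships=[
                SimpleNamespace(source=buffett, target=berkshire, type="EMPLOYEE_OR_MEMBER_OF"),
                SimpleNamespace(source=buffett, target=columbia, type="EDUCATED_AT"),
            ],
        )
    ]


if __name__ == "__main__":
    node_types, rel_types = summarize_graph_documents(example_documents())
    print(node_types.most_common())
    print(rel_types.most_common())
```

With real data, calling summarize_graph_documents(graph_documents) shows at a glance which labels and relationship types the later Cypher queries can rely on.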

4. Storing the Knowledge Graph in FalkorDB

Next, store the knowledge graph in FalkorDB:

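from langchain.graphs import FalkorDBGraph

# Connect to the local FalkorDB server (localhost:6379 by default);
# "falkordb" is the name of the graph to write to
graph = FalkorDBGraph("falkordb")

# Write the extracted nodes and relationships, then reload the graph schema
graph.add_graph_documents(graph_documents)
graph.refresh_schema()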

5. Querying the Graph

You’re all set to start querying the knowledge graph. Let’s try a couple of questions:

# Jupyter/Colab magic: replace OPENAI_API_KEY with your OpenAI key
%env OPENAI_API_KEY=OPENAI_API_KEY

from langchain.chains import GraphCypherQAChain
from langchain.chat_models import ChatOpenAI

# GPT-4 writes the Cypher query; GPT-3.5 turns the query results into an answer
chain = GraphCypherQAChain.from_llm(
    cypher_llm=ChatOpenAI(temperature=0, model_name="gpt-4"),
    qa_llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo"),
    graph=graph,
    verbose=True,
)


chain.run("Which university did Warren Buffett attend?")

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person {name: "Warren Buffett"})-[:EDUCATED_AT]->(o:Organization)
RETURN o.name
Full Context:
[['Woodrow Wilson High School'], ['Alice Deal Junior High School'], ['Columbia Business School'], ['New York Institute of Finance']]

> Finished chain.

'Warren Buffett attended Columbia Business School.'
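The verbose log shows the chain's three stages: the Cypher model writes a query ("Generated Cypher"), FalkorDB runs it ("Full Context"), and the QA model phrases an answer from the returned rows. The sketch below is a simplified, hypothetical outline of that flow, not LangChain's implementation; plain functions with canned responses stand in for the two ChatOpenAI models and the database so it runs offline.

```python
from typing import Callable, List

Rows = List[list]


def answer_question(
    question: str,
    schema: str,
    generate_cypher: Callable[[str, str], str],
    run_query: Callable[[str], Rows],
    compose_answer: Callable[[str, Rows], str],
) -> str:
    # Stage 1: the Cypher model sees the question and the graph schema.
    cypher = generate_cypher(question, schema)
    # Stage 2: the database executes the query; its rows become the context.
    context = run_query(cypher)
    # Stage 3: the QA model answers from the question and the rows only.
    return compose_answer(question, context)


# Hypothetical stand-ins with canned responses.
def fake_cypher_llm(question: str, schema: str) -> str:
    return 'MATCH (p:Person {name: "Warren Buffett"})-[:EDUCATED_AT]->(o:Organization) RETURN o.name'


def fake_database(cypher: str) -> Rows:
    return [["Columbia Business School"]]


def fake_qa_llm(question: str, rows: Rows) -> str:
    return f"Warren Buffett attended {rows[0][0]}."


print(answer_question(
    "Which university did Warren Buffett attend?",
    schema="(:Person)-[:EDUCATED_AT]->(:Organization)",
    generate_cypher=fake_cypher_llm,
    run_query=fake_database,
    compose_answer=fake_qa_llm,
))
```

Keeping the two model calls separate is what lets the example above pair GPT-4 for query generation with gpt-3.5-turbo for phrasing the answer.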

chain.run("Who is or was working at Berkshire Hathaway?")

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person)-[r:EMPLOYEE_OR_MEMBER_OF]->(o:Organization) WHERE o.name = 'Berkshire Hathaway' RETURN p.name
Full Context:
[['Warren Buffett'], ['Charlie Munger'], ['Howard Buffett'], ['Susan Buffett'], ['Howard'], ['Oliver Chace']]

> Finished chain.

'Warren Buffett, Charlie Munger, Howard Buffett, Susan Buffett, Howard, and Oliver Chace are or were working at Berkshire Hathaway.'
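The "Full Context" lines are the raw query results, one list per row, and comparing them with the final answer is a quick sanity check: the first question returned four institutions, while the answer named only Columbia Business School. This sketch flattens single-column rows into a plain list. It assumes graph.query returns rows in the same list-of-lists shape the log shows; here it is fed the rows printed for the second question.

```python
def column_values(rows):
    """Flatten single-column result rows into a list, dropping duplicates
    while keeping the original order."""
    seen = set()
    values = []
    for row in rows:
        value = row[0]
        if value not in seen:
            seen.add(value)
            values.append(value)
    return values


# Rows copied from the "Full Context" output of the second question.
berkshire_rows = [
    ["Warren Buffett"], ["Charlie Munger"], ["Howard Buffett"],
    ["Susan Buffett"], ["Howard"], ["Oliver Chace"],
]
print(column_values(berkshire_rows))

# Against a live graph (assumes the FalkorDBGraph instance from step 4):
# rows = graph.query(
#     "MATCH (p:Person)-[:EMPLOYEE_OR_MEMBER_OF]->(o:Organization) "
#     "WHERE o.name = 'Berkshire Hathaway' RETURN p.name"
# )
# print(column_values(rows))
```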

Guy Korland

Guy Korland serves as CEO at FalkorDB, where he drives graph database architecture for generative AI and retrieval-augmented generation workflows. He holds a PhD in Computer Science from Tel Aviv University and brings over 20 years of experience in database engineering. He previously led Redis’ incubation arm as SVP & CTO, oversaw platform architecture as GM & CTO at Stor.ai (Self-Point), co-founded and served as CTO of Shopetti, and directed R&D as VP at GigaSpaces.
