Building & Querying a Knowledge Graph from Unstructured Data


Diffbot API, FalkorDB, and LangChain are a great combination for building intelligent applications that can understand and answer questions from unstructured data.

Diffbot offers a powerful API that extracts structured data from unstructured documents such as web pages, PDFs, or emails. It does this with models that analyze the text to identify entities and the relationships between them, laying the foundation for a knowledge graph.


With Diffbot API, you can create a knowledge graph that represents the entities and relationships in your documents, and store it in FalkorDB. This process involves extracting structured graph information directly from the text, ensuring that the data is both accurate and easily navigable.


Then, you can use LangChain to query your knowledge graph with natural-language questions. LangChain translates each question into a graph query and returns an answer based on the results. By integrating these tools, you streamline the process of extracting, storing, and querying structured information, making it easier to put your data to use.

1. Installing LangChain

First, install LangChain and the other dependencies on your machine from the command line:

pip install langchain langchain-experimental openai redis wikipedia

2. Starting FalkorDB server locally

Starting a local FalkorDB server is as simple as running a Docker container. See the FalkorDB documentation for other ways to run it.

> docker run -p 6379:6379 -it --rm falkordb/falkordb:latest
6:C 26 Aug 2023 08:36:26.297 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
6:C 26 Aug 2023 08:36:26.297 # Redis version=7.2.1, bits=64, commit=00000000, modified=0, pid=6, just started
...
...
6:M 26 Aug 2023 08:36:26.322 * <graph> Starting up FalkorDB version 99.99.99.
6:M 26 Aug 2023 08:36:26.324 * <graph> Thread pool created, using 8 threads.
6:M 26 Aug 2023 08:36:26.324 * <graph> Maximum number of OpenMP threads set to 8
6:M 26 Aug 2023 08:36:26.324 * <graph> Query backlog size: 1000
6:M 26 Aug 2023 08:36:26.324 * Module 'graph' loaded from /FalkorDB/bin/linux-x64-release/src/falkordb.so
6:M 26 Aug 2023 08:36:26.324 * Ready to accept connections
        

Running the demo

The rest of this blog covers the simple steps you can take to get started. You can also try the Google Colab notebook.

Constructing a Knowledge Graph from Unstructured Text

To construct a knowledge graph from unstructured text, follow these core steps:

  1. Extracting Structured Information: The initial task involves extracting structured graph information from the unstructured text. This process transforms raw data into a format suitable for graph representation.

  2. Storing into a Graph Database: Once the information is structured, the next step is storing it in a graph database. This storage enables various applications to utilize the knowledge graph effectively.

3. Create a Knowledge Graph

Now, let’s create a demo knowledge graph of Warren Buffett using Wikipedia:

from langchain_experimental.graph_transformers.diffbot import DiffbotGraphTransformer
from langchain.document_loaders import WikipediaLoader

# Replace the placeholder with your own Diffbot API key.
diffbot_api_key = "DIFFBOT_API_KEY"
diffbot_nlp = DiffbotGraphTransformer(diffbot_api_key=diffbot_api_key)

# Load the Wikipedia articles that match the query.
query = "Warren Buffett"
raw_documents = WikipediaLoader(query=query).load()

# Extract entities and relationships from the raw text.
graph_documents = diffbot_nlp.convert_to_graph_documents(raw_documents)
        

When you use a graph transformer to extract structured data from text, it identifies a variety of nodes and relationships. Here are some examples:

Nodes

  • Individuals: Distinct names or identifiers for people, such as Marie Curie and Pierre Curie.
  • Organizations: Entities or institutions, like the University of Paris.

Relationships

  • Personal Connections: Relationships between individuals, for example, a marriage between Marie Curie and Pierre Curie.
  • Professional Associations: Links between individuals and organizations, such as Marie Curie’s role as a professor at the University of Paris.

These examples illustrate how textual data can be effectively mapped into nodes and relationships, highlighting connections and interactions between different entities.
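
As a rough illustration of the examples above, the sketch below models nodes and relationships with plain Python dataclasses. The `Node` and `Relationship` classes here are hypothetical stand-ins written for this post, not the classes a graph transformer returns, and the `SPOUSE` relationship type is an assumed name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A graph entity: an identifier plus a type label."""
    id: str
    type: str


@dataclass(frozen=True)
class Relationship:
    """A directed, typed edge between two nodes."""
    source: Node
    target: Node
    type: str


# Entities from the examples above.
marie = Node(id="Marie Curie", type="Person")
pierre = Node(id="Pierre Curie", type="Person")
university = Node(id="University of Paris", type="Organization")

# "SPOUSE" is an assumed type name used only for illustration.
relationships = [
    Relationship(source=marie, target=pierre, type="SPOUSE"),
    Relationship(source=marie, target=university, type="EMPLOYEE_OR_MEMBER_OF"),
]

# Print each edge in a Cypher-like pattern notation.
for rel in relationships:
    print(f"({rel.source.id})-[:{rel.type}]->({rel.target.id})")
```

Keeping the type on both nodes and edges is what later lets a query ask for, say, every `Person` connected to a given `Organization`.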

4. Storing the Knowledge Graph in FalkorDB

The last step is storing the knowledge graph in FalkorDB:

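from langchain.graphs import FalkorDBGraph

# "falkordb" is the name of the graph to use on the local server.
graph = FalkorDBGraph(
   "falkordb",
)

# Write the extracted nodes and relationships to the graph.
graph.add_graph_documents(graph_documents)

# Reload the schema so later queries see the new labels and relationship types.
graph.refresh_schema()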
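
To make the storage step more concrete, the sketch below builds the kind of Cypher `MERGE` statement that writing a single relationship to a graph database corresponds to. This is an illustration only, not how `add_graph_documents` is implemented. `build_merge` is a hypothetical helper, and the `name` property key is assumed from the queries shown later in this post.

```python
import re

# Labels and relationship types cannot be passed as Cypher parameters,
# so this sketch only accepts plain identifiers for them.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_merge(source, source_label, rel_type, target, target_label):
    """Return a (cypher, params) pair that upserts one relationship."""
    for name in (source_label, rel_type, target_label):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    cypher = (
        f"MERGE (a:{source_label} {{name: $source}}) "
        f"MERGE (b:{target_label} {{name: $target}}) "
        f"MERGE (a)-[:{rel_type}]->(b)"
    )
    # Entity names travel as parameters rather than being spliced into the text.
    return cypher, {"source": source, "target": target}


cypher, params = build_merge(
    "Warren Buffett", "Person", "EMPLOYEE_OR_MEMBER_OF",
    "Berkshire Hathaway", "Organization",
)
print(cypher)
print(params)
```

Using `MERGE` rather than `CREATE` means that running the same statement twice does not duplicate nodes or edges, which is useful when the same entity appears in several documents.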
        

5. Querying the Graph

You are all set and can start querying the knowledge graph. Let’s try a couple of questions.

%env OPENAI_API_KEY=OPENAI_API_KEY

from langchain.chains import GraphCypherQAChain
from langchain.chat_models import ChatOpenAI

chain = GraphCypherQAChain.from_llm(
   cypher_llm=ChatOpenAI(temperature=0, model_name="gpt-4"),
   qa_llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo"),
   graph=graph, verbose=True,
)


chain.run("Which university did Warren Buffett attend?")

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person {name: "Warren Buffett"})-[:EDUCATED_AT]->(o:Organization)
RETURN o.name
Full Context:
[['Woodrow Wilson High School'], ['Alice Deal Junior High School'], ['Columbia Business School'], ['New York Institute of Finance']]

> Finished chain.
'Warren Buffett attended Columbia Business School.'

chain.run("Who is or was working at Berkshire Hathaway?")

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person)-[r:EMPLOYEE_OR_MEMBER_OF]->(o:Organization) WHERE o.name = 'Berkshire Hathaway' RETURN p.name
Full Context:
[['Warren Buffett'], ['Charlie Munger'], ['Howard Buffett'], ['Susan Buffett'], ['Howard'], ['Oliver Chace']]

> Finished chain.
'Warren Buffett, Charlie Munger, Howard Buffett, Susan Buffett, Howard, and Oliver Chace are or were working at Berkshire Hathaway.'
        

Security Considerations When Constructing Knowledge Graphs

When building knowledge graphs, security matters because the process needs extensive write access to the database, which carries risk. Here’s what you should keep in mind:

  1. Data Verification and Validation
    Before integrating any data into your graph, ensure it’s thoroughly verified and validated. This step is critical to prevent the introduction of inaccurate or malicious data that could compromise the integrity of the entire system.

  2. Backups
    Maintain regular backups of the database. This precaution ensures that you can quickly restore data in case of a security breach or data corruption.

  3. Continuous Updates
    Keep your database software and security protocols up to date. Regular updates help protect against emerging threats and vulnerabilities.

By addressing these security considerations, you can make your knowledge graph infrastructure more robust. For details on security best practices, consult resources from reputable cybersecurity organizations such as CISA or NIST.
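
The data verification step in the list above might be sketched as a filter that runs before anything is written to the database. This is a minimal sketch under stated assumptions: the triple layout, the allow-lists, `MAX_NAME_LENGTH`, and the `validate_triples` helper are all hypothetical, and the allowed types are simply the ones that appear in this post's examples.

```python
# Assumed allow-lists, taken from the labels seen in this post's queries.
ALLOWED_NODE_TYPES = {"Person", "Organization"}
ALLOWED_REL_TYPES = {"EDUCATED_AT", "EMPLOYEE_OR_MEMBER_OF"}
MAX_NAME_LENGTH = 200  # arbitrary illustrative limit


def is_valid_name(name):
    """Accept non-empty strings of reasonable length."""
    return isinstance(name, str) and 0 < len(name.strip()) <= MAX_NAME_LENGTH


def validate_triples(triples):
    """Split triples into (accepted, rejected) lists.

    Each triple is a dict with the keys
    source, source_type, rel_type, target, target_type.
    """
    accepted, rejected = [], []
    for t in triples:
        ok = (
            is_valid_name(t.get("source"))
            and is_valid_name(t.get("target"))
            and t.get("source_type") in ALLOWED_NODE_TYPES
            and t.get("target_type") in ALLOWED_NODE_TYPES
            and t.get("rel_type") in ALLOWED_REL_TYPES
        )
        (accepted if ok else rejected).append(t)
    return accepted, rejected


sample = [
    {"source": "Warren Buffett", "source_type": "Person",
     "rel_type": "EMPLOYEE_OR_MEMBER_OF",
     "target": "Berkshire Hathaway", "target_type": "Organization"},
    # Rejected: empty target name and an unknown relationship type.
    {"source": "Warren Buffett", "source_type": "Person",
     "rel_type": "UNKNOWN", "target": "", "target_type": "Organization"},
]

accepted, rejected = validate_triples(sample)
print(len(accepted), "accepted,", len(rejected), "rejected")
```

Rejected triples are returned rather than silently dropped so that they can be logged and reviewed, which helps when tuning the allow-lists.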
