Diffbot API, FalkorDB, and LangChain are a great combination for building intelligent applications that can understand and answer questions from unstructured data.
Diffbot provides an API that extracts structured data from unstructured documents, such as web pages, PDFs, or emails. It does this with models designed to transform text into structured graph information. These models analyze the text to identify and organize entities and relationships, laying the foundation for a knowledge graph.
With Diffbot API, you can create a knowledge graph that represents the entities and relationships in your documents, and store it in FalkorDB. This process involves extracting structured graph information directly from the text, ensuring that the data is both accurate and easily navigable.
Then, you can use LangChain to query your knowledge graph and get answers to your questions. LangChain translates natural-language questions into graph queries and returns answers drawn from your knowledge graph. By integrating these tools, you streamline the process of extracting, storing, and querying structured information, making it easier to put your data to use.
1. Installing LangChain
First, install LangChain and the dependencies used in this walkthrough from the command line:
pip install langchain langchain-experimental openai redis wikipedia
2. Starting FalkorDB server locally
Starting a local FalkorDB server is as simple as running a Docker container. The documentation describes other ways to run it.
> docker run -p 6379:6379 -it --rm falkordb/falkordb:latest
6:C 26 Aug 2023 08:36:26.297 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
6:C 26 Aug 2023 08:36:26.297 # Redis version=7.2.1, bits=64, commit=00000000, modified=0, pid=6, just started
...
...
6:M 26 Aug 2023 08:36:26.322 * <graph> Starting up FalkorDB version 99.99.99.
6:M 26 Aug 2023 08:36:26.324 * <graph> Thread pool created, using 8 threads.
6:M 26 Aug 2023 08:36:26.324 * <graph> Maximum number of OpenMP threads set to 8
6:M 26 Aug 2023 08:36:26.324 * <graph> Query backlog size: 1000
6:M 26 Aug 2023 08:36:26.324 * Module 'graph' loaded from /FalkorDB/bin/linux-x64-release/src/falkordb.so
6:M 26 Aug 2023 08:36:26.324 * Ready to accept connections
Running the demo
The rest of this post covers the simple steps you can take to get started. You can also try the Google Colab notebook.
Constructing a Knowledge Graph from Unstructured Text
To construct a knowledge graph from unstructured text, follow these core steps:
- Extracting Structured Information: The initial task involves extracting structured graph information from the unstructured text. This process transforms raw data into a format suitable for graph representation.
- Storing into a Graph Database: Once the information is structured, the next step is storing it in a graph database. This storage enables various applications to utilize the knowledge graph effectively.
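The two steps can be sketched in plain Python before bringing in any services. This is a toy illustration only: the hand-written `triples` list stands in for the output of an extraction service, and an in-memory dictionary stands in for the graph database. Neither reflects the real Diffbot or FalkorDB data formats.

```python
from collections import defaultdict

# Step 1 (stand-in): structured triples of (source, relationship, target).
# These are hand-written for illustration, not real extraction output.
triples = [
    ("Warren Buffett", "EMPLOYEE_OR_MEMBER_OF", "Berkshire Hathaway"),
    ("Warren Buffett", "EDUCATED_AT", "Columbia Business School"),
]

# Step 2 (stand-in): store the triples so they can be looked up by
# source entity and relationship type.
graph = defaultdict(list)
for source, relation, target in triples:
    graph[(source, relation)].append(target)

# A lookup that mirrors "Where was Warren Buffett educated?"
print(graph[("Warren Buffett", "EDUCATED_AT")])
```

A real graph database adds what the dictionary lacks: persistence, a query language, and traversal across many hops.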
3. Create a Knowledge Graph
Now, let’s create a demo knowledge graph of Warren Buffett using Wikipedia:
from langchain_experimental.graph_transformers.diffbot import DiffbotGraphTransformer
from langchain.document_loaders import WikipediaLoader

# Replace the placeholder with your own Diffbot API key
diffbot_api_key = "DIFFBOT_API_KEY"
diffbot_nlp = DiffbotGraphTransformer(diffbot_api_key=diffbot_api_key)

# Load the Wikipedia articles that match the query
query = "Warren Buffett"
raw_documents = WikipediaLoader(query=query).load()
# Extract entities and relationships as graph documents
graph_documents = diffbot_nlp.convert_to_graph_documents(raw_documents)
When a graph transformer extracts meaningful data from text, it can identify a variety of nodes and relationships. Here are some examples:
Nodes
- Individuals: Distinct names or identifiers for people, such as Marie Curie and Pierre Curie.
- Organizations: Entities or institutions, like the University of Paris.
Relationships
- Personal Connections: Relationships between individuals, for example, a marriage between Marie Curie and Pierre Curie.
- Professional Associations: Links between individuals and organizations, such as Marie Curie’s role as a professor at the University of Paris.
These examples illustrate how textual data can be effectively mapped into nodes and relationships, highlighting connections and interactions between different entities.
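To make the mapping concrete, here is a minimal sketch of how the Curie examples could be held as nodes and relationships. The `Node` and `Relationship` classes and the `describe` helper are hypothetical and written for this illustration; they are not LangChain's own graph document classes. The `SPOUSE` relationship type is likewise an assumed name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    id: str    # a distinct name, e.g. a person or organization
    type: str  # a label such as "Person" or "Organization"


@dataclass(frozen=True)
class Relationship:
    source: Node
    target: Node
    type: str  # the kind of connection between source and target


marie = Node("Marie Curie", "Person")
pierre = Node("Pierre Curie", "Person")
university = Node("University of Paris", "Organization")

relationships = [
    Relationship(marie, pierre, "SPOUSE"),  # personal connection (assumed type name)
    Relationship(marie, university, "EMPLOYEE_OR_MEMBER_OF"),  # professional association
]


def describe(rel):
    """Render a relationship in an arrow notation similar to Cypher patterns."""
    return (
        f"({rel.source.id}:{rel.source.type})"
        f"-[{rel.type}]->"
        f"({rel.target.id}:{rel.target.type})"
    )


for rel in relationships:
    print(describe(rel))
```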
4. Storing the Knowledge Graph in FalkorDB
The last step in building the graph is to store it in FalkorDB:
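from langchain.graphs import FalkorDBGraph

# Connect to the local FalkorDB server; "falkordb" is the name of the graph
graph = FalkorDBGraph(
    "falkordb",
)
# Write the extracted nodes and relationships, then reload the schema
graph.add_graph_documents(graph_documents)
graph.refresh_schema()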
5. Querying the Graph
You are all set and can start querying the knowledge graph. Let’s try a couple of questions.
# Notebook magic; replace the placeholder with your own OpenAI API key
%env OPENAI_API_KEY=OPENAI_API_KEY
from langchain.chains import GraphCypherQAChain
from langchain.chat_models import ChatOpenAI

# One model generates the Cypher query, the other phrases the final answer
chain = GraphCypherQAChain.from_llm(
    cypher_llm=ChatOpenAI(temperature=0, model_name="gpt-4"),
    qa_llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo"),
    graph=graph,
    verbose=True,
)

chain.run("Which university did Warren Buffett attend?")
> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person {name: "Warren Buffett"})-[:EDUCATED_AT]->(o:Organization)
RETURN o.name
Full Context:
[['Woodrow Wilson High School'], ['Alice Deal Junior High School'], ['Columbia Business School'], ['New York Institute of Finance']]
> Finished chain.
'Warren Buffett attended Columbia Business School.'
chain.run("Who is or was working at Berkshire Hathaway?")
> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person)-[r:EMPLOYEE_OR_MEMBER_OF]->(o:Organization) WHERE o.name = 'Berkshire Hathaway' RETURN p.name
Full Context:
[['Warren Buffett'], ['Charlie Munger'], ['Howard Buffett'], ['Susan Buffett'], ['Howard'], ['Oliver Chace']]
> Finished chain.
'Warren Buffett, Charlie Munger, Howard Buffett, Susan Buffett, Howard, and Oliver Chace are or were working at Berkshire Hathaway.'
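The Cypher that the chain generated above inlines the organization name as a literal. If you want to run a similar query yourself, one option is to pass the value as a query parameter instead. The sketch below only builds the query text and its parameters; `employees_query` is a hypothetical helper, and the commented-out call at the end assumes the graph wrapper accepts a parameter dictionary, which you should confirm against the version you have installed.

```python
def employees_query(organization):
    """Return a Cypher query and its parameters for people linked to an organization."""
    cypher = (
        "MATCH (p:Person)-[:EMPLOYEE_OR_MEMBER_OF]->(o:Organization) "
        "WHERE o.name = $org "
        "RETURN p.name"
    )
    return cypher, {"org": organization}


cypher, params = employees_query("Berkshire Hathaway")
print(cypher)
print(params)

# With a running FalkorDB server and the `graph` object from step 4, the pair
# could then be executed (assumption: the wrapper takes parameters this way):
# rows = graph.query(cypher, params)
```

Keeping values out of the query text means a name containing quotes or braces cannot change the shape of the query.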
Security Considerations When Constructing Knowledge Graphs
When building knowledge graphs, security deserves priority because the construction process needs extensive write access to the database, which carries risk. Here’s what you should keep in mind:
Data Verification and Validation
Before integrating any data into your graph, ensure it’s thoroughly verified and validated. This step is critical to prevent the introduction of inaccurate or malicious data that could compromise the integrity of the entire system.
Backups
Maintain regular backups of the database. This precaution ensures that you can quickly restore data in case of a security breach or data corruption.
Continuous Updates
Keep your database software and security protocols up to date. Regular updates help protect against emerging threats and vulnerabilities.
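As one illustration of the data verification point, extracted triples can be screened before anything is written to the database. This is a minimal sketch under stated assumptions: the allow-list of relationship types, the name pattern, and the `is_valid_triple` helper are all hypothetical choices made for this example, and a real deployment would need rules suited to its own data.

```python
import re

# Hypothetical allow-list: only relationship types you expect to store.
ALLOWED_RELATIONS = {"EDUCATED_AT", "EMPLOYEE_OR_MEMBER_OF"}

# Hypothetical rule: names are 1-200 characters of letters, digits,
# underscores, spaces, and a few common punctuation marks.
NAME_PATTERN = re.compile(r"[\w .,'&-]{1,200}")


def is_valid_triple(source, relation, target):
    """Accept a triple only if its relation is allowed and both names look sane."""
    return (
        relation in ALLOWED_RELATIONS
        and NAME_PATTERN.fullmatch(source) is not None
        and NAME_PATTERN.fullmatch(target) is not None
    )


candidates = [
    ("Warren Buffett", "EDUCATED_AT", "Columbia Business School"),
    ("Warren Buffett", "DELETES", "Berkshire Hathaway"),  # unexpected relation
    ("x'}) DETACH DELETE n //", "EDUCATED_AT", "Columbia Business School"),  # suspicious name
]

accepted = [triple for triple in candidates if is_valid_triple(*triple)]
print(accepted)
```

Only the first candidate passes. Rejected triples are better logged for review than silently dropped, so that overly strict rules can be spotted and adjusted.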
By addressing these security considerations, you can make your knowledge graph infrastructure more robust. For comprehensive details on security best practices, consult resources from reputable cybersecurity organizations such as CISA or NIST.