Knowledge Graph Builder
The Knowledge Graph Builder in Nappai is designed to turn unstructured text (like PDFs, Word documents, or web pages) into a structured, visual network. Instead of just storing text as plain documents, this component uses AI to identify important “entities” (such as people, organizations, or locations) and the “relationships” between them (such as “works for,” “located in,” or “owns”).
This creates a Knowledge Graph, which allows you to see how different pieces of information are connected. It makes it easier to answer complex questions by finding relationships between data points that standard search engines might miss.
How it Works
- Ingestion: You provide documents or text data to the component.
- AI Analysis: The component uses Large Language Models (LLMs) to read the content and interpret its meaning in context, rather than matching words literally.
- Extraction: It identifies specific entities (nouns) and relationships (connections). For example, in the sentence “John Smith works at Acme Corp,” it identifies “John Smith” and “Acme Corp” as entities and “works at” as the relationship.
- Storage: These entities and relationships are stored in a connected database, forming a graph structure.
- Search & Query: Once built, you can query this graph to find answers based on connections (e.g., “Who works at Acme Corp?” or “What companies are related to John Smith?”).
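The extraction and query steps above can be pictured with a small sketch. This is not Nappai's implementation: the `Triple` class, the helper functions, and the extra sample data (“Jane Doe”, “Berlin”) are made up for illustration, and the triples are written by hand instead of being produced by an LLM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One edge of the graph: subject --relation--> obj."""
    subject: str
    relation: str
    obj: str


# Hand-written triples standing in for what the extraction step might produce.
# "Jane Doe" and "Berlin" are invented sample data.
TRIPLES = [
    Triple("John Smith", "works at", "Acme Corp"),
    Triple("Jane Doe", "works at", "Acme Corp"),
    Triple("Acme Corp", "located in", "Berlin"),
]


def who_works_at(triples, organization):
    """Return the subjects linked to `organization` by a 'works at' edge."""
    return sorted(
        t.subject
        for t in triples
        if t.relation == "works at" and t.obj == organization
    )


def related_to(triples, entity):
    """Return every entity one edge away from `entity`, in either direction."""
    neighbors = set()
    for t in triples:
        if t.subject == entity:
            neighbors.add(t.obj)
        elif t.obj == entity:
            neighbors.add(t.subject)
    return sorted(neighbors)


print(who_works_at(TRIPLES, "Acme Corp"))
print(related_to(TRIPLES, "John Smith"))
```

Storing facts as subject, relation, object triples is what makes connection-based questions cheap to answer: each query is a filter over edges rather than a search through raw text.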
Connection & Credentials
This component requires specific connections to function properly. You must configure the following in the Nappai panel:
- Database: Connect a PostgreSQL or MySQL database component. This is where the graph data (nodes and edges) is stored.
- LLM: Select a Language Model to power the extraction and reasoning.
- Embeddings: Select an Embedding Model to generate vector representations for semantic search.
Operations
This component offers several operations that you can select based on what you need to do. You can only use one operation at a time:
- Add Document: Ingest documents and extract entities and relationships.
- Resolve Entities: Deduplicate entities.
- Search: Query the graph using standard agentic search.
- Deep Search: Run autonomous multi-step research with a customizable role and output format.
- Retriever: Get a retriever for use in RAG chains.
To use the component, first select the operation you need in the “Operation” field.
Inputs
The following fields are available to configure this component. Each field is visible only in the operations listed beneath it:
- Documents: Document(s) to process. Accepts binary data from file uploads, URLs, or other sources.
  - Visible in: Add Document, Resolve Entities, Search, Deep Search
- Database: Database connection info. Connect a PostgreSQL, MySQL, or other database component configured with the ‘DatabaseInfo’ operation.
  - Visible in: Add Document, Resolve Entities, Search, Deep Search
- LLM: Language model for entity and relationship extraction.
  - Visible in: Add Document, Resolve Entities, Search, Deep Search
- Embeddings: Embedding model for generating vectors. Embeddings are stored directly in the database for native vector search.
  - Visible in: Add Document, Resolve Entities, Search, Deep Search
- Search Query: Query text to search for in the knowledge graph.
  - Visible in: Search, Deep Search
- Search Metadata: Metadata to search for in the knowledge graph.
  - Visible in: Search, Deep Search
- Agent Role: Persona and backstory for the Deep Agent (e.g. ‘You are an expert Auditor…’).
  - Visible in: Deep Search
- Output Format Instructions: Specific instructions for the final output format (e.g. ‘Report with Executive Summary’, ‘Legal Brief’).
  - Visible in: Deep Search
- Fast Mode: Use fast embedding-only entity detection (21% faster, same quality).
  - Visible in: Add Document
- Max Rewrite Attempts: Maximum number of times the question is rewritten when the retrieved documents are not relevant.
  - Visible in: Search, Deep Search
- Enable Stream Progress: Enable streaming progress messages during search.
  - Visible in: Search, Deep Search
- Chunk Size: Size of the text chunks used for processing.
  - Visible in: Add Document
- Chunk Overlap: Overlap between consecutive chunks.
  - Visible in: Add Document
- Section-Aware Chunking: Split markdown documents by section headers (## Header) to preserve semantic boundaries. Improves retrieval quality.
  - Visible in: Add Document
- Semantic Hierarchy: Comma-separated hierarchy patterns for section-aware chunking (e.g. ‘TÍTULO,CAPÍTULO,Sección,Artículo’). Assigns virtual hierarchy levels even when all markdown headers use the same ## level.
  - Visible in: Add Document
- Extract Document Metadata: Use the LLM to extract structured metadata (title, dates, type) from the document.
  - Visible in: Add Document
- Metadata Extraction Prompt: Custom prompt for document metadata extraction. Must return JSON with fields like document_id, title, effective_date, issuing_body, keywords, etc.
  - Visible in: Add Document
- Enable Table Extraction: Automatically detect tables in documents and extract atomic facts from them.
  - Visible in: Add Document
- Context Before Table: Number of characters before the table to include as context.
  - Visible in: Add Document
- Context After Table: Number of characters after the table to include as context.
  - Visible in: Add Document
- Minimum Table Rows: Minimum number of rows required for a table to be considered valid.
  - Visible in: Add Document
- Minimum Table Columns: Minimum number of columns required for a table to be considered valid.
  - Visible in: Add Document
- Generate Chunk Atomic Facts: Generate pseudo-atomic facts from entity contexts and relationship evidence (zero LLM cost).
  - Visible in: Add Document
- LLM Chunk Atomic Facts: Extract high-quality atomic facts from each chunk using the LLM (1 call per chunk). Produces self-contained, precise facts ideal for retrieval.
  - Visible in: Add Document
- Max Facts per Chunk: Maximum number of atomic facts to extract per chunk (only for LLM extraction).
  - Visible in: Add Document
- Similarity Threshold: Threshold for entity resolution (0.0-1.0). Higher values = stricter matching.
  - Visible in: Resolve Entities
- Enable Entity Normalization: Use the LLM to normalize entity names and types after extraction (e.g., ‘nia 200’ → ‘NIA-ES 200’).
  - Visible in: Resolve Entities
- Entity Normalization Prompt: Custom prompt for LLM-based entity normalization. Must contain the {entities_json} placeholder.
  - Visible in: Resolve Entities
- Top K Results: Number of results to return.
  - Visible in: Search, Deep Search
- Enable Graph Search: Enable entity-based graph search in addition to vector search.
  - Visible in: Search, Deep Search
- Graph Hops: Number of hops for graph neighborhood expansion.
  - Visible in: Search, Deep Search
- Retrieval Only: Return retrieved documents without LLM generation. Useful when integrating with other agents.
  - Visible in: Search, Deep Search
- Include Grading: Include document relevance grading (sufficient/partial/insufficient) in the response. Only applies when Retrieval Only is enabled.
  - Visible in: Search, Deep Search
- Metadata Fields: Comma-separated metadata fields to include from documents (e.g., ‘document_id,title,effective_date’). Always includes file_path and score. Only applies when Retrieval Only is enabled.
  - Visible in: Search, Deep Search
- Date Order Field: Metadata key for date-based ordering (e.g., ‘publication_date’). When set, results are sorted by this date as the final step after relevance ranking. Leave empty to use default score-based ordering.
  - Visible in: Search, Deep Search
- Date Order Direction: Sort direction for date ordering: ‘desc’ (newest first) or ‘asc’ (oldest first).
  - Visible in: Search, Deep Search
- Section Expansion: When multiple chunks from the same section score high, expand to include all sibling chunks from that section (e.g. return the full CAPÍTULO).
  - Visible in: Search, Deep Search
- Section Expansion Level: Breadcrumb level to expand (e.g. ‘CAPÍTULO’, ‘Sección’, ‘TÍTULO’). Only used when Section Expansion is enabled.
  - Visible in: Search, Deep Search
- Graph Extraction Prompt: Custom prompt for graph entity and relationship extraction.
  - Visible in: Add Document
- Document Summary Prompt: Custom prompt template for the initial document summary.
  - Visible in: Add Document
- Query Entity Extraction Prompt: Custom prompt for extracting entities from search queries.
  - Visible in: Search, Deep Search
- Generate Graph Image: Generate a visual image of the knowledge graph.
  - Visible in: Add Document
- Domain: Domain label for multi-graph support (e.g., ‘legal_documents’, ‘technical_docs’).
  - Visible in: Add Document, Resolve Entities, Search, Deep Search
- Domain Description: Description of the knowledge domain for context injection (e.g. ‘Spanish Auditing Standards’, ‘Technical Documentation’).
  - Visible in: Add Document, Resolve Entities, Search, Deep Search
- Table Prefix: Prefix for all database tables (e.g., ‘kgraph_’, ‘myapp_kg_’). Allows multiple knowledge graphs in the same database.
  - Visible in: Add Document
- Document Type: Type of document, used to optimize extraction.
  - Visible in: Add Document
- Graph Export Format: Format used to serialize the graph in the output.
  - Visible in: Add Document
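To make the Chunk Size and Chunk Overlap inputs more concrete, here is a minimal sketch of sliding-window chunking. It assumes sizes are measured in characters, which this page does not confirm (the component may count tokens or split on sentence boundaries), and `chunk_text` is a hypothetical helper, not part of Nappai.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split `text` into windows of `chunk_size` characters.

    Consecutive windows share `chunk_overlap` characters, so a sentence
    cut at a boundary still appears whole in at least one chunk when the
    overlap is large enough.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
```

A larger overlap preserves more context across chunk boundaries but produces more chunks, which increases the number of extraction and embedding calls.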
Outputs
The component produces a structured result depending on the selected operation. Generally, it outputs a Knowledge Graph object containing nodes (entities) and edges (relationships), or a set of retrieved documents/answers with metadata.
Output Data Example (JSON)
This example shows the structure of the output when performing a Search operation, returning relevant entities and their connections:

```json
{
  "status": "success",
  "results": [
    {
      "entity": {
        "name": "Acme Corp",
        "type": "Organization",
        "id": "node_acme_123"
      },
      "relationships": [
        {
          "target": "John Smith",
          "type": "employed_by",
          "confidence": 0.95
        }
      ],
      "relevance_score": 0.88
    }
  ],
  "metadata": {
    "total_matches": 1,
    "processing_time_ms": 120
  }
}
```
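A downstream step could consume a payload shaped like the Search example above roughly as follows. This is a sketch under the assumption that the real output uses the same field names as that example; `summarize` and its `min_confidence` parameter are hypothetical and not part of the component.

```python
import json

# The same payload as the Search example, written with plain ASCII quotes.
RAW = """
{
  "status": "success",
  "results": [
    {
      "entity": {"name": "Acme Corp", "type": "Organization", "id": "node_acme_123"},
      "relationships": [
        {"target": "John Smith", "type": "employed_by", "confidence": 0.95}
      ],
      "relevance_score": 0.88
    }
  ],
  "metadata": {"total_matches": 1, "processing_time_ms": 120}
}
"""


def summarize(payload: dict, min_confidence: float = 0.0) -> list[str]:
    """List each relationship as text, skipping low-confidence edges."""
    if payload.get("status") != "success":
        return []
    lines = []
    for result in payload.get("results", []):
        source = result["entity"]["name"]
        for rel in result.get("relationships", []):
            if rel.get("confidence", 0.0) >= min_confidence:
                lines.append(f'{source} -[{rel["type"]}]- {rel["target"]}')
    return lines


SUMMARY = summarize(json.loads(RAW))
print(SUMMARY)
```

Checking `status` first and reading optional keys with `.get` keeps the consumer tolerant of empty or partial responses.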
Connectivity
Typically, this component is connected as follows:
- Input: Connects to Document inputs from file upload components, web scraping tools, or previous data processing steps. It also connects to LLM and Embedding model components for AI processing.
- Database: Must connect to a configured Database component (PostgreSQL/MySQL) to store the graph data.
- Output: Feeds into other AI agents, search interfaces, or visualization components that need to display relationships or answer questions based on the graph.
Usage Example
Scenario: Building a Corporate Knowledge Base
- Setup: Connect your company’s HR documents and public filings to the Knowledge Graph Builder.
- Operation: Select Add Document.
- Configuration:
  - Upload PDFs of employee handbooks.
  - Set Domain to hr_documents.
  - Enable Extract Document Metadata to capture dates and document types.
- Result: The system builds a graph linking employees to their roles, departments, and reporting structures.
- Search: Later, select the Search operation.
- Query: Enter “Who reports to the CFO?”.
- Outcome: The graph navigates the relationships and returns the specific employees linked to the CFO via “reports_to” edges.
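The traversal in this scenario can be sketched as a breadth-first walk over “reports_to” edges, with a hop limit that plays a role similar to the Graph Hops input. The job titles, the `REPORTS_TO` data, and the `reports_to` function are invented for illustration. How the component actually expands graph neighborhoods is not documented here.

```python
from collections import deque

# Hypothetical (person, manager) pairs standing in for "reports_to" edges.
REPORTS_TO = [
    ("Controller", "CFO"),
    ("Treasurer", "CFO"),
    ("Accountant", "Controller"),
    ("CFO", "CEO"),
]


def reports_to(edges, manager, hops=1):
    """Return people within `hops` reports_to edges below `manager`."""
    # Index edges by manager so each lookup is a dictionary access.
    by_manager = {}
    for person, boss in edges:
        by_manager.setdefault(boss, []).append(person)

    found = []
    seen = {manager}
    queue = deque([(manager, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == hops:
            continue  # hop budget used up on this branch
        for person in by_manager.get(current, []):
            if person not in seen:
                seen.add(person)
                found.append(person)
                queue.append((person, depth + 1))
    return found


print(reports_to(REPORTS_TO, "CFO"))          # direct reports only
print(reports_to(REPORTS_TO, "CFO", hops=2))  # direct and indirect reports
```

With one hop the walk returns only direct reports. Raising the hop count widens the neighborhood to indirect reports, at the cost of visiting more of the graph.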
Tips and Best Practices
- Use Fast Mode: If you need to process a large volume of documents quickly and don’t need complex semantic nuance, enable Fast Mode for a 21% speed boost.
- Entity Normalization: If you find duplicate or inconsistently named entities (e.g., “Inc.” vs “Inc”), turn on Enable Entity Normalization in the Resolve Entities operation to clean up the names automatically.
- Deep Search for Complex Queries: Use the Deep Search operation when your question requires reasoning across multiple documents or steps, rather than simple keyword matching.
- Define Domains Clearly: When working with specialized data (like legal or medical texts), use the Domain and Domain Description inputs to help the AI understand the specific context and jargon.
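To build intuition for the Similarity Threshold used by Resolve Entities, the sketch below groups near-duplicate names with plain string similarity from Python’s standard `difflib` module. This is a stand-in technique chosen for illustration. The component itself may compare embeddings or use the LLM, and `resolve_entities` and the sample names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resolve_entities(names, threshold):
    """Greedily group names whose similarity to a group's first member
    is at least `threshold`. Higher thresholds mean stricter matching."""
    clusters = []
    for name in names:
        for cluster in clusters:
            if similarity(name, cluster[0]) >= threshold:
                cluster.append(name)
                break
        else:
            # No existing group was close enough, so start a new one.
            clusters.append([name])
    return clusters


NAMES = ["Acme Inc.", "Acme Inc", "Globex"]
print(resolve_entities(NAMES, threshold=0.9))  # the two Acme spellings merge
print(resolve_entities(NAMES, threshold=1.0))  # only exact matches merge
```

A threshold that is too low merges entities that are genuinely different, while one that is too high leaves duplicates in the graph, so it is worth checking a sample of merged entities after changing the value.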
Important Notes
This component is currently in development. Features and interfaces may change as the system evolves.
Related Components
None