Knowledge Graph Builder

The Knowledge Graph Builder in Nappai is designed to turn unstructured text (like PDFs, Word documents, or web pages) into a structured, visual network. Instead of just storing text as plain documents, this component uses AI to identify important “entities” (such as Names, Organizations, or Locations) and the “relationships” between them (such as “works for,” “located in,” or “owns”).

This creates a Knowledge Graph, which allows you to see how different pieces of information are connected. It makes it easier to answer complex questions by finding relationships between data points that standard search engines might miss.

How it Works

Ingestion: You provide documents or text data to the component.
AI Analysis: The component uses Large Language Models (LLMs) to read and understand the content. It doesn’t just read word-for-word; it looks for meaning.
Extraction: It identifies specific entities (nouns) and relationships (connections). For example, in the sentence “John Smith works at Acme Corp,” it identifies “John Smith” and “Acme Corp” as entities and “works at” as the relationship.
Storage: These entities and relationships are stored in a connected database, forming a graph structure.
Search & Query: Once built, you can query this graph to find answers based on connections (e.g., “Who works at Acme Corp?” or “What companies are related to John Smith?”).
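The extraction, storage, and query steps above can be pictured with a small in-memory sketch. It is only an illustration: the triples are hard-coded rather than produced by an LLM, the relation names are hypothetical, and the real component stores its graph in the connected database rather than in Python dictionaries.

```python
from collections import defaultdict

# Hypothetical (source, relation, target) triples, standing in for what an
# LLM extraction step might return. The names are illustrative only.
triples = [
    ("John Smith", "works_at", "Acme Corp"),
    ("Jane Doe", "works_at", "Acme Corp"),
    ("Acme Corp", "located_in", "Berlin"),
]


def build_graph(triples):
    """Index every triple by its source and by its target."""
    outgoing = defaultdict(list)
    incoming = defaultdict(list)
    for source, relation, target in triples:
        outgoing[source].append((relation, target))
        incoming[target].append((relation, source))
    return outgoing, incoming


def sources_for(incoming, relation, target):
    """Return all entities linked to `target` by `relation`, sorted by name."""
    return sorted(src for rel, src in incoming.get(target, []) if rel == relation)


outgoing, incoming = build_graph(triples)

# "Who works at Acme Corp?"
employees = sources_for(incoming, "works_at", "Acme Corp")
print(employees)
```

Indexing edges in both directions is what lets a question be answered from either end of a relationship, which is the property a plain document store lacks.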

Connection & Credentials

This component requires specific connections to function properly. You must configure the following in the Nappai panel:

Database: Connect a PostgreSQL or MySQL database component. This is where the graph data (nodes and edges) is stored.
LLM: Select a Language Model to power the extraction and reasoning.
Embeddings: Select an Embedding Model to generate vector representations for semantic search.

Operations

This component offers several operations that you can select based on what you need to do. You can only use one operation at a time:

Add Document: Ingests documents and extracts entities and relationships into the graph.
Resolve Entities: Deduplicates entities that refer to the same thing.
Search: Queries the graph using the standard agentic search.
Deep Search: Runs autonomous, multi-step research with a customizable agent role and output format.
Retriever: Returns a retriever for use in RAG chains.

To use the component, first select the operation you need in the “Operation” field.

Inputs

The following fields are available to configure this component. Each field is visible only in the operations listed beneath it:

Documents: Document(s) to process. Accepts Binary data from file uploads, URLs, or other sources.
- Visible in: Add Document, Resolve Entities, Search, Deep Search
Database: Database connection info. Connect a PostgreSQL, MySQL, or other database component configured with the ‘DatabaseInfo’ operation.
- Visible in: Add Document, Resolve Entities, Search, Deep Search
LLM: Language model for entity and relationship extraction.
- Visible in: Add Document, Resolve Entities, Search, Deep Search
Embeddings: Embedding model for generating vectors. Embeddings will be stored directly in the database for native vector search.
- Visible in: Add Document, Resolve Entities, Search, Deep Search
Search Query: Query text to search in the knowledge graph
- Visible in: Search, Deep Search
Search Metadata: Metadata to search in the knowledge graph
- Visible in: Search, Deep Search
Agent Role: Persona and backstory for the Deep Agent (e.g. ‘You are an expert Auditor…’)
- Visible in: Deep Search
Output Format Instructions: Specific instructions for the final output format (e.g. ‘Report with Executive Summary’, ‘Legal Brief’)
- Visible in: Deep Search
Fast Mode: Use fast embedding-only entity detection (21% faster, same quality)
- Visible in: Add Document
Max Rewrite Attempts: Maximum number of question rewrite attempts if documents are not relevant
- Visible in: Search, Deep Search
Enable Stream Progress: Enable streaming progress messages during search.
- Visible in: Search, Deep Search
Chunk Size: Size of text chunks for processing
- Visible in: Add Document
Chunk Overlap: Overlap between consecutive chunks
- Visible in: Add Document
Section-Aware Chunking: Split markdown documents by section headers (## Header) to preserve semantic boundaries. Improves retrieval quality.
- Visible in: Add Document
Semantic Hierarchy: Comma-separated hierarchy patterns for section-aware chunking (e.g. ‘TÍTULO,CAPÍTULO,Sección,Artículo’). Assigns virtual hierarchy levels even when all markdown headers use the same ## level.
- Visible in: Add Document
Extract Document Metadata: Use LLM to extract structured metadata (title, dates, type) from the document
- Visible in: Add Document
Metadata Extraction Prompt: Custom prompt for document metadata extraction. Must return JSON with fields like document_id, title, effective_date, issuing_body, keywords, etc.
- Visible in: Add Document
Enable Table Extraction: Automatically detect tables in documents and extract atomic facts from them
- Visible in: Add Document
Context Before Table: Number of characters before the table to include as context.
- Visible in: Add Document
Context After Table: Number of characters after the table to include as context.
- Visible in: Add Document
Minimum Table Rows: Minimum number of rows to consider as a valid table
- Visible in: Add Document
Minimum Table Columns: Minimum number of columns to consider as a valid table
- Visible in: Add Document
Generate Chunk Atomic Facts: Generate pseudo-atomic facts from entity contexts and relationship evidence (zero LLM cost)
- Visible in: Add Document
LLM Chunk Atomic Facts: Extract high-quality atomic facts from each chunk using LLM (1 call per chunk). Produces self-contained, precise facts ideal for retrieval.
- Visible in: Add Document
Max Facts per Chunk: Maximum number of atomic facts to extract per chunk (only for LLM extraction)
- Visible in: Add Document
Similarity Threshold: Threshold for entity resolution (0.0-1.0). Higher values = stricter matching.
- Visible in: Resolve Entities
Enable Entity Normalization: Use LLM to normalize entity names and types after extraction (e.g., ‘nia 200’ → ‘NIA-ES 200’)
- Visible in: Resolve Entities
Entity Normalization Prompt: Custom prompt for LLM-based entity normalization. Must contain {entities_json} placeholder.
- Visible in: Resolve Entities
Top K Results: Number of results to return
- Visible in: Search, Deep Search
Enable Graph Search: Enable entity-based graph search in addition to vector search
- Visible in: Search, Deep Search
Graph Hops: Number of hops for graph neighborhood expansion
- Visible in: Search, Deep Search
Retrieval Only: Return retrieved documents without LLM generation. Useful when integrating with other agents.
- Visible in: Search, Deep Search
Include Grading: Include document relevance grading (sufficient/partial/insufficient) in the response. Only applies when Retrieval Only is enabled.
- Visible in: Search, Deep Search
Metadata Fields: Comma-separated metadata fields to include from documents (e.g., ‘document_id,title,effective_date’). Always includes file_path and score. Only applies when Retrieval Only is enabled.
- Visible in: Search, Deep Search
Date Order Field: Metadata key for date-based ordering (e.g., ‘publication_date’). When set, results are sorted by this date as the final step after relevance ranking. Leave empty to use default score-based ordering.
- Visible in: Search, Deep Search
Date Order Direction: Sort direction for date ordering: ‘desc’ (newest first) or ‘asc’ (oldest first).
- Visible in: Search, Deep Search
Section Expansion: When multiple chunks from the same section score high, expand to include all sibling chunks from that section (e.g. return the full CAPÍTULO).
- Visible in: Search, Deep Search
Section Expansion Level: Breadcrumb level to expand (e.g. ‘CAPÍTULO’, ‘Sección’, ‘TÍTULO’). Only used when Section Expansion is enabled.
- Visible in: Search, Deep Search
Graph Extraction Prompt: Custom prompt for graph entity and relationship extraction
- Visible in: Add Document
Document Summary Prompt: Custom prompt template for the initial document summary
- Visible in: Add Document
Query Entity Extraction Prompt: Custom prompt for extracting entities from search queries
- Visible in: Search, Deep Search
Generate Graph Image: Generate a visual image of the knowledge graph
- Visible in: Add Document
Domain: Domain label for multi-graph support (e.g., ‘legal_documents’, ‘technical_docs’)
- Visible in: Add Document, Resolve Entities, Search, Deep Search
Domain Description: Description of the knowledge domain for context injection (e.g. ‘Spanish Auditing Standards’, ‘Technical Documentation’)
- Visible in: Add Document, Resolve Entities, Search, Deep Search
Table Prefix: Prefix for all database tables (e.g., ‘kgraph_’, ‘myapp_kg_’). Allows multiple knowledge graphs in the same database.
- Visible in: Add Document
Document Type: Type of document for optimized extraction
- Visible in: Add Document
Graph Export Format: Format for graph serialization in output
- Visible in: Add Document
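To make the Chunk Size and Chunk Overlap inputs more concrete, here is a minimal sketch of fixed-size chunking with overlap. It assumes both values are measured in characters, which the field descriptions do not state, and the function name `chunk_text` is hypothetical rather than part of the component.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters that share
    chunk_overlap characters with the previous window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)
```

A larger overlap keeps more shared context between neighbouring chunks at the cost of producing more chunks to process.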

Outputs

The component produces a structured result depending on the selected operation. Generally, it outputs a Knowledge Graph object containing nodes (entities) and edges (relationships), or a set of retrieved documents/answers with metadata.

Output Data Example (JSON)

This example shows the structure of the output when performing a Search operation, returning relevant entities and their connections:

    {
      "status": "success",
      "results": [
        {
          "entity": {
            "name": "Acme Corp",
            "type": "Organization",
            "id": "node_acme_123"
          },
          "relationships": [
            {
              "target": "John Smith",
              "type": "employed_by",
              "confidence": 0.95
            }
          ],
          "relevance_score": 0.88
        }
      ],
      "metadata": {
        "total_matches": 1,
        "processing_time_ms": 120
      }
    }
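A downstream step could read this output along the following lines. This is a sketch that assumes the payload has exactly the shape of the example above; the `summarize` helper and its `min_confidence` parameter are hypothetical and not part of the component.

```python
import json

# The example Search output from above, as a JSON string.
raw = """
{
  "status": "success",
  "results": [
    {
      "entity": {"name": "Acme Corp", "type": "Organization", "id": "node_acme_123"},
      "relationships": [
        {"target": "John Smith", "type": "employed_by", "confidence": 0.95}
      ],
      "relevance_score": 0.88
    }
  ],
  "metadata": {"total_matches": 1, "processing_time_ms": 120}
}
"""


def summarize(payload, min_confidence=0.5):
    """Flatten results into (entity, relationship type, target) rows,
    keeping only relationships at or above min_confidence."""
    if payload.get("status") != "success":
        return []
    rows = []
    for result in payload.get("results", []):
        name = result["entity"]["name"]
        for rel in result.get("relationships", []):
            if rel.get("confidence", 0.0) >= min_confidence:
                rows.append((name, rel["type"], rel["target"]))
    return rows


rows = summarize(json.loads(raw))
print(rows)
```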

Connectivity

Typically, this component is connected as follows:

Input: Connects to Document inputs from file upload components, web scraping tools, or previous data processing steps. It also connects to LLM and Embedding model components for AI processing.
Database: Must connect to a configured Database component (PostgreSQL/MySQL) to store the graph data.
Output: Feeds into other AI agents, search interfaces, or visualization components that need to display relationships or answer questions based on the graph.

Usage Example

Scenario: Building a Corporate Knowledge Base

Setup: Connect your company’s HR documents and public filings to the Knowledge Graph Builder.
Operation: Select Add Document.
Configuration:
- Upload PDFs of employee handbooks.
- Set Domain to hr_documents.
- Enable Extract Document Metadata to capture dates and document types.
Result: The system builds a graph linking employees to their roles, departments, and reporting structures.
Search: Later, select the Search operation.
Query: Enter “Who reports to the CFO?”.
Outcome: The graph navigates the relationships and returns the specific employees linked to the CFO via “reports_to” edges.
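The “Who reports to the CFO?” lookup can be sketched as a breadth-first walk over “reports_to” edges, with a `hops` limit playing the role that the Graph Hops input suggests. The names and edges below are made up for illustration, and the component's actual traversal may work differently.

```python
from collections import deque

# Hypothetical (employee, manager) pairs, i.e. "reports_to" edges.
reports_to = [
    ("Alice", "CFO"),
    ("Bob", "CFO"),
    ("Carol", "Alice"),
    ("Dan", "CEO"),
]


def reports_within(edges, manager, hops=1):
    """Return everyone who reaches `manager` in at most `hops` reports_to steps."""
    direct = {}
    for employee, boss in edges:
        direct.setdefault(boss, []).append(employee)

    found = []
    seen = {manager}
    queue = deque([(manager, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == hops:
            continue  # do not expand beyond the hop limit
        for employee in direct.get(node, []):
            if employee not in seen:
                seen.add(employee)
                found.append(employee)
                queue.append((employee, depth + 1))
    return found


direct_reports = reports_within(reports_to, "CFO", hops=1)
two_hops = reports_within(reports_to, "CFO", hops=2)
print(direct_reports, two_hops)
```

With one hop only direct reports are returned; raising the limit to two also pulls in people who report to those direct reports.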

Tips and Best Practices

Use Fast Mode: When ingesting a large volume of documents, turn on Fast Mode in the Add Document operation. It switches to embedding-only entity detection, which the field description lists as 21% faster with the same quality.
Entity Normalization: If the graph contains near-duplicate entity names (e.g., “Inc.” vs “Inc”), turn on Enable Entity Normalization in the Resolve Entities operation so the LLM standardizes entity names and types.
Deep Search for Complex Queries: Use the Deep Search operation when your question requires reasoning across multiple documents or steps, rather than simple keyword matching.
Define Domains Clearly: When working with specialized data (like legal or medical texts), use the Domain and Domain Description inputs to help the AI understand the specific context and jargon.
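To see how normalization and the Similarity Threshold input interact during entity resolution, consider the rough sketch below. It substitutes plain string similarity from Python's `difflib` for whatever LLM or embedding comparison the component actually uses, so treat it as an analogy rather than a description of the implementation. The helper names are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop periods and commas, and collapse whitespace."""
    cleaned = name.lower().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """String similarity in [0.0, 1.0] after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def resolve(names, threshold=0.9):
    """Greedily group names whose similarity to a group's first member
    meets the threshold. Higher thresholds merge fewer names."""
    groups = []
    for name in names:
        for group in groups:
            if similarity(name, group[0]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])  # no close match, so start a new group
    return groups


groups = resolve(["Acme Inc.", "Acme Inc", "ACME, Inc.", "Apex Ltd"], threshold=0.9)
print(groups)
```

Lowering the threshold merges more aggressively and raises the risk of joining entities that are genuinely different, so it is worth checking the merged groups on a sample before applying a loose setting to a whole graph.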

Important Notes

This component is currently in development. Features and interfaces may change as the system evolves.
