Elasticsearch
The Elasticsearch component stores and retrieves your documents using vector search. In simple terms, it lets you save chunks of text in an Elasticsearch database and then find the most relevant information based on semantic meaning (what the text means) rather than just matching keywords.
This component is ideal for building AI-powered knowledge bases, searching through large document collections, or connecting your data to an AI assistant. It supports three main ways of interacting with your data: adding documents, searching them, or using them as a source for an AI agent.
How it Works
Internally, this component connects to an Elasticsearch cluster (a database system designed for speed and scalability). Here is the simplified process:
- Connection: It connects to your Elasticsearch server using credentials (like an API Key) that you provide.
- Storage (Indexing): When you add documents, it converts the text into numerical representations (vectors) using an “Embedding” model. It then stores these vectors in a specific “Index” (comparable to a folder or table) within Elasticsearch.
- Search: When you ask a question or search, it converts your query into a vector and finds the documents in the database that are mathematically closest to it. It can also filter these results based on specific metadata (like date, author, or category).
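The storage and search steps above can be sketched in plain Python. This is a minimal in-memory illustration under stated assumptions, not the component's implementation: `toy_embed` is a hypothetical stand-in for a real embedding model (it only counts a few keywords), and the `index` list stands in for an Elasticsearch index.

```python
import math

# Hypothetical stand-in for an embedding model: counts a few fixed keywords.
# A real embedding model would return dense vectors that capture meaning.
VOCAB = ["invoice", "payment", "holiday", "vacation"]

def toy_embed(text):
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def add(index, text, metadata):
    # Storage (indexing): keep the text, its vector, and its metadata together.
    index.append({"page_content": text, "vector": toy_embed(text), "metadata": metadata})

def search(index, query, k=2, where=None):
    # Search: embed the query, optionally filter on metadata, rank by similarity.
    q = toy_embed(query)
    candidates = [
        d for d in index
        if where is None or all(d["metadata"].get(key) == val for key, val in where.items())
    ]
    ranked = sorted(candidates, key=lambda d: cosine(q, d["vector"]), reverse=True)
    return ranked[:k]

index = []
add(index, "invoice payment due next week", {"category": "finance"})
add(index, "holiday vacation policy update", {"category": "hr"})
add(index, "payment reminder for invoice", {"category": "finance"})

top = search(index, "vacation holiday", k=1)
finance_only = search(index, "vacation holiday", k=5, where={"category": "finance"})
```

Here `top` holds the HR policy document because its vector points in the same direction as the query vector, while `finance_only` shows the metadata filter restricting the candidates before ranking.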
Note on Connections: You can connect to a local Elasticsearch server on your computer or a cloud-hosted Elasticsearch service (like Elastic Cloud). You must choose one method: either a specific URL for local servers or a Cloud ID for cloud services. You cannot use both at the same time.
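The "one method only" rule can be expressed as a small validation step. The sketch below is illustrative: the function name is hypothetical, and whether the component builds its connection this way is an assumption. The returned keys (`hosts`, `cloud_id`, `api_key`) mirror constructor parameters of the official `elasticsearch` Python client.

```python
def build_connection_kwargs(url=None, cloud_id=None, api_key=None):
    """Return keyword arguments for exactly one connection method."""
    # Reject both "URL and Cloud ID" and "neither of them".
    if bool(url) == bool(cloud_id):
        raise ValueError("Provide either an Elasticsearch URL or a Cloud ID, but not both.")
    kwargs = {"cloud_id": cloud_id} if cloud_id else {"hosts": [url]}
    if api_key:
        kwargs["api_key"] = api_key
    return kwargs

# Local server: URL only. The address below is an example value.
local = build_connection_kwargs(url="http://localhost:9200")
```

With the client installed, the result could be passed on as `Elasticsearch(**local)`.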
Connection & Credentials
This component requires configuring a credential in the Nappai panel before interacting with the external service:
- Go to the Credentials section in your Nappai panel.
- Create a new credential of the type Elasticsearch and fill in the required fields (API Key, Username, Password, or Base URL).
- In your workflow, select the saved credential in the Credential input field of this node.
Tip: For cloud deployments, it is recommended to use an API Key for better security and simplicity.
Operations
This component offers three operations, which you select based on what you need to do. You can use only one operation at a time:
- Add: Use this operation to ingest new documents into your Elasticsearch index. This is typically used when building a knowledge base.
- Search: Use this operation to query your stored documents. You provide a text query, and it returns the most relevant document chunks.
- Retriever: Use this operation when you want an AI agent to automatically search for relevant documents to answer a user’s question. This connects the search functionality directly to an AI model.
To use the component, first select the operation you need in the “Operation” field.
Inputs
Mapping Mode
This component has a special mode called “Mapping Mode”. When you enable this mode using the toggle switch, an additional input called “Mapping Data” is activated, and each input field offers you three different ways to provide data:
- Fixed: You type the value directly into the field.
- Mapped: You connect the output of another component to use its result as the value.
- Javascript: You write Javascript code to dynamically calculate the value.
This flexibility allows you to create more dynamic and connected workflows.
Input Fields
The following fields are available to configure this component. Each field may be visible in different operations:
- Embedding: The AI model responsible for converting text into vectors. Essential for both storing and searching.
- Visible in: Add, Search, Retriever
- Ingest Data: The actual text or document content you want to save.
- Visible in: Add
- Operation: Selects which action the component performs (Add, Search, or Retriever).
- Visible in: Add, Search, Retriever
- Elastic Cloud ID: Use this for Elastic Cloud deployments. Do not use together with ‘Elasticsearch URL’.
- Visible in: Add, Search, Retriever
- Index Name: The name of the index in the Elasticsearch cluster where the vectors will be stored.
- Visible in: Add, Search, Retriever
- Number of Results: Number of results to return.
- Visible in: Add, Search, Retriever
- Search Query: Enter a search query. Leave empty to retrieve all documents.
- Visible in: Search
- Search Score Threshold: Minimum similarity score threshold for search results.
- Visible in: Add, Search, Retriever
- Search Type: Defines how the search is performed (e.g., basic similarity or diverse results).
- Visible in: Add, Search, Retriever
Outputs
This component produces data based on the selected operation:
- Retriever: A configured search tool that can be connected to an AI Agent to fetch relevant context for answering questions.
- Results: A list of document chunks found during a search operation, including the text, metadata, and relevance score.
- Vector Store: The internal storage object, which can be used by other advanced components in the workflow.
Output Data Example (JSON)
When using the Search operation, the output typically looks like this:

```json
{
  "page_content": "This is the text content of the first document chunk found.",
  "metadata": {
    "source": "document.pdf",
    "author": "Jane Doe",
    "date": "2023-10-01"
  },
  "score": 0.95
}
```
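Downstream steps often need to parse and filter this output. The sketch below assumes the Search operation yields a list of objects shaped like the example above. The second record and the `min_score` cut-off are made up for illustration.

```python
import json

# First record matches the documented example; the second is invented.
raw = """
[
  {"page_content": "This is the text content of the first document chunk found.",
   "metadata": {"source": "document.pdf", "author": "Jane Doe", "date": "2023-10-01"},
   "score": 0.95},
  {"page_content": "A loosely related chunk.",
   "metadata": {"source": "notes.txt"},
   "score": 0.41}
]
"""

def filter_results(results, min_score):
    """Keep results at or above min_score, best first."""
    kept = [r for r in results if r["score"] >= min_score]
    return sorted(kept, key=lambda r: r["score"], reverse=True)

results = json.loads(raw)
relevant = filter_results(results, min_score=0.5)
sources = [r["metadata"].get("source", "unknown") for r in relevant]
```

This mirrors what the Search Score Threshold input does conceptually: results below the cut-off are dropped before they reach the next step.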
Connectivity
This component is typically used in the following ways within a Nappai workflow:
- For Knowledge Bases: Connect the output of a “Text Splitter” or “File Reader” to the Ingest Data input of this component (Operation: Add), and connect an embedding model to the Embedding input, to store documents.
- For AI Q&A: Connect this component (Operation: Retriever) to an AI Agent or Chat Model. The AI will use this component to find relevant information before answering a user’s prompt.
- For Manual Search: Connect a Text input or Prompt to the Search Query input (Operation: Search) to manually find specific documents.
Usage Example
Scenario: Building a Document Search Engine
- Store Documents:
  - Set Operation to Add.
  - Connect your file reader component to Ingest Data.
  - Set Index Name to my_company_docs.
  - Provide an Embedding model.
  - Click “Build” or “Run” to save documents to Elasticsearch.
- Search Documents:
  - Set Operation to Search.
  - In the Search Query field, type “Project Timeline” or connect it to a user’s chat message.
  - Adjust Number of Results to 5.
  - Run the component to get the most relevant document chunks related to “Project Timeline”.
- AI Assistant:
  - Set Operation to Retriever.
  - Connect this component to an LLM Chain. When a user asks a question, the LLM Chain will automatically trigger this component to find the best answer from your stored documents.
Important Notes
🔒 Secure Credential Storage: Store credentials (username, password, API key) in secure secrets or environment variables rather than hard-coding them in scripts.
⚠️ Cloud ID and URL Conflict: You cannot set both the Elastic Cloud ID and a local Elasticsearch URL at the same time. Choose the correct connection method for your deployment.
⚠️ No Incremental Updates: Documents are added to the store only during the initial build. Subsequent changes to the input data will not be reflected unless the component is rebuilt.
⚠️ Development-Stage Component: The component is marked as in development and may contain bugs or untested features. Use with caution in production environments.
📋 Elasticsearch Cluster Availability: An Elasticsearch cluster must be running and reachable at the provided URL or Cloud ID before using this component.
📋 Vector Field Configuration: The target index must contain a properly configured vector field to store embeddings. Check the index mapping before ingestion.
💡 Use API Key for Elastic Cloud: When connecting to Elastic Cloud, provide the API Key and omit username/password. This is more secure and simplifies authentication.
💡 Adjust Score Threshold Carefully: Keep the search score threshold low to capture relevant results. Increase it gradually if you need stricter filtering.
💡 Choose Search Type Wisely: Use “similarity” for standard nearest-neighbor queries. Use “mmr” to diversify results when relevance and variety are both important.
ℹ️ Document Ingestion Timing: Documents are added to the store during the build step. Changing the input data after the build requires rebuilding the component.
ℹ️ Search Result Structure: The search output includes page_content, metadata, and a similarity score for each document, facilitating easy consumption by downstream processes.
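To make the difference between the “similarity” and “mmr” search types concrete, here is a minimal sketch of maximal marginal relevance (MMR) re-ranking over toy 2-D vectors. It illustrates the general technique, not the component's internal code. The vectors, the `lambda_mult` weight, and the function names are assumptions chosen for the example.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def similarity_top_k(query_vec, doc_vecs, k):
    """Plain nearest-neighbor ranking: the k most similar documents."""
    order = sorted(range(len(doc_vecs)),
                   key=lambda i: cosine(query_vec, doc_vecs[i]), reverse=True)
    return order[:k]

def mmr(query_vec, doc_vecs, k, lambda_mult=0.5):
    """Pick k documents, trading relevance against similarity to earlier picks."""
    selected = []
    remaining = list(range(len(doc_vecs)))
    while remaining and len(selected) < k:
        def mmr_score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy: highest similarity to any already selected document.
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                             default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected

query = [1.0, 0.3]
docs = [
    [1.0, 0.0],   # doc 0: close to the query
    [1.0, 0.05],  # doc 1: near-duplicate of doc 0, slightly closer to the query
    [0.7, 0.7],   # doc 2: less similar, but covers a different direction
]

plain = similarity_top_k(query, docs, k=2)
diverse = mmr(query, docs, k=2, lambda_mult=0.5)
```

Plain similarity returns the two near-duplicates (documents 1 and 0), whereas MMR keeps document 1 and then prefers document 2, because document 0 adds almost nothing that document 1 does not already cover.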
Tips and Best Practices
- Use Unique Index Names: Avoid using the default “langflow” index name for all projects. Use unique names (e.g., project_a_docs, project_b_docs) to prevent data mixing.
- Limit Results: Keep the “Number of Results” reasonable (e.g., 4-10) to avoid overwhelming the AI or slowing down the response.
- Check Connectivity: Ensure your network allows outbound traffic to the Elasticsearch host, as firewalls may block access.
- Validate Embeddings: Ensure the embedding model you choose produces vectors compatible with your Elasticsearch index settings.
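One way to validate embeddings before ingestion is to compare the vector length against the `dims` value in the index mapping. The sketch below assumes the index stores embeddings in a `dense_vector` field named `vector` with 384 dimensions. The field names and the dimension are illustrative, so check your own mapping.

```python
# Example mapping for an index that stores embeddings. "dense_vector", "dims",
# "index", and "similarity" are standard Elasticsearch mapping parameters; the
# field names "text" and "vector" and the 384 dimensions are assumptions.
index_mapping = {
    "mappings": {
        "properties": {
            "text": {"type": "text"},
            "vector": {
                "type": "dense_vector",
                "dims": 384,
                "index": True,
                "similarity": "cosine",
            },
        }
    }
}

def check_embedding(embedding, mapping, field="vector"):
    """Raise ValueError if the embedding cannot be stored in the mapped field."""
    spec = mapping["mappings"]["properties"].get(field)
    if spec is None or spec.get("type") != "dense_vector":
        raise ValueError(f"Field {field!r} is not mapped as dense_vector")
    if len(embedding) != spec["dims"]:
        raise ValueError(
            f"Embedding has {len(embedding)} dimensions, index expects {spec['dims']}"
        )
    return True

ok = check_embedding([0.0] * 384, index_mapping)
```

Running a check like this on one sample vector before a large ingestion run catches a model/index mismatch early, for example after switching to an embedding model with a different output size.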
Security Considerations
- Prefer API Keys over usernames and passwords for authentication in production environments; API Keys can be scoped more narrowly and rotated more easily.
- Never hardcode sensitive information like API keys directly into your workflow code; always use the Credential input field.
- If using self-managed Elasticsearch, ensure SSL certificates are verified to protect data in transit.