Deeplake Database
Deeplake Database is a component in Nappai that connects to a DeepLake dataset. It lets you add documents, search for similar ones, or create a retriever that can be used later in your workflow. All the heavy lifting—turning text into vectors, storing them, and finding the closest matches—is handled by DeepLake, so you can focus on building your automation logic.
How it Works
When you use Deeplake Database, Nappai talks to the DeepLake API.
- Add – You give it a set of documents and an embedding model. The component converts each document into a vector and uploads those vectors to the specified DeepLake dataset.
- Search – You provide a query text. The component turns that query into a vector and asks DeepLake for the most similar vectors in the dataset. The matching documents are returned.
- Retriever – The component builds a retriever object that can be reused later in the workflow to fetch similar documents on demand.
All communication with DeepLake is authenticated with a DeepLake API credential that you set up in Nappai’s credentials section. The component itself only needs the dataset path and the embedding model; the credential is selected automatically.
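Conceptually, the add and search flows above boil down to embedding text, storing the vectors, and ranking stored vectors by similarity to the query. A minimal, self-contained sketch with toy embeddings (this is an illustration of the idea, not the real DeepLake or Nappai API):

```python
# Toy illustration of the add/search flow: documents are embedded into
# vectors, stored, and ranked by cosine similarity at query time.
import math

def embed(text: str) -> list[float]:
    # Hypothetical stand-in for a real embedding model: vowel counts
    # make a deterministic toy vector.
    return [float(text.count(c)) for c in "aeiou"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

# The "dataset": (document, vector) pairs kept in memory.
store: list[tuple[str, list[float]]] = []

def add(docs: list[str]) -> None:
    store.extend((d, embed(d)) for d in docs)

def search(query: str, k: int = 4) -> list[str]:
    qv = embed(query)
    ranked = sorted(store, key=lambda pair: cosine(qv, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

add(["customer churn analysis", "quarterly revenue report", "employee onboarding guide"])
print(search("churn analysis", k=1))  # → ['customer churn analysis']
```

In the real component, DeepLake performs the storage and nearest-neighbor ranking at scale; the structure of the calls is the same.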
Operations
This component offers several operations that you can select based on what you need to do. You can only use one operation at a time:
- Add: Store new documents in the DeepLake dataset.
- Search: Find documents that are most similar to a given query.
- Retriever: Create a reusable retriever object for later use.
To use the component, first select the operation you need in the “Operation” field.
Inputs
- Embedding: The embedding model that turns text into vectors.
  - Visible in: Add, Search, Retriever
- Ingest Data: The documents you want to add to the vector store.
  - Visible in: Add
- Operation: Choose which operation to perform.
  - Visible in: Add, Search, Retriever
- Dataset path: The path or URL to your DeepLake dataset.
  - Visible in: Add, Search, Retriever
- Number of Results: How many similar documents to return when searching.
  - Visible in: Add, Search, Retriever
- Search Query: The text you want to search for. Leave empty to retrieve all documents.
  - Visible in: Search
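The visibility rules above amount to a simple per-operation lookup. A small sketch (field and operation names are taken from the list above; the function name is hypothetical, not part of Nappai's API):

```python
# Which input fields are shown for each operation, per the list above.
VISIBLE_FIELDS = {
    "Add": {"Embedding", "Ingest Data", "Operation", "Dataset path", "Number of Results"},
    "Search": {"Embedding", "Operation", "Dataset path", "Number of Results", "Search Query"},
    "Retriever": {"Embedding", "Operation", "Dataset path", "Number of Results"},
}

def is_visible(field: str, operation: str) -> bool:
    return field in VISIBLE_FIELDS.get(operation, set())

print(is_visible("Search Query", "Search"))   # True: only the Search operation shows it
print(is_visible("Ingest Data", "Retriever")) # False: documents are only ingested on Add
```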
Important: This component requires a DeepLake API credential.
- First, configure the credential in Nappai’s credentials section.
- Then, select that credential in the component’s “Credential” field.
The credential fields (Username, DeepLake API Token) are not shown in the input list.
Outputs
- Retriever: A retriever object that can be used later to fetch similar documents.
- Results: The list of documents returned by a search operation.
- Vector Store: The underlying DeepLake vector store object.
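The Retriever output is essentially a reusable handle over the vector store: given a query, it returns the top matches on demand. A toy sketch (the class and search function are illustrative; `get_relevant_documents` follows a common retriever convention and is not necessarily Nappai's exact interface):

```python
from typing import Callable

class Retriever:
    """Wraps a search function with a fixed result count so later
    workflow steps can fetch similar documents on demand."""
    def __init__(self, search_fn: Callable[[str, int], list[str]], k: int = 4):
        self.search_fn = search_fn
        self.k = k

    def get_relevant_documents(self, query: str) -> list[str]:
        return self.search_fn(query, self.k)

# Hypothetical in-memory corpus with a word-overlap ranking as a
# stand-in for real vector search.
corpus = ["alpha report", "beta notes", "alpha summary"]

def naive_search(query: str, k: int) -> list[str]:
    qwords = set(query.split())
    return sorted(corpus, key=lambda d: -len(qwords & set(d.split())))[:k]

retriever = Retriever(naive_search, k=2)
print(retriever.get_relevant_documents("alpha"))  # → ['alpha report', 'alpha summary']
```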
Usage Example
Adding Documents
- Set Operation to Add.
- Choose an Embedding model (e.g., OpenAIEmbeddings).
- Provide a list of documents in Ingest Data.
- Enter the Dataset path (e.g., https://deeplake.com/my_dataset).
- Click Run.
The component will upload the documents and return the updated Vector Store.
Searching for Similar Documents
- Set Operation to Search.
- Choose the same Embedding model used when adding.
- Enter the Dataset path.
- Set Number of Results (e.g., 4).
- Type a Search Query (e.g., “customer churn analysis”).
- Click Run.
The component returns the most similar documents in Results.
Related Components
- OpenAI Vector Store – Stores embeddings using OpenAI’s vector store backend.
- FAISS Vector Store – Uses the FAISS library for fast similarity search.
- Chroma Vector Store – A lightweight vector store that runs locally.
Tips and Best Practices
- Use the same embedding model for both adding and searching to ensure consistent vector space.
- Keep the Number of Results reasonable (e.g., 4–10) to avoid overwhelming downstream components.
- Store your dataset in a location that is accessible from all nodes in your Nappai deployment.
- Regularly back up your DeepLake dataset to prevent accidental data loss.
Security Considerations
- The DeepLake API credential contains sensitive tokens. Store it in Nappai’s secure credential store and never expose it in logs or UI.
- Ensure that the dataset path points to a private or properly secured DeepLake instance to prevent unauthorized access.
- When sharing workflows, avoid including the credential name in public documentation; instead, instruct users to set up their own credentials.