DeepLake Reader

The DeepLake Reader is a bridge between your Nappai automation workflows and data stored in Activeloop DeepLake. It allows you to pull information directly from your datasets, either by grabbing a set list of records or by asking a specific question to find the most relevant answers.

Think of this component as a “library assistant” for your data. You can tell it to “bring me the first 10 books” (retrieve records) or “find me books about AI” (run a semantic query). It handles the connection to DeepLake, ensuring you get the right data for your downstream tasks.

How it Works

This component connects to the DeepLake platform using your specific username and API token. Once connected, it operates in two main ways depending on how you set it up:

  1. Direct Retrieval: You can ask it to fetch a specific number of records from your dataset. This is like opening a folder and taking the first few files. It returns a list of data objects that you can then use in other parts of your workflow.
  2. Semantic Search (Querying): If you connect an Embedding model, you can ask a natural language question (e.g., “What are the customer complaints from last week?”). The component uses artificial intelligence to understand the meaning of your question and find the data records that best match it, rather than just looking for exact keyword matches.

The component caps the number of records it fetches (the Max Records input, default 10) so that large datasets do not slow down your workflow.
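To make the idea of "matching by meaning" concrete, here is a minimal sketch of similarity ranking. It is an illustration only, not the component's internal code: the function names, the toy 3-number vectors, and the record layout are all assumptions made for the example. A real Embedding model produces much longer vectors, and DeepLake performs the search for you.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_matches(query_vector, records, max_records=10):
    """Return up to max_records records, most similar to the query first."""
    ranked = sorted(
        records,
        key=lambda rec: cosine_similarity(query_vector, rec["vector"]),
        reverse=True,
    )
    return ranked[:max_records]

# Toy vectors (hypothetical); an embedding model would generate these.
records = [
    {"id": "rec_1", "text": "Login failure", "vector": [0.9, 0.1, 0.0]},
    {"id": "rec_2", "text": "Billing error", "vector": [0.1, 0.9, 0.1]},
    {"id": "rec_3", "text": "Invoice charged twice", "vector": [0.2, 0.8, 0.3]},
]
query_vector = [0.0, 1.0, 0.2]  # stand-in for an embedded billing question

best = top_matches(query_vector, records, max_records=2)
print([rec["id"] for rec in best])  # the two billing-related records
```

Note that "Invoice charged twice" ranks highly even though it shares no keyword with "billing", which is the practical difference between semantic search and exact keyword matching.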

Connection & Credentials

To access your DeepLake data securely, this component requires a pre-configured credential. You cannot enter your username and API token directly into the component fields; instead, you must set them up in the Nappai credential manager first.

  1. Go to the Credentials section in your Nappai panel.
  2. Create a new credential of the type DeepLake API.
  3. Fill in the required fields:
    • Username: Your DeepLake username.
    • DeepLake API Token: Your secure API token (format: <DEEPLAKE_API_TOKEN>).
  4. Save the credential.
  5. In your workflow, click the Credential input field on the DeepLake Reader node and select the credential you just created.

Operations

This component does not have selectable operations. It dynamically decides how to fetch data based on the inputs you provide (specifically, whether you provide a Query and Embedding or just a Dataset path).

  • Records Mode: Triggered when you provide a Dataset path (or a Vector Store in its place) and no Query. It fetches raw data records.
  • Query Mode: Triggered when you also provide a Query string and an Embedding. It performs a semantic search and returns the records that best match your question.
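The rule above can be sketched as a small decision function. This is a hypothetical helper written for illustration; `select_mode` is not part of Nappai or DeepLake. It also assumes that a Query supplied without an Embedding falls back to Records Mode, which this page does not state explicitly.

```python
from typing import Optional

def select_mode(query: Optional[str], embedding: Optional[object]) -> str:
    """Hypothetical helper: pick a mode from the inputs that were provided."""
    # Query Mode needs both a non-empty query and an embedding model.
    if query and embedding is not None:
        return "query"
    # Assumption: anything else is treated as plain record retrieval.
    return "records"

print(select_mode(None, None))                 # records
print(select_mode("billing errors", object())) # query
```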

Inputs

The following fields are available to configure this component.

  • Vector Store: Reference to an existing Vector Store instance. Use this if you want to read from a specific stored vector index rather than the raw dataset path.

    • Visible in: Records, Query Answer
  • Dataset path: The URL or path to your DeepLake dataset. This is the primary location where the component looks for your data.

    • Visible in: Records, Query Answer
  • Max Records: An integer value that sets the maximum number of records the component will return. This helps control performance and memory usage. Default is 10.

    • Visible in: Records, Query Answer
  • Query: A text string representing a question or search term. When used with an Embedding, this allows for semantic search (finding answers by meaning, not just keywords).

    • Visible in: Query Answer
  • Embedding: A connection to an Embedding model. This is required if you want to perform semantic search (Query Answer). It converts your text question into a numeric vector so that records with a similar meaning can be found.

    • Visible in: Query Answer
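If you prepare these settings in a script before entering them, a small container with basic checks can catch mistakes early. The sketch below is an assumption-laden illustration: `ReaderInputs` and its field names are hypothetical and only mirror the fields listed above, and the validation rules are reasonable guesses rather than the component's documented behavior.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class ReaderInputs:
    """Hypothetical container mirroring the input fields listed above."""
    dataset_path: Optional[str] = None
    vector_store: Optional[Any] = None
    max_records: int = 10            # documented default
    query: Optional[str] = None
    embedding: Optional[Any] = None

    def validate(self) -> None:
        # Assumption: the reader needs at least one data location.
        if not self.dataset_path and self.vector_store is None:
            raise ValueError("Provide a Dataset path or a Vector Store.")
        # Assumption: Max Records must be a positive whole number.
        if not isinstance(self.max_records, int) or self.max_records < 1:
            raise ValueError("Max Records must be a positive integer.")

inputs = ReaderInputs(dataset_path="app://my-dataset", max_records=5,
                      query="billing errors")
inputs.validate()  # passes silently for a valid configuration
```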

Outputs

The component produces data that can be connected to other nodes in your workflow.

  • Records: A list of data objects extracted directly from the dataset. This output contains the raw fields of the records you requested. Use this for basic data retrieval tasks.
  • Query Answer: A list of data objects that are the best matches for your specific question. This output is ideal if you used the Query and Embedding inputs to find specific insights from your data.

Output Data Example (JSON)

    [
      {
        "id": "rec_12345",
        "customer_name": "John Doe",
        "issue": "Login failure",
        "severity": "High",
        "date": "2023-10-01"
      },
      {
        "id": "rec_12346",
        "customer_name": "Jane Smith",
        "issue": "Billing error",
        "severity": "Medium",
        "date": "2023-10-02"
      }
    ]

Note: The actual fields will depend on the structure of your DeepLake dataset.
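If you process this output in a custom script, it can be handled like any list of dictionaries. The sketch below reuses the example records shown here; the field names (`severity`, `issue`) come from that example and will differ for your own dataset.

```python
import json

# The example output from this page, as a JSON string.
raw_output = """
[
  {"id": "rec_12345", "customer_name": "John Doe", "issue": "Login failure",
   "severity": "High", "date": "2023-10-01"},
  {"id": "rec_12346", "customer_name": "Jane Smith", "issue": "Billing error",
   "severity": "Medium", "date": "2023-10-02"}
]
"""

records = json.loads(raw_output)

# Keep only high-severity records; .get() avoids a KeyError if a field is missing.
high_severity = [rec for rec in records if rec.get("severity") == "High"]

# Collect one field from every record.
issues = [rec.get("issue", "") for rec in records]

print(high_severity[0]["id"])
print(issues)
```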

Connectivity

  • Connect FROM: This component is typically connected from another component that provides data or configuration, such as a DeepLake Vector Store component (for the Vector Store input) or an Embedding model component (for the Embedding input). It acts as a data source node.
  • Connect TO: The outputs (Records or Query Answer) should be connected to components that process data, such as:
    • LLM (Large Language Model): To analyze the retrieved records or generate a summary.
    • Data Processor / Transformer: To clean or format the data for reporting.
    • Output / Writer: To save the retrieved data to another system.

Usage Example

Scenario: You want to find recent customer support tickets related to “billing errors” in your DeepLake dataset.

  1. Configure Credential: Set up your DeepLake API credential in Nappai.
  2. Setup DeepLake Reader:
    • Dataset path: Enter your dataset URL (e.g., app://my-dataset).
    • Credential: Select your saved DeepLake API credential.
    • Max Records: Set to 5 to limit the results to the 5 most relevant items.
    • Query: Enter billing errors.
    • Embedding: Connect this input to your Embedding model (e.g., Embeddings).
    • Vector Store: Leave blank if using the dataset path directly.
  3. Connect Output: Connect the Query Answer output to an LLM node.
  4. LLM Prompt: Give the LLM a prompt that includes the retrieved records, for example: “Here are the records related to billing errors: {Query Answer}. Please summarize the common issues.”
  5. Result: The LLM will generate a summary of the billing issues based on the retrieved data.
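For readers who build prompts in code rather than in the node editor, step 4 could look roughly like the sketch below. The template text comes from the example above; `build_prompt` and the sample record are hypothetical, and Nappai may substitute the Query Answer value in its own way.

```python
import json

PROMPT_TEMPLATE = (
    "Here are the records related to billing errors: {records}. "
    "Please summarize the common issues."
)

def build_prompt(records):
    """Serialize the retrieved records and place them in the prompt template."""
    return PROMPT_TEMPLATE.format(records=json.dumps(records, ensure_ascii=False))

# Hypothetical Query Answer output with a single matching record.
query_answer = [
    {"id": "rec_12346", "issue": "Billing error", "severity": "Medium"},
]

prompt = build_prompt(query_answer)
print(prompt)
```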

Important Notes

🔒 Protect Your API Token: Do not expose the DeepLake API token in shared flows or logs. Treat it like a password and keep it confidential.

⚠️ Requires DeepLake Credentials: A valid DeepLake username and API token must be supplied. Without them, the component cannot access the dataset and will fail.

⚠️ Maximum Record Count: The component returns at most the number of records specified in Max Records (default 10). Results from larger datasets are truncated, so raise this value if you need more records.
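The effect of the cap can be pictured with a short sketch. This only illustrates truncation in general; `take_records` is a hypothetical helper and does not show how the component reads from DeepLake.

```python
from itertools import islice

def take_records(records, max_records=10):
    """Return at most max_records items without reading the rest of the source."""
    return list(islice(records, max_records))

# A stand-in for a large dataset: 1000 records produced lazily.
dataset = ({"id": f"rec_{i}"} for i in range(1000))

first = take_records(dataset)  # uses the default cap of 10
print(len(first))
print(first[-1]["id"])
```

Everything past the cap is simply never returned, so a result list whose length equals Max Records is a hint that more matching records may exist.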

⚠️ DeepLake-Only Data Source: This component reads only from a DeepLake dataset. It cannot process local files or other storage types, so ensure your data is hosted on DeepLake.

📋 DeepLake Library Installation: Before using the component, install the DeepLake Python library and its dependencies. This is required for the component to communicate with the dataset.

💡 Use Specific Queries for Faster Results: Provide a focused query string to retrieve only relevant records. Broad or empty queries match many rows, which slows down the process and dilutes the results (the output is still capped by Max Records).

💡 Provide Embeddings for Semantic Search: If you supply an Embedding input, the component will use semantic similarity to answer queries. Without embeddings, queries rely on simple keyword matching.

📋 Correct Dataset Path: Provide the exact DeepLake dataset URL or local path. A wrong path will result in a fetch error and no records returned.

⚙️ vector_store Input Usage: If you pass a vector_store, the component uses it instead of the dataset path. Ensure the vector_store is a valid VectorStore output from another component.

ℹ️ Records Returned as Data Objects: The component outputs a list of Data objects, each containing the record fields and metadata. Map these fields appropriately when consuming the output.

ℹ️ Runtime Errors Are Propagated: If the component cannot fetch records, it raises a RuntimeError with a clear message. Check the error details to troubleshoot issues.
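If you call the reader from custom code, the error can be caught and reported instead of stopping the whole run. In this sketch `fetch_records` is a stand-in that always fails so the example is self-contained; the real fetch step and its exact error message are not shown on this page.

```python
def fetch_records(dataset_path):
    """Stand-in for the component's fetch step; it always fails in this sketch."""
    raise RuntimeError(f"Could not fetch records from {dataset_path!r}")

def safe_fetch(dataset_path):
    """Return (records, error_message); error_message is None on success."""
    try:
        return fetch_records(dataset_path), None
    except RuntimeError as exc:
        # Keep the original message so the cause can be investigated.
        return [], str(exc)

records, error = safe_fetch("app://missing-dataset")
if error:
    print(f"DeepLake Reader failed: {error}")
```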

⚙️ Component in Development: The component is marked as in development, which may mean occasional bugs or changes. Use it with caution in production workflows.

Tips and Best Practices

  • Always start with a low Max Records value (e.g., 5-10) to test your query and ensure it returns the correct data type before increasing the limit for production.
  • Ensure your DeepLake dataset is properly indexed if you plan to use the Vector Store input for faster retrieval.
  • Use clear, specific language in your Query field to get the most accurate semantic search results.

Security Considerations

  • API Token Security: The component relies on a DeepLake API Token. Never hardcode this token in your workflow code. Always use the Nappai Credential system to inject it securely.
  • HTTPS Enforcement: Ensure your connection to DeepLake uses HTTPS to protect credentials and data in transit. The component expects a secure connection.
  • Credential Isolation: Since this component accesses sensitive data, ensure that workflows containing the DeepLake Reader are only accessible to authorized users.
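Inside Nappai, the credential manager handles the token for you. If you also touch the token in scripts outside Nappai, the same principles can be sketched as follows. Both helpers are hypothetical, and the environment variable name `DEEPLAKE_API_TOKEN` is an assumption chosen to match the placeholder used on this page.

```python
import os

def load_token(env_var="DEEPLAKE_API_TOKEN"):
    """Read the token from the environment instead of hardcoding it."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Environment variable {env_var} is not set.")
    return token

def mask_token(token, visible=4):
    """Hide all but the last few characters so the token is safe to log."""
    if len(token) <= visible:
        return "*" * len(token)
    return "*" * (len(token) - visible) + token[-visible:]

# Example with a made-up value; never print a real token in full.
print(mask_token("abcd1234efgh"))
```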