
DeepLake Writer

The DeepLake Writer is a tool designed to help you manage your data collections in Activeloop DeepLake directly from your Nappai automation dashboard. Think of it as a digital librarian for your vector storage: it allows you to set up new data structures, remove old ones, or add and update information stored within them. It acts as a bridge, ensuring that the data you generate in your AI workflows is correctly saved and indexed in DeepLake.

How it Works

This component connects your Nappai workflow to the DeepLake storage service. When you run this component, it performs a specific action based on the settings you choose.

  1. Authentication: First, it uses your saved credentials (Username and API Token) to securely connect to your DeepLake account.
  2. Action Selection: You tell it what to do using the “Action to Dataset” menu. You can choose to Create a new dataset, Delete an existing one, or Get (interact with) an existing dataset.
  3. Data Handling:
    • If you choose CREATE, it sets up a new dataset at the specified path and stores any data you provide.
    • If you choose DELETE, it permanently removes the dataset at that path.
    • If you choose GET, it uses the “Action to Data” setting to either Add new records to an existing dataset or Update existing records.
  4. Result: Once finished, it provides a simple success or error message so you know if the operation worked.
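The decision logic in the steps above can be sketched in Python. This is a minimal model under stated assumptions, not the component's actual implementation:

- An in-memory dictionary named STORE stands in for DeepLake, and no real DeepLake API is called.
- Authentication is omitted.
- The "error" status value is assumed, since the documented example shows only "success".
- UPDATE DATASET is assumed to match records on a hypothetical "id" field.

```python
from typing import Optional

# Hypothetical in-memory stand-in for DeepLake: dataset path -> list of records.
# No real DeepLake calls are made; this only models the component's decision logic.
STORE: dict[str, list[dict]] = {}


def run_writer(action_to_dataset: str,
               dataset_path: str,
               data: Optional[list[dict]] = None,
               action_to_data: Optional[str] = None) -> dict:
    """Dispatch on "Action to Dataset" (and on "Action to Data" for GET)."""
    data = data or []
    try:
        if action_to_dataset == "CREATE":
            # Assumption: creating over an existing path is treated as an error.
            if dataset_path in STORE:
                raise ValueError(f"Dataset '{dataset_path}' already exists.")
            STORE[dataset_path] = list(data)  # initial data is optional
            return {"message": f"Dataset '{dataset_path}' created successfully.",
                    "status": "success"}

        if action_to_dataset == "DELETE":
            if dataset_path not in STORE:
                raise ValueError(f"Dataset '{dataset_path}' not found.")
            del STORE[dataset_path]  # irreversible, like the real DELETE
            return {"message": f"Dataset '{dataset_path}' deleted successfully.",
                    "status": "success"}

        if action_to_dataset == "GET":
            if dataset_path not in STORE:
                raise ValueError(f"Dataset '{dataset_path}' not found.")
            records = STORE[dataset_path]
            if action_to_data == "ADD TO DATASET":
                records.extend(data)  # append new records
            elif action_to_data == "UPDATE DATASET":
                # Assumption: records are matched on a hypothetical "id" field.
                index_by_id = {r.get("id"): i for i, r in enumerate(records)}
                missing = [item.get("id") for item in data
                           if item.get("id") not in index_by_id]
                if missing:
                    raise ValueError(f"No records to update for ids {missing}.")
                for item in data:
                    records[index_by_id[item["id"]]] = item  # replace matched record
            else:
                raise ValueError(
                    "GET requires Action to Data: ADD TO DATASET or UPDATE DATASET.")
            return {"message": f"Dataset '{dataset_path}' updated successfully.",
                    "status": "success"}

        raise ValueError(f"Unknown Action to Dataset: {action_to_dataset!r}")
    except ValueError as exc:
        # Assumed error shape: same keys as the success result.
        return {"message": str(exc), "status": "error"}
```

For example, run_writer("CREATE", "app/chat_history_v1", [{"id": 1, "text": "hi"}]) returns a result shaped like the Output Data Example. Calling GET without an Action to Data value returns an error result instead of silently doing nothing.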

Note: This component operates asynchronously, meaning it works in the background. Your workflow is designed to wait for it to finish before moving to the next step.
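As a rough analogy for this ordering guarantee, the asyncio sketch below uses a hypothetical stand-in coroutine that sleeps instead of doing real I/O. Because the workflow awaits the writer, the next step only ever sees a finished result. How Nappai schedules components internally is not described here, so treat this as an illustration only.

```python
import asyncio


async def writer_stub(dataset_path: str) -> dict:
    """Hypothetical stand-in for the DeepLake Writer; sleeps instead of real I/O."""
    await asyncio.sleep(0.01)
    return {"message": f"Dataset '{dataset_path}' created successfully.",
            "status": "success"}


async def next_step(result: dict) -> str:
    """Hypothetical downstream node that consumes the writer's result."""
    return f"Write finished with status: {result['status']}"


async def workflow() -> list[str]:
    events = []
    # Awaiting suspends the workflow until the writer returns,
    # so next_step never runs with an incomplete result.
    result = await writer_stub("app/chat_history_v1")
    events.append("writer done")
    events.append(await next_step(result))
    return events


if __name__ == "__main__":
    print(asyncio.run(workflow()))
```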

Connection & Credentials

Before this component can connect to DeepLake, you must configure a credential in the Nappai panel:

  1. Go to the Credentials section in your Nappai panel.
  2. Create a new credential of the type specified for this component (DeepLake API) and fill in the required fields (Username and DeepLake API Token).
  3. In your workflow, select the saved credential in the Credential input field of this node.

Operations

This component offers three operations, and only one can run at a time:

  • CREATE: Creates a new dataset at the specified path. You can also add initial data during this step.
  • DELETE: Permanently removes an existing dataset at the specified path.
  • GET: Interacts with an existing dataset. This does not “retrieve” data for viewing; instead, it is used to modify data (Add or Update records).

To use the component, first select the operation you need in the “Action to Dataset” field.

Inputs

The following fields are available to configure this component. Not every field appears in every operation; the "Visible in" line under each field shows where it is available:

  • Data: A list of data items (such as messages or text) that you want to store in the dataset. This is used when creating a new dataset or adding/updating records in an existing one.

    • Visible in: CREATE, GET
  • Embedding: Select an embeddings model here. This helps the system understand the context of your data, making it easier to search and retrieve later.

    • Visible in: CREATE, GET
  • Retriever: A dataset or object that acts as a Retriever. This connects the writer to a specific retrieval system.

    • Visible in: CREATE, GET
  • Vector Store: The vector store from which data will be read or managed. This links the component to your specific storage location.

    • Visible in: CREATE, GET
  • Action to Data: When using the GET action, this dropdown determines how you interact with the data. Choose ADD TO DATASET to insert new records or UPDATE DATASET to modify existing ones.

    • Visible in: GET
  • Action to Dataset: The main choice for what to do with the dataset. Choose CREATE to make a new one, DELETE to remove one, or GET to add/update data in an existing one.

    • Visible in: All
  • Dataset path: The unique name or URL of the DeepLake dataset you want to manage. If creating a new dataset, this is its name. If deleting or getting data, this is the name of the dataset you are targeting.

    • Visible in: All
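The visibility rules in the list above can be captured as a lookup table. The sketch below is a hypothetical helper; VISIBLE_IN, visible_fields and validate_config are not part of Nappai. It lists the fields each operation shows and flags configurations that would likely fail, such as a GET without Action to Data.

```python
# Field visibility per "Action to Dataset" value, transcribed from the Inputs list.
VISIBLE_IN: dict[str, set[str]] = {
    "Data": {"CREATE", "GET"},
    "Embedding": {"CREATE", "GET"},
    "Retriever": {"CREATE", "GET"},
    "Vector Store": {"CREATE", "GET"},
    "Action to Data": {"GET"},
    "Action to Dataset": {"CREATE", "DELETE", "GET"},
    "Dataset path": {"CREATE", "DELETE", "GET"},
}


def visible_fields(action: str) -> list[str]:
    """Return the input fields shown for the given operation, in list order."""
    return [field for field, actions in VISIBLE_IN.items() if action in actions]


def validate_config(config: dict[str, str]) -> list[str]:
    """Return problems with a node configuration (an empty list means it looks valid)."""
    action = config.get("Action to Dataset", "")
    if action not in {"CREATE", "DELETE", "GET"}:
        return [f"Unknown Action to Dataset: {action!r}"]
    problems = []
    if not config.get("Dataset path"):
        problems.append("Dataset path is required.")
    if action == "GET" and config.get("Action to Data") not in {"ADD TO DATASET", "UPDATE DATASET"}:
        problems.append("GET needs Action to Data set to ADD TO DATASET or UPDATE DATASET.")
    # Flag fields that the chosen operation does not show.
    allowed = set(visible_fields(action))
    problems += [f"{field} is not used by {action}." for field in config if field not in allowed]
    return problems
```

Here are two examples:

- For the first usage scenario below, validate_config({"Action to Dataset": "CREATE", "Dataset path": "app/chat_history_v1", "Embedding": "my-embedding-model"}) returns an empty list. The embedding model name is a placeholder.
- visible_fields("DELETE") returns only ["Action to Dataset", "Dataset path"].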

Outputs

Output Data Example (JSON)

The component produces a simple result object that confirms whether the operation was successful:

```json
{
  "message": "Dataset 'my-new-dataset' created successfully.",
  "status": "success"
}
```

Note: If an error occurs (e.g., wrong credentials or invalid path), the message will contain details about the failure.

Connectivity

This component is typically used at the end of a data processing chain.

  • Inputs: It connects to upstream components that generate data (Text, Messages) or define embeddings. It also requires a connection to a DeepLake API credential for authentication.
  • Outputs: The Result output is a status message. While it doesn’t contain the actual data records, it tells you if the save/update operation succeeded. You can connect this result to a notification component to alert your team if a data write operation fails.
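The notification routing described above can be sketched as follows. The helper alert_text is hypothetical. It parses a result object like the one in the Output Data Example and returns alert text only when the status is not "success". The exact error payload is not documented, so the failure case below assumes the same two keys, with a status other than "success".

```python
import json
from typing import Optional


def alert_text(raw_result: str) -> Optional[str]:
    """Return a notification message for a failed write, or None on success."""
    result = json.loads(raw_result)
    if result.get("status") == "success":
        return None
    # Assumption: on failure the "message" key carries the error details.
    return f"DeepLake write failed: {result.get('message', 'no details provided')}"


if __name__ == "__main__":
    ok = json.dumps({"message": "Dataset 'my-new-dataset' created successfully.",
                     "status": "success"})
    failed = json.dumps({"message": "Invalid credentials",
                         "status": "error"})  # hypothetical error payload
    print(alert_text(ok))      # None
    print(alert_text(failed))  # DeepLake write failed: Invalid credentials
```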

Usage Example

Scenario: Saving Chat History to DeepLake

  1. Setup Credentials: In Nappai, create a DeepLake API credential with your username and token.
  2. Configure Node:
    • Set Action to Dataset to CREATE.
    • Enter a Dataset path like app/chat_history_v1.
    • Connect your Data input to a previous node that collected user messages.
    • Select an Embedding model to help index the text.
  3. Run: The system creates the dataset and saves the messages.
  4. Verify: Check the Result output. If its status is "success" and the message reports that the dataset was created successfully, your messages have been saved in DeepLake.

Scenario: Updating Existing Records

  1. Setup: Change Action to Dataset to GET.
  2. Configure: Set Action to Data to UPDATE DATASET.
  3. Path: Enter the path of an existing dataset.
  4. Data: Provide new data that matches existing records to update them.

Important Notes

🔒 Valid DeepLake credentials required (🔴 high): Provide your DeepLake username and API token; without them the component cannot connect to your dataset.

🔒 Protect your DeepLake API token (🔴 high): Do not expose the token in public notebooks or logs. Store it securely and limit its scope.

⚠️ Dataset deletion is irreversible (🔴 high): Choosing DELETE permanently removes the dataset and all its contents. There is no undo, so make sure you have backups if needed.

📋 Dataset path must be unique (🟡 medium): The dataset_path should not conflict with an existing dataset unless you intend to overwrite it. Duplicate paths may cause errors.

⚠️ GET action only updates or adds data (🟡 medium): The GET option does not retrieve dataset content; it can only add or update records. To view a dataset's contents, use a separate retrieval component.

⚠️ Component is in development (🟡 medium): This component is marked as in development, so some features may not work as expected. Test thoroughly before production use.

💡 Use embeddings for richer data indexing (🟢 low): Supply an embeddings model via the Embedding input; this improves search quality and retrieval speed.

⚙️ Specify action_to_data when using GET (🟡 medium): When performing a GET action, set action_to_data to either ADD TO DATASET or UPDATE DATASET; the default value may not work.

⚙️ Data input types supported (🟢 low): You can pass lists of Data, Message, or Text objects for batch operations. Ensure the data format matches your dataset schema.

ℹ️ Async execution may affect flow (🟡 medium): The component runs asynchronously; ensure your workflow supports async operations to avoid timing issues.

Tips and Best Practices

  • Always test your Dataset path in a sandbox or staging environment before using it in production to avoid accidentally overwriting important data.
  • When using the GET action, ensure you clearly define whether you are Adding new records or Updating existing ones to prevent data conflicts.
  • Keep your Embedding model consistent across your workflow to ensure that data can be retrieved accurately later.
  • Regularly check the Result message for any warnings, especially when managing large datasets.

Security Considerations

  • Ensure your DeepLake API Token is stored securely in the Nappai Credentials manager and never shared in plain text within your workflow diagrams.
  • Be cautious when using the DELETE action, as it permanently erases data. Always double-check the Dataset path before confirming deletion.
  • Limit the scope of your API token to only what is necessary for the tasks this component performs.