Astra DB
The Astra DB component stores your data as vectors so you can quickly find the most relevant pieces later. It connects to the Astra DB service over the internet, so you can use it from anywhere in your Nappai dashboard.
How it Works
When you use the Astra DB component, Nappai connects to your Astra DB account using the credentials you set up in the Credentials section. The component then creates or uses a collection (think of it as a table) inside Astra DB.
- Adding data – You give the component a list of documents (text, images, etc.). It turns each document into a vector (a numeric representation) using the embedding model you provide, and stores those vectors in the collection.
- Searching – When you run a search, the component sends your query to Astra DB. Astra DB compares the query vector with the stored vectors and returns the most similar ones, optionally filtered by metadata or a score threshold.
- Retrieving – The component can also give you a retriever object that you can plug into other parts of your workflow. The retriever knows how to ask Astra DB for the best matches whenever it’s needed.
All of this happens behind the scenes; you only need to fill in a few fields in the dashboard.
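The adding and searching steps described above can be sketched with a small in-memory stand-in. This is only an illustration under stated assumptions: `toy_embed`, `ToyCollection`, and their parameters are hypothetical names, the "embedding" is a simple word count rather than a real model, and nothing here calls the Astra DB service or reflects how Nappai implements the component.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyCollection:
    """In-memory stand-in for a collection; not the Astra DB client."""

    def __init__(self):
        self.rows = []  # each row: (text, vector, metadata)

    def add(self, documents):
        # Adding data: embed each document and store the vector with it.
        for text, metadata in documents:
            self.rows.append((text, toy_embed(text), metadata))

    def search(self, query, k=4, metadata_filter=None, score_threshold=0.0):
        # Searching: embed the query, score each row, filter, then rank.
        query_vec = toy_embed(query)
        hits = []
        for text, vec, metadata in self.rows:
            if metadata_filter and any(
                metadata.get(key) != value
                for key, value in metadata_filter.items()
            ):
                continue  # row does not match the metadata filter
            score = cosine(query_vec, vec)
            if score >= score_threshold:
                hits.append((score, text))
        hits.sort(key=lambda hit: hit[0], reverse=True)
        return hits[:k]


collection = ToyCollection()
collection.add([
    ("machine learning with vectors", {"author": "ana"}),
    ("cooking pasta at home", {"author": "ben"}),
    ("deep learning for text search", {"author": "ana"}),
])
results = collection.search("machine learning", k=2, score_threshold=0.1)
for score, text in results:
    print(f"{score:.2f}  {text}")
```

The unrelated cooking document scores zero and falls below the threshold, so only the two learning-related documents come back, best match first.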
Operations
This component offers several operations that you can select based on what you need to do. You can only use one operation at a time:
- Add: Store new documents in Astra DB.
- Search: Look up the most relevant documents for a given query.
- Retriever: Create a reusable retriever that can be used by other components.
To use the component, first select the operation you need in the Operation field.
Inputs
Input Fields
The following fields are available to configure this component. Each field is visible only in the operations listed beneath it:
- Credential: Choose the Astra DB API credential you created in the Nappai credentials section.
  - Visible in: Add, Search, Retriever
- Embedding or Astra Vectorize: Provide an embedding model or an Astra Vectorize configuration that turns your documents into vectors.
  - Visible in: Add, Search, Retriever
- Operation: Pick which operation you want to run (Add, Search, or Retriever).
  - Visible in: Add, Search, Retriever
- Ingest Data: Upload the documents you want to store.
  - Visible in: Add
- Collection Name: The name of the collection inside Astra DB where the vectors will be stored.
  - Visible in: Add, Search, Retriever
- Batch Size: Optional number of data items to process in a single batch.
  - Visible in: Add, Search, Retriever
- Bulk Delete Concurrency: Optional concurrency level for bulk delete operations.
  - Visible in: Add, Search, Retriever
- Bulk Insert Batch Concurrency: Optional concurrency level for bulk insert operations.
  - Visible in: Add, Search, Retriever
- Bulk Insert Overwrite Concurrency: Optional concurrency level for bulk insert operations that overwrite existing data.
  - Visible in: Add, Search, Retriever
- Collection Indexing Policy: Optional dictionary defining the indexing policy for the collection.
  - Visible in: Add, Search, Retriever
- Metadata Indexing Include: Optional list of metadata fields to include in the indexing.
  - Visible in: Add, Search, Retriever
- Metadata Indexing Exclude: Optional list of metadata fields to exclude from the indexing.
  - Visible in: Add, Search, Retriever
- Metric: Optional distance metric for vector comparisons (cosine, dot_product, euclidean).
  - Visible in: Add, Search, Retriever
- Namespace: Optional namespace within Astra DB to use for the collection.
  - Visible in: Add, Search, Retriever
- Number of Results: Number of results to return from a search.
  - Visible in: Add, Search, Retriever
- Pre Delete Collection: Boolean flag to delete the collection before creating a new one.
  - Visible in: Add, Search, Retriever
- Search Metadata Filter: Optional dictionary of filters to apply to the search query.
  - Visible in: Add, Search, Retriever
- Search Query: Enter a search query. Leave empty to retrieve all documents.
  - Visible in: Search
- Search Score Threshold: Minimum similarity score threshold for search results (used with “Similarity with score threshold”).
  - Visible in: Add, Search, Retriever
- Search Type: Search type to use (Similarity, Similarity with score threshold, MMR).
  - Visible in: Add, Search, Retriever
- Setup Mode: Configuration mode for setting up the vector store (Sync, Async, Off).
  - Visible in: Add, Search, Retriever
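The Search Type field lists MMR (maximal marginal relevance) next to plain similarity. The difference can be illustrated with a minimal sketch: plain similarity ranks purely by closeness to the query, while MMR also penalizes candidates that resemble results already chosen. The function name `mmr_select`, the `relevance_weight` parameter, and the two-dimensional vectors are assumptions made for illustration; this is a generic textbook formulation, not necessarily how Astra DB or Nappai computes it.

```python
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k=2, relevance_weight=0.5):
    """Return indices of k documents chosen by maximal marginal relevance."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:

        def mmr_score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy: similarity to the closest already-selected result.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return relevance_weight * relevance - (1 - relevance_weight) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
docs = [
    [1.0, 0.1],    # very relevant
    [1.0, 0.12],   # near-duplicate of the first document
    [0.8, -0.6],   # somewhat less relevant, but clearly different
]

# Plain similarity: rank by closeness to the query only.
plain_top2 = sorted(
    range(len(docs)), key=lambda i: cosine(query, docs[i]), reverse=True
)[:2]

# MMR: trade relevance against redundancy with earlier picks.
mmr_top2 = mmr_select(query, docs, k=2, relevance_weight=0.5)

print("similarity:", plain_top2)
print("mmr:", mmr_top2)
```

In this toy data, plain similarity returns the two near-duplicates, while MMR keeps the best match and swaps the duplicate for the more distinct third document.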
Outputs
- Retriever: A retriever object that can be used by other components to fetch relevant documents.
- Results: The list of documents returned by a search operation.
- Vector Store: The underlying Astra DB vector store object.
Usage Example
- Add Operation
  - Set Operation to Add.
  - Choose your Credential and Embedding or Astra Vectorize.
  - Provide a Collection Name (e.g., my_docs).
  - Upload a few documents in Ingest Data.
  - Click Run. The component will store the vectors in Astra DB.
- Search Operation
  - Change Operation to Search.
  - Keep the same Credential and Collection Name.
  - Enter a Search Query like “machine learning”.
  - Optionally set Number of Results and Search Score Threshold.
  - Click Run. The component returns the most relevant documents.
- Retriever Operation
  - Set Operation to Retriever.
  - The component outputs a retriever that you can connect to other components (e.g., a summarizer) to automatically fetch relevant data during a workflow.
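To make the Retriever step more concrete, here is a rough sketch of what "connecting a retriever to another component" means conceptually: the downstream component receives a reusable object and calls it whenever it needs context. Everything in the sketch is hypothetical (`make_retriever`, `summarizer`, and the word-overlap ranking); in Nappai the connection is made visually in the dashboard and the real retriever queries Astra DB.

```python
from typing import Callable, List

# Hypothetical type: a retriever is anything that maps a query to documents.
Retriever = Callable[[str], List[str]]


def make_retriever(documents: List[str], number_of_results: int = 2) -> Retriever:
    """Build a toy retriever that ranks documents by shared query words."""

    def retrieve(query: str) -> List[str]:
        query_words = set(query.lower().split())
        scored = [
            (len(query_words & set(doc.lower().split())), doc)
            for doc in documents
        ]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        # Keep only documents that share at least one word with the query.
        return [doc for score, doc in ranked[:number_of_results] if score > 0]

    return retrieve


def summarizer(question: str, retriever: Retriever) -> str:
    """Hypothetical downstream component: fetches context when it runs."""
    context = retriever(question)
    return f"{question} -> {len(context)} document(s): " + " | ".join(context)


retriever = make_retriever([
    "machine learning basics",
    "gardening tips for spring",
    "machine learning for search",
])
report = summarizer("machine learning", retriever)
print(report)
```

The point of the design is that the summarizer does not need to know where the documents live; it only needs something it can call with a query.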
Related Components
- Embedding – Create the embeddings that Astra DB uses.
- Vector Store – Generic vector store component for other databases.
- Retriever – Component that uses a retriever to fetch data.
Tips and Best Practices
- Keep collection names unique to avoid accidental overwrites.
- Use the Pre Delete Collection flag only when you’re sure you want to erase existing data.
- Choose a metric that matches your data type (cosine is common for text).
- To keep searches fast and results focused, use a small Number of Results and a Search Score Threshold that filters out weak matches.
- Store metadata (e.g., author, date) in the documents so you can filter searches later.
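The tip about choosing a metric is easier to judge with numbers. The sketch below compares the three metric names listed for the Metric field (cosine, dot_product, euclidean) on hand-made two-dimensional vectors. The vectors and helper functions are illustrative assumptions, not output from Astra DB.

```python
import math


def dot_product(a, b):
    """Similarity: grows with both alignment and vector length."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Similarity: depends only on direction, not on vector length."""
    return dot_product(a, b) / (
        math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    )


def euclidean(a, b):
    """Distance: smaller means closer; sensitive to vector length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1.0, 1.0]
short_doc = [1.0, 1.0]
long_doc = [10.0, 10.0]   # same direction as the query, ten times the length
other_doc = [1.0, 0.0]    # different direction

for name, doc in [("short", short_doc), ("long", long_doc), ("other", other_doc)]:
    print(
        f"{name:5s}  cosine={cosine(query, doc):.3f}  "
        f"dot={dot_product(query, doc):.1f}  "
        f"euclidean={euclidean(query, doc):.3f}"
    )
```

Cosine treats the short and long documents as equally good matches because they point the same way, which is one reason it is a common choice for text. Dot product favors the longer vector, and Euclidean distance treats it as far away.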
Security Considerations
- The component uses a Credential that stores your Astra DB Application Token and API Endpoint securely.
- Never expose the token or endpoint directly in the dashboard; they are hidden by the credential system.
- Make sure only trusted users have access to the credential in Nappai’s credentials section.