Cassandra Vector Store
This component lets you store and search information in Nappai using Cassandra, a database optimized for speed and scalability. Imagine it as a highly organized filing cabinet that lets you find similar documents quickly based on their content.
Relationship with Cassandra
This component uses Cassandra, a distributed NoSQL database, to store and retrieve documents together with their vector embeddings. This makes it well suited to handling large amounts of data and running fast similarity searches. You don’t need to know the technical details of Cassandra to use this component; Nappai handles the complexities for you.
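To make "similarity search" more concrete, here is a minimal sketch of how two vectors can be compared. It assumes cosine similarity, a common choice; the metric this component actually uses is not stated here, and the three-number embeddings are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional embeddings (real embeddings have many more dimensions).
query = [1.0, 0.0, 1.0]
docs = {
    "red running shoes": [1.0, 0.0, 0.9],
    "garden hose": [0.0, 1.0, 0.1],
}

# Rank the documents from most to least similar to the query.
ranked = sorted(docs, key=lambda name: cosine_similarity(query, docs[name]), reverse=True)
```

The closer the score is to 1, the more similar the two items are considered to be.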
Inputs
- Contact Points / Astra Database ID: The connection details for your Cassandra database. For a Cassandra cluster, these are the contact points (the addresses of one or more nodes); for AstraDB, this is the database ID provided by the service. (Required)
- Username: Your username for accessing the database (leave blank if using AstraDB).
- Keyspace: Specifies the section of your database where the data will be stored. Think of it as a folder within your filing cabinet. (Required)
- Table Name: The name of the specific table (or collection in AstraDB) where your data will be stored. This is like a specific drawer within your folder. (Required)
- TTL Seconds (Advanced): How long (in seconds) you want to keep the data before it’s automatically deleted. Leave this blank for data to be stored permanently.
- Batch Size (Advanced): The number of data items processed at once. The default is 16, and you usually don’t need to change this.
- Setup Mode (Advanced): How the database table is set up. ‘Sync’ sets it up immediately, ‘Async’ does it in the background, and ‘Off’ skips setup. The default is ‘Sync’.
- Cluster arguments (Advanced): Additional settings for your Cassandra cluster (for advanced users only).
- Search Query: The text you use to search for similar items.
- Ingest Data: The data you want to store in the database.
- Embedding: A representation of your data in a format that allows for similarity searches (provided by other Nappai components).
- Number of Results (Advanced): How many search results you want to see. The default is 4.
- Search Type (Advanced): The type of search to perform. “Similarity” finds the most similar items. Other options offer more advanced search capabilities. The default is “Similarity”.
- Search Score Threshold (Advanced): The minimum similarity score required for a result to be returned.
- Search Metadata Filter (Advanced): Allows you to filter search results based on specific criteria.
- Search Body (Advanced): Text terms to match within your documents, applied in addition to the Search Query. See Enable Body Search.
- Enable Body Search (Advanced): Turns on searching within the text of your documents. This must be enabled before creating the table. The default is off.
- Credential: Your database credentials for secure access.
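The sketch below illustrates how Number of Results, Search Score Threshold, and Search Metadata Filter could narrow a set of results. It runs on an in-memory list rather than Cassandra, and the `search` function, the record layout, and the use of cosine similarity are all assumptions made for illustration, not Nappai’s implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(records, query_vector, k=4, score_threshold=None, metadata_filter=None):
    """Hypothetical helper: return up to k (score, text) pairs, best first."""
    hits = []
    for record in records:
        # Metadata filter: every requested key must match exactly.
        if metadata_filter and any(
            record["metadata"].get(key) != value
            for key, value in metadata_filter.items()
        ):
            continue
        score = cosine(query_vector, record["vector"])
        # Score threshold: drop results that are not similar enough.
        if score_threshold is not None and score < score_threshold:
            continue
        hits.append((score, record["text"]))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits[:k]  # Number of Results (4 by default)


# Made-up records with 2-dimensional vectors.
records = [
    {"text": "trail running shoes", "vector": [1.0, 0.1], "metadata": {"category": "shoes"}},
    {"text": "leather boots", "vector": [0.8, 0.6], "metadata": {"category": "shoes"}},
    {"text": "garden hose", "vector": [0.0, 1.0], "metadata": {"category": "garden"}},
]

results = search(
    records,
    [1.0, 0.0],
    k=4,
    score_threshold=0.5,
    metadata_filter={"category": "shoes"},
)
```

Here the filter removes the garden item before scoring, the threshold removes weak matches, and `k` caps how many of the remaining items are returned.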
Outputs
The output of this component is its search results. When you perform a search, it returns a list of the most similar items found in your database, up to the Number of Results you set (4 by default). Other Nappai components in your workflow can then use these items.
Usage Example
Let’s say you have a collection of product descriptions. You can use this component to:
- Store the descriptions: First, you’d use another Nappai component to convert your product descriptions into embeddings (numerical representations of the text). Then, you’d feed these embeddings and the descriptions themselves to the Cassandra Vector Store component to store them in your database.
- Search for similar products: When a customer searches for a product, you’d pass their search text as the Search Query, and the Cassandra Vector Store component would return the most similar products from the database. The results would then be displayed to the customer.
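The two steps above can be sketched in a few lines of Python. This is a toy, in-memory stand-in, not the Nappai component: `embed` is a hypothetical word-count embedding that replaces a real embedding component, and `InMemoryVectorStore` is an invented class that replaces Cassandra.

```python
import math

# Hypothetical fixed vocabulary for the toy embedding below.
VOCABULARY = ["shoes", "running", "boots", "hose", "garden", "leather"]


def embed(text):
    """Toy embedding: count how often each vocabulary word appears in the text."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCABULARY]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Invented stand-in for the vector store; keeps everything in a list."""

    def __init__(self):
        self._rows = []

    def add_texts(self, texts):
        # Step 1: store each description together with its embedding.
        for text in texts:
            self._rows.append((text, embed(text)))

    def similarity_search(self, query, k=4):
        # Step 2: embed the query and return the k most similar descriptions.
        query_vector = embed(query)
        ranked = sorted(
            self._rows,
            key=lambda row: cosine(query_vector, row[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


store = InMemoryVectorStore()
store.add_texts([
    "lightweight running shoes",
    "waterproof leather boots",
    "expandable garden hose",
])
matches = store.similarity_search("shoes for running", k=2)
```

In a real workflow the embeddings come from an embedding model, so descriptions with similar meaning can match even when they share no exact words.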
Related Components
- VectorStoreInfo: Provides information about the vector store.
- Self Query Retriever: Uses this component as the store it searches to find relevant information.
- Other components: Other Nappai components interact with the Cassandra Vector Store in various ways, such as providing data, performing searches, or processing results.
Tips and Best Practices
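- Confirm that your Cassandra database is reachable and that the connection details, keyspace, and credentials are correct before using this component.
- Keep the default batch size of 16 unless you have a reason to change it; it controls how many data items are processed at once.
- Consider setting TTL Seconds (Time To Live) so that data you no longer need is deleted automatically.
- For finer control over results, explore the advanced options such as Search Type, Search Score Threshold, and Search Metadata Filter.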
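To illustrate what batch size and TTL mean in practice, here is a small sketch. The helper names `batched` and `is_expired` are hypothetical; in a real deployment the component handles batching and Cassandra itself enforces TTL, so this only mirrors the idea.

```python
def batched(items, batch_size=16):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def is_expired(written_at, ttl_seconds, now):
    """Return True once ttl_seconds have passed since written_at.

    A ttl_seconds of None means the data never expires.
    """
    if ttl_seconds is None:
        return False
    return now - written_at >= ttl_seconds


# 40 made-up documents processed with the default batch size of 16.
documents = [f"doc-{i}" for i in range(40)]
batch_lengths = [len(batch) for batch in batched(documents, batch_size=16)]
# batch_lengths is [16, 16, 8]: two full batches and one partial batch
```

A larger batch size means fewer, bigger writes; a smaller one means more, lighter writes.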
Security Considerations
- Protect your database credentials. Do not hardcode them directly into your Nappai workflows. Use secure credential management practices provided by Nappai.
- Regularly review and update your database security settings.
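Inside Nappai, the Credential input is the place for secrets. If you also connect to the same database from your own scripts, one common way to avoid hardcoding is to read secrets from environment variables. The sketch below assumes two hypothetical variable names, `CASSANDRA_USERNAME` and `CASSANDRA_PASSWORD`; neither is defined by Nappai.

```python
import os


def load_cassandra_credentials(environ=os.environ):
    """Read connection secrets from environment variables instead of source code.

    The variable names are assumptions chosen for this example.
    """
    required = ("CASSANDRA_USERNAME", "CASSANDRA_PASSWORD")
    missing = [name for name in required if not environ.get(name)]
    if missing:
        # Fail early with a clear message rather than connecting with empty values.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "username": environ["CASSANDRA_USERNAME"],
        "password": environ["CASSANDRA_PASSWORD"],
    }


# Usage (requires both variables to be set in your shell first):
# credentials = load_cassandra_credentials()
```

Keeping secrets out of the script means the file can be shared or committed without exposing your database.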