
Chroma DB

The Chroma DB component lets you store and search text embeddings in a local or remote Chroma vector database. It can add new documents, find similar ones, or create a retriever that other parts of your workflow can use.

How it Works

When you add data, the component takes the text, turns it into a numeric vector (an embedding), and stores that vector along with the original text in the Chroma database.
When you search, it embeds the query, compares that embedding with the stored vectors, and returns the most similar ones.
The retriever operation builds a reusable object that can later fetch relevant documents on demand, which is handy for building chat or question‑answer flows.
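To make the add-and-search cycle concrete, here is a minimal pure-Python sketch. It is not Chroma's implementation. The `embed` function is a hypothetical word-count "model" over a five-word vocabulary, similarity is measured with cosine similarity, and every stored vector is scanned. Chroma instead maintains an approximate nearest-neighbour (HNSW) index, and its distance settings may differ.

```python
import math
from typing import List, Tuple

# Hypothetical toy "embedding model": counts a few fixed words.
# Real embedding models produce dense vectors with hundreds of dimensions.
VOCAB = ["refund", "login", "password", "shipping", "late"]


def embed(text: str) -> List[float]:
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Keeps (vector, text) pairs and ranks them by similarity to a query."""

    def __init__(self) -> None:
        self.records: List[Tuple[List[float], str]] = []

    def add(self, text: str) -> None:
        # Embed the text and store the vector next to the original text.
        self.records.append((embed(text), text))

    def search(self, query: str, k: int = 4) -> List[Tuple[str, float]]:
        # Brute-force scan: score every stored vector against the query vector.
        q = embed(query)
        scored = [(text, cosine(q, vec)) for vec, text in self.records]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


store = ToyVectorStore()
store.add("I forgot my password and cannot login")
store.add("My shipping is late")
store.add("I want a refund")
print(store.search("password reset for login", k=1))
```

The password ticket ranks first because it shares both "password" and "login" with the query.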

Operations

This component offers three operations. Select the one you need; only one operation runs at a time:

  • Add: Adds new documents to the vector store. You provide the embedding and the data to ingest.
  • Search: Searches the vector store for documents similar to a query embedding. You can set the number of results and a similarity threshold.
  • Retriever: Creates a retriever object that can be used by other components to fetch relevant documents on demand.

To use the component, first select the operation you need in the “Operation” field.
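The "one operation per run" model and the deferred nature of the Retriever can be sketched as a single dispatch function. Everything here is hypothetical and illustrative: `run_component`, the record dicts, and the precomputed two-dimensional vectors stand in for the component's real inputs and are not its actual code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def run_component(operation, store, records=None, query_vector=None, k=4):
    """Run exactly one operation, mirroring the single-choice Operation field."""
    if operation == "Add":
        store.extend(records or [])
        return store  # the updated vector store
    if operation == "Search":
        ranked = sorted(store, key=lambda rec: cosine(query_vector, rec["vector"]), reverse=True)
        return ranked[:k]  # the Results output
    if operation == "Retriever":
        # Nothing is searched yet: return a function that searches later,
        # when a downstream component (e.g. a chat flow) asks for documents.
        def retrieve(vector):
            return run_component("Search", store, query_vector=vector, k=k)
        return retrieve
    raise ValueError(f"Unknown operation: {operation!r}")


store = []
run_component("Add", store, records=[
    {"text": "refund request", "vector": [1.0, 0.0]},
    {"text": "login issue", "vector": [0.0, 1.0]},
])
retriever = run_component("Retriever", store, k=1)
print(retriever([0.1, 0.9])[0]["text"])  # login issue
```

Because the retriever closes over the store rather than copying results, it also sees documents added after it was created, which is what "on demand" means in practice.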

Inputs

  • Embedding: The embedding model that turns text into numeric vectors (embeddings).

    • Visible in: Add, Search, Retriever
  • Ingest Data: The actual text or data you want to store.

    • Visible in: Add
  • Operation: Choose which operation to perform (Add, Search, or Retriever).

    • Visible in: Add, Search, Retriever
  • Allow Duplicates: If false, the component will not add documents that are already in the Vector Store.

    • Visible in: Add, Search, Retriever
  • Server CORS Allow Origins: List of origins allowed to access the server.

    • Visible in: Add, Search, Retriever
  • Server gRPC Port: Port number for gRPC communication.

    • Visible in: Add, Search, Retriever
  • Server Host: Host address of the server.

    • Visible in: Add, Search, Retriever
  • Server HTTP Port: Port number for HTTP communication.

    • Visible in: Add, Search, Retriever
  • Server SSL Enabled: Enable or disable SSL for secure communication.

    • Visible in: Add, Search, Retriever
  • Collection Name: Name of the collection inside the Chroma database.

    • Visible in: Add, Search, Retriever
  • Limit: Limit the number of records to compare when Allow Duplicates is False.

    • Visible in: Add, Search, Retriever
  • Number of Results: Number of results to return.

    • Visible in: Search, Retriever
  • Persist Directory: Folder where the database files are stored.

    • Visible in: Add, Search, Retriever
  • Search Query: Enter a search query. Leave empty to retrieve all documents.

    • Visible in: Search
  • Search Score Threshold: Minimum similarity score for search results. Applies only when Search Type is ‘similarity_score_threshold’ (similarity with score threshold).

    • Visible in: Search, Retriever
  • Search Type: Type of search to perform. Can be ‘similarity’ or ‘similarity_score_threshold’.

    • Visible in: Search, Retriever
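The interaction between Search Type, Number of Results, and Search Score Threshold can be sketched as a filter over scored results. This is a hypothetical helper, not the component's code, and it assumes scores are similarities in which higher means closer; the component's actual scoring scale may differ.

```python
# Hypothetical helper: how the search settings could select results.
# Assumes (text, score) pairs where a higher score means more similar.

def select_results(scored, search_type="similarity", k=4, score_threshold=0.0):
    """Pick results from (text, score) pairs according to the search settings."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    if search_type == "similarity":
        # Plain top-k: returns k results whenever that many are stored.
        return ranked[:k]
    if search_type == "similarity_score_threshold":
        # Top-k, then drop anything below the threshold (may return fewer than k).
        return [pair for pair in ranked[:k] if pair[1] >= score_threshold]
    raise ValueError(f"Unsupported search type: {search_type!r}")


scored = [("ticket A", 0.92), ("ticket B", 0.78), ("ticket C", 0.40)]
print(select_results(scored, k=2))
# [('ticket A', 0.92), ('ticket B', 0.78)]
print(select_results(scored, "similarity_score_threshold", k=5, score_threshold=0.8))
# [('ticket A', 0.92)]
```

Note that Number of Results is an upper bound under the threshold search type: a strict threshold can return fewer results, or none.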

Outputs

  • Retriever: A retriever object that can be used by other components to fetch relevant documents.
  • Results: The documents returned by a search operation.
  • Vector Store: The underlying vector store object that can be passed to other components.

Usage Example

Scenario: You want to store customer support tickets and later find the most similar tickets when a new ticket arrives.

  1. Add Operation

    • Set Operation to Add.
    • Connect the Embedding input and provide the Ingest Data (ticket text).
    • Optionally set Allow Duplicates to false to avoid storing the same ticket twice.
    • Click Run to store the tickets.
  2. Search Operation

    • Set Operation to Search.
    • Enter the new ticket's text as the Search Query and set Number of Results to 5.
    • Optionally set Search Type to ‘similarity_score_threshold’ and Search Score Threshold to 0.8 to return only highly similar tickets.
    • Click Run. The component outputs up to 5 of the most similar tickets in Results.
  3. Retriever Operation (optional)

    • Set Operation to Retriever.
    • The component outputs a Retriever that can be connected to a chat component, allowing the chat to fetch relevant tickets on demand.
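The duplicate check in the Add step of this scenario can be sketched as follows. This is a hypothetical illustration, not the component's code: documents are plain strings, and it assumes Limit caps how many already-stored records the duplicate check looks at (here, the first `limit` records). The component may select and compare records differently.

```python
from typing import List, Optional

# Hypothetical sketch of Add with Allow Duplicates and Limit.
# Assumption: the duplicate check only sees the first `limit` stored records.

def add_documents(store: List[str], new_docs: List[str],
                  allow_duplicates: bool = True, limit: Optional[int] = None) -> List[str]:
    for doc in new_docs:
        if not allow_duplicates:
            # Compare only against the records the duplicate check can see.
            window = store if limit is None else store[:limit]
            if doc in window:
                continue  # already stored, so skip it
        store.append(doc)
    return store


tickets = []
add_documents(tickets, ["Refund not received", "Cannot reset password"])
add_documents(tickets, ["Refund not received", "App crashes on start"], allow_duplicates=False)
print(tickets)
# ['Refund not received', 'Cannot reset password', 'App crashes on start']
```

Under this assumption, a small Limit can let duplicates through: with `limit=1`, re-adding "Cannot reset password" succeeds because only the first stored ticket is compared.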

Related Components

  • Chroma Custom: The base component that provides common vector store functionality.
  • Vector Store: General component for storing embeddings; Chroma DB is a specific implementation.
  • Retriever: Component that uses a retriever object to fetch documents during a conversation.

Tips and Best Practices

  • Use the same embedding model for ingestion and for queries. Vectors from different models are not comparable, so mixing models makes similarity scores unreliable.
  • Use Allow Duplicates = false when you want to avoid storing the same document multiple times.
  • Set Search Score Threshold to a higher value if you only want very close matches (this requires Search Type ‘similarity_score_threshold’).
  • Store the database in a secure Persist Directory and back it up regularly.
  • If you expose the server externally, turn on Server SSL Enabled and restrict Server CORS Allow Origins to trusted domains.
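One way to enforce the "same embedding model" tip in your own code is a guard that pins a collection to one model. The class below is hypothetical and not part of Chroma or this component, and "model-a"/"model-b" are placeholder names. It checks the model name as well as the vector length, because two models can share a dimension and still produce incomparable vectors.

```python
from typing import List, Optional

# Hypothetical guard: the collection remembers which embedding model filled it
# and rejects vectors (for adds or queries) from another model or of another size.

class ModelPinnedCollection:
    def __init__(self, model_name: str) -> None:
        self.model_name = model_name
        self.dimension: Optional[int] = None
        self.vectors: List[List[float]] = []

    def check(self, model_name: str, vector: List[float]) -> None:
        # Equal dimensions do not imply comparable vectors, so check the name too.
        if model_name != self.model_name:
            raise ValueError(f"collection uses {self.model_name!r}, got {model_name!r}")
        if self.dimension is not None and len(vector) != self.dimension:
            raise ValueError(f"expected {self.dimension} dimensions, got {len(vector)}")

    def add(self, model_name: str, vector: List[float]) -> None:
        self.check(model_name, vector)
        self.dimension = len(vector)
        self.vectors.append(vector)


collection = ModelPinnedCollection("model-a")
collection.add("model-a", [0.1, 0.2, 0.3])
try:
    collection.check("model-b", [0.4, 0.5, 0.6])  # same size, different model
except ValueError as err:
    print(err)  # collection uses 'model-a', got 'model-b'
```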

Security Considerations

  • When running the server, enable SSL (Server SSL Enabled = true) to encrypt traffic.
  • Restrict CORS origins to trusted domains to prevent unauthorized cross‑origin requests.
  • Keep the Persist Directory on a secure, access‑controlled filesystem.
  • If using gRPC, ensure the port is not exposed to the public internet unless protected by a firewall or VPN.
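The checklist above can be turned into a small pre-deployment audit. This is a hypothetical sketch: `ServerSettings` merely mirrors the component's server fields, its default values are placeholders, and the script cannot see firewall or VPN rules, so it only flags settings for review.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical audit of server settings; field names and defaults are placeholders.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


@dataclass
class ServerSettings:
    host: str = "localhost"
    http_port: int = 8000
    grpc_port: Optional[int] = None
    ssl_enabled: bool = False
    cors_allow_origins: List[str] = field(default_factory=list)


def audit(settings: ServerSettings) -> List[str]:
    """Return warnings for settings that conflict with the guidance above."""
    warnings = []
    remote = settings.host not in LOCAL_HOSTS
    if remote and not settings.ssl_enabled:
        warnings.append("SSL is disabled on a non-local host")
    if "*" in settings.cors_allow_origins:
        warnings.append("CORS allows every origin")
    if remote and settings.grpc_port is not None:
        # Firewall state is unknown here; the port is only flagged for review.
        warnings.append(f"gRPC port {settings.grpc_port} needs a firewall or VPN")
    return warnings


risky = ServerSettings(host="chroma.example.com", grpc_port=50051, cors_allow_origins=["*"])
for warning in audit(risky):
    print("WARNING:", warning)
```

Running a check like this in CI before deployment catches a wildcard CORS entry or a disabled SSL flag before the server is reachable.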