
Astra Vectorize

Astra Vectorize lets you configure how your data is turned into embeddings (vector representations) inside Astra DB. By choosing a provider and a model, you tell Astra DB how to generate vectors for your documents, which can then be used for search, similarity matching, or other AI tasks.

How it Works

When you add this component to a workflow, it builds a configuration dictionary that Astra DB uses to create embeddings.

  1. Provider – Pick the service that will generate the embeddings (e.g., OpenAI, Hugging Face, Jina AI).
  2. Model Name – Specify the exact model from that provider that you want to use.
  3. Authentication – The component pulls the API key and other authentication details from a DATASTAX Api credential you set up in Nappai.
  4. Model Parameters – Optional extra settings that the provider may accept (e.g., the output vector dimension for models that support it).

The component does not perform any embedding itself; it only prepares the options that Astra DB will use when you later run a vectorization step.
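
To make the four pieces above more concrete, here is a minimal sketch of how such a configuration dictionary could be assembled. The function name `build_vectorize_options` and the key names (`provider`, `modelName`, `authentication`, `providerKey`, `parameters`) are assumptions chosen for illustration; the dictionary the component actually produces may be shaped differently.

```python
from typing import Any, Optional


def build_vectorize_options(
    provider: str,
    model_name: str,
    api_key_name: Optional[str] = None,
    model_parameters: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Assemble a hypothetical vectorize configuration dictionary.

    The key names used here are illustrative assumptions, not the
    component's documented output format.
    """
    if not provider or not model_name:
        raise ValueError("provider and model_name are both required")

    # Refer to the stored key by name only; no secret value is embedded.
    authentication = {"providerKey": api_key_name} if api_key_name else {}

    return {
        "provider": provider,
        "modelName": model_name,
        "authentication": authentication,
        # Copy the parameters so later changes do not affect the caller's dict.
        "parameters": dict(model_parameters or {}),
    }


options = build_vectorize_options(
    "openai", "text-embedding-3-small", api_key_name="my-openai-key"
)
print(options)
```

Note that the sketch only builds and returns a dictionary. Like the component, it does not call any embedding service.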

Inputs

  • Provider: Choose the embedding service that will generate the vectors.
  • Model Name: Enter the exact name of the model you want to use for the selected provider.
  • Model Parameters: Provide any additional parameters required by the provider (optional).

Credential Setup
This component requires a DATASTAX Api credential.

  1. In Nappai, go to the Credentials section and create a new DATASTAX Api credential.
  2. Enter the API Key name and Provider API Key as requested.
  3. In the component, select this credential in the Credential field.
    The credential supplies the necessary authentication information to Astra DB.

Outputs

  • Vectorize: A dictionary containing the configuration that Astra DB will use to generate embeddings. This output can be passed to other components that perform the actual vectorization or storage.

Usage Example

  1. Add the Astra Vectorize component to your workflow.
  2. Select a credential: Choose the previously created DATASTAX Api credential.
  3. Configure inputs:
    • Provider: OpenAI
    • Model Name: text-embedding-3-small
    • Model Parameters: (leave blank or add optional settings)
  4. Connect the output (Vectorize) to a component that creates or updates a vector collection in Astra DB.

When the workflow runs, Astra DB will use the OpenAI model to generate embeddings for your data.
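
If you want to sanity-check the values from this example before connecting the output, a small validation helper along the following lines may be useful. This is only a sketch: the helper `find_config_problems` and the key names `provider`, `modelName`, and `parameters` are hypothetical and are not taken from the component itself.

```python
from typing import Any

# Keys this sketch assumes must be present (hypothetical names).
REQUIRED_KEYS = ("provider", "modelName")


def find_config_problems(config: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for key in REQUIRED_KEYS:
        value = config.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty '{key}'")
    # Model parameters are optional, but must be a dictionary when given.
    if not isinstance(config.get("parameters", {}), dict):
        problems.append("'parameters' must be a dictionary")
    return problems


# Values from the usage example above, with Model Parameters left blank.
example_config = {
    "provider": "openai",
    "modelName": "text-embedding-3-small",
    "parameters": {},
}
print(find_config_problems(example_config))
```

An empty list means the sketch found nothing to complain about. It does not confirm that the model name actually exists at the provider.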

Related Components

  • Astra Search – Perform vector or text search on a collection.
  • Astra Vector Store – Store and retrieve vectors in Astra DB.
  • Astra Query – Run custom queries against your vector data.

Tips and Best Practices

  • Choose the right provider: Some providers offer free tiers, while others may incur costs.
  • Use the correct model name: Refer to the provider’s documentation or the list shown in the component’s help text.
  • Keep credentials secure: Never expose your API keys in public workflows.
  • Test with a small dataset first to confirm the embeddings look correct before scaling up.

Security Considerations

  • The component relies on a DATASTAX Api credential, which stores API keys securely in Nappai.
  • Ensure that only authorized users have access to the credential.
  • Do not share the output dictionary directly; it contains sensitive configuration details.
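
If you need to show the output dictionary to someone else, for example in a log or a support request, one option is to mask sensitive-looking fields first. The sketch below assumes that sensitive values sit under keys whose names contain words such as "key" or "token". Both the helper `redact_config` and the sample dictionary are hypothetical.

```python
from typing import Any

# Substrings that, in this sketch, mark a key as sensitive (an assumption).
SENSITIVE_HINTS = ("key", "token", "secret", "password")


def redact_config(config: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of config with sensitive-looking values masked."""
    redacted: dict[str, Any] = {}
    for key, value in config.items():
        if isinstance(value, dict):
            # Recurse into nested dictionaries such as authentication blocks.
            redacted[key] = redact_config(value)
        elif any(hint in key.lower() for hint in SENSITIVE_HINTS):
            redacted[key] = "***"
        else:
            redacted[key] = value
    return redacted


# Hypothetical output dictionary; the real structure may differ.
sample = {
    "provider": "openai",
    "modelName": "text-embedding-3-small",
    "authentication": {"providerKey": "my-openai-key"},
}
safe_to_share = redact_config(sample)
print(safe_to_share)
```

The original dictionary is left untouched, so the workflow can keep using it while only the masked copy is shared.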