OpenAI Embeddings

OpenAI Embeddings is a component that turns text into numerical vectors (embeddings) using OpenAI’s powerful language models. These vectors can be used for tasks like searching, clustering, or feeding into other AI models.

How it Works

When you give the component a piece of text, it sends that text to an OpenAI model over the internet. The model processes the text and returns a list of numbers that represent the meaning of the text. The component can split long text into smaller chunks, retry if a request fails, and show a progress bar while it’s working. All of this happens automatically, so you only need to set a few options.
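If you are curious what a single request looks like, the hedged sketch below calls OpenAI's embeddings endpoint directly using the official openai Python library. The component does the equivalent for you (plus chunking, retries, and the progress bar), so this code is only for illustration.

```python
# Illustrative sketch of what happens behind the scenes (not the component's
# actual source): text is sent to OpenAI's embeddings endpoint and a vector
# of floats comes back.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Nappai turns text into embeddings.",
)

vector = response.data[0].embedding  # a plain list of floats
print(len(vector), vector[:5])       # e.g. 1536 numbers for this model
```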

Inputs

Note: This component needs an OpenAI API credential. First add an “OpenAI API” credential in Nappai’s Credentials section, then select it in the component’s “Credential” field.

  • Chunk Size: How many characters to send to the model at once. Larger chunks mean fewer API calls but can hit limits.
  • Client: Optional custom client configuration for the API request.
  • Default Headers: Extra HTTP headers to include with every request (e.g., authentication tokens).
  • Default Query: Extra query parameters to add to every request URL.
  • Deployment: Name of a custom deployment if you’re using Azure OpenAI.
  • Dimensions: How many numbers each embedding should contain. Only supported by newer models such as the text-embedding-3 family.
  • Embedding Context Length: Maximum number of tokens the model can process in one request.
  • Max Retries: How many times to try again if a request fails.
  • Model: Choose which OpenAI embedding model to use (e.g., text-embedding-3-small).
  • Model Kwargs: Extra keyword arguments to pass to the model call.
  • OpenAI API Base: Custom base URL for the OpenAI API (useful for private endpoints).
  • OpenAI API Type: Type of OpenAI API (e.g., azure).
  • OpenAI API Version: Specific API version to use.
  • OpenAI Organization: Your OpenAI organization ID, used to attribute requests when your account belongs to more than one organization.
  • OpenAI Proxy: Proxy server to route requests through.
  • Request Timeout: How long to wait for a response before timing out.
  • Show Progress Bar: Show a progress bar while embeddings are being generated.
  • Skip Empty: Skip empty text inputs instead of returning an error.
  • TikToken Enable: Enable TikToken for token counting; if disabled, you need the transformers library.
  • TikToken Model Name: Name of the TikToken model to use for token counting.
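
These inputs mirror the parameters of LangChain's OpenAIEmbeddings class, which components like this one typically wrap. The hedged sketch below shows how a few of them would map onto that class if you configured it in code yourself; in Nappai you simply fill in the fields above instead.

```python
# Hedged sketch: assumes the component wraps langchain_openai.OpenAIEmbeddings.
# Each keyword argument corresponds to one of the inputs listed above.
from langchain_openai import OpenAIEmbeddings

embedder = OpenAIEmbeddings(
    model="text-embedding-3-small",  # "Model"
    dimensions=256,                  # "Dimensions" (text-embedding-3 models only)
    chunk_size=1000,                 # "Chunk Size"
    max_retries=3,                   # "Max Retries"
    request_timeout=60,              # "Request Timeout" (seconds)
    show_progress_bar=True,          # "Show Progress Bar"
    skip_empty=True,                 # "Skip Empty"
)

vectors = embedder.embed_documents(["first document", "second document"])
print(len(vectors), len(vectors[0]))  # 2 vectors, 256 numbers each
```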

Outputs

  • Embeddings: A list of numerical vectors that represent the input text. These can be passed to other components such as similarity search or clustering.
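
Downstream components such as Similarity Search do the vector math for you, but the short illustrative snippet below shows what that comparison amounts to: each embedding is just a list of floats, and two texts are "similar" when their vectors point in nearly the same direction (cosine similarity close to 1).

```python
# Illustrative only: how a downstream similarity step compares two embeddings.
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

vec_query = [0.12, -0.03, 0.88]  # placeholder vectors; real embeddings
vec_doc = [0.10, -0.01, 0.90]    # contain hundreds or thousands of numbers
print(cosine_similarity(vec_query, vec_doc))  # close to 1.0 = very similar
```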

Usage Example

  1. Add the component to your workflow and connect a text source (e.g., a “Read File” component).
  2. Configure the Model: Select text-embedding-3-small for quick results or text-embedding-3-large for higher quality.
  3. Set Chunk Size: If you’re processing long documents, set a chunk size like 1000 characters.
  4. Enable Show Progress Bar to see real‑time progress.
  5. Run the workflow. The component will output embeddings that you can feed into a “Similarity Search” component to find related documents.
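
For reference, the hedged sketch below walks through the same five steps in plain Python, assuming the component wraps LangChain's OpenAIEmbeddings. In Nappai you build this visually; the code is only meant to make the data flow concrete.

```python
# Hedged sketch of the workflow above (illustrative, not Nappai's internals).
from langchain_openai import OpenAIEmbeddings

# Step 1: "Read File" - load a long document from disk.
text = open("document.txt", encoding="utf-8").read()

# Steps 2 and 4: choose the model and show a progress bar while embedding.
embedder = OpenAIEmbeddings(model="text-embedding-3-small", show_progress_bar=True)

# Step 3: split the document into ~1000-character chunks.
chunks = [text[i:i + 1000] for i in range(0, len(text), 1000)]

# Step 5: embed every chunk; the result is one vector per chunk.
vectors = embedder.embed_documents(chunks)
print(f"{len(vectors)} chunks embedded, {len(vectors[0])} dimensions each")
```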

Related Components

  • OpenAI Text Generation – Generates natural language text from prompts.
  • OpenAI Chat – Conversational AI using OpenAI’s chat models.
  • Similarity Search – Finds the most similar embeddings in a vector store.
  • Vector Store – Stores and retrieves embeddings for later use.

Tips and Best Practices

  • Use the right model: Smaller models are faster and cheaper; larger models give better quality.
  • Chunk wisely: Keep chunks below the model’s context length to avoid truncation; the sketch after this list shows one way to check a chunk’s token count.
  • Retry logic: Keep the default max retries (3) to handle transient network issues.
  • Monitor costs: Each embedding request counts toward your OpenAI usage; track it in your billing dashboard.
  • Secure your credentials: Store the OpenAI API key in Nappai’s credential store, not in the workflow.
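
To follow the "chunk wisely" tip, you can count tokens before embedding. The hedged sketch below assumes the tiktoken package and uses the cl100k_base encoding, which OpenAI's current embedding models share; 8191 matches the default Embedding Context Length.

```python
# Hedged sketch: count tokens so each chunk stays below the context length.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

chunk = "Some long piece of text you plan to embed..."
token_count = len(encoding.encode(chunk))

if token_count > 8191:
    print(f"Chunk is {token_count} tokens - split it before embedding.")
else:
    print(f"Chunk is {token_count} tokens - safe to embed.")
```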

Security Considerations

  • The component sends your text to OpenAI’s servers over the internet. Before embedding sensitive content, confirm that it is permitted to leave your network.
  • Keep your OpenAI API key secure by using Nappai’s credential system; never expose it in the workflow or logs.
  • If you use a proxy or custom API base, verify that the endpoint is trusted and secure.