OpenAI Embeddings
This Nappai component uses OpenAI’s embedding models to convert text into lists of numbers (vectors) that capture its meaning. Texts with similar meanings produce similar vectors, so a computer can compare them mathematically. This allows Nappai to perform tasks such as finding similar documents or measuring how closely two pieces of text are related.
Relationship with OpenAI
This component calls OpenAI’s embedding models through the OpenAI API to generate these numerical representations (embeddings). You need an OpenAI account and an API key to use it. Nappai stores your API key securely.
Inputs
- Credential: Your OpenAI API key, which allows Nappai to access OpenAI’s services. This is securely managed within Nappai.
- Model: Choose the OpenAI model to use for generating embeddings. A default option is provided, but you can select others if needed.
- Chunk Size: Sets how much text is sent to OpenAI in each batch. Larger batches mean fewer requests, which is usually faster, but they use more memory. The default is usually fine.
- Show Progress Bar: Choose whether to display a progress bar while the embeddings are being generated.
- Skip Empty: Choose whether to skip any empty text inputs.
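To make Chunk Size and Skip Empty more concrete, here is a minimal sketch of batching. It assumes that Chunk Size counts the number of input texts per request, which is how the identically named `chunk_size` parameter behaves in LangChain’s `OpenAIEmbeddings` class; whether Nappai follows the same behavior is an assumption. The helper name `batch_texts` is hypothetical and is not part of Nappai.

```python
from typing import Iterator, List


def batch_texts(
    texts: List[str], chunk_size: int, skip_empty: bool = True
) -> Iterator[List[str]]:
    """Yield lists of at most `chunk_size` texts (hypothetical helper).

    Assumption: Chunk Size counts texts per request, not characters or tokens.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    if skip_empty:
        # Drop inputs that are empty or contain only whitespace.
        texts = [t for t in texts if t.strip()]
    for start in range(0, len(texts), chunk_size):
        yield texts[start:start + chunk_size]


tickets = ["refund request", "", "login fails", "  ", "slow app"]
batches = list(batch_texts(tickets, chunk_size=2))
print(batches)  # [['refund request', 'login fails'], ['slow app']]
```

With Skip Empty turned off, the two blank inputs would stay in the batches and be sent along with the rest.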
Advanced Settings (These are usually left at their default values unless you have specific needs):
- Default Headers: (Advanced) Additional settings for the OpenAI API request (usually not needed).
- Default Query: (Advanced) Additional parameters for the OpenAI API request (usually not needed).
- Embedding Context Length: (Advanced) The maximum length of text, measured in tokens, that can be processed at once (usually the default is sufficient).
- Max Retries: (Advanced) The number of times to retry if the OpenAI API request fails.
- OpenAI API Base, OpenAI API Type, OpenAI API Version, OpenAI Organization, OpenAI Proxy: (Advanced) Technical settings for connecting to the OpenAI API (usually not needed).
- Request Timeout: (Advanced) How long to wait for a response from OpenAI before giving up.
- TikToken Model Name, TikToken Enable: (Advanced) Settings related to token counting (usually not needed).
- Dimensions: (Advanced) The number of dimensions for the resulting embeddings. OpenAI supports this option only on its text-embedding-3 and later models (usually not needed).
- Model Kwargs: (Advanced) Additional parameters for the chosen OpenAI model (usually not needed).
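As an illustration of what a Max Retries setting typically controls, the sketch below retries a failing request a limited number of times. The retry policy that Nappai or the OpenAI client actually applies is not described here, so the exponential backoff, the function name `call_with_retries`, and the `base_delay` parameter are all assumptions made for this example.

```python
import time
from typing import Callable, List


def call_with_retries(
    request: Callable[[], List[float]],
    max_retries: int = 3,
    base_delay: float = 0.5,
) -> List[float]:
    """Call `request`; on ConnectionError, retry up to `max_retries` times.

    Hypothetical helper: the wait doubles after each failed attempt.
    """
    if max_retries < 0:
        raise ValueError("max_retries must be zero or greater")
    attempt = 0
    while True:
        try:
            return request()
        except ConnectionError:
            if attempt >= max_retries:
                raise  # Retries exhausted: surface the last error.
            time.sleep(base_delay * (2 ** attempt))
            attempt += 1


# Simulated request that fails twice before returning a made-up embedding.
calls = {"count": 0}


def flaky_request() -> List[float]:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return [0.1, 0.2, 0.3]


embedding = call_with_retries(flaky_request, max_retries=3, base_delay=0.01)
print(embedding, "after", calls["count"], "calls")
```

A higher Max Retries value makes a workflow more tolerant of brief outages, at the cost of waiting longer before a persistent failure is reported.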
Outputs
The component produces numerical embeddings for your text. These embeddings are not directly visible but are used by other Nappai components to perform tasks such as:
- Finding similar documents: Documents with similar meanings will have similar embeddings.
- Clustering related information: Group similar documents together based on their embeddings.
- Improving search results: Find documents that are semantically relevant to a search query.
These embeddings are passed to other components in your workflow, such as vector databases.
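To show what "similar embeddings" means in practice, here is a small sketch that compares vectors with cosine similarity, a common measure for embeddings. The three-dimensional vectors are made up for illustration; real OpenAI embeddings have hundreds or thousands of dimensions, and the similarity measure a particular vector database uses may differ.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for real embeddings.
refund_ticket = [0.9, 0.1, 0.0]
chargeback_ticket = [0.8, 0.2, 0.1]
login_ticket = [0.0, 0.2, 0.9]

related = cosine_similarity(refund_ticket, chargeback_ticket)
unrelated = cosine_similarity(refund_ticket, login_ticket)
print(f"refund vs chargeback: {related:.2f}")
print(f"refund vs login:      {unrelated:.2f}")
```

The two billing-related vectors score close to 1, while the billing and login vectors score close to 0, which is the property that similarity search and clustering rely on.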
Usage Example
Imagine you have a collection of customer support tickets. You can use the OpenAI Embeddings component to create embeddings for each ticket. Then, you can use a vector database component (like Pinecone or Weaviate) to store these embeddings. Finally, you can use a search component to quickly find similar tickets based on the meaning of the text, not just keywords.
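The flow described above can be sketched in a few lines. This is a toy in-memory stand-in for a vector database, not how Pinecone or Weaviate work internally; the class name `InMemoryVectorStore` is hypothetical, and the ticket and query vectors are made up rather than produced by OpenAI.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; assumes equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database component."""

    def __init__(self) -> None:
        self._items: Dict[str, List[float]] = {}

    def add(self, ticket: str, embedding: List[float]) -> None:
        self._items[ticket] = embedding

    def search(self, query: Sequence[float], top_k: int = 2) -> List[Tuple[str, float]]:
        # Score every stored ticket against the query, highest similarity first.
        scored = [(t, cosine_similarity(query, e)) for t, e in self._items.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


store = InMemoryVectorStore()
# Made-up embeddings; in a real workflow the embeddings component supplies them.
store.add("ticket-1: cannot log in", [0.1, 0.9, 0.1])
store.add("ticket-2: password reset email missing", [0.2, 0.8, 0.2])
store.add("ticket-3: charged twice", [0.9, 0.1, 0.1])

# Made-up embedding for the query "I can't access my account".
query_embedding = [0.15, 0.85, 0.1]
results = store.search(query_embedding, top_k=2)
for ticket, score in results:
    print(f"{score:.2f}  {ticket}")
```

The two account-access tickets rank first even though the query shares few exact words with them, because the ranking depends only on how close the vectors are.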
Templates
This component is used in the following Nappai templates: ‘Vector Store RAG’, ‘CV Match’, ‘Load Cloud Data’, ‘Eurocup 2024’.
Related Components
- Semantic Text Splitter: This component breaks down large texts into smaller, semantically meaningful chunks before generating embeddings.
- Couchbase, Upstash, Chroma DB, Weaviate, Vectara, Redis, PGVector, FAISS, Astra DB, Qdrant, Pinecone, MongoDB Atlas, Milvus, Supabase, Cassandra: These are vector databases that store and search embeddings.
- Text Embedder: An alternative component for generating embeddings, which may use different models.
Tips and Best Practices
- Start with the default settings. Only adjust advanced settings if you have a specific reason.
- Choose the appropriate OpenAI model based on your needs and budget. Smaller models are faster and cheaper, but larger models might provide better results.
- Consider using the Semantic Text Splitter component for very long texts to improve accuracy and efficiency.
Security Considerations
Always protect your OpenAI API key. Nappai stores your credentials securely, but you should still never share the key itself with others. Review OpenAI’s security best practices for additional information.