FAISS
FAISS (Facebook AI Similarity Search) is a library for fast similarity search over vectors. This component uses it to store a collection of documents and quickly find the ones that best match a search query. It works with embeddings (numeric representations of text), so you can search by meaning rather than by exact words.
How it Works
When you add data, the component uses the embedding model you provide to turn each document into a vector. FAISS stores these vectors in an index, which is saved to the folder you specify. Later, when you run a search, the query is embedded with the same model, FAISS compares the query vector against the stored vectors, and the closest matches are returned, optionally filtered by a similarity score threshold.
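The search step can be illustrated with a rough sketch. The code below ranks stored vectors by cosine similarity to a query vector and scores every entry, the way an exhaustive (flat) FAISS index does. It is plain Python, not FAISS. The three-dimensional vectors are made-up stand-ins for real embeddings. Cosine similarity was chosen for readability: FAISS itself works with L2 distance or inner product, and inner product equals cosine similarity on normalized vectors.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_all(index: list[tuple[str, list[float]]], query_vec: list[float]) -> list[tuple[str, float]]:
    """Exhaustive search: score the query against every stored vector,
    best match first (a flat index does this, with heavy optimization)."""
    scored = [(doc, cosine_similarity(vec, query_vec)) for doc, vec in index]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical 3-dimensional vectors standing in for real embeddings.
index = [
    ("Neural networks learn from data", [0.9, 0.1, 0.0]),
    ("Gradient descent minimizes a loss", [0.7, 0.3, 0.1]),
    ("Sourdough needs a long rise", [0.0, 0.2, 0.9]),
]
query_vec = [0.8, 0.2, 0.0]  # hypothetical embedding of "machine learning"

for doc, score in rank_all(index, query_vec):
    print(f"{score:.3f}  {doc}")
```

Real indexes hold thousands to millions of vectors with hundreds of dimensions. That scale is why FAISS offers optimized and approximate index types instead of this simple loop.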
Operations
The component offers three operations. Only one operation can run at a time:
- Add: Store new documents in the FAISS index. You provide the data to ingest and the embedding model.
- Search: Look up documents that are most similar to a search query. You can set how many results to return and a minimum similarity score.
- Retriever: Create a retriever object that can be used by other components to fetch relevant documents on demand.
To use the component, first select the operation you need in the “Operation” field.
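A retriever packages the search step so another component can run it later. The sketch below is a minimal version of the idea, assuming a search function that takes a query and a result count. `make_retriever` and the word-overlap `overlap_search` are hypothetical helpers for illustration, not Nappai or FAISS APIs.

```python
from typing import Callable

# Hypothetical signature: a search function maps (query, k) to ranked documents.
SearchFn = Callable[[str, int], list[str]]

def make_retriever(search: SearchFn, k: int) -> Callable[[str], list[str]]:
    """Bundle a search function with a fixed result count. The returned
    callable does no work until a downstream step invokes it with a query."""
    def retrieve(query: str) -> list[str]:
        return search(query, k)
    return retrieve

# Toy corpus and search for illustration: rank documents by shared lowercase words.
DOCS = [
    "FAISS stores vectors",
    "Retrievers fetch documents on demand",
    "Bread needs flour",
]

def overlap_search(query: str, k: int) -> list[str]:
    words = set(query.lower().split())
    ranked = sorted(DOCS, key=lambda d: len(words & set(d.lower().split())), reverse=True)
    return ranked[:k]

retriever = make_retriever(overlap_search, k=1)
print(retriever("fetch documents"))
```

With this design, the component that receives the retriever does not need to know the result count or how the search works. It only passes in a query.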
Inputs
The following fields configure this component. Under each field, "Visible in" lists the operations that show it:
- Embedding: The embedding model that turns text into vectors.
  - Visible in: Add, Search, Retriever
- Ingest Data: The documents you want to add to the index.
  - Visible in: Add
- Operation: The operation to perform (Add, Search, or Retriever).
  - Visible in: Add, Search, Retriever
- Allow Dangerous Deserialization: Set to True to allow loading the index's pickle files. Loading a pickle file can execute arbitrary code, so enable this only if you trust the source of the data.
  - Visible in: Add, Search, Retriever
- Index Name: The name of the FAISS index file.
  - Visible in: Add, Search, Retriever
- Number of Results: The number of results to return.
  - Visible in: Add, Search, Retriever
- Persist Directory: The folder where the FAISS index is saved, relative to the directory where Nappai is running.
  - Visible in: Add, Search, Retriever
- Search Query: The text to search for. Leave it empty to retrieve all documents.
  - Visible in: Search
- Search Score Threshold: The minimum similarity score a result must have to be returned (used with the ‘Similarity with score threshold’ search type).
  - Visible in: Add, Search, Retriever
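The visibility rules in the field list can be captured as a small lookup table. One use is checking which settings matter for a given operation. This is a hypothetical sketch: `VISIBILITY` and `visible_fields` are illustrative names, and the table transcribes only the visibility documented above.

```python
# Which operations show each input field, transcribed from the field list above.
ALL_OPERATIONS = frozenset({"Add", "Search", "Retriever"})
VISIBILITY = {
    "Embedding": ALL_OPERATIONS,
    "Ingest Data": frozenset({"Add"}),
    "Operation": ALL_OPERATIONS,
    "Allow Dangerous Deserialization": ALL_OPERATIONS,
    "Index Name": ALL_OPERATIONS,
    "Number of Results": ALL_OPERATIONS,
    "Persist Directory": ALL_OPERATIONS,
    "Search Query": frozenset({"Search"}),
    "Search Score Threshold": ALL_OPERATIONS,
}

def visible_fields(operation: str) -> list[str]:
    """List the fields shown for one operation, in the documented order."""
    if operation not in ALL_OPERATIONS:
        raise ValueError(f"unknown operation: {operation!r}")
    return [field for field, ops in VISIBILITY.items() if operation in ops]

print(visible_fields("Search"))
```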
Outputs
- Retriever: A retriever object that can be used by other components to fetch relevant documents.
- Results: A list of documents that match the search query, each with a similarity score.
- Vector Store: The underlying FAISS vector store object, useful for advanced customizations.
Usage Example
Adding Documents
- Set Operation to Add.
- Connect your embedding model to Embedding.
- Drag your document data into Ingest Data.
- Choose a folder name in Persist Directory (e.g., my_docs).
- Click Run. The documents are now stored in faiss_db/my_docs.
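The Add step saves the index to disk so that later Search or Retriever runs can reload it. The sketch below shows only that folder layout and round trip, under stated assumptions. It writes JSON, whereas a real FAISS index is a binary file (LangChain's FAISS wrapper, for instance, writes `index.faiss` plus `index.pkl`). `save_index` and `load_index` are hypothetical helpers. A relative path such as faiss_db/my_docs resolves against the working directory of the running process.

```python
import json
import tempfile
from pathlib import Path

def save_index(persist_dir: Path, index_name: str, entries: list[tuple[str, list[float]]]) -> Path:
    """Write (text, vector) pairs to <persist_dir>/<index_name>.json.
    A relative persist_dir resolves against the current working directory."""
    folder = Path(persist_dir)
    folder.mkdir(parents=True, exist_ok=True)  # create nested folders if missing
    path = folder / f"{index_name}.json"
    path.write_text(json.dumps([{"text": t, "vector": v} for t, v in entries]))
    return path

def load_index(persist_dir: Path, index_name: str) -> list[tuple[str, list[float]]]:
    """Read back the pairs written by save_index."""
    path = Path(persist_dir) / f"{index_name}.json"
    return [(e["text"], e["vector"]) for e in json.loads(path.read_text())]

# Demo in a throwaway directory, mirroring a faiss_db/my_docs layout.
entries = [("Neural networks learn from data", [0.9, 0.1, 0.0])]
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "faiss_db" / "my_docs"
    saved_name = save_index(target, "index", entries).name
    restored = load_index(target, "index")
print(saved_name, restored == entries)
```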
Searching
- Set Operation to Search.
- Connect the same embedding model to Embedding.
- Enter a query in Search Query (e.g., “machine learning”).
- Optionally adjust Number of Results and Search Score Threshold.
- Click Run. The component returns the top matching documents in Results.
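Number of Results and Search Score Threshold act as two filters on the ranked matches. The sketch below shows that logic, assuming scores where higher means more similar. If an index reports distances instead, lower is better and the comparison flips. The function and the sample scores are hypothetical.

```python
from typing import Optional

def top_k_with_threshold(scored: list[tuple[str, float]], k: int,
                         threshold: Optional[float] = None) -> list[tuple[str, float]]:
    """Keep the k best (document, similarity) pairs, dropping any scored below threshold."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    if threshold is not None:
        ranked = [pair for pair in ranked if pair[1] >= threshold]
    return ranked[:k]

# Hypothetical similarity scores for the query "machine learning".
scored = [
    ("intro to ML", 0.91),
    ("deep learning notes", 0.84),
    ("baking tips", 0.12),
    ("ML glossary", 0.77),
]
print(top_k_with_threshold(scored, k=3))                 # three best matches
print(top_k_with_threshold(scored, k=3, threshold=0.8))  # only two clear 0.8
```

Raising the threshold can return fewer than k results, or none at all, which matches the tip about setting it carefully.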
Related Components
- Embeddings – Create the vector representations that FAISS uses.
- Retriever – Use the retriever output from FAISS in downstream components.
- Vector Store – General interface for storing and retrieving vectors; FAISS is one implementation.
Tips and Best Practices
- Keep each Persist Directory small and organized; large indexes take longer to load.
- Use a consistent embedding model across all operations to ensure comparable results.
- If you only need to retrieve documents from an existing index, without storing new ones, use the Retriever operation, which does not ingest or embed new data.
- Set Search Score Threshold carefully; a high value may return fewer results but with higher relevance.
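One way to enforce a consistent embedding model is to record which model built the index and check it before searching. The sketch below is hypothetical: `IndexInfo` and `compatibility_error` are illustrative names, and the component may not store this metadata at all.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class IndexInfo:
    """Hypothetical metadata saved alongside an index when documents are added."""
    model_name: str
    dimension: int

def compatibility_error(info: IndexInfo, model_name: str, query_vec: list[float]) -> Optional[str]:
    """Return None if the query may be searched against the index, else a reason.
    A dimension mismatch is always fatal; two models with equal dimensions still
    produce incomparable vectors, which only the model-name check catches."""
    if len(query_vec) != info.dimension:
        return f"query has {len(query_vec)} dimensions, index expects {info.dimension}"
    if model_name != info.model_name:
        return f"index built with {info.model_name!r}, query embedded with {model_name!r}"
    return None

info = IndexInfo(model_name="model-a", dimension=3)  # hypothetical model name
print(compatibility_error(info, "model-a", [0.1, 0.2, 0.3]))  # None
print(compatibility_error(info, "model-b", [0.1, 0.2, 0.3]))
```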
Security Considerations
- Allow Dangerous Deserialization is set to True by default. Leave it enabled only if you trust the source of the data: it allows arbitrary pickle files to be loaded, and loading a pickle file can execute code.
- Store the FAISS index in a secure location if it contains sensitive information.
- Regularly back up the Persist Directory to prevent data loss.
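The deserialization risk is concrete: unpickling can call any function the file names. The sketch below builds a harmless payload that runs `eval` during loading. It then blocks the payload with a restricted unpickler, following the "Restricting Globals" approach in the Python `pickle` documentation. The field name matches the `allow_dangerous_deserialization` argument of LangChain's `FAISS.load_local`, which suggests the component loads indexes that way, but that is an assumption.

```python
import io
import pickle

class Payload:
    """Stands in for a tampered index file. When unpickled, pickle calls the
    function returned by __reduce__; eval("6 * 7") is harmless here, but a
    malicious file could name os.system instead."""
    def __reduce__(self):
        return (eval, ("6 * 7",))

data = pickle.dumps(Payload())
print(pickle.loads(data))  # 42: eval ran while loading

class RefuseGlobalsUnpickler(pickle.Unpickler):
    """Refuse to resolve any global, so no function or class can be called
    during loading. Plain containers and numbers still load."""
    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"blocked global {module}.{name}")

def safe_loads(blob: bytes):
    return RefuseGlobalsUnpickler(io.BytesIO(blob)).load()

print(safe_loads(pickle.dumps({"a": [1, 2]})))  # {'a': [1, 2]}
try:
    safe_loads(data)
except pickle.UnpicklingError as err:
    print("refused:", err)
```

This unpickler also refuses the class instances a real index pickle contains, so it demonstrates the mechanism rather than serving as a drop-in fix. In practice, the safe option is to load only index folders you created or otherwise trust.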