Retrieval QA History
Retrieval QA History is a simple way to turn your data into a smart question‑answering assistant.
You give it a question; it looks up the most relevant information in your data, asks a language model to craft an answer, and remembers the chat so you can keep asking follow‑up questions.
How it Works
- Ask a question – The user types a question into the dashboard.
- Find the right data – A Retriever searches your data stores (files, databases, APIs, etc.) and returns the most relevant documents.
- Ask the language model – A Language Model (LLM) is fed the retrieved documents and the conversation history.
- Return an answer – The LLM produces a concise answer.
- Keep the chat alive – The component stores the conversation in a Memory so later questions can reference earlier context.
- Optional extras:
  - Stream – Show the answer as it is being generated.
  - Return Source Documents – Attach the documents that were used to build the answer, so you can see where the information came from.
All of this happens inside the Nappai dashboard; no external API keys are required beyond those you already use for your LLM and retriever.
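To make the flow concrete, here is a minimal Python sketch of the same loop. Everything in it (the toy retriever, the placeholder LLM, and the `answer` helper) is a hypothetical stand-in for what the component wires together internally, not the component's actual code.

```python
# Minimal sketch of the retrieval-QA-with-history loop. All names here
# (SimpleRetriever, fake_llm, answer) are hypothetical stand-ins.

SYSTEM_PROMPT = (
    "You are an assistant for question-answering tasks. Use the following "
    "pieces of retrieved context to answer the question. If you don't know "
    "the answer, say that you don't know. Use three sentences maximum and "
    "keep the answer concise."
)

class SimpleRetriever:
    """Toy keyword retriever: scores documents by word overlap with the query."""
    def __init__(self, documents):
        self.documents = documents

    def retrieve(self, query, k=3):
        words = set(query.lower().split())
        scored = sorted(
            ((len(words & set(doc.lower().split())), doc) for doc in self.documents),
            reverse=True,
        )
        return [doc for score, doc in scored[:k] if score > 0]

def fake_llm(prompt: str) -> str:
    """Placeholder for a real language-model call."""
    return "(model answer based on the prompt)"

def answer(llm, retriever, question, history, top_messages=20):
    """One QA turn: retrieve context, build the prompt, call the LLM, store the turn."""
    context = "\n".join(retriever.retrieve(question))
    recent = history[-top_messages:]  # the "Memory top message" limit
    prompt = (
        f"{SYSTEM_PROMPT}\n\nContext:\n{context}\n\n"
        + "\n".join(recent)
        + f"\nUser: {question}\nAssistant:"
    )
    reply = llm(prompt)
    history.extend([f"User: {question}", f"Assistant: {reply}"])  # keep the chat alive
    return reply

docs = ["Nappai supports streaming answers.", "Retrievers search your data stores."]
history = []
print(answer(fake_llm, SimpleRetriever(docs), "Does Nappai support streaming?", history))
```

Because the history list is updated on every turn, a follow‑up question like "Can I turn that off?" is sent to the model together with the earlier exchange, which is what lets the assistant resolve "that".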
Inputs
- Model: The language model that will generate the answer.
- External Memory: Retrieves messages from an external memory. If left empty, the component uses the Nappai tables.
- Retriever: The component that finds the most relevant documents for a question.
- Input: The question or prompt you want the assistant to answer.
- Memory top message: How many recent messages to keep in the conversation history (default 20).
- Return Source Documents: If checked, the answer will include links to the documents that were used.
- Stream: If checked, the answer will be streamed back to the user in real time.
- System Prompt: The instruction given to the language model before it answers.
Default value: "You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, say that you don't know. Use three sentences maximum and keep the answer concise."
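If you prefer to think of these inputs as a single configuration object, it might look like the sketch below. The field names are illustrative and do not reflect the component's actual schema.

```python
# Hypothetical configuration mirroring the inputs above; the key names are
# illustrative, not the component's real schema.
config = {
    "model": "gpt-4",                    # Model: the LLM that generates the answer
    "external_memory": None,             # None falls back to the Nappai tables
    "retriever": "company_docs_store",   # Retriever: where relevant documents come from
    "input": "What is our refund policy?",
    "memory_top_message": 20,            # recent messages kept in the history
    "return_source_documents": True,     # attach source references to the answer
    "stream": False,                     # stream the answer as it is generated
    "system_prompt": "You are an assistant for question-answering tasks. ...",  # default, truncated
}
```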
Outputs
- Text: A Message object containing the assistant’s answer (and optionally the source references).
- Runnable: A Runnable that can be reused or chained with other components.
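As a rough, hypothetical picture of what the Text output carries, consider the dataclass below; the `Message` type and its attribute names are assumptions for illustration, not the component's real types.

```python
from dataclasses import dataclass, field

# Hypothetical shape of the Text output; not the component's actual class.
@dataclass
class Message:
    content: str                                   # the assistant's answer
    sources: list = field(default_factory=list)    # filled when Return Source Documents is on

msg = Message("Our refund window is 30 days.", sources=["policies/refunds.md"])
print(msg.content)
for src in msg.sources:
    print("source:", src)
```

The Runnable output is the chain itself, so downstream components can invoke it again or compose it with other steps instead of rebuilding it.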
Usage Example
- Add the component to your flow.
- Connect a Retriever (e.g., a vector store that holds your company documents).
- Connect a Language Model (e.g., OpenAI GPT‑4).
- Optionally connect an External Memory if you want to pull conversation history from a custom store.
- Set “Stream” to true if you want the answer to appear as it is generated (see the streaming sketch below).
- Set “Return Source Documents” if you want to see which documents were used.
- Enter a question in the “Input” field or feed it from another component.
- Run the flow – the assistant will answer and keep the chat history for future turns.
Tip: Use a short “System Prompt” to keep answers concise, or customize it to match your brand voice.
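Streaming (step 5) can be pictured as consuming the answer chunk by chunk as it is produced. The generator below is purely illustrative; it mimics the effect you see in the dashboard, not the actual mechanism.

```python
import time

# Illustrative streaming: yield the answer in small chunks, the way the
# dashboard displays it when "Stream" is enabled.
def stream_answer(text, chunk_size=8):
    for i in range(0, len(text), chunk_size):
        yield text[i:i + chunk_size]
        time.sleep(0.05)  # simulate generation latency

for chunk in stream_answer("Yes, streaming shows the answer as it is generated."):
    print(chunk, end="", flush=True)
print()
```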
Related Components
- Retriever – Finds relevant documents from your data.
- Language Model – Generates natural‑language answers.
- External Memory – Stores and retrieves chat history from a custom source.
Tips and Best Practices
- Keep Memory top message low (e.g., 10–20) to avoid overloading the LLM with too much context (see the sketch after this list).
- Enable Stream for a smoother user experience, especially for long answers.
- Turn on Return Source Documents when you need auditability or want to show users where the answer came from.
- If you have sensitive data, make sure the Retriever and Memory are properly secured and access‑controlled.
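The first tip amounts to passing only the tail of the conversation to the LLM. A minimal sketch, assuming the history is a plain list of messages:

```python
# Hypothetical helper showing what a "Memory top message" of 20 means:
# only the most recent messages are included in the prompt.
def trim_history(history, top_messages=20):
    return history[-top_messages:]

history = [f"message {i}" for i in range(1, 51)]   # 50 stored messages
recent = trim_history(history, top_messages=20)    # only messages 31..50 reach the LLM
print(len(recent), recent[0], recent[-1])
```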
Security Considerations
- The data retrieved by the Retriever may contain confidential information.
- Ensure that only authorized users can trigger the component and that the LLM’s output is reviewed if it will be shared externally.
- If using an External Memory, verify that it complies with your organization’s data‑handling policies.