Recursive Character Text Splitter
This component in Nappai helps you break down large blocks of text into smaller, easier-to-handle pieces. Imagine trying to read a whole novel at once – it’s much easier to read it chapter by chapter. This component does the same for your text data, making it ready for other Nappai tools.
Relationship with Nappai’s Automation System
This component works within Nappai’s automation workflow. It takes text as input, splits it, and then passes the smaller chunks to other components for further processing, such as analysis or summarization.
Inputs
- Chunk Size: This sets the maximum length (in characters) of each smaller text chunk. The default is 1000 characters. Think of this as the length of each “chapter.”
- Chunk Overlap: This determines how many characters from the end of one chunk are repeated at the beginning of the next. The default is 200 characters. This helps maintain context between chunks. It’s like rereading the last few lines of the previous chapter before starting the next one. Keep this value smaller than the Chunk Size.
- Input: This is the large text you want to split. It can come from various sources within Nappai.
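- Separators: These are the character sequences at which the component prefers to split, so that chunks break at natural boundaries. Examples include spaces, new lines, or specific punctuation. The separators are tried in order, and a finer one is used only when a piece is still longer than the Chunk Size. If left blank, the component uses new lines, spaces, and finally the empty string (which allows a split between any two characters) as default separators.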
Outputs
The component produces a list of smaller text chunks. These chunks are then available for use by other components in your Nappai workflow, such as the summarizer or sentiment analyzer.
Usage Example
Let’s say you have a long news article as input. You want to summarize it using Nappai. First, you’d use the “Recursive Character Text Splitter” to break the article into smaller sections. Then, you’d feed these sections into the “Summarizer” component to get a summary of each section. Finally, you could combine those summaries to get an overall summary of the news article.
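The flow described above (split, summarize each section, combine) can be sketched in a few lines of Python. Everything here is hypothetical and stands in for the real Nappai components: `split_fixed` is a simplified fixed-window splitter rather than the recursive one, and the `summarize` callable is a placeholder for whatever the “Summarizer” component does.

```python
from typing import Callable, List


def split_fixed(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Placeholder splitter: fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


def summarize_article(
    article: str,
    summarize: Callable[[str], str],
    chunk_size: int = 1000,
    chunk_overlap: int = 200,
) -> str:
    """Split the article, summarize each section, then join the section summaries."""
    sections = split_fixed(article, chunk_size, chunk_overlap)
    section_summaries = [summarize(section) for section in sections]
    return " ".join(section_summaries)


if __name__ == "__main__":
    # Stand-in summarizer: keep the first two characters of each section.
    print(summarize_article("abcdefghij" * 3, lambda s: s[:2], chunk_size=10, chunk_overlap=2))
    # ab ij gh ef
```

In a real workflow the combined section summaries could themselves be passed through the summarizer once more to produce a single overall summary.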
Templates
This component is used in the ‘Eurocup 2024’ template.
Related Components
- Summarizer: This component summarizes large bodies of text. The output of the “Recursive Character Text Splitter” is well suited as input for it.
- Categorizer: This component extracts categories from data. Splitting text beforehand can improve its accuracy.
- Entities extraction: This component extracts key information from text. Smaller chunks of text make this process more efficient.
- Many other components: The splitter is widely used as a preprocessing step for Nappai components that work with text.
Tips and Best Practices
- Start with the default values for “Chunk Size” and “Chunk Overlap” and adjust them as needed based on your text and the downstream components you are using.
- Experiment with different separators to optimize the splitting for your specific text.
- Consider the context of your text when choosing the “Chunk Overlap.” A larger overlap helps preserve context across chunk boundaries, but it also repeats more text, so downstream components process more characters in total.
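When adjusting these values, it can help to estimate how many chunks a given setting will produce. The sketch below is a rough approximation only: the helper name `estimate_chunks` is hypothetical, and it assumes every chunk is filled to the full Chunk Size with the full overlap. The real splitter cuts at separators, so actual counts will differ.

```python
import math


def estimate_chunks(text_length: int, chunk_size: int = 1000, chunk_overlap: int = 200) -> int:
    """Approximate chunk count, assuming full-size chunks with full overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if text_length <= chunk_size:
        return 1 if text_length else 0
    # Each chunk after the first adds (chunk_size - chunk_overlap) new characters.
    step = chunk_size - chunk_overlap
    return math.ceil((text_length - chunk_overlap) / step)


if __name__ == "__main__":
    for overlap in (0, 200, 500):
        print(overlap, estimate_chunks(10_000, chunk_size=1000, chunk_overlap=overlap))
    # 0 10
    # 200 13
    # 500 19
```

For a 10,000-character text, raising the overlap from 200 to 500 characters increases the estimated number of chunks from 13 to 19, which is the cost to weigh against the extra context.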
Security Considerations
This component adds no security controls of its own: it only splits the text it receives and passes the chunks on. Security therefore depends on the input data and on the downstream components used in your workflow.