Split Text
The Split Text component breaks text into smaller pieces according to two criteria: a maximum number of characters per piece and a separator character. This is useful for dividing large blocks of text into manageable chunks that can be processed or analyzed separately.
Relationship with CharacterTextSplitter
The Split Text component uses CharacterTextSplitter to perform the split. This makes the chunk size, chunk overlap, and separator configurable, so the text is divided according to your settings.
Inputs
- Data Inputs: The text data you want to split into smaller pieces. You can input multiple pieces of text at once.
- Chunk Overlap: The number of characters shared between consecutive pieces of text, so that context carries over from one piece to the next. Default is 200 characters.
- Chunk Size: The maximum number of characters allowed in each piece of text. Default is 1000 characters.
- Separator: The character at which the text is divided. By default, this is the newline character (“\n”).
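To make the interaction between these inputs concrete, here is a minimal Python sketch of separator-based splitting with a size limit and overlap. It is a simplified illustration, not the component's actual code or the CharacterTextSplitter implementation; the function name split_text and its exact merging rules are assumptions made for this example.

```python
def split_text(text, chunk_size=1000, chunk_overlap=200, separator="\n"):
    """Illustrative splitter (hypothetical helper, not the real component)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    # Split at the separator and ignore empty pieces.
    pieces = [p for p in text.split(separator) if p]

    chunks = []
    current = []     # pieces in the chunk being built
    current_len = 0  # always equals len(separator.join(current))

    for piece in pieces:
        added = len(piece) + (len(separator) if current else 0)
        if current and current_len + added > chunk_size:
            # The chunk is full: emit it.
            chunks.append(separator.join(current))
            # Keep only a tail of at most chunk_overlap characters that
            # still leaves room for the next piece.
            while current and (
                current_len > chunk_overlap
                or current_len + len(separator) + len(piece) > chunk_size
            ):
                removed = current.pop(0)
                current_len -= len(removed) + (len(separator) if current else 0)
            added = len(piece) + (len(separator) if current else 0)
        current.append(piece)
        current_len += added

    if current:
        chunks.append(separator.join(current))
    return chunks


sample = "aaaa\nbbbb\ncccc\ndddd"
print(split_text(sample, chunk_size=9, chunk_overlap=4))
print(split_text(sample, chunk_size=9, chunk_overlap=0))
```

In this sketch, the text is only ever cut at the separator, so a single piece longer than the chunk size is kept whole and produces an oversized chunk. The overlap is also made of whole pieces, which means the actual number of shared characters can be smaller than the configured value.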
Outputs
The component produces a list of text pieces called “Chunks.” These chunks are the result of the text splitting process and can be used in other processes or components within the Nappai system.
Usage Example
Imagine you have a long document that you need to analyze in smaller sections. You can use the Split Text component to divide the document into chunks of 1000 characters each, with a 200-character overlap, using a new line as the separator. This makes it easier to process each section individually.
Templates
No templates currently use this component. However, it can be configured and integrated into various workflows within the Nappai system.
Related Components
- Text Analyzer: This component can be used to analyze the text chunks produced by the Split Text component.
- Data Merger: After processing, you can use this component to merge the chunks back into a cohesive document.
Tips and Best Practices
- Use a separator that naturally occurs in your text to avoid splitting in the middle of important information.
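- Adjust the chunk size and overlap to balance processing efficiency against context preservation: a larger overlap carries more context between chunks but produces more chunks and more duplicated text.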
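As a rough guide to this trade-off, the following Python sketch estimates how many chunks a text of a given length yields. It assumes every chunk is filled to exactly the chunk size, which separator-based splitting will not achieve in practice, so treat the numbers as an approximation. The function name estimate_chunks is hypothetical.

```python
import math


def estimate_chunks(text_length, chunk_size=1000, chunk_overlap=200):
    """Approximate chunk count, assuming every chunk is completely full."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in the range [0, chunk_size)")
    if text_length <= 0:
        return 0
    if text_length <= chunk_size:
        return 1
    # Each chunk after the first advances by this many new characters.
    stride = chunk_size - chunk_overlap
    return 1 + math.ceil((text_length - chunk_size) / stride)


# Compare a few overlap settings for a 10,000-character document.
for overlap in (0, 200, 500):
    print(overlap, estimate_chunks(10_000, chunk_size=1000, chunk_overlap=overlap))
```

Under these assumptions, raising the overlap from 0 to 200 characters increases the estimate for a 10,000-character document from 10 chunks to 13.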
Security Considerations
Ensure that any sensitive information within the text is handled appropriately, especially if the chunks are shared or processed in external systems.