Split Text
This component in Nappai helps you divide large blocks of text into smaller, more easily processed chunks. This is useful when working with very long documents or when your other Nappai components work better with smaller pieces of information.
Relationship with CharacterTextSplitter
This component uses a text-splitting technique called CharacterTextSplitter. It measures chunks by their number of characters, so the resulting chunks are roughly the same size.
Inputs
- Data Inputs: This is the large text you want to split into smaller pieces. It can be text from a document, a message, or any other text source within Nappai.
- Chunk Overlap: This setting controls how many characters are shared between consecutive chunks. A higher overlap helps preserve context across chunk boundaries, but it repeats more text, so the same input produces more chunks. The default is 200 characters.
- Chunk Size: This sets the maximum number of characters in each resulting chunk. The default is 1000 characters.
- Separator: This specifies the character(s) at which the text is split. The default is a new line character (\n), meaning the text is split at line breaks. You can change this if needed.
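To make the interaction between these three settings concrete, here is a minimal sketch in plain Python. It is not Nappai's actual implementation: the function name split_text and its packing logic are assumptions made for illustration, using the defaults listed above.

```python
def split_text(text, chunk_size=1000, chunk_overlap=200, separator="\n"):
    """Illustrative sketch: split on `separator`, then pack pieces into chunks."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    def length(parts):
        # Length of the chunk these pieces would form once re-joined.
        return len(separator.join(parts))

    pieces = [p for p in text.split(separator) if p]
    chunks = []
    current = []  # pieces of the chunk being built

    for piece in pieces:
        if current and length(current + [piece]) > chunk_size:
            chunks.append(separator.join(current))
            # Carry trailing pieces over as overlap: drop from the front until
            # the carried text fits in chunk_overlap and leaves room for `piece`.
            while current and (
                length(current) > chunk_overlap
                or length(current + [piece]) > chunk_size
            ):
                current.pop(0)
        # Note: a single piece longer than chunk_size is kept whole.
        current.append(piece)

    if current:
        chunks.append(separator.join(current))
    return chunks


sample = "aaaa\nbbbb\ncccc\ndddd"
print(split_text(sample, chunk_size=10, chunk_overlap=4))
# ['aaaa\nbbbb', 'bbbb\ncccc', 'cccc\ndddd']
```

In this sketch the overlap is made of whole pieces, so the shared text between two chunks can be shorter than the Chunk Overlap value, and setting the overlap to 0 yields chunks with no repeated text.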
Outputs
The component produces a list of smaller text chunks. These chunks are ready to be used by other components in your Nappai workflow, such as those that analyze text, summarize information, or translate languages. Each chunk is a separate piece of data that can be processed independently.
Usage Example
Imagine you have a long legal document you want to summarize. You can use the “Split Text” component to break the document into smaller sections. Then, you can feed these sections into the “Summarizer” component to get a summary of each section, and finally combine those summaries for a complete overview.
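The split, summarize, and combine flow described here can be sketched in a few lines of Python. Everything below is hypothetical: summarize_document and first_sentence are illustrative names, and first_sentence is only a placeholder standing in for whatever the “Summarizer” component does.

```python
from typing import Callable, List


def summarize_document(chunks: List[str], summarize: Callable[[str], str]) -> str:
    """Summarize each chunk independently, then join the partial summaries."""
    partial_summaries = [summarize(chunk) for chunk in chunks]
    return " ".join(partial_summaries)


def first_sentence(text: str) -> str:
    # Placeholder "summarizer" (an assumption): keep text up to the first period.
    return text.split(".")[0].strip() + "."


chunks = [
    "Clause one covers payment. Details follow.",
    "Clause two covers termination. More details.",
]
print(summarize_document(chunks, first_sentence))
# Clause one covers payment. Clause two covers termination.
```

Because each chunk is summarized on its own, the per-chunk step does not depend on the order or content of the other chunks.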
Templates
This component is used in the ‘Vector Store RAG’ template.
Related Components
- Summarizer: Use this component to summarize the smaller text chunks created by “Split Text”.
- Categorizer: This component can help categorize the information within each chunk.
- Entities extraction: This component can extract key information from each chunk.
- Semantic Text Splitter: An alternative component that splits text based on meaning, rather than character count.
Tips and Best Practices
- Start with the default settings for Chunk Size and Chunk Overlap. Adjust these values if the resulting chunks are too large or too small for your needs.
- Consider the context of your data when choosing the Separator. A new line (\n) is usually a good default, but you might need a different separator depending on your data format.
- Experiment with different Chunk Overlap values to find the optimal balance between chunk size and context preservation.
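When experimenting with these values, a rough estimate of how many chunks to expect can help. The sketch below assumes idealized fixed-size windows that advance by Chunk Size minus Chunk Overlap characters; the real component also respects the Separator, so actual counts will differ. The helper name estimate_chunk_count is hypothetical.

```python
def estimate_chunk_count(text_length, chunk_size=1000, chunk_overlap=200):
    """Rough chunk count for fixed-size windows (ignores the separator)."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    if text_length <= chunk_size:
        return 1
    stride = chunk_size - chunk_overlap  # new characters covered per chunk
    # Ceiling division without floating point.
    return -(-(text_length - chunk_overlap) // stride)


for overlap in (0, 200, 500):
    print(overlap, estimate_chunk_count(10_000, chunk_overlap=overlap))
# 0 10
# 200 13
# 500 19
```

Under this assumption, raising the overlap from 200 to 500 on a 10,000-character text increases the estimate from 13 to 19 chunks, which is the cost paid for the extra shared context.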
Security Considerations
No specific security considerations apply to this component. The security of your data depends on the security of the data sources and other components in your Nappai workflow.