Split Text
The Split Text component takes large blocks of text and divides them into smaller, more manageable pieces. This makes it easier to feed the text into other components, such as AI models or analysis tools, that work best with shorter inputs.
How it Works
When you drop a piece of data into the Data Inputs field, the component reads the text and splits it into chunks.
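- If you set Chunk Size to a positive number, the component joins separator-delimited segments into chunks of up to that many characters. A single segment that is longer than the limit and contains no separator is kept whole, so that chunk can exceed Chunk Size.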
- Chunk Overlap lets you specify how many characters should be repeated between consecutive chunks, which helps preserve context when the text is processed later.
- The Separator field tells the component where it can safely cut the text (for example, on a newline or a period). If you leave it as the default newline, the component will split on line breaks.
- If you set Chunk Size to 0, the component ignores the size limit and simply splits the text wherever the separator appears.
Internally, the component uses LangChain’s CharacterTextSplitter to perform the split. The result is a list of new Data objects, each containing one chunk of the original text.
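To make the size, overlap, and separator rules concrete, here is a simplified pure-Python sketch of separator-based splitting. It is not the component's code or LangChain's source: the function name split_text is hypothetical, and details such as whitespace stripping are skipped. It cuts the text at the separator, greedily joins segments back together until the next one would push the chunk past Chunk Size, and then carries trailing segments totaling at most Chunk Overlap characters into the next chunk.

```python
from __future__ import annotations


def split_text(text: str, chunk_size: int, chunk_overlap: int,
               separator: str = "\n") -> list[str]:
    """Hypothetical sketch of separator-based chunking with overlap."""
    # Cut at every separator; an empty separator falls back to single characters.
    pieces = text.split(separator) if separator else list(text)
    pieces = [p for p in pieces if p]  # drop empty segments (e.g. blank lines)
    if chunk_size <= 0:
        # Size limit disabled: one chunk per separator-delimited segment.
        return pieces

    sep_len = len(separator)
    chunks: list[str] = []
    window: list[str] = []  # segments of the chunk currently being built
    window_len = 0          # always equals len(separator.join(window))

    for piece in pieces:
        added = len(piece) + (sep_len if window else 0)
        if window and window_len + added > chunk_size:
            # The next segment would overflow: emit the current chunk.
            chunks.append(separator.join(window))
            # Drop leading segments until what remains fits the overlap
            # budget and leaves room for the new segment.
            while window and (window_len > chunk_overlap
                              or window_len + sep_len + len(piece) > chunk_size):
                window_len -= len(window[0]) + (sep_len if len(window) > 1 else 0)
                window.pop(0)
            added = len(piece) + (sep_len if window else 0)
        # A segment longer than chunk_size is still appended whole.
        window.append(piece)
        window_len += added

    if window:
        chunks.append(separator.join(window))
    return chunks
```

For example, split_text("aaaa\nbbbb\ncccc\ndddd", chunk_size=9, chunk_overlap=4) returns ["aaaa\nbbbb", "bbbb\ncccc", "cccc\ndddd"]: each chunk repeats the last segment of the one before it. With chunk_size=0 the same call returns the four segments separately.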
Inputs
- Data Inputs: The data you want to split. Provide one or more Data objects that contain the text to be processed.
- Chunk Overlap: Number of characters to overlap between chunks. A higher overlap keeps more context in each piece.
- Chunk Size: The maximum number of characters in each chunk. Set to 0 to split only on the separator.
- Separator: The character or string that the component uses to split the text. The default is a newline (\n).
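The inputs can be summarized as a small settings object. This is a hypothetical sketch (SplitTextSettings is not part of Nappai), and its validation rules are assumptions rather than the component's documented behavior; the overlap check mirrors the fact that LangChain's text splitters reject a chunk overlap larger than the chunk size.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SplitTextSettings:
    """Hypothetical mirror of the Split Text inputs; names are illustrative."""

    chunk_size: int        # maximum characters per chunk; 0 = split only on the separator
    chunk_overlap: int     # characters repeated between consecutive chunks
    separator: str = "\n"  # documented default: a newline

    def __post_init__(self) -> None:
        # Assumed validation rules, not taken from the component itself.
        if self.chunk_size < 0 or self.chunk_overlap < 0:
            raise ValueError("chunk_size and chunk_overlap must be non-negative")
        if not self.separator:
            raise ValueError("separator must be a non-empty string")
        if self.chunk_size > 0 and self.chunk_overlap > self.chunk_size:
            raise ValueError("chunk_overlap must not exceed chunk_size")
```

SplitTextSettings(chunk_size=1000, chunk_overlap=200) is accepted with the newline separator, while SplitTextSettings(chunk_size=100, chunk_overlap=200) raises ValueError.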
Outputs
- Chunks: A list of Data objects, each containing a chunk of the original text. These chunks can be fed into other components for further processing, such as summarization, sentiment analysis, or storage.
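The shape of the output can be sketched as one Data object per chunk. In this hypothetical example, Data is a stand-in class rather than Nappai's own, and copying the source's metadata plus a chunk_index field into each chunk is an assumption made for illustration. The splitter is passed in as a function so the sketch stays independent of any particular splitting strategy.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Data:
    """Hypothetical stand-in for a Nappai Data object: text plus metadata."""

    text: str
    metadata: dict = field(default_factory=dict)


def build_chunks(inputs: list[Data], splitter: Callable[[str], list[str]]) -> list[Data]:
    """Split every input's text and wrap each piece in its own Data object."""
    chunks: list[Data] = []
    for source in inputs:
        for index, piece in enumerate(splitter(source.text)):
            # Assumption: each chunk keeps its source's metadata plus its position.
            chunks.append(Data(text=piece, metadata={**source.metadata, "chunk_index": index}))
    return chunks
```

For instance, build_chunks([Data("a\nb", {"source": "notes.txt"})], lambda t: t.split("\n")) returns two Data objects with the texts "a" and "b", each tagged with "source": "notes.txt".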
Usage Example
- Add the Split Text component to your workflow.
- Connect a data source (e.g., a “Read File” component) to the Data Inputs field.
- Set Chunk Size to 1000 and Chunk Overlap to 200.
- Leave Separator as the default newline, or change it to a period if you want sentence‑level splits.
- Connect the Chunks output to the next component, such as “Analyze Text” or “Store in Database”.
This setup takes a long article, breaks it into pieces of up to 1,000 characters that overlap by up to 200 characters, and passes each piece to the next step in your workflow.
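The arithmetic behind these settings is easiest to see with a plain fixed-window splitter. This is a different, simpler technique than the component's separator-aware splitting, and the function name window_spans is hypothetical: each new chunk starts Chunk Size minus Chunk Overlap characters (here 800) after the previous one. In the real component, chunk boundaries fall on separators, so actual chunk lengths and counts will differ.

```python
from __future__ import annotations


def window_spans(text_length: int, chunk_size: int, chunk_overlap: int) -> list[tuple[int, int]]:
    """Return (start, end) character offsets for a fixed-window split with overlap."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    if text_length == 0:
        return []
    step = chunk_size - chunk_overlap  # how far each new window advances
    spans: list[tuple[int, int]] = []
    start = 0
    while True:
        end = min(start + chunk_size, text_length)
        spans.append((start, end))
        if end >= text_length:  # the last window reached the end of the text
            break
        start += step
    return spans
```

For a 5,000-character text, window_spans(5000, 1000, 200) returns six spans, from (0, 1000) and (800, 1800) through (4000, 5000); consecutive spans share 200 characters.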
Related Components
- Merge Text – Combine multiple pieces of text back into a single document.
- Count Words – Count the number of words in each chunk.
- Summarize Text – Generate a summary for each chunk.
Tips and Best Practices
- Choose a Chunk Size that matches your downstream component. Smaller chunks are quicker to process individually but each carries less context; larger chunks preserve more context but take longer to process and may exceed a model's input limit.
- Use Chunk Overlap when context matters (e.g., for language models that need to see preceding text).
- Select a Separator that matches your data format. For plain text, a newline works well; for CSV or JSON, you might use a comma or a custom delimiter.
- Test with a sample of your data before running the full workflow to ensure the splits look correct.
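When testing on a sample, a quick summary of the resulting chunks helps spot problems such as a separator that never matches, which leaves the whole text in one oversized chunk. The helper below is a hypothetical sketch that works on the chunk texts as plain strings; report_chunks is not a Nappai function.

```python
from __future__ import annotations


def report_chunks(chunks: list[str], chunk_size: int) -> dict:
    """Summarize chunk count, longest chunk, and indices of chunks over the limit."""
    lengths = [len(c) for c in chunks]
    # With chunk_size 0 there is no limit, so nothing counts as oversized.
    oversized = [i for i, n in enumerate(lengths) if chunk_size > 0 and n > chunk_size]
    return {"count": len(chunks), "max_length": max(lengths, default=0), "oversized": oversized}
```

For example, report_chunks(["abc", "abcdefg"], 5) returns {"count": 2, "max_length": 7, "oversized": [1]}, pointing at the second chunk as the one to inspect.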
Security Considerations
The Split Text component processes data locally and does not send any information outside of the Nappai system. Ensure that any sensitive data you feed into the component is handled according to your organization’s privacy policies.