CharacterTextSplitter
This component in Nappai helps you break down large blocks of text into smaller, more easily processed chunks. Think of it like cutting a long rope into shorter, usable lengths. This is especially helpful when working with lots of text data, making it easier for Nappai’s AI to analyze and understand.
Relationship with Nappai’s AI
This component prepares text data for Nappai’s AI assistants. By splitting large text inputs into smaller chunks, it ensures that the AI can process the information efficiently and accurately, leading to better results from other components in your workflow.
Inputs
- Chunk Size: This determines the maximum length (in characters) of each smaller text piece created. The default is 1000 characters. Increase this value for larger chunks, decrease it for smaller ones.
- Chunk Overlap: This sets how many characters are shared between consecutive chunks. The default is 200 characters. Overlap helps ensure that context is maintained between chunks.
- Input: This is where you provide the large text you want to split. You can input text directly or feed it from other Nappai components.
- Separator: This lets you specify the characters at which the text is split. For example, you could use a period (.) or two new lines (\n\n). If left blank, the component will split at double new lines by default.
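To make the interaction between these inputs concrete, here is a minimal Python sketch of separator-based splitting with a size limit and overlap. It is an illustration only and not Nappai's actual implementation: the function name split_text and its merging strategy are assumptions, and only the defaults (1000, 200, and the double new line) come from the descriptions above.

```python
def split_text(text, chunk_size=1000, chunk_overlap=200, separator="\n\n"):
    """Illustrative splitter (hypothetical, not Nappai's code)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    # A blank separator falls back to double new lines, as described above.
    separator = separator or "\n\n"
    pieces = [p for p in text.split(separator) if p]

    def joined_len(parts):
        return len(separator.join(parts))

    chunks = []
    current = []  # pieces collected for the chunk being built
    for piece in pieces:
        if current and joined_len(current + [piece]) > chunk_size:
            chunks.append(separator.join(current))
            # Keep only trailing pieces that fit in the overlap budget
            # and still leave room for the next piece.
            while current and (
                joined_len(current) > chunk_overlap
                or joined_len(current + [piece]) > chunk_size
            ):
                current.pop(0)
        current.append(piece)
    if current:
        chunks.append(separator.join(current))
    # Note: a single piece longer than chunk_size is kept whole in this sketch.
    return chunks


if __name__ == "__main__":
    sample = "aaaa. bbbb. cccc. dddd"
    for chunk in split_text(sample, chunk_size=10, chunk_overlap=4, separator=". "):
        print(repr(chunk))
```

In this sketch the overlap is made of whole pieces between separators, so consecutive chunks share complete sentences or paragraphs rather than arbitrary character ranges.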
Outputs
This component doesn’t produce a visible output in the dashboard. Instead, it creates an internal configuration that other Nappai components use to work with the smaller text chunks. The chunks themselves are passed on to other components in your workflow for further processing, such as summarization or analysis.
Usage Example
Imagine you have a long document you want to summarize. Feed the document into the “Input” of the CharacterTextSplitter and set, for example, “Chunk Size” to 500 and “Chunk Overlap” to 100. The component splits the document into smaller chunks and sends them to the “Summarizer” component, which summarizes each chunk. The per-chunk summaries are then combined into a summary of the whole document.
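The flow in this example (split, summarize each chunk, combine) can be sketched in Python as follows. This is a rough illustration under stated assumptions: sliding_chunks uses a plain fixed-width character window and ignores the “Separator” input, and summarize is a hypothetical stand-in for the Summarizer component, not a real Nappai API.

```python
def sliding_chunks(text, chunk_size=500, chunk_overlap=100):
    """Fixed-width character windows; each window starts
    chunk_size - chunk_overlap characters after the previous one."""
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


def summarize(chunk):
    """Hypothetical placeholder for a real summarizer:
    it simply keeps the first 40 characters of the chunk."""
    return chunk[:40].strip() + "..."


def summarize_document(text, chunk_size=500, chunk_overlap=100):
    """Summarize each chunk, then join the partial summaries."""
    partial = [summarize(c) for c in sliding_chunks(text, chunk_size, chunk_overlap)]
    return " ".join(partial)


if __name__ == "__main__":
    document = "Nappai splits long text into chunks. " * 40
    print(len(sliding_chunks(document)), "chunks")
    print(summarize_document(document))
```

With a chunk size of 500 and an overlap of 100, each new chunk begins 400 characters after the previous one, so the last 100 characters of one chunk reappear at the start of the next.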
Templates
This component is used in the following Nappai templates: ‘CV Match’, ‘Load Cloud Data’.
Related Components
- Summarizer: This component summarizes large bodies of text, often using the smaller chunks created by the CharacterTextSplitter.
- Categorizer: This component extracts categories from text, often working with the smaller chunks produced by the CharacterTextSplitter.
- Entities extraction: This component extracts key information from text, often benefiting from the pre-processing done by the CharacterTextSplitter.
- Many other components that process text data will benefit from the pre-processing done by the CharacterTextSplitter.
Tips and Best Practices
- Experiment with different “Chunk Size” and “Chunk Overlap” values to find the optimal settings for your data and the downstream components. Smaller chunks might be better for detailed analysis, while larger chunks might be better for maintaining context.
- Consider using the “Separator” input if your text has natural breaks (like paragraphs) that you want to align with the chunk boundaries.
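When experimenting with these values, it can help to estimate how many chunks a given setting will produce before running the workflow. The Python sketch below assumes a simple fixed-width character window with no separator handling, so the numbers are approximate; estimate_chunks is a hypothetical helper, not part of Nappai.

```python
import math


def estimate_chunks(text_length, chunk_size=1000, chunk_overlap=200):
    """Approximate chunk count for a fixed-width sliding window.
    Every chunk after the first adds chunk_size - chunk_overlap new characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if text_length <= chunk_size:
        return 1
    step = chunk_size - chunk_overlap
    return math.ceil((text_length - chunk_overlap) / step)


if __name__ == "__main__":
    # Compare a few settings for a 10,000-character document.
    for size, overlap in [(1000, 200), (500, 100), (250, 50)]:
        print(size, overlap, estimate_chunks(10_000, size, overlap))
```

A larger overlap raises the chunk count for the same chunk size, which means more work for downstream components in exchange for more shared context between chunks.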
Security Considerations
This component does not handle sensitive data directly. The security of your data depends on the security of the other components in your Nappai workflow and the data sources you connect to.