
Language Recursive Text Splitter

The Language Recursive Text Splitter takes a piece of text and breaks it into smaller, manageable pieces called chunks. It does this automatically, respecting the language of the text so that words and sentences stay intact. This is useful when you want to feed long documents into AI models that can only handle a limited amount of text at once.

How it Works

When you drop a document into the component, it applies the language you selected (e.g., English, Spanish, or Python code). It then walks through the text, cutting it into pieces no longer than the Chunk Size you set. If you also set a Chunk Overlap, the end of one chunk shares a few characters with the start of the next, which helps AI models keep context between chunks. All of this happens right inside Nappai; no external services are called.
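The recursive strategy above can be sketched in plain Python. This is an illustrative sketch, not Nappai's actual implementation: the splitter tries the coarsest separator first (paragraphs), then falls back to finer ones (lines, sentences, words). Overlap is omitted here for brevity.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", ". ", " ")):
    """Split text into chunks of at most chunk_size characters,
    cutting at the coarsest separator that produces a split."""
    if len(text) <= chunk_size:
        return [text]
    for sep in separators:
        parts = text.split(sep)
        if len(parts) > 1:
            chunks, current = [], ""
            for part in parts:
                candidate = current + sep + part if current else part
                if len(candidate) <= chunk_size:
                    current = candidate
                else:
                    if current:
                        chunks.append(current)
                    current = part
            if current:
                chunks.append(current)
            # Recurse into any piece that is still too long.
            return [c for chunk in chunks
                    for c in recursive_split(chunk, chunk_size, separators)]
    # No separator found: fall back to a hard character cut.
    return [text[:chunk_size]] + recursive_split(text[chunk_size:], chunk_size, separators)

chunks = recursive_split("First paragraph.\n\nSecond paragraph that is a bit longer.", 30)
```

Note how the first paragraph stays intact because it fits, while the second is broken at word boundaries rather than mid-word.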

Inputs

  • Input: The text you want to split. You can paste a document, upload a file, or connect the output of another component.
  • Chunk Overlap: The number of characters that will be repeated between consecutive chunks. A higher overlap can improve context but creates more data.
  • Chunk Size: The maximum number of characters each chunk can contain. Pick a size that fits the limits of the AI model you plan to use.
  • Code Language: The language of the text (e.g., python, javascript, english). This tells the splitter how to treat punctuation and line breaks so that code or natural language is split correctly.
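To see how Chunk Size and Chunk Overlap interact, here is a simplified fixed-size splitter with overlap (illustrative only; the real component also respects language boundaries as described above):

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Fixed-size splitting with overlap (ignores language boundaries).
    chunk_overlap must be smaller than chunk_size."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# Each chunk starts 2 characters before the previous one ended.
chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
```

With an overlap of 2 out of a chunk size of 4, half of every chunk repeats the previous one, which illustrates why large overlaps inflate the amount of data you process.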

Outputs

  • Data: The component returns a list of split chunks. Each chunk is a piece of the original text, ready to be passed to the next step in your workflow (for example, an AI model or a storage component). The output method is called split_data.
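A downstream component receives the chunks as a list. The record shape below is a hypothetical illustration of what a consumer might see, not Nappai's exact schema:

```python
# Hypothetical shape of the split_data output: a list of records,
# each holding the chunk text plus metadata (this schema is an
# assumption, not Nappai's exact format).
split_data = [
    {"text": "First chunk of the report...", "metadata": {"chunk_index": 0}},
    {"text": "Second chunk of the report...", "metadata": {"chunk_index": 1}},
]

# A downstream step might turn each chunk into a model prompt.
prompts = [f"Summarize: {chunk['text']}" for chunk in split_data]
```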

Usage Example

  1. Add the component to your dashboard and connect a document source (e.g., a PDF loader).
  2. Set the Chunk Size to 1000 characters and the Chunk Overlap to 200 characters.
  3. Choose the Code Language that matches your document (e.g., english for a report, python for source code).
  4. Run the workflow. The component will output a list of text chunks that you can feed into an AI model or store for later use.
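The settings chosen in steps 2 and 3 amount to a configuration like the following (the field names are illustrative, not Nappai's internal keys):

```python
# Settings from steps 2 and 3 of the example, as a plain dict.
config = {
    "chunk_size": 1000,     # max characters per chunk
    "chunk_overlap": 200,   # characters shared between consecutive chunks
    "code_language": "english",
}

# Sanity check: overlap must be smaller than the chunk size,
# otherwise the splitter could never make progress.
valid = config["chunk_overlap"] < config["chunk_size"]
```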
Related Components

  • Simple Text Splitter – Splits text into equal parts without language awareness.
  • Document Loader – Loads documents from files or URLs into the workflow.
  • AI Text Summarizer – Takes the split chunks and generates concise summaries.

Tips and Best Practices

  • Match the Chunk Size to your AI model: Most large‑language models have a token limit (e.g., 4096 tokens). Estimate how many tokens your chunks will use, or pick a smaller chunk size to stay within the limit.
  • Use overlap sparingly: Too much overlap can double the amount of data you process, increasing cost and time.
  • Select the correct language: For code, choose the programming language; for prose, choose the natural language. This ensures punctuation and line breaks are handled properly.
  • Preview the output: After running, check the first few chunks to confirm they look reasonable before feeding them into downstream components.
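For the chunk-size tip, a common rule of thumb is that English prose averages about four characters per token. That ratio is an approximation and real tokenizers vary, but it is enough for a quick sizing check:

```python
def estimate_tokens(text, chars_per_token=4):
    """Very rough token estimate: English prose averages ~4 characters
    per token (an approximation; real tokenizers vary)."""
    return len(text) // chars_per_token

# A 1000-character chunk comes out to roughly 250 tokens,
# comfortably under a 4096-token model limit.
tokens = estimate_tokens("x" * 1000)
```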

Security Considerations

  • Local processing: All splitting happens inside Nappai; no data leaves your environment unless you explicitly send it to another component.
  • No external API calls: Because the component uses a local library, there is no risk of exposing sensitive text to third‑party services.
  • Data retention: Be mindful of how long you keep the split chunks, especially if they contain confidential information. Use Nappai’s data‑retention settings to delete them when no longer needed.