Chunking Component
The Chunking Component creates a Langgraph Chunking Agent that splits large blocks of text into smaller, more manageable pieces. These pieces, or “chunks,” can then be used by other parts of your automation workflow, such as language‑model analysis or data indexing.
How it Works
When you add this component to your dashboard, it sets up a Langgraph Chunking Agent. The agent takes the text you provide and divides it into chunks based on the settings you choose—such as how many words or characters each chunk should contain. All of this happens locally on your machine or server, so no external APIs or services are involved.
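The agent's actual implementation is internal to the component, but the core idea of word-based chunking with overlap can be sketched in Python (the function name and parameters below are illustrative, not the component's real API):

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 0) -> list[str]:
    """Split text into word-based chunks of at most chunk_size words,
    with the last `overlap` words of each chunk repeated at the start
    of the next one."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the final words are already covered
    return chunks
```

For example, `chunk_text("a b c d e f", chunk_size=4, overlap=2)` produces two chunks, `"a b c d"` and `"c d e f"`, where the middle two words appear in both.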
Inputs
The component inherits the standard inputs from the base ChunkingComponent. Typical inputs include:
- Input Text: The raw text you want to split into chunks.
  Visible in: All operations (the component has no separate operations, so this field is always visible).
- Chunk Size: The maximum number of words or characters each chunk should contain.
  Visible in: All operations.
- Overlap: The number of words or characters that should overlap between consecutive chunks (useful for maintaining context).
  Visible in: All operations.
These inputs allow you to control how the text is divided and how much context is shared between chunks.
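These inputs also constrain one another: chunk size must be positive, and overlap must be smaller than chunk size (otherwise the chunking window would never advance). A hypothetical configuration object could validate those rules like this (the class and field names are assumptions, not the component's real schema):

```python
from dataclasses import dataclass


@dataclass
class ChunkingConfig:
    """Hypothetical container for the component's three inputs."""
    input_text: str
    chunk_size: int = 200  # maximum words/characters per chunk
    overlap: int = 0       # words/characters shared between consecutive chunks

    def __post_init__(self) -> None:
        if self.chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        if not 0 <= self.overlap < self.chunk_size:
            raise ValueError("overlap must be non-negative and smaller than chunk_size")
```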
Outputs
- Chunks: A list (or array) of text segments produced by the agent. Each chunk is a separate piece of the original text, ready to be passed to downstream components such as a language‑model or a data‑storage component.
Usage Example
- Add the component to your workflow.
- Connect the source of your long text (e.g., a document upload component) to the Input Text field.
- Set the Chunk Size to a value that fits your downstream processing needs (e.g., 200 words).
- Optionally set Overlap if you want context to carry over between chunks.
- Run the workflow.
- Use the output “Chunks” as input for a summarization component, a language‑model component, or any other downstream task.
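The steps above can be sketched end to end. This example uses character-based chunking (the doc notes chunks can be measured in words or characters) and a placeholder function standing in for a real summarization component; all names are illustrative:

```python
def chunk_by_chars(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into character-based chunks with optional overlap."""
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


def summarize(chunk: str) -> str:
    """Placeholder for a downstream summarization component."""
    return chunk[:30] + "..."


# Simulate a long document arriving from an upload component.
document = "Lorem ipsum dolor sit amet. " * 20
chunks = chunk_by_chars(document, chunk_size=200, overlap=20)
summaries = [summarize(c) for c in chunks]
```

Each chunk's last 20 characters reappear at the start of the next chunk, so the downstream component never loses the sentence boundary between them.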
Related Components
- Text Summarization Component – Summarizes each chunk into a concise paragraph.
- Language Model Component – Feeds chunks into an AI model for analysis or generation.
- Data Storage Component – Stores the chunks in a database or file system for later retrieval.
Tips and Best Practices
- Choose a chunk size that balances detail and speed: Smaller chunks mean more calls to downstream services but can improve accuracy.
- Use overlap when context matters: If you’re feeding chunks to a language model, a small overlap (e.g., 10–20 words) can help maintain continuity.
- Test with a sample: Run the component on a short text first to confirm the chunking behavior before scaling up.
- Keep an eye on memory usage: Very large texts can produce many chunks, which may consume significant memory.
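For the memory tip in particular, yielding chunks lazily with a generator keeps only one chunk in memory at a time instead of materializing the whole list. A minimal sketch, assuming word-based chunking as described above (names are illustrative):

```python
from typing import Iterator


def iter_chunks(text: str, chunk_size: int, overlap: int = 0) -> Iterator[str]:
    """Lazily yield word-based chunks so the full chunk list
    never has to sit in memory at once."""
    words = text.split()
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        yield " ".join(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            return


# Downstream consumers can process chunks one at a time:
# for chunk in iter_chunks(long_text, chunk_size=200, overlap=20):
#     process(chunk)
```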
Security Considerations
- The component processes data locally, so no text is sent outside your environment.
- Ensure that any sensitive or confidential text is handled according to your organization’s data‑privacy policies.
- If you’re integrating with downstream services that transmit data externally, review those services’ security settings as well.