Data Batch Chain
Splits a dataset into batches and sends each batch to a language model for processing.
How it Works
The Data Batch Chain takes a set of records (your data) and splits them into smaller groups called batches.
It then sends each batch to a language model (the “Model” input) so the model can work on the data in parallel.
The component can flatten nested JSON structures into a single level or leave them nested, depending on the options you set.
You can control how many batches run at the same time with the Max Concurrency setting, which helps keep your system responsive.
Because all the work happens inside Nappai, no external API calls are made unless you connect the output to another component that does.
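Nappai handles the batching and concurrency internally, but the idea can be sketched in a few lines. The following is a rough illustration only, assuming hypothetical names (`make_batches`, `process_batches`, `fake_model`) that are not part of Nappai: records are split into fixed-size batches, and a semaphore caps how many batches are in flight at once, which is what the Max Concurrency setting controls.

```python
import asyncio

def make_batches(records, batch_size):
    """Split a list of records into fixed-size batches."""
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]

async def process_batches(records, batch_size, max_concurrency, model_call):
    """Run model_call on each batch, never more than max_concurrency at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run(batch):
        async with semaphore:  # waits while max_concurrency batches are in flight
            return await model_call(batch)

    batches = make_batches(records, batch_size)
    return await asyncio.gather(*(run(b) for b in batches))

# Stand-in "model" that just counts the records in each batch
async def fake_model(batch):
    return len(batch)

results = asyncio.run(
    process_batches(list(range(10)), batch_size=3, max_concurrency=2, model_call=fake_model)
)
print(results)  # [3, 3, 3, 1]
```

A lower `max_concurrency` trades throughput for a lighter load on the model and your system, which is why the setting helps keep things responsive.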
Inputs
- Data: The raw data you want to process in batches.
- Model: The language model that will handle each batch.
- Source data input key: The key in each data record that contains the information the model should read.
- JSON Flatten: If checked, nested JSON objects are flattened into a single level before being sent to the model.
- JSON Mode: If checked, the model’s response will be returned as JSON instead of plain text.
- Max Concurrency: The maximum number of batches that can be processed at the same time.
- Output key name: The key name that will be used to store the processed results in the output data.
- Prompt: The prompt template that will be sent to the language model for each batch.
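The exact flattening scheme Nappai applies when JSON Flatten is checked is not documented here; a common approach, sketched below with a hypothetical `flatten_json` helper, joins nested keys with dots so the model receives a single-level record:

```python
def flatten_json(obj, parent_key="", sep="."):
    """Flatten nested dicts into a single level with dotted keys."""
    items = {}
    for key, value in obj.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the joined key prefix
            items.update(flatten_json(value, new_key, sep=sep))
        else:
            items[new_key] = value
    return items

record = {"user": {"name": "Ada", "address": {"city": "London"}}, "score": 7}
print(flatten_json(record))
# {'user.name': 'Ada', 'user.address.city': 'London', 'score': 7}
```

If your prompt expects the original nested structure, leave JSON Flatten unchecked and the records pass through unchanged.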
Outputs
- Data: The original data enriched with the results from the language model. Each record will contain a new field (defined by Output key name) holding the model’s output.
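As a purely illustrative example of the output shape (the field names and the model's answer here are invented, and Output key name is assumed to be set to "model_response"), each record gains one new field while the original fields are preserved:

```python
# Input record as it enters the Data Batch Chain
record_in = {"id": 1, "text": "The service was great!"}

# Same record after the chain runs: the model's output is stored
# under the key configured in "Output key name"
record_out = {"id": 1, "text": "The service was great!", "model_response": "positive"}

# Only the output key is new; everything else is unchanged
assert set(record_out) - set(record_in) == {"model_response"}
```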
Usage Example
- Upload your dataset into the dashboard and connect it to the Data input of the Data Batch Chain.
- Select a language model (e.g., GPT‑4) and set the Source data input key to the field that holds the text you want to analyze.
- If you want the model’s answer in JSON, tick JSON Mode.
- Adjust Max Concurrency to a value that matches your system’s capacity (e.g., 5).
- Run the workflow. The component will split the data into batches, send each batch to the model, and return the enriched data in the Data output.
Related Components
- Data Processor – Applies transformations to individual records.
- Batch Processor – Handles custom logic for each batch after it’s created.
- Data Splitter – Splits a dataset into parts based on size or other criteria.
- Data Aggregator – Combines results from multiple components into a single dataset.
Tips and Best Practices
- Keep Max Concurrency low if you’re using a free or limited‑quota model to avoid hitting rate limits.
- Use JSON Flatten when the model expects a flat input; otherwise, leave it unchecked to preserve nested structures.
- Set Output key name to something descriptive (e.g., “model_response”) so you can easily reference it later.
- Test with a small sample of data first to confirm the prompt and output format before scaling up.
Security Considerations
- All processing is performed locally within Nappai, so no data leaves your environment unless you explicitly connect the output to an external service.
- If your data contains sensitive information, ensure that the language model you use complies with your organization’s privacy policies.
- Consider encrypting data at rest and using role‑based access controls to restrict who can view or modify the workflow.