Trustcall Extractor
Trustcall Extractor is a tool in Nappai that turns raw data into clean, structured information. It uses a language model to read the data, then applies JSON patches to update or add records in a consistent way. The result is organized, structured data that other parts of your automation workflow can use.
How it Works
When you feed the component your existing data and the new data you want to extract, it sends the information to a language model. The model interprets the text and produces a list of changes (JSON patches) that describe how the existing data should be updated. The component then applies those patches locally, creating a new version of the data that reflects the extracted information. Because the patching happens inside Nappai, you don’t need to write any integration code: just provide the data and the model, and the component does the rest.
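To make the patch step concrete, here is a small sketch of how JSON patches transform data, using the open-source jsonpatch package. The records are invented for illustration; the component performs this step for you.

```python
# A minimal sketch of the patching step, using the open-source "jsonpatch"
# package (pip install jsonpatch). Record contents are made up for
# illustration; the component applies patches like these internally.
import jsonpatch

# Existing data: one contact record already on file.
existing = {"contacts": [{"name": "Ada Lovelace", "email": ""}]}

# Patches like these are what the language model produces: a list of
# RFC 6902 operations describing how the existing data should change.
patches = [
    {"op": "replace", "path": "/contacts/0/email", "value": "ada@example.com"},
    {"op": "add", "path": "/contacts/-", "value": {"name": "Alan Turing", "email": ""}},
]

# Applying the patches yields the updated data set; the original is untouched.
updated = jsonpatch.apply_patch(existing, patches)
print(updated)
# {'contacts': [{'name': 'Ada Lovelace', 'email': 'ada@example.com'},
#               {'name': 'Alan Turing', 'email': ''}]}
```

The patch operations (add, replace, remove) roughly correspond to the Enable Inserts, Enable updates, and Enable deletes switches described below.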
Inputs
- Existing Data: The current data set that you want to update.
- Input Data: The new information that should be extracted and merged into the existing data.
- Model: The language model that reads the input data and generates the JSON patches.
- Schemas: The set of data schemas that define the structure of the records you want to extract (see the schema sketch after this list).
- Enable deletes: Whether the component should remove existing records that are no longer present in the new data.
- Enable Inserts: Whether the component should add new records that are not already in the existing data.
- Enable updates: Whether the component should modify existing records that have changed.
- Max Concurrency: The maximum number of extraction jobs that can run at the same time, helping you control resource usage.
- prompt: A custom prompt you can provide to guide the language model’s extraction process.
- Schema to extract: Choose which schema to target. Selecting “any” allows the model to pick multiple schemas, while “auto” lets the model decide the best fit.
- Tool description: A short description that will appear when the component is used as a tool in other workflows.
- Tool Name: The name that will be shown for the tool when it is used in other parts of Nappai.
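As referenced in the Schemas input above, here is a sketch of what one record schema might look like, assuming schemas follow the common Pydantic-model pattern. The field names are invented for illustration.

```python
# A hypothetical schema for the records you want to extract. Field names
# and types are illustrative; define whatever structure your data needs.
from typing import Optional
from pydantic import BaseModel, Field

class Contact(BaseModel):
    """A single contact record extracted from the input text."""
    name: str = Field(description="Full name of the person")
    email: Optional[str] = Field(default=None, description="Email address, if mentioned")
    company: Optional[str] = Field(default=None, description="Employer, if mentioned")
```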
Outputs
- Data: The updated data set after applying the JSON patches. This output can be fed into other components that need the cleaned, structured information.
- Tool: A reusable tool object that can be invoked by other parts of the system, allowing the extraction logic to be reused without reconfiguring the component each time.
Usage Example
- Add the component to your workflow and connect the output of a data‑collection component to the Existing Data input.
- Connect the raw text you want to parse to the Input Data input.
- Choose a language model (e.g., GPT‑4) for the Model input.
- Select the relevant schema(s) in Schemas and set Schema to extract to “auto” so the model picks the best match.
- Turn on Enable Inserts and Enable updates so new and changed records are added or updated.
- Run the workflow. The component will output the cleaned data in Data, ready for the next step (e.g., storing it in a database or sending it to another service).
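If you are curious what the steps above look like at the code level, here is a rough sketch. It assumes the component wraps the open-source trustcall library (as the name suggests); the Contact schema and sample text are invented for illustration, and Nappai’s internal wiring may differ.

```python
# A sketch of the equivalent extraction in code, assuming the component
# wraps the open-source trustcall library (pip install trustcall).
from typing import Optional
from langchain_openai import ChatOpenAI
from pydantic import BaseModel
from trustcall import create_extractor

class Contact(BaseModel):
    """Schema selected in the "Schemas" input (illustrative)."""
    name: str
    email: Optional[str] = None

llm = ChatOpenAI(model="gpt-4")  # the "Model" input

extractor = create_extractor(
    llm,
    tools=[Contact],        # Schemas
    tool_choice="Contact",  # Schema to extract
    enable_inserts=True,    # Enable Inserts
)

result = extractor.invoke({
    # Raw text to parse: the "Input Data" input.
    "messages": [("user", "Met Alan Turing (alan@example.com) at the conference.")],
    # Records already on file: the "Existing Data" input.
    "existing": {"Contact": {"name": "Ada Lovelace", "email": "ada@example.com"}},
})
print(result["responses"])  # validated Contact objects after patching
```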
Related Components
- Data Cleaner – Removes unwanted fields from data sets.
- Schema Validator – Checks that data conforms to a specified schema.
- JSON Patch Applier – Applies JSON patches to existing data without using a language model.
Tips and Best Practices
- Keep the prompt concise; a clear instruction helps the model produce accurate patches (see the example after this list).
- Use Max Concurrency wisely; setting it too high can overload your system, while too low may slow down processing.
- If you only need to add new records, turn off Enable updates to avoid accidental changes to existing data.
- Test the component with a small sample of data before running it on large batches to ensure the schema matches your expectations.
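For example, a concise prompt for a contact-extraction task might read as follows. The wording is purely illustrative, not a required format.

```python
# A hypothetical example of a concise custom prompt for the "prompt" input.
# Adapt the wording to your own schema and data.
prompt = (
    "Extract every contact mentioned in the text. "
    "Fill only the fields defined in the schema and leave unknown fields empty. "
    "Do not change records that the text does not mention."
)
```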
Security Considerations
- The JSON patches are applied locally inside your Nappai instance; the only data that leaves the component is what is sent to the configured language model.
- If your data contains personally identifiable information (PII), make sure the chosen language model complies with your organization’s privacy policies.
- Review the output before storing it in a shared location to avoid accidental exposure of sensitive fields.