Structured Output
Structured Output is a component that turns the free‑form text produced by a language model into clean, well‑defined data. By giving the model a clear schema and a set of formatting instructions, you can automatically convert emails, reports, or any unstructured text into JSON objects that can be used directly in your workflows.
How it Works
- Model Selection – Choose a language model that supports structured output (e.g., OpenAI GPT‑4, Anthropic Claude).
- Prompt Construction – The component builds a system prompt that tells the model to return data in a JSON format that matches the user‑defined schema.
- Schema Building – The JSON schema you provide is turned into a Pydantic model. If you tick “Generate Multiple,” that Pydantic model is wrapped in a list type, so the result is a list of those objects instead of a single object.
- LLM Call – The model is called with the input message and the system prompt.
- Result Parsing – The response is parsed into the Pydantic model, then wrapped in a Data object and sent out as llm_structured_output.
The whole process happens inside Nappai, so you don’t need to write any code—just fill in the fields.
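No code is required to use the component, but a small sketch can help as a mental model of the Schema Building and Result Parsing steps. The example below is a minimal illustration using Pydantic's create_model, not Nappai's actual implementation. The names TYPE_MAP, build_model, and parse_response are hypothetical, the type-name mapping is an assumption, and the JSON strings stand in for real model replies.

```python
import json
from typing import List

from pydantic import create_model

# Hypothetical mapping from schema type names to Python types (an assumption).
TYPE_MAP = {"string": str, "integer": int, "float": float}


def build_model(name, schema):
    """Turn a flat {field: type_name} schema into a Pydantic model."""
    fields = {key: (TYPE_MAP[type_name], ...) for key, type_name in schema.items()}
    return create_model(name, **fields)


def parse_response(raw_json, model, generate_multiple=False):
    """Validate a JSON reply against the Pydantic model."""
    data = json.loads(raw_json)
    if generate_multiple:
        # Wrap the item model so the reply is validated as a list of objects.
        wrapper = create_model(f"{model.__name__}List", objects=(List[model], ...))
        return wrapper(objects=data).objects
    return model(**data)


Item = build_model("Item", {"product_id": "string", "quantity": "integer", "price": "float"})

# Stand-ins for LLM replies; real replies come from the selected model.
single = parse_response('{"product_id": "A1", "quantity": 2, "price": 9.5}', Item)
many = parse_response(
    '[{"product_id": "A1", "quantity": 2, "price": 9.5},'
    ' {"product_id": "B7", "quantity": 1, "price": 20.0}]',
    Item,
    generate_multiple=True,
)
print(single.quantity, len(many))
```

A reply that does not fit the schema (for example, a missing field) raises a Pydantic validation error at this stage, which is why a clear schema and explicit format instructions matter.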
Inputs
- Model: The language model to use to generate the structured output.
- Input Message: The input message to the language model.
- Format Instructions: The instructions to the language model for formatting the output.
  Example default value: You are an AI system designed to extract structured information from unstructured text. Given the input_text, return a JSON object with predefined keys based on the expected structure. Extract values accurately and format them according to the specified type (e.g., string, integer, float, date). If a value is missing or cannot be determined, return a default (e.g., null, 0, or 'N/A'). If multiple instances of the expected structure exist within the input_text, stream each as a separate JSON object.
- Schema Name: Provide a name for the output data schema.
- Output Schema: Define the structure and data types for the model’s output.
- Generate Multiple: Set to True if the model should generate a list of outputs instead of a single output.
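To make the relationship between these fields concrete, here is one way they could be combined into a single system prompt. This is only a sketch under assumptions: the function build_system_prompt and the exact wording of the prompt are hypothetical, and the prompt Nappai builds internally may look different.

```python
import json


def build_system_prompt(format_instructions, schema_name, output_schema, generate_multiple):
    """Combine the component's fields into one system prompt string."""
    shape = "a JSON array of objects" if generate_multiple else "a single JSON object"
    return (
        f"{format_instructions.strip()}\n\n"
        f"Return {shape} named '{schema_name}' that matches this schema:\n"
        f"{json.dumps(output_schema, indent=2)}"
    )


prompt = build_system_prompt(
    format_instructions="Extract structured information from the input_text.",
    schema_name="Order",
    output_schema={"order_id": "string", "total": "float"},
    generate_multiple=False,
)
print(prompt)
```

The Input Message is not part of this prompt; it is sent alongside it as the text to extract from.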
Outputs
- llm_structured_output: The structured data returned by the language model, wrapped in a Data object. It can be any JSON‑compatible structure defined by your schema.
Usage Example
You want to pull order details from a customer email.
- Model – Select OpenAI GPT‑4.
- Input Message – Paste the email text.
- Format Instructions – Leave the default value.
- Schema Name – Order.
- Output Schema – {"order_id": "string", "customer_name": "string", "items": [{"product_id": "string", "quantity": "integer", "price": "float"}], "total": "float", "order_date": "date"}
- Generate Multiple – Leave unchecked (single order per email).
- Run the component.
- The llm_structured_output output will contain the order details as a JSON object wrapped in a Data object, ready to feed into downstream components such as a database writer or a notification system.
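If you want to sanity-check a result outside Nappai before wiring it to a downstream component, a small checker along these lines may help. It is a sketch, not part of the component: the helper matches is hypothetical, the order values are made up for illustration, and treating "date" as an ISO 8601 string (YYYY-MM-DD) is an assumption.

```python
from datetime import date


def _is_date(value):
    """Assumption: dates arrive as ISO 8601 strings such as 2024-03-01."""
    try:
        date.fromisoformat(value)
        return True
    except (TypeError, ValueError):
        return False


# One checker per type name used in the Output Schema of the example.
CHECKS = {
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "float": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "date": _is_date,
}


def matches(value, spec):
    """Recursively check a value against a type-name schema."""
    if isinstance(spec, dict):
        return (isinstance(value, dict)
                and value.keys() == spec.keys()
                and all(matches(value[k], spec[k]) for k in spec))
    if isinstance(spec, list):  # a one-element list describes every list item
        return isinstance(value, list) and all(matches(v, spec[0]) for v in value)
    return CHECKS[spec](value)


SCHEMA = {
    "order_id": "string",
    "customer_name": "string",
    "items": [{"product_id": "string", "quantity": "integer", "price": "float"}],
    "total": "float",
    "order_date": "date",
}

# Made-up sample data, not real component output.
order = {
    "order_id": "ORD-1",
    "customer_name": "Jane Doe",
    "items": [{"product_id": "A1", "quantity": 2, "price": 9.5}],
    "total": 19.0,
    "order_date": "2024-03-01",
}
print(matches(order, SCHEMA))
```

A record whose quantity came back as text, or whose date is not in ISO format, would fail this check and could be routed for review instead of being written to the database.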
Related Components
- LLM Output – For simple text responses from a language model.
- Data Parser – To transform raw JSON into Nappai data structures.
- Database Writer – To store the structured data in a database.
Tips and Best Practices
- Keep the schema simple – Start with a few fields, then add more as you test.
- Validate the schema – Use the “Output Schema” editor to catch syntax errors before running.
- Use “Generate Multiple” when you expect several records in one message (e.g., a list of invoices).
- Test with sample data – Run the component on a few example messages to ensure the output matches your expectations.
- Avoid sensitive data – If the input contains personal information, consider anonymizing it before sending it to the LLM.
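As an illustration of the anonymization tip, the sketch below masks email addresses before the text is passed to the component. It is deliberately minimal: the function redact_emails and the placeholder are hypothetical, the pattern covers only common address forms, and other personal data (names, phone numbers, postal addresses) would need separate handling.

```python
import re

# Simple pattern for common email address forms; not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text, placeholder="[EMAIL]"):
    """Replace every email address in the text with a placeholder."""
    return EMAIL_RE.sub(placeholder, text)


message = "Hi, this is jane.doe@example.com asking about order 1042."
print(redact_emails(message))
# prints: Hi, this is [EMAIL] asking about order 1042.
```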
Security Considerations
- Data Privacy – The text you send to the language model leaves your local environment whenever the model is hosted by an external provider; how that text is stored or used afterwards depends on the provider’s policies.
- Compliance – Ensure that sending customer or financial data to an external LLM complies with your organization’s data‑handling regulations.
- Model Access – Restrict which users can configure the component to prevent accidental exposure of sensitive prompts or schemas.