Structured Output

Structured Output is a component that turns the free‑form text produced by a language model into clean, well‑defined data. By giving the model a clear schema and a set of formatting instructions, you can automatically convert emails, reports, or any unstructured text into JSON objects that can be used directly in your workflows.

How it Works

Model Selection – Choose a language model that supports structured output (e.g., OpenAI GPT‑4, Anthropic Claude).
Prompt Construction – The component builds a system prompt that tells the model to return data in a JSON format that matches the user‑defined schema.
Schema Building – The JSON schema you provide is turned into a Pydantic model. If you tick “Generate Multiple,” that Pydantic model is wrapped so the component returns a list of those objects.
LLM Call – The model is called with the input message and the system prompt.
Result Parsing – The response is parsed into the Pydantic model, then wrapped in a Data object and sent out as llm_structured_output.

The whole process happens inside Nappai, so you don’t need to write any code; just fill in the fields.
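
For readers curious about what the Prompt Construction, LLM Call, and Result Parsing steps amount to, here is a minimal sketch. It is not Nappai's implementation: it uses only the Python standard library in place of Pydantic, assumes a flat schema of field names mapped to type names, and every name in it (`TYPE_MAP`, `build_system_prompt`, `fake_model`, `parse_response`) is hypothetical. The model call is a stub that returns a canned reply.

```python
import json

# Hypothetical mapping from schema type names to Python types.
TYPE_MAP = {"string": str, "integer": int, "float": float}


def build_system_prompt(format_instructions: str, schema: dict) -> str:
    """Prompt Construction: append the schema to the format instructions."""
    schema_text = json.dumps(schema, indent=2)
    return f"{format_instructions}\n\nReturn JSON matching this schema:\n{schema_text}"


def fake_model(system_prompt: str, message: str) -> str:
    """LLM Call: a stub that returns a canned reply instead of calling a model."""
    return '{"order_id": "A-100", "total": 42}'


def parse_response(raw: str, schema: dict) -> dict:
    """Result Parsing: decode the reply and check each field against the schema."""
    data = json.loads(raw)
    parsed = {}
    for key, type_name in schema.items():
        if key not in data:
            raise ValueError(f"missing field: {key}")
        expected = TYPE_MAP[type_name]
        value = data[key]
        # JSON has a single number type, so accept an int where a float is expected.
        if expected is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        if not isinstance(value, expected):
            raise ValueError(f"field {key!r} should be of type {type_name}")
        parsed[key] = value
    return parsed


schema = {"order_id": "string", "total": "float"}
prompt = build_system_prompt("Extract the order as JSON.", schema)
result = parse_response(fake_model(prompt, "Order A-100 comes to $42."), schema)
print(result)  # {'order_id': 'A-100', 'total': 42.0}
```

Parsing into a typed structure rather than passing the raw reply along means a malformed or incomplete response fails at this step instead of in a downstream component.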

Inputs

Model: The language model to use to generate the structured output.
Input Message: The text to send to the language model, such as an email or a report.

Format Instructions: The instructions to the language model for formatting the output.
Example default value:

You are an AI system designed to extract structured information from unstructured text. Given the input_text, return a JSON object with predefined keys based on the expected structure. Extract values accurately and format them according to the specified type (e.g., string, integer, float, date). If a value is missing or cannot be determined, return a default (e.g., null, 0, or 'N/A'). If multiple instances of the expected structure exist within the input_text, stream each as a separate JSON object.

Schema Name: Provide a name for the output data schema.
Output Schema: Define the structure and data types for the model’s output.
Generate Multiple: Set to True if the model should generate a list of outputs instead of a single output.
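
The effect of Generate Multiple on the shape of the result can be illustrated with a short sketch. This is an assumption about the general idea, not Nappai's code: `check_shape` is a hypothetical helper, and the invoice records are invented sample data.

```python
import json
from typing import Any


def check_shape(raw: str, generate_multiple: bool) -> Any:
    """Decode a reply and confirm it is a list or a single object as requested."""
    data = json.loads(raw)
    if generate_multiple and not isinstance(data, list):
        raise ValueError("expected a JSON array of records")
    if not generate_multiple and not isinstance(data, dict):
        raise ValueError("expected a single JSON object")
    return data


# Generate Multiple off: one record.
single = check_shape('{"invoice_id": "INV-1"}', generate_multiple=False)

# Generate Multiple on: a list of records with the same structure.
many = check_shape(
    '[{"invoice_id": "INV-1"}, {"invoice_id": "INV-2"}]',
    generate_multiple=True,
)
print(len(many))  # 2
```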

Outputs

llm_structured_output: The structured data returned by the language model, wrapped in a Data object. It can be any JSON‑compatible structure defined by your schema.

Usage Example

You want to pull order details from a customer email.

Model – Select OpenAI GPT‑4.
Input Message – Paste the email text.
Format Instructions – Leave the default value.
Schema Name – Order.

Output Schema –

{
  "order_id": "string",
  "customer_name": "string",
  "items": [
    {
      "product_id": "string",
      "quantity": "integer",
      "price": "float"
    }
  ],
  "total": "float",
  "order_date": "date"
}

Generate Multiple – Leave unchecked (single order per email).
Run the component.
The output is a Data object holding the order details in the structure defined above, ready to feed into downstream components like a database writer or a notification system.
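
Before passing the result downstream, it can be useful to confirm that it really matches the Order schema. The sketch below is one way to do that outside Nappai with the standard library. The `conforms` helper and the sample order are hypothetical, and treating "date" as an ISO 8601 string (YYYY-MM-DD) is an assumption, since the schema does not specify a date format.

```python
from datetime import date

ORDER_SCHEMA = {
    "order_id": "string",
    "customer_name": "string",
    "items": [{"product_id": "string", "quantity": "integer", "price": "float"}],
    "total": "float",
    "order_date": "date",
}


def conforms(value, spec) -> bool:
    """Recursively check a value against the type-name schema above."""
    if isinstance(spec, dict):
        return isinstance(value, dict) and all(
            key in value and conforms(value[key], sub) for key, sub in spec.items()
        )
    if isinstance(spec, list):
        # A one-element list in the schema describes every item of the value.
        return isinstance(value, list) and all(conforms(v, spec[0]) for v in value)
    if spec == "string":
        return isinstance(value, str)
    if spec == "integer":
        return isinstance(value, int) and not isinstance(value, bool)
    if spec == "float":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if spec == "date":
        try:
            date.fromisoformat(value)  # assumption: dates arrive as YYYY-MM-DD
            return True
        except (TypeError, ValueError):
            return False
    return False


# Invented sample output, for illustration only.
sample = {
    "order_id": "ORD-1001",
    "customer_name": "Jane Doe",
    "items": [{"product_id": "SKU-1", "quantity": 2, "price": 9.99}],
    "total": 19.98,
    "order_date": "2024-01-15",
}
print(conforms(sample, ORDER_SCHEMA))  # True
```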

Related Components

LLM Output – For simple text responses from a language model.
Data Parser – To transform raw JSON into Nappai data structures.
Database Writer – To store the structured data in a database.

Tips and Best Practices

Keep the schema simple – Start with a few fields, then add more as you test.
Validate the schema – Use the “Output Schema” editor to catch syntax errors before running.
Use “Generate Multiple” when you expect several records in one message (e.g., a list of invoices).
Test with sample data – Run the component on a few example messages to ensure the output matches your expectations.
Avoid sensitive data – If the input contains personal information, consider anonymizing it before sending it to the LLM.
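
As an illustration of the last tip, the following sketch masks email addresses and phone numbers before the text reaches the Input Message field. It is a simple, assumption-laden example: the `redact` helper and both patterns are hypothetical, and regular expressions like these will miss many forms of personal data, so treat it as a starting point rather than a complete anonymizer.

```python
import re

# Deliberately simple patterns; real-world data needs broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace email addresses and phone numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


message = "Contact jane.doe@example.com or +1 (555) 010-2030 about order 42."
print(redact(message))  # Contact [EMAIL] or [PHONE] about order 42.
```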

Security Considerations

Data Privacy – The text you send to an externally hosted language model leaves your local environment and is handled according to the provider’s policies.
Compliance – Ensure that sending customer or financial data to an external LLM complies with your organization’s data‑handling regulations.
Model Access – Restrict which users can configure the component to prevent accidental exposure of sensitive prompts or schemas.