Whisper Model Base

Whisper Model Base is a simple tool that turns spoken words into written text.
You can upload an audio file, let the component do the heavy lifting, and get the resulting text back for use in your workflow or as a tool for an AI agent.

How it Works

The component uses the Whisper speech‑to‑text model, which runs locally on your machine.
When you provide an audio file (in base64 format), the component first decodes it and then feeds it to Whisper.
Whisper processes the audio and returns the recognized text.
The component also prepares a “tool” representation that can be used by Nappai’s AI agents to call this functionality directly.
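
Under the hood, the flow looks roughly like the sketch below. It assumes the open-source openai-whisper package and the “base” model size the component is named after; the component’s actual internal code may differ.

```python
import base64
import os
import tempfile

import whisper  # pip install openai-whisper (an assumption; Nappai may bundle its own)

def transcribe_base64(audio_b64: str) -> str:
    # Decode the base64 payload back into raw audio bytes.
    audio_bytes = base64.b64decode(audio_b64)

    # Whisper loads audio from a file path (via ffmpeg), so write a temp file.
    tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
    try:
        tmp.write(audio_bytes)
        tmp.close()
        # "base" matches the model size this component is named after.
        model = whisper.load_model("base")
        result = model.transcribe(tmp.name)
    finally:
        os.unlink(tmp.name)

    return result["text"]
```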

Inputs

Mapping Mode

This component has a special mode called “Mapping Mode”. When you enable it with the toggle switch, an additional input called “Mapping Data” becomes available, and each input field offers three ways to provide data:

  • Fixed: You type the value directly into the field.
  • Mapped: You connect the output of another component to use its result as the value.
  • JavaScript: You write JavaScript code to dynamically calculate the value.

This flexibility allows you to create more dynamic and connected workflows.

Input Fields

  • Speech to Text: The audio data, in base64 format, to be converted to text. Files are uploaded through the Binary Component (see the encoding sketch after this list).
  • Mapping Mode: Enable mapping mode to process multiple data records in batches.
  • Tool Name: The name of the tool that will be used when this component is connected as a tool. This name is displayed to the agent when it selects tools to use.
  • Tool Description: A detailed description of what this tool does. This description helps the agent understand when and how to use this tool effectively.
  • Tools arguments metadata: Defines the metadata for the arguments the tool accepts.
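
For reference, producing the base64 payload by hand looks roughly like this. The file name is illustrative, and in practice the Binary Component does this conversion for you.

```python
import base64

def audio_file_to_base64(path: str) -> str:
    # Read the raw audio bytes and encode them as the base64 string
    # that the "Speech to Text" input expects.
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")

audio_b64 = audio_file_to_base64("meeting.wav")  # illustrative file name
```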

Outputs

  • Data: A Data object containing the transcribed text. Use the get_data method to retrieve the text.
  • Tool: A Tool object that can be passed to an AI agent. Use the to_toolkit method to convert it into a usable tool.
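
As a rough illustration of how downstream code might consume these outputs: get_data and to_toolkit are the method names documented above, but the exact Data and Tool types are Nappai internals, so treat this as a hedged sketch rather than the real API.

```python
from typing import Any

def consume_outputs(data_obj: Any, tool_obj: Any):
    # Retrieve the transcribed text from the Data output.
    transcript: str = data_obj.get_data()
    # Convert the Tool output into a form an AI agent can call.
    toolkit = tool_obj.to_toolkit()
    return transcript, toolkit
```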

Usage Example

  1. Upload an audio file
    Drag and drop an MP3 or WAV file into the “Speech to Text” input. The file will be converted to base64 automatically.

  2. Configure the tool (optional)
    If you want the component to act as a tool for an AI agent, fill in “Tool Name” and “Tool Description”.
    Leave “Tools arguments metadata” empty if the tool takes no arguments.

  3. Run the component
    Click “Run”. The component will process the audio and output the transcribed text in the Data field.
    If you configured a tool, the Tool output will be ready for the agent to use.

  4. Use the output
    Connect the Data output to a text‑processing component, or pass the Tool output to an AI agent that can call the Whisper tool on demand.

Related Components

  • Binary Component – Handles file uploads and base64 conversion.
  • Data Transformation – Text – Allows you to manipulate the transcribed text (e.g., cleaning, summarizing).
  • Agent Toolkit – Lets you expose the Whisper tool to AI agents for dynamic task execution.

Tips and Best Practices

  • Keep audio files under 30 MB for faster processing.
  • Use Mapping Mode when you need to transcribe many files at once.
  • Provide a clear “Tool Description” so agents know when to call the Whisper tool.
  • Verify the ffmpeg path (/usr/bin/ffmpeg) is correct on your system; otherwise the component will fail to decode audio. A quick check is sketched below.
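
A minimal way to confirm ffmpeg is available, assuming it should be discoverable on your PATH:

```python
import shutil

# Whisper shells out to ffmpeg to decode audio, so make sure it exists.
ffmpeg_path = shutil.which("ffmpeg")
if ffmpeg_path is None:
    raise RuntimeError("ffmpeg not found on PATH; install it or fix the configured path")
print(f"Using ffmpeg at {ffmpeg_path}")  # e.g. /usr/bin/ffmpeg
```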

Security Considerations

  • The component processes audio locally; no data is sent to external services.
  • Ensure that the ffmpeg binary is from a trusted source to avoid malicious code execution.
  • If you expose the tool to external agents, consider adding authentication or rate‑limiting to prevent abuse.
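
If you do expose the tool externally, wrapping calls in a simple sliding-window rate limiter is one option. This is a minimal sketch in plain Python, not a Nappai feature; adapt it to whatever framework actually serves the tool.

```python
import time
from collections import deque

class RateLimiter:
    """Allow at most max_calls within a sliding window of window_seconds."""

    def __init__(self, max_calls: int, window_seconds: float):
        self.max_calls = max_calls
        self.window = window_seconds
        self.calls: deque[float] = deque()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] > self.window:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False

limiter = RateLimiter(max_calls=5, window_seconds=60)  # 5 transcriptions per minute
if limiter.allow():
    pass  # invoke the Whisper tool here
```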