OpenAI Whisper

OpenAI Whisper is a component that turns spoken words in an audio file into written text.
It sends the audio to OpenAI’s Whisper model and returns a transcription that you can use in other parts of your workflow.

How it Works

When you drop an audio file into the component, it sends the file to OpenAI’s Whisper API.
Whisper processes the audio, detects the language (or uses the language you specify), and returns the spoken words as plain text.
The component then gives you two outputs: the transcription text and a ready‑to‑use transcriber object that can be passed to other LLM‑based components.
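
Behind the scenes this is equivalent to a direct call to OpenAI’s audio transcription endpoint. The sketch below uses OpenAI’s official Python SDK rather than Nappai’s internal code, and the file name and language are placeholders:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Send the audio file to the Whisper model and ask for plain text back.
with open("meeting.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        language="en",           # optional; omit it to let Whisper detect the language
        response_format="text",  # return a plain string instead of a JSON object
    )

print(transcription)  # the spoken words as plain text
```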

Credentials

This component needs an OpenAI API credential.

  1. Go to the Credentials section of Nappai and create a new credential of type OpenAI API.
  2. In the component, choose that credential from the Credential dropdown.
    The API key is stored securely and is not shown in the component’s input fields.
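
For comparison, a standalone script would have to supply the same key itself. The snippet below only illustrates that pattern and assumes the key is exported as the OPENAI_API_KEY environment variable; it is not how Nappai handles the credential internally:

```python
import os
from openai import OpenAI

# Read the key from the environment instead of hard-coding it in the script.
# Inside Nappai, the credential store plays this role for you.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
```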

Inputs

  • Audio File: The audio file you want to transcribe. Supported formats are mp3, mp4, wav, m4a, and mkv.
  • Message: A message that may contain files; useful when you want to pass the audio file as part of a larger message payload.
  • Language: The language of the audio file (e.g., en, es, fr). The default is English (en).
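
As an illustration of how these inputs map onto a transcription request, the hypothetical helper below (check_audio is not part of the component) rejects unsupported file extensions and collects the request arguments:

```python
from pathlib import Path

SUPPORTED_FORMATS = {".mp3", ".mp4", ".wav", ".m4a", ".mkv"}

def check_audio(path: str, language: str = "en") -> dict:
    """Hypothetical helper: validate the extension and return the
    arguments a Whisper transcription request would use."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported audio format: {suffix}")
    return {"file": path, "language": language}

print(check_audio("interview.m4a", language="es"))
# {'file': 'interview.m4a', 'language': 'es'}
```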

Outputs

  • Transcription Text: The plain‑text transcription of the audio.
  • LLM Audio Transcriber: A transcriber object that can be used by other components to process audio or video with LLMs.

Usage Example

  1. Add the component to your workflow.
  2. Select your OpenAI API credential in the Credential field.
  3. Upload an audio file (e.g., meeting.mp3) into the Audio File input.
  4. Choose the language of the recording (or leave it as the default English).
  5. Run the workflow.
  6. The Transcription Text output will contain the spoken words, which you can then feed into a text‑generation component or store in a database (see the code sketch at the end of this example).

Components commonly used alongside OpenAI Whisper include:

  • OpenAI Text Generation – Use the transcription as input for generating summaries or answers.
  • Audio File Splitter – Split long recordings into smaller chunks before transcription.
  • File Storage – Save the transcription text to a file or database.
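
Outside of Nappai, the transcribe‑then‑summarize flow from this example could be sketched as follows. This is a sketch using OpenAI’s Python SDK, and the file and model names are placeholders rather than anything the component prescribes:

```python
from openai import OpenAI

client = OpenAI()

# Step 1: transcribe the recording (what the OpenAI Whisper component does).
with open("meeting.mp3", "rb") as f:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=f,
        response_format="text",
    )

# Step 2: feed the transcription into a text-generation step
# (roughly what an OpenAI Text Generation component would do).
summary = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Summarize this meeting transcript."},
        {"role": "user", "content": transcript},
    ],
)
print(summary.choices[0].message.content)
```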

Tips and Best Practices

  • Keep audio files under 30 minutes for faster processing; for very long recordings, see the chunking sketch after this list.
  • Use the Language dropdown to match the audio’s language; this improves accuracy.
  • If you’re transcribing multiple files, consider using the Message input to batch them together.
  • Store the LLM Audio Transcriber output if you plan to run additional LLM operations on the same audio.
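
If you want to pre‑split a long recording yourself before uploading it, a sketch along these lines could work. It assumes the pydub library (and an ffmpeg install), which is not something the component itself requires:

```python
from pydub import AudioSegment  # assumes pydub + ffmpeg are installed

CHUNK_MINUTES = 10  # arbitrary chunk size; pick what suits your recordings

audio = AudioSegment.from_file("long_meeting.mp3")
chunk_ms = CHUNK_MINUTES * 60 * 1000

# Slice the recording into fixed-length chunks and write each one out.
for i, start in enumerate(range(0, len(audio), chunk_ms)):
    part = audio[start:start + chunk_ms]
    part.export(f"long_meeting_part{i}.mp3", format="mp3")
    # each part can then be transcribed separately
```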

Security Considerations

  • The OpenAI API key is stored securely in Nappai’s credential store.
  • Do not expose the key in any workflow outputs or logs.
  • Ensure that only authorized users have access to the component and the credential.