
WhisperLocal

This component converts spoken audio into written text directly on your device. It uses Whisper, a speech-recognition AI model that runs locally on your computer or server. Because all processing happens locally, you do not need an active internet connection or an API key to use it, which makes it a good fit for sensitive recordings.

How it Works

When you connect an audio file to this component, it prepares the file for processing and then uses the Whisper AI model to listen to the audio and generate a text transcript.

The process happens in these simple steps:

  1. Preparation: The component takes the audio file you provide and ensures it is in a format the AI can understand.
  2. Processing: The AI model “listens” to the audio. Depending on the settings you choose, it can detect the language automatically or rely on a language you specify.
  3. Output: Once the audio is analyzed, the component outputs the text version of what was said.

If you upload multiple audio files at once, the component will process them one by one and combine all the text results into a single document, separating each file’s transcript with a clear break for easy reading.
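The multi-file behavior described above can be sketched in a few lines of Python. This is only an illustration, not the component's actual code: the function name `transcribe_all` and the `transcribe` callable are hypothetical, and the "Transcript from File N:" separator is assumed from the Output Data Example on this page.

```python
from typing import Callable, Iterable


def transcribe_all(paths: Iterable[str], transcribe: Callable[[str], str]) -> str:
    """Transcribe files one by one and join the results into one document.

    `transcribe` is a placeholder for whatever turns one audio file into text.
    """
    sections = []
    for number, path in enumerate(paths, start=1):
        text = transcribe(path).strip()
        # Separator format assumed from the documented output example.
        sections.append(f"Transcript from File {number}:\n\n{text}")
    # A blank line between sections keeps each file's transcript visually apart.
    return "\n\n".join(sections)
```

Passing the transcription step in as a parameter keeps the combining logic easy to try out with stand-in text before any real audio is involved.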

Connection & Credentials

This component does not require any API keys, passwords, or external credentials. It runs entirely on your local machine, so you can start using it immediately after adding it to your workflow.

Inputs

This component has a special mode called “Mapping Mode”. When you enable this mode using the toggle switch, an additional input called “Mapping Data” is activated, and each input field offers you three different ways to provide data:

  • Fixed: You type the value directly into the field.
  • Mapped: You connect the output of another component to use its result as the value.
  • Javascript: You write JavaScript code that calculates the value dynamically.

This flexibility allows you to create more dynamic and connected workflows.

Input Fields

The following fields are available to configure this component. Some fields may be visible only in certain modes:

  • Audio: The audio file or files you want to transcribe into text. You can upload single files or multiple files at once.

    • Visible in: All modes
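  • Model size: Controls the trade-off between transcription accuracy and speed. “base” balances the two; “tiny” runs faster on older devices, though the transcription quality may be lower.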

    • Visible in: All modes
  • Language: The language spoken in the audio. Use “auto” to detect it automatically, or a language code such as “en” or “es” to set it explicitly.

    • Visible in: All modes
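If you want to reproduce these settings in a script outside the workflow, the sketch below shows one possible way using the open-source openai-whisper Python package. Treat this as an assumption for illustration: this page does not say which Whisper implementation the component uses internally, and the helper names `normalize_language` and `transcribe_file` are hypothetical.

```python
from typing import Optional


def normalize_language(language: str) -> Optional[str]:
    """Map the "auto" setting to None, which openai-whisper treats as
    "detect the language from the audio"."""
    value = language.strip().lower()
    return None if value in ("", "auto") else value


def transcribe_file(path: str, model_size: str = "base", language: str = "auto") -> str:
    """Transcribe one audio file with the openai-whisper package (assumed backend)."""
    # Imported lazily so the helper above works without the package installed.
    import whisper  # third-party: pip install openai-whisper (also requires ffmpeg)

    model = whisper.load_model(model_size)  # e.g. "tiny" or "base"
    result = model.transcribe(path, language=normalize_language(language))
    return result["text"].strip()
```

For example, `transcribe_file("call.mp3", model_size="tiny", language="es")` would mirror choosing “tiny” and “es” in the fields above; the file name is a placeholder.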

Outputs

  • Transcript: The final text result of your audio files. If you processed multiple files, this output will contain all the transcribed text combined, with clear separators between each file’s content. You can then use this text in subsequent steps, such as summarizing the content or saving it to a database.

Output Data Example (JSON)

{ "message": "Transcript from File 1:\n\nHello, welcome to the meeting.\n\nTranscript from File 2:\n\nThe project timeline has been updated." }
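If a later step needs each file's text separately, the combined output can be split back apart. The following is a minimal sketch that assumes the output always follows the shape of the example above (a `message` key and "Transcript from File N:" headers); the function name `split_transcripts` is hypothetical.

```python
import json
import re

# Matches a header line such as "Transcript from File 2:" on its own line.
SECTION_HEADER = re.compile(r"^Transcript from File (\d+):[ \t]*$", re.MULTILINE)


def split_transcripts(payload: str) -> dict[int, str]:
    """Return {file number: transcript text} from the component's JSON output."""
    message = json.loads(payload)["message"]
    # With one capture group, re.split yields [prefix, number, body, number, body, ...].
    parts = SECTION_HEADER.split(message)
    return {int(parts[i]): parts[i + 1].strip() for i in range(1, len(parts) - 1, 2)}
```

Note that a recording in which someone literally says a header line would be split incorrectly, so this approach suits quick inspection rather than strict parsing.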

Connectivity

This component is typically connected to other components that handle audio files, such as:

  • Audio File Uploaders: To provide the raw audio data.
  • Text Processors: Such as “Summarize” or “Translate” components, to analyze the transcribed text further.
  • Document Storage: To save the final text transcript for record-keeping.

It works best when placed after the step that collects your audio and before the steps that analyze or store the resulting text.

Usage Example

Scenario: You have a folder of recorded customer service calls and want to read them without listening to every single one.

  1. Drag and drop your audio files into the Audio input.
  2. Keep Model size as “base” for a good balance of speed and accuracy.
  3. Set Language to “auto” if the calls are in different languages, or select “en” if they are all in English.
  4. Connect the Transcript output to a “Read File” or “Text Analysis” component.
  5. Run the workflow. You will receive the text of all calls in one go, so you can quickly search it for keywords or pass it on to be summarized.
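The keyword search mentioned in the last step could look like the sketch below if you handle the transcript in your own script. The function name `find_keyword` and the `context` parameter are hypothetical and are not part of the component.

```python
import re


def find_keyword(transcript: str, keyword: str, context: int = 40) -> list[str]:
    """Return a short snippet of surrounding text for each case-insensitive match."""
    snippets = []
    for match in re.finditer(re.escape(keyword), transcript, flags=re.IGNORECASE):
        start = max(match.start() - context, 0)
        end = min(match.end() + context, len(transcript))
        # Flatten line breaks so each snippet reads as a single line.
        snippets.append(transcript[start:end].replace("\n", " ").strip())
    return snippets
```

Calling `find_keyword(transcript, "refund")` on the combined transcript would list every place a caller mentioned a refund, with a little text on either side for context.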

Tips and Best Practices

  • Language Selection: If you know the audio is in a specific language (e.g., Spanish), select “es” instead of “auto”. This can make the transcription slightly faster because the AI doesn’t have to guess the language first.
  • File Size: Larger audio files will take longer to transcribe. If you have very long recordings, consider splitting them into smaller chunks before uploading for faster processing.
  • Memory Usage: The “base” model is recommended for most users. If you experience performance issues on older devices, try switching to “tiny”, though the transcription quality may be lower.
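For the File Size tip, one way to split a long recording before uploading is sketched below using only Python's standard library. It assumes the recording is an uncompressed WAV file; the function name `split_wav` and the default chunk length of 600 seconds are illustrative choices, not settings of the component.

```python
import wave
from pathlib import Path


def split_wav(src: str, out_dir: str, chunk_seconds: int = 600) -> list[str]:
    """Split a WAV file into consecutive chunks of at most `chunk_seconds` each."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        frames_per_chunk = reader.getframerate() * chunk_seconds
        index = 0
        while True:
            frames = reader.readframes(frames_per_chunk)
            if not frames:
                break  # reached the end of the recording
            index += 1
            chunk_path = out / f"{Path(src).stem}_part{index:03d}.wav"
            with wave.open(str(chunk_path), "wb") as writer:
                writer.setparams(params)  # same channels, sample width, and rate
                writer.writeframes(frames)  # frame count in the header is fixed on close
            paths.append(str(chunk_path))
    return paths
```

Fixed-length cuts can land in the middle of a word, so a sentence near a boundary may be transcribed imperfectly; cutting at a pause is a better choice when you can find one.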

Security Considerations

Because this component processes data locally on your machine, your audio is not sent to external servers, so sensitive information stays within your local environment.