Speech to Text

Speech to Text is a simple tool that turns spoken words in an audio or video file into written text. You can drop a file into the dashboard, and the component will return the transcript for you to use in other parts of your workflow.

How it Works

When you provide an audio or video file, the component first checks whether the file contains video. If it does, it extracts the audio track. The audio is then split into smaller chunks so that each piece stays under the Whisper API's 25 MB per-request file-size limit. The component sends each chunk to the OpenAI Whisper API, which returns the spoken words as text, and then joins the chunk transcripts in their original order. All of this happens automatically; you only see the final combined transcript.
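The chunk-and-combine step can be sketched as follows. This is a minimal illustration, not the component's actual code: the helper names `plan_chunks` and `transcribe_all` are hypothetical, and the sketch assumes a roughly constant bitrate so that chunk size scales with duration. The component's real splitting strategy is not documented here. `transcribe` stands for any function that turns one audio chunk into text; with the official OpenAI Python SDK, one option is to wrap `client.audio.transcriptions.create(model="whisper-1", file=...)` and return the result's `text` attribute.

```python
import math
from typing import Callable, List, Tuple

# Assumption: the 25 MB Whisper upload limit, read as 25 * 1024 * 1024 bytes.
MAX_REQUEST_BYTES = 25 * 1024 * 1024


def plan_chunks(file_size_bytes: int, duration_s: float,
                max_bytes: int = MAX_REQUEST_BYTES,
                safety: float = 0.9) -> List[Tuple[float, float]]:
    """Hypothetical helper: split [0, duration_s] into equal time windows.

    Assumes constant bitrate, so each window's size is proportional to its
    length; `safety` keeps every window below max_bytes with some margin.
    """
    if file_size_bytes <= 0 or duration_s <= 0:
        raise ValueError("file size and duration must be positive")
    budget = max_bytes * safety
    n_chunks = max(1, math.ceil(file_size_bytes / budget))
    step = duration_s / n_chunks
    # min() guards the last window against floating-point overshoot.
    return [(i * step, min((i + 1) * step, duration_s)) for i in range(n_chunks)]


def transcribe_all(chunks: List[bytes], transcribe: Callable[[bytes], str]) -> str:
    """Hypothetical helper: transcribe chunks in order and join the non-empty results."""
    parts = [transcribe(chunk).strip() for chunk in chunks]
    return " ".join(p for p in parts if p)
```

For a 60 MB, one-hour recording, `plan_chunks(60 * 1024 * 1024, 3600)` returns three 20-minute windows: `[(0.0, 1200.0), (1200.0, 2400.0), (2400.0, 3600.0)]`. Cutting the audio at those timestamps (for example with a tool such as FFmpeg) is outside the scope of this sketch; the 10% safety margin is an illustrative choice to absorb bitrate variation.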

Inputs

Before using the component, make sure you have set up an OpenAI API credential in Nappai’s Credentials section. Then select that credential in the component’s “Credential” field.

Input Fields

Input Audio o Video: Upload or provide a link to an audio or video file that you want to transcribe. The component accepts common formats such as MP3, WAV, MP4, and more.
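A simplified version of the "does this file contain video?" check might look like this. It is a heuristic sketch under stated assumptions: `classify_media` is a hypothetical helper, it looks only at the file extension, and the extension lists are examples rather than the component's documented format list. Extensions can mislead (an `.mp4` may hold only audio, and a `.webm` may hold either), so a more robust check would inspect the file's actual streams, for example with `ffprobe`.

```python
import os
from urllib.parse import urlparse

# Example extension lists only; not the component's official format list.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".mpga", ".mpeg"}
VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv", ".webm", ".avi"}


def classify_media(path_or_url: str) -> str:
    """Hypothetical helper: return 'audio', 'video', or 'unsupported' by extension."""
    # urlparse drops any query string or fragment, so signed URLs work too.
    path = urlparse(path_or_url).path
    suffix = os.path.splitext(path)[1].lower()
    if suffix in VIDEO_EXTENSIONS:
        return "video"  # audio track would need extracting first
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    return "unsupported"
```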

Outputs

Output: The component returns a Message object that contains the full transcript of the audio. You can use this text in downstream components, such as text analysis, summarization, or storage.

Usage Example

Add the Speech to Text component to your workflow.
Select your OpenAI API credential in the “Credential” field.
Upload an MP3 file (or any supported audio/video file) into the “Input Audio o Video” field.
Run the workflow.
The component will output the transcript, which you can then feed into a “Text Summarizer” component to create a short summary.

Related Components

Text to Speech – Convert written text back into spoken audio.
Audio Splitter – Manually split long audio files into smaller segments before transcription.
Video Processor – Extract frames or metadata from video files for visual analysis.

Tips and Best Practices

The Whisper API accepts at most 25 MB per request. The component splits larger files automatically, but each extra chunk adds another API call, so smaller or more compressed files finish faster.
Use clear, high‑quality recordings to improve transcription accuracy.
By default, Whisper transcribes audio in the language it was spoken in, detecting that language automatically; translating speech into English is a separate Whisper operation.
Store the resulting transcript in a database or document store for future reference.
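Because file size depends on bitrate, a quick estimate shows how much audio fits under the 25 MB limit before splitting kicks in. The sketch assumes constant-bitrate audio and reads 25 MB as 25 × 1024 × 1024 bytes; `max_minutes_per_request` is a hypothetical helper, not part of the component.

```python
# Assumption: 25 MB upload limit interpreted as 25 * 1024 * 1024 bytes.
MAX_REQUEST_BYTES = 25 * 1024 * 1024


def max_minutes_per_request(bitrate_kbps: float, max_bytes: int = MAX_REQUEST_BYTES) -> float:
    """Hypothetical helper: minutes of constant-bitrate audio that fit in max_bytes."""
    bytes_per_second = bitrate_kbps * 1000 / 8  # kilobits/s -> bytes/s
    return max_bytes / bytes_per_second / 60
```

At 128 kbps, about 27 minutes of audio fit in one request. Uncompressed 16-bit stereo WAV at 44.1 kHz (1,411.2 kbps) fills the same limit in under 2.5 minutes, so a long WAV recording will be split into many more chunks than an MP3 of the same length.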

Security Considerations

The OpenAI API key is stored securely in Nappai’s credential vault; reference it through the “Credential” field and never paste the raw key into your workflow.
Audio files are sent to OpenAI’s servers for transcription. If your data is sensitive, consider using a private Whisper deployment or anonymizing the content before sending.
Ensure that any downstream components that handle the transcript also follow your organization’s data‑privacy policies.
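If transcripts may contain personal data, a redaction pass before storage is one way to act on the last point. The sketch below is illustrative only: `redact_transcript` is a hypothetical helper, and its two regular expressions catch only simple email addresses and phone-number-like digit runs. They will miss many forms of personal data and may over-redact (a date such as 2024-01-15 also matches the phone pattern), so treat this as a starting point rather than a compliance control.

```python
import re

# Illustrative patterns only; real PII detection needs much broader coverage.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_transcript(text: str) -> str:
    """Hypothetical helper: replace emails and phone-like digit runs with placeholders."""
    # Redact emails first so digits inside addresses are not treated as phones.
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)
```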