YouTube Transcripts

The YouTube Transcripts component extracts the spoken text (transcripts) from YouTube videos. It can return the full text of a video or split it into timed segments. It can also translate the extracted text into other languages, which is useful when you need to analyze video content across multiple languages.

How it Works

This component acts as a bridge between YouTube and your automation workflow. Here is a simple breakdown of how it functions:

Video Retrieval: You provide a YouTube video link. The system validates the link and identifies the specific video.
Text Extraction: The component searches for available captions or subtitles for that video. It tries a direct API first and falls back to alternative methods if that attempt fails.
Formatting: Depending on your settings, it either returns the entire text as one continuous block or breaks it down into smaller “chunks” with timestamps (e.g., every minute).
Translation (Optional): If you specify a target language, the component sends the extracted text to a translation service to convert it from the original language (usually English) to your preferred language.
Output: The final text or translated text is passed to the next step in your workflow.
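
To make the Video Retrieval step more concrete, the following is a minimal sketch of how a link could be validated and reduced to a video ID. It is not the component's actual implementation: the function name `extract_video_id` and the set of accepted hosts are assumptions made for illustration, and it relies only on Python's standard `urllib.parse` module.

```python
from typing import Optional
from urllib.parse import urlparse, parse_qs

# Hosts accepted by this sketch (an assumption; the component's own list may differ).
WATCH_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}
SHORT_HOST = "youtu.be"


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a watch URL or short link, or None if unsupported."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        return None
    host = (parsed.hostname or "").lower()
    if host in WATCH_HOSTS and parsed.path == "/watch":
        # Watch URLs carry the ID in the "v" query parameter.
        values = parse_qs(parsed.query).get("v", [])
        return values[0] if values else None
    if host == SHORT_HOST:
        # Short links carry the ID as the first path segment.
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None
    return None
```

A caller would treat a `None` result as an invalid link and stop before attempting any caption lookup.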

Connection & Credentials

This component does not require configuring a separate credential in the Nappai panel. It uses public YouTube data and standard internet connections. However, ensure your Nappai environment has the necessary library (youtube-transcript-api) installed as per the requirements below.

Operations

This component is a tool-based node and does not have separate selectable operations. It performs a single primary function: extracting and processing transcripts based on the input configurations.

Inputs

Mapping Mode

This component has a special mode called “Mapping Mode”. When you enable this mode using the toggle switch, an additional input called “Mapping Data” is activated, and each input field offers you three different ways to provide data:

Fixed: You type the value directly into the field.
Mapped: You connect the output of another component to use its result as the value.
Javascript: You write Javascript code to dynamically calculate the value.

This flexibility allows you to create more dynamic and connected workflows.

Input Fields

The following fields are available to configure this component:

Video URL: Enter the full YouTube video URL (e.g., https://www.youtube.com/watch?v=... or https://youtu.be/...). This is the only required field.
Transcript Format: Choose how you want the text returned. Select text for a single block of continuous text, or chunks if you want the text broken into segments with timestamps.
Chunk Size (seconds): If you selected chunks as your format, this setting determines the length of each segment in seconds. The default is 60 seconds.
Language: Enter a comma-separated list of language codes (e.g., en,es) to prioritize the search order. If left empty, it defaults to the video’s primary language (usually English).
Translation Language: Select a language from the dropdown to translate the transcript. Leave this blank if you want the original text.
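
The relationship between Transcript Format and Chunk Size (seconds) can be sketched as follows. This is an illustration under stated assumptions, not the component's code: it assumes each caption segment is a dictionary with `text` and `start` keys (a hypothetical input shape), groups segments into fixed windows of `chunk_size` seconds, and emits the `content`/`metadata` shape shown in the Outputs section. The component may compute chunk boundaries differently.

```python
from typing import Dict, List


def chunk_segments(segments: List[dict], chunk_size: int = 60) -> List[dict]:
    """Group timed caption segments into fixed windows of chunk_size seconds."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive number of seconds")
    buckets: Dict[int, List[str]] = {}
    for seg in segments:
        # A segment belongs to the window in which it starts.
        index = int(seg["start"] // chunk_size)
        buckets.setdefault(index, []).append(seg["text"].strip())
    return [
        {
            "content": " ".join(texts),
            "metadata": {"start": index * chunk_size, "end": (index + 1) * chunk_size},
        }
        for index, texts in sorted(buckets.items())
    ]
```

Because a segment is assigned by its start time only, a caption that runs across a window boundary stays in the earlier chunk. This is one reason chunk timestamps should be read as approximate.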

Outputs

Output Data Example (JSON)

The component produces a Data object. The structure depends on the Transcript Format you chose:

**Example 1: Format set to “text”**

```json
{
  "transcripts": "Welcome to this tutorial on automation. In this video, we will learn how to use the Nappai system to streamline your data processes…"
}
```

**Example 2: Format set to “chunks”**

```json
[
  {
    "content": "Welcome to this tutorial on automation.",
    "metadata": { "start": 0, "end": 15 }
  },
  {
    "content": "In this video, we will learn how to use the Nappai system.",
    "metadata": { "start": 15, "end": 30 }
  }
]
```

**Example 3: Error Case**

If the video has no captions or the URL is invalid, the output will contain an error message:

```json
{ "error": "No transcript found for this video." }
```

Connectivity

This component is typically used at the beginning of a data analysis workflow.

LanggraphReactAgent: You can connect the output to an AI Agent to analyze, summarize, or answer questions based on the video transcript.
ParseData: If you use the chunks format, you might connect this to a parsing component to further structure the timestamped segments.
TextInput: You can connect the text output to a text input node if you need to display or edit the transcript manually in a subsequent step.

Usage Example

Scenario: Summarizing a Product Review

Input: You drag the YouTube Transcripts component into your workflow.
Configuration:
- Paste the URL of a product review video into Video URL.
- Set Transcript Format to text.
- Set Translation Language to Spanish (if the video is in English and you need a Spanish summary).
Connection: Connect the Data output to the LanggraphReactAgent component.
Result: The agent receives the Spanish-translated transcript and generates a summary of the product’s pros and cons.

Important Notes

🔒 Translation Language Codes Only specific language codes (en, es, fr, de, it, pt, ru, ja, ko, hi, ar, id) are supported for translation. Using an unsupported code will result in no translation being applied.
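
As an illustration of the rule above, a workflow could check a language code before relying on translation. The sketch below is hypothetical: the helper name `resolve_translation_target` and the case-insensitive matching are assumptions, and only the list of codes comes from this page.

```python
from typing import Optional

# Codes listed on this page as supported for translation.
SUPPORTED_TRANSLATION_CODES = frozenset(
    {"en", "es", "fr", "de", "it", "pt", "ru", "ja", "ko", "hi", "ar", "id"}
)


def resolve_translation_target(code: Optional[str]) -> Optional[str]:
    """Return the normalized code if it is supported, otherwise None (no translation)."""
    if not code:
        return None
    normalized = code.strip().lower()
    return normalized if normalized in SUPPORTED_TRANSLATION_CODES else None
```

A `None` result mirrors the documented behavior: the transcript would be passed along untranslated.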

🔒 Data Exposure with Translation When you select a translation language, the transcript text is sent to an external service. This requires a stable internet connection and may affect privacy.

🔒 Use Caution with Sensitive Content Do not use this component for videos containing private or confidential information if you enable translation, as the text may be exposed to external services.

⚠️ No Captions on Video If the YouTube video does not have captions or subtitles, the component cannot generate a transcript and will return an error.

⚠️ Language Availability If the specific language code you requested is not available for the video, the system will fall back to English captions, so the returned transcript may be in a different language than the one you requested.
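
The priority-then-fallback behavior can be sketched as below. The function names and the `available` list of caption languages are hypothetical, and the sketch does not model the empty-input case, where the component is described as using the video's primary language.

```python
from typing import List, Optional


def parse_language_priority(raw: str) -> List[str]:
    """Split a comma-separated Language value such as "en,es" into clean codes."""
    return [code.strip().lower() for code in raw.split(",") if code.strip()]


def pick_language(requested: str, available: List[str], fallback: str = "en") -> Optional[str]:
    """Return the first requested code that is available, else the fallback if present."""
    for code in parse_language_priority(requested):
        if code in available:
            return code
    return fallback if fallback in available else None
```

Comparing the returned code with the one you asked for is a simple way to detect that a fallback occurred.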

📋 Dependency Installation Ensure that the youtube-transcript-api library is installed in your environment (pip install youtube-transcript-api) before using this component.

💡 Choose Format Wisely Use text when you need the whole transcript as a single block. Use chunks when you need timestamps or want to analyze specific parts of the video.

💡 Use Standard YouTube URLs Enter the full watch URL (youtube.com/watch?v=...) or the shortened link (youtu.be/...). Avoid embedded player URLs as they may not work correctly.

💡 Adjust Chunk Size for Length For very long videos, increase the Chunk Size (e.g., to 120 seconds) to reduce the number of chunks and improve performance. The default 60 seconds is usually sufficient.
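
The effect of Chunk Size on output volume is simple arithmetic, shown here as a small sketch. The helper name `estimate_chunk_count` is hypothetical, and the result is an upper bound under the assumption that chunks are fixed-length windows; windows without speech may produce no chunk.

```python
import math


def estimate_chunk_count(duration_seconds: float, chunk_size: int = 60) -> int:
    """Upper bound on the number of chunks for a video of the given length."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive number of seconds")
    return math.ceil(duration_seconds / chunk_size)
```

For a two-hour video (7200 seconds), this gives at most 120 chunks at the default 60 seconds and at most 60 chunks at 120 seconds.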

ℹ️ Error Output Format When an error occurs (e.g., invalid URL or missing captions), the component returns a Data object with an error field. Always check this field to understand why the operation failed.
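
A downstream step could check for the error field along these lines. This is a sketch that assumes the output arrives as plain Python data in the shapes shown in the Outputs section (a dictionary for text or error results, a list for chunks); the helper name `transcript_error` is hypothetical.

```python
from typing import Optional


def transcript_error(result) -> Optional[str]:
    """Return the error message if the result is an error object, otherwise None."""
    if isinstance(result, dict):
        # Text results use the "transcripts" key; error results use "error".
        return result.get("error")
    # Chunked results are lists and carry no top-level error field.
    return None
```

Checking this before passing the data to an agent avoids summarizing an error message as if it were transcript text.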

⚠️ URL Format Restriction The component only accepts URLs from youtube.com or youtu.be. Links from other sites or malformed URLs will result in an error.

🟡 Chunk Timestamp Accuracy When using chunks, the timestamps are approximate. They may not align perfectly with the exact moments in the original video.

Tips and Best Practices

Use the standard youtube.com/watch?v= format or the youtu.be/ short link for URLs to avoid parsing errors.
If you only need the content for reading, use the text format for better readability.
If you need to cite specific parts of a video, use the chunks format and connect the output to the LanggraphReactAgent so that its answers can reference the chunk timestamps.
Be mindful of privacy when using the Translation Language feature, as text is sent to external servers.

Security Considerations

Privacy: When using the Translation Language feature, the transcript text is sent to an external translation service. Do not use this feature with videos containing sensitive, private, or confidential information.
Data Exposure: Ensure that the transcript output is handled securely in subsequent steps of your workflow, especially if it contains personally identifiable information (PII) from the video content.