
Language Detector

The Language Detector component identifies the language a piece of text is written in. It accepts plain text, messages, or data objects that contain text, and adds a new field holding the detected language code.

How it Works

When you pass the component a piece of text (or a data object that contains text), it analyzes the content with the PresidioLanguageDetector to identify the language. The detector runs locally on your data, so no external API calls are made. After detection, the component writes the language code (such as en for English or es for Spanish) into the data object under the Output Key you specify.
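The flow above can be sketched in Python as follows. This is a minimal illustration, not the component's source: `toy_detect_language` is a hypothetical stand-in for the PresidioLanguageDetector that simply counts common stopwords, and `annotate`, `input_key`, and `output_key` are illustrative names.

```python
# Minimal sketch of the detect-then-annotate flow.
# ASSUMPTION: toy_detect_language is a hypothetical stand-in for the
# PresidioLanguageDetector; it guesses from stopwords and is NOT the real logic.

STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in"},
    "es": {"el", "la", "y", "es", "de", "en", "los"},
    "fr": {"le", "la", "et", "est", "les", "des"},
    "de": {"der", "die", "und", "ist", "das", "nicht"},
}


def toy_detect_language(text: str) -> str:
    """Return the code whose stopwords occur most often, or "" if none match."""
    words = text.lower().split()
    best_code, best_hits = "", 0
    for code, stopwords in STOPWORDS.items():
        hits = sum(1 for word in words if word in stopwords)
        if hits > best_hits:  # ties keep the language listed first
            best_code, best_hits = code, hits
    return best_code


def annotate(record: dict, input_key: str = "text", output_key: str = "language") -> dict:
    """Return a copy of record with the detected language stored under output_key."""
    text = record.get(input_key, "")  # a missing key behaves like an empty string
    annotated = dict(record)  # copy so the input record is left unchanged
    annotated[output_key] = toy_detect_language(text)
    return annotated


print(annotate({"text": "the cat is in the house"}))
# → {'text': 'the cat is in the house', 'language': 'en'}
```

A real detector is far more accurate; the sketch only shows the shape of the flow: read the text, detect, and write the code under the Output Key without altering the input record.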

Inputs

  • Data or Text: The text you want to analyze. You can provide a single string, a Message object, or a Data object that contains text.
  • Input Key with the text to detect: If you are using a Data object, this is the name of the column that holds the text you want to analyze. The default is text.
  • Output Key to store the detected language: The name of the column where the detected language code will be stored. The default is language.
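Assuming simplified, hypothetical `Message` and `Data` classes (the framework's real types may carry more fields and behave differently), the way the three input shapes and the Input Key could resolve to a single string to analyze might look like this:

```python
# Sketch of reducing the accepted input shapes to plain text.
# ASSUMPTION: Message and Data are simplified, hypothetical stand-ins,
# not the framework's real classes; extract_text is an illustrative name.
from dataclasses import dataclass, field
from typing import Union


@dataclass
class Message:
    text: str


@dataclass
class Data:
    data: dict = field(default_factory=dict)


def extract_text(value: Union[str, Message, Data], input_key: str = "text") -> str:
    """Return the text to analyze for a string, Message, or Data input."""
    if isinstance(value, str):
        return value
    if isinstance(value, Message):
        return value.text
    if isinstance(value, Data):
        # For Data, the Input Key names the field holding the text;
        # a missing or non-string field yields an empty string.
        text = value.data.get(input_key)
        return text if isinstance(text, str) else ""
    raise TypeError(f"unsupported input type: {type(value).__name__}")


print(extract_text("hola"))
print(extract_text(Message(text="bonjour")))
print(extract_text(Data({"body": "hallo"}), input_key="body"))
```

The empty-string fallback mirrors the behavior described under Tips and Best Practices when the Input Key does not match a column.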

Outputs

  • Language: The detected language code as a simple text string.
  • Data: The original data object(s) with an added field containing the detected language.
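A hedged sketch of how the two outputs relate, assuming a hypothetical `run_language_detector` helper with the detector passed in as a plain function. When several records are supplied, this sketch takes the Language output from the first record; that choice is an assumption, not documented behavior.

```python
# Sketch of the two outputs: a single language string plus annotated records.
# ASSUMPTION: run_language_detector and the detect callable are hypothetical;
# the real component uses the PresidioLanguageDetector internally.
from typing import Callable, Dict, List, Tuple


def run_language_detector(
    records: List[Dict[str, str]],
    detect: Callable[[str], str],
    input_key: str = "text",
    output_key: str = "language",
) -> Tuple[str, List[Dict[str, str]]]:
    """Return (language of the first record, copies of all records with the language added)."""
    annotated = []
    for record in records:
        row = dict(record)  # copy so the caller's records are not mutated
        row[output_key] = detect(record.get(input_key, ""))
        annotated.append(row)
    # ASSUMPTION: the single-string "Language" output comes from the first record.
    language = annotated[0][output_key] if annotated else ""
    return language, annotated


records = [{"text": "hola amigo"}, {"text": "hello friend"}]
language, data = run_language_detector(records, lambda t: "es" if "hola" in t else "en")
print(language)  # es
print(data)
```

Passing the detector in as a function keeps the sketch independent of any particular detection library.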

Usage Example

  1. Add the component to your workflow and connect it to the data source that contains a column called text.
  2. Leave the Input Key as text (the default) and the Output Key as language (the default).
  3. Run the workflow. The component will scan each row, detect the language, and add a new column called language with values like en, fr, de, etc.
  4. Use the new language column in downstream components, such as a filter that only keeps English messages or a translator that sends non‑English text to a translation service.
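Steps 3 and 4 can be sketched with plain Python dictionaries standing in for rows. `stub_detect` is a hypothetical placeholder that only recognizes a few Spanish words; in a real workflow the component performs the detection and downstream components handle filtering and translation.

```python
# Sketch of steps 3 and 4: annotate each row, then split rows on the new column.
# ASSUMPTION: stub_detect is a hypothetical placeholder for the component's
# detector, and the row dicts stand in for your data source.


def stub_detect(text: str) -> str:
    """Hypothetical placeholder: "es" if a listed Spanish word appears, else "en"."""
    spanish = {"hola", "gracias", "el", "la"}
    return "es" if spanish & set(text.lower().split()) else "en"


rows = [
    {"text": "Hello, how are you?"},
    {"text": "Hola, gracias por todo"},
    {"text": "See you tomorrow"},
]

# Step 3: add a "language" column to every row.
for row in rows:
    row["language"] = stub_detect(row["text"])

# Step 4: a filter keeps English rows; everything else is routed to translation.
english_rows = [row for row in rows if row["language"] == "en"]
to_translate = [row for row in rows if row["language"] != "en"]

print(len(english_rows), len(to_translate))  # 2 1
```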

Related Components

  • Text Cleaner – Clean up text before detection.
  • Sentiment Analyzer – Analyze sentiment after language detection.
  • Translator – Translate text based on the detected language.

Tips and Best Practices

  • Make sure the Input Key matches the exact column name in your data; otherwise, the detector will see an empty string and return an empty language code.
  • If you’re working with a large dataset, consider running the component in batches to avoid memory issues.
  • Choose an Output Key that is not already a column in your data, so the detected language does not overwrite existing values.
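The tips above can be combined in a sketch like the following. `annotate_in_batches`, `batches`, and `batch_size` are hypothetical names, not component settings, and `detect` is any function from text to a language code. Unlike the component, which falls back to an empty language code when the Input Key is missing, this sketch raises an error so the mismatch is caught early; it also refuses to overwrite an existing column.

```python
# Sketch of the tips: check the Input Key, avoid overwriting a column,
# and process rows in fixed-size batches.
# ASSUMPTION: all names here are hypothetical; detect is any text -> code function.
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List


def batches(rows: Iterable[Dict[str, str]], batch_size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield successive lists of at most batch_size rows."""
    iterator = iter(rows)
    while batch := list(islice(iterator, batch_size)):
        yield batch


def annotate_in_batches(
    rows: Iterable[Dict[str, str]],
    detect: Callable[[str], str],
    input_key: str = "text",
    output_key: str = "language",
    batch_size: int = 1000,
) -> Iterator[List[Dict[str, str]]]:
    """Yield one list of annotated row copies per batch of at most batch_size rows."""
    for batch in batches(rows, batch_size):
        annotated_batch = []
        for row in batch:
            if input_key not in row:
                # Tip 1: a mismatched Input Key means there is no text to analyze.
                raise KeyError(f"input key {input_key!r} not found in row: {sorted(row)}")
            if output_key in row:
                # Tip 3: refuse to overwrite an existing column.
                raise ValueError(f"output key {output_key!r} already exists in row")
            annotated = dict(row)
            annotated[output_key] = detect(row[input_key])
            annotated_batch.append(annotated)
        yield annotated_batch


sample = ({"text": f"row {i}"} for i in range(5))
for batch in annotate_in_batches(sample, lambda text: "en", batch_size=2):
    print(len(batch))  # prints 2, then 2, then 1
```

Because both functions are generators, only one batch of rows is held in memory at a time when the input itself is streamed.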

Security Considerations

The Language Detector processes data locally and does not send any information outside your environment. It requires no external API keys or network connections, so this component does not expose your data to third-party services.