Language Detector
The Language Detector component helps you find out which language a piece of text is written in. It works with plain text, messages, or data objects that contain text, and it adds a new field with the detected language.
How it Works
When you feed the component a piece of text (or a data object that contains text), it looks at the content and uses the PresidioLanguageDetector to figure out the language. The detector runs locally on your data, so no external API calls are made. After detection, the component writes the language code (like `en` for English, `es` for Spanish, etc.) into the data object under the key you specify.
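The read, detect, and write flow described above can be sketched in a few lines. This is only an illustration under stated assumptions: `toy_detect` is a hypothetical stopword counter standing in for the real detector (whose internals are not shown on this page), and `add_language` is a made-up helper name, not part of the component.

```python
from typing import Callable

# Hypothetical stand-in for the real detector: it counts a few common
# stopwords per language so the example runs without extra packages.
_STOPWORDS = {
    "en": {"the", "and", "is", "of", "to"},
    "es": {"el", "la", "y", "es", "de", "que"},
    "fr": {"le", "et", "est", "les", "des"},
}


def toy_detect(text: str) -> str:
    """Return the code whose stopwords match most often, or "" if none match."""
    words = text.lower().split()
    scores = {code: sum(w in vocab for w in words) for code, vocab in _STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else ""


def add_language(
    record: dict,
    input_key: str = "text",
    output_key: str = "language",
    detect: Callable[[str], str] = toy_detect,
) -> dict:
    """Return a copy of `record` with the detected code stored under `output_key`."""
    text = str(record.get(input_key, ""))  # a missing key is treated as empty text
    return {**record, output_key: detect(text)}


row = add_language({"text": "the cat is on the roof"})
print(row)  # {'text': 'the cat is on the roof', 'language': 'en'}
```

Passing the detector in as a parameter keeps the key handling separate from the detection logic, so the stand-in can be swapped for any callable that maps a string to a language code.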
Inputs
- Data or Text: The text you want to analyze. You can provide a single string, a Message object, or a Data object that contains text.
- Input Key with the text to detect: If you are using a Data object, this is the name of the column that holds the text you want to analyze. The default is `text`.
- Output Key to store the detected language: The name of the column where the detected language code will be stored. The default is `language`.
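As a rough illustration of how the three accepted input shapes could be reduced to one record format before detection, consider the sketch below. The `Message` and `Data` classes here are hypothetical minimal stand-ins written for this example, not the platform's real classes, and `to_record` is an assumed helper name.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical stand-in for a message object that carries text."""
    text: str


@dataclass
class Data:
    """Hypothetical stand-in for a data object that wraps a dict of columns."""
    data: dict = field(default_factory=dict)


def to_record(value, input_key: str = "text") -> dict:
    """Normalize a str, Message, or Data into a plain dict."""
    if isinstance(value, str):
        return {input_key: value}
    if isinstance(value, Message):
        return {input_key: value.text}
    if isinstance(value, Data):
        return dict(value.data)  # copy so the caller's object is not mutated
    raise TypeError(f"unsupported input type: {type(value).__name__}")


print(to_record("Bonjour"))
print(to_record(Message("Hello"), input_key="body"))
print(to_record(Data({"text": "Hola", "id": 7})))
```

Only the Data case depends on the Input Key matching an existing column; for a plain string or a Message the text is placed under that key directly in this sketch.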
Outputs
- Language: The detected language code as a simple text string.
- Data: The original data object(s) with an added field containing the detected language.
Usage Example
- Add the component to your workflow and connect it to the data source that contains a column called `text`.
- Leave the Input Key as `text` (the default) and the Output Key as `language` (the default).
- Run the workflow. The component will scan each row, detect the language, and add a new column called `language` with values like `en`, `fr`, `de`, etc.
- Use the new `language` column in downstream components, such as a filter that only keeps English messages or a translator that sends non-English text to a translation service.
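The last step, routing rows by the detected language, might look like the following sketch. It assumes the rows are plain dicts that already carry a `language` column; `split_by_language` is a hypothetical helper, and the sample rows are invented for illustration.

```python
def split_by_language(rows, keep="en", key="language"):
    """Split rows into (kept, to_translate, undetected) based on the language code."""
    kept, to_translate, undetected = [], [], []
    for row in rows:
        code = row.get(key, "")
        if code == keep:
            kept.append(row)
        elif code:
            to_translate.append(row)  # detected, but not the language we keep
        else:
            undetected.append(row)  # empty or missing code
    return kept, to_translate, undetected


# Invented sample rows, shaped like the component's Data output.
rows = [
    {"text": "Good morning", "language": "en"},
    {"text": "Buenos días", "language": "es"},
    {"text": "Guten Morgen", "language": "de"},
    {"text": "", "language": ""},
]

english, needs_translation, unknown = split_by_language(rows)
print(len(english), len(needs_translation), len(unknown))  # 1 2 1
```

Keeping undetected rows in their own bucket avoids sending empty text to a translation step.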
Related Components
- Text Cleaner – Clean up text before detection.
- Sentiment Analyzer – Analyze sentiment after language detection.
- Translator – Translate text based on the detected language.
Tips and Best Practices
- Make sure the Input Key matches the exact column name in your data; otherwise, the detector will see an empty string and return an empty language code.
- If you’re working with a large dataset, consider running the component in batches to avoid memory issues.
- Choose an Output Key that does not match an existing column name, so the detected language does not overwrite data you already have.
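The tips above can be turned into small pre-flight helpers. The sketch below is one possible approach, not something the component provides: `check_keys` and `batched` are hypothetical names, and the batch size of 2 is arbitrary.

```python
from itertools import islice
from typing import Iterable, Iterator


def check_keys(row: dict, input_key: str = "text", output_key: str = "language") -> None:
    """Fail early if the input column is missing or the output column already exists."""
    if input_key not in row:
        raise KeyError(f"input key {input_key!r} not found in row")
    if output_key in row:
        raise ValueError(f"output key {output_key!r} would overwrite existing data")


def batched(rows: Iterable[dict], size: int) -> Iterator[list]:
    """Yield lists of at most `size` rows so the whole dataset is never held at once."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


rows = ({"text": f"sample {i}"} for i in range(5))  # a generator, consumed lazily
for batch in batched(rows, size=2):
    for row in batch:
        check_keys(row)
    print(len(batch))
```

Each batch would then be passed through the component in turn; the checks run before detection so a mismatched key is reported instead of silently producing empty language codes.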
Security Considerations
The Language Detector processes data locally and does not send any information outside your environment. No external API keys or network connections are required, so there is no risk of data leakage through third‑party services.