
LLM Anonimyzer

The LLM Anonimyzer component lets you protect personal or confidential data in your dashboards.
It uses a language model to spot sensitive words or phrases (like names, addresses, or IDs) and replaces them with placeholders or anonymized versions. The result is a clean, privacy‑safe version of your original text that can be stored or displayed without exposing sensitive details.

How it Works

When you drop the component into a workflow, you provide it with the text you want to clean and tell it which entities to look for (e.g., “Name”, “Email”, “Phone”).
The component builds a prompt that asks the language model to extract those entities from the text.
The model returns a JSON object listing the entities it found.
The component then replaces each entity in the original text with a placeholder (or a redacted version) and stores the cleaned text back into the data object.
All of this happens inside Nappai, so you don’t need to write any code or call external APIs yourself.
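
Under the hood, the flow is simple enough to sketch in a few lines of Python. This is a minimal illustration of the logic described above, not Nappai's actual implementation: the function names and the placeholder format ([Name], [Email]) are assumptions, and the model is stubbed so the example runs on its own.

```python
import json

# Default prompt from the Inputs section below; {entities} and {Data}
# are filled in at run time.
PROMPT_TEMPLATE = (
    "You are an expert NER. Your TASK is to extract only the following "
    "entities: [{entities}]. The result must be a JSON object with the "
    "entities found in the text. Return only the JSON object, no other "
    "text. Extract the entities from the following text:\n{Data}"
)

def anonymize(data, llm, entities, data_key="text",
              entities_key="entities_anonimized",
              result_key="text_anonymized"):
    text = data[data_key]
    # 1. Build the prompt from the template and the requested entity types.
    prompt = PROMPT_TEMPLATE.format(entities=", ".join(entities), Data=text)
    # 2. The model answers with a JSON object of the entities it found.
    found = json.loads(llm(prompt))
    # 3. Replace each found value with a placeholder for its entity type.
    cleaned = text
    for entity_type, values in found.items():
        for value in values if isinstance(values, list) else [values]:
            cleaned = cleaned.replace(value, f"[{entity_type}]")
    # 4. Store both results back on the data object.
    data[entities_key] = found
    data[result_key] = cleaned
    return data

# Stub model so the sketch runs without calling any external API.
def fake_llm(prompt):
    return '{"Name": ["John Smith"], "Email": ["john@example.com"]}'

record = {"text": "Contact John Smith at john@example.com."}
anonymize(record, fake_llm, ["Name", "Email"])
print(record["text_anonymized"])  # Contact [Name] at [Email].
```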

Inputs

  • Datos: The data object that contains the text you want to anonymize.
  • Modelo de Lenguaje: Choose the language model that will perform the anonymization (e.g., OpenAI GPT‑4 or a local LLM).
  • Entities: Pick the types of information you want to remove, such as Name, Email, Phone, etc.
  • Prompt template: The instruction sent to the model; the {entities} and {Data} placeholders are filled in at run time.
    • Default example:
      You are an expert NER. Your TASK is to extract only the following entities: [{entities}]. The result must be a JSON object with the entities found in the text. Return only the JSON object, no other text. Extract the entities from the following text:
      {Data}
  • Data key to anonymize: The key inside the data object that holds the text to be cleaned (default: text).
  • Entities Result Key: Where the list of extracted entities will be stored in the data object (default: entities_anonimized).
  • Remark Anonimization: If checked, each replacement is wrapped in *** so the changes stand out in the output.
  • Result Key: The key where the cleaned text will be saved (default: text_anonymized).
  • Use Word Boundaries: When enabled, the component only replaces whole-word matches, preventing accidental replacements inside longer words (see the sketch after this list).
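
The Use Word Boundaries option matters more than it looks. Here is a minimal sketch of the difference between whole-word and plain substring replacement; the helper name and placeholder format are illustrative, not the component's actual internals.

```python
import re

def replace_entity(text, value, placeholder, word_boundaries=True):
    """Replace one extracted value, optionally matching whole words only."""
    if word_boundaries:
        # \b anchors the match at word edges, so "email" will not match
        # inside "emailing".
        return re.sub(r"\b" + re.escape(value) + r"\b", placeholder, text)
    # Plain substring replacement: also hits matches inside longer words.
    return text.replace(value, placeholder)

print(replace_entity("Please email me, I love emailing.", "email", "[Email]"))
# Please [Email] me, I love emailing.
print(replace_entity("Please email me, I love emailing.", "email", "[Email]",
                     word_boundaries=False))
# Please [Email] me, I love [Email]ing.
```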

Outputs

  • Datos: The original data object, now enriched with two new keys:
    • entities_anonimized: a JSON object listing the entities that were found and replaced.
    • text_anonymized: the cleaned text with sensitive information removed or masked.
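
For illustration, an enriched data object might look like the following. The key names are the defaults from the Inputs section; the exact entity structure depends on the JSON the model returns, and the *** wrapping assumes Remark Anonimization is enabled.

```python
# Illustrative shape of the data object after a run with the default keys,
# Remark Anonimization enabled, and "Name" and "Email" selected.
record = {
    "text": "Contact John Smith at john@example.com.",
    "entities_anonimized": {
        "Name": ["John Smith"],
        "Email": ["john@example.com"],
    },
    "text_anonymized": "Contact ***[Name]*** at ***[Email]***.",
}
```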

Usage Example

  1. Add the component to your workflow and connect the data source that contains a text field.
  2. Select a language model (e.g., “OpenAI GPT‑4”).
  3. Choose the entities you want to strip out, such as Name and Email.
  4. Leave the prompt template as default or tweak it if you need a different format.
  5. Set the keys: keep the defaults (text for input, text_anonymized for output).
  6. Run the workflow.
  7. Check the output: the Datos object will now have text_anonymized with all names and emails replaced, and entities_anonimized listing what was removed.
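
As a final sanity check for step 7, you can verify downstream that none of the extracted values survived in the cleaned text. This is an illustrative check, not part of the component; the sample record reuses the default output keys.

```python
# Illustrative downstream check: make sure no extracted value still
# appears verbatim in the anonymized text.
record = {
    "entities_anonimized": {"Name": ["John Smith"],
                            "Email": ["john@example.com"]},
    "text_anonymized": "Contact ***[Name]*** at ***[Email]***.",
}

leaks = [
    value
    for values in record["entities_anonimized"].values()
    for value in (values if isinstance(values, list) else [values])
    if value in record["text_anonymized"]
]
assert not leaks, f"Sensitive values not replaced: {leaks}"
```
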
Related Components

  • Data Cleaner – Basic text cleaning (remove punctuation, trim spaces).
  • Data Validator – Checks data against rules before processing.
  • Data Exporter – Sends cleaned data to external systems or files.

Tips and Best Practices

  • Limit the entity list to what you truly need; a shorter list keeps the prompt smaller and the extraction faster.
  • Test the prompt with a few sample texts to ensure the model extracts the right entities.
  • Use word boundaries if you want to avoid partial word replacements (e.g., “email” inside “emailing”).
  • Enable remark anonymization during testing to see exactly what was changed.
  • Choose a model that balances cost and accuracy; larger models may give better results but cost more.

Security Considerations

  • The component runs inside Nappai’s secure environment, so your data stays on your infrastructure unless you explicitly connect to an external LLM service.
  • If you use an external model (like OpenAI), data will be sent over HTTPS to the provider’s API. Make sure you comply with your organization’s data‑handling policies.
  • The anonymized output is stored in the same data object; keep an eye on where that object is saved or shared to avoid accidental leaks.