HuggingFace
The HuggingFace component lets you generate text, summaries, translations, or other text‑based outputs by calling models hosted on Hugging Face. You can choose from popular models or provide a custom model ID, set generation parameters, and connect the output to other parts of your workflow.
How it Works
When you use this component, Nappai sends a request to the Hugging Face Inference API. The main pieces of that request are listed below, and a short code sketch of an equivalent call follows the list.
- The Inference Endpoint (default `https://api-inference.huggingface.co/models/`) is the base URL for the request.
- The API Token authenticates the request so Hugging Face knows you have permission to use the model.
- The Model ID (or a custom ID you supply) tells the API which model to run.
- The Input is the text prompt you want the model to process.
- Optional parameters such as Max New Tokens, Temperature, Top K, Top P, Typical P, and Repetition Penalty fine‑tune how the model generates text.
- If Retry Attempts is set, the component will automatically retry the request a few times if it fails.
- The response is returned as a Text output (a simple message) and a Model output (the configured language model object) that can be reused elsewhere in your workflow.
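To make this concrete, here is a minimal sketch of the kind of request the component sends. It is illustrative only: the model ID, token environment variable, and parameter values are assumptions, not the component's actual implementation.

```python
import os
import time

import requests

# Illustrative values only; in Nappai these come from the component's fields.
ENDPOINT = "https://api-inference.huggingface.co/models/"  # Inference Endpoint
MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.2"            # Model ID (assumed example)
API_TOKEN = os.environ["HF_API_TOKEN"]                     # API Token (assumed env var)
RETRY_ATTEMPTS = 3                                         # Retry Attempts

def generate(prompt: str) -> str:
    """Send one text-generation request, retrying on transient failures."""
    headers = {"Authorization": f"Bearer {API_TOKEN}"}
    payload = {
        "inputs": prompt,  # Input
        "parameters": {"max_new_tokens": 200, "temperature": 0.7},
    }
    for attempt in range(RETRY_ATTEMPTS):
        response = requests.post(ENDPOINT + MODEL_ID, headers=headers, json=payload, timeout=60)
        if response.ok:
            # Text-generation models return a list of {"generated_text": ...}
            return response.json()[0]["generated_text"]
        time.sleep(2 ** attempt)  # brief backoff before the next attempt
    response.raise_for_status()   # surface the last error if all retries failed

print(generate("Write a short story about a robot learning to dance."))
```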
Inputs
Mapping Mode
This component has a special mode called “Mapping Mode”. When you enable this mode using the toggle switch, an additional input called “Mapping Data” is activated, and each input field offers you three different ways to provide data:
- Fixed: You type the value directly into the field.
- Mapped: You connect the output of another component to use its result as the value.
- Javascript: You write Javascript code to dynamically calculate the value.
This flexibility allows you to create more dynamic and connected workflows.
Input Fields
The following fields are available to configure this component. Some fields only appear depending on the options you select. The generation parameters correspond to the Hugging Face Inference API request, as sketched after the list:
- Custom Model ID: Enter a custom model ID from HuggingFace Hub. Required when you select “custom” in Model ID.
- API Token: Your Hugging Face API token used to authenticate requests.
- Inference Endpoint: Custom inference endpoint URL. The default is `https://api-inference.huggingface.co/models/`.
- Input: The text prompt you want the model to process.
- Mapping Mode: Enable mapping mode to process multiple data records in batch.
- Max New Tokens: Maximum number of tokens the model can generate.
- Model ID: Select a pre‑built model from Hugging Face Hub. Choose “custom” to use a custom model ID.
- Model Keyword Arguments: Additional keyword arguments to pass to the model.
- Repetition Penalty: The parameter for repetition penalty. 1.0 means no penalty.
- Retry Attempts: Number of times to retry the request if it fails.
- Stream: Stream the response from the model. Streaming works only in Chat.
- System Message: System message to pass to the model.
- Task: The task to call the model with. Should be a task that returns `generated_text` or `summary_text`.
- Temperature: Controls randomness of the output. Lower values make the output more deterministic.
- Top K: The number of highest probability vocabulary tokens to keep for top‑k filtering.
- Top P: If set to < 1, only the smallest set of most probable tokens with probabilities that add up to `top_p` or higher are kept for generation.
- Typical P: Typical decoding mass.
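For reference, this is roughly how the generation fields above line up with the Inference API's `parameters` payload. The values are placeholders, and the exact payload the component builds may differ:

```python
# Illustrative payload only; parameter names follow the Hugging Face Inference API.
payload = {
    "inputs": "Summarize the following article: ...",  # Input
    "parameters": {
        "max_new_tokens": 256,      # Max New Tokens
        "temperature": 0.7,         # Temperature
        "top_k": 50,                # Top K
        "top_p": 0.95,              # Top P
        "typical_p": 0.95,          # Typical P
        "repetition_penalty": 1.2,  # Repetition Penalty
    },
}
# A text-generation task returns items with "generated_text";
# a summarization task returns items with "summary_text".
```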
Credential
This component requires a HuggingFace API credential.
- In the Nappai dashboard, go to Credentials and create a new credential of type HuggingFace API.
- Enter your Hugging Face API key (password).
- In the component, select this credential in the Credential field.
Outputs
- Text: The generated text returned by the model (method: `text_response`).
- Model: The configured language model object (method: `build_model`). This can be reused by other components that need a language model; an illustrative equivalent is sketched below.
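As an illustration of what a reusable language-model object can look like, the sketch below builds a comparable object with LangChain's `HuggingFaceEndpoint`. This is an assumption made for illustration; it is not necessarily the object the component produces.

```python
# Assumes the langchain-huggingface package is installed; the component's
# actual Model object may differ, so treat this as an illustrative stand-in.
from langchain_huggingface import HuggingFaceEndpoint

llm = HuggingFaceEndpoint(
    repo_id="meta-llama/Llama-3.3-70B-Instruct",  # Model ID
    task="text-generation",                       # Task
    max_new_tokens=200,                           # Max New Tokens
    temperature=0.7,                               # Temperature
    huggingfacehub_api_token="hf_...",             # API Token (placeholder)
)

# Any downstream step that accepts a language model can reuse this object.
print(llm.invoke("Explain in one sentence why the sky is blue."))
```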
Usage Example
- Add the HuggingFace component to your workflow.
- Select a model – e.g.,
meta-llama/Llama-3.3-70B-Instruct
. - Enter your API Token (or choose the HuggingFace API credential).
- Set the Inference Endpoint to the default or a custom URL.
- Provide an Input, such as "Write a short story about a robot learning to dance."
- Adjust generation settings:
- Max New Tokens: 200
- Temperature: 0.7
- Top K: 50
- Top P: 0.95
- Run the workflow.
- The Text output will contain the generated story, which you can feed into a Text Display component or use in downstream logic. (An equivalent call made directly with the `huggingface_hub` client is sketched below.)
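If you want to reproduce this walkthrough outside Nappai, a roughly equivalent call with the `huggingface_hub` client looks like the sketch below. The model ID and settings mirror the example above; the token value is a placeholder.

```python
from huggingface_hub import InferenceClient

# Placeholder token; in Nappai this comes from the HuggingFace API credential.
client = InferenceClient(model="meta-llama/Llama-3.3-70B-Instruct", token="hf_...")

story = client.text_generation(
    "Write a short story about a robot learning to dance.",
    max_new_tokens=200,  # Max New Tokens
    temperature=0.7,     # Temperature
    top_k=50,            # Top K
    top_p=0.95,          # Top P
)
print(story)
```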
Related Components
- OpenAI – Generate text using OpenAI models.
- Google Gemini – Access Google’s Gemini language models.
- Azure OpenAI – Use Azure‑hosted OpenAI services.
- LLM Callback – Capture and log model responses for monitoring.
Tips and Best Practices
- Choose the right model size: Larger models produce higher quality text but consume more tokens and may be slower.
- Use mapping mode for batch processing: When you need to generate text for many prompts, enable Mapping Mode and connect a data source.
- Adjust temperature for creativity: Lower temperatures (e.g., 0.2) give more deterministic outputs; higher temperatures (e.g., 0.8) add variety.
- Limit Max New Tokens: Setting a reasonable limit prevents runaway token usage and keeps costs down.
- Set Repetition Penalty: A value slightly above 1.0 (e.g., 1.2) can reduce repetitive output.
- Use retries sparingly: If you experience transient network issues, increase Retry Attempts; otherwise keep it low to avoid unnecessary delays.
Security Considerations
- Keep your API token secret: Store it in a credential and never expose it in logs or shared workflows.
- Use HTTPS endpoints: The default inference endpoint uses HTTPS; if you provide a custom endpoint, ensure it is secure.
- Limit access to the component: Only users who need to generate text should have permission to configure this component.