Gemma 4 Model

This component brings Google’s Gemma 4 language model directly into your Nappai dashboard. It acts as a local AI engine for your automation workflows, letting you generate text, answer questions, and process data on your own computer. Unlike cloud-based AI services, it needs no API keys, and after a one-time download of the model files it runs without an internet connection. Because the model files are stored locally, your data stays private and your workflows keep running offline.

How it Works

Internally, this component uses LiteRT, Google’s on-device runtime for machine learning models (formerly TensorFlow Lite), to run large language models efficiently on your device’s hardware. The first time you use the component in a workflow, it automatically downloads the model files from Hugging Face, a public repository for AI models. After this initial download, all processing happens locally on your machine.
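
The download-once behavior described above can be sketched as a simple cache check. This is a minimal sketch under stated assumptions, not the component’s actual code: the function name `ensure_model_file`, the `fetch` callback, and the file name `gemma-4.bin` are hypothetical, and the stand-in fetch writes placeholder bytes instead of contacting Hugging Face.

```python
import tempfile
from pathlib import Path
from typing import Callable


def ensure_model_file(cache_dir: Path, filename: str,
                      fetch: Callable[[Path], None]) -> Path:
    """Return the cached model path, calling `fetch` only when the file is absent."""
    target = cache_dir / filename
    if not target.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        # Download to a temporary name first so an interrupted download
        # is never mistaken for a complete model file.
        partial = target.with_name(target.name + ".part")
        fetch(partial)
        partial.replace(target)
    return target


# Demo with a hypothetical stand-in fetch that writes placeholder bytes
# instead of downloading anything.
calls = 0


def fake_fetch(dest: Path) -> None:
    global calls
    calls += 1
    dest.write_bytes(b"model weights")


with tempfile.TemporaryDirectory() as tmp:
    first = ensure_model_file(Path(tmp), "gemma-4.bin", fake_fetch)
    second = ensure_model_file(Path(tmp), "gemma-4.bin", fake_fetch)
    same_path = first == second

print(calls)  # 1: the second call reuses the cached file
```

Writing to a `.part` file and renaming it only after the fetch completes means a crash mid-download leaves no file under the final name, so the next run simply downloads again.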

You can run the model on your computer’s CPU (Central Processing Unit) or GPU (Graphics Processing Unit). The CPU backend uses less memory, while the GPU backend offers faster processing on Apple Silicon devices but requires more RAM. Once you have chosen a backend, the component manages memory and hardware resources automatically.
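
One way to express the CPU/GPU guidance above as code is a small helper that suggests “gpu” only on Apple Silicon Macs. The helper `suggest_backend` and its `prefer_low_memory` flag are hypothetical and only illustrate the rule of thumb; they are not part of the component, which uses whatever Backend value you select. Python’s `platform` module reports `Darwin` and `arm64` on Apple Silicon Macs when Python runs natively rather than under Rosetta.

```python
import platform
from typing import Optional


def suggest_backend(system: Optional[str] = None,
                    machine: Optional[str] = None,
                    prefer_low_memory: bool = False) -> str:
    """Suggest a Backend value: "gpu" on Apple Silicon Macs, otherwise "cpu"."""
    if prefer_low_memory:
        # The CPU backend uses less RAM, so it wins when memory is tight.
        return "cpu"
    system = system if system is not None else platform.system()
    machine = machine if machine is not None else platform.machine()
    # Natively running Python on an Apple Silicon Mac reports Darwin / arm64.
    if system == "Darwin" and machine == "arm64":
        return "gpu"
    return "cpu"


print(suggest_backend())  # "gpu" or "cpu", depending on this machine
```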

Connection & Credentials

This component does not require any external credentials, API keys, or login information. It works as soon as it is installed; the only network access it needs is the one-time model download from Hugging Face.

Inputs

The following fields allow you to customize how the AI behaves and how much computer memory it uses:

Model variant: Choose the size of the AI model. The “E2B” variant is smaller and lighter (ideal for most devices), while “E4B” is larger and potentially more accurate but requires more memory.
- Visible in: Standard
Backend: Select the hardware used to run the AI. Choose “cpu” for broad compatibility and lower memory usage. Choose “gpu” for faster speeds on Apple Silicon devices, at the cost of higher RAM usage.
- Visible in: Standard
Temperature: Control how creative or predictable the AI’s answers are. A lower value (like 0.1) makes answers factual and consistent. A higher value makes them more creative and varied.
- Visible in: Standard
Top K: Restrict the AI to the K most likely next words (tokens) at each step of generation. Lower values reduce unusual or random word choices.
- Visible in: Standard
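Top P: Restrict word selection to the smallest group of most likely words whose combined probability reaches P (also called nucleus sampling). Lower values make the output more focused; this setting works alongside Temperature to control creativity.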
- Visible in: Standard
Context window: Set the maximum amount of text, measured in tokens, that the AI can “remember” at one time. This limit covers both your input and the AI’s output, and larger windows use more memory.
- Visible in: Standard
Enable thinking: Turn this on to allow the AI to “think” through complex problems internally before answering. This can improve accuracy for difficult tasks but uses more resources.
- Visible in: Standard
Thinking budget: If “Enable thinking” is on, this sets the limit on how much internal reasoning the AI can do. A value of 0 means there is no limit.
- Visible in: Standard
Custom model path: If you have already downloaded a model file locally, you can paste the path here to use that specific file instead of downloading a new one.
- Visible in: Standard
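
Temperature, Top K, and Top P all act on the same step: choosing the next token from the model’s scored candidates. The sketch below shows the textbook way the three settings combine, assuming the common order of temperature scaling, then Top K, then Top P. LiteRT’s internal order and defaults may differ, and the function `sample_next_token` and the toy scores are illustrative only.

```python
import math
import random
from typing import Dict, List, Optional, Tuple


def sample_next_token(logits: Dict[str, float], temperature: float = 1.0,
                      top_k: int = 0, top_p: float = 1.0,
                      rng: Optional[random.Random] = None) -> str:
    """Pick the next token using temperature, then Top K, then Top P filtering."""
    if temperature <= 0:
        # Treat a zero temperature as greedy decoding: always the top token.
        return max(logits, key=logits.get)
    rng = rng or random.Random()
    # Temperature rescales the scores: values below 1 sharpen the
    # distribution (more predictable), values above 1 flatten it.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    # Softmax, subtracting the maximum score for numerical stability.
    peak = max(scaled.values())
    weights = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(weights.values())
    ranked: List[Tuple[float, str]] = sorted(
        ((w / total, tok) for tok, w in weights.items()), reverse=True)
    # Top K: keep only the K most likely tokens (0 disables this filter).
    if top_k > 0:
        ranked = ranked[:top_k]
    # Top P: keep the smallest prefix whose cumulative probability reaches P.
    kept: List[Tuple[float, str]] = []
    cumulative = 0.0
    for prob, tok in ranked:
        kept.append((prob, tok))
        cumulative += prob
        if cumulative >= top_p:
            break
    # Draw one token; random.choices renormalises the surviving weights.
    tokens = [tok for _, tok in kept]
    probs = [prob for prob, _ in kept]
    return rng.choices(tokens, weights=probs, k=1)[0]
```

For example, with the toy scores `{"the": 2.0, "a": 1.0, "banana": -3.0}`, setting `top_k=1` or a low `top_p` such as 0.1 leaves only “the” as a candidate, which is why low values give more predictable output.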

Outputs

Language Model Instance: The component’s output, a configured AI model that you can connect to other components in your workflow, such as Chat Chains or Agents, to generate text responses.

Output Data Example (JSON)

This component outputs a functional model object, not raw text. You typically do not need to inspect this output directly; instead, connect it to a node that uses the model to generate text.

```json
{
  "model_instance": "ChatGemma4LiteRT Instance",
  "status": "ready",
  "description": "A configured language model object ready to be used by downstream nodes"
}
```

Connectivity

This component is designed to connect to other parts of your automation workflow that require AI intelligence. Typically, you will connect the output of this Gemma 4 Model to:

Chat Chain / LLM Chain: To send text prompts to the model and receive generated responses.
Agent: To allow an automated agent to use the Gemma 4 model to reason through tasks, browse data, or make decisions.

Usage Example

Imagine you want to automate customer support responses using your local data.

Drag the Gemma 4 Model component into your workflow.
Set Model variant to “E2B” for best performance on most computers.
Set Backend to “cpu” to ensure stability.
Connect the output of this model to a Chat Chain component.
In the Chat Chain, you can define a prompt like: “Summarize this email into a short reply.”
The system will now use your local Gemma 4 model to generate the reply without sending your email data to the internet.
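
The customer-support steps above boil down to filling a prompt template and passing it to the model. The sketch below mirrors that flow under stated assumptions: the settings dictionary keys, `draft_reply`, and the `echo_model` stand-in are hypothetical, and in a real workflow the Chat Chain and the Gemma 4 Model component do this wiring for you.

```python
from typing import Callable

# Hypothetical settings mirroring the steps above; the key names are
# illustrative, not the component's actual configuration schema.
GEMMA_SETTINGS = {
    "model_variant": "E2B",
    "backend": "cpu",
    "temperature": 0.1,  # low temperature for a factual summary
}

PROMPT = "Summarize this email into a short reply.\n\nEmail:\n{email}"


def draft_reply(email: str, generate: Callable[[str], str]) -> str:
    """Fill the Chat Chain prompt and pass it to a local model callable."""
    prompt = PROMPT.format(email=email)
    return generate(prompt).strip()


# Hypothetical stand-in for the local Gemma 4 model so the sketch runs
# without it: it just echoes the last line of the prompt (the email).
def echo_model(prompt: str) -> str:
    return "  Thanks, we received: " + prompt.splitlines()[-1] + "  "


print(draft_reply("My order #123 has not arrived.", echo_model))
```

Passing the model in as a plain callable keeps the prompt logic independent of how the model is loaded, which matches the idea that the Gemma 4 Model output can be connected to any node that generates text.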

Tips and Best Practices

Start with CPU: Unless you have an Apple Silicon Mac and need speed, always start with the CPU backend. It is more stable and uses less memory.
Watch your Memory: If you experience slowdowns, switch to the “E2B” model variant or lower the Context window size.
Creative vs. Factual: Keep Temperature low (e.g., 0.1) for factual tasks like summarizing data. Increase it (e.g., 0.7) for creative writing or brainstorming.
Think Before You Answer: For complex logic puzzles or multi-step reasoning, turn on Enable thinking so the model can plan its response internally.
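
Because the Context window covers both your input and the AI’s output, lowering it to save memory also shrinks the room left for your prompt. The sketch below, assuming prompts are already split into integer token IDs, reserves space for the reply and keeps only the most recent prompt tokens that fit. The helpers `prompt_budget` and `trim_to_window` are hypothetical, and the numbers in the usage note are illustrative rather than the component’s defaults.

```python
from typing import List, Sequence


def prompt_budget(context_window: int, reserved_for_output: int) -> int:
    """Tokens left for the prompt once space for the reply is reserved."""
    return max(context_window - reserved_for_output, 0)


def trim_to_window(prompt_tokens: Sequence[int], context_window: int,
                   reserved_for_output: int) -> List[int]:
    """Keep only the most recent prompt tokens that fit in the budget."""
    budget = prompt_budget(context_window, reserved_for_output)
    if budget == 0:
        # Guard needed: seq[-0:] would return the whole sequence.
        return []
    return list(prompt_tokens[-budget:])
```

For instance, a 4096-token window with 1024 tokens reserved for the reply leaves `prompt_budget(4096, 1024)`, or 3072 tokens, for the prompt; halving the window to 2048 would leave only 1024.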

Security Considerations

Data Privacy: Since this component runs locally, your data is processed on your own device. No sensitive data is sent to external servers or the internet during inference.
Local Storage: The first time you run the model, the component downloads the model files from Hugging Face and stores them on your disk. Make sure you have enough free disk space, especially for the larger “E4B” variant.
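
A quick free-space check before the first run can be sketched with Python’s standard `shutil.disk_usage`. This document does not state the model file sizes, so `required_bytes` must come from the model’s download page; the helper name `has_room_for_download` and the 10% safety margin are assumptions, and `target_dir` must already exist.

```python
import shutil
from pathlib import Path


def has_room_for_download(target_dir: Path, required_bytes: int,
                          headroom: float = 1.1) -> bool:
    """Check free space on the drive holding target_dir, with a safety margin."""
    free = shutil.disk_usage(target_dir).free
    # Require a little more than the file size (10% by default) so the
    # download does not leave the disk completely full.
    return free >= int(required_bytes * headroom)
```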