Ollama

The Ollama component lets you send prompts to a local language model to generate text or answer questions.
It connects to an Ollama server running on your machine (or on a local network) and sends the text you give it.
The model then returns a response that you can use in the rest of your workflow.

How it Works

When you drop the Ollama component into a dashboard, you first choose the model you want to use (for example, llama3.1).
The component builds a request that includes all the settings you entered—things like temperature, number of GPUs, or special sampling options.
It then talks to the Ollama API over HTTP, sends the request, and receives the generated text.
Because the model runs locally, your prompts and responses stay on your own machine or network, which keeps requests fast and private.
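
Under the hood this is an ordinary HTTP call to the Ollama server.
The sketch below shows roughly what such a request looks like when written directly against the public Ollama REST API; the model name, the settings, and the use of fetch here are illustrative examples, and the component builds the real request for you.

    // Minimal sketch: asking a local Ollama server for a completion.
    // Endpoint and field names follow the public Ollama REST API.
    const baseUrl = "http://localhost:11434";

    async function generate(prompt: string): Promise<string> {
      const res = await fetch(`${baseUrl}/api/generate`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          model: "llama3.1",                             // Model Name
          prompt,                                        // Input
          stream: false,                                 // one complete answer, no streaming
          options: { temperature: 0.2, num_ctx: 2048 },  // sampling settings
        }),
      });
      const data = await res.json();
      return data.response;                              // the generated text
    }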

Inputs

Mapping Mode

This component has a special mode called Mapping Mode.
When you enable this mode using the toggle switch, an additional input called Mapping Data is activated, and each input field offers you three different ways to provide data:

  • Fixed: You type the value directly into the field.
  • Mapped: You connect the output of another component to use its result as the value.
  • Javascript: You write Javascript code to dynamically calculate the value.

This flexibility allows you to create more dynamic and connected workflows.
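
For example, a field in Javascript mode can compute its value from data mapped in from another component.
The snippet below is purely illustrative; the mapped variable name is an assumption, and the names actually available to expressions depend on your platform.

    // Hypothetical Javascript-mode expression for the Input field: build the
    // prompt from a value produced by another component. The `mapped` name is
    // an assumption, not a documented variable.
    "Summarize the following paragraph: " + mapped.document.trim()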

Input Fields

  • Base URL: The address of the Ollama server. If you leave it blank, it will use http://localhost:11434 by default.
  • Format: Choose the format you want the model’s answer in, such as plain text or JSON.
  • Input: The text prompt you want the model to respond to.
  • Mapping Mode: Turn this on to process many prompts at once.
  • Metadata: Add extra information that will be stored with the run for later reference.
  • Mirostat: Enable or disable Mirostat sampling, which adjusts sampling on the fly to keep the output’s perplexity steady (the sampling fields are gathered into one sketch after this list).
  • Mirostat Eta: Controls how quickly the Mirostat algorithm learns. Default is 0.1.
  • Mirostat Tau: Adjusts the trade‑off between coherence and variety. Default is 5.0.
  • Model Name: Pick which Ollama model to use. A list of available models is fetched automatically.
  • Context Window Size: How many tokens of context the model can consider when generating a response. Default is 2048.
  • Number of GPUs: How many GPUs the model should use. 0 means no GPU.
  • Number of Threads: How many CPU threads the model should use. The default is chosen automatically.
  • Repeat Last N: How far back the model checks to avoid repeating text. Default is 64.
  • Repeat Penalty: How strongly the model is penalized for repeating. Default is 1.1.
  • Stop Tokens: A list of sequences that tell the model to stop generating as soon as it produces one of them. Separate them with commas.
  • Stream: If checked, the model will send its answer piece by piece as it writes it.
  • System: A short instruction that tells the model how to behave (e.g., “You are a helpful assistant.”).
  • System Message: A longer system prompt that can be used instead of the short System field.
  • Tags: Add tags to the run so you can find it later. Separate tags with commas.
  • Temperature: Controls how creative the model’s answers are. Lower is more deterministic, higher is more varied.
  • Template: A reusable prompt template that can be filled in with data from other components.
  • TFS Z: Tail‑free sampling parameter; a value of 1 disables it. Default is 1.
  • Timeout: How long the component will wait for a response before giving up.
  • Top K: Limits sampling to the K most likely tokens at each step. Default is 40.
  • Top P: Works together with Top K; the model samples only from the smallest set of tokens whose cumulative probability reaches this value. Default is 0.9.
  • Verbose: If checked, the component will print the model’s raw output to the log.
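
Most of the sampling fields above correspond directly to option names in the Ollama API.
The sketch below gathers them into a single options object using the API’s own names and the defaults listed above; exactly how the component assembles this object internally is an assumption.

    // Sketch: the sampling fields expressed as an Ollama API "options" object.
    // Names follow the Ollama REST API; values are the defaults listed above.
    const options = {
      mirostat: 0,          // Mirostat off (1 or 2 turns it on)
      mirostat_eta: 0.1,    // Mirostat Eta
      mirostat_tau: 5.0,    // Mirostat Tau
      num_ctx: 2048,        // Context Window Size (tokens)
      num_gpu: 0,           // Number of GPUs (0 means no GPU)
      repeat_last_n: 64,    // Repeat Last N
      repeat_penalty: 1.1,  // Repeat Penalty
      stop: ["User:"],      // Stop Tokens (example value)
      temperature: 0.8,     // Temperature (Ollama's usual default)
      tfs_z: 1,             // TFS Z (1 disables tail-free sampling)
      top_k: 40,            // Top K
      top_p: 0.9,           // Top P
    };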

Outputs

  • Text: The generated answer from the model. It can be used as a message, stored in a database, or displayed to a user.
  • Model: The configured language model object. It can be passed to other components that need a model reference.
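
For reference, the Text output carries what the Ollama API returns in the response field of its reply.
A trimmed sketch of a non-streaming reply body, with field names per the Ollama REST API, is shown below.

    // Abridged shape of a non-streaming /api/generate reply from Ollama.
    // The component's Text output corresponds to the `response` string.
    interface GenerateReply {
      model: string;        // e.g. "llama3.1"
      created_at: string;   // ISO timestamp
      response: string;     // the generated text (Text output)
      done: boolean;        // true once generation has finished
      eval_count?: number;  // tokens generated (reported in the final reply)
    }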

Usage Example

  1. Add the Ollama component to your workflow.
  2. Set the Base URL to http://localhost:11434 (or your server’s address).
  3. Choose a Model Name such as llama3.1.
  4. Enter a prompt in the Input field, e.g., “Summarize the following paragraph: …”.
  5. Optional: Set Temperature to 0.2 for a more focused summary.
  6. Run the workflow.
  7. The Text output will contain the summary, which you can then feed into a Text Output component to display on a dashboard or store in a database.
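
If you check Stream in a setup like this, the answer arrives piece by piece instead of as one block.
The sketch below shows what that looks like when calling the Ollama API directly; the component handles the streaming for you, so the code is purely illustrative.

    // Sketch: streaming the summary straight from the Ollama API.
    // With "stream": true the server sends one JSON object per line, each
    // carrying a piece of the answer in its `response` field.
    async function streamSummary(paragraph: string): Promise<void> {
      const res = await fetch("http://localhost:11434/api/generate", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          model: "llama3.1",
          prompt: `Summarize the following paragraph: ${paragraph}`,
          stream: true,
          options: { temperature: 0.2 },
        }),
      });
      const reader = res.body!.getReader();
      const decoder = new TextDecoder();
      let buffered = "";
      while (true) {
        const { value, done } = await reader.read();
        if (done) break;
        buffered += decoder.decode(value, { stream: true });
        const lines = buffered.split("\n");
        buffered = lines.pop() ?? "";      // keep any partial line for the next read
        for (const line of lines) {
          if (!line.trim()) continue;
          const chunk = JSON.parse(line);
          process.stdout.write(chunk.response ?? "");  // print each piece as it arrives
        }
      }
    }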

Related Components

  • Text Input – Capture user text that can be fed into the Ollama component.
  • Text Output – Show the generated text on a dashboard or send it to a chat window.
  • Prompt Template – Create reusable prompt structures that the Ollama component can fill.
  • LLM Call – A generic component for calling any language model; Ollama is a specific implementation.

Tips and Best Practices

  • Keep the Base URL and Model Name consistent across your workflow to avoid confusion.
  • Use Mapping Mode when you need to process many prompts at once, such as summarizing a batch of documents.
  • Set Temperature low (e.g., 0.1–0.3) for factual or deterministic answers; raise it for creative writing.
  • If you notice the model repeating phrases, increase Repeat Penalty or adjust Repeat Last N.
  • For long prompts, increase Context Window Size to give the model more context.
  • Turn on Verbose only when debugging; it can clutter logs otherwise.

Security Considerations

  • The Ollama component communicates with a local server, so data stays on your machine.
  • Ensure the Ollama server is only accessible to trusted users or networks.
  • If you expose the component through a public dashboard, consider adding authentication to prevent unauthorized use.