Ollama OCR

This component extracts text from images and PDF files automatically. It uses vision-capable AI models served by Ollama to “read” documents much as a person would. Use it to turn scanned documents into searchable text, extract specific information from forms, or automate data entry from your files.

How it Works

When you connect an image or PDF to this node, the system sends it to an Ollama AI model. The model looks at the images inside the file and reads the text, tables, or other visual content. It returns the text it found, which you can then use for other tasks in your automation.

You can guide the AI by writing a Prompt (instructions) that tells it exactly what to look for or how to format the result. For example, you can ask it to extract only the invoice total or return the text in a specific format like JSON.

The component works with a local Ollama server on your computer or with a cloud service, depending on your configuration.
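
As a rough sketch of what such a request might look like, the snippet below builds a JSON body for Ollama's /api/generate endpoint, which accepts base64-encoded images alongside a prompt. How this node assembles its request internally is not documented here, so the helper name build_ocr_request, the default values, and the placeholder bytes are illustrative assumptions.

```python
import base64
import json


def build_ocr_request(image_bytes: bytes, prompt: str, model: str = "llava",
                      temperature: float = 0.0, max_tokens: int = 1024) -> dict:
    """Build a request body for Ollama's /api/generate endpoint.

    Hypothetical helper: the node's real request may differ.
    """
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for one complete response instead of chunks
        "options": {
            "temperature": temperature,
            "num_predict": max_tokens,  # Ollama's name for the output token limit
        },
    }


# Placeholder bytes stand in for the contents of a real image file.
body = build_ocr_request(b"fake-image-bytes", "Read all the text")
print(json.dumps(body, indent=2))
```

Sending this body as a POST to the configured base URL plus /api/generate would return a JSON object whose response field holds the generated text.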

Connection & Credentials

Using Ollama Cloud requires configuring a credential in the Nappai panel before the component can reach the service:

Go to the Credentials section in your Nappai panel.
Create a new credential of the type Ollama Cloud API and fill in the Ollama Turbo API Key field. You can get your key from the Ollama Turbo Console.
In your workflow, select the saved credential in the Credential input field of this node.

Note: If you are using a local Ollama server, you do not need a cloud credential.
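
To illustrate the difference between the two setups, here is a minimal sketch of how request headers might be built. It assumes the cloud service expects the API key as a Bearer token in the Authorization header and that a local server needs no authentication; the function name build_headers and the sample key are hypothetical.

```python
from typing import Dict, Optional


def build_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return HTTP headers for an Ollama request.

    Assumption: a cloud request carries the key as a Bearer token,
    while a local server is called without any Authorization header.
    """
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers


local_headers = build_headers()                  # local server, no credential
cloud_headers = build_headers("example-api-key")  # placeholder key, not a real one
print(local_headers)
print(sorted(cloud_headers))
```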

Inputs

Mapping Mode

This component has a special mode called “Mapping Mode”. When you enable this mode using the toggle switch, an additional input called “Mapping Data” is activated, and each input field offers you three different ways to provide data:

Fixed: You type the value directly into the field.
Mapped: You connect the output of another component to use its result as the value.
Javascript: You write JavaScript code to calculate the value dynamically.

This flexibility allows you to create more dynamic and connected workflows.

Input Fields

The following fields are available to configure this component:

Ollama Base URL: The address where your Ollama server is running. Use http://localhost:11434 for local servers or a cloud URL if using Ollama Cloud.
Credential: Select a stored credential for Ollama Cloud. Only required when using Ollama Cloud services.
Model Name: Choose the AI vision model to use for reading text. Click the refresh button to update the list of available models.
OCR Prompt: Provide instructions to guide the AI on what to extract and how to format the result. Examples: ‘Read all the text’, ‘Extract invoice details’, or ‘Output in JSON format’.
Image/PDF Input: Connect your image or PDF file here. This accepts images, binary data, or messages containing files.
Draw Bounding Boxes: If enabled, the component returns an image with boxes drawn around the detected text. Useful for visual verification.
Process PDF Pages: If enabled, the system extracts and processes each page of a PDF separately. Good for multi-page documents.
PDF DPI: Controls the resolution when converting PDF pages to images. Higher values give better quality but may take longer to process.
Enhance Images: Applies smart preprocessing to improve readability and accuracy. Recommended setting: True.
Binarize (Black & White): Converts images to pure black and white. ⚠️ NOT recommended for modern models as it can lower quality; keep this off if possible.
Gentle Contrast: Improves contrast for low-quality or blurry images to help the AI read better. Recommended setting: True.
Subtle Sharpen: Applies slight sharpening to images to make text clearer. Recommended setting: True.
Binarize Threshold: Sets the threshold for black and white conversion. Use 0 for automatic detection, or 1-255 for manual control.
Max Tokens: Sets the maximum amount of text the AI can generate. Increase it if you expect very long output, since text beyond the limit is cut off.
Temperature: Controls how much randomness the model uses. 0.0 makes the output as deterministic as possible, so the AI sticks closely to the text in the document; 1.0 allows more creative variation.
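
To get a feel for the PDF DPI trade-off, the following sketch computes the pixel size of a rendered page. It relies only on the fact that PDF page sizes are measured in points (72 per inch); the function name page_pixels is illustrative, and the node's actual rendering code is not shown in this document.

```python
from typing import Tuple


def page_pixels(width_pt: float, height_pt: float, dpi: int) -> Tuple[int, int]:
    """Pixel size of a PDF page rendered at the given DPI (1 pt = 1/72 inch)."""
    return round(width_pt * dpi / 72), round(height_pt * dpi / 72)


# A US Letter page measures 612 x 792 points.
for dpi in (72, 150, 300):
    w, h = page_pixels(612, 792, dpi)
    print(f"{dpi} DPI -> {w} x {h} px ({w * h / 1e6:.1f} megapixels)")
```

Doubling the DPI quadruples the number of pixels per page, which is why higher settings take noticeably longer to process.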

Outputs

This component provides the text extracted from your documents, along with confidence scores and metadata. You can use these outputs in the following ways:

OCR Results: Contains the extracted text, a confidence score indicating accuracy, and processing metadata. You can map this to other components to store data, analyze content, or generate responses.

Output Data Example (JSON)

```json
{
  "text": "Invoice #12345\nDate: 2023-10-01\nTotal: $150.00\nItem: Widget A",
  "confidence_score": 0.95,
  "processing_metadata": {
    "model_used": "llava",
    "pages_processed": 1,
    "processing_time_ms": 1200
  }
}
```
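
A downstream step could read this structure roughly as follows. This is a sketch that assumes the output arrives as a JSON string shaped like the example above; the OcrResult class and parse_ocr_result function are hypothetical names, not part of the component.

```python
import json
from dataclasses import dataclass


@dataclass
class OcrResult:
    text: str
    confidence: float
    pages: int


def parse_ocr_result(raw: str) -> OcrResult:
    """Turn the node's JSON output into a small typed record."""
    data = json.loads(raw)
    meta = data.get("processing_metadata", {})
    return OcrResult(
        text=data["text"],
        confidence=float(data["confidence_score"]),
        pages=int(meta.get("pages_processed", 0)),
    )


# Same values as the example output, written as a JSON string.
raw = (
    '{"text": "Invoice #12345\\nDate: 2023-10-01\\nTotal: $150.00\\nItem: Widget A", '
    '"confidence_score": 0.95, '
    '"processing_metadata": {"model_used": "llava", "pages_processed": 1, '
    '"processing_time_ms": 1200}}'
)
result = parse_ocr_result(raw)
lines = result.text.splitlines()  # one entry per line of extracted text
print(lines[0], result.confidence, result.pages)
```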

Connectivity

This component is typically used in workflows involving document processing:

Incoming Connections: Connects to File Upload nodes, Data nodes, or Message nodes that contain images or PDFs.
Outgoing Connections: Sends data to Text Analysis nodes, Database Storage nodes, Chat Response nodes, or Data Transformation nodes that need the extracted text for further processing.

Usage Example

Scenario: Automating Invoice Processing

Upload an invoice PDF to your workflow using a File Upload node.
Connect the PDF to the Image/PDF Input of this node.
Set the Model Name to a vision model like llava or nougat.
In the OCR Prompt, enter: 'Extract invoice number, date, and total amount in JSON format.'
Enable Process PDF Pages if the invoice has multiple pages.
Run the workflow. The node returns the text in the requested format.
Connect the output to a Database node to save the invoice details automatically.
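
Before saving the invoice details, the JSON usually has to be pulled out of the model's reply, because vision models sometimes add a sentence before or after the data. The sketch below shows one way to do that; the function name extract_json, the sample reply, and the field names are assumptions chosen for illustration, and the approach expects a single JSON object in the reply.

```python
import json
import re


def extract_json(text: str) -> dict:
    """Return the JSON object embedded in a model reply.

    Matches from the first '{' to the last '}', so it assumes the reply
    contains exactly one JSON object.
    """
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    return json.loads(match.group(0))


# Hypothetical reply with extra prose around the requested JSON.
reply = (
    "Here are the invoice details: "
    '{"invoice_number": "12345", "date": "2023-10-01", "total": 150.0} '
    "Let me know if you need anything else."
)
invoice = extract_json(reply)
print(invoice["invoice_number"], invoice["date"], invoice["total"])
```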

Tips and Best Practices

Choose the Right Model: Use models like llava for general text and tables. Use specialized models like nougat for complex academic documents or structured data.
Write Clear Prompts: Clear instructions help the AI extract exactly what you need. Mention the desired format (e.g., JSON, CSV) for structured results.
Enable Enhancements: Keep Enhance Images and Gentle Contrast enabled for scans or photos to improve accuracy.
Local vs. Cloud: Use a local URL for privacy and to avoid uploading files over the network; processing speed then depends on your own hardware. Use Cloud credentials when you need advanced models or shared resources.
Avoid Binarization: Modern AI models perform best with color information. Avoid converting images to black and white unless necessary.
Check Confidence: Use the confidence score to verify extraction quality. If the score is low, try improving image quality or refining your prompt.
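
The confidence check can be automated with a simple threshold, as in this sketch. The cut-off of 0.8 and the labels "accept" and "review" are arbitrary illustrative choices, not values defined by the component.

```python
def route_by_confidence(result: dict, threshold: float = 0.8) -> str:
    """Label an OCR result for automatic use or manual review."""
    # A missing score is treated as 0.0 so that it always goes to review.
    score = result.get("confidence_score", 0.0)
    return "accept" if score >= threshold else "review"


print(route_by_confidence({"text": "Total: $150.00", "confidence_score": 0.95}))
print(route_by_confidence({"text": "T0ta1: $l5O.OO", "confidence_score": 0.41}))
```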

Security Considerations

Data Privacy: When using local Ollama servers, your document data stays within your infrastructure. For cloud credentials, ensure you trust the provider with your document content.
Credential Security: Store API keys securely in the Nappai Credentials panel. Never expose keys in shared workflows or public repositories.
Model Selection: Be aware that some hosted model providers may retain data under their service terms. Review the provider and model documentation if you handle sensitive information.