Easy OCR
Easy OCR is a powerful tool within the Nappai automation system that allows you to convert images into editable and searchable text. By using advanced optical character recognition (OCR) technology, this component “reads” text from scanned documents, photos, screenshots, or PDFs. This is particularly useful for automating workflows where you need to extract specific information from visual documents, such as invoices, receipts, or ID cards, and move that data into other systems.
How it Works
When you connect an image or PDF to this component, it analyzes the visual content to identify patterns that resemble text characters. It processes these pixels using a deep learning model to recognize letters, numbers, and symbols.
The component offers several ways to customize how it reads the text:
- Language Selection: You can specify which languages the component should look for (e.g., English, Spanish, French). If you leave this blank, it defaults to English and Spanish.
- Performance Optimization: You can choose to use your computer’s graphics card (GPU) to speed up the process, or use the standard processor (CPU) for compatibility.
- Formatting: You can choose to keep the text exactly as it appears in lines or group nearby lines into single paragraphs for easier reading. You can also set a “confidence threshold” to ignore text that the component is unsure about, ensuring higher quality results.
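The confidence threshold and line-joining options above can be sketched in a few lines of Python. This is a minimal illustration, not the component's actual implementation: it assumes each detection is a (bounding box, text, confidence) triple, which is the shape the open-source EasyOCR library returns in its detailed mode, and the function names `filter_by_confidence` and `join_lines` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumed detection shape: (bounding_box, text, confidence).
Detection = Tuple[Sequence, str, float]


def filter_by_confidence(detections: List[Detection], min_confidence: float) -> List[Detection]:
    """Keep only detections whose confidence meets the threshold."""
    return [d for d in detections if d[2] >= min_confidence]


def join_lines(detections: List[Detection], single_paragraph: bool) -> str:
    """Join detected texts as one block, or keep one detection per line."""
    texts = [text for _, text, _ in detections]
    if single_paragraph:
        # Collapse all whitespace runs so the block has normalized spacing.
        return " ".join(" ".join(texts).split())
    return "\n".join(texts)


# Made-up sample detections; the middle one simulates a blurry line.
sample = [
    ([[0, 0], [90, 0], [90, 20], [0, 20]], "Thank you for your payment.", 0.97),
    ([[0, 30], [60, 30], [60, 50], [0, 50]], "Am0unt  $15O", 0.41),
    ([[0, 60], [80, 60], [80, 80], [0, 80]], "Date: 2023-10-27", 0.88),
]

kept = filter_by_confidence(sample, min_confidence=0.6)
print(join_lines(kept, single_paragraph=False))
```

With a threshold of 0.6, the low-confidence middle line is dropped and only the two clearly read lines remain.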
Connection & Credentials
This component does not require any external API keys or credential configuration. It runs locally within the Nappai environment once the necessary AI models are loaded.
Inputs
Input Fields
The following fields are available to configure this component. Each field may be visible in different operations:
- Input Image: The image or PDF file from which you want to extract text. It accepts standard image formats like PNG, JPG, JPEG, BMP, GIF, and TIFF, as well as PDF files. You can connect the output of a previous step (like a file uploader or an email attachment) to this field.
  - Visible in: process
- Languages: Specify the languages of the text in the image using comma-separated codes (e.g., en,es for English and Spanish). If left empty, it defaults to en,es. This helps the component know which character sets to look for, improving accuracy for multilingual documents.
  - Visible in: process
- Paragraph Mode: A toggle switch. When enabled, the component groups nearby lines of text into single paragraphs to reduce line breaks. When disabled, it keeps the text in separate lines, which is useful if you need to see the original line structure.
  - Visible in: process
- Min Confidence: A numerical value between 0.0 and 1.0. This setting tells the component to ignore any text lines it is not confident about. For example, setting this to 0.6 will filter out blurry or unclear text, ensuring only high-quality extractions are passed through. Invalid values default to 0.5.
  - Visible in: process
- Detailed Output: A toggle switch. When enabled, the output includes extra data such as the bounding boxes (location) of each text piece and its specific confidence score. When disabled, you get a simpler, cleaner text output.
  - Visible in: process
- Single Paragraph Output: A toggle switch. When enabled, it joins all the detected lines into one single block of text with normalized spacing. When disabled, it keeps each detection on a separate line, preserving the original layout structure.
  - Visible in: process
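The documented defaults for Languages and Min Confidence can be illustrated with a short Python sketch. It is a hedged approximation only: the helper names are hypothetical, and treating non-numeric or out-of-range input as "invalid" (and lowercasing language codes) is an assumption, since the exact validation rules are not described here.

```python
from typing import List

DEFAULT_LANGUAGES = ["en", "es"]   # documented default when the field is empty
DEFAULT_MIN_CONFIDENCE = 0.5       # documented fallback for invalid values


def parse_languages(raw: str) -> List[str]:
    """Split a comma-separated code string, falling back to the default."""
    # Assumption: surrounding spaces are ignored and codes are lowercased.
    codes = [code.strip().lower() for code in raw.split(",") if code.strip()]
    return codes or list(DEFAULT_LANGUAGES)


def parse_min_confidence(raw) -> float:
    """Return a value in [0.0, 1.0], or the fallback if the input is invalid."""
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return DEFAULT_MIN_CONFIDENCE
    # Assumption: out-of-range values count as invalid (this also rejects NaN).
    if not 0.0 <= value <= 1.0:
        return DEFAULT_MIN_CONFIDENCE
    return value


print(parse_languages(""))          # empty field falls back to the default
print(parse_min_confidence("abc"))  # invalid value falls back to the default
```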
Outputs
Output Data Example (JSON)
When the component successfully extracts text with “Detailed Output” disabled, it returns the text as a single string in the following format:

```json
{
  "extracted_text": "Thank you for your payment.\nAmount: $150.00\nDate: 2023-10-27"
}
```

Note: If you enabled “Detailed Output,” the response is instead a list of items like {"text": "Amount", "confidence": 0.95, "bbox": [...]}, one per detected piece of text.
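A downstream step may need to handle both output shapes. The following Python sketch shows one way to reduce either shape to plain text; the function name `to_plain_text` is hypothetical, and the coordinate layout inside `bbox` is an assumption, since it is elided in the example above.

```python
import json
from typing import Any


def to_plain_text(result: Any) -> str:
    """Return the extracted text regardless of which output shape arrived."""
    if isinstance(result, dict):
        # Simple output: a single object holding the text.
        return str(result.get("extracted_text", ""))
    if isinstance(result, list):
        # Detailed output: one item per detected piece of text.
        return "\n".join(str(item.get("text", "")) for item in result)
    raise TypeError(f"Unexpected OCR result type: {type(result).__name__}")


simple = json.loads('{"extracted_text": "Amount: $150.00\\nDate: 2023-10-27"}')
detailed = [
    # The bbox values are illustrative placeholders, not real coordinates.
    {"text": "Amount", "confidence": 0.95, "bbox": [[0, 0], [50, 0], [50, 12], [0, 12]]},
    {"text": "$150.00", "confidence": 0.91, "bbox": [[60, 0], [110, 0], [110, 12], [60, 12]]},
]

print(to_plain_text(simple))
print(to_plain_text(detailed))
```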
Connectivity
This component typically connects to:
- Preceding Components: Image Uploader, File Parser, Email Extractor, or Screenshot Tool. It needs a visual input (image or PDF) to function.
- Following Components: Text Manipulator, Database Writer, AI Chat Bot, or Formatter. Once the text is extracted, you usually want to save it to a database, search for keywords within it, or send it to an AI for analysis.
Usage Example
Scenario: Automating Invoice Data Entry
- Start: An email arrives with a PDF invoice attached.
- Component 1: The Email Extractor gets the PDF file.
- Component 2: The Easy OCR component takes the PDF file from the Email Extractor. You set Languages to en and enable Single Paragraph Output to get a clean block of text.
- Component 3: A Text Manipulator or AI Assistant reads the extracted text to find the “Total Amount” and “Due Date.”
- End: The system saves these two values into your accounting spreadsheet.
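The field lookup in Component 3 could be approximated with regular expressions, as in the Python sketch below. The label wording ("Total Amount", "Due Date"), the date format, the sample text, and the function name are all assumptions made for illustration; real invoices vary, so the patterns would need adjusting.

```python
import re
from typing import Dict, Optional

# Hypothetical label patterns; adjust them to the wording on your invoices.
TOTAL_PATTERN = re.compile(r"Total Amount:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE)
DUE_DATE_PATTERN = re.compile(r"Due Date:?\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE)


def extract_invoice_fields(text: str) -> Dict[str, Optional[str]]:
    """Find the two fields in OCR text; a missing field is returned as None."""
    total = TOTAL_PATTERN.search(text)
    due = DUE_DATE_PATTERN.search(text)
    return {
        "total_amount": total.group(1) if total else None,
        "due_date": due.group(1) if due else None,
    }


# Made-up single-block text, as Single Paragraph Output might produce it.
ocr_text = "Invoice 0042 Total Amount: $1,250.00 Due Date: 2023-11-15 Thank you"
fields = extract_invoice_fields(ocr_text)
print(fields)
```

Returning None for a missing field lets the final step decide whether to skip the row or flag the invoice for manual review.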
Tips and Best Practices
- Language Accuracy: Always specify the languages used in the image. If you leave it blank, it defaults to English and Spanish. If you are processing a French document without specifying fr, the OCR might misinterpret French characters.
- Confidence Filtering: Use the Min Confidence setting if you are dealing with low-quality scans. Setting it to 0.7 or 0.8 helps ensure you only get readable text, reducing errors in subsequent automation steps.
- Formatting Choice: Use Paragraph Mode and Single Paragraph Output if you just need the raw content for search or AI analysis. Keep them disabled if you need to preserve the exact layout or line breaks of the original document.
- PDF Support: Remember that this component accepts PDFs directly. You do not need to convert them to images first unless the PDF is a scanned image-based PDF that the internal parser handles poorly.
Security Considerations
- Data Privacy: The component processes images inside your Nappai environment, so the privacy of the extracted text depends on where that instance is hosted. On a shared or cloud-hosted instance, avoid submitting images that contain sensitive personal information (like passwords or credit card numbers in plain text); reserve such documents for a Nappai instance that is securely hosted locally.
- Input Validation: The component is designed to handle standard image formats and PDFs. Avoid uploading executable files or corrupted files, as they may cause processing errors.
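A simple pre-check before sending a file to the component might look like the Python sketch below. It only compares the file extension against the formats listed for Input Image; the helper name is hypothetical, and an extension check cannot detect a corrupted or mislabeled file.

```python
from pathlib import Path

# Extensions taken from the formats listed for the Input Image field.
ACCEPTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".gif", ".tiff", ".pdf"}


def is_accepted_input(filename: str) -> bool:
    """Return True if the file extension is one of the listed formats."""
    return Path(filename).suffix.lower() in ACCEPTED_EXTENSIONS


for name in ["invoice.PDF", "scan.tiff", "setup.exe"]:
    print(name, is_accepted_input(name))
```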