Image Understanding

Use the Image Understanding component to analyze images using Google Gemini AI. This tool helps you automatically identify items in photos, generate text descriptions, or locate specific objects with precision. It is ideal for tasks such as organizing photos, extracting product details from images, or creating text summaries of visual content.

How it Works

This component connects your uploaded image to Google’s powerful Gemini AI models. When you provide an image, the AI processes it to understand the visual content. Depending on the Tool Function you select, it will either describe the image in natural language, find objects and draw boxes around them, or create detailed “masks” that isolate specific items.

The component relies on a Google Gemini credential to authenticate the request and access the AI service. You can choose different models (like Gemini 2.0 or 2.5 Flash) to balance speed and accuracy. The results are returned as structured data (JSON) for further automation, and optionally as a visual image with drawn bounding boxes or masks.
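To make the request flow concrete, here is a minimal sketch of how an image and a prompt could be packaged for Gemini's public REST `generateContent` endpoint. How Nappai builds its requests internally is not documented here, so treat this as an illustration under that assumption; `build_request` is a hypothetical helper, and nothing is sent over the network.

```python
import base64
import json

# Public Gemini REST endpoint pattern. Whether Nappai calls it directly is an assumption.
GEMINI_URL = "https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent"


def build_request(image_bytes: bytes, prompt: str, model: str = "gemini-2.5-flash",
                  mime_type: str = "image/png") -> tuple[str, dict]:
    """Return the URL and JSON body for one image-plus-prompt request (hypothetical helper)."""
    body = {
        "contents": [{
            "parts": [
                # The image travels inline as base64 text.
                {"inline_data": {"mime_type": mime_type,
                                 "data": base64.b64encode(image_bytes).decode("ascii")}},
                # The prompt tells the model what to describe or locate.
                {"text": prompt},
            ]
        }]
    }
    return GEMINI_URL.format(model=model), body


# Placeholder bytes stand in for a real image file.
url, body = build_request(b"placeholder image bytes", "Find all red cars")
print(url)
print(json.dumps(body, indent=2))
```

In a real call the API key would be supplied separately (for example in the `x-goog-api-key` header), which is the role the Google Gemini credential plays in this component.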

Connection & Credentials

This component requires configuring a credential in the Nappai panel before interacting with the external service:

Go to the Credentials section in your Nappai panel.
Create a new credential of the type Google Gemini and fill in your Google API Key.
In your workflow, select the saved credential in the Credential input field of this node.

Inputs

The following fields are available to configure this component. Each field may be visible in different operations:

Images: The image(s) you want to analyze. You can connect this from a previous step, such as a file upload or image storage component.
- Visible in: All operations
Prompt: A text instruction that guides the AI on what to look for or how to describe the image. For example, you might type “Find all red cars” or “Describe the scene.”
- Visible in: All operations
Model: The specific version of the Gemini AI model to use. Newer models like Gemini 2.5 Flash are recommended for better performance.
- Visible in: All operations
Tool Function: Select the type of analysis you want to perform. This determines what output the component will generate:
- Describe Image: Generates a natural language description of the image content.
- Object Detection: Detects objects and provides bounding box coordinates (supported on Gemini 2.0 and later models).
- Segmentation: Detects objects and returns pixel-level masks (supported on Gemini 2.5 and later models).
- Visible in: All operations
Generate Overlay Image: A toggle that controls whether the component returns a visual image with boxes or masks drawn on it. Disable this if you only need the raw data, which saves processing time.
- Visible in: All operations
Extract Individual Objects: A toggle that controls whether each detected object is extracted as a separate, cropped image. This is useful for segmentation tasks where you want individual pieces of the image.
- Visible in: All operations
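As a rough illustration of what Extract Individual Objects does conceptually, the sketch below crops a region out of a pixel grid using a bounding box. It assumes the box uses the `y0`, `x0`, `y1`, `x1` field names from the output example on this page and that the values are pixel coordinates; the component's actual cropping logic and coordinate units are not documented here, and `crop` is a hypothetical helper.

```python
def crop(pixels, box):
    """Return the part of `pixels` inside `box` (y0/x0 inclusive, y1/x1 exclusive)."""
    return [row[box["x0"]:box["x1"]] for row in pixels[box["y0"]:box["y1"]]]


# A tiny 4x6 "image": each number stands in for one pixel.
image = [[10 * r + c for c in range(6)] for r in range(4)]

# Hypothetical bounding box covering rows 1-2 and columns 2-4.
box = {"y0": 1, "x0": 2, "y1": 3, "x1": 5}

for row in crop(image, box):
    print(row)
```

With a real image library the same idea applies: the box selects a rectangle, and everything outside it is discarded.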

Outputs

Output Data Example (JSON)

When using Object Detection or Segmentation, the Data output will contain structured information about the objects found. Here is an example of the JSON structure you can expect:

```json
{
  "objects": [
    {
      "label": "Car",
      "confidence": 0.95,
      "bounding_box": { "y0": 120, "x0": 400, "y1": 600, "x1": 800 },
      "mask_id": "mask_001"
    },
    {
      "label": "Person",
      "confidence": 0.88,
      "bounding_box": { "y0": 50, "x0": 200, "y1": 900, "x1": 350 },
      "mask_id": "mask_002"
    }
  ],
  "image_url": "https://nappai-storage.com/overlay/generated_image_123.png"
}
```

If you select Describe Image, the Data output will primarily contain a text string describing the image content.
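If you post-process the Data output in your own code, a sketch like the following shows one way to read the structure: it keeps only objects above a confidence threshold and computes each box's width and height. The sample values are copied from the example on this page; `confident_objects` and the 0.9 threshold are illustrative choices, not part of the component.

```python
import json

# Sample Data output, using the values from the example on this page.
raw = """
{
  "objects": [
    {"label": "Car", "confidence": 0.95,
     "bounding_box": {"y0": 120, "x0": 400, "y1": 600, "x1": 800},
     "mask_id": "mask_001"},
    {"label": "Person", "confidence": 0.88,
     "bounding_box": {"y0": 50, "x0": 200, "y1": 900, "x1": 350},
     "mask_id": "mask_002"}
  ],
  "image_url": "https://nappai-storage.com/overlay/generated_image_123.png"
}
"""
data = json.loads(raw)


def confident_objects(data, threshold=0.9):
    """Return the detected objects whose confidence is at least `threshold`."""
    return [o for o in data.get("objects", []) if o.get("confidence", 0) >= threshold]


for obj in confident_objects(data):
    box = obj["bounding_box"]
    width = box["x1"] - box["x0"]
    height = box["y1"] - box["y0"]
    print(f'{obj["label"]}: {width} wide, {height} high')
```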

Connectivity

This component is typically connected to:

Image Sources: Connect the Images input to outputs from components that provide images, such as “File Upload,” “Image Storage,” or “Web Scraper” image outputs.
Data Processors: Connect the Data output to components that parse JSON, such as “Filter,” “Map,” or “Database Writer,” to automate decisions based on detected objects.
Visual Displays: If Generate Overlay Image is enabled, you can connect the output to image viewer components to see the detected objects visually.

Usage Example

Scenario: Automating Product Tagging

Input: Upload a batch of product photos using a “File Upload” component.
Configuration:
- Set Tool Function to Object Detection.
- Set Model to Gemini 2.5 Flash.
- Enable Generate Overlay Image to visually verify the detections.
- Enable Extract Individual Objects if you need to save each product separately.
Execution: The component analyzes the photos, detects items like “Shoe” or “Shirt,” and returns the coordinates and labels.
Next Step: Connect the Data output to a “Database Writer” to save the product details, or to a “Filter” to keep only images where specific objects are detected.
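The filtering and saving steps in this scenario can be sketched in plain Python. The logic below is only an approximation of what a "Filter" or "Database Writer" component might do; the file names, detection results, and the helpers `keep_image` and `to_rows` are hypothetical.

```python
def detected_labels(data):
    """Return the set of lowercase labels found in one Data output."""
    return {obj["label"].lower() for obj in data.get("objects", [])}


def keep_image(data, wanted):
    """True if at least one of the wanted labels was detected."""
    return bool(detected_labels(data) & {w.lower() for w in wanted})


def to_rows(image_name, data):
    """Flatten detections into one row per object, ready for a table."""
    return [
        {"image": image_name, "label": obj["label"], **obj["bounding_box"]}
        for obj in data.get("objects", [])
    ]


# Hypothetical results for two product photos.
results = {
    "photo_a.png": {"objects": [
        {"label": "Shoe", "bounding_box": {"y0": 10, "x0": 20, "y1": 200, "x1": 300}},
    ]},
    "photo_b.png": {"objects": [
        {"label": "Shirt", "bounding_box": {"y0": 5, "x0": 15, "y1": 400, "x1": 350}},
    ]},
}

# Keep only photos that contain a shoe, then build rows to save.
rows = []
for name, data in results.items():
    if keep_image(data, ["shoe"]):
        rows.extend(to_rows(name, data))

print(rows)
```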

Tips and Best Practices

Use Clear Prompts: For the best results in Describe Image or Object Detection, write specific prompts (e.g., “Identify all safety equipment” instead of “Analyze image”).
Model Selection: Use Gemini 2.5 Flash for segmentation tasks, since pixel-level masks are a Gemini 2.5+ capability. For simple object detection, a Gemini 2.0 model may be sufficient and faster.
Save Processing Power: If you only need the text description or raw data, disable Generate Overlay Image to speed up your workflow.
Credential Security: Always store your Google API Key in the Nappai Credentials section. Never hardcode API keys directly in the workflow steps.

Security Considerations

API Key Safety: Ensure your Google Gemini API key is kept secure within the Nappai Credentials section. Do not share workflows containing exposed keys.
Data Privacy: Be aware that images sent to Google Gemini AI are processed by Google’s servers. Ensure your workflow complies with your organization’s data privacy policies regarding image uploads.