Web Scraper Agent
The Web Scraper Agent is an intelligent tool designed to help you gather information from the internet without needing to write code or configure complex settings. Instead of telling the system exactly where to find data (like clicking through menus or copying specific URLs), you simply describe what you want in plain, everyday language.
For example, you might say, “Get the names and prices of all running shoes from this website,” and the agent will figure out how to navigate the site, load the pages, and compile the data for you. It works behind the scenes to handle the complexity of modern websites, including those with dynamic content that loads as you scroll.
How it Works
This component acts like a virtual assistant that browses the web for you. Here is a simple breakdown of the process:
- You Speak: You provide a target website URL and a description of the data you need (e.g., “extract product details”).
- AI Planning: The agent uses a large language model from a provider such as OpenAI or Ollama to interpret your request. It then plans how to navigate the website and locate the information, deciding which links to follow and how many pages to check.
- Automated Browsing: The agent acts like a human user, visiting the pages and interacting with the site. It can handle complex websites that use JavaScript (modern interactive sites) and automatically move through multiple pages if necessary.
- Data Collection: Once it finds the information, the agent organizes it into a clean, structured format (like a list or a table) that you can easily use in other parts of your automation workflow.
Because it is “task-agnostic,” it adapts to almost any type of website structure, meaning you don’t need to set it up differently for every new site you want to scrape.
Connection & Credentials
This component requires configuring a credential in the Nappai panel before interacting with the external service:
- Go to the Credentials section in your Nappai panel.
- Create a new credential of the type specified for this component (typically related to the AI Language Model provider, such as OpenAI or Ollama) and fill in the required fields (API Keys, tokens, etc.).
- In your workflow, select the saved credential in the Credential input field of this node.
Inputs
The following fields are available to configure this component.
- Natural Language Query: A text description of the data you want to extract. Be as specific as possible about what information you need (e.g., “product name,” “price,” “address”).
  - Visible in: All Operations
- Target URL: The web address (website link) from which you want to extract the data.
Outputs
- Extracted Data: The final result of the scraping task. This will be a structured list or collection of the information you requested. You can map this output to other components in your workflow, such as a database to save the data or an email component to send a report.
Output Data Example (JSON)

    [
      {
        "product_name": "Wireless Headphones",
        "price": "$99.99",
        "rating": "4.5 stars"
      },
      {
        "product_name": "Smart Watch",
        "price": "$199.50",
        "rating": "4.2 stars"
      }
    ]
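As a rough illustration of how a downstream step might consume this output, the sketch below parses the example records and converts the price and rating strings to numbers. It assumes the field names and string formats shown in the example above; actual output depends on your query and the target site, and the `normalize` helper is hypothetical.

```python
import json

# Example output copied from the documentation above; real results will vary.
raw = """
[
  {"product_name": "Wireless Headphones", "price": "$99.99", "rating": "4.5 stars"},
  {"product_name": "Smart Watch", "price": "$199.50", "rating": "4.2 stars"}
]
"""

def normalize(record):
    """Convert the string fields of one record into numeric values."""
    return {
        "product_name": record["product_name"],
        # Assumes prices look like "$1,234.56"; other currencies need other handling.
        "price": float(record["price"].lstrip("$").replace(",", "")),
        # Assumes ratings look like "4.5 stars".
        "rating": float(record["rating"].split()[0]),
    }

products = [normalize(r) for r in json.loads(raw)]
cheapest = min(products, key=lambda p: p["price"])
print(cheapest["product_name"])
```

Because the agent returns text as it appears on the page, a small normalization step like this is often useful before storing or comparing values.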
Connectivity
This component typically serves as a data source in your workflow.
- Connect TO: Components that process, store, or communicate the gathered data.
  - Database Components: To save the scraped information for long-term use.
  - Email/Notification Components: To send the scraped data to you or your team.
  - Data Analysis/AI Components: To analyze the extracted data for insights.
Connecting it to downstream tools is the natural pattern, because the component acts as the “feeder” of fresh, external data into your automation system.
Usage Example
Scenario: You want to monitor competitor pricing for a specific product on an e-commerce site.
- Add the Web Scraper Agent to your workflow.
- Set the Target URL to the competitor’s product page.
- Enter this query: “Extract the current price and stock status for the product.”
- Connect the Output to a Condition component.
- Configure the Condition: If the stock status is “Out of Stock,” send an email alert to the sales team.
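The condition in the last step can be sketched in Python as follows. The field names `price` and `stock_status` are assumptions made for illustration; the keys the agent actually returns depend on how the query is phrased and what the page contains, and in Nappai the Condition component is configured in the panel rather than in code.

```python
def needs_alert(record):
    """Return True when the scraped record reports the product as out of stock."""
    # Normalize case and whitespace, since the page text may vary.
    status = str(record.get("stock_status", "")).strip().lower()
    return status == "out of stock"

# Hypothetical record shaped like the agent's structured output.
scraped = {"price": "$49.00", "stock_status": "Out of Stock"}

if needs_alert(scraped):
    print("Send email alert to the sales team")
```

Comparing a normalized value rather than the raw string makes the check less sensitive to small differences in how the site words its stock label.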
Tips and Best Practices
- Be Specific in Queries: The more detailed your natural language request, the more accurate the results. Instead of “get data,” try “get the product title, price, and image URL.”
- Use Clear URLs: Ensure the URL you provide is stable and publicly accessible. Avoid pages that sit behind a login unless the required credentials are supplied elsewhere in the workflow.
- Allow Time for Processing: Since the agent browses the web and processes data, it may take a few seconds longer than a simple data lookup. Avoid setting extremely tight time limits on subsequent steps.
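To illustrate the “Use Clear URLs” tip, here is a minimal sketch of a pre-check that an upstream step could run before passing a value to the Target URL field. The function name `looks_like_valid_target` is hypothetical, and the check only confirms that the URL is well formed, not that the page is reachable.

```python
from urllib.parse import urlparse

def looks_like_valid_target(url):
    """Return True for absolute http(s) URLs that include a host name."""
    parts = urlparse(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)

print(looks_like_valid_target("https://example.com/shoes?page=1"))
print(looks_like_valid_target("example.com/shoes"))
```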
Security Considerations
- Data Privacy: Make sure you do not collect sensitive personal information in ways that violate privacy laws or the website’s terms of service.
- Credential Security: Keep your AI provider API keys secure. Do not share them publicly, as they allow access to powerful language models that are billed based on usage.
- Website Terms: Always respect the robots.txt file and terms of service of the websites you are targeting to avoid potential legal or access issues.
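One way to check a site's crawl rules before pointing the agent at it is Python's standard `urllib.robotparser` module. The sketch below parses a hypothetical robots.txt body locally so that it runs without network access; the rules and URLs are made up for illustration, and this check is not something the component is documented to perform for you.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed locally so no network access is needed.
robots_txt = """\
User-agent: *
Disallow: /checkout/
Allow: /products/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

def is_allowed(url, user_agent="*"):
    """Report whether the parsed rules permit fetching the given URL."""
    return parser.can_fetch(user_agent, url)

print(is_allowed("https://example.com/products/running-shoes"))
print(is_allowed("https://example.com/checkout/cart"))
```

Against a live site, you would instead call `set_url()` with the site's robots.txt address followed by `read()` before using `can_fetch()`.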