Web Scraper

The Web Scraper component lets you pull content from any website and turn it into clean, readable markdown text. You can use this markdown in reports and knowledge bases, or feed it into other parts of your automation workflow.

How it Works

When you give the component a URL (or a list of URLs separated by commas), it sends an HTTP request to each page. The component then extracts the main article or page content and converts it into markdown format. The result is a plain‑text version of the page that’s easy to read and manipulate in other steps of your workflow.

Because the component runs locally inside Nappai, it doesn’t rely on any external APIs. It simply fetches the page, parses the HTML, and outputs the markdown.
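
The fetch, parse, and convert steps described above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not Nappai's actual implementation: the names `MarkdownExtractor`, `html_to_markdown`, and `fetch_markdown` are hypothetical, and the real component's content-extraction rules are not documented here.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class MarkdownExtractor(HTMLParser):
    """Tiny HTML-to-markdown converter (hypothetical, illustrative only)."""

    SKIP = {"head", "script", "style", "nav", "footer"}  # text inside is dropped
    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}

    def __init__(self):
        super().__init__()
        self.blocks = []     # finished markdown blocks
        self.current = []    # text pieces of the block being built
        self.prefix = ""     # markdown prefix for the current block
        self.skip_depth = 0  # greater than 0 while inside a skipped tag

    def flush(self):
        # Collapse whitespace and store the block if it has any text.
        text = " ".join("".join(self.current).split())
        if text:
            self.blocks.append(self.prefix + text)
        self.current, self.prefix = [], ""

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.HEADINGS:
            self.flush()
            self.prefix = self.HEADINGS[tag]
        elif tag == "li":
            self.flush()
            self.prefix = "- "
        elif tag == "p":
            self.flush()

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.HEADINGS or tag in ("p", "li"):
            self.flush()

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.current.append(data)


def html_to_markdown(html: str) -> str:
    """Convert an HTML string to markdown blocks separated by blank lines."""
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()
    return "\n\n".join(parser.blocks)


def fetch_markdown(url: str, timeout_ms: int = 5000) -> str:
    """Fetch a page over HTTP and return its markdown (timeout in milliseconds)."""
    request = Request(url, headers={"User-Agent": "example-scraper"})
    with urlopen(request, timeout=timeout_ms / 1000) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return html_to_markdown(html)
```

Calling `fetch_markdown("https://example.com")` would perform the full request-and-convert cycle; `html_to_markdown` can also be called directly on an HTML string you already have.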

Inputs

Mapping Mode

This component has a special mode called Mapping Mode. When you enable this mode using the toggle switch, an additional input called Mapping Data is activated, and each input field offers you three different ways to provide data:

  • Fixed: You type the value directly into the field.
  • Mapped: You connect the output of another component to use its result as the value.
  • Javascript: You write JavaScript code to dynamically calculate the value.

This flexibility allows you to create more dynamic and connected workflows.

Input Fields

  • Mapping Mode: Enable mapping mode to process multiple data records in batch.
  • Timeout: The maximum time to wait for the request, in milliseconds.
  • Tool Description: A brief description of the tool that will be created from the scraped content.
  • Tool Name: The name you want to give to the tool that will be built from the scraped data.
  • URL: The URL to scrape. Use a comma-separated list for multiple URLs. This field is required.
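
To make the URL and Timeout formats concrete, the sketch below shows one way a comma-separated URL value and a millisecond timeout could be interpreted. The helper names `parse_url_field` and `timeout_seconds` are hypothetical, and the validation rules are assumptions; the component's own parsing may differ.

```python
from urllib.parse import urlparse


def parse_url_field(value: str) -> list[str]:
    """Split a comma-separated URL field into a list of http(s) URLs."""
    urls = []
    for part in value.split(","):
        url = part.strip()
        if not url:
            continue  # ignore empty entries such as a trailing comma
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"Not a valid http(s) URL: {url!r}")
        urls.append(url)
    if not urls:
        raise ValueError("URL is required")  # the field cannot be empty
    return urls


def timeout_seconds(timeout_ms: int) -> float:
    """Convert a timeout in milliseconds to seconds."""
    if timeout_ms <= 0:
        raise ValueError("Timeout must be a positive number of milliseconds")
    return timeout_ms / 1000
```

Note that splitting on commas means a URL that itself contains a comma would be broken into pieces, so such URLs are best passed one at a time.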

Outputs

  • Data: A Data object containing the markdown result of the scraping operation. This can be used as input for other components that accept markdown or plain text.
  • Tool: A Tool object that can be added to your AI assistant’s knowledge base. It includes the scraped content and the description you provided, making it easy for the assistant to answer questions about that page.

Usage Example

  1. Simple scrape
    Drag the Web Scraper into your workflow.

    • Turn off Mapping Mode.
    • Set URL to https://example.com.
    • (Optional) Set Timeout to 5000.
    • Provide a Tool Name like Example Site and a Tool Description such as Information from Example.com.
    • Run the workflow.
      The Data output will contain the markdown of the page, and the Tool output will be ready to add to your assistant.
  2. Batch scrape with Mapping Mode
    Enable Mapping Mode.

    • Connect a previous component that outputs a list of URLs to the URL field in Mapped mode.
    • The Web Scraper will process each URL in the list and produce a markdown result for each one.
    • The Data output will be a collection of markdown snippets, and the Tool output will be a collection of tools, one per URL.
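
Conceptually, the batch scrape above maps a scraping function over a list of URLs and collects one result per URL. The following sketch illustrates that idea only; `scrape_batch` and the dictionary shape of each result are hypothetical, and whether the real component continues after a failed URL is an assumption made here for illustration.

```python
from typing import Callable


def scrape_batch(urls: list[str], scrape: Callable[[str], str]) -> list[dict]:
    """Apply `scrape` to each URL and return one result record per URL."""
    results = []
    for url in urls:
        try:
            results.append({"url": url, "markdown": scrape(url), "error": None})
        except Exception as exc:  # assumption: one bad URL should not stop the batch
            results.append({"url": url, "markdown": None, "error": str(exc)})
    return results
```

Passing the scraping function as a parameter keeps the batching logic independent of how a single page is fetched, which also makes it easy to try out with a stub function before pointing it at real sites.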

Related Components

  • Data Processor – Clean and transform the markdown output before sending it to other steps.
  • Knowledge Base Manager – Store the generated tools in a searchable knowledge base.
  • Web Scrape Base – The underlying class that powers this component; useful for developers extending functionality.

Tips and Best Practices

  • Point each URL at the specific page you need; large pages can take longer to scrape.
  • Use the Timeout setting if you’re scraping sites that may respond slowly.
  • When using Mapping Mode, ensure the input list of URLs is properly formatted (comma‑separated).
  • Add a clear Tool Description so your AI assistant can explain what the tool does.
  • Test the component with a single URL first before enabling batch mode.

Security Considerations

  • Respect the target website’s robots.txt and terms of service.
  • Avoid scraping sensitive or private data unless you have explicit permission.
  • Be mindful of rate limits; adding a short delay between requests can prevent your IP from being blocked.
  • Store any scraped data securely, especially if it contains personal or confidential information.
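
Two of the points above, honoring robots.txt and spacing out requests, can be sketched with Python's standard library. This is an illustrative example and not a built-in feature of the component: `allowed_by_robots` and `polite_iter` are hypothetical helpers, and the one-second default delay is an arbitrary assumption.

```python
import time
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def polite_iter(urls, delay_seconds: float = 1.0, sleep=time.sleep):
    """Yield URLs one at a time, pausing between consecutive items."""
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # wait before every request except the first
        yield url
```

A workflow step that prepares the URL list could filter it with `allowed_by_robots` first and then iterate with `polite_iter` so that requests to the same site are not sent back to back.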