URL Advanced
Extract information like titles, text, links, and images from web pages you specify. This component is great for automatically gathering data from websites for use in your Nappai workflows.
Relationship with CSS Selectors
This component uses CSS selectors to pinpoint the exact information you want from a webpage. Think of a CSS selector as a precise instruction that tells the component where to look for specific content on a page. You don’t need to know CSS in depth to use this component, but you do need to provide the selectors, which you can usually find with your browser’s developer tools.
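To make the idea of a selector concrete, here is a minimal sketch in Python that uses only the standard library. It is a toy matcher, not the component's actual implementation: it understands only the simplest selector forms (`tag`, `.class`, `#id`, `tag.class`), and the names `SelectorText`, `select_text`, and `SAMPLE_HTML` are made up for this illustration.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


def parse_simple_selector(selector):
    """Split 'tag', '.cls', '#id', or 'tag.cls' into (tag, id, cls)."""
    tag, el_id, cls = selector, None, None
    if "#" in tag:
        tag, el_id = tag.split("#", 1)
    elif "." in tag:
        tag, cls = tag.split(".", 1)
    return (tag or None, el_id, cls)


class SelectorText(HTMLParser):
    """Collect the text inside every element matching one simple selector."""

    def __init__(self, selector):
        super().__init__()
        self.want = parse_simple_selector(selector)
        self.depth = 0  # greater than 0 while inside a matching element
        self.results = []

    def _matches(self, tag, attrs):
        want_tag, want_id, want_cls = self.want
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        return ((want_tag is None or tag == want_tag)
                and (want_id is None or attrs.get("id") == want_id)
                and (want_cls is None or want_cls in classes))

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # nested element inside a match
        elif self._matches(tag, attrs):
            self.depth = 1
            self.results.append("")

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.results[-1] += data


def select_text(html, selector):
    """Return the stripped text of each element matched by the selector."""
    parser = SelectorText(selector)
    parser.feed(html)
    return [text.strip() for text in parser.results]


SAMPLE_HTML = (
    '<html><body><h1 class="headline">Big News</h1>'
    '<div id="story"><p>First.</p><p>Second.</p></div>'
    '<p class="ad">Buy now</p></body></html>'
)

print(select_text(SAMPLE_HTML, "h1.headline"))  # only the headline
print(select_text(SAMPLE_HTML, "p"))            # every paragraph, including the ad
print(select_text(SAMPLE_HTML, ".missing"))     # nothing matches, so the list is empty
```

The broad selector `p` also picks up the advertisement paragraph, while `h1.headline` targets a single element. A selector that matches nothing produces an empty list.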
Inputs
- URLs: Enter one or more website addresses (URLs) to extract information from. You can add multiple URLs using the ’+’ button. For example: https://www.example.com, https://www.anotherwebsite.com
- text selectors: Enter CSS selectors to specify which text you want to extract from the web pages. (Your web developer can help you find these).
- image selectors: Enter CSS selectors to specify which images you want to extract from the web pages. (Your web developer can help you find these).
- link selectors: Enter CSS selectors to specify which links you want to extract from the web pages. (Your web developer can help you find these).
Outputs
The component produces a list of data objects. Each object contains the title, text, links, and images extracted from a single URL. This data can then be used by other components in Nappai to perform further actions, such as summarizing the text, analyzing the images, or sending the information in an email.
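As a rough illustration of that output, the sketch below models one record per URL as a Python dictionary and prepares the text for a downstream step. The key names (`url`, `title`, `text`, `links`, `images`), the sample values, and the helper `text_by_url` are assumptions made for this example; the component's real field names may differ.

```python
# Hypothetical output: one record per URL. Key names and values are assumptions.
records = [
    {
        "url": "https://www.example.com",
        "title": "Example Domain",
        "text": ["First paragraph.", "Second paragraph."],
        "links": ["https://www.example.com/about"],
        "images": ["https://www.example.com/logo.png"],
    },
    {
        "url": "https://www.anotherwebsite.com",
        "title": "Another Website",
        "text": ["Only paragraph."],
        "links": [],
        "images": [],
    },
]


def text_by_url(records):
    """Join each record's text fragments into one string per URL."""
    return {r["url"]: "\n".join(r["text"]) for r in records}


# A downstream step (for example, a summarizer) could consume one string per URL.
for url, text in text_by_url(records).items():
    print(url, "->", len(text), "characters")
```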
Usage Example
Let’s say you want to extract the title and main article text from a news website. You would:
- Enter the news article URL in the “URLs” input.
- Enter the appropriate CSS selectors for the title and article text in the “text selectors” input. (These selectors would be specific to the website’s structure).
- Run the component.
- The output will contain a data object with the extracted title and text. You can then use other Nappai components to further process this information (e.g., summarize the article).
Templates
This component is used in the ‘AI-Powered Property Description Optimizer’ template.
Related Components
- Summarizer: Use this component to summarize the extracted text.
- Entities extraction: Extract key information (like names, dates, locations) from the extracted text.
- Google Sheet Writer: Write the extracted data to a Google Sheet.
- Many more: The extracted data can be used as input for a wide variety of Nappai components depending on your needs.
Tips and Best Practices
- Test your selectors carefully: A selector that does not match anything on the page returns no data. Use your browser’s developer tools to inspect the website’s HTML and find the correct selectors.
- Start with a single URL: Test your configuration with one URL before adding more.
- Use specific selectors: The more specific your selectors, the more accurate the results will be.
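One way to apply these tips is to check the result of a single-URL test run for empty fields before adding more URLs. The sketch below assumes the output can be read as a list of dictionaries with the hypothetical keys `url`, `title`, `text`, `links`, and `images`; the helper `empty_fields` is invented for this illustration and is not part of Nappai.

```python
def empty_fields(records, fields=("title", "text", "links", "images")):
    """Return {url: [names of fields that came back empty]} for records with gaps."""
    report = {}
    for record in records:
        missing = [name for name in fields if not record.get(name)]
        if missing:
            report[record["url"]] = missing
    return report


# Hypothetical result of a test run with one URL and only text selectors set.
sample = [
    {
        "url": "https://www.example.com",
        "title": "Example",
        "text": ["Hello"],
        "links": [],
        "images": [],
    },
]

# Check only the fields you configured selectors for.
print(empty_fields(sample, fields=("title", "text")))
# Checking every field also reports the ones left unconfigured.
print(empty_fields(sample))
```

An empty report for the fields you configured suggests the selectors matched something. A non-empty report points to the selectors worth revisiting in the browser’s developer tools.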
Security Considerations
- Only use this component with websites you trust. The component extracts data from the provided URLs, so ensure the websites are safe and reliable.
- Be mindful of the data you are extracting and ensure you comply with any relevant terms of service or privacy policies of the websites you are accessing.