Website

The Website component is an application component that allows users to scrape websites. It supports one task: Scrape Website.

#Release Stage

Alpha

#Configuration

The component configuration is defined and maintained here.

#Supported Tasks

#Scrape Website

Scrape the website contents.

| Input | ID | Type | Description |
| --- | --- | --- | --- |
| Task ID (required) | `task` | string | `TASK_SCRAPE_WEBSITE` |
| Query (required) | `target-url` | string | The root URL to scrape. All links on this page will be scraped, and all links on those pages, and so on. |
| Allowed Domains | `allowed-domains` | array[string] | A list of domains that are allowed to be scraped. If empty, all domains are allowed. |
| Max Number of Pages (required) | `max-k` | integer | The max number of pages to return. If the number is set to 0, all pages will be returned. If the number is set to a positive integer, at most `max-k` pages will be returned. |
| Include Link Text | `include-link-text` | boolean | Indicate whether to scrape the link and include the text of the link associated with this page in the `link-text` field. |
| Include Link HTML | `include-link-html` | boolean | Indicate whether to scrape the link and include the raw HTML of the link associated with this page in the `link-html` field. |

| Output | ID | Type | Description |
| --- | --- | --- | --- |
| Pages | `pages` | array[object] | The scraped webpages. |
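
As a rough illustration of the input fields listed above, the following Python sketch assembles and validates an input object for `TASK_SCRAPE_WEBSITE`. The helper name `build_scrape_input` is hypothetical, and the flat JSON layout is an assumption. Only the field IDs, types, and the `max-k` and `allowed-domains` semantics come from the table. How the object is submitted to the component is not covered here.

```python
import json


def build_scrape_input(target_url, max_k, allowed_domains=None,
                       include_link_text=False, include_link_html=False):
    """Assemble input fields for TASK_SCRAPE_WEBSITE.

    Hypothetical helper: the field IDs follow the input table, but the
    flat dictionary layout is an assumption, not a documented schema.
    """
    if not isinstance(target_url, str) or not target_url:
        raise ValueError("target-url is required and must be a non-empty string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_k, bool) or not isinstance(max_k, int) or max_k < 0:
        raise ValueError("max-k must be 0 (return all pages) or a positive integer")
    return {
        "task": "TASK_SCRAPE_WEBSITE",
        "target-url": target_url,
        # An empty list means every domain may be scraped.
        "allowed-domains": list(allowed_domains or []),
        "max-k": max_k,
        "include-link-text": include_link_text,
        "include-link-html": include_link_html,
    }


# Example: return at most 10 pages, restricted to a single domain.
payload = build_scrape_input(
    "https://example.com",
    max_k=10,
    allowed_domains=["example.com"],
    include_link_text=True,
)
print(json.dumps(payload, indent=2))
```

Because the scraper follows links recursively from the root URL, setting `max-k` to 0 with an empty `allowed-domains` list places no limit on the crawl, so a positive `max-k` or a domain restriction is usually the safer starting point.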

Last updated: 7/2/2024, 1:19:14 PM