Website

The Website component is a data connector that lets users scrape websites. It supports one task, Scrape Website.

#Release Stage

Alpha

#Configuration

The component configuration is defined and maintained here.

#Supported Tasks

#Scrape Website

Scrape the website contents.

| Input | ID | Type | Description |
| --- | --- | --- | --- |
| Task ID (required) | `task` | string | `TASK_SCRAPE_WEBSITE` |
| Query (required) | `target_url` | string | The root URL to scrape. All links on this page will be scraped, then all links on those pages, and so on. |
| Allowed Domains | `allowed_domains` | array[string] | A list of domains that are allowed to be scraped. If empty, all domains are allowed. |
| Max Number of Pages (required) | `max_k` | integer | The maximum number of pages to return. If set to 0, all pages are returned. If set to a positive integer, at most `max_k` pages are returned. |
| Include Link Text | `include_link_text` | boolean | Whether to scrape the link and include the text of the link associated with this page in the `link_text` field. |
| Include Link HTML | `include_link_html` | boolean | Whether to scrape the link and include the raw HTML of the link associated with this page in the `link_html` field. |

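
As an illustration of the input fields, the following minimal Python sketch assembles an input object keyed by the IDs listed above. The helper name `build_scrape_input`, the flat dictionary shape, and the rejection of negative `max_k` values are assumptions made for this example; the documentation does not specify how the input is wrapped or submitted.

```python
def build_scrape_input(target_url, max_k, allowed_domains=None,
                       include_link_text=False, include_link_html=False):
    """Assemble an input object using the documented field IDs.

    Hypothetical helper: the surrounding request format is not shown here.
    """
    if not target_url:
        raise ValueError("target_url is required")
    # Assumption: only non-negative integers are meaningful (0 means all pages).
    if isinstance(max_k, bool) or not isinstance(max_k, int) or max_k < 0:
        raise ValueError("max_k must be an integer >= 0 (0 means all pages)")
    return {
        "task": "TASK_SCRAPE_WEBSITE",
        "target_url": target_url,
        # An empty list means all domains are allowed.
        "allowed_domains": list(allowed_domains or []),
        "max_k": max_k,
        "include_link_text": include_link_text,
        "include_link_html": include_link_html,
    }


payload = build_scrape_input(
    "https://example.com",
    max_k=10,
    allowed_domains=["example.com"],
)
```

Leaving `allowed_domains` out produces an empty list, which corresponds to the "all domains are allowed" case.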
| Output | ID | Type | Description |
| --- | --- | --- | --- |
| Pages | `pages` | array[object] | The scraped webpages. |
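
To make the interaction between `target_url`, `allowed_domains`, and `max_k` concrete, here is a small Python sketch of a bounded link-following crawl over an in-memory set of pages. It is not the component's implementation: the breadth-first order, the exact-hostname domain matching, and the `FAKE_WEB` data are all assumptions chosen for illustration.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical in-memory "web": each URL maps to the links found on that page.
FAKE_WEB = {
    "https://example.com/": ["https://example.com/a", "https://other.org/x"],
    "https://example.com/a": ["https://example.com/b", "https://example.com/"],
    "https://example.com/b": [],
    "https://other.org/x": ["https://other.org/y"],
    "https://other.org/y": [],
}


def crawl(target_url, max_k, allowed_domains=(), links=FAKE_WEB):
    """Breadth-first walk from target_url, bounded by max_k and allowed_domains."""

    def allowed(url):
        # An empty list means every domain is allowed.
        # Assumption: domains are compared by exact hostname.
        return not allowed_domains or urlparse(url).hostname in allowed_domains

    pages, seen, queue = [], {target_url}, deque([target_url])
    while queue:
        # max_k == 0 means no limit; otherwise stop once max_k pages are collected.
        if max_k and len(pages) >= max_k:
            break
        url = queue.popleft()
        if not allowed(url):
            continue
        pages.append(url)
        # Follow every link on the page, skipping URLs already queued.
        for link in links.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


all_pages = crawl("https://example.com/", max_k=0)
first_two = crawl("https://example.com/", max_k=2)
same_site = crawl("https://example.com/", max_k=0, allowed_domains=["example.com"])
```

In this sketch, `all_pages` contains all five URLs, `first_two` stops after two pages, and `same_site` keeps only the three `example.com` pages because links on disallowed pages are never followed.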

Last updated: 4/29/2024, 5:53:52 AM