The Google Search data connector allows users to leverage the Google Search engine.
The up-to-date configuration is maintained here.
| Field | Type | Description |
| --- | --- | --- |
| api_key* | string | API key for the Google Custom Search API. You can create one here. |
| cse_id* | string | ID of the search engine to use. Before using the Custom Search JSON API, you first need to create and configure a Programmable Search Engine; if you have not already created one, start by visiting the Programmable Search Engine control panel. The ID appears in the URL of your search engine. For example, if the URL is https://cse.google.com/cse.js?cx=012345678910, the ID value is 012345678910. |
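As a small illustration of the cse_id rule above, the following Python sketch pulls the `cx` query parameter out of a search engine URL. The helper name `extract_cse_id` and the `config` dictionary are hypothetical and not part of the connector; the API key shown is a placeholder.

```python
from urllib.parse import urlparse, parse_qs


def extract_cse_id(search_engine_url: str) -> str:
    """Return the value of the `cx` query parameter of a search engine URL.

    Hypothetical helper for illustration; not part of the connector.
    """
    query = parse_qs(urlparse(search_engine_url).query)
    values = query.get("cx")
    if not values:
        raise ValueError("no 'cx' parameter found in the URL")
    return values[0]


# Hypothetical configuration; "YOUR_API_KEY" is a placeholder value.
config = {
    "api_key": "YOUR_API_KEY",
    "cse_id": extract_cse_id("https://cse.google.com/cse.js?cx=012345678910"),
}
print(config["cse_id"])  # 012345678910
```

Parsing the URL rather than copying the value by hand avoids picking up stray characters from the rest of the query string.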
To create a Google Search connector, follow these steps:
- Go to the Resource page and click Add Resource.
- Select Google Search.
- Fill in a unique ID for the resource. Optionally, give a short description in the Description field.
- Fill in the required fields.
When you use the Google Search connector in a pipeline, its input fields must be filled in for the task to run. The expected input and output data fields for the connector are as follows.
The fields below apply when task is set to TASK_SEARCH.

Input:

| Field | Type | Description |
| --- | --- | --- |
| query* | string | The search query for Google. |
| top_k | int | The number of results to return for each query. Default to |
| include_link_text | boolean | Indicates whether to scrape the link and include the text of the link associated with this search result in the 'link_text' field. Default to |
| include_link_html | boolean | Indicates whether to scrape the link and include the raw HTML of the link associated with this search result in the 'link_html' field. Default to |

Output:

| Field | Type | Description |
| --- | --- | --- |
| results | array[object] | The search results returned from Google. Each result includes the following fields: |
- title: The title of a search result, in plain text.
- link: The full URL to which the search result is pointing, e.g., http://www.example.com/foo/bar.
- snippet: The snippet from the page associated with this search result, in plain text.
- link_text: The scraped text of the link associated with this search result, in plain text.
- link_html: The scraped raw HTML of the link associated with this search result.
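The connector's internals are not shown here, but the input and output fields line up closely with the Custom Search JSON API. The Python sketch below shows one plausible mapping, assuming the connector passes query and top_k as the API's `q` and `num` parameters and copies `title`, `link`, and `snippet` from each returned item. The function names are hypothetical, and no network request is made.

```python
from typing import Any, Dict, List
from urllib.parse import urlencode

# Public endpoint of the Custom Search JSON API.
CUSTOM_SEARCH_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(api_key: str, cse_id: str, query: str, top_k: int) -> str:
    """Build a request URL (assumed mapping: query -> q, top_k -> num)."""
    params = {"key": api_key, "cx": cse_id, "q": query, "num": top_k}
    return f"{CUSTOM_SEARCH_ENDPOINT}?{urlencode(params)}"


def to_results(api_response: Dict[str, Any]) -> List[Dict[str, str]]:
    """Reduce an API response to the title, link, and snippet fields."""
    return [
        {
            "title": item.get("title", ""),
            "link": item.get("link", ""),
            "snippet": item.get("snippet", ""),
        }
        for item in api_response.get("items", [])
    ]


# Illustrative response fragment; the values are made up.
sample_response = {
    "items": [
        {
            "title": "Example Domain",
            "link": "http://www.example.com/foo/bar",
            "snippet": "An example page.",
            "displayLink": "www.example.com",
        }
    ]
}
print(build_search_url("YOUR_API_KEY", "012345678910", "example query", 3))
print(to_results(sample_response))
```

The link_text and link_html fields are left out of this sketch because they depend on scraping each result page, which the API itself does not do.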
Example input and output data for each task:
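The following Python sketch illustrates the shape of the TASK_SEARCH data using only the fields documented above. All values are invented for illustration, the explicit settings for top_k and the two include flags are examples rather than the connector's defaults, and `validate_search_input` is a hypothetical helper, not part of the connector.

```python
import json

# Hypothetical TASK_SEARCH input; every value here is illustrative.
search_input = {
    "query": "programmable search engine",
    "top_k": 1,
    "include_link_text": True,
    "include_link_html": False,
}

# Hypothetical output with one result. link_html is omitted here
# because include_link_html is False in the input above.
search_output = {
    "results": [
        {
            "title": "Example Domain",
            "link": "http://www.example.com/foo/bar",
            "snippet": "An example page.",
            "link_text": "Example Domain. This page is an example.",
        }
    ]
}


def validate_search_input(data: dict) -> None:
    """Check the documented input fields (hypothetical helper)."""
    query = data.get("query")
    if not isinstance(query, str) or not query:
        raise ValueError("'query' is required and must be a non-empty string")
    top_k = data.get("top_k")
    if top_k is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(top_k, bool) or not isinstance(top_k, int) or top_k < 1:
            raise ValueError("'top_k' must be a positive integer")
    for flag in ("include_link_text", "include_link_html"):
        if flag in data and not isinstance(data[flag], bool):
            raise ValueError(f"'{flag}' must be a boolean")


validate_search_input(search_input)
print(json.dumps(search_output, indent=2))
```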