#About
💧 Instill VDP (Versatile Data Pipeline) is a tool for building end-to-end unstructured data pipelines. It leverages third-party data, AI, and application functionalities, and connects seamlessly with ⚗️ Instill Model and 💾 Instill Artifact via ⚙️ Instill Component.
#Pipeline Recipe
A pipeline is defined by a recipe, which is essentially a JSON object composed of a run method, variable, output, and multiple components.
Here is a sample recipe where the input is a prompt and the output is text.
{
  "on": {},
  "variable": {
    "prompt": {
      "title": "input",
      "instillFormat": "string"
    }
  },
  "output": {
    "text": {
      "title": "text",
      "value": "${op-0.output.texts[0]}"
    }
  },
  "component": {
    "op-0": {
      "type": "openai",
      "task": "TASK_TEXT_GENERATION",
      "input": {
        "model": "gpt-3.5-turbo",
        "n": 2,
        "prompt": "${variable.prompt}",
        "response-format": {
          "type": "text"
        },
        "system-message": "You are a helpful assistant.",
        "temperature": 1,
        "top-p": 1
      },
      "setup": {
        "api-key": "${secret.my-openai-key}"
      }
    }
  }
}
Within the recipe, we have four essential fields: on, variable, output, and component.
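As a quick sanity check, the presence of these four fields can be verified before a recipe is submitted. The following is a minimal Python sketch, not part of any Instill SDK: the helper name missing_recipe_fields is hypothetical, the recipe is a trimmed version of the sample above, and whether the platform itself rejects a recipe that lacks one of the fields is an assumption.

```python
import json

# The four top-level fields named in this section.
REQUIRED_FIELDS = ("on", "variable", "output", "component")


def missing_recipe_fields(recipe: dict) -> list[str]:
    """Return the required top-level fields that are absent from a recipe."""
    return [field for field in REQUIRED_FIELDS if field not in recipe]


# A trimmed version of the sample recipe (component input and setup omitted).
sample_recipe = json.loads("""
{
  "on": {},
  "variable": {"prompt": {"title": "input", "instillFormat": "string"}},
  "output": {"text": {"title": "text", "value": "${op-0.output.texts[0]}"}},
  "component": {"op-0": {"type": "openai", "task": "TASK_TEXT_GENERATION"}}
}
""")

print(missing_recipe_fields(sample_recipe))     # []
print(missing_recipe_fields({"variable": {}}))  # ['on', 'output', 'component']
```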
#On
Indicates how the pipeline is run.
#Run on Request
VDP supports running a pipeline via the HTTP and gRPC protocols in SYNC or ASYNC mode. The user doesn't need to set up any configuration in the on field. Every pipeline can run as long as the user provides the variable data in the request.
- SYNC Mode: Responds to a request synchronously, sending the result back to the user immediately after the data is processed. This mode is suitable for real-time inference scenarios where low latency is crucial.
- ASYNC Mode: Performs an asynchronous workload. The user runs the pipeline with an asynchronous request and only receives an acknowledgment response. This mode is ideal for use cases that require long-running workloads.
Please refer to the Run Pipeline page for more details.
#Run on Scheduler (Coming Soon)
A pipeline run by a scheduler executes regularly as a scheduled workload.
#Run on Event (Coming Soon)
A pipeline run by a Pub-Sub or Message Queue service.
#Run on Application (Coming Soon)
A pipeline run by an application event.
#Variable
Set up the variables of the pipeline so every component can reference the data from the variables:
- title: The title of the variable, which can be displayed on the Console.
- description: A description of the variable, which can be displayed on the Console.
- instillFormat: The Instill Format of the variable. Users can use this to control the data format. Please refer to Instill Format for more details.
#Output
Set up the output of the pipeline:
- title: The title of the output, which can be displayed on the Console.
- description: A description of the output, which can be displayed on the Console.
- value: The value of the output, which can be referenced from any component data.
#Component
Consists of multiple components, each of which can be a generic, AI, data, application, or operator component.
- Each component must have a unique component ID, e.g., openai-0. This component ID should follow the RFC 1034 rule, which allows letters, numbers, and hyphens.
- For an AI, data, application, or operator component, we need to set up these fields:
  - type: Indicates which type of component it is.
  - task: Indicates which task of the component to run. The available tasks are listed in the tasks field of the component definition.
  - input: Sets up the input data. The schema is described in the spec.componentSpecification field.
  - setup: Defines how the component is set up, e.g., configuring an api-key for the connection. The schema is described in the spec.componentSpecification field.
  - condition: (optional) Sets up the condition that determines whether this component will be executed.
- For a generic.iterator, we need to set up these fields:
  - input: The array input.
  - outputElements: Sets up the output of this iterator.
  - component: An array of components that are executed inside the iterator.
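The component ID rule above can be checked locally before a recipe is saved. The sketch below is one possible reading of the RFC 1034 label convention (start with a letter, end with a letter or digit, only letters, digits, and hyphens in between). The helper name is_valid_component_id is hypothetical, and the exact pattern enforced by VDP, including case sensitivity and length limits, is an assumption.

```python
import re

# Assumption: letters, digits, and hyphens only, starting with a letter and
# not ending with a hyphen, following the RFC 1034 label convention.
COMPONENT_ID_PATTERN = re.compile(r"[A-Za-z]([A-Za-z0-9-]*[A-Za-z0-9])?")


def is_valid_component_id(component_id: str) -> bool:
    """Check a component ID against the assumed RFC 1034 style pattern."""
    return COMPONENT_ID_PATTERN.fullmatch(component_id) is not None


for candidate in ["openai-0", "op-0", "0-openai", "openai_0", "openai-"]:
    print(candidate, is_valid_component_id(candidate))
```

Because the component field of a recipe is a JSON object keyed by component ID, uniqueness follows from the structure itself, so only the character rule is checked here.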
#Data Flow
#Reference from Data
Within pipeline components, we use a reference syntax to set up the data flow, i.e., the input field of each component:
- A reference uses a special syntax: a dollar sign followed by a path enclosed in single curly brackets, ${...}. For example, you can
  - Reference data from the pipeline variable: ${variable.KEY}. You can add any fields you want in recipe.variable.
  - Reference data from a component: ${comp-id.input.KEY} or ${comp-id.output.KEY}. The available data fields are described in the componentSpecification of each component.
- It functions as a variable reference, copying the value from an upstream component to the data input while preserving the original data type.
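To make the reference behavior concrete, here is a minimal Python sketch of how a reference could be resolved against pipeline data. It is an illustration only, not VDP's implementation: the function names lookup and render and the sample data are hypothetical, and rendering a reference embedded in a longer string as text is an assumption.

```python
import re

# Matches one ${...} reference.
REFERENCE = re.compile(r"\$\{([^}]+)\}")
# Splits a path such as "op-0.output.texts[0]" into keys and list indices.
PATH_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def lookup(path: str, data: dict):
    """Walk a dotted path with optional [index] parts through nested data."""
    current = data
    for key, index in PATH_TOKEN.findall(path):
        current = current[int(index)] if index else current[key]
    return current


def render(value, data: dict):
    """Replace references in a value with the data they point to."""
    if isinstance(value, dict):
        return {k: render(v, data) for k, v in value.items()}
    if isinstance(value, list):
        return [render(v, data) for v in value]
    if not isinstance(value, str):
        return value  # constants such as numbers pass through unchanged
    whole = REFERENCE.fullmatch(value.strip())
    if whole:
        # A value that is exactly one reference keeps the original data type.
        return lookup(whole.group(1).strip(), data)
    # Assumption: a reference embedded in a longer string is rendered as text.
    return REFERENCE.sub(lambda m: str(lookup(m.group(1).strip(), data)), value)


# Hypothetical pipeline data: the request variables and one upstream output.
data = {
    "variable": {"prompt": "Hello", "n": 2},
    "op-0": {"output": {"texts": ["Hi there", "Hey"]}},
}
component_input = {"prompt": "${variable.prompt}", "n": "${variable.n}", "temperature": 1}

print(render(component_input, data))            # {'prompt': 'Hello', 'n': 2, 'temperature': 1}
print(render("${op-0.output.texts[0]}", data))  # Hi there
```

Note that n stays an integer after rendering, which mirrors the type-preserving behavior described above, and the constant temperature value is left untouched.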
When run by request, we also need to set up the response data of the request. You can add any fields you want in recipe.output.
If you don't want to use data reference, you can also set up a constant value in the recipe.
When using batch running in 💧 Instill VDP, where each component processes an array of inputs, the rendered data input for this component, with a batch size of 2 as an example, appears as follows:
{
  "inputs": [
    { "prompt": "What is Instill AI building?" },
    { "prompt": "What is Instill Core?" }
  ]
}
Please refer to the API document for instructions on how to run a pipeline.
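The batch structure shown above can be assembled programmatically. The sketch below only builds the JSON body; the helper name build_batch_payload is hypothetical, and the endpoint and authentication needed to actually send the request are left to the API document.

```python
import json


def build_batch_payload(prompts: list[str]) -> dict:
    """Wrap each prompt in an object keyed by the variable name 'prompt'."""
    return {"inputs": [{"prompt": prompt} for prompt in prompts]}


payload = build_batch_payload(
    ["What is Instill AI building?", "What is Instill Core?"]
)
print(json.dumps(payload, indent=2))
```

Each element of inputs corresponds to one item in the batch, and its keys match the fields defined in recipe.variable.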
#Reference from Secrets
Besides referencing data from the pipeline input or component settings, we can also reference secrets from the secret management system. The values of these secret fields are kept out of the recipe, so when the pipeline is shared or published, the keys are always protected. Example: "api-key": "${secret.my-openai-key}". Please refer to Secret Management for more details.
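Before sharing a recipe, it can be useful to list which secrets it expects to exist. The following sketch walks a recipe and collects the names used in secret references. The helper name find_secret_names is hypothetical, and the pattern tolerates an optional trailing s in the prefix as an assumption about how the reference may be spelled.

```python
import re

# Matches ${secret.NAME}; the optional "s" in the prefix is an assumption.
SECRET_REFERENCE = re.compile(r"\$\{\s*secrets?\.([^}\s]+)\s*\}")


def find_secret_names(value) -> set[str]:
    """Collect the names of all secrets referenced anywhere in a recipe."""
    if isinstance(value, dict):
        return set().union(*(find_secret_names(v) for v in value.values()))
    if isinstance(value, list):
        return set().union(*(find_secret_names(v) for v in value))
    if isinstance(value, str):
        return set(SECRET_REFERENCE.findall(value))
    return set()


component = {
    "op-0": {
        "type": "openai",
        "input": {"prompt": "${variable.prompt}", "n": 2},
        "setup": {"api-key": "${secret.my-openai-key}"},
    }
}
print(find_secret_names(component))  # {'my-openai-key'}
```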
#Control Flow
Within pipeline components, control flow is set up through the condition field of each component, which determines whether that component will be executed.
Example Configuration
Let's begin with an example.
{ "condition": "${variable.a-condition-str} == \"TARGET_CONDITION_STR\""}
This condition field allows us to define the condition setting. 💧 Instill VDP will interpret and use this configuration to decide whether this component will be executed or not. We have two types of conditions: "condition on value" and "condition on status".
#Condition on Value
We support the following syntax for the condition:
- Logic: &&, ||
- Comparison: <, >, <=, >=, ==, !=
- Not: !
- Parentheses: ()
Examples:
Here are some examples for the condition field:
- Condition on string value
  {"condition": "${variable.a-condition-str} == \"TARGET_CONDITION_STR\""}
- Condition on number value
  {"condition": "${variable.a-condition-num} > 1"}
- Condition on boolean value
  {"condition": "${variable.a-condition-bool}"}
- Complex condition
  {"condition": "(${variable.a-condition-bool} && ${comp-a.output.x} == 1) || ${comp-b.output.y} < 1"}
- Always false
  {"condition": "false"}
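To illustrate how such condition strings can be interpreted, here is a small Python evaluator covering the operators listed above. It is a sketch for reasoning about conditions locally, not VDP's evaluator: the function names, the sample data, the operator precedence (! binds tightest, then comparison, then &&, then ||), and the restriction of reference paths to dotted keys are all assumptions.

```python
import json
import operator
import re

# One token: reference, string, number, boolean literal, or operator.
TOKEN = re.compile(
    r'\s*(\$\{[^}]+\}|"(?:[^"\\]|\\.)*"|\d+(?:\.\d+)?|true|false'
    r'|&&|\|\||[<>=!]=|[<>!()])'
)
COMPARE = {"<": operator.lt, ">": operator.gt, "<=": operator.le,
           ">=": operator.ge, "==": operator.eq, "!=": operator.ne}


def lookup(path: str, data: dict):
    """Resolve a dotted path such as 'comp-a.output.x' (no list indices)."""
    current = data
    for key in path.split("."):
        current = current[key]
    return current


def tokenize(text: str) -> list[str]:
    tokens, pos, text = [], 0, text.strip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(condition: str, data: dict) -> bool:
    """Evaluate a condition string against pipeline data."""
    tokens, pos = tokenize(condition), 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        value = parse_and()
        while peek() == "||":
            take()
            right = parse_and()
            value = bool(value) or bool(right)
        return value

    def parse_and():
        value = parse_compare()
        while peek() == "&&":
            take()
            right = parse_compare()
            value = bool(value) and bool(right)
        return value

    def parse_compare():
        left = parse_unary()
        if peek() in COMPARE:
            return COMPARE[take()](left, parse_unary())
        return left

    def parse_unary():
        if peek() == "!":
            take()
            return not parse_unary()
        token = take()
        if token == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if token.startswith("${"):
            return lookup(token[2:-1].strip(), data)
        if token.startswith('"'):
            return json.loads(token)  # handles escaped quotes
        if token in ("true", "false"):
            return token == "true"
        return float(token)  # raises ValueError for a misplaced operator

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return bool(result)


# Hypothetical pipeline data for the examples above.
data = {
    "variable": {"a-condition-str": "TARGET_CONDITION_STR",
                 "a-condition-num": 3, "a-condition-bool": True},
    "comp-a": {"output": {"x": 1}},
    "comp-b": {"output": {"y": 5}},
}
print(evaluate('${variable.a-condition-str} == "TARGET_CONDITION_STR"', data))
print(evaluate("(${variable.a-condition-bool} && ${comp-a.output.x} == 1)"
               " || ${comp-b.output.y} < 1", data))
print(evaluate("false", data))
```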
#Condition on Status
Besides Condition on Value, we also support Condition on Status. We provide a boolean value called status.completed, which allows you to condition on whether a component's execution is completed or not.
Examples:
Here are some examples for the condition field:
- The component will be executed after component comp-a is completed; you can use this to control the execution order of components.
  {"condition": "${comp-a.status.completed}"}
- You can also combine it with the value condition.
  {"condition": "${variable.a-condition-bool} || ${comp-a.status.completed}"}
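As a sketch of using status conditions to control execution order, the helper below adds a status.completed condition to each component so that it waits for its predecessor. The function name run_in_order and the sample components are hypothetical, and combining an existing value condition with && is a design choice made here for illustration. How VDP reports status.completed for a component that was skipped is not covered by this page, so the sketch makes no assumption about it.

```python
import copy


def run_in_order(components: dict, order: list[str]) -> dict:
    """Return a copy of the components where each one waits for its predecessor."""
    chained = copy.deepcopy(components)
    for previous_id, current_id in zip(order, order[1:]):
        status = "${" + previous_id + ".status.completed}"
        existing = chained[current_id].get("condition")
        # Keep any existing value condition and require the predecessor as well.
        chained[current_id]["condition"] = (
            f"({existing}) && {status}" if existing else status
        )
    return chained


components = {
    "comp-a": {"type": "openai"},
    "comp-b": {"type": "openai", "condition": "${variable.a-condition-bool}"},
    "comp-c": {"type": "openai"},
}
for component_id, component in run_in_order(components, ["comp-a", "comp-b", "comp-c"]).items():
    print(component_id, component.get("condition"))
```

The original components dictionary is left unmodified, so the chained version can be compared against it before it is written back into the recipe.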