In this tutorial, we will build our first unstructured data pipeline with VDP, which allows you to process your data for a specific AI task with minimum effort.
As we demonstrated in VDP 101 - Introduction, VDP aims to streamline the end-to-end unstructured data pipeline. An unstructured data pipeline is defined by a recipe which consists of three components:
- source: where the pipeline starts to ingest unstructured data to be processed.
- model instances: deployed AI models that process the ingested unstructured data and generate meaningful outputs.
- destination: where to send the processed outputs.
This tutorial demonstrates how to set up these three components and build a SYNC pipeline for Object Detection with YOLOv7 via the Console.
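To make the three-component recipe concrete, here is a minimal sketch of how such a pipeline could be modelled as plain data. The class names, field names, the pipeline ID `yolov7-demo`, and the `<model ID>/<instance tag>` naming are all illustrative assumptions, not VDP's actual schema.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Recipe:
    """Hypothetical stand-in for a pipeline recipe; field names are illustrative."""
    source: str                                   # where data is ingested
    model_instances: List[str] = field(default_factory=list)  # models that process the data
    destination: str = ""                         # where outputs are sent


@dataclass
class Pipeline:
    """Hypothetical pipeline description: an ID, a mode, and a recipe."""
    id: str
    mode: str
    recipe: Recipe


pipeline = Pipeline(
    id="yolov7-demo",  # assumed pipeline ID, chosen for illustration
    mode="SYNC",
    recipe=Recipe(
        source="http",
        model_instances=["yolov7/v1.0-cpu"],  # assumed "<model ID>/<instance tag>" naming
        destination="http",
    ),
)

# Show the recipe as a nested dictionary.
print(asdict(pipeline))
```

The steps below fill in each of these fields through the Console instead of code.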
Before diving into the details, please ensure you have launched VDP on your machine. If not, you can set it up following the tutorial VDP 101 [2/7] Launch VDP on your local machine.
After launching VDP, you can access the Console via http://localhost:3000. If this is your first time setting up VDP, you should see the onboarding page. Enter your email, and you are all set!
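If the page does not load, a quick reachability check can help tell a stopped VDP apart from a browser issue. The sketch below uses only the Python standard library; the helper name `console_is_up` is hypothetical, and the only value taken from this tutorial is the Console URL.

```python
import urllib.request

CONSOLE_URL = "http://localhost:3000"  # Console address given in this tutorial


def console_is_up(url: str = CONSOLE_URL, timeout: float = 3.0) -> bool:
    """Return True if an HTTP GET to `url` succeeds, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # URLError and HTTPError are OSError subclasses, so this covers
        # refused connections, timeouts, and HTTP error responses.
        return False


if __name__ == "__main__":
    print("Console reachable:", console_is_up())
```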
#Step 1: Create a new pipeline
Click on the Pipeline page on the left sidebar. Since we have not yet set up any pipeline, this page will be empty. To create our first pipeline via VDP Console, click the Set up your first pipeline button.
#Step 2: Add an HTTP source
VDP currently supports two sources, HTTP and gRPC. An HTTP source accepts HTTP requests with payloads to be processed by a pipeline.
Check our growing list of Source Connectors.
To set up a Source Connector,
- click the Pipeline mode ▾ drop-down and choose Sync,
- click the Source type ▾ drop-down and choose HTTP, and
- click Next.
#Step 3: Import and deploy a model from GitHub
To perform object detection, we import a model from our public GitHub repository instill-ai/model-yolov7-dvc.
To set it up,
- give your model a unique ID,
- add a description (optional),
- click the Model source ▾ drop-down and choose GitHub,
- fill in the GitHub repository URL, and
- click Set up to fetch the AI model to VDP.
VDP will fetch all the releases of the GitHub repository. Each release is converted into one Model Instance, using the release tag as the corresponding model instance ID.
A Model Instance represents a tagged snapshot of a Model. A model may have multiple model instances. The tag of a model instance depends on the model source and what versioning control tool the model source uses.
Check out the documentation.
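The release-to-instance mapping described above can be sketched as a small function. This is an illustration only: the function name, the model ID `yolov7`, and the `<model ID>/<tag>` display name are assumptions, while the two tags are the ones offered later in this step.

```python
from typing import Dict, List


def model_instances_from_releases(model_id: str, release_tags: List[str]) -> Dict[str, str]:
    """Map each release tag (used as the model instance ID) to an illustrative full name."""
    return {tag: f"{model_id}/{tag}" for tag in release_tags}


# One model, two tagged releases, hence two model instances.
instances = model_instances_from_releases("yolov7", ["v1.0-cpu", "v1.0-gpu"])
for instance_id, name in instances.items():
    print(instance_id, "->", name)
```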
Once the model is imported, we can choose one model instance to deploy. For simplicity,
- click the Model instances ▾ drop-down,
- choose v1.0-cpu to deploy on CPU or v1.0-gpu to deploy on GPU, and
- click Deploy to deploy it on VDP.
#Step 4: Add an HTTP destination
Since we are building a SYNC pipeline, the HTTP destination is paired automatically with the HTTP source we set up earlier. Click Next.
When creating pipelines in sync mode, the source and destination connectors in VDP must be of the same type, which means:
- HTTP source → HTTP destination
- gRPC source → gRPC destination
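The pairing rule above can be expressed as a simple check. The following is a minimal sketch, not VDP's own validation logic; the function name and the lowercase connector labels are assumptions made for illustration.

```python
# Connector types this tutorial lists for sync mode.
SYNC_CONNECTOR_TYPES = {"http", "grpc"}


def validate_sync_connectors(source: str, destination: str) -> None:
    """Raise ValueError unless source and destination are the same supported type."""
    source, destination = source.lower(), destination.lower()
    if source not in SYNC_CONNECTOR_TYPES:
        raise ValueError(f"unsupported sync source: {source!r}")
    if source != destination:
        raise ValueError(
            f"sync mode requires matching connectors, got {source!r} -> {destination!r}"
        )


validate_sync_connectors("HTTP", "HTTP")  # passes silently
validate_sync_connectors("gRPC", "gRPC")  # passes silently
```

Calling `validate_sync_connectors("HTTP", "gRPC")` raises `ValueError`, which mirrors why the Console picks the destination for you in this step.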
#Step 5: Set up the pipeline
We are almost there! We have set up the source, the model instance, and the destination. The last step is to give this pipeline an ID, and we are ready to go! Just
- give your pipeline a unique ID,
- add a description (optional), and
- click Set up.
🎉 CONGRATULATIONS! You have set up your first VDP pipeline. You should see it on the Pipeline page.
You can find further details about this pipeline by clicking on the one you just created. The green light indicates the pipeline is Active and can be triggered by sending HTTP requests.
Check out the documentation to understand all the pipeline states.
You may notice the REST request examples for triggering the pipeline in the Trigger section at the bottom of the page. Don't worry about them for now. You will learn how to trigger this pipeline in the next tutorial → VDP 101 [4/7] How to trigger a SYNC pipeline?.