
May 29, 2024

Tutorial

Serving Custom Models in Instill Core

Learn how to create and deploy models locally with 🔮 Instill Core using the ⚗️ Instill Model MLOps platform.

#Unlock New Possibilities with ⚗️ Instill Model

⚗️ Instill Model is a sophisticated MLOps/LLMOps platform designed to orchestrate model serving and monitoring for consistent, reliable performance. It enables efficient management and deployment of deep learning models for unstructured data ETL, serving models locally with 🔮 Instill Core and in the cloud with ☁️ Instill Cloud.

#Why use ⚗️ Instill Model?

  1. Seamless Integration with 💧 Instill VDP: Integrate effortlessly with our Versatile Data Pipeline, allowing for streamlined unstructured data ETL and model serving workflows.
  2. No-Code Console Builder: Easily use custom models defined with ⚗️ Instill Model as modular AI components via 📺 Instill Console, allowing for seamless integration into downstream tasks.
  3. AutoML Feature (Coming Soon): With the upcoming AutoML feature, ⚗️ Instill Model will soon be capable of automating model training and tuning, simplifying model optimization for deployment.

This step-by-step tutorial will guide you through the process of setting up your own custom model with ⚗️ Instill Model for local deployment with 🔮 Instill Core.

#Prerequisites

  1. Please ensure that you have installed the latest version of the Python SDK by running:


    pip install instill-sdk

  2. Docker: Both 🔮 Instill Core and ⚗️ Instill Model use Docker to ensure that models and code can be deployed in consistent, isolated and reproducible environments. Please ensure that you have Docker installed and running by following the official instructions (you can verify this with the commands below), and see our deployment guide for recommended resource settings.
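
    A quick way to confirm that the Docker daemon is installed and running is to use the standard Docker CLI checks:

    docker --version
    docker info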

  3. ⌨️ Instill CLI: The easiest way to launch 🔮 Instill Core for local deployment is via the ⌨️ Instill CLI. To install ⌨️ Instill CLI using Homebrew, please run the following command in your terminal:


    brew install instill-ai/tap/inst

  4. Launch 🔮 Instill Core: To launch, simply run:


    inst local deploy

    Please note that the initial launch process may take up to 1 hour, depending on your internet speed. Subsequent launches will be much faster, usually completing in under 5 minutes.

  5. Now that 🔮 Instill Core has been deployed, you can access 📺 Instill Console at http://localhost:3000. Please use the following initial login details to initiate the password reset process for onboarding:

    • Username: admin
    • Password: password

For further details about launching 🔮 Instill Core, we recommend that you refer to the deployment guide, which covers alternative ways to launch with Docker Compose or Kubernetes with Helm.

Finally, please note that this guide assumes you have a basic understanding of machine learning and can code in Python. If you are new to these concepts, we recommend that you look at our quickstart guide, which introduces our no/low-code 💧 Instill VDP pipeline builder, and also take a look at some of our other tutorials.

#Step-by-Step Tutorial

#Step 1: Create a Model Namespace

To get started, navigate to the Models page in the console window and click the + Create Model button.

This should bring up a configuration window (see image below) where you are able to configure your model settings. For a full description of the available fields, please refer to the Create Namespace page.

In this tutorial, we will be walking through how to create and deploy a version of the TinyLlama-1.1B-Chat model. To follow along, please fill in the configuration fields as per the image below.

Configure Model Settings

You have now created an empty model namespace on ⚗️ Instill Model. In the next sections of this tutorial we will show you how to define your own custom model for deployment!

#Step 2: Create a Model Config

To prepare a model to be served with ⚗️ Instill Model, you first need to create your own model directory containing two files: model.py and instill.yaml. With the Python SDK installed, you can run the following helper command to generate corresponding template files that we can then modify:


    instill init
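
The resulting layout should look roughly like this (the tinyllama/ directory name below is just a placeholder for wherever you keep the two files):

    tinyllama/
    ├── instill.yaml
    └── model.py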

To configure the TinyLlama-1.1B-Chat model, simply open the instill.yaml file and populate it with:

instill.yaml

build:
  gpu: true
  python_version: "3.11"
  python_packages:
    - torch==2.2.1
    - transformers==4.36.2
    - accelerate==0.25.0

This file specifies the dependencies required to run the model. We will be loading these libraries in the next stage where we define our model class!

#Step 3: Write a Model Script

In this step we will create the model.py file, which will contain the model class definition. This will be broken down into three phases to demonstrate the structure of the model class and explain the methods it should implement.

#Define the Model Initialization

The first phase involves defining the model class and creating the __init__ constructor which is responsible for loading the model. Here we will use pipeline() from the transformers library to directly load in the TinyLlama-1.1B-Chat model.

model.py

import torch
from transformers import pipeline

from instill.helpers.ray_config import instill_deployment, InstillDeployable


@instill_deployment
class TinyLlama:
    def __init__(self):
        self.pipeline = pipeline(
            "text-generation",
            model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
            torch_dtype=torch.bfloat16,
        )

The instill.helpers.ray_config package contains the decorators and deployment object for the model class, which we will use to convert the model class into a servable model. These are required to properly define a model class for ⚗️ Instill Model.

#Define the Model Metadata Method

In the second phase, we define the ModelMetadata method, which is responsible for communicating the model's expected input and output shapes to the backend service. To facilitate this, we can make use of the Python SDK through the instill.helpers module, which provides a number of functions that can be selected according to the AI Task the model performs.

Here, we recognise that the TinyLlama-1.1B-Chat model falls under the Text Generation Chat AI Task, and so we will make use of the construct_text_generation_chat_metadata_response helper function.

Please refer here for a full list of the supported AI Tasks.

model.py

# previous imports defined above
from instill.helpers import construct_text_generation_chat_metadata_response


@instill_deployment
class TinyLlama:
    # __init__ defined in phase 1

    def ModelMetadata(self, req):
        return construct_text_generation_chat_metadata_response(req=req)

#Implement the Inference Method

In the third phase, we implement the inference method __call__, which handles the trigger request from ⚗️ Instill Model, contains the necessary logic to run the inference, and constructs the response. We use the StandardTaskIO module to parse the request payload into input parameters and to convert the model outputs into the appropriate response format.

The TextGenerationChatInput class from instill.helpers.const is used to define the input format for the Text Generation Chat AI Task, and the construct_text_generation_chat_infer_response function from instill.helpers is used to format the model output into the appropriate response format.

model.py

# previous imports defined above
from instill.helpers.const import TextGenerationChatInput
from instill.helpers.ray_io import StandardTaskIO
from instill.helpers import construct_text_generation_chat_infer_response


@instill_deployment
class TinyLlama:
    # __init__ defined in phase 1
    # ModelMetadata method defined in phase 2

    async def __call__(self, request):
        # parse the request and get the corresponding input for text-generation-chat task
        task_text_generation_chat_input: TextGenerationChatInput = (
            StandardTaskIO.parse_task_text_generation_chat_input(request=request)
        )

        # prepare prompt with chat template
        conv = [
            {
                "role": "system",
                "content": "You are a friendly chatbot",
            },
            {
                "role": "user",
                "content": task_text_generation_chat_input.prompt,
            },
        ]
        prompt = self.pipeline.tokenizer.apply_chat_template(
            conv,
            tokenize=False,
            add_generation_prompt=True,
        )

        # inference
        sequences = self.pipeline(
            prompt,
            max_new_tokens=task_text_generation_chat_input.max_new_tokens,
            do_sample=True,
            temperature=task_text_generation_chat_input.temperature,
            top_k=task_text_generation_chat_input.top_k,
            top_p=0.95,
        )

        # convert the model output into response output
        task_text_generation_chat_output = (
            StandardTaskIO.parse_task_text_generation_chat_output(sequences=sequences)
        )

        return construct_text_generation_chat_infer_response(
            req=request,
            # specify the output dimension
            shape=[1, len(sequences)],
            raw_outputs=[task_text_generation_chat_output],
        )


# now simply declare a global entrypoint for deployment
entrypoint = InstillDeployable(TinyLlama).get_deployment_handle()

Putting it all together, your model.py file should now look like this:

model.py

import torch
from transformers import pipeline

from instill.helpers.ray_config import instill_deployment, InstillDeployable
from instill.helpers.const import TextGenerationChatInput
from instill.helpers.ray_io import StandardTaskIO
from instill.helpers import (
    construct_text_generation_chat_metadata_response,
    construct_text_generation_chat_infer_response,
)


@instill_deployment
class TinyLlama:
    def __init__(self):
        self.pipeline = pipeline(
            "text-generation",
            model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
            torch_dtype=torch.bfloat16,
        )

    def ModelMetadata(self, req):
        return construct_text_generation_chat_metadata_response(req=req)

    async def __call__(self, request):
        # parse the request and get the corresponding input for text-generation-chat task
        task_text_generation_chat_input: TextGenerationChatInput = (
            StandardTaskIO.parse_task_text_generation_chat_input(request=request)
        )

        # prepare prompt with chat template
        conv = [
            {
                "role": "system",
                "content": "You are a friendly chatbot",
            },
            {
                "role": "user",
                "content": task_text_generation_chat_input.prompt,
            },
        ]
        prompt = self.pipeline.tokenizer.apply_chat_template(
            conv,
            tokenize=False,
            add_generation_prompt=True,
        )

        # inference
        sequences = self.pipeline(
            prompt,
            max_new_tokens=task_text_generation_chat_input.max_new_tokens,
            do_sample=True,
            temperature=task_text_generation_chat_input.temperature,
            top_k=task_text_generation_chat_input.top_k,
            top_p=0.95,
        )

        # convert the model output into response output
        task_text_generation_chat_output = (
            StandardTaskIO.parse_task_text_generation_chat_output(sequences=sequences)
        )

        return construct_text_generation_chat_infer_response(
            req=request,
            # specify the output dimension
            shape=[1, len(sequences)],
            raw_outputs=[task_text_generation_chat_output],
        )


# now simply declare a global entrypoint for deployment
entrypoint = InstillDeployable(TinyLlama).get_deployment_handle()
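
Before building the image, you can optionally sanity-check the model outside of ⚗️ Instill Model with a short standalone script. The sketch below uses only torch and transformers (the same dependencies declared in instill.yaml) and mirrors the logic inside __call__; it is not required for deployment:

    import torch
    from transformers import pipeline

    # load the same pipeline used in model.py
    pipe = pipeline(
        "text-generation",
        model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        torch_dtype=torch.bfloat16,
    )

    # build a chat prompt and generate a short reply
    conv = [
        {"role": "system", "content": "You are a friendly chatbot"},
        {"role": "user", "content": "What is a rainbow?"},
    ]
    prompt = pipe.tokenizer.apply_chat_template(conv, tokenize=False, add_generation_prompt=True)
    sequences = pipe(prompt, max_new_tokens=128, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
    print(sequences[0]["generated_text"])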

Awesome 😎, you have now defined your own custom model class for model serving with ⚗️ Instill Model. In the next step, we will show you how to build and deploy this model locally with 🔮 Instill Core!

#Step 4: Build and Deploy the Model

First, ensure that you have the same Python version installed in your local environment as specified in the instill.yaml file from Step 2, in this case python_version: "3.11".
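
You can quickly confirm this from your terminal; if the versions differ, switch with your preferred version manager (pyenv is one option):

    python3 --version   # should report Python 3.11.x to match instill.yaml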

#Build the Model Image

You can now build your model image by running the following command from within the directory containing the model.py and instill.yaml files:


    instill build USER_ID/MODEL_ID -t v1

Importantly, you must replace USER_ID with admin and MODEL_ID with tinyllama, the same Model ID that was specified in Step 1.

INFO

If you are building on a different architecture from the one you are deploying on, you must explicitly specify the target architecture using the --target-arch flag. For example, when building on an ARM machine and deploying to an AMD64 architecture, you must pass --target-arch amd64 when running instill build. If unspecified, the target architecture defaults to that of the system you are building on.
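
For this tutorial, building on an ARM machine for an AMD64 deployment target would therefore look something like:

    instill build admin/tinyllama -t v1 --target-arch amd64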

This command will build the model image under version tag v1. Upon successful completion, you should see a similar output to the following:


    2024-05-28 01:54:44,404.404 INFO [Instill Builder] admin/tinyllama:v1 built
    2024-05-28 01:54:44,423.423 INFO [Instill Builder] Done

#Push the Model Image

To push the model image to the 🔮 Instill Core instance, we will need to log in to the hosted Docker registry. To do this, you first need to create an API token by:

  1. Selecting the profile icon in the top right corner of the console window and choosing the Settings option.
  2. Selecting API Tokens from the left-hand menu.
  3. Clicking the Create Token button and giving it a name, e.g. tutorial. Copy the generated API token.

Create API Token

Now we can log in to the Docker registry in the 🔮 Instill Core instance by running:


    docker login localhost:8080

and entering the following credentials:

  • Username: admin
  • Password: API_TOKEN (replace this with the token you generated in the previous step)
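
If you prefer a non-interactive login, for example in a script, you can pipe the token into docker login instead (standard Docker CLI usage; INSTILL_API_TOKEN here holds the token you just created):

    export INSTILL_API_TOKEN=********
    echo "$INSTILL_API_TOKEN" | docker login localhost:8080 -u admin --password-stdin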

Once logged in, we can push model image v1 to ⚗️ Instill Model with:


    instill push USER_ID/MODEL_ID -t v1 -u localhost:8080

Again, remember to replace USER_ID with admin and MODEL_ID with tinyllama.

Upon successful completion, you should see a similar output to the following:


    2024-05-23 23:05:03,484.484 INFO [Instill Builder] localhost:8080/admin/tinyllama:v1 pushed
    2024-05-23 23:05:03,485.485 INFO [Instill Builder] Done

⚗️ Instill Model will then automatically allocate the resources required by your model and deploy it. Please note that the deployment time varies based on the model size and hardware type.

#Status Check

To check the status of your deployed model version you can:

  1. Navigate back to the Models page on the Console.
  2. Select the admin/tinyllama model you created in Step 1.
  3. Click the Versions tab, where you will see the corresponding version ID or tag of your pushed model image and the Status of deployment. You should then see a similar screen to the image below.
Check Model Status

The Status will initially show as Starting, indicating that your model is offline and ⚗️ Instill Model is still in the process of allocating resources and deploying it (this may take a few minutes). Once this status changes to Active, your model is ready to serve requests. 🚀

#Step 5: Inference

Once your model is deployed and Active, you can easily test its behaviour by following these steps:

  1. Navigate to the Overview tab for your admin/tinyllama model.
  2. Enter a prompt in the Input pane (e.g. What is a rainbow?)
  3. Scroll down and hit Run to trigger the model inference.
Model Inference

You should see a response generated by the TinyLlama-1.1B-Chat model in the Output pane.

To access model inferences via the API:

  1. Navigate to the API tab for your admin/tinyllama model.
  2. Follow the instructions by setting your own INSTILL_API_TOKEN as an environment variable:

    export INSTILL_API_TOKEN=********

    and using the provided curl command to send a request to the model endpoint:

    curl --location 'https://api.instill.tech/model/v1alpha/users/admin/models/tinyllama/trigger' \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer $INSTILL_API_TOKEN" \
    --data '{
      "taskInputs": [
        {
          "textGeneration": {
            "prompt": "How is the weather today?",
            "chatHistory": [
              {
                "role": "user",
                "content": [
                  {
                    "type": "text",
                    "text": "hi"
                  }
                ]
              }
            ],
            "systemMessage": "you are a helpful assistant",
            "maxNewTokens": 1024,
            "topK": 5,
            "temperature": 0.7
          }
        }
      ]
    }'
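
If you prefer Python, the same request can be sent with the requests library; the snippet below is simply a translation of the curl call above, reusing the same endpoint, headers, and payload:

    import os
    import requests

    # same endpoint, headers, and payload as the curl example above
    url = "https://api.instill.tech/model/v1alpha/users/admin/models/tinyllama/trigger"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ['INSTILL_API_TOKEN']}",
    }
    payload = {
        "taskInputs": [
            {
                "textGeneration": {
                    "prompt": "How is the weather today?",
                    "chatHistory": [
                        {"role": "user", "content": [{"type": "text", "text": "hi"}]}
                    ],
                    "systemMessage": "you are a helpful assistant",
                    "maxNewTokens": 1024,
                    "topK": 5,
                    "temperature": 0.7,
                }
            }
        ]
    }

    response = requests.post(url, headers=headers, json=payload)
    print(response.json())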

#Step 6: Tear Everything Down

After you have finished testing and serving your model, you might want to tear down the local 🔮 Instill Core instance to free up system resources. You can do this using the ⌨️ Instill CLI command:


    inst local undeploy

#Conclusion

Congratulations on successfully deploying and serving a custom model with ⚗️ Instill Model and 🔮 Instill Core! 🎉

By following this tutorial, you've accomplished the following:

  1. Set up 🔮 Instill Core for local deployment.
  2. Created a model namespace and configured model settings.
  3. Defined and implemented a custom model class for the TinyLlama-1.1B-Chat model.
  4. Built and deployed your model image.
  5. Tested your model's inference capabilities via the Console and API.
  6. Undeployed the local 🔮 Instill Core instance.

⚗️ Instill Model and 🔮 Instill Core together provide a powerful, streamlined solution for managing and deploying your own deep learning models.

Excitingly, you can now connect your own custom models via the ⚗️ Instill Model AI component to construct bespoke 💧 Instill VDP pipelines tailored to your unstructured data ETL requirements. Please see the Create Pipeline page for more information on building 💧 Instill VDP pipelines with 📺 Instill Console.

Ultimately these tools allow you the freedom and creativity to develop and iterate innovative AI-powered workflows to solve your real-world use cases.

Thank you for following along with this tutorial and stay tuned for more updates soon! 🚀
