Import Models from GitHub

The GitHub model definition allows you to import models from a GitHub repository. Your model files can be tracked with Git LFS or DVC.

#Feature

Currently, VDP supports importing models from

  • ✅ Public GitHub repository
  • 🚧 Private GitHub repository (coming soon!)

#Release stage

Alpha

#Configuration

Field | Type | Note
repository* | string | Name of a public GitHub repository, e.g., instill-ai/model-yolov4
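To illustrate the expected shape of the repository field, the sketch below accepts owner/name values and rejects full URLs. The helper name is_valid_repository and the allowed character set are assumptions made for this example, not VDP's actual validation rules.

```shell
# Hypothetical helper: succeed only for values shaped like "owner/name".
# The allowed characters are an assumption, not VDP's validation logic.
is_valid_repository() {
  printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+$'
}

is_valid_repository "instill-ai/model-yolov4" && echo "looks valid"
```

A full URL such as https://github.com/instill-ai/model-yolov4 fails this check because it contains a colon and more than one slash.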

#Getting started

#Requirements

  • A public GitHub repository where model files are stored
  • The repository has at least one release
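One way to check the release requirement from the command line is to query the GitHub REST API endpoint that lists releases. The sketch below is only an illustration: the helper names releases_api_url and count_release_tags are hypothetical, and the count is a rough text match on "tag_name" keys rather than real JSON parsing.

```shell
# Print the GitHub REST API URL that lists releases for "owner/name".
releases_api_url() {
  printf 'https://api.github.com/repos/%s/releases\n' "$1"
}

# Count releases in the JSON read from stdin by counting "tag_name" keys.
# This is a rough text match, not a real JSON parser.
count_release_tags() {
  grep -o '"tag_name"' | wc -l | tr -d '[:space:]'
}

# Usage (needs network access), commented out so the sketch stays offline:
# curl -s "$(releases_api_url instill-ai/model-yolov4)" | count_release_tags
```

A result of 0 would suggest that the repository has no release yet. Note that the API returns releases in pages, so the count may be capped for repositories with many releases.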

#Prepare a GitHub repository and track large model files by Git LFS

GitHub limits the size of files (max 100 MB) allowed in repositories. But the size of the model files can be large. To track large model files beyond the limit, you can use Git LFS.
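To see which files would hit the limit before pushing, you could scan the working directory for oversized files. This is a minimal sketch: the helper name find_large_files is hypothetical, and the default threshold of 104857600 bytes assumes the 100 MB limit is counted as 100 MiB.

```shell
# List files larger than a byte threshold (default 100 MiB), skipping .git.
# $1: directory to scan (default: current directory)
# $2: size limit in bytes (default: 104857600)
find_large_files() {
  local dir="${1:-.}"
  local limit_bytes="${2:-104857600}"
  find "$dir" -type f ! -path '*/.git/*' -size "+${limit_bytes}c" | sort
}

find_large_files .
```

Any path printed here is a candidate for Git LFS or DVC tracking.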

Assuming Git LFS is installed, this guide walks through publishing model files to a repository on GitHub.

INFO

In general, importing models via this approach is not recommended. GitHub limits the storage and bandwidth available for Git LFS files, and usage counts against the repository owner's quotas, so you may need to purchase more once you reach the cap.

Instead, consider using GitHub DVC or ArtiVC approaches.

Step 1: Create a GitHub repository

Go to GitHub and create a new public repository, then set it up on the command line:


# Create a folder
mkdir model-yolov4
cd model-yolov4
# Initialize Git, name the branch main, and set the remote
git init
git branch -M main
git remote add origin https://github.com/user/repo.git

Replace https://github.com/user/repo.git with your repository's remote URL.

Step 2: Download sample model data

Having initialized the project, let's download the sample model files


# Download sample model
curl -o yolov4-onnx-cpu.zip https://artifacts.instill.tech/vdp/sample-models/yolov4-onnx-cpu.zip
# Extract the archive (GNU tar cannot read .zip files, so use unzip)
unzip yolov4-onnx-cpu.zip
rm yolov4-onnx-cpu.zip

The extracted model files should look like:


├── README.md
├── post
│   ├── 1
│   │   ├── labels.py
│   │   ├── model.py
│   │   └── yolov4_anchors.txt
│   └── config.pbtxt
├── pre
│   ├── 1
│   │   └── model.py
│   └── config.pbtxt
├── yolov4
│   ├── 1
│   │   └── .keep
│   └── config.pbtxt
└── yolov4-infer
    ├── 1
    │   └── model.onnx // <--- large model file
    └── config.pbtxt

In this case, we use the object detection model YOLOv4 as sample data. Among all the model files, yolov4-infer/1/model.onnx is 257 MB, which exceeds GitHub's file size limit.
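After extracting the archive, a quick sanity check can confirm that each model folder matches the layout shown above, with a config.pbtxt file and a version folder named 1. The helper name check_model_layout is hypothetical, and the check only covers what is visible in the tree above.

```shell
# Report model folders under $1 that lack config.pbtxt or a version folder "1".
# Returns non-zero if any folder is incomplete.
check_model_layout() {
  local root="${1:-.}"
  local status=0
  local dir name
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue
    name="$(basename "$dir")"
    [ -f "$dir/config.pbtxt" ] || { echo "$name: missing config.pbtxt"; status=1; }
    [ -d "$dir/1" ] || { echo "$name: missing version folder 1"; status=1; }
  done
  return "$status"
}

check_model_layout . && echo "layout looks complete"
```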

Step 3: Track large files with Git LFS

To associate a file type with Git LFS, enter git lfs track followed by the name of the file extension.


# Set up Git LFS for your user account (only needed once)
git lfs install
# Associate onnx files to Git LFS
git lfs track "*.onnx"
# List the currently tracked paths
git lfs track
# Output
Listing tracked patterns
*.onnx (.gitattributes)
Listing excluded patterns

This command amends the repository's .gitattributes file and associates every .onnx file with Git LFS.


*.onnx filter=lfs diff=lfs merge=lfs -text
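To double-check which patterns are routed through Git LFS, you could read them back from .gitattributes. The sketch below is an illustration only: the helper name lfs_tracked_patterns is hypothetical, and it assumes patterns contain no spaces.

```shell
# Print every pattern in a .gitattributes file that has the filter=lfs attribute.
# $1: path to the attributes file (default: .gitattributes)
lfs_tracked_patterns() {
  awk '{ for (i = 2; i <= NF; i++) if ($i == "filter=lfs") { print $1; break } }' \
    "${1:-.gitattributes}"
}
```

Inside a repository, git check-attr filter -- yolov4-infer/1/model.onnx offers a similar check for a single path and should report lfs as the filter.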

Then, let's push all the other files to GitHub as you normally would:


# Stage and commit all files
git add --all
git commit -m "feat: add model files"
# List Git LFS tracked paths
git lfs ls-files
# Output
1881fe9c50 * yolov4-infer/1/model.onnx
# Update remote
git push -u origin main

INFO

As the official GitHub Docs suggest, commit the local .gitattributes file to your repository.

After uploading all files successfully, go to your GitHub repository. You should see all model files are uploaded with the .onnx file in Git LFS.

Figure: YOLOv4 model files tracked by Git LFS

Step 4: Create a GitHub release

GitHub releases mark specific points in the repository's history. They are based on Git tags and are deployable software iterations for sharing and reuse. When importing a model from a GitHub repository, VDP creates one model instance per release. So let's tag the current model:


git tag <tagname>
git push origin --tags

Go to your GitHub repository and follow the GitHub Docs to create a new release based on the tag created above.

Figure: Creating a new GitHub release

🎉 This repository is ready. Follow the setup guide and import the repository to VDP.

#Prepare a GitHub repository and manage large model files by DVC

Besides Git LFS, a good alternative is to use DVC within a GitHub repository.

By using DVC, you can be sure not to bloat your repositories with large volumes of data or huge models. These large assets reside in the cloud or other remote storage locations. You will simply track their version info in Git.

—— From DVC doc

Supported DVC remote storage

  • ✅ Public Google Cloud Storage (GCS)

Assuming DVC is installed, this guide publishes a repository on GitHub and uploads the tracked large model files to remote storage with DVC.

Follow Steps 1-2 of the Prepare a GitHub repository and track large model files by Git LFS guide, repeated here:

Step 1: Create a GitHub repository

Go to GitHub and create a new public repository, then set it up on the command line:


# Create a folder
mkdir model-yolov4
cd model-yolov4
# Initialize Git, name the branch main, and set the remote
git init
git branch -M main
git remote add origin https://github.com/user/repo.git

Replace https://github.com/user/repo.git with your repository's remote URL.

Step 2: Download sample model data

Having initialized the project, let's download the sample model files


# Download sample model
curl -o yolov4-onnx-cpu.zip https://artifacts.instill.tech/vdp/sample-models/yolov4-onnx-cpu.zip
# Extract the archive (GNU tar cannot read .zip files, so use unzip)
unzip yolov4-onnx-cpu.zip
rm yolov4-onnx-cpu.zip

Step 3: Initialize DVC in the repository


dvc init

A few DVC internal directories and files are created. Let's track them with Git.


git status
# Output
...
Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
        new file:   .dvc/.gitignore
        new file:   .dvc/config
        new file:   .dvcignore
# Commit the DVC internal files
git commit -m "chore: initialize DVC"

Step 4: Track large model files with DVC

Let's use dvc add to track the ONNX model file:


dvc add yolov4-infer/1/model.onnx
# Output
To track the changes with git, run:
git add yolov4-infer/1/model.onnx.dvc yolov4-infer/1/.gitignore
...

DVC stores the model file information in a .dvc metadata file and adds the original model file to .gitignore so that Git does not track it. The .dvc file is a placeholder for the original large file.


cat yolov4-infer/1/model.onnx.dvc
# Output
outs:
- md5: 2e0eeb4de8da2a0663ae3eb4a0dabbce
  size: 257470589
  path: model.onnx

dvc add moves the large file into .dvc/cache:


.dvc/cache
└── 2e
    └── 0eeb4de8da2a0663ae3eb4a0dabbce

The hash value of the ONNX file we just added (2e0eeb4...) determines the above cache path.
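The mapping from hash to cache path can be sketched in a few lines of shell. The helper names dvc_md5_of and dvc_cache_path are hypothetical, and the layout (first two characters as the folder, the rest as the file name) is taken from the listing above; other DVC versions may organize the cache differently.

```shell
# Read the first md5 value recorded in a .dvc metadata file.
dvc_md5_of() {
  awk '/md5:/ { print $NF; exit }' "$1"
}

# Map an md5 hash to its cache path: the first two characters name the
# folder, the remaining characters name the file.
# $1: md5 hash, $2: cache root (default: .dvc/cache)
dvc_cache_path() {
  local prefix rest
  prefix="$(printf '%s\n' "$1" | cut -c1-2)"
  rest="$(printf '%s\n' "$1" | cut -c3-)"
  printf '%s/%s/%s\n' "${2:-.dvc/cache}" "$prefix" "$rest"
}

dvc_cache_path 2e0eeb4de8da2a0663ae3eb4a0dabbce
```

Combining the two, dvc_cache_path "$(dvc_md5_of yolov4-infer/1/model.onnx.dvc)" would print where the cached copy of the model is expected to live.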

Follow the instructions in the output and track these files with Git:


git add yolov4-infer/1/model.onnx.dvc yolov4-infer/1/.gitignore
git commit -m "feat: add model file"

Step 5: Push the large model files to DVC remote storage

Currently, VDP supports fetching models from public Google Cloud Storage (GCS). Let's set up the remote storage location with a public GCS bucket:

INFO

Prepare the GCS bucket:

  • Create a storage bucket before adding DVC remote
  • Authenticate so that DVC can access GCS, for example by running gcloud auth application-default login

# Create a new data remote
dvc remote add -d myremote gs://my-public-bucket/yolov4
# Record changes
git add .dvc/config
git commit -m "chore: set up dvc remote storage"

Instead of storing the DVC-tracked large files in the repository, we can store them remotely (usually with a cloud storage service) with dvc push.


dvc push

dvc push copies the local cached data to the remote storage we set up earlier. The remote bucket directory should look like:


.../yolov4
└── 2e
    └── 0eeb4de8da2a0663ae3eb4a0dabbce
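Because the bucket has to be public, it can help to confirm that the pushed object is reachable without credentials. The sketch below builds the public HTTPS URL of the object from the remote URL and the md5 hash. The helper name gcs_public_url is hypothetical, and the path layout is assumed to match the remote listing above.

```shell
# Build the public HTTPS URL of a DVC object stored in GCS.
# $1: DVC remote URL, e.g. gs://my-public-bucket/yolov4
# $2: md5 hash from the .dvc file
gcs_public_url() {
  local remote prefix rest
  remote="${1#gs://}"   # strip the scheme, leaving bucket/path
  prefix="$(printf '%s\n' "$2" | cut -c1-2)"
  rest="$(printf '%s\n' "$2" | cut -c3-)"
  printf 'https://storage.googleapis.com/%s/%s/%s\n' "$remote" "$prefix" "$rest"
}

gcs_public_url gs://my-public-bucket/yolov4 2e0eeb4de8da2a0663ae3eb4a0dabbce
```

Requesting the printed URL with curl -sI should return a success status if the object is publicly readable; an access error would suggest the bucket permissions need adjusting.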

Let's push all files, including the .dvc files, to GitHub:


git add --all
git commit -m "feat: add model files"
git push -u origin main

Step 6: Create a GitHub release

As in Step 4 of the Prepare a GitHub repository and track large model files by Git LFS guide, tag the current model and create a new release based on the tag.


git tag <tagname>
git push origin --tags

Go to your GitHub repository and follow the GitHub Docs to create a new release based on the tag created above.

🎉 If you've followed the above steps to store the model in remote storage and version it within a GitHub repository using DVC, follow the setup guide below and VDP will import the model accordingly.

TIP

Use dvc pull to retrieve DVC-tracked files from remote storage. See the DVC documentation for more information.

#No-code setup

To import a model from GitHub in the Console, do the following:

  1. Go to the Model page and click Add new model
  2. On the Set Up New Model page, enter an ID for your model; this will be the model's unique identifier
  3. Click the Model type ▾ drop-down and choose GitHub
  4. [Optional] Give a short description of your model in the Description field
  5. Enter the GitHub repository URL that stores the model files and click Setup new model
  6. Once the model is imported, click the Model instances ▾ drop-down, pick one model instance and click Deploy
  7. Go back to the Model page; the corresponding model instance should now be online

Whether you use Git LFS or DVC in the repository, each model instance of an imported model corresponds to one release tag of the GitHub repository.

#Low-code setup

  1. Send an HTTP request to the VDP model-backend to import a model from a GitHub repository.

curl -X POST http://localhost:8083/v1alpha/models -d '{
  "id": "yolov4",
  "model_definition": "model-definitions/github",
  "configuration": {
    "repository": "instill-ai/model-yolov4"
  }
}'

  2. Deploy the v1.0-cpu model instance.

curl -X POST http://localhost:8083/v1alpha/models/yolov4/instances/v1.0-cpu:deploy

  3. Perform an inference to test the model, here with an image URL as input.

curl -X POST http://localhost:8083/v1alpha/models/yolov4/instances/v1.0-cpu:test -d '{
  "inputs": [
    {
      "image_url": "https://artifacts.instill.tech/imgs/dog.jpg"
    }
  ]
}'

Here, http://localhost:8083 is the default URL of model-backend.
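If you import more than one repository, the requests above can be parameterized. The following is a rough sketch under stated assumptions: the variable MODEL_BACKEND_URL and the helpers import_payload and deploy_url are hypothetical names, and the payload builder does no JSON escaping, so it only suits simple IDs and repository names.

```shell
# Base URL of model-backend; MODEL_BACKEND_URL is a hypothetical override.
MODEL_BACKEND_URL="${MODEL_BACKEND_URL:-http://localhost:8083}"

# Print the JSON body for importing a GitHub model.
# $1: model ID, $2: repository in owner/name form
import_payload() {
  printf '{"id": "%s", "model_definition": "model-definitions/github", "configuration": {"repository": "%s"}}\n' "$1" "$2"
}

# Print the deploy endpoint for a model instance.
# $1: model ID, $2: instance name (the release tag)
deploy_url() {
  printf '%s/v1alpha/models/%s/instances/%s:deploy\n' "$MODEL_BACKEND_URL" "$1" "$2"
}

# Usage (needs a running model-backend), commented out so the sketch stays offline:
# curl -X POST "$MODEL_BACKEND_URL/v1alpha/models" -d "$(import_payload yolov4 instill-ai/model-yolov4)"
# curl -X POST "$(deploy_url yolov4 v1.0-cpu)"
```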

#Limitations

The current implementation does not support real-time GitHub sync: after you import a model from a GitHub repository, new releases of that repository won't be synced to VDP as new model instances of the model.

Last updated: 8/28/2022, 8:42:32 PM