Instill Artifact

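The Instill Artifact component is a data component that lets users manipulate files and data in the artifact store and run smart search over them. It can carry out the following tasks:

- Upload File
- Upload Files
- Get Files Metadata
- Get Chunks Metadata
- Get File in Markdown
- Match File Status
- Retrieve
- Ask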

To use the Artifact component with a self-hosted deployment of Instill Core, you need to set up an OpenAI API key by setting the `OPENAI_API_KEY` environment variable. Please refer to configuring-the-embedding-feature for details. On Instill Cloud, you do not need to set up an OpenAI API key.

#Release Stage

Alpha

#Configuration

The component definition and tasks are defined in the definition.json and tasks.json files respectively.

#Supported Tasks

#Upload File

Upload a file to a catalog and process it into chunks.

| Input | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Task ID (required) | `task` | string | `TASK_UPLOAD_FILE` |
| Options (required) | `options` | object | Choose whether to upload the file to an existing catalog or to a new catalog. |

The `options` object must fulfill one of the following schemas:

Existing Catalog

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Catalog ID | `catalog-id` | string | ID of the existing catalog to upload the file to. |
| File | `file` | string | Base64-encoded PDF/DOCX/DOC/PPTX/PPT/HTML file to upload into the catalog. |
| File Name | `file-name` | string | Name of the file, including the extension (e.g. `example.pdf`). Limited to 100 characters. |
| Namespace | `namespace` | string | Your namespace. You can find it in the namespace switcher. |
| Option | `option` | string | Must be `"existing catalog"`. |

Create New Catalog

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Catalog ID | `catalog-id` | string | ID of the new catalog to create. |
| Description | `description` | string | Description of the catalog. |
| File | `file` | string | Base64-encoded PDF/DOCX/DOC/PPTX/PPT/HTML file to upload into the catalog. |
| File Name | `file-name` | string | Name of the file, including the extension (e.g. `example.pdf`). Limited to 100 characters. |
| Namespace | `namespace` | string | Your namespace. You can find it in the namespace switcher. |
| Option | `option` | string | Must be `"create new catalog"`. |
| Tags | `tags` | array | Tags for the catalog. |

| Output | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| File | `file` | object | Result of uploading the file into the catalog. |
| Status | `status` | boolean | Whether file processing was triggered successfully (`true` on success). |

Output Objects in Upload File

File

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Catalog ID | `catalog-id` | string | ID of the catalog the file was uploaded to. |
| Create Time | `create-time` | string | Creation time of the file in ISO 8601 format. |
| File Name | `file-name` | string | Name of the file. |
| Type | `file-type` | string | Type of the file. |
| File UID | `file-uid` | string | Unique identifier of the file. |
| Size | `size` | number | Size of the file in bytes. |
| Update Time | `update-time` | string | Update time of the file in ISO 8601 format. |
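
A minimal Python sketch of preparing the `TASK_UPLOAD_FILE` input for the "existing catalog" schema follows. The helper `build_upload_file_input` is hypothetical: it only assembles a dict keyed by the field IDs in the tables above and does not call Instill Core. It assumes the `file` field takes a plain Base64 string; check whether your deployment expects a data URI prefix (such as `data:application/pdf;base64,`) instead.

```python
import base64
from pathlib import Path

# File types listed in the "File" field note of the options tables.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".doc", ".pptx", ".ppt", ".html"}
MAX_FILE_NAME_LENGTH = 100


def build_upload_file_input(path: str, namespace: str, catalog_id: str) -> dict:
    """Assemble a TASK_UPLOAD_FILE input using the "existing catalog" schema.

    Hypothetical helper: it only builds a dict keyed by the documented field
    IDs; sending it to Instill Core is out of scope here.
    """
    file_path = Path(path)
    if file_path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {file_path.suffix!r}")
    if len(file_path.name) > MAX_FILE_NAME_LENGTH:
        raise ValueError("file-name is limited to 100 characters")

    # Assumption: the field takes a plain Base64 string (no data URI prefix).
    encoded = base64.b64encode(file_path.read_bytes()).decode("ascii")
    return {
        "task": "TASK_UPLOAD_FILE",
        "options": {
            "option": "existing catalog",
            "namespace": namespace,
            "catalog-id": catalog_id,
            "file-name": file_path.name,
            "file": encoded,
        },
    }
```

Checking the extension and name length locally surfaces mistakes before the upload is triggered; the server-side limits remain authoritative.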

#Upload Files

Upload multiple files to a catalog and process them into chunks.

| Input | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Task ID (required) | `task` | string | `TASK_UPLOAD_FILES` |
| Options (required) | `options` | object | Choose whether to upload the files to an existing catalog or to a new catalog. |

The `options` object must fulfill one of the following schemas:

Existing Catalog

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Catalog ID | `catalog-id` | string | ID of the existing catalog to upload the files to. |
| File Names | `file-names` | array | Names of the files, including extensions (e.g. `example.pdf`). Each name is limited to 100 characters. |
| Files | `files` | array | Base64-encoded PDF/DOCX/DOC/PPTX/PPT/HTML files to upload into the catalog. |
| Namespace | `namespace` | string | Your namespace. You can find it in the namespace switcher. |
| Option | `option` | string | Must be `"existing catalog"`. |

Create New Catalog

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Catalog ID | `catalog-id` | string | ID of the new catalog to create. |
| Description | `description` | string | Description of the catalog. |
| File Names | `file-names` | array | Names of the files, including extensions (e.g. `example.pdf`). Each name is limited to 100 characters. |
| Files | `files` | array | Base64-encoded PDF/DOCX/DOC/PPTX/PPT/HTML files to upload into the catalog. |
| Namespace | `namespace` | string | Your namespace. You can find it in the namespace switcher. |
| Option | `option` | string | Must be `"create new catalog"`. |
| Tags | `tags` | array | Tags for the catalog. |

| Output | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Files | `files` | array[object] | Metadata of the uploaded files in the catalog. |
| Status | `status` | boolean | `true` only if processing was triggered successfully for all files. |

Output Objects in Upload Files

Files

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Catalog ID | `catalog-id` | string | ID of the catalog the file was uploaded to. |
| Create Time | `create-time` | string | Creation time of the file in ISO 8601 format. |
| File Name | `file-name` | string | Name of the file. |
| Type | `file-type` | string | Type of the file. |
| File UID | `file-uid` | string | Unique identifier of the file. |
| Size | `size` | number | Size of the file in bytes. |
| Update Time | `update-time` | string | Update time of the file in ISO 8601 format. |
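
The multi-file task differs mainly in pairing `file-names` with `files`. The sketch below uses a hypothetical helper, `build_upload_files_input`, with the "create new catalog" schema. It assumes that the i-th entry of `file-names` names the i-th entry of `files`, which the tables above imply but do not state, and that `files` takes plain Base64 strings.

```python
from __future__ import annotations

import base64

MAX_FILE_NAME_LENGTH = 100


def build_upload_files_input(
    namespace: str,
    catalog_id: str,
    files: dict[str, bytes],
    description: str = "",
    tags: list[str] | None = None,
) -> dict:
    """Assemble a TASK_UPLOAD_FILES input using the "create new catalog" schema.

    `files` maps each file name (with extension) to its raw bytes. Both output
    lists are built from the same ordered keys, so the i-th name always
    describes the i-th encoded file (an assumed pairing).
    """
    names = list(files)
    too_long = [name for name in names if len(name) > MAX_FILE_NAME_LENGTH]
    if too_long:
        raise ValueError(f"file names over 100 characters: {too_long}")
    return {
        "task": "TASK_UPLOAD_FILES",
        "options": {
            "option": "create new catalog",
            "namespace": namespace,
            "catalog-id": catalog_id,
            "description": description,
            "tags": list(tags or []),
            "file-names": names,
            "files": [base64.b64encode(files[name]).decode("ascii") for name in names],
        },
    }
```

Taking a single name-to-bytes mapping, rather than two separate lists, makes it impossible for the names and files to drift out of step.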

#Get Files Metadata

Get the metadata of the files in a catalog.

| Input | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Task ID (required) | `task` | string | `TASK_GET_FILES_METADATA` |
| Namespace (required) | `namespace` | string | Your namespace. You can find it in the namespace switcher. |
| Catalog ID (required) | `catalog-id` | string | ID of the catalog whose files you want to list. |

| Output | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Files | `files` | array[object] | Metadata of the files in the catalog. |

Output Objects in Get Files Metadata

Files

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Catalog ID | `catalog-id` | string | ID of the catalog that contains the file. |
| Create Time | `create-time` | string | Creation time of the file in ISO 8601 format. |
| File Name | `file-name` | string | Name of the file. |
| Type | `file-type` | string | Type of the file. |
| File UID | `file-uid` | string | Unique identifier of the file. |
| Size | `size` | number | Size of the file in bytes. |
| Update Time | `update-time` | string | Update time of the file in ISO 8601 format. |
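
Because sizes are plain numbers and timestamps are ISO 8601 strings, the `files` output is easy to post-process. The hypothetical helper below summarizes it as a file count, a total size in bytes, and the most recently updated file. It assumes timestamps with second, millisecond, or microsecond precision (e.g. `2024-03-01T12:30:00Z`), which `datetime.fromisoformat` parses once a trailing `Z` is rewritten as `+00:00`.

```python
from datetime import datetime


def parse_iso8601(value: str) -> datetime:
    # fromisoformat() before Python 3.11 rejects a trailing "Z", so rewrite
    # it as an explicit UTC offset before parsing.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_files(files: list[dict]) -> dict:
    """Summarize the `files` array returned by TASK_GET_FILES_METADATA."""
    latest = max(files, key=lambda f: parse_iso8601(f["update-time"]), default=None)
    return {
        "count": len(files),
        "total-bytes": sum(f["size"] for f in files),
        "latest-file": latest["file-name"] if latest is not None else None,
    }
```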

#Get Chunks Metadata

Get the metadata of the chunks of a file in a catalog.

| Input | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Task ID (required) | `task` | string | `TASK_GET_CHUNKS_METADATA` |
| Catalog ID (required) | `catalog-id` | string | ID of the catalog that contains the file. |
| Namespace (required) | `namespace` | string | Your namespace. You can find it in the namespace switcher. |
| File UID (required) | `file-uid` | string | Unique identifier of the file. |

| Output | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Chunks | `chunks` | array[object] | Metadata of the file's chunks. |

Output Objects in Get Chunks Metadata

Chunks

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Chunk UID | `chunk-uid` | string | Unique identifier of the chunk. |
| Create Time | `create-time` | string | Creation time of the chunk in ISO 8601 format. |
| End Position | `end-position` | integer | End position of the chunk in the file. |
| File UID | `original-file-uid` | string | Unique identifier of the source file. |
| Retrievable | `retrievable` | boolean | Whether the chunk is retrievable. |
| Start Position | `start-position` | integer | Start position of the chunk in the file. |
| Token Count | `token-count` | integer | Number of tokens in the chunk. |
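
The chunk metadata shows how a file was split. As a sketch, the hypothetical helpers below keep only retrievable chunks in the order they appear in the file (by `start-position`) and total their `token-count`. Only the field names come from the table above; what you do with the result is up to your pipeline.

```python
def retrievable_chunks_in_order(chunks: list[dict]) -> list[dict]:
    """Keep retrievable chunks, sorted by their start position in the file."""
    kept = [chunk for chunk in chunks if chunk["retrievable"]]
    return sorted(kept, key=lambda chunk: chunk["start-position"])


def retrievable_token_count(chunks: list[dict]) -> int:
    """Sum `token-count` over retrievable chunks only."""
    return sum(chunk["token-count"] for chunk in chunks if chunk["retrievable"])
```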

#Get File in Markdown

Get the content of a file in Markdown format.

| Input | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Task ID (required) | `task` | string | `TASK_GET_FILE_IN_MARKDOWN` |
| Catalog ID (required) | `catalog-id` | string | ID of the catalog that contains the file. |
| Namespace (required) | `namespace` | string | Your namespace. You can find it in the namespace switcher. |
| File UID (required) | `file-uid` | string | Unique identifier of the file. |

| Output | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| File UID | `original-file-uid` | string | Unique identifier of the file. |
| Content | `content` | string | Content of the file in Markdown format. |
| Create Time | `create-time` | string | Creation time of the source file in ISO 8601 format. |
| Update Time | `update-time` | string | Update time of the source file in ISO 8601 format. |
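
To keep a local copy of the converted file, the output can be written straight to disk. The hypothetical `save_markdown` helper below names the file after `original-file-uid`; that naming scheme is only an illustration, not a convention of the component.

```python
from pathlib import Path


def save_markdown(output: dict, directory: str) -> Path:
    """Write the `content` of a TASK_GET_FILE_IN_MARKDOWN output to disk.

    The file is named after `original-file-uid`, so saving the same source
    file again overwrites the previous copy.
    """
    target = Path(directory) / f"{output['original-file-uid']}.md"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(output["content"], encoding="utf-8")
    return target
```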

#Match File Status

Check whether processing of the specified file has finished successfully.

| Input | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Task ID (required) | `task` | string | `TASK_MATCH_FILE_STATUS` |
| Catalog ID (required) | `catalog-id` | string | ID of the catalog that contains the file. |
| Namespace (required) | `namespace` | string | Your namespace. You can find it in the namespace switcher. |
| File UID (required) | `file-uid` | string | Unique identifier of the file. |

| Output | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Status | `succeeded` | boolean | `true` if the file has been processed successfully. |
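
Since a file is processed after it is uploaded, a common pattern is to poll this task until `succeeded` becomes `true`. The sketch below takes a hypothetical callable, `check_status`, that triggers `TASK_MATCH_FILE_STATUS` and returns its `succeeded` output. It assumes `false` means "not finished yet", so a file whose processing failed is only detected when the timeout expires.

```python
import time
from typing import Callable


def wait_until_processed(
    check_status: Callable[[], bool],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
) -> bool:
    """Poll `check_status()` until it returns True or `timeout_s` elapses.

    Returns True if processing finished in time, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check_status():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

`time.monotonic()` is used for the deadline so that system clock adjustments cannot shorten or extend the wait.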

#Retrieve

Search a catalog for the chunks that best match a text prompt.

| Input | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Task ID (required) | `task` | string | `TASK_RETRIEVE` |
| Catalog ID (required) | `catalog-id` | string | ID of the catalog to search. |
| Namespace (required) | `namespace` | string | Your namespace. You can find it in the namespace switcher. |
| Text Prompt (required) | `text-prompt` | string | The prompt used to search the chunks. |
| Top K | `top-k` | integer | The number of top chunks to return. The range is from 1 to 20, and the default is 5. |

| Output | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Chunks | `chunks` | array[object] | Chunks returned by smart search. |

Output Objects in Retrieve

Chunks

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Chunk UID | `chunk-uid` | string | The unique identifier of the chunk. |
| Similarity | `similarity-score` | number | The similarity score of the chunk. |
| Source File Name | `source-file-name` | string | The name of the source file. |
| Text Content | `text-content` | string | The text content of the chunk. |
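
The sketch below checks `top-k` against the documented range before building a `TASK_RETRIEVE` input, then lists the returned chunks by `similarity-score`, highest first. It assumes a higher score means a closer match; `build_retrieve_input` and `format_chunks` are hypothetical helpers.

```python
DEFAULT_TOP_K = 5
MIN_TOP_K, MAX_TOP_K = 1, 20


def build_retrieve_input(
    namespace: str, catalog_id: str, text_prompt: str, top_k: int = DEFAULT_TOP_K
) -> dict:
    """Assemble a TASK_RETRIEVE input, rejecting out-of-range top-k values."""
    if not MIN_TOP_K <= top_k <= MAX_TOP_K:
        raise ValueError(f"top-k must be between {MIN_TOP_K} and {MAX_TOP_K}")
    return {
        "task": "TASK_RETRIEVE",
        "namespace": namespace,
        "catalog-id": catalog_id,
        "text-prompt": text_prompt,
        "top-k": top_k,
    }


def format_chunks(chunks: list[dict]) -> list[str]:
    """Render chunks as 'score file: text', highest similarity first.

    Assumes a higher similarity-score means a closer match.
    """
    ranked = sorted(chunks, key=lambda c: c["similarity-score"], reverse=True)
    return [
        f"{c['similarity-score']:.2f} {c['source-file-name']}: {c['text-content']}"
        for c in ranked
    ]
```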

#Ask

Answer a question based on the files in a catalog.

| Input | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Task ID (required) | `task` | string | `TASK_ASK` |
| Catalog ID (required) | `catalog-id` | string | ID of the catalog whose files are used to answer the question. |
| Namespace (required) | `namespace` | string | Your namespace. You can find it in the namespace switcher. |
| Question (required) | `question` | string | The question to answer. |
| Top K | `top-k` | integer | The number of top chunks used to answer the question. The range is from 1 to 20, and the default is 5. |

| Output | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Answer | `answer` | string | The answer produced by smart search over the catalog. |
| Chunks (optional) | `chunks` | array[object] | The chunks used to answer the question. |

Output Objects in Ask

Chunks

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Chunk UID | `chunk-uid` | string | The unique identifier of the chunk. |
| Similarity | `similarity-score` | number | The similarity score of the chunk. |
| Source File Name | `source-file-name` | string | The name of the source file. |
| Text Content | `text-content` | string | The text content of the chunk. |
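
Because `chunks` is optional in the `TASK_ASK` output, client code should not assume it is present. The hypothetical `render_answer` helper below prints the answer followed by the distinct source file names of any returned chunks, in the order they first appear.

```python
def render_answer(output: dict) -> str:
    """Format a TASK_ASK output as the answer plus a de-duplicated source list.

    A missing or empty `chunks` field is treated as "no sources".
    """
    lines = [output["answer"]]
    sources: list[str] = []
    for chunk in output.get("chunks") or []:
        name = chunk["source-file-name"]
        if name not in sources:
            sources.append(name)
    if sources:
        lines.append("Sources: " + ", ".join(sources))
    return "\n".join(lines)
```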

#Example Recipes

Recipe for the Ask your Catalog pipeline.


```yaml
version: v1beta
component:
  artifact-0:
    type: instill-artifact
    task: TASK_ASK
    input:
      catalog-id: ${variable.catalog_name}
      namespace: ${variable.namespace}
      question: ${variable.question}
      top-k: 5
variable:
  catalog_name:
    title: catalog-name
    description: The name of your catalog, e.g. "instill-ai"
    format: string
  namespace:
    title: namespace
    description: The namespace of your catalog, e.g. "instill-ai"
    format: string
  question:
    title: question
    description: The question to ask your catalog, e.g. "What is Instill AI doing?", "What is Artifact?"
    format: string
output:
  answer:
    title: answer
    value: ${artifact-0.output.answer}
```
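
To make the `${...}` references in this recipe concrete, the toy resolver below substitutes whole-value references such as `${variable.question}` by following a dot-separated path through a context dict. It is an illustration only and not how the VDP engine evaluates recipes; the function name `resolve` and the shape of the context dict are assumptions.

```python
import re

# Matches a value that is entirely one reference, e.g. "${variable.question}".
REFERENCE = re.compile(r"\$\{([^}]+)\}")


def resolve(value, context: dict):
    """Toy resolver: replace whole-value ${a.b.c} references using `context`.

    Walks dicts and lists recursively; a reference is resolved by following
    its dot-separated path through nested dicts. Strings that merely contain
    a reference, and any other VDP features, are not handled.
    """
    if isinstance(value, dict):
        return {key: resolve(item, context) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve(item, context) for item in value]
    if isinstance(value, str):
        match = REFERENCE.fullmatch(value)
        if match:
            node = context
            for part in match.group(1).split("."):
                node = node[part]
            return node
    return value
```

For example, resolving the `input` of `artifact-0` against `{"variable": {"catalog_name": ..., "namespace": ..., "question": ...}}` fills in the three variables and leaves `top-k: 5` untouched.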

Recipe for syncing files from a Google Drive folder to an Instill Catalog.


```yaml
# VDP Version
version: v1beta
variable:
  namespace:
    title: Namespace
    format: string
  catalog:
    title: Catalog
    format: string
  folder-link:
    title: Folder Link
    format: string
component:
  read-folder:
    type: google-drive
    input:
      shared-link: ${variable.folder-link}
      read-content: true
    setup:
      refresh-token: ${secret.refresh-token-gd}
    task: TASK_READ_FOLDER
  sync:
    type: instill-artifact
    input:
      namespace: ${variable.namespace}
      catalog-id: ${variable.catalog}
      third-party-files: ${read-folder.output.files}
    task: TASK_SYNC_FILES
output:
  sync-result:
    title: Sync Result
    value: ${sync.output}
```