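The Instill Artifact component is a data component that lets users manipulate files and data in the artifact store and run smart searches over them.
It can carry out the following tasks:
- Upload File
- Upload Files
- Get Files Metadata
- Get Chunks Metadata
- Get File in Markdown
- Match File Status
- Retrieve
- Ask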
To use the Artifact component with a self-hosted deployment of Instill Core, you need to set up an OpenAI API key.
You can do this by setting the OPENAI_API_KEY environment variable.
Please refer to configuring-the-embedding-feature for details.
Note: on Instill Cloud, you do not need to set up the OpenAI API key.
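As a minimal sketch of the requirement above, the following check reports whether OPENAI_API_KEY is set before a self-hosted deployment is started. The helper name `embedding_key_configured` is hypothetical and not part of Instill Core; only the variable name comes from this page.

```python
import os


def embedding_key_configured(env=None):
    """Return True if OPENAI_API_KEY is set to a non-empty value.

    `env` defaults to the process environment; passing a dict makes the
    check easy to exercise without touching real settings.
    """
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


if __name__ == "__main__":
    if not embedding_key_configured():
        print("OPENAI_API_KEY is not set; the embedding feature will not work.")
```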
#Release Stage
Alpha
#Configuration
The component definition and tasks are defined in the definition.json and tasks.json files respectively.
#Supported Tasks
#Upload File
Upload a file to a catalog and process it into chunks.
Input | ID | Type | Description |
--- | --- | --- | --- |
Task ID (required) | task | string | TASK_UPLOAD_FILE |
Options (required) | options | object | Choose whether to upload the file to an existing catalog or to create a new catalog. |
The options object must fulfill one of the following schemas:
Existing Catalog
Field | Field ID | Type | Note |
--- | --- | --- | --- |
Catalog ID | catalog-id | string | ID of the existing catalog, as entered when the catalog was created. |
File | file | string | Base64-encoded PDF/DOCX/DOC/PPTX/PPT/HTML file to be uploaded into the catalog. |
File Name | file-name | string | Name of the file, including the extension (e.g. example.pdf). The length of this field is limited to 100 characters. |
Namespace | namespace | string | Your namespace. You can find it in the tab for switching namespaces. |
Option | option | string | Must be "existing catalog" |
Create New Catalog
Field | Field ID | Type | Note |
--- | --- | --- | --- |
Catalog ID | catalog-id | string | ID of the new catalog you want to create. |
Description | description | string | Description of the catalog. |
File | file | string | Base64-encoded PDF/DOCX/DOC/PPTX/PPT/HTML file to be uploaded into the catalog. |
File Name | file-name | string | Name of the file, including the extension (e.g. example.pdf). The length of this field is limited to 100 characters. |
Namespace | namespace | string | Your namespace. You can find it in the tab for switching namespaces. |
Option | option | string | Must be "create new catalog" |
Tags | tags | array | Tags for the catalog. |
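The two schemas above can be sketched as a small payload builder. This is an illustration under stated assumptions, not the component's own client code: the helper name `build_upload_file_input` is hypothetical, the extension check is inferred from the formats listed in the table, and where `task` and `options` sit inside an actual recipe is not specified on this page.

```python
import base64
import os

# Extensions inferred from the formats listed in the table (an assumption).
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".doc", ".pptx", ".ppt", ".html"}
MAX_FILE_NAME_LENGTH = 100


def build_upload_file_input(namespace, catalog_id, file_name, content,
                            new_catalog=False, description="", tags=None):
    """Build a TASK_UPLOAD_FILE input dict from raw file bytes (hypothetical helper)."""
    if len(file_name) > MAX_FILE_NAME_LENGTH:
        raise ValueError("file-name is limited to 100 characters")
    extension = os.path.splitext(file_name)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {extension or 'none'}")

    options = {
        "option": "create new catalog" if new_catalog else "existing catalog",
        "namespace": namespace,
        "catalog-id": catalog_id,
        "file-name": file_name,
        # The file field expects Base64 text, not raw bytes.
        "file": base64.b64encode(content).decode("ascii"),
    }
    if new_catalog:
        # These two fields only appear in the "Create New Catalog" schema.
        options["description"] = description
        options["tags"] = list(tags or [])
    return {"task": "TASK_UPLOAD_FILE", "options": options}
```

Validating the name length and extension locally is a design choice that surfaces mistakes before the pipeline is triggered; the component may apply its own, different checks.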
Output | ID | Type | Description |
--- | --- | --- | --- |
File | file | object | Result of uploading the file into the catalog. |
Status | status | boolean | Whether file processing was triggered; true if it succeeded. |
Output Objects in Upload File
File
Field | Field ID | Type | Note |
--- | --- | --- | --- |
Catalog ID | catalog-id | string | ID of the catalog the file was uploaded to. |
Create Time | create-time | string | Creation time of the file in ISO 8601 format. |
File Name | file-name | string | Name of the file. |
Type | file-type | string | Type of the file. |
File UID | file-uid | string | Unique identifier of the file. |
Size | size | number | Size of the file in bytes. |
Update Time | update-time | string | Update time of the file in ISO 8601 format. |
#Upload Files
Upload multiple files to a catalog and process them into chunks.
Input | ID | Type | Description |
--- | --- | --- | --- |
Task ID (required) | task | string | TASK_UPLOAD_FILES |
Options (required) | options | object | Choose whether to upload the files to an existing catalog or to create a new catalog. |
The options object must fulfill one of the following schemas:
Existing Catalog
Field | Field ID | Type | Note |
--- | --- | --- | --- |
Catalog ID | catalog-id | string | ID of the existing catalog, as entered when the catalog was created. |
File Names | file-names | array | Names of the files, including the extension (e.g. example.pdf). The length of this field is limited to 100 characters. |
Files | files | array | Base64-encoded PDF/DOCX/DOC/PPTX/PPT/HTML files to be uploaded into the catalog. |
Namespace | namespace | string | Your namespace. You can find it in the tab for switching namespaces. |
Option | option | string | Must be "existing catalog" |
Create New Catalog
Field | Field ID | Type | Note |
--- | --- | --- | --- |
Catalog ID | catalog-id | string | ID of the new catalog you want to create. |
Description | description | string | Description of the catalog. |
File Names | file-names | array | Names of the files, including the extension (e.g. example.pdf). The length of this field is limited to 100 characters. |
Files | files | array | Base64-encoded PDF/DOCX/DOC/PPTX/PPT/HTML files to be uploaded into the catalog. |
Namespace | namespace | string | Your namespace. You can find it in the tab for switching namespaces. |
Option | option | string | Must be "create new catalog" |
Tags | tags | array | Tags for the catalog. |
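For the multi-file variant, the files and file-names arrays have to be built together. The sketch below assumes that the two arrays are matched by position and that the 100-character limit applies to each name; neither point is stated explicitly on this page. The helper name `build_upload_files_input` is hypothetical, and only the "existing catalog" schema is shown.

```python
import base64


def build_upload_files_input(namespace, catalog_id, files):
    """Build a TASK_UPLOAD_FILES input dict (hypothetical helper).

    `files` is a list of (file_name, content_bytes) pairs. Names and
    encoded contents are emitted in the same order, on the assumption
    that the component pairs them by index.
    """
    too_long = [name for name, _ in files if len(name) > 100]
    if too_long:
        raise ValueError(f"file names over 100 characters: {too_long}")

    return {
        "task": "TASK_UPLOAD_FILES",
        "options": {
            "option": "existing catalog",
            "namespace": namespace,
            "catalog-id": catalog_id,
            "file-names": [name for name, _ in files],
            "files": [base64.b64encode(content).decode("ascii")
                      for _, content in files],
        },
    }
```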
Output | ID | Type | Description |
--- | --- | --- | --- |
Files | files | array[object] | Metadata of the files in the catalog. |
Status | status | boolean | Whether file processing was triggered; true only if it succeeded for all files. |
Output Objects in Upload Files
Files
Field | Field ID | Type | Note |
--- | --- | --- | --- |
Catalog ID | catalog-id | string | ID of the catalog the files were uploaded to. |
Create Time | create-time | string | Creation time of the file in ISO 8601 format. |
File Name | file-name | string | Name of the file. |
Type | file-type | string | Type of the file. |
File UID | file-uid | string | Unique identifier of the file. |
Size | size | number | Size of the file in bytes. |
Update Time | update-time | string | Update time of the file in ISO 8601 format. |
#Get Files Metadata
Get the metadata of the files in the catalog.
Input | ID | Type | Description |
--- | --- | --- | --- |
Task ID (required) | task | string | TASK_GET_FILES_METADATA |
Namespace (required) | namespace | string | Your namespace. You can find it in the tab for switching namespaces. |
Catalog ID (required) | catalog-id | string | ID of the catalog whose files you want to list. |
Output | ID | Type | Description |
--- | --- | --- | --- |
Files | files | array[object] | Metadata of the files in the catalog. |
Output Objects in Get Files Metadata
Field | Field ID | Type | Note |
--- | --- | --- | --- |
Catalog ID | catalog-id | string | ID of the catalog the file was uploaded to. |
Create Time | create-time | string | Creation time of the file in ISO 8601 format. |
File Name | file-name | string | Name of the file. |
Type | file-type | string | Type of the file. |
File UID | file-uid | string | Unique identifier of the file. |
Size | size | number | Size of the file in bytes. |
Update Time | update-time | string | Update time of the file in ISO 8601 format. |
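To illustrate how the files output might be consumed, the sketch below totals the size field and picks the most recently updated file. The sample timestamp shape (a trailing `Z`) is an assumption; this page only says the values are in ISO 8601 format. The helper names are hypothetical.

```python
from datetime import datetime


def parse_iso8601(value):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    # datetime.fromisoformat only accepts 'Z' from Python 3.11 onwards,
    # so it is normalised to an explicit offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_files(files):
    """Summarise a list of file metadata objects as described in the table."""
    latest = max(files, key=lambda f: parse_iso8601(f["update-time"]), default=None)
    return {
        "count": len(files),
        "total-bytes": sum(f["size"] for f in files),
        "latest-file": latest["file-name"] if latest else None,
    }
```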
#Get Chunks Metadata
Get the metadata of the chunks from a file in the catalog.
Input | ID | Type | Description |
--- | --- | --- | --- |
Task ID (required) | task | string | TASK_GET_CHUNKS_METADATA |
Catalog ID (required) | catalog-id | string | ID of the catalog that contains the file. |
Namespace (required) | namespace | string | Your namespace. You can find it in the tab for switching namespaces. |
File UID (required) | file-uid | string | The unique identifier of the file. |
Output | ID | Type | Description |
--- | --- | --- | --- |
Chunks | chunks | array[object] | Metadata of the file's chunks in the catalog. |
Output Objects in Get Chunks Metadata
Field | Field ID | Type | Note |
--- | --- | --- | --- |
Chunk UID | chunk-uid | string | The unique identifier of the chunk. |
Create Time | create-time | string | The creation time of the chunk in ISO 8601 format. |
End Position | end-position | integer | The end position of the chunk in the file. |
File UID | original-file-uid | string | The unique identifier of the file. |
Retrievable | retrievable | boolean | The retrievable status of the chunk. |
Start Position | start-position | integer | The start position of the chunk in the file. |
Token Count | token-count | integer | The token count of the chunk. |
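One way to use the chunk metadata is to estimate how many tokens of a file are available to search. The sketch below sums token-count over chunks whose retrievable flag is true; the helper name `retrievable_token_count` is hypothetical, and it assumes the output has the field IDs listed in the table.

```python
def retrievable_token_count(chunks):
    """Sum token-count over the chunks that are marked retrievable."""
    return sum(chunk["token-count"] for chunk in chunks if chunk["retrievable"])
```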
#Get File in Markdown
Get the file content in Markdown format.
Input | ID | Type | Description |
--- | --- | --- | --- |
Task ID (required) | task | string | TASK_GET_FILE_IN_MARKDOWN |
Catalog ID (required) | catalog-id | string | ID of the catalog that contains the file. |
Namespace (required) | namespace | string | Your namespace. You can find it in the tab for switching namespaces. |
File UID (required) | file-uid | string | The unique identifier of the file. |
Output | ID | Type | Description |
--- | --- | --- | --- |
File UID | original-file-uid | string | The unique identifier of the file. |
Content | content | string | The content of the file in Markdown format. |
Create Time | create-time | string | The creation time of the source file in ISO 8601 format. |
Update Time | update-time | string | The update time of the source file in ISO 8601 format. |
#Match File Status
Check if the specified file's processing status is done.
Input | ID | Type | Description |
--- | --- | --- | --- |
Task ID (required) | task | string | TASK_MATCH_FILE_STATUS |
Catalog ID (required) | catalog-id | string | ID of the catalog that contains the file whose processing status you want to check. |
Namespace (required) | namespace | string | Your namespace. You can find it in the tab for switching namespaces. |
File UID (required) | file-uid | string | The unique identifier of the file. |
Output | ID | Type | Description |
--- | --- | --- | --- |
Status | succeeded | boolean | The status of the file processing; true if it succeeded. |
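Because the succeeded output is a single boolean, a caller that needs to wait for processing could poll it. The sketch below is one possible pattern, not part of the component: `check_succeeded` is a hypothetical callable that runs the Match File Status task however you trigger pipelines and returns the succeeded value, and the timeout and interval defaults are arbitrary.

```python
import time


def wait_until_processed(check_succeeded, timeout_s=300.0, interval_s=5.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll `check_succeeded` until it returns True or the timeout elapses.

    Returns True if processing finished in time, False otherwise.
    `sleep` and `clock` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout_s
    while True:
        if check_succeeded():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Checking before sleeping means a file that is already processed returns immediately, and the deadline is only compared after a failed check so at least one attempt is always made.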
#Retrieve
Search the chunks in the catalog.
Input | ID | Type | Description |
--- | --- | --- | --- |
Task ID (required) | task | string | TASK_RETRIEVE |
Catalog ID (required) | catalog-id | string | ID of the catalog to search. |
Namespace (required) | namespace | string | Your namespace. You can find it in the tab for switching namespaces. |
Text Prompt (required) | text-prompt | string | The prompt string used to search the chunks. |
Top K | top-k | integer | The number of top chunks to return, from 1 to 20. The default is 5. |
Output | ID | Type | Description |
--- | --- | --- | --- |
Chunks | chunks | array[object] | Chunks data from smart search. |
Output Objects in Retrieve
Chunks
Field | Field ID | Type | Note |
--- | --- | --- | --- |
Chunk UID | chunk-uid | string | The unique identifier of the chunk. |
Similarity | similarity-score | number | The similarity score of the chunk. |
Source File Name | source-file-name | string | The name of the source file. |
Text Content | text-content | string | The text content of the chunk. |
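The Retrieve input and output can be sketched as follows. The builder enforces the documented top-k range of 1 to 20 with a default of 5; the filter assumes that a higher similarity-score means a closer match, which this page does not state. Both helper names are hypothetical.

```python
def build_retrieve_input(namespace, catalog_id, text_prompt, top_k=5):
    """Build a TASK_RETRIEVE input dict, validating the documented top-k range."""
    if not 1 <= top_k <= 20:
        raise ValueError("top-k must be between 1 and 20")
    return {
        "task": "TASK_RETRIEVE",
        "namespace": namespace,
        "catalog-id": catalog_id,
        "text-prompt": text_prompt,
        "top-k": top_k,
    }


def best_chunks(chunks, min_score):
    """Keep chunks scoring at least `min_score`, highest score first.

    Assumes a larger similarity-score indicates a better match.
    """
    kept = [c for c in chunks if c["similarity-score"] >= min_score]
    return sorted(kept, key=lambda c: c["similarity-score"], reverse=True)
```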
#Ask
Answer questions based on the files in the catalog.
Input | ID | Type | Description |
--- | --- | --- | --- |
Task ID (required) | task | string | TASK_ASK |
Catalog ID (required) | catalog-id | string | ID of the catalog to search. |
Namespace (required) | namespace | string | Your namespace. You can find it in the tab for switching namespaces. |
Question (required) | question | string | The question to answer. |
Top K | top-k | integer | The number of top answers to return, from 1 to 20. The default is 5. |
Output | ID | Type | Description |
--- | --- | --- | --- |
Answer | answer | string | Answers data from smart search. |
Chunks (optional) | chunks | array[object] | Chunks data used to answer the question. |
Output Objects in Ask
Chunks
Field | Field ID | Type | Note |
--- | --- | --- | --- |
Chunk UID | chunk-uid | string | The unique identifier of the chunk. |
Similarity | similarity-score | number | The similarity score of the chunk. |
Source File Name | source-file-name | string | The name of the source file. |
Text Content | text-content | string | The text content of the chunk. |
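Since the Ask output pairs an answer with optional chunks, a caller might want to show which files the answer drew on. The sketch below is one illustrative way to do that; the helper name `format_answer` and the output layout are hypothetical, and only the field IDs come from the tables above.

```python
def format_answer(output):
    """Append the distinct source file names to the answer, if chunks are present."""
    # chunks is optional in the Ask output, so default to an empty list.
    sources = sorted({c["source-file-name"] for c in output.get("chunks", [])})
    if not sources:
        return output["answer"]
    return output["answer"] + "\n\nSources: " + ", ".join(sources)
```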
#Example Recipes
Recipe for the Ask your Catalog pipeline.
catalog-id: ${variable.catalog_name}
namespace: ${variable.namespace}
question: ${variable.question}
description: The name of your catalog i.e. "instill-ai"
description: The namespace of your catalog i.e. "instill-ai"
description: The question to ask your catalog i.e. "What is Instill AI doing?", "What is Artifact?"
value: ${artifact-0.output.answer}
Sync files from Google Drive to Instill Catalog.
shared-link: ${variable.folder-link}
refresh-token: ${secret.refresh-token-gd}
namespace: ${variable.namespace}
catalog-id: ${variable.catalog}
third-party-files: ${read-folder.output.files}