Chunk Search

This page shows how to run a semantic search over the chunks in a Catalog using a text prompt.

This API performs semantic search by embedding user queries with the same Indexing Embed VDP Pipeline that is used by the Process Files operation. The user query embedding is then compared to the embeddings of the chunks in the specified Catalog to find and return the most contextually similar chunks.
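The ranking step described above can be illustrated with a small sketch. It assumes cosine similarity as the comparison metric and uses hand-written two-dimensional vectors in place of real embeddings. The source does not state which metric the service uses, and the function names `cosine_similarity` and `top_k_chunks` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k_chunks(query_embedding, chunk_embeddings, top_k):
    """Score every chunk against the query and keep the top_k best matches."""
    scored = [
        (chunk_uid, cosine_similarity(query_embedding, embedding))
        for chunk_uid, embedding in chunk_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for real embeddings (illustrative values only).
chunks = {
    "chunk-a": [1.0, 0.0],
    "chunk-b": [0.6, 0.8],
    "chunk-c": [0.0, 1.0],
}
query = [1.0, 0.0]

ranked = top_k_chunks(query, chunks, top_k=2)
```

Here `ranked` lists `chunk-a` first and `chunk-b` second, because their vectors point closest to the query vector. In the real service, the embeddings come from the Indexing Embed VDP Pipeline rather than from hand-written values.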

#Chunk Search via API

cURL

export INSTILL_API_TOKEN=********
curl -X POST 'https://api.instill.tech/v1alpha/namespaces/{namespaceId}/catalogs/{catalogId}/chunks/retrieve' \
  --header "Authorization: Bearer $INSTILL_API_TOKEN" \
  --header "Content-Type: application/json" \
  --data-raw '{
    "textPrompt": "example text to search",
    "topK": 5
  }'

Replace the {namespaceId} and {catalogId} path parameters with the Catalog owner's ID (namespace) and the ID of the Catalog you are searching, respectively.
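The same request can be sketched in Python using only the standard library. The endpoint, headers, and the `textPrompt` and `topK` body fields are taken from the cURL example. The helper names `build_chunk_search_request` and `search_chunks` are hypothetical and not part of any official SDK.

```python
import json
import os
import urllib.parse
import urllib.request

API_BASE = "https://api.instill.tech/v1alpha"


def build_chunk_search_request(namespace_id, catalog_id, text_prompt, top_k, api_token):
    """Build (but do not send) the POST request for a chunk search."""
    # Percent-encode the IDs so unusual characters cannot break the path.
    namespace = urllib.parse.quote(namespace_id, safe="")
    catalog = urllib.parse.quote(catalog_id, safe="")
    url = f"{API_BASE}/namespaces/{namespace}/catalogs/{catalog}/chunks/retrieve"
    body = json.dumps({"textPrompt": text_prompt, "topK": top_k}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def search_chunks(namespace_id, catalog_id, text_prompt, top_k=5, api_token=None):
    """Send the request and return the decoded JSON response as a dict."""
    # Falls back to the INSTILL_API_TOKEN environment variable, as in the cURL example.
    token = api_token or os.environ["INSTILL_API_TOKEN"]
    request = build_chunk_search_request(
        namespace_id, catalog_id, text_prompt, top_k, token
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With `INSTILL_API_TOKEN` set, a call such as `search_chunks("my-namespace", "my-catalog", "example text to search")` would send the request; both IDs here are placeholders. Keeping request construction separate from sending makes the URL and payload easy to inspect without a network call.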

#Body Parameters

  • textPrompt (body): The text prompt to search for in the Catalog.
  • topK (body): Specifies the number of similar chunks to return.

#Example Response

A successful response returns a list of similar chunks found in the Catalog:


{
  "similarChunks": [
    {
      "chunkUid": "chunk123",
      "similarityScore": 0.95,
      "textContent": "similar text found",
      "sourceFile": "file123.txt"
    },
    {
      "chunkUid": "chunk124",
      "similarityScore": 0.90,
      "textContent": "another similar text found",
      "sourceFile": "file124.pdf"
    }
  ]
}

#Output Description

  • similarChunks: An array of objects where each object represents a similar chunk found in the Catalog.
    • chunkUid (string): The unique identifier of the chunk.
    • similarityScore (number): The similarity score between the input text prompt and the chunk content. The score ranges from 0 to 1; a higher score indicates greater similarity.
    • textContent (string): The content of the similar chunk.
    • sourceFile (string): The unique identifier of the source file from which the chunk was extracted.
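As a sketch of how these fields might be consumed, the snippet below keeps only the chunks at or above a minimum similarity score and prints them from best to worst. The function name `filter_chunks` and the 0.92 threshold are hypothetical choices for illustration. The response dictionary mirrors the example response on this page.

```python
def filter_chunks(response, min_score):
    """Return chunks scoring at least min_score, highest score first."""
    chunks = response.get("similarChunks", [])
    kept = [chunk for chunk in chunks if chunk["similarityScore"] >= min_score]
    return sorted(kept, key=lambda chunk: chunk["similarityScore"], reverse=True)


# Mirrors the example response shown above.
example_response = {
    "similarChunks": [
        {
            "chunkUid": "chunk123",
            "similarityScore": 0.95,
            "textContent": "similar text found",
            "sourceFile": "file123.txt",
        },
        {
            "chunkUid": "chunk124",
            "similarityScore": 0.90,
            "textContent": "another similar text found",
            "sourceFile": "file124.pdf",
        },
    ]
}

# 0.92 is an arbitrary threshold chosen for this illustration.
relevant = filter_chunks(example_response, min_score=0.92)
for chunk in relevant:
    print(
        f'{chunk["chunkUid"]} ({chunk["similarityScore"]:.2f}) '
        f'from {chunk["sourceFile"]}: {chunk["textContent"]}'
    )
# prints: chunk123 (0.95) from file123.txt: similar text found
```

A suitable threshold depends on the data in the Catalog, so it is worth tuning against real queries.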