Text

The Text component is an operator that extracts and manipulates text from different sources. It supports two tasks: Convert To Text and Split By Token.

#Release Stage

Alpha

#Configuration

The component configuration is defined and maintained here.

#Supported Tasks

#Convert To Text

Convert document to text.

| Input | ID | Type | Description |
| --- | --- | --- | --- |
| Task ID (required) | `task` | string | `TASK_CONVERT_TO_TEXT` |
| Document (required) | `doc` | string | Base64 encoded document (PDF, DOC, DOCX, XML, HTML, RTF, etc.) to be converted to plain text |

| Output | ID | Type | Description |
| --- | --- | --- | --- |
| Body | `body` | string | Plain text converted from the document |
| Meta | `meta` | object | Metadata extracted from the document |
| MSecs | `msecs` | number | Time taken to convert the document |
| Error | `error` | string | Error message if any during the conversion process |
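
Because the `doc` input must be a Base64 encoded string, the document bytes need to be encoded before they are passed to the task. The following is a minimal sketch of preparing that input in Python. The helper name `build_convert_to_text_input` and the sample HTML are illustrative assumptions, not part of the component. Only the `task` and `doc` field IDs come from the inputs listed for this task.

```python
import base64
import json


def build_convert_to_text_input(document_bytes: bytes) -> dict:
    """Build an input object for TASK_CONVERT_TO_TEXT (hypothetical helper)."""
    return {
        "task": "TASK_CONVERT_TO_TEXT",
        # The `doc` field carries the raw document as a Base64 string.
        "doc": base64.b64encode(document_bytes).decode("ascii"),
    }


# A tiny in-memory HTML document stands in for a real file here.
sample_html = b"<html><body><p>Hello, world.</p></body></html>"
convert_input = build_convert_to_text_input(sample_html)

# Print the input object as JSON for inspection.
print(json.dumps(convert_input, indent=2))
```

For a file on disk, the same helper can be fed the result of reading the file in binary mode, for example `open(path, "rb").read()`.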

#Split By Token

Split text by token.

| Input | ID | Type | Description |
| --- | --- | --- | --- |
| Task ID (required) | `task` | string | `TASK_SPLIT_BY_TOKEN` |
| Text (required) | `text` | string | Text to be split |
| Model (required) | `model` | string | ID of the model to use for tokenization |
| Chunk Token Size | `chunk_token_size` | integer | Number of tokens per text chunk |

| Output | ID | Type | Description |
| --- | --- | --- | --- |
| Token Count | `token_count` | integer | Total count of tokens in the input text |
| Text Chunks | `text_chunks` | array[string] | Text chunks after splitting |
| Number of Text Chunks | `chunk_num` | integer | Total number of output text chunks |
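
To show how the three outputs relate to each other, here is a rough sketch of token-based chunking in Python. It is not the component's implementation. As a loudly stated assumption, it splits on whitespace instead of using the tokenizer of the model named in `model`, so real token counts will differ. The function name `split_by_token` is hypothetical, and only the output field IDs come from this task's description.

```python
def split_by_token(text: str, chunk_token_size: int) -> dict:
    """Illustrative stand-in: whitespace tokens, not a model tokenizer."""
    if chunk_token_size <= 0:
        raise ValueError("chunk_token_size must be a positive integer")

    # Assumption: one whitespace-separated word counts as one token.
    tokens = text.split()

    # Group consecutive tokens into chunks of at most chunk_token_size.
    text_chunks = [
        " ".join(tokens[i:i + chunk_token_size])
        for i in range(0, len(tokens), chunk_token_size)
    ]

    return {
        "token_count": len(tokens),
        "text_chunks": text_chunks,
        "chunk_num": len(text_chunks),
    }


result = split_by_token("the quick brown fox jumps over the lazy dog", 4)
print(result)
```

In this sketch the last chunk may hold fewer than `chunk_token_size` tokens, and `chunk_num` always equals the length of `text_chunks`.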

Last updated: 4/5/2024, 1:22:27 PM