The Audio component is an operator component that lets users process audio data. It can carry out the following tasks: Detect Activity and Segment.
#Release Stage
Alpha
#Configuration
The component definition and tasks are defined in the definition.json and tasks.json files respectively.
#Supported Tasks
#Detect Activity
Detect speech segments in audio data using Voice Activity Detection (VAD). This task converts the input audio to 16 kHz mono format, identifies periods of human speech, and outputs a time segment for each detected period of speech activity.
Input | ID | Type | Description |
---|---|---|---|
Task ID (required) | task | string | TASK_DETECT_ACTIVITY |
Audio (required) | audio | string | Audio file to analyze for speech activity. |
Minimum Silence Duration | min-silence-duration | integer | Minimum duration of silence (in milliseconds) required to split speech segments. Longer values result in fewer, longer segments. |
Speech Pad | speech-pad | integer | Additional padding (in milliseconds) added to the start and end of each detected speech segment to prevent cutting off speech. |
Output | ID | Type | Description |
---|---|---|---|
Segments | segments | array[object] | Array of time segments representing detected speech activity. Each segment contains start and end times in seconds. |
Output Objects in Detect Activity
Segments
Field | Field ID | Type | Note |
---|---|---|---|
End Time | end-time | number | The number of seconds from the beginning of the audio file to the end of this segment. |
Start Time | start-time | number | The number of seconds from the beginning of the audio file to the start of this segment. |
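To make the effect of `min-silence-duration` and `speech-pad` concrete, here is a minimal Python sketch. It is not the component's implementation: it assumes some VAD model has already produced raw speech intervals in milliseconds, and the helper name `postprocess_segments` and its parameters are hypothetical. It merges intervals separated by less than the minimum silence, pads each result, and returns objects using the `start-time` and `end-time` fields in seconds.

```python
def postprocess_segments(raw_ms, min_silence_ms, speech_pad_ms, audio_len_ms):
    """Merge and pad raw speech intervals (hypothetical helper, illustration only).

    raw_ms: sorted list of (start_ms, end_ms) tuples from some VAD model.
    """
    merged = []
    for start, end in raw_ms:
        # A gap shorter than min_silence_ms is not enough to split segments.
        if merged and start - merged[-1][1] < min_silence_ms:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    segments = []
    for start, end in merged:
        # Pad both sides, clamped to the bounds of the audio.
        padded_start = max(0, start - speech_pad_ms)
        padded_end = min(audio_len_ms, end + speech_pad_ms)
        segments.append({
            "start-time": padded_start / 1000,
            "end-time": padded_end / 1000,
        })
    return segments


raw = [(500, 1200), (1350, 2000), (3000, 3600)]
print(postprocess_segments(raw, min_silence_ms=300, speech_pad_ms=10, audio_len_ms=4000))
# [{'start-time': 0.49, 'end-time': 2.01}, {'start-time': 2.99, 'end-time': 3.61}]
```

In this example the first two intervals are only 150 ms apart, so a minimum silence of 300 ms joins them into one segment. Raising `min_silence_ms` would merge more intervals and produce fewer, longer segments.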
#Segment
Segment audio data into pieces based on the provided time segments.
Input | ID | Type | Description |
---|---|---|---|
Task ID (required) | task | string | TASK_SEGMENT |
Audio (required) | audio | string | Audio data to segment. |
Segments (required) | segments | array[object] | A list of time segments of audio data. |
Input Objects in Segment
Segments
A list of time segments of audio data.
Field | Field ID | Type | Note |
---|---|---|---|
End Time | end-time | number | The number of seconds from the beginning of the audio file to the end of this segment. |
Start Time | start-time | number | The number of seconds from the beginning of the audio file to the start of this segment. |
Output | ID | Type | Description |
---|---|---|---|
Audios | audio-segments | array[string] | A list of segmented audio data. |
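The idea behind segmenting can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the component's code: the component takes and returns audio as strings, whereas this sketch works on an already decoded list of mono samples, and the helper name `slice_audio` and the 16 kHz sample rate are assumptions.

```python
def slice_audio(samples, sample_rate, segments):
    """Cut a mono sample sequence into pieces (hypothetical helper, illustration only)."""
    pieces = []
    for seg in segments:
        # Convert seconds to sample indices, clamped to the audio bounds.
        start = max(0, round(seg["start-time"] * sample_rate))
        end = min(len(samples), round(seg["end-time"] * sample_rate))
        if end <= start:
            raise ValueError(f"empty or inverted segment: {seg}")
        pieces.append(samples[start:end])
    return pieces


sample_rate = 16000  # assumed sample rate for this sketch
samples = list(range(sample_rate * 4))  # 4 seconds of placeholder samples
segments = [
    {"start-time": 0.49, "end-time": 2.01},
    {"start-time": 2.99, "end-time": 3.61},
]
pieces = slice_audio(samples, sample_rate, segments)
print([len(p) for p in pieces])  # [24320, 9920]
```

Each output piece corresponds to one input segment, in the same order, which mirrors how `audio-segments` lines up with the `segments` input.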
#Example Recipes
Recipe for the Audio Transcription Generator pipeline.
version: v1beta
component:
  audio-vad:
    type: audio
    input:
      audio: ${variable.audio}
      min-silence-duration: 300
      speech-pad: 10
    task: TASK_DETECT_ACTIVITY
  audio-segment:
    type: audio
    input:
      audio: ${variable.audio}
      segments: ${audio-vad.output.segments}
    task: TASK_SEGMENT
variable:
  audio:
    title: Audio to test
    description: Audio to test VAD and extraction
    format: audio
output:
  samples:
    title: Output audio segments
    description: Output extracted audio segments
    value: ${audio-segment.output.audio-segments}