Audio

The Audio component is an operator component that lets users process audio data. It can carry out the following tasks:

- Detect Activity
- Segment

#Release Stage

Alpha

#Configuration

The component definition and tasks are defined in the definition.json and tasks.json files respectively.

#Supported Tasks

#Detect Activity

Detect speech segments in audio data using Voice Activity Detection (VAD). This task converts the input audio to 16 kHz mono format, identifies periods of human speech, and outputs one time segment for each detected period of speech.
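
The conversion to 16 kHz mono mentioned above can be pictured with a minimal sketch. This is not the component's implementation: the function names `to_mono` and `resample_linear` are hypothetical, and a production resampler would normally apply a low-pass filter before downsampling, which is omitted here for brevity.

```python
from typing import List, Sequence


def to_mono(frames: Sequence[Sequence[float]]) -> List[float]:
    """Downmix multi-channel frames to mono by averaging the channels.

    Hypothetical helper: each frame holds one sample per channel.
    """
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(
    samples: Sequence[float], src_rate: int, dst_rate: int = 16000
) -> List[float]:
    """Resample with plain linear interpolation (no anti-aliasing filter)."""
    if not samples:
        return []
    n_out = int(len(samples) * dst_rate / src_rate)
    out: List[float] = []
    for i in range(n_out):
        # Position of output sample i on the input sample grid.
        pos = i * src_rate / dst_rate
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out
```

For stereo input at 44.1 kHz, for instance, one would call `to_mono` first and then `resample_linear(mono, 44100)`.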

| Input | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Task ID (required) | task | string | TASK_DETECT_ACTIVITY |
| Audio (required) | audio | string | Audio file to analyze for speech activity. |
| Minimum Silence Duration | min-silence-duration | integer | Minimum duration of silence (in milliseconds) required to split speech segments. Longer values result in fewer, longer segments. |
| Speech Pad | speech-pad | integer | Additional padding (in milliseconds) added to the start and end of each detected speech segment to prevent cutting off speech. |

| Output | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Segments | segments | array[object] | Array of time segments representing detected speech activity. Each segment contains start and end times in seconds. |

Output Objects in Detect Activity

Segments

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| End Time | end-time | number | The number of seconds from the beginning of the audio file to the end of this segment. |
| Start Time | start-time | number | The number of seconds from the beginning of the audio file to the start of this segment. |
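
To make the effect of min-silence-duration and speech-pad concrete, here is a minimal sketch of how such parameters are commonly applied to raw VAD detections. It is an illustration under assumptions, not the component's actual code: the function `postprocess_segments` and its `audio_duration` argument are hypothetical, and the raw `(start, end)` pairs stand in for whatever the underlying VAD model emits.

```python
from typing import Dict, List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def postprocess_segments(
    raw: List[Segment],
    min_silence_ms: int,
    speech_pad_ms: int,
    audio_duration: float,
) -> List[Dict[str, float]]:
    """Merge detections split by short silences, then pad each segment."""
    min_silence = min_silence_ms / 1000.0
    pad = speech_pad_ms / 1000.0

    merged: List[List[float]] = []
    for start, end in sorted(raw):
        # A gap shorter than the minimum silence does not split speech.
        if merged and start - merged[-1][1] < min_silence:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    # Pad both ends, clamped to the bounds of the audio.
    return [
        {
            "start-time": max(0.0, start - pad),
            "end-time": min(audio_duration, end + pad),
        }
        for start, end in merged
    ]
```

With a 300 ms minimum silence, detections at 0.5 to 1.0 s and 1.1 to 2.0 s are merged because the 100 ms gap between them is too short to count as a split, which is why longer values yield fewer, longer segments.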

#Segment

Segment audio data into pieces based on the provided time segments.

| Input | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Task ID (required) | task | string | TASK_SEGMENT |
| Audio (required) | audio | string | Audio data to segment. |
| Segments (required) | segments | array[object] | A list of time segments of audio data. |

Input Objects in Segment

Segments

A list of time segments of audio data.

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| End Time | end-time | number | The number of seconds from the beginning of the audio file to the end of this segment. |
| Start Time | start-time | number | The number of seconds from the beginning of the audio file to the start of this segment. |

| Output | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Audios | audio-segments | array[string] | A list of segmented audio data. |
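
The segmentation step amounts to converting each segment's start and end times from seconds to sample indices and slicing. The sketch below illustrates this on a plain list of decoded samples. The function `slice_audio` is hypothetical, and the real task returns encoded audio strings rather than raw sample lists.

```python
from typing import Dict, List, Sequence


def slice_audio(
    samples: Sequence[float],
    sample_rate: int,
    segments: List[Dict[str, float]],
) -> List[List[float]]:
    """Cut decoded samples into one piece per time segment."""
    pieces: List[List[float]] = []
    for seg in segments:
        # Convert seconds to sample indices, clamped to the audio bounds.
        start = max(0, int(round(seg["start-time"] * sample_rate)))
        end = min(len(samples), int(round(seg["end-time"] * sample_rate)))
        pieces.append(list(samples[start:end]))
    return pieces
```

The segment dictionaries use the same `start-time` and `end-time` field IDs as the table above, so the output of Detect Activity can be passed through unchanged.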

#Example Recipes

Recipe for the Audio Transcription Generator pipeline.


version: v1beta
component:
  audio-vad:
    type: audio
    input:
      audio: ${variable.audio}
      min-silence-duration: 300
      speech-pad: 10
    task: TASK_DETECT_ACTIVITY
  audio-segment:
    type: audio
    input:
      audio: ${variable.audio}
      segments: ${audio-vad.output.segments}
    task: TASK_SEGMENT
variable:
  audio:
    title: Audio to test
    description: Audio to test VAD and extraction
    format: audio
output:
  samples:
    title: Output audio segments
    description: Output extracted audio segments
    value: ${audio-segment.output.audio-segments}