Detecting voice activity

AG and CG Series boards provide a voice activity detector that suppresses silence in the user's audio during dialogues with a voice recognition system. Because silent data is not sent to the application for automatic speech recognition (ASR) processing, host processing resources are conserved.

The voice activity detector provides the following features:

Feature: Description

Voice activity detection: Detects audio energy and triggers data transmission only when speech is present.

Pre-speech buffering: When the voice activity detector detects speech, the board runtime immediately sends the previously filled buffer to the host, which reduces clipped speech.

Voice event signaling: Sends SPEECH_BEGIN and SPEECH_END messages, along with noise and signal energy values, to the host application.

Recorded stream control: Pauses and resumes sending recorded data to the runtime, while keeping the voice activity detector algorithm active on the DSP.
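
The pre-speech buffering feature can be pictured as a small ring buffer that always holds the most recent audio and is flushed to the host when speech starts. The following is only an illustrative sketch, not the board runtime's implementation. The names pre_buffer, pre_push, pre_flush, and PRE_FRAMES are hypothetical, and a single int stands in for a frame of audio.

```c
/* Hypothetical sketch of pre-speech buffering; not the board runtime code. */

#define PRE_FRAMES 4            /* assumed number of frames kept before speech */

typedef struct {
    int frames[PRE_FRAMES];     /* one int stands in for one audio frame */
    int head;                   /* index of the oldest buffered frame */
    int count;                  /* number of frames currently buffered */
} pre_buffer;

/* Store a frame; once the buffer is full, the oldest frame is overwritten. */
void pre_push(pre_buffer *b, int frame)
{
    int tail = (b->head + b->count) % PRE_FRAMES;

    b->frames[tail] = frame;
    if (b->count < PRE_FRAMES)
        b->count++;
    else
        b->head = (b->head + 1) % PRE_FRAMES;
}

/* Copy the buffered frames, oldest first, into out and empty the buffer.
 * out must have room for PRE_FRAMES entries. Returns the number copied. */
int pre_flush(pre_buffer *b, int *out)
{
    int n = b->count;

    for (int i = 0; i < n; i++)
        out[i] = b->frames[(b->head + i) % PRE_FRAMES];
    b->head = 0;
    b->count = 0;
    return n;
}
```

In this sketch, every frame is pushed while the line is silent. When speech is detected, the flushed frames are delivered ahead of the live audio, so the first syllable is not lost.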

The voice activity detector enables host application control of voice activity detection features, including:

The voice activity detector has a fixed delta threshold that allows it to adapt to the background noise level. When the voice level is higher than the background noise level by a specified delta, the detector sends a SPEECH_BEGIN event to the application. When the voice level falls below the background noise level, the detector sends a SPEECH_END event.
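
The threshold rule described above can be sketched as a two-state detector. This is a minimal illustration under stated assumptions, not the DSP algorithm. The type and function names (vad_state, vad_step, vad_event) are hypothetical, levels are in arbitrary units, and the background noise level is supplied by the caller rather than adapted automatically as it is on the board.

```c
/* Hypothetical sketch of the delta threshold rule; not the DSP code. */

/* Event codes for this sketch only; these are not the ADI event constants. */
typedef enum { VAD_NONE = 0, VAD_SPEECH_BEGIN, VAD_SPEECH_END } vad_event;

typedef struct {
    int noise_level;   /* background noise estimate (assumed fixed here) */
    int delta;         /* fixed margin above the noise level */
    int in_speech;     /* nonzero between SPEECH_BEGIN and SPEECH_END */
} vad_state;

/* Feed the energy level of one frame; returns the event it triggers, if any. */
vad_event vad_step(vad_state *s, int level)
{
    /* Speech starts when the level exceeds noise by more than the delta. */
    if (!s->in_speech && level > s->noise_level + s->delta) {
        s->in_speech = 1;
        return VAD_SPEECH_BEGIN;
    }
    /* Speech ends when the level falls below the noise level itself. */
    if (s->in_speech && level < s->noise_level) {
        s->in_speech = 0;
        return VAD_SPEECH_END;
    }
    return VAD_NONE;
}
```

Because speech begins above the noise level plus the delta but ends only below the noise level itself, a level that hovers between the two values does not toggle the state on every frame.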

Use the voice activity detector with any ASR application that is recording with one of the following encoding formats:

Voice activity detection does not interfere with other existing capabilities such as DTMF detection and echo cancellation.

Configuring boards for voice detection

To configure the system for voice activity detection, edit the board keyword file as follows:

For these boards...   Add this DSP file...   To this keyword...
AG                    rvoice_vad.m54         DSP.C5x[x].Files[y]
CG                    rvoice_vad.f54         DSP.C5x[x].Files

For example:

DSP.C5x[1..31].Files = dtmf rvoice_vad

Configure dynamic buffer allocation on the board to prevent host underruns.

You can also configure CG boards for voice activity detection by defining a DSP resource pool and specifying rvoice_vad as the resource definition:

Resource[0].Definitions = (dtmf.det_all & rvoice_vad.rec_alaw & rvoice_vad.rec.play_alaw)

Using voice activity detection

Voice activity detection and its messaging are disabled by default. To enable voice activity detection, call adiCommandRecord while an ADI recording function (such as adiRecordAsync) is actively running. The application must receive ADIEVN_RECORD_STARTED before it calls adiCommandRecord.

You can perform the following functions using adiCommandRecord: