Detecting voice activity

AG and CG Series boards provide a voice activity detector that suppresses silence in the user's audio during dialogues with a voice recognition system. Because silent audio is not sent to the application for automatic speech recognition (ASR) processing, host processing resources are conserved.

The voice activity detector provides the following features:

Feature	Description
Voice activity detection	Detects audio energy and triggers data transmission only when speech is present.
Pre-speech buffering	When the voice activity detector detects speech, the board runtime immediately sends the previously filled buffer to the host, reducing the problem of clipped speech.
Voice event signaling	Sends SPEECH_BEGIN and SPEECH_END messages, along with noise and signal energy levels, to the host application.
Recorded stream control	Pauses and resumes sending recorded data to the runtime, while keeping the voice activity detector algorithm active on the DSP.
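
The pre-speech buffering behavior in the table can be illustrated with a short C sketch. It is not the board runtime's implementation: the names prespeech, prespeech_init, and prespeech_push are hypothetical, and the 160-byte buffer size is an assumption chosen only for the example. The sketch holds back the most recent silent buffer and, when speech is first detected, sends that buffer ahead of the current one so that the start of the utterance is not clipped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FRAME_BYTES 160  /* assumed buffer size; the real size is board-specific */

typedef struct {
    unsigned char prev[FRAME_BYTES]; /* most recent buffer that was held back */
    int have_prev;                   /* nonzero once prev holds valid audio */
    int in_speech;                   /* nonzero while speech is being sent */
} prespeech;

void prespeech_init(prespeech *p)
{
    assert(p != NULL);
    memset(p, 0, sizeof *p);
}

/* Process one filled buffer. `speech` is the detector's decision for it.
 * Copies whatever should go to the host into `out` (room for two buffers)
 * and returns the number of bytes written. */
size_t prespeech_push(prespeech *p, const unsigned char *frame, int speech,
                      unsigned char *out)
{
    size_t n = 0;
    assert(p != NULL && frame != NULL && out != NULL);

    if (!speech) {
        /* Silence: hold the buffer back instead of sending it. */
        memcpy(p->prev, frame, FRAME_BYTES);
        p->have_prev = 1;
        p->in_speech = 0;
        return 0;
    }
    if (!p->in_speech && p->have_prev) {
        /* Speech just began: send the held-back buffer first. */
        memcpy(out, p->prev, FRAME_BYTES);
        n = FRAME_BYTES;
        p->have_prev = 0;
    }
    memcpy(out + n, frame, FRAME_BYTES);
    p->in_speech = 1;
    return n + FRAME_BYTES;
}
```

In this sketch a silent buffer produces no output, the first speech buffer produces two buffers' worth of data (the held-back one plus the current one), and each later speech buffer produces one.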

Through the voice activity detector, the host application can:

Start (with new or default parameters) and stop voice activity detection.
Update voice activity detection parameters on the fly.
Enable and disable voice activity detection signaling.
Pause and resume the recorded stream from the board to the host.

The voice activity detector applies a fixed delta threshold relative to the background noise level, which lets it adapt as the noise level changes. When the voice level exceeds the background noise level by the specified delta, the detector sends a SPEECH_BEGIN event to the application. When the voice level falls below the background noise level, the detector sends a SPEECH_END event.
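
The threshold logic described above can be sketched in C as follows. This is a minimal illustration, not the DSP algorithm: the names vad_state, vad_init, and vad_update, the dB units, and the exponential tracking of the noise floor are assumptions made for the example. Only the SPEECH_BEGIN and SPEECH_END event names come from this section, and the enum values here are placeholders.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event codes for this sketch; the real message values are
 * defined by the board runtime, not here. */
typedef enum { VAD_NONE, VAD_SPEECH_BEGIN, VAD_SPEECH_END } vad_event;

typedef struct {
    double noise_db;   /* running estimate of the background noise level */
    double delta_db;   /* fixed margin above the noise level */
    double adapt;      /* assumed smoothing factor, 0 < adapt <= 1 */
    int    in_speech;  /* nonzero between SPEECH_BEGIN and SPEECH_END */
} vad_state;

void vad_init(vad_state *s, double noise_db, double delta_db, double adapt)
{
    assert(s != NULL && adapt > 0.0 && adapt <= 1.0);
    s->noise_db = noise_db;
    s->delta_db = delta_db;
    s->adapt = adapt;
    s->in_speech = 0;
}

/* Feed one frame's energy level; return the event it triggers, if any. */
vad_event vad_update(vad_state *s, double level_db)
{
    assert(s != NULL);
    if (!s->in_speech) {
        if (level_db > s->noise_db + s->delta_db) {
            s->in_speech = 1;
            return VAD_SPEECH_BEGIN;
        }
        /* Adapt the noise estimate only while no speech is present. */
        s->noise_db += s->adapt * (level_db - s->noise_db);
        return VAD_NONE;
    }
    if (level_db < s->noise_db) {
        s->in_speech = 0;
        return VAD_SPEECH_END;
    }
    return VAD_NONE;
}
```

Because the delta is added to a noise estimate that keeps moving, the same fixed margin works on both quiet and noisy lines in this sketch. Freezing the estimate during speech is a design choice of the example, made so that a long utterance does not raise the noise level it is compared against.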

Use the voice activity detector with any ASR application that is recording with one of the following encoding formats:

ADI_ENCODE_MULAW
ADI_ENCODE_ALAW
ADI_ENCODE_PCM8M16

Voice activity detection does not interfere with other existing capabilities such as DTMF detection and echo cancellation.

Configuring boards for voice detection

To configure the system for voice activity detection, edit the board keyword file as follows:

For these boards...	Add this DSP file...	To this keyword...
AG	rvoice_vad.m54	DSP.C5x[x].Files[y]
CG	rvoice_vad.f54	DSP.C5x[x].Files

For example:

DSP.C5x[1..31].Files = dtmf rvoice_vad

Configure dynamic buffer allocation on the board to prevent host underruns.

You can also configure CG boards for voice activity detection by defining a DSP resource pool and specifying rvoice_vad as the resource definition:

Resource[0].Definitions = (dtmf.det_all & rvoice_vad.rec_alaw & rvoice_vad.rec.play_alaw)

Using voice activity detection

Voice activity detection and voice activity detection messaging are disabled by default. To enable voice activity detection, call adiCommandRecord while an ADI recording function (such as adiRecordAsync) is active. The application must receive ADIEVN_RECORD_STARTED before it calls adiCommandRecord.
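
One way an application might enforce this ordering is to defer the request until the started event arrives. The C sketch below shows only that bookkeeping and makes no ADI calls: record_gate, gate_init, gate_request_vad, gate_on_event, and the DEMO_EV_* values are hypothetical names for this example. DEMO_EV_RECORD_STARTED stands in for ADIEVN_RECORD_STARTED, and DEMO_EV_RECORD_DONE stands in for whatever event ends the recording. Where a function returns 1, the application would call adiCommandRecord with the arguments given in its reference page.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for runtime events; real event codes come from the ADI headers. */
typedef enum {
    DEMO_EV_RECORD_STARTED,
    DEMO_EV_RECORD_DONE,
    DEMO_EV_OTHER
} demo_event;

typedef struct {
    int recording;    /* set once the started event has arrived */
    int vad_pending;  /* VAD was requested before recording started */
} record_gate;

void gate_init(record_gate *g)
{
    assert(g != NULL);
    g->recording = 0;
    g->vad_pending = 0;
}

/* The application wants to enable VAD.
 * Returns 1 if the command may be issued now, 0 if it was deferred. */
int gate_request_vad(record_gate *g)
{
    assert(g != NULL);
    if (g->recording)
        return 1;
    g->vad_pending = 1;
    return 0;
}

/* Call for every event from the event loop.
 * Returns 1 if a deferred VAD command should be issued now. */
int gate_on_event(record_gate *g, demo_event ev)
{
    assert(g != NULL);
    switch (ev) {
    case DEMO_EV_RECORD_STARTED:
        g->recording = 1;
        if (g->vad_pending) {
            g->vad_pending = 0;
            return 1;
        }
        return 0;
    case DEMO_EV_RECORD_DONE:
        /* Recording ended: drop any request that was never issued. */
        g->recording = 0;
        g->vad_pending = 0;
        return 0;
    default:
        return 0;
    }
}
```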

You can perform the following functions using adiCommandRecord:

Enable and disable voice activity detection
Configure voice activity detection with application parameters
Enable and disable voice activity detection messaging
Pause and resume voice streaming from the board to the host