Universal Speech Access API architecture

The Universal Speech Access API (USAI) enables you to implement Media Resource Control Protocol (MRCP) services and resources using boards within the NaturalAccess environment. Applications use USAI to stream voice data over RTP between boards and the recognizer and synthesizer engines on separate servers. Because the host processes no voice traffic, USAI frees bus bandwidth and host processing capacity on the platform. In addition, voice activity detection (VAD) and pre-speech buffers on NaturalAccess boards reduce traffic to the automatic speech recognition (ASR) engines and decrease the number of required ASR ports.
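
To illustrate why VAD combined with a pre-speech buffer reduces the traffic sent to a recognizer, the following Python sketch gates audio frames with a simple energy threshold. It is an illustration only and does not describe how NaturalAccess boards implement VAD; the function names, the threshold, and the buffer sizes are all assumptions chosen for the example.

```python
from collections import deque


def frame_energy(frame):
    """Mean squared amplitude of one frame of PCM samples."""
    return sum(s * s for s in frame) / len(frame)


def gate_frames(frames, threshold=1000.0, pre_speech_frames=2, hangover_frames=1):
    """Return only the frames that would be forwarded to the recognizer.

    Hypothetical energy-based gate (not the board's algorithm):
    - silent frames are held in a small pre-speech buffer, not sent;
    - when speech starts, the buffered frames are sent first so the
      recognizer does not lose the onset of the utterance;
    - up to hangover_frames trailing silent frames are still sent
      before the gate closes again.
    """
    pre_buffer = deque(maxlen=pre_speech_frames)
    forwarded = []
    silent_run = 0
    in_speech = False

    for frame in frames:
        active = frame_energy(frame) >= threshold
        if in_speech:
            silent_run = 0 if active else silent_run + 1
            if silent_run > hangover_frames:
                # Too much trailing silence: close the gate and start buffering.
                in_speech = False
                silent_run = 0
                pre_buffer.append(frame)
            else:
                forwarded.append(frame)
        elif active:
            # Speech onset: flush the pre-speech buffer, then the frame itself.
            forwarded.extend(pre_buffer)
            pre_buffer.clear()
            forwarded.append(frame)
            in_speech = True
        else:
            pre_buffer.append(frame)
    return forwarded


silence = [0] * 80
speech = [100] * 80
frames = [silence] * 10 + [speech] * 3 + [silence] * 10
forwarded = gate_frames(frames)
```

With this input and the default parameters, only 6 of the 23 frames are forwarded: two buffered pre-speech frames, three speech frames, and one trailing frame. The remaining silence never reaches the recognizer.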

NaturalAccess provides APIs for call control, system configuration, DTMF detection and tone generation, and other functions. The following table lists some of the NaturalAccess APIs that USAI applications use:

This API...	Provides...
ADI	DTMF detection and tone generation
NCC	PSTN call control
MSPP	RTP endpoint control
USAI	Speech recognition and speech synthesis

Example

The following example shows how the application processes a PSTN call and requests speech resources with USAI in the NaturalAccess development environment:

Step	Action
1	The telephony gateway accepts the call (using the NCC API) and connects the PSTN channel to a local stream.
2	The application requests a speech resource (ASR or TTS) from an MRCP server using the USAI function saiCreateRecognizer or saiCreateSynthesizer. When the speech resource is created, the MRCP server returns the ID of the new speech resource and the voice over IP (VoIP) port that the resource uses to receive and transmit data.
3	The telephony gateway receives the resource ID and port, creates an RTP endpoint (using MSPP API functions), and connects the endpoint to the call.
4	The application manages the speech resource with USAI functions. For example, the application can perform speech recognition or synthesis tasks, add or modify grammars, or get recognition results.
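
The four steps above can be sketched as a small simulation. The following Python code does not call the NaturalAccess libraries, and it does not reproduce the signatures of saiCreateRecognizer, the NCC API, or the MSPP API. Every class, function, host address, and port number in it is a hypothetical stand-in that only models the order in which information flows between the gateway, the application, and the MRCP server.

```python
from dataclasses import dataclass, field


@dataclass
class SpeechResource:
    resource_id: int  # ID returned by the (simulated) MRCP server
    kind: str         # "ASR" or "TTS"
    rtp_port: int     # port the resource uses to receive and transmit voice data


class FakeMrcpServer:
    """Hypothetical stand-in for an MRCP server."""

    def __init__(self, first_port=40000):
        self._next_id = 1
        self._next_port = first_port

    def create_resource(self, kind):
        resource = SpeechResource(self._next_id, kind, self._next_port)
        self._next_id += 1
        self._next_port += 2  # assumption: one port pair per resource
        return resource


@dataclass
class FakeGateway:
    """Hypothetical stand-in for the telephony gateway."""

    log: list = field(default_factory=list)

    def accept_call(self, channel):
        # Step 1: the role played by the NCC API.
        self.log.append(f"accepted call on PSTN channel {channel}")
        return f"local-stream-{channel}"

    def connect_rtp(self, stream, host, port):
        # Step 3: the role played by MSPP API functions.
        self.log.append(f"RTP endpoint {host}:{port} connected to {stream}")
        return (stream, host, port)


def handle_call(gateway, server, server_host, channel):
    stream = gateway.accept_call(channel)                    # step 1
    recognizer = server.create_resource("ASR")               # step 2
    endpoint = gateway.connect_rtp(stream, server_host,
                                   recognizer.rtp_port)      # step 3
    # Step 4: the application would now drive recognition on this resource.
    gateway.log.append(f"recognition started on resource {recognizer.resource_id}")
    return recognizer, endpoint


gateway = FakeGateway()
recognizer, endpoint = handle_call(gateway, FakeMrcpServer(), "192.0.2.10", 5)
```

The point of the sketch is the dependency between steps 2 and 3: the gateway cannot create the RTP endpoint until the server has returned the port for the new speech resource.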

The following illustration provides an overview of the Universal Speech Access API architecture: