The Universal Speech Access API (USAI) enables you to implement MRCP services and resources using boards within the NaturalAccess environment. Applications use USAI to stream voice data from boards over RTP to recognizer and synthesizer engines on separate servers. Because the host processes no voice traffic, USAI frees bus bandwidth and host processing capacity on the platform. In addition, voice activity detection (VAD) and pre-speech buffers on NaturalAccess boards reduce traffic to the ASR engines and decrease the number of required ASR ports.
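The traffic reduction from VAD and a pre-speech buffer can be illustrated with a minimal sketch. This is not the board's actual algorithm: the energy threshold, the frame format, and the function name `gate_frames` are all assumptions made for illustration. The idea is that silent frames are held in a small rolling buffer and are forwarded only when speech begins, so the recognizer receives the speech plus a short lead-in rather than the whole stream.

```python
from collections import deque


def gate_frames(frames, threshold=500, pre_speech_frames=2):
    """Return only the frames worth forwarding to an ASR engine.

    frames: iterable of non-empty lists of PCM samples (one list per frame).
    threshold: hypothetical mean absolute amplitude treated as speech.
    pre_speech_frames: number of silent frames kept ahead of speech onset.
    """
    pre_buffer = deque(maxlen=pre_speech_frames)  # rolling pre-speech buffer
    forwarded = []
    for frame in frames:
        energy = sum(abs(sample) for sample in frame) / len(frame)
        if energy >= threshold:
            # Speech detected: send the buffered lead-in, then the frame.
            forwarded.extend(pre_buffer)
            pre_buffer.clear()
            forwarded.append(frame)
        else:
            # Silence: hold it in the rolling buffer instead of sending it.
            pre_buffer.append(frame)
    return forwarded
```

In this sketch, a stream of five silent frames followed by two speech frames results in four forwarded frames: the last two silent frames and the two speech frames.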
NaturalAccess provides APIs for call control, system configuration, DTMF detection and tone generation, and other functions. The following table lists some of the NaturalAccess APIs that USAI applications use:
| This API... | Provides... |
| --- | --- |
| ADI | DTMF detection and tone generation |
| NCC | PSTN call control |
| MSPP | RTP endpoint control |
| USAI | Universal Speech Access API speech recognition and speech synthesis |
The following example shows how the application processes a PSTN call and requests speech resources with USAI in the NaturalAccess development environment:
| Step | Action |
| --- | --- |
| 1 | The telephony gateway accepts the call (using the NCC API) and connects the PSTN channel to a local stream. |
| 2 | The application requests a speech resource (ASR or TTS) from an MRCP server using USAI functions saiCreateRecognizer or saiCreateSynthesizer. When the speech resource is created, the MRCP server returns the created speech resource ID and the voice over IP (VoIP) port it uses to receive and transmit data. |
| 3 | The telephony gateway receives the information, creates an RTP endpoint (using MSPP API functions), and connects the endpoint to the call. |
| 4 | The application manages the speech resource with USAI functions. For example, the application can perform speech recognition or synthesis tasks, add or modify grammars, or get recognition results. |
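The four steps can be modeled with a small, self-contained simulation. It does not call NaturalAccess or USAI: `MockMrcpServer`, `Gateway`, `SpeechResource`, and `handle_call` are hypothetical stand-ins, and the real signatures of saiCreateRecognizer and saiCreateSynthesizer are not shown here. The sketch only illustrates the order of operations and the data that passes between the steps (the resource ID and the VoIP port).

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class SpeechResource:
    resource_id: int  # ID the MRCP server assigns to the speech resource
    kind: str         # "ASR" or "TTS"
    rtp_port: int     # VoIP port the server uses to receive and transmit data


class MockMrcpServer:
    """Hypothetical stand-in for an MRCP server; not part of USAI."""

    def __init__(self, first_port=40000):
        self._ids = count(1)
        # Port numbering is an arbitrary choice for this mock.
        self._ports = count(first_port, 2)

    def create_resource(self, kind):
        return SpeechResource(next(self._ids), kind, next(self._ports))


@dataclass
class Gateway:
    """Hypothetical telephony gateway that records the steps it performs."""

    log: list = field(default_factory=list)

    def accept_call(self, pstn_channel):
        # Step 1: in NaturalAccess this is done with the NCC API.
        self.log.append(f"accept call on PSTN channel {pstn_channel}")
        return {"channel": pstn_channel, "endpoint": None}

    def connect_rtp(self, call, resource):
        # Step 3: in NaturalAccess this is done with MSPP API functions.
        call["endpoint"] = resource.rtp_port
        self.log.append(f"RTP endpoint -> port {resource.rtp_port}")


def handle_call(gateway, server, pstn_channel, kind="ASR"):
    call = gateway.accept_call(pstn_channel)   # step 1
    # Step 2: stands in for saiCreateRecognizer or saiCreateSynthesizer.
    resource = server.create_resource(kind)
    gateway.connect_rtp(call, resource)        # step 3
    # Step 4: recognition, synthesis, and grammar management would go here.
    gateway.log.append(f"manage {kind} resource {resource.resource_id}")
    return call, resource
```

Calling `handle_call(Gateway(), MockMrcpServer(), 3)` returns a call record whose `endpoint` matches the port reported by the mock server, which mirrors how the gateway in step 3 depends on the information returned in step 2.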
The following illustration provides an overview of the Universal Speech Access API architecture:
