The Universal Speech Access API defines the following structures for specifying synthesizer and recognizer properties:
SAI_PROSODY defines speaker voice qualities such as pitch, duration, and volume when invoking saiTtsGetProsody and saiTtsSetProsody:
typedef struct
{
INT32 pitch;
INT32 contour;
INT32 range;
INT32 rate;
INT32 duration;
INT32 volume;
}
SAI_PROSODY;
The SAI_PROSODY structure contains the following parameters:
| Parameter | Description |
| --- | --- |
| pitch | Baseline pitch for the spoken text. |
| contour | Actual pitch contour for the spoken text. |
| range | Pitch range (variability) for the spoken text. |
| rate | Speaking rate for the spoken text. |
| duration | Value in seconds or milliseconds that sets how long the synthesizer takes to read the text. |
| volume | Value from 0 through 100 that sets the volume for the spoken text. |
Parameter values must follow the W3C Speech Synthesis Markup Language specification (SSML). For more information, refer to the SSML specification.
Before setting these prosody values, the application must retrieve the current values with saiTtsGetProsody. The application can then modify the prosody parameters in the SAI_PROSODY structure and invoke saiTtsSetProsody to apply them.
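The retrieve, modify, set sequence described above can be sketched as follows. This is a minimal illustration, not the vendor's implementation: the signatures of saiTtsGetProsody and saiTtsSetProsody are not given in this section, so the sketch uses hypothetical stand-ins (stubGetProsody, stubSetProsody) that read and write an in-memory copy. The INT32 typedef, the initial values, and the choice to clamp volume to the range 0 through 100 are likewise assumptions.

```c
#include <stddef.h>
#include <stdint.h>

typedef int32_t INT32;  /* assumption: INT32 is a 32-bit signed integer */

typedef struct
{
    INT32 pitch;
    INT32 contour;
    INT32 range;
    INT32 rate;
    INT32 duration;
    INT32 volume;
}
SAI_PROSODY;

/* Made-up "current" prosody held by the stand-in synthesizer. */
static SAI_PROSODY g_current = { 50, 0, 20, 100, 0, 80 };

/* Hypothetical stand-in for saiTtsGetProsody (real signature not shown here). */
static int stubGetProsody(SAI_PROSODY *out)
{
    if (out == NULL) return -1;
    *out = g_current;
    return 0;
}

/* Hypothetical stand-in for saiTtsSetProsody (real signature not shown here). */
static int stubSetProsody(const SAI_PROSODY *in)
{
    if (in == NULL) return -1;
    g_current = *in;
    return 0;
}

/* Change only the volume: fetch the current values first so that the
   other five members are written back unchanged. */
static int setVolume(INT32 volume)
{
    SAI_PROSODY p;

    if (volume < 0) volume = 0;      /* clamp to the documented 0..100 range */
    if (volume > 100) volume = 100;

    if (stubGetProsody(&p) != 0) return -1;
    p.volume = volume;
    return stubSetProsody(&p);
}
```

Fetching before setting matters because the structure is written back as a whole; filling in only one member of an uninitialized SAI_PROSODY would overwrite the remaining settings with indeterminate values.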
SAI_RTP_ENDPOINT defines a Fusion RTP endpoint. Reference this structure when invoking saiCreateSynthesizer and saiCreateRecognizer:
typedef struct
{
DWORD port;
SAI_URL ipAddress;
}
SAI_RTP_ENDPOINT;
The SAI_RTP_ENDPOINT structure contains the following parameters:
| Parameter | Description |
| --- | --- |
| port | RTP port number associated with a Fusion RTP endpoint. |
| ipAddress | IP address associated with a Fusion RTP endpoint. |
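As a rough sketch of how an application might populate this structure before passing it to saiCreateSynthesizer or saiCreateRecognizer, the helper below validates its inputs and fills both members. The definitions of DWORD and SAI_URL are not shown in this section, so the sketch assumes a 32-bit unsigned integer and a fixed-size character array; SKETCH_URL_LENGTH and fillRtpEndpoint are hypothetical names introduced only for illustration.

```c
#include <stdint.h>
#include <string.h>

typedef uint32_t DWORD;                   /* assumption: 32-bit unsigned */
#define SKETCH_URL_LENGTH 256             /* hypothetical buffer size */
typedef char SAI_URL[SKETCH_URL_LENGTH];  /* assumption: real definition not shown */

typedef struct
{
    DWORD port;
    SAI_URL ipAddress;
}
SAI_RTP_ENDPOINT;

/* Hypothetical helper: fill an endpoint, rejecting bad input.
   Returns 0 on success and -1 on failure; *ep is untouched on failure. */
static int fillRtpEndpoint(SAI_RTP_ENDPOINT *ep, const char *ip, unsigned long port)
{
    size_t len;

    if (ep == NULL || ip == NULL) return -1;
    if (port == 0 || port > 65535) return -1;     /* outside the UDP port range */

    len = strlen(ip);
    if (len >= sizeof ep->ipAddress) return -1;   /* would not fit with terminator */

    memset(ep, 0, sizeof *ep);
    ep->port = (DWORD)port;
    memcpy(ep->ipAddress, ip, len + 1);           /* copy including '\0' */
    return 0;
}
```

For example, fillRtpEndpoint(&ep, "192.0.2.10", 5004) would leave ep.port set to 5004 and ep.ipAddress holding the address string. The address and port here are placeholder values, not defaults of the API.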
SAI_SPEECH_LANGUAGE defines the recognizer or synthesizer language retrieved with saiAsrGetSpeechLanguage and saiTtsGetSpeechLanguage, or set with saiAsrSetSpeechLanguage and saiTtsSetSpeechLanguage:
typedef struct
{
SAI_SPEECH_LANGUAGE language;
SAI_SPEECH_LANGUAGE_STRING languageString;
}
SAI_SPEECH_LANGUAGE;
The SAI_SPEECH_LANGUAGE structure contains the following parameters:
| Parameter | Description |
| --- | --- |
| language | Decimal value that identifies a predefined language. |
| languageString | Character string that identifies the language. |
The language member (declared as SAI_SPEECH_LANGUAGE) specifies a decimal value associated with a predefined language for the speech synthesizer to use. The languageString member (declared as SAI_SPEECH_LANGUAGE_STRING) specifies the character string associated with the language to use. The following table shows predefined language types that the speech synthesizer can use:
If the language is not predefined, the application must specify a user-defined language type. For more information about supported languages, refer to the speech vendor documentation.
SAI_VENDOR_SPECIFIC defines a vendor-specific parameter as a keyword and value pair:
typedef struct
{
char vendorPairName[SAI_MAX_STRINGLENGTH];
char vendorPairValue[SAI_MAX_STRINGLENGTH];
}
SAI_VENDOR_SPECIFIC;
The SAI_VENDOR_SPECIFIC structure contains the following parameters:
| Parameter | Description |
| --- | --- |
| vendorPairName | Keyword for a vendor-specific parameter. |
| vendorPairValue | Value for a vendor-specific parameter. |
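Because both members are fixed-size character arrays, an application has to make sure the keyword and value fit before copying them. The sketch below shows one way to do that. The value of SAI_MAX_STRINGLENGTH is not stated in this section, so 256 is an assumed placeholder, and setVendorPair is a hypothetical helper name, not part of the API.

```c
#include <string.h>

#define SAI_MAX_STRINGLENGTH 256  /* assumption: the real value comes from the SAI headers */

typedef struct
{
    char vendorPairName[SAI_MAX_STRINGLENGTH];
    char vendorPairValue[SAI_MAX_STRINGLENGTH];
}
SAI_VENDOR_SPECIFIC;

/* Hypothetical helper: store a keyword/value pair.
   Returns 0 on success, or -1 if an argument is NULL or a string
   (plus its terminator) does not fit; *vs is untouched on failure. */
static int setVendorPair(SAI_VENDOR_SPECIFIC *vs, const char *name, const char *value)
{
    size_t nameLen, valueLen;

    if (vs == NULL || name == NULL || value == NULL) return -1;

    nameLen = strlen(name);
    valueLen = strlen(value);
    if (nameLen >= sizeof vs->vendorPairName) return -1;
    if (valueLen >= sizeof vs->vendorPairValue) return -1;

    memcpy(vs->vendorPairName, name, nameLen + 1);     /* include '\0' */
    memcpy(vs->vendorPairValue, value, valueLen + 1);  /* include '\0' */
    return 0;
}
```

Rejecting an oversized string, rather than silently truncating it, is a design choice in this sketch: a truncated keyword could match a different vendor parameter than the one intended.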
SAI_VOICE defines part of the speech synthesizer voice profile used when invoking saiTtsGetVoice and saiTtsSetVoice:
typedef struct
{
SAI_VOICE_GENDER gender;
INT32 age;
char variant[SAI_MAX_STRINGLENGTH];
char name[SAI_MAX_STRINGLENGTH];
}
SAI_VOICE;
The SAI_VOICE structure contains the following parameters:
| Parameter | Description |
| --- | --- |
| gender | Value of the SAI_VOICE_GENDER constant. This can be neutral, female, or male. |
| age | Age of the speaker's voice, in years. |
| variant | Variant of the speaker's voice, as defined by the speech vendor's specification. |
| name | Name of a prepackaged speaker voice. Refer to the speech vendor documentation for a list of predefined voice packages. |
Note: Parameter values must follow the W3C Speech Synthesis Markup Language specification (SSML). For more information, refer to the SSML specification.
Before setting voice profile parameters, the application must retrieve the current values with saiTtsGetVoice. The application can then modify voice profile parameters in the SAI_VOICE structure and invoke saiTtsSetVoice.
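The following sketch illustrates that sequence for the case where only the voice name changes. It is written under several assumptions: the signatures of saiTtsGetVoice and saiTtsSetVoice are not given in this section, so stubGetVoice and stubSetVoice are hypothetical in-memory stand-ins; the enumerator names for SAI_VOICE_GENDER, the value of SAI_MAX_STRINGLENGTH, and the initial voice profile are invented for illustration.

```c
#include <stdint.h>
#include <string.h>

typedef int32_t INT32;            /* assumption: 32-bit signed integer */
#define SAI_MAX_STRINGLENGTH 256  /* assumption: real value not shown here */

/* Hypothetical enumerator names; the text only lists neutral, female, male. */
typedef enum
{
    SKETCH_GENDER_NEUTRAL,
    SKETCH_GENDER_FEMALE,
    SKETCH_GENDER_MALE
}
SAI_VOICE_GENDER;

typedef struct
{
    SAI_VOICE_GENDER gender;
    INT32 age;
    char variant[SAI_MAX_STRINGLENGTH];
    char name[SAI_MAX_STRINGLENGTH];
}
SAI_VOICE;

/* Made-up "current" voice profile held by the stand-in synthesizer. */
static SAI_VOICE g_voice = { SKETCH_GENDER_NEUTRAL, 30, "", "default" };

/* Hypothetical stand-in for saiTtsGetVoice. */
static int stubGetVoice(SAI_VOICE *out)
{
    if (out == NULL) return -1;
    *out = g_voice;
    return 0;
}

/* Hypothetical stand-in for saiTtsSetVoice. */
static int stubSetVoice(const SAI_VOICE *in)
{
    if (in == NULL) return -1;
    g_voice = *in;
    return 0;
}

/* Switch to a named voice while keeping gender, age, and variant as they were. */
static int selectVoiceByName(const char *name)
{
    SAI_VOICE v;
    size_t len;

    if (name == NULL) return -1;
    len = strlen(name);
    if (len >= sizeof v.name) return -1;    /* must fit with terminator */

    if (stubGetVoice(&v) != 0) return -1;   /* start from the current profile */
    memcpy(v.name, name, len + 1);
    return stubSetVoice(&v);
}
```

A real application would pass a name taken from the speech vendor's list of predefined voice packages rather than an arbitrary string.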