Universal Speech Access API structures

The Universal Speech Access API defines the following structures for specifying synthesizer and recognizer properties:

SAI_PROSODY
SAI_RTP_ENDPOINT
SAI_SPEECH_LANGUAGE
SAI_VENDOR_SPECIFIC
SAI_VOICE

SAI_PROSODY

SAI_PROSODY defines speaker voice qualities such as pitch, duration, and volume. Use this structure when invoking saiTtsGetProsody and saiTtsSetProsody:

typedef struct
{
INT32             pitch;
INT32             contour;
INT32             range;
INT32             rate;
INT32             duration;
INT32             volume;
}
SAI_PROSODY;

The SAI_PROSODY structure contains the following parameters:

Parameter	Description
pitch	Baseline pitch for the spoken text.
contour	Actual pitch contour for the spoken text.
range	Pitch range (variability) for the spoken text.
rate	Speaking rate for the spoken text.
duration	Time, in seconds or milliseconds, that the synthesizer takes to read the text.
volume	Value from 0 through 100 that sets the volume for the spoken text.

Parameter values must follow the W3C Speech Synthesis Markup Language specification (SSML). For more information, refer to the SSML specification.

Before setting these voice elements, the application must retrieve the current values with saiTtsGetProsody. The application can then modify prosody parameters in the SAI_PROSODY structure and invoke saiTtsSetProsody.
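
The retrieve, modify, set sequence described above can be sketched as follows. This is a minimal illustration, not the API's actual usage: demo_get_prosody and demo_set_prosody are hypothetical stand-ins for saiTtsGetProsody and saiTtsSetProsody (whose real signatures are not shown in this section), the INT32 typedef is assumed to be a 32-bit signed integer, and the stored default values are invented for the demonstration.

```c
#include <stddef.h>
#include <stdint.h>

typedef int32_t INT32;  /* assumption: INT32 is a 32-bit signed integer */

/* Mirrors the SAI_PROSODY layout shown above. */
typedef struct
{
    INT32 pitch;
    INT32 contour;
    INT32 range;
    INT32 rate;
    INT32 duration;
    INT32 volume;
}
SAI_PROSODY;

/* Hypothetical in-memory "synthesizer state" used only by this sketch. */
static SAI_PROSODY g_current = { 0, 0, 0, 0, 0, 50 };

/* Hypothetical stand-in for saiTtsGetProsody: copies the current values. */
int demo_get_prosody(SAI_PROSODY *out)
{
    if (out == NULL) return -1;
    *out = g_current;
    return 0;
}

/* Hypothetical stand-in for saiTtsSetProsody: rejects a volume outside 0..100. */
int demo_set_prosody(const SAI_PROSODY *in)
{
    if (in == NULL) return -1;
    if (in->volume < 0 || in->volume > 100) return -1;
    g_current = *in;
    return 0;
}

/* Retrieve the current prosody, change only the volume, then write it back. */
int demo_change_volume(INT32 volume)
{
    SAI_PROSODY p;
    if (demo_get_prosody(&p) != 0) return -1;
    p.volume = volume;              /* all other fields keep their current values */
    return demo_set_prosody(&p);
}
```

Retrieving first means that fields the application does not intend to change are written back with their current values rather than with uninitialized data.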

SAI_RTP_ENDPOINT

SAI_RTP_ENDPOINT defines a Fusion RTP endpoint. Reference this structure when invoking saiCreateSynthesizer and saiCreateRecognizer:

typedef struct
{
DWORD                   port;
SAI_URL                 ipAddress;
}
SAI_RTP_ENDPOINT;

The SAI_RTP_ENDPOINT structure contains the following parameters:

Parameter	Description
port	RTP port number associated with a Fusion RTP endpoint.
ipAddress	IP address associated with a Fusion RTP endpoint.
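
As an illustration of populating this structure before passing it to saiCreateSynthesizer or saiCreateRecognizer, the sketch below fills and validates an endpoint. It rests on assumptions not stated in this section: SAI_URL is modeled as a fixed-size character array of a hypothetical length DEMO_URL_LENGTH, DWORD is taken to be a 32-bit unsigned integer, and demo_fill_endpoint is a helper invented for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_URL_LENGTH 64                 /* assumption: real SAI_URL size is not given here */

typedef uint32_t DWORD;                    /* assumption: 32-bit unsigned integer */
typedef char SAI_URL[DEMO_URL_LENGTH];     /* assumption: SAI_URL is a character array */

/* Mirrors the SAI_RTP_ENDPOINT layout shown above. */
typedef struct
{
    DWORD   port;
    SAI_URL ipAddress;
}
SAI_RTP_ENDPOINT;

/* Hypothetical helper: fills an endpoint after basic validation.
   Returns 0 on success, -1 on a bad argument. */
int demo_fill_endpoint(SAI_RTP_ENDPOINT *ep, const char *ip, unsigned long port)
{
    size_t len;

    if (ep == NULL || ip == NULL) return -1;
    if (port == 0 || port > 65535UL) return -1;    /* valid UDP port range */

    len = strlen(ip);
    if (len == 0 || len >= sizeof ep->ipAddress) return -1;

    memset(ep, 0, sizeof *ep);
    ep->port = (DWORD)port;
    memcpy(ep->ipAddress, ip, len + 1);            /* includes the terminating NUL */
    return 0;
}
```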

SAI_SPEECH_LANGUAGE

SAI_SPEECH_LANGUAGE defines the recognizer or synthesizer language retrieved with saiAsrGetSpeechLanguage and saiTtsGetSpeechLanguage, or set with saiAsrSetSpeechLanguage and saiTtsSetSpeechLanguage:

typedef struct
{
SAI_SPEECH_LANGUAGE             language;
SAI_SPEECH_LANGUAGE_STRING      languageString;
}
SAI_SPEECH_LANGUAGE;

The SAI_SPEECH_LANGUAGE structure contains the following parameters:

Parameter	Description
language	Decimal value that identifies a predefined language.
languageString	Character string that identifies the language to use.

The SAI_SPEECH_LANGUAGE parameter specifies a decimal value associated with a predefined language for the speech synthesizer to use. The SAI_SPEECH_LANGUAGE_STRING parameter specifies the character string associated with the language to use. The following table shows predefined language types that the speech synthesizer can use:

If the language is not predefined, the application must specify a user-defined language type. For more information about supported languages, refer to the speech vendor documentation.

SAI_VENDOR_SPECIFIC

SAI_VENDOR_SPECIFIC defines a vendor-specific parameter as a keyword and value pair:

typedef struct
{
char vendorPairName[SAI_MAX_STRINGLENGTH];
char vendorPairValue[SAI_MAX_STRINGLENGTH];
}
SAI_VENDOR_SPECIFIC;

The SAI_VENDOR_SPECIFIC structure contains the following parameters:

Parameter	Description
vendorPairName	Keyword for a vendor-specific parameter.
vendorPairValue	Value for a vendor-specific parameter.
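
Because both members are fixed-size character arrays, an application needs to guard against overlong strings when filling them. The following sketch shows one way to do that. The value chosen for SAI_MAX_STRINGLENGTH is an assumption (the real value comes from the API headers), and demo_set_vendor_pair, along with the keyword and value used to exercise it, is hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define SAI_MAX_STRINGLENGTH 128   /* assumption: actual value is defined by the API headers */

/* Mirrors the SAI_VENDOR_SPECIFIC layout shown above. */
typedef struct
{
    char vendorPairName[SAI_MAX_STRINGLENGTH];
    char vendorPairValue[SAI_MAX_STRINGLENGTH];
}
SAI_VENDOR_SPECIFIC;

/* Hypothetical helper: stores a keyword/value pair, refusing strings that
   would not fit (including the terminating NUL). Returns 0 or -1. */
int demo_set_vendor_pair(SAI_VENDOR_SPECIFIC *pair, const char *name, const char *value)
{
    size_t nameLen, valueLen;

    if (pair == NULL || name == NULL || value == NULL) return -1;

    nameLen  = strlen(name);
    valueLen = strlen(value);
    if (nameLen == 0 || nameLen >= sizeof pair->vendorPairName) return -1;
    if (valueLen >= sizeof pair->vendorPairValue) return -1;

    memcpy(pair->vendorPairName,  name,  nameLen + 1);
    memcpy(pair->vendorPairValue, value, valueLen + 1);
    return 0;
}
```

Rejecting an overlong string, rather than silently truncating it, avoids sending a vendor keyword the speech engine would not recognize.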

SAI_VOICE

SAI_VOICE defines part of the speech synthesizer voice profile used when invoking saiTtsGetVoice and saiTtsSetVoice:

typedef struct
{
    SAI_VOICE_GENDER            gender;
    INT32                       age;
    char                        variant[SAI_MAX_STRINGLENGTH];
    char                        name[SAI_MAX_STRINGLENGTH];
}
SAI_VOICE;

The SAI_VOICE structure contains the following parameters:

Parameter	Description
gender	Value of SAI_VOICE_GENDER constant. This can be neutral, female, or male.
age	Defines the age of the speaker’s voice in years.
variant	Defines a variant in the speaker’s voice according to the speech vendor's specification.
name	Pre-packaged name of the speaker’s voice. Refer to the speech vendor documentation for a list of predefined voice packages.

Note: Parameter values must follow the W3C Speech Synthesis Markup Language specification (SSML). For more information, refer to the SSML specification.

Before setting voice profile parameters, the application must retrieve the current values with saiTtsGetVoice. The application can then modify voice profile parameters in the SAI_VOICE structure and invoke saiTtsSetVoice.
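
A sketch of that sequence for the voice name follows. It is illustrative only: demo_get_voice and demo_set_voice are hypothetical stand-ins for saiTtsGetVoice and saiTtsSetVoice, the SAI_VOICE_GENDER enumerators and the SAI_MAX_STRINGLENGTH value are assumptions, and the stored default profile is invented.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SAI_MAX_STRINGLENGTH 128   /* assumption: actual value is defined by the API headers */

typedef int32_t INT32;             /* assumption: 32-bit signed integer */

/* Assumption: enumerator names are invented; the real constants are not listed here. */
typedef enum
{
    DEMO_GENDER_NEUTRAL,
    DEMO_GENDER_FEMALE,
    DEMO_GENDER_MALE
}
SAI_VOICE_GENDER;

/* Mirrors the SAI_VOICE layout shown above. */
typedef struct
{
    SAI_VOICE_GENDER gender;
    INT32            age;
    char             variant[SAI_MAX_STRINGLENGTH];
    char             name[SAI_MAX_STRINGLENGTH];
}
SAI_VOICE;

/* Hypothetical in-memory voice profile used only by this sketch. */
static SAI_VOICE g_voice = { DEMO_GENDER_NEUTRAL, 30, "", "default" };

/* Hypothetical stand-in for saiTtsGetVoice. */
int demo_get_voice(SAI_VOICE *out)
{
    if (out == NULL) return -1;
    *out = g_voice;
    return 0;
}

/* Hypothetical stand-in for saiTtsSetVoice. */
int demo_set_voice(const SAI_VOICE *in)
{
    if (in == NULL) return -1;
    g_voice = *in;
    return 0;
}

/* Retrieve the profile, replace only the voice name, then write it back. */
int demo_select_voice_name(const char *name)
{
    SAI_VOICE v;
    size_t len;

    if (name == NULL) return -1;
    len = strlen(name);
    if (len == 0 || len >= sizeof v.name) return -1;

    if (demo_get_voice(&v) != 0) return -1;
    memcpy(v.name, name, len + 1);   /* gender, age, and variant are preserved */
    return demo_set_voice(&v);
}
```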