<prosody>

Specifies the pitch, pitch contour, pitch range, duration, speaking rate, and volume of the enclosed speech output.

Syntax

<prosody
    contour = "CDATA"
    duration = "CDATA"
    pitch = "CDATA"
    range = "CDATA"
    rate = "CDATA"
    volume = "CDATA"
>
    text and child elements
</prosody>

Attributes

Attribute

Data Type

Required?

Default

Description

contour

CDATA

no

NA

Pitch contour of the speech output, formatted as a value pair:

  • The first value is a position within the contained text, expressed as a percentage of its duration (a number followed by %).

  • The second value is the pitch target at that position, written as any value that is valid for the pitch attribute.
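A sketch of a contour value, assuming the platform accepts the SSML 1.0 notation in which each pair is written as (position,target) and multiple pairs are separated by spaces. The sentence text is made up for illustration, and the fragment belongs inside a form item such as a <block>.

```xml
<prompt>
    <!-- Assumed SSML 1.0 notation: start 20 Hz above the default pitch,
         then fall to 10 Hz below it by the end of the text. -->
    <prosody contour="(0%,+20Hz) (100%,-10Hz)">
        Your order has shipped.
    </prosody>
</prompt>
```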

duration

CDATA

no

NA

Desired time taken to read the speech output, in seconds (s) or milliseconds (ms). For example, 5s or 3500ms.
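As a minimal sketch, the fragment below asks the TTS engine to spend about 3.5 seconds on one sentence. The sentence text is invented for illustration, and how closely an engine honors the requested duration may vary.

```xml
<prompt>
    <!-- Request roughly 3.5 seconds for the enclosed sentence. -->
    <prosody duration="3500ms">
        Please have your account number ready.
    </prosody>
</prompt>
```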

pitch

CDATA

no

NA

Baseline pitch of the speech output. Valid values:

  • A number followed by Hz.

  • A relative change, as compared to the default pitch.

  • One of the following values: high, low, medium, x-high, x-low, default.

A relative change is expressed using the plus sign (+) or minus sign (-), followed by a number, followed by Hz (hertz) or st (semitones). It can also be expressed as a percentage change preceded by an optional + or -. The TTS engine determines the default pitch.
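The forms described for pitch can be sketched as follows. The specific numbers are arbitrary illustrations, not recommended settings, and the fragment belongs inside a form item such as a <block>.

```xml
<prompt>
    <!-- Absolute value: a number followed by Hz. -->
    <prosody pitch="180Hz">This sentence uses an absolute pitch.</prosody>
    <!-- Relative change in semitones. -->
    <prosody pitch="+2st">This sentence is two semitones higher.</prosody>
    <!-- Relative change as a percentage. -->
    <prosody pitch="-10%">This sentence is ten percent lower.</prosody>
    <!-- Named level. -->
    <prosody pitch="x-low">This sentence uses a named pitch level.</prosody>
</prompt>
```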

range

CDATA

no

NA

Pitch range of the speech output. Valid values:

  • A number followed by Hz, where higher values increase the pitch range.

  • A relative change, as compared to the default range.

  • One of the following values: high, low, medium, x-high, x-low, default.

A relative change is expressed using the plus sign (+) or minus sign (-), followed by a number, followed by Hz (hertz) or st (semitones). It can also be expressed as a percentage change preceded by an optional + or -. The TTS engine determines the default range.
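A brief sketch of two range settings, one named and one relative. The values are illustrative only, and the audible effect is likely to depend on the TTS engine and voice.

```xml
<prompt>
    <!-- A narrow pitch range, which tends to sound flatter. -->
    <prosody range="x-low">Your call may be recorded.</prosody>
    <!-- A range 20 percent wider than the default. -->
    <prosody range="+20%">Thank you for calling!</prosody>
</prompt>
```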

rate

CDATA

no

NA

Speaking rate of the speech output. Valid values:

  • A relative change, as compared to the default rate.

  • One of the following values: x-slow, slow, medium, fast, x-fast, default.

A relative change is expressed as a number that acts as a multiplier of the default rate. Thus, a value of 2 means the rate should be twice the default rate; a value of 0.5 means the rate should be half the default rate. The TTS engine determines the default rate.
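The multiplier form described above can be sketched as follows, using the same two values as the description. The sentence text is invented for illustration.

```xml
<prompt>
    <!-- A multiplier of 2: twice the default speaking rate. -->
    <prosody rate="2">This sentence is read quickly.</prosody>
    <!-- A multiplier of 0.5: half the default speaking rate. -->
    <prosody rate="0.5">This sentence is read slowly.</prosody>
</prompt>
```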

volume

CDATA

no

NA

Volume of the speech output. Valid values:

  • A number between 0.0 and 100.0, where higher values are louder.

  • A relative change, as compared to the default volume.

  • One of the following values: x-loud, loud, medium, soft, x-soft, silent, default.

A relative change is expressed using the plus sign (+) or minus sign (-), followed by a number. It can also be expressed as a percentage change preceded by an optional + or -. The TTS engine determines the default volume.
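The forms described for volume can be sketched as follows. The numbers are arbitrary illustrations, and the fragment belongs inside a form item such as a <block>.

```xml
<prompt>
    <!-- Absolute value on the 0.0 to 100.0 scale. -->
    <prosody volume="80.0">This sentence uses an absolute volume.</prosody>
    <!-- Relative change as a signed number. -->
    <prosody volume="+10">This sentence is somewhat louder.</prosody>
    <!-- Relative change as a percentage. -->
    <prosody volume="-20%">This sentence is somewhat quieter.</prosody>
    <!-- Named level. -->
    <prosody volume="soft">This sentence uses a named volume level.</prosody>
</prompt>
```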

Parents

<audio>, <emphasis>, <enumerate>, <p>, <prompt>, <prosody>, <s>, <voice>

Children

<audio>, <break>, <emphasis>, <enumerate>, <mark>, <p>, <phoneme>, <prosody>, <s>, <say-as>, <sub>, <value>, <voice>

Example

<?xml version="1.0"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
    <form>
        <block>
            <prompt>
                The price of XYZ is
                <prosody volume="loud" rate="0.5">
                    <say-as interpret-as="vxml:currency">$45</say-as>
                </prosody>
            </prompt>
        </block>
    </form>
</vxml>
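
Because <prosody> is listed as both a parent and a child of itself, the element can be nested. The following variation is a sketch rather than part of the original example: it slows down the whole sentence and raises the volume only for the amount. It assumes that an inner <prosody> adds its settings to those inherited from the outer one.

```xml
<?xml version="1.0"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
    <form>
        <block>
            <prompt>
                <!-- Outer element: half the default rate for the whole sentence. -->
                <prosody rate="0.5">
                    The price of XYZ is
                    <!-- Inner element: louder for the amount only. -->
                    <prosody volume="loud">
                        <say-as interpret-as="vxml:currency">$45</say-as>
                    </prosody>
                </prosody>
            </prompt>
        </block>
    </form>
</vxml>
```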

See Also

<emphasis>, <say-as>, <voice>