Caution
There is no support for CSS Speech properties in reading systems at this time.
Summary
CSS Speech properties offer the promise of improved aural rendering.
Examples
Explanation
The CSS Speech module provides additional text-to-speech (TTS) enhancement functionality. Unlike PLS lexicons and SSML markup, the Speech module properties are not focused on defining the correct pronunciation of words.
The primary property the CSS Speech module adds for enhancing TTS playback is speak-as. This property controls whether the TTS engine reads out each character in a string (the spell-out value) or each digit in a number (the digits value).
(See Example 1 and Example 2.) TTS engines often use
unreliable tests based on the apparent wordiness of acronyms to determine whether to voice them, but
this property allows you to override that behavior.
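As a rough illustration of the character and digit values of speak-as, the following sketch assumes two hypothetical class names, acronym and phone, that are not part of the specification:

```css
/* Hypothetical class: spell the term out letter by letter */
.acronym {
    speak-as: spell-out;
}

/* Hypothetical class: read a number one digit at a time */
.phone {
    speak-as: digits;
}
```

A span such as `<span class="acronym">URL</span>` would then be a candidate for letter-by-letter reading in an engine that honors the property.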
The speak-as property also takes the complementary values literal-punctuation and no-punctuation. As their names suggest, these values control whether the TTS engine voices punctuation.
The module also includes the speak property, which controls whether content is rendered by TTS, regardless of whether the containing element is visible. Setting the none value disables rendering of an element, and setting the normal value enables it.
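The sketch below shows how the speak property might be applied. The class names are hypothetical, and the value names are the ones used in the text above; they may differ between drafts of the specification, so check the version your target reading system follows.

```css
/* Hypothetical class: shown on screen but skipped during TTS playback */
.page-number {
    speak: none;
}

/* Hypothetical class: hidden on screen but still voiced */
.tts-only {
    display: none;
    speak: normal;
}
```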
The following properties are focused on non-prosodic aspects of TTS playback.
- pause
-
The pause property controls the amount of pause that occurs before and after the element to which it is applied. Pauses are typically used to identify transitions between major structures, such as between paragraphs and at the start of new sections. TTS engines use punctuation to provide pauses within the flow of the narrative.
The value of the pause property is a time value indicating the pause length. If only a single value is specified:
pause: 50ms
that time is applied both before and after the associated element.
You can individually control the time to pause before and after by including a second time value:
pause: 50ms 0ms
The amount of pause specified occurs before any aural cue and rest at the start of the associated element, and after any rest and cue at the end of the element.
- cue
-
The cue property provides the ability to uniquely identify elements with an aural sound. Cues are helpful in distinguishing new headings, for example, as pauses alone are not a good indicator.
Note that the cue property will render the associated audio clip both before and after the heading if only a single value is specified:
cue: url('audio/ping.mp3');
Users typically only expect a cue to signal the start, so use the none value to disable the cue after the associated element has been rendered:
cue: url('audio/ping.mp3') none;
The aural cue occurs between any pause and rest at the start of the associated element, and between any rest and pause at the end of the element.
- rest
-
The rest property controls the pause that occurs between any aural cues and the rendering of the associated element, both before and after.
The value of the rest property is a time value indicating the pause length. If only a single value is specified:
rest: 25ms
that time is applied both before and after the associated element.
You can individually control the time to pause before and after by including a second time value:
rest: 25ms 0ms
The amount of rest specified occurs after any pause and cue at the start of the associated element, and before any cue and pause at the end of the element.
- voice-family
-
The voice-family property provides control over the gender and type of voice used for TTS playback, allowing content producers to create more realistic TTS playback (e.g., alternating gender to match the character).
Although it's possible to name the voice to use:
voice-family: 'Dave';
in practice, an EPUB may be played on a wide variety of devices, so naming a voice is of limited use: it requires knowing the names of the voices available on every device.
Instead, it is better to request a voice using the pattern: age?, gender, integer? (where the question mark indicates the field is optional):
.king-lear { voice-family: old male 1; }
The age value may be child, young or old; the gender male, female or neutral; and, when specified, the integer indicates the ordinal position of the voice to use (i.e., when more than one matching voice is available).
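To show how these properties might combine on a single element, here is a minimal sketch. The selectors, the timing values and the cordelia class are assumptions chosen for illustration; audio/ping.mp3 is the placeholder path used in the text above.

```css
/* Playback order around each heading:
   pause, cue, rest, heading text, rest, cue, pause */
h2 {
    pause: 500ms 250ms;                /* 500ms before, 250ms after */
    cue: url('audio/ping.mp3') none;   /* sound before, nothing after */
    rest: 100ms 0ms;                   /* gap between the cue and the text */
}

/* Generic voice requests rather than device-specific voice names */
.king-lear { voice-family: old male 1; }
.cordelia  { voice-family: young female; }
```

Requesting voices by age and gender leaves the choice of an actual voice to the device.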
Frequently Asked Questions
- Do I need to use the -epub- prefix on the properties?
-
No. As the properties were first introduced before the specification was stable, they were required to use a prefix. The unprefixed versions are now valid to use.
- Can I force the TTS engine to say acronyms instead of spelling them out?
-
The Speech module does not provide a way to tell an engine it must voice a capitalized term. When including an acronym like EPUB, you would have to use a lexicon or attach an SSML pronunciation attribute to absolutely ensure that it does not get spelled out.
- Why do I need to control the voicing of punctuation?
-
Although most engines will voice significant pause points, such as colons, they will typically not render each punctuation point in a document as it would ruin the reading experience. There are times when it is critical to ensure that the user is able to hear all the punctuation in a sentence or phrase, such as in grammar textbooks, programming guides and the like. (See Example 3.)
Assistive technologies also enable the pronunciation of all punctuation by default in elements such as pre and code. Although the benefit of reading all punctuation in computer code should be obvious, preformatted text does not always need such detailed rendering. Applying no-punctuation to a pre block of text ensures that it will be read without punctuation being announced.
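The answers above on prefixes and punctuation can be sketched together as follows. The poem class is hypothetical, and pairing each unprefixed declaration with its -epub- prefixed form is one possible approach for older reading systems, not a requirement:

```css
/* Preformatted verse: do not announce punctuation */
pre.poem {
    -epub-speak-as: no-punctuation;   /* older, prefixed form */
    speak-as: no-punctuation;         /* unprefixed form */
}

/* Program code: announce every punctuation mark */
code {
    -epub-speak-as: literal-punctuation;
    speak-as: literal-punctuation;
}
```

Listing the prefixed declaration first lets the unprefixed one take precedence wherever both are understood.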
Related Links
- EPUB 3 TTS Enhancements — CSS speech
- CSS — Speech Module