The ability to synthetically voice a publication is an important accessibility feature that many users rely on, regardless of whether human narration is also provided (e.g., many users prefer the faster playback that TTS engines make possible).
While basic playback is possible as long as a reading system includes TTS technology, or has access to a similarly enabled assistive technology, complex vocabulary typically leads to mispronunciations by synthetic speech engines unless the content is enhanced.
There are three technologies that enable content authors to enhance the quality of TTS playback:
- PLS lexicons
The Pronunciation Lexicon Specification (PLS) defines an XML format for globally applicable pronunciations. When a word in the prose matches a defined entry, the provided pronunciation is used in place of the engine's default rendering. Lexicons provide a simple way to define pronunciations for words whose pronunciation does not change based on context.
- SSML markup
The Speech Synthesis Markup Language (SSML) allows pronunciations to be embedded directly in the markup. When SSML attributes are encountered on elements, the provided pronunciation is used in place of either the engine's default rendering or a PLS entry. SSML can be used to define all pronunciations, but is better used as a complement to PLS lexicons (e.g., to disambiguate heteronyms and ambiguous number forms).
- CSS3 Speech properties
The CSS3 Speech module includes a grab-bag of properties that can be used to control playback. From controlling whether words and numbers are spelled out to inserting aural cues and pauses, these properties allow control of playback beyond the traditional enhancement of pronunciation.
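The precedence described above (an SSML attribute overrides a PLS entry, which in turn overrides the engine's default) can be illustrated with a minimal Python sketch. It is not how any particular reading system works: the lexicon, the XHTML fragment, and the helper names `load_lexicon` and `resolve` are hypothetical, and the lookup assumes each word of interest sits alone in a `span` and matches a lexicon entry exactly. The namespaces and the `ssml:ph` and `ssml:alphabet` attributes are the ones defined by PLS 1.0 and EPUB 3.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
SSML_NS = "http://www.w3.org/2001/10/synthesis"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical PLS lexicon: in this book "bass" normally means the instrument.
PLS_DOC = """<lexicon xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         version="1.0" alphabet="ipa" xml:lang="en">
  <lexeme>
    <grapheme>bass</grapheme>
    <phoneme>beɪs</phoneme>
  </lexeme>
</lexicon>"""

# Hypothetical content: the second "bass" is the fish, so it carries
# an SSML attribute that overrides the lexicon entry.
XHTML_DOC = """<p xmlns="http://www.w3.org/1999/xhtml"
   xmlns:ssml="http://www.w3.org/2001/10/synthesis">
  The <span>bass</span> line drowned out his story about the
  <span ssml:alphabet="ipa" ssml:ph="bæs">bass</span>
  he caught in the <span>lake</span>.
</p>"""


def load_lexicon(pls_xml):
    """Map every grapheme in the lexicon to the phoneme of its lexeme."""
    root = ET.fromstring(pls_xml)
    lexicon = {}
    for lexeme in root.iter(f"{{{PLS_NS}}}lexeme"):
        phoneme = lexeme.findtext(f"{{{PLS_NS}}}phoneme")
        for grapheme in lexeme.findall(f"{{{PLS_NS}}}grapheme"):
            lexicon[grapheme.text] = phoneme
    return lexicon


def resolve(span, lexicon):
    """Return (text, pronunciation, source), preferring SSML, then PLS."""
    text = span.text or ""
    ssml_ph = span.get(f"{{{SSML_NS}}}ph")
    if ssml_ph is not None:
        return text, ssml_ph, "ssml"
    if text in lexicon:
        return text, lexicon[text], "pls"
    # No enhancement available: the engine falls back to its own rendering.
    return text, None, "default"


lexicon = load_lexicon(PLS_DOC)
paragraph = ET.fromstring(XHTML_DOC)
results = [resolve(span, lexicon) for span in paragraph.iter(f"{{{XHTML_NS}}}span")]

for text, pronunciation, source in results:
    print(text, pronunciation, source)
```

In this sketch the first "bass" picks up the lexicon pronunciation, the second uses the inline SSML value, and "lake" is left to the engine, which mirrors the division of labor suggested above: lexicons for context-independent words, SSML for the exceptions.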
Note, however, that none of these technologies is supported in reading systems. The W3C Web Accessibility Initiative (WAI) currently has a Pronunciation task force investigating how to improve speech support. These pages will be updated if new direction emerges from that work.
The EPUB Samples Project contains the following publications that implement enhanced TTS functionality:
- EPUB 3 — PLS Documents
- Pronunciation Lexicon Specification (PLS) Version 1.0
- EPUB 3 — SSML Attributes
- Speech Synthesis Markup Language (SSML) Version 1.1
- EPUB 3 — CSS 3.0 Speech
- CSS3 Speech Module (working draft referenced by EPUB 3)
- Computer-coding the IPA: a proposed extension of SAMPA (X-SAMPA)
- Reproduction of The International Phonetic Alphabet (Revised to 2005)