Note
There is no support for the EPUB SSML attributes in reading systems at this time.
Summary
The EPUB SSML attributes offer the promise of improved voicing of text.
Examples
Explanation
SSML (the Speech Synthesis Markup Language) provides a way for content creators to enhance the default synthetic speech rendering of their publications at the markup level. Liberal use of SSML helps ensure that anyone listening to your work via text-to-speech (TTS) playback hears the prose as intended, rather than as the best guess of their rendering engine.
The phoneme element from SSML has been implemented in EPUB 3 as a pair of attributes for defining pronunciations at the markup level:
- The ssml:alphabet attribute is used to set the default phonetic alphabet.
- The ssml:ph attribute is used to define the pronunciation for any element with text content or for which a phonetic pronunciation can be associated (e.g., an empty element whose voicing is derived from an attached attribute).
(Support for the full SSML specification is not available in EPUB 3.)
Unlike PLS lexicons, SSML provides fine-grained control over pronunciation at the markup level. SSML can be used to override a default pronunciation for heteronyms, to correctly pronounce complex word and number forms, etc.
To use the SSML attributes, you must first declare the SSML namespace. The declaration is typically made once per document on the root html element. (See Example 1.)
A default alphabet is also typically defined once on the root html element, as it is rare to need to switch phonetic alphabets within any single document. Adding the ssml:alphabet attribute to the root ensures that all instances of the ssml:ph attribute have an in-scope alphabet defined. It is an error to define a pronunciation in an ssml:ph attribute without an in-scope alphabet, and doing so will result in rendering errors. (See Example 1.)
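A minimal sketch of a root element that carries both the namespace declaration and a default alphabet might look like the following. The choice of x-sampa as the alphabet value and the sample sentence are assumptions made for illustration (ipa is the other value commonly used); the pronunciation is a broad transcription, not an authoritative one.

```xml
<!-- Sketch only: alphabet value and content are illustrative assumptions -->
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ssml="http://www.w3.org/2001/10/synthesis"
      ssml:alphabet="x-sampa">
  <head>
    <title>SSML attribute sketch</title>
  </head>
  <body>
    <!-- The ssml:ph value is interpreted using the alphabet declared on the root -->
    <p>The <span ssml:ph="beIs">bass</span> line carries the song.</p>
  </body>
</html>
```

Because the alphabet is declared on the root, every ssml:ph attribute in the document has an alphabet in scope without repeating ssml:alphabet on each element.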
When an ssml:ph attribute is encountered, its value is passed to the text-to-speech (TTS) engine in place of the element's content, providing the lowest-level override. Pronunciations defined in SSML attributes also take precedence over PLS lexicon entries, ensuring that heteronyms and other exceptions to the rule can be properly handled.
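As an illustration of the heteronym case, the sketch below tags two occurrences of "read" with different pronunciations. The sentence and the broad X-SAMPA transcriptions (rEd for the past tense, ri:d for the present) are assumptions chosen for the example, as is the use of x-sampa as the default alphabet.

```xml
<!-- Sketch only: sentence and transcriptions are illustrative assumptions -->
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ssml="http://www.w3.org/2001/10/synthesis"
      ssml:alphabet="x-sampa">
  <head>
    <title>Heteronym sketch</title>
  </head>
  <body>
    <!-- Same spelling, two pronunciations: each span overrides the engine's guess -->
    <p>She <span ssml:ph="rEd">read</span> the letter twice, then asked him to
       <span ssml:ph="ri:d">read</span> it aloud.</p>
  </body>
</html>
```

A single lexicon entry for "read" could not distinguish these two occurrences, which is why each one is tagged individually here.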
Note that the value of the ssml:ph attribute entirely replaces the content of the element that it is attached to, including all descendant elements. The attribute should not be attached to a p element to define the pronunciation of one word contained in the paragraph, for example, as only that one word will be read in place of the entire paragraph. The use of span elements is recommended when no markup exists on the word(s) that need a pronunciation attached.
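The difference in placement can be sketched as follows. The fragment assumes that the SSML namespace and a default alphabet of x-sampa are already declared on the root html element; the sentence and the transcription are illustrative only.

```xml
<!-- Sketch only: assumes xmlns:ssml and ssml:alphabet="x-sampa" are declared on the root html element -->
<body xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ssml="http://www.w3.org/2001/10/synthesis">

  <!-- Avoid: the attribute sits on the paragraph, so its value stands in for the whole paragraph -->
  <p ssml:ph="beIs">The bass line carries the song.</p>

  <!-- Prefer: a span limits the override to the one word that needs it -->
  <p>The <span ssml:ph="beIs">bass</span> line carries the song.</p>

</body>
```

In the first paragraph a listener would hear only the single word; in the second, the full sentence is voiced with the override applied to "bass" alone.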
The SSML attributes are not valid on SVG or MathML content, but are valid on any XHTML content that can be embedded in those grammars.
Frequently Asked Questions
- Should I use IPA or X-SAMPA or something else to write my pronunciations?
-
Although IPA is arguably the most widely recognized phonetic alphabet, that does not mean that it has full support even in existing synthetic speech engines. Some engines support only their own alphabets, for example. IPA is also less developer-friendly than X-SAMPA because it uses Unicode characters that require modifying most keyboard layouts to input, whereas X-SAMPA is ASCII-based. Internal workflows should be a determining factor at this time. The ultimate answer will depend on what engines are employed in reading systems.
Note that it is possible to translate one alphabet representation to the other, so work in either alphabet shouldn't ever be lost if there does turn out to be a clear winner and loser.
- Should I use PLS lexicons or SSML?
-
Both technologies were included in EPUB 3 to complement each other, not to force a choice between them. PLS lexicons allow you to define a word once and have the TTS engine do the work of replacing it each time it occurs in the prose. SSML, on the other hand, provides the fine-grained control that is just not possible in a lexicon, at the price of having to tag each instance of a term that has to be replaced.
It is possible to use SSML exclusively, but it is costly in terms of production time and can excessively bloat the size of your content files depending on how many unique terms have to be handled and how often they occur.