There is no support for PLS lexicons in reading systems at this time.
Pronunciation lexicons offer the promise of improved voicing of text.
PLS lexicons provide control over the text-to-speech (TTS) playback rendering on conforming reading systems. A lexicon file is like a dictionary or look-up guide, allowing the pronunciations defined in it to be used in place of the default rendering when matching words are encountered. Defining words in a lexicon ensures that users hear your work played back as expected, not based on the heuristics applied by the TTS engine on their reading system.
Each PLS lexicon is an XML file with a root lexicon element. Lexicons are comprised of one or more lexeme entries, each of which defines the word(s) to match in grapheme element(s) and the replacement pronunciation to use in a phoneme element. (See Example 1.)

An alias element can also be used to replace one word with another. (See Example 5.)

The language of the lexicon and the phonetic alphabet used must both be defined on the root lexicon element.
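A minimal sketch of a lexicon along these lines — the word, transcription, and alias are illustrative, not taken from any shipping lexicon — might look like:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa"
      xml:lang="en">
   <!-- replace the default rendering of a complex term -->
   <lexeme>
      <grapheme>acetaminophen</grapheme>
      <phoneme>əˌsiːtəˈmɪnəfɪn</phoneme>
   </lexeme>
   <!-- an alias substitutes one word (or phrase) for another -->
   <lexeme>
      <grapheme>W3C</grapheme>
      <alias>World Wide Web Consortium</alias>
   </lexeme>
</lexicon>
```

Note that both the language (xml:lang) and the phonetic alphabet (alphabet) are declared on the root lexicon element, as required.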
PLS entries should be created for any complex word that is important to the publication and that a TTS engine is likely to mispronounce. The list includes, but is not limited to, proper names and nouns, technical, scientific and legal terms, and complex compound words. The default rendering for heteronyms can also be defined in a PLS lexicon so that only variations need to be handled by SSML tagging.
Note that PLS lexicons are not activated simply by being included in the EPUB container. You must reference the applicable lexicon(s) from each content document, using a link element in the document's head, in order for them to be applied to the content. The hreflang attribute should also always be set to the language of the referenced PLS file. (See Example 6.)
Multiple lexicons can be attached to a content document to handle embedded foreign languages. (See Example 7.)
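For instance, an English document with embedded French passages might attach one lexicon per language (a sketch; the file names are illustrative):

```xml
<head>
   <link rel="pronunciation" href="lex/en.pls"
         type="application/pls+xml" hreflang="en" />
   <link rel="pronunciation" href="lex/fr.pls"
         type="application/pls+xml" hreflang="fr" />
</head>
```

A conforming reading system can then match each lexicon against the language of the content it is voicing.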
Localizations are not possible within a single PLS lexicon file, but you can attach multiple lexicons to voice words differently for different regions. (See the FAQ question below for more information.)
Frequently Asked Questions
- Should I use IPA or X-SAMPA or something else to write my pronunciations?
Although IPA is arguably the most widely recognized phonetic alphabet, that does not mean that it has full support even in existing synthetic speech engines. Some engines support only their own alphabets, for example. IPA is also less developer-friendly than X-SAMPA because it uses Unicode characters that require modifying most keyboard layouts to input, whereas X-SAMPA is ASCII-based. Internal workflows should be a determining factor at this time. The ultimate answer will depend on what engines are employed in reading systems.
Note that it is possible to translate one alphabet representation to the other, so work in either alphabet shouldn't ever be lost if there does turn out to be a clear winner and loser.
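To illustrate the translation between alphabets, the same entry can carry a transcription in each: the alphabet attribute on a phoneme element overrides the lexicon's default. (A sketch; the transcriptions are illustrative.)

```xml
<!-- lexicon default alphabet is "ipa" -->
<lexeme>
   <grapheme>acetaminophen</grapheme>
   <phoneme>əˌsiːtəˈmɪnəfɪn</phoneme>
   <phoneme alphabet="x-sampa">@"sit@'mIn@f@n</phoneme>
</lexeme>
```

An engine that supports only one of the two alphabets can pick the transcription it understands.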
- Are lexicons case sensitive?
The need to be able to define case-sensitive pronunciations is clear, but how PLS lexicons will be processed is less so. The specification itself says nothing about the case sensitivity of graphemes; a requirement for case-sensitive processing appears only in an informative appendix. Until reading systems that support PLS lexicons appear, any answer is speculative, but assume case sensitivity because of the critical role it plays.
Note that you should also consider that certain terms will appear both in lower case and title case in a publication without changing the pronunciation, and add grapheme elements for both cases:
<lexeme>
   <grapheme>acetaminophen</grapheme>
   <grapheme>Acetaminophen</grapheme>
   <phoneme>@"sit@'mIn@f@n</phoneme>
</lexeme>
When case conflicts occur, use SSML in the markup to correct the pronunciation of the less common term. For example, both spellings of Mobile may refer to human mobility in a document that studies age-related health issues in Mobile, Alabama. Defining the pronunciation ˈmoʊbaɪl in the lexicon will cause the city name to be mispronounced (and likewise the other way around).
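Assuming the common word is the one defined in the lexicon, each occurrence of the city name could be overridden inline using EPUB 3's SSML attributes (a sketch; the transcription is illustrative):

```xml
<!-- ssml:ph overrides the TTS rendering for this occurrence only -->
<p xmlns:ssml="http://www.w3.org/2001/10/synthesis">
   The study was conducted in
   <span ssml:ph="moʊˈbiːl">Mobile</span>, Alabama.
</p>
```

This keeps the lexicon simple while still voicing the rarer term correctly where it occurs.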
- Are there any dangers in mixing languages?
Yes, if the rendering engine does not support voicing the specified language, the user may get an error or the text may be silently skipped. Error handling in such situations cannot be guaranteed. Language-specific lexicons will typically not be loaded.
- Can I add localizations?
Not within a single PLS file. The phoneme element does not allow an xml:lang attribute to be attached to it. Multiple localized lexicons could be attached to a content document that only specifies the stem language code, so that the user's localization preference setting can be used to determine the proper lexicon to apply (e.g., the content document specifies it is en and the lexicons specify en-US, en-GB, etc.).
Care should be taken not to exclude users by specifying localizations. If a reading system does not include a voice that can handle the localizations, the lexicon will not be loaded.
A better solution is to define one lexicon for all reading systems that can handle the region-independent language. If the publication is written in US English, for example, it would be better to use the default en code for the standard pronunciation lexicon and specify a locale only for targeted regions:
<html … xml:lang="en">
   <head>
      …
      <link rel="pronunciation"
            href="#lex/en.pls"
            type="application/pls+xml"
            hreflang="en" />
      <link rel="pronunciation"
            href="#lex/en-GB.pls"
            type="application/pls+xml"
            hreflang="en-GB" />
      …
   </head>
   …
</html>
This way any user with an English-language reading system will at least hear the correct US pronunciations.
- Should I use PLS lexicons or SSML?
Including both technologies in EPUB 3 was not meant to force a choice; they are intended to complement each other. PLS lexicons allow you to define a word once and have the TTS engine do the work of replacing it each time it occurs in the prose. SSML, on the other hand, provides fine-grained control that is just not possible in a lexicon, at the price of having to tag each instance of a term that has to be replaced.