Pronunciation Lexicons

Caution

There is no support for PLS lexicons in reading systems at this time.

Summary

Pronunciation lexicons offer the promise of improved voicing of text.

Examples

Example 1 — A minimal lexicon file

<lexicon
  version="1.0"
  alphabet="x-sampa"
  xml:lang="en"
  xmlns="http://www.w3.org/2005/01/pronunciation-lexicon">
   <lexeme>
	  <grapheme>acetaminophen</grapheme>
	  <phoneme>@%sit@"mIn@f@n</phoneme>
   </lexeme>
</lexicon>

Example 2 — Handling regional variations in spelling

<lexeme>
   <grapheme>defence</grapheme>
   <grapheme>defense</grapheme>
   <phoneme>dI"fEns</phoneme>
</lexeme>

Example 3 — Handling alternate forms

<lexeme>
   <grapheme>vitæ</grapheme>
   <grapheme>vitae</grapheme>
   <phoneme>vitaI</phoneme>
</lexeme>

Example 4 — Including more than one phonetic spelling

<lexeme>
   <grapheme>defence</grapheme>
   <grapheme>defense</grapheme>
   <phoneme>dI"fEns</phoneme>
   <phoneme alphabet="ipa">dɪˈfɛns</phoneme>
</lexeme>

Example 5 — Replacing one term with another

<lexeme>
   <grapheme>50-50</grapheme>
   <alias>fifty fifty</alias>
</lexeme>

Example 6 — Adding a PLS lexicon to the manifest

<item
   id="pls"
   href="EPUB/lexicon.pls"
   media-type="application/pls+xml"/>

Example 7 — Linking PLS lexicons to a content document

<html … xml:lang="en">
   <head>
	  …
      <link
         rel="pronunciation"
         href="lex/en.pls"
         type="application/pls+xml"
         hreflang="en" />
      <link
         rel="pronunciation"
         href="lex/fr.pls"
         type="application/pls+xml"
         hreflang="fr" />
	  …
   </head>
   …
</html>

Explanation

PLS lexicons provide control over the text-to-speech (TTS) playback rendering on conforming reading systems. A lexicon file is like a dictionary or look-up guide, allowing the pronunciations defined in it to be used in place of the default rendering when matching words are encountered. Defining words in a lexicon ensures that users hear your work played back as expected, not based on the heuristics applied by the TTS engine on their reading system.

Each PLS lexicon is an XML file with a root lexicon element. A lexicon consists of one or more lexeme entries, each of which lists the word(s) to match in one or more grapheme elements and gives the replacement pronunciation in a phoneme element. (See Example 1.)

An alias element can be used in place of a phoneme element to replace one term with another, supplying plain replacement text rather than a phonetic spelling. (See Example 5.)

The language of the lexicon and the phonetic alphabet used must both be defined on the root lexicon element.

PLS entries should be created for any complex word that is important to the publication and that a TTS engine is likely to mispronounce. The list includes, but is not limited to, proper names and nouns, technical, scientific and legal terms, and complex compound words. The default rendering for heteronyms can also be defined in a PLS lexicon so that only variations need to be handled by SSML tagging.
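
As a rough sketch of the heteronym approach, assume a publication about music in which "bass" almost always means the instrument. The lexicon entry sets that default, and the one passage about the fish is tagged with the EPUB 3 ssml:ph attribute. The word, both X-SAMPA spellings, and the sample sentence are illustrative assumptions, not taken from a real publication.

```xml
<!-- In the lexicon (alphabet="x-sampa" on the root): the default pronunciation -->
<lexeme>
   <grapheme>bass</grapheme>
   <phoneme>beIs</phoneme>
</lexeme>

<!-- In the content document: an SSML override for the one exception -->
<p xmlns:ssml="http://www.w3.org/2001/10/synthesis">
   He caught a
   <span ssml:alphabet="x-sampa" ssml:ph="b{s">bass</span>
   before the concert.
</p>
```

Only the exception carries markup; every other occurrence of the word falls back to the lexicon entry.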

Note that PLS lexicons are not activated simply by being included in the EPUB container and listed in the manifest. (See Example 6.) You must also reference the applicable lexicon(s) from each content document in order for them to be applied to the content, and the hreflang attribute on each link should always be set to the language of the referenced PLS file. (See Example 7.)

Multiple lexicons can be attached to a content document to handle embedded foreign languages. (See Example 7.)
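
A minimal sketch of the kind of content the two lexicons in Example 7 are meant to serve: an English document with an embedded French phrase marked with xml:lang. The sentence is an invented illustration, and how a reading system would match each span to a lexicon is an assumption, since no implementations exist to confirm it.

```xml
<body>
   <p>
      The menu listed a
      <span xml:lang="fr" lang="fr">croque-monsieur</span>
      alongside the usual sandwiches.
   </p>
</body>
```

The expectation is that words inside the French span would be looked up in the lexicon linked with hreflang="fr", and everything else in the one linked with hreflang="en".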

Localizations are not possible within a single PLS lexicon file, but you can attach multiple lexicons to voice words differently for different regions. (See "Can I add localizations?" below for more information.)

Frequently Asked Questions

Should I use IPA or X-SAMPA or something else to write my pronunciations?

Although IPA is arguably the most widely recognized phonetic alphabet, it is not fully supported by all existing speech synthesis engines. Some engines support only their own alphabets, for example. IPA is also less developer-friendly than X-SAMPA because it uses Unicode characters that most keyboard layouts cannot input without modification, whereas X-SAMPA is ASCII-based. For now, let your internal workflow determine the choice. The ultimate answer will depend on which engines are employed in reading systems.

Note that it is possible to translate one alphabet representation to the other, so work in either alphabet shouldn't ever be lost if there does turn out to be a clear winner and loser.

Are lexicons case sensitive?

The need to define case-sensitive pronunciations is clear, but how PLS lexicons are processed is less so. The specification itself says nothing about the case sensitivity of graphemes; the only requirement for case-sensitive processing appears in an informative appendix. Until reading systems that support PLS lexicons appear, any answer is speculative, but it is safest to assume case sensitivity because of the critical role it plays.

Also keep in mind that some terms will appear in both lower case and title case in a publication without any change in pronunciation. Add a grapheme element for each case:

<lexeme>
   <grapheme>acetaminophen</grapheme>
   <grapheme>Acetaminophen</grapheme>
   <phoneme>@%sit@"mIn@f@n</phoneme>
</lexeme>

When case conflicts occur, use SSML in the markup to correct the pronunciation of the less common term. For example, both spellings mobile and Mobile may refer to human mobility in a document that studies age-related health issues in Mobile, Alabama. Defining the pronunciation of Mobile as ˈmoʊbaɪl will cause the city name to be mispronounced (and likewise the other way around).
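
One way this conflict might be handled, as a sketch: the lexicon covers both cases of the common adjective, and each occurrence of the city name is tagged individually with the EPUB 3 ssml:ph attribute. The IPA spelling for the city and the sample sentence are assumptions added for illustration.

```xml
<!-- Lexicon entry, using the pronunciation given above -->
<lexeme>
   <grapheme>mobile</grapheme>
   <grapheme>Mobile</grapheme>
   <phoneme alphabet="ipa">ˈmoʊbaɪl</phoneme>
</lexeme>

<!-- Content document: SSML override for the city name -->
<p xmlns:ssml="http://www.w3.org/2001/10/synthesis">
   The study followed residents of
   <span ssml:alphabet="ipa" ssml:ph="moʊˈbiːl">Mobile</span>, Alabama.
</p>
```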

Are there any dangers in mixing languages?

Yes. If the rendering engine does not support voicing the specified language, the user may get an error or the text may be silently skipped. Error handling in such situations cannot be guaranteed. Lexicons for languages the engine does not support will typically not be loaded.

Can I add localizations?

Not within a single PLS file. The phoneme element does not allow an xml:lang to be attached to it. Multiple localized lexicons could be attached to a content document that only specifies the stem language code, so that the user's localization preference setting can be used to determine the proper lexicon to apply (e.g., the content document specifies it is en and the lexicons specify en-US and en-GB).

Care should be taken not to exclude users by specifying localizations. If a reading system does not include a voice that can handle the localizations, the lexicon will not be loaded.

A better solution is to define one lexicon for all reading systems that can handle the region-independent language. If the publication is written in US English, for example, it would be better to use the default en code for the standard pronunciation lexicon and specify a locale only for targeted regions:

<html … xml:lang="en">
   <head>
	  …
      <link
         rel="pronunciation"
         href="lex/en.pls"
         type="application/pls+xml"
         hreflang="en" />
      <link
         rel="pronunciation"
         href="lex/en-GB.pls"
         type="application/pls+xml"
         hreflang="en-GB" />
	  …
   </head>
   …
</html>

This way any user with an English-language reading system will at least hear the correct US pronunciations.
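
To illustrate, the two linked files might look roughly like this. The word chosen, its IPA transcriptions, and the use of alphabet="ipa" are assumptions. Only the en.pls and en-GB.pls file names and the language codes come from the markup above.

```xml
<!-- en.pls: default lexicon, carrying the US pronunciation -->
<lexicon
   version="1.0"
   alphabet="ipa"
   xml:lang="en"
   xmlns="http://www.w3.org/2005/01/pronunciation-lexicon">
   <lexeme>
      <grapheme>schedule</grapheme>
      <phoneme>ˈskɛdʒuːl</phoneme>
   </lexeme>
</lexicon>

<!-- en-GB.pls: regional lexicon for British English voices -->
<lexicon
   version="1.0"
   alphabet="ipa"
   xml:lang="en-GB"
   xmlns="http://www.w3.org/2005/01/pronunciation-lexicon">
   <lexeme>
      <grapheme>schedule</grapheme>
      <phoneme>ˈʃɛdjuːl</phoneme>
   </lexeme>
</lexicon>
```

Each file is a separate document. They are shown together here only for comparison.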

Should I use PLS lexicons or SSML?

Both technologies were included in EPUB 3 so that you would not have to choose between them; they are meant to complement each other. PLS lexicons allow you to define a word once and have the TTS engine do the work of replacing it each time it occurs in the prose. SSML, on the other hand, provides fine-grained control that is not possible in a lexicon, at the price of having to tag each instance of a term that has to be replaced.
