Summary
Setting the language ensures that assistive technologies correctly interpret and render the text and that reading systems can make language enhancements available for users.
Techniques
- Set the default language of the package document using an xml:lang attribute on the package element. [[WCAG-3.1.1]]
- Set the xml:lang attribute on package document elements whenever the default language changes. [[WCAG-3.1.2]]
- Identify the primary languages of the publication in dc:language elements in the package document's metadata section. [[WCAG-3.1.1]]
- Set the default language of EPUB content documents. [[WCAG-3.1.1]]
- Identify changes in language in EPUB content documents. [[WCAG-3.1.2]]
- Ensure language codes conform to BCP 47. [[WCAG-3.1.1]]
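The last technique can be partly checked by machine. The following Python sketch tests whether a string has the general shape of a BCP 47 tag: a language subtag, optionally followed by script, region, and variant subtags. It is a simplified sketch, not a validator. The pattern and the function name looks_like_bcp47 are illustrative assumptions. The sketch does not consult the IANA Language Subtag Registry, and it rejects extensions, private-use subtags, and grandfathered tags that BCP 47 permits.

```python
import re

# Simplified pattern modeled on the "langtag" production of RFC 5646 (BCP 47).
# Extensions, private-use and grandfathered tags are deliberately not covered.
LANGTAG_PATTERN = re.compile(
    r"""
    (?:[A-Za-z]{2,3}(?:-[A-Za-z]{3}){0,3}           # 2-3 letter language, up to 3 extlangs
     | [A-Za-z]{4,8})                               # or a 4-8 letter language subtag
    (?:-[A-Za-z]{4})?                               # optional script, e.g. Hant
    (?:-(?:[A-Za-z]{2}|[0-9]{3}))?                  # optional region, e.g. US or 419
    (?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*  # optional variant subtags
    """,
    re.VERBOSE,
)


def looks_like_bcp47(tag: str) -> bool:
    """Return True if tag has the basic syntactic shape of a BCP 47 language tag.

    This checks syntax only; it does not confirm that the subtags are registered.
    """
    return LANGTAG_PATTERN.fullmatch(tag) is not None


for candidate in ["en", "en-US", "zh-Hant-TW", "es-419", "en_US", ""]:
    print(f"{candidate!r}: {looks_like_bcp47(candidate)}")
```

A tag that passes can still be wrong. For example, a well-formed but unregistered subtag would pass this check, so a lookup against the registry is still needed for a real conformance check.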
Example
Frequently Asked Questions
- Do I need to list every language used in the publication?
No, the dc:language elements should list only the primary languages of the content. If a publication contains a few phrases in a foreign language, for example, do not list that language.
Explanation
Setting the language of a publication is an important step in making it accessible, as it helps assistive technologies pronounce the text correctly. Without language declarations, assistive technologies will typically read the text using the user's default language. This can lead to the entire text being mispronounced (when the publication is in a different language) or to individual phrases being mangled (when foreign phrases appear inline).
This tutorial covers how to set the language in the EPUB package document as well as in XHTML and SVG content documents so that the information is available to assistive technologies and reading systems.
Language declaration mechanisms
With an understanding of what language tags are, we can now turn to how to express those tags in markup languages.
In XML-based markup languages, like XHTML, SVG and the EPUB package document, the standard mechanism for declaring the language of the text is the xml:lang attribute, whose value is a language tag.
Best practice is to always declare a language on the root element (i.e., the element that contains all the other markup). For example, the language of an XHTML document can be specified as follows:
<html … xml:lang="en-US">
…
</html>
Language information is inherited, so by setting the attribute on the root element you automatically declare the language for all the elements and text in the document.
Overriding the language
Not all publications are written in a single language. Multilingual publications may switch between languages often, while other publications may contain short phrases or single words in another language.
To indicate a change of language, you only need to declare the new language on an element that surrounds the foreign text. The new language applies only within that element, as shown in the following example:
<p xml:lang="en">
This is in English
<span xml:lang="fr">mais ceci en français</span>
and back to English again.
</p>
Note
The lang attribute is omitted from these XHTML examples for clarity. Refer to the section on the lang attribute in XHTML and SVG for why it is useful to include it.
The text of markup documents always inherits the language of the nearest ancestor tag with a language declaration, so there is no limit on how many times the language can change:
<p xml:lang="en">
English
<span xml:lang="fr">
French
<span xml:lang="es">Spanish</span>
French
</span>
English
</p>
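The inheritance rule can be sketched in a few lines of Python using the standard library's xml.etree.ElementTree module. The function language_runs is a hypothetical helper and is not part of any EPUB tool. It walks the nested example above and reports which language each run of text inherits. The sketch assumes well-formed XML and considers only xml:lang, ignoring the lang attribute.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def language_runs(element, inherited=None):
    """Yield (language, text) pairs for each non-blank run of text.

    Each run takes the language of the nearest element (itself or an
    ancestor) that carries an xml:lang attribute.
    """
    lang = element.get(XML_LANG, inherited)
    if element.text and element.text.strip():
        yield lang, element.text.strip()
    for child in element:
        yield from language_runs(child, lang)
        # Text after a child's end tag (its "tail") belongs to this element.
        if child.tail and child.tail.strip():
            yield lang, child.tail.strip()


snippet = """<p xml:lang="en">English
  <span xml:lang="fr">French
    <span xml:lang="es">Spanish</span>
  French</span>
English</p>"""

for lang, text in language_runs(ET.fromstring(snippet)):
    print(lang, text)
```

The printed pairs follow the nearest-ancestor rule: English, French, Spanish, French, then English again.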
It is important to indicate when the language changes so that text-to-speech engines can pronounce the foreign-language phrases correctly. Without the correct language information, they will try to pronounce the text according to the rules for the default language.
It is not necessary, however, to indicate a language change for terms and phrases that have become part of the default language. Terms like "café" and "coup d'état", although French in origin, are now considered common English words and phrases. Text-to-speech engines can typically handle them as English.
Setting the package document language
As an EPUB publication is a collection of documents, there are multiple places where the language of the content must be specified. The first spot we will look at is the package document.
The package document is central to an EPUB publication, as it contains the metadata about the work, the list of resources that belong to it, and the order in which those resources are read. As you may have guessed already, because the package document contains metadata such as the title and author names, it is important to tell reading systems what language this information is in.
The most common way to do this is to declare a language tag on the package element, as in the following example:
<package … xml:lang="en">
Because the package element is the root element (i.e., it contains all the other elements), the language you specify on this element applies to all the metadata it contains.
Note
EPUB 2 does not allow a global language declaration using the xml:lang attribute on the package element. You must declare an xml:lang attribute on every metadata tag.
With a global language declaration on the package element, you only need to override that declaration if metadata is written in another language. For example, if the book is a translation, you can indicate the language of the author's name by adding a language declaration to their dc:creator tag:
<dc:creator xml:lang="fr">Albert Camus</dc:creator>
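The following minimal Python sketch shows how the global declaration on package and the local override on dc:creator combine to give each metadata element its language. It uses a trimmed package document as an assumption, since a real one needs further required metadata such as dc:identifier. The helper metadata_languages is hypothetical and uses only xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
DC_NS = "{http://purl.org/dc/elements/1.1/}"

# Trimmed package document for illustration; a real one needs more metadata.
PACKAGE = """<package xmlns="http://www.idpf.org/2007/opf"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         version="3.0" xml:lang="en">
  <metadata>
    <dc:title>The Stranger</dc:title>
    <dc:creator xml:lang="fr">Albert Camus</dc:creator>
  </metadata>
</package>"""


def metadata_languages(element, inherited=None):
    """Yield (name, text, language) for each Dublin Core element.

    The language is the nearest xml:lang on the element or an ancestor,
    so the declaration on <package> applies unless it is overridden.
    """
    lang = element.get(XML_LANG, inherited)
    if element.tag.startswith(DC_NS):
        yield "dc:" + element.tag[len(DC_NS):], element.text, lang
    for child in element:
        yield from metadata_languages(child, lang)


for name, text, lang in metadata_languages(ET.fromstring(PACKAGE)):
    print(f"{name}: {text!r} -> {lang}")
```

If neither the element nor any ancestor has an xml:lang attribute, the reported language is None. This is the situation an EPUB 2 package is in whenever a metadata tag lacks its own declaration.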
One limitation of the package document metadata is that it is not possible to override the language of only part of the text within a metadata tag. If a title includes a foreign-language term or phrase, for example, you cannot mark that text as being in a different language. It will be read in the language that applies to the whole tag.
Note that it is rarely helpful to use region codes (e.g., adding "-US" for American English) in the package document metadata. Users will typically expect to hear the metadata announced in their preferred regional dialect.
Setting the language of the package document metadata is only the first step in defining the needed language information for reading systems. It is also necessary to specify the language of the publication content in the package document, as is covered in the next section.
Note
EPUB does not currently have a method for adding translations of metadata. Consider the following two titles:
<dc:title>King Lear</dc:title>
<dc:title xml:lang="fr">Le roi Lear</dc:title>
A reading system will treat the second title as a French subtitle (if it recognizes it at all).
It is possible, however, to provide metadata in an alternate script using the alternate-script property.
Setting the publication language
Although the xml:lang attribute specifies the language of the package document metadata, it does not tell reading systems the language of the publication's content. The language of the metadata and the content is often the same, but there are good reasons for having a separate way to specify the content language. For example, the work may be multilingual, or it may be written in a specific regional dialect.
EPUB requires authors to include at least one dc:language tag in the package document metadata to identify the primary language(s) of the content. As with the xml:lang attribute, the value of this element is a language tag:
<dc:language>es</dc:language>
If a publication is written in more than one language (e.g., a guide to learning a new language), you can repeat the dc:language element for each language (refer to example 2). Do not place all the languages in a single tag. The order in which you list the languages indicates their primacy (i.e., the first dc:language element defines the primary language of the work).
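As a sketch of how software might read this information, the following Python code collects the dc:language values in document order and treats the first as the primary language. It also flags two of the mistakes described above: a missing dc:language element, and several languages packed into one element. The function names, the sample package, and the comma/space heuristic are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

DC_LANGUAGE = "{http://purl.org/dc/elements/1.1/}language"

# Trimmed package document for a hypothetical English guide to learning French.
PACKAGE = """<package xmlns="http://www.idpf.org/2007/opf"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         version="3.0" xml:lang="en">
  <metadata>
    <dc:language>en</dc:language>
    <dc:language>fr</dc:language>
  </metadata>
</package>"""


def publication_languages(package):
    """Return dc:language values in document order, so the primary language is first."""
    return [el.text.strip() for el in package.iter(DC_LANGUAGE) if el.text]


def language_problems(languages):
    """Flag a missing dc:language and values that pack several languages together."""
    problems = []
    if not languages:
        problems.append("at least one dc:language element is required")
    for value in languages:
        # Heuristic: a single language tag never contains commas or spaces.
        if "," in value or " " in value:
            problems.append(f"{value!r}: use one dc:language element per language")
    return problems


languages = publication_languages(ET.fromstring(PACKAGE))
print("primary:", languages[0])
print("all:", languages)
print("problems:", language_problems(languages))
```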
The language information contained in the dc:language tags is only informative, however. Setting it helps reading systems optimize the rendering of the publication. They might use this information to preload a language-specific dictionary, for example, or to load a text-to-speech engine in advance so that users do not encounter a delay when they start having the content read aloud. It is still necessary to set both the language of the package document metadata and the language of each content document in the publication.
Setting the content language
Although the language settings in the package document are important, it is even more critical to specify the language of each content document. The information set in the package document does not automatically carry over to the content documents.
Setting the language of XHTML and SVG content documents, the two primary formats EPUB supports, is no different from setting the language in the package document. The primary language of each document is set on its root element (refer to example 4). You can then indicate that terms and phrases are in another language by wrapping them in a suitable HTML or SVG element, such as span, that carries its own language declaration.
<html … xml:lang="en" lang="en">
…
<body>
…
<p>
As the French would say, there is a
certain "<span xml:lang="fr" lang="fr">je ne
sais quoi</span>" about the way that …
</p>
…
</body>
</html>
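Because the package document's language settings do not carry over, it can help to confirm that every content document declares its own language on its root element. The Python sketch below does this for an XHTML document and an SVG document held as strings. The helper name root_language, the file names, and the sample documents are illustrative assumptions. A real check would read each content document listed in the package manifest.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

XHTML_DOC = """<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head><title>Chapter 1</title></head>
  <body>
    <p>As the French would say, there is a certain
    "<span xml:lang="fr" lang="fr">je ne sais quoi</span>" about it.</p>
  </body>
</html>"""

# An SVG document whose root element carries no language declaration.
SVG_DOC = """<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 20">
  <text x="0" y="15">Figure 1</text>
</svg>"""


def root_language(document_text):
    """Return the xml:lang value on the root element, or None if it is missing."""
    return ET.fromstring(document_text).get(XML_LANG)


# Hypothetical file names paired with the sample documents above.
for name, text in [("chapter1.xhtml", XHTML_DOC), ("figure1.svg", SVG_DOC)]:
    lang = root_language(text)
    print(f"{name}: {lang if lang else 'MISSING language declaration'}")
```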
Note
For more information about setting the language in XHTML documents, refer to the HTML Language topic in the knowledge base.
The lang attribute
Although you are only required to use the xml:lang attribute with XHTML and SVG documents, it is best practice to also add a lang attribute. When doing so, the language tags expressed in the xml:lang and lang attributes must match. For example:
<html … xml:lang="en-US" lang="en-US">
Adding both attributes is recommended because XHTML and SVG documents in EPUB publications are not always processed as XML, despite the requirements of the standard. A browser-based reading system might, for example, default to processing all the XHTML documents as regular HTML. In that case, the HTML processor ignores the xml:lang attribute, as it recognizes only the lang attribute. By always adding both attributes, you help ensure that the correct language information is available to users regardless of how the document is processed.
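A minimal Python sketch of a consistency check for this advice follows. It reports elements that declare only one of the two attributes, and elements whose xml:lang and lang values differ. Language tags are case-insensitive, so the comparison ignores case. The function name and sample markup are hypothetical, and the sketch handles only input that parses as well-formed XML.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">
  <body>
    <p><span xml:lang="fr" lang="fr">mais ceci</span>
      <span xml:lang="es">y esto</span>
      <span xml:lang="de" lang="de-AT">und das</span></p>
  </body>
</html>"""


def language_attribute_problems(document_text):
    """List elements whose xml:lang and lang attributes lack a partner or disagree."""
    problems = []
    for element in ET.fromstring(document_text).iter():
        xml_lang = element.get(XML_LANG)
        html_lang = element.get("lang")  # lang in no namespace, as HTML parsers see it
        if xml_lang is None and html_lang is None:
            continue  # no declaration here; the language is inherited
        local_name = element.tag.rsplit("}", 1)[-1]
        if xml_lang is None or html_lang is None:
            problems.append(f"<{local_name}> declares only one of xml:lang and lang")
        elif xml_lang.lower() != html_lang.lower():
            # Language tags are case-insensitive, so compare without case.
            problems.append(f"<{local_name}> has xml:lang={xml_lang!r} but lang={html_lang!r}")
    return problems


for problem in language_attribute_problems(SAMPLE):
    print(problem)
```

In the sample, the Spanish span lacks a lang attribute and the German span's two values disagree, so both are reported.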