Note

Instructions on how to obtain and configure a system to run EPUBCheck are available from the EPUBCheck Usage wiki. This page does not cover installation issues.

Getting Started

EPUBCheck is a free command-line validation tool that automatically checks EPUB publications for conformance to the standard. It reports issues that are in violation of the requirements of the standard and/or that could cause a publication not to open or render in reading systems.

The EPUBCheck program is written as a Java library, so running it requires some basic knowledge of entering instructions from a command-line interface. Which interface to use depends on the operating system. On Windows, both the Command Prompt (cmd.exe) and PowerShell (powershell.exe) can be used to run EPUBCheck. On Macs, the Terminal app can be used.

When starting a new command-line interface, the starting directory is usually listed before the input prompt:

C:\users\default>_

Although EPUBCheck can be run from almost any directory, it is often simpler to change directories to where the publication to be validated is located to avoid entering lengthy file paths. The cd command is used to change the directory.

The following command changes the current directory to c:\epubs:

> cd c:\epubs

The command to run EPUBCheck is the same regardless of the operating system used. It consists of the java command followed by two arguments:

  1. -jar path\to\epubcheck.jar — The first argument is a reference to the Java jar file (epubcheck.jar) that contains the EPUBCheck program. The exact path used to reference this file depends on where EPUBCheck is installed and what directory EPUBCheck is run from.
  2. publication.epub — The second argument is the name of the packaged EPUB publication to check. The full path to the file has to be specified if it is not in the same directory where EPUBCheck is run.

The following is an example of how EPUBCheck could be called from the command line to validate the accessible_epub_3.epub file:

> java -jar c:\epubcheck\epubcheck.jar c:\epubs\accessible_epub_3.epub

Note

Full paths to the epubcheck.jar file and publication to be checked are omitted from the rest of the examples on this page for readability purposes.

If EPUBCheck does not detect any problems, it will emit a series of messages like the following:

Validating using EPUB version 3.2 rules.
No errors or warnings detected.
Messages: 0 fatals / 0 errors / 0 warnings / 0 infos

EPUBCheck completed

The first line specifies what version of EPUB is being validated. In the case of the preceding example, the program is checking against the requirements of EPUB 3.2.

The next two lines indicate that no issues were found. When there are problems, this is where they are reported.

The final statement just confirms that the program has terminated successfully.

A clean validation result is the preferred outcome before moving on to check the accessibility of a publication. In practice, however, EPUBCheck will often uncover warnings and errors that need fixing first.

The following is an example of a report containing a markup error:

Validating using EPUB version 3.2 rules.
ERROR(RSC-005): accessible_epub_3.epub/EPUB/ch01.xhtml(98,5): Error while
parsing file: element "p" not allowed here; expected the element end-tag

Check finished with errors
Messages: 0 fatals / 1 error / 0 warnings / 0 infos

EPUBCheck completed

The EPUBCheck error message identifies both the file containing the error (accessible_epub_3.epub/EPUB/ch01.xhtml) as well as the line number (98) and character offset (5) where it occurs. It also provides a brief message explaining the problem — in this case, a p tag is used where it is not allowed, as shown in the following screen grab of the file:

[Screen grab: a p tag is included after the closing body tag.]

Unfortunately, not every message is as easy to understand as this one, and there are times when EPUBCheck will not provide a useful line number or character offset (e.g., it often reports (-1,-1) when a problem cannot be traced to a specific location). As a result, fixing errors will often require some sleuth work.
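When a report is long, the parts of a message described above (severity label, code, file path, line number, character offset and explanatory text) can be separated mechanically. The following Python sketch is illustrative only and is not part of EPUBCheck. The names parse_message and Message are hypothetical, and the pattern is inferred solely from the sample messages on this page, so messages with a different shape will simply not match. A reported position of -1 is treated as "no location".

```python
import re
from typing import NamedTuple, Optional


class Message(NamedTuple):
    """One EPUBCheck message split into its parts (hypothetical type)."""
    severity: str
    code: str
    path: str
    line: Optional[int]    # None when EPUBCheck reports -1
    column: Optional[int]  # None when EPUBCheck reports -1
    text: str


# Shape assumed from the samples on this page:
#   SEVERITY(CODE): path(line,column): message text
_PATTERN = re.compile(
    r"^(?P<severity>[A-Z]+)\((?P<code>[^)]+)\):\s+"
    r"(?P<path>.*?)\((?P<line>-?\d+),(?P<column>-?\d+)\):\s+"
    r"(?P<text>.*)$"
)


def parse_message(raw: str) -> Optional[Message]:
    """Return the parsed message, or None if the line is not a message."""
    match = _PATTERN.match(raw.strip())
    if match is None:
        return None
    line = int(match["line"])
    column = int(match["column"])
    return Message(
        severity=match["severity"],
        code=match["code"],
        path=match["path"],
        line=line if line >= 0 else None,
        column=column if column >= 0 else None,
        text=match["text"],
    )
```

Lines such as "Validating using EPUB version 3.2 rules." do not match the pattern and return None, so the function can be applied to every line of a report without filtering first.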

The message counts that EPUBCheck outputs are mostly a convenience for tracking whether problems are being solved between runs, as the numbers reported can often be misleading.

For example, one markup error could be the cause of many other errors, so fixing one problem can lead to many or all of the other problems disappearing. It is recommended to re-validate publications often when fixing bugs to avoid spending time looking for problems that have already been solved.

It is also not the case that every issue that EPUBCheck raises has to be fixed, as will be covered in the next section on message severity.

Message Severity

EPUBCheck reports five different categories of messages — four are output by default and one type, usage messages, has to be turned on in the arguments. These categories reflect the severity of the problem being reported, ranging from fatal at the highest to info at the lowest.

Each issue raised by EPUBCheck starts with one of these labels:

WARNING(OPF-053): 30/accessible_epub_3.epub/EPUB/package.opf(9,29): Date value 'Tuesday' does not follow recommended syntax ...
ERROR(RSC-005): 30/accessible_epub_3.epub/EPUB/ch01.xhtml(91,19): Error while parsing file: attribute "border" not allowed here ...

The severity is followed by an error code in parentheses, but these codes can typically be ignored. They are an internal classification system used by the program and not helpful outside of debugging issues with EPUBCheck itself.

A summary of the total number of each type is provided when EPUBCheck finishes validating:

Messages: 2 fatals / 4 errors / 18 warnings / 1 infos / 147 usages
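For tracking counts between runs, the summary line can be turned into numbers. This is a minimal Python sketch rather than anything provided by EPUBCheck. The function name parse_summary is hypothetical, and the pattern assumes the summary always follows the "number label" layout shown on this page, accepting both singular and plural labels.

```python
import re
from typing import Dict


def parse_summary(line: str) -> Dict[str, int]:
    """Extract the per-category counts from a 'Messages: ...' summary line."""
    counts: Dict[str, int] = {}
    # Each entry looks like "18 warnings" or "1 error"; the trailing "s" is optional.
    for number, label in re.findall(
        r"(\d+)\s+(fatal|error|warning|info|usage)s?\b", line
    ):
        counts[label] = int(number)
    return counts
```

Categories that are absent from the line (for example, usages when the usage argument is not set) are simply missing from the returned dictionary.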

The meaning of each category is explained in the following list.

Fatal

Fatal errors are emitted for the most severe problems. A fatal error is one that will prevent reading systems from opening a publication (e.g., packaging problems), or that will prevent the content from being readable (e.g., critical XML errors in the content).

Fatal errors must be corrected to ensure the usability of the publication.

Error

Errors represent the next lower severity level. Errors identify where a publication deviates from the requirements of the EPUB standard. They typically will not prevent an EPUB publication from being opened, but they often lead to unintended consequences (e.g., content not rendering as expected).

Errors typically need to be corrected to ensure the usability of the publication, but some format-specific errors do not affect the accessibility (see the note on understanding errors in relation to WCAG).

Warning

Warnings are a level below Errors. They identify deviations from recommended practices in the EPUB standard. A publication with warnings is not invalid, and may have no unintended rendering issues, but not following recommended practices can lead to interoperability issues.

Warnings do not have to be corrected, but it is generally advised to follow recommended practices. Many vendors will not accept publications with warnings.

Info

Info messages are the lowest level message output by default by EPUBCheck. These messages usually provide alerts about issues with EPUBCheck itself (e.g., that it might not be able to decrypt a file to check it, or that a potential issue is under discussion). They are not common.

Info messages do not necessarily require any action but are important to review to discover what EPUBCheck has had a problem with.

Usage

Usage messages are like best practices: they identify preferred practices, but it is not strictly necessary to follow the guidance. Depending on the publication, EPUBCheck may emit a large number of these messages so they are turned off by default.

To get a list of usage messages, the usage flag (-u) has to be set when starting EPUBCheck.

It is not required to fix all usage messages, but consideration should be given to following the best practices they highlight.
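The ordering of the five categories can be expressed in code so that the most severe messages are dealt with first. The following Python sketch is illustrative: the names Severity, label_of and most_severe_first are hypothetical, the uppercase labels FATAL, INFO and USAGE are assumed by analogy with the ERROR and WARNING labels shown on this page, and placing usage below info is also an assumption.

```python
from enum import IntEnum
from typing import Iterable, List


class Severity(IntEnum):
    """Message categories, ordered from least to most severe (assumed order)."""
    USAGE = 1
    INFO = 2
    WARNING = 3
    ERROR = 4
    FATAL = 5


def label_of(message: str) -> Severity:
    """Read the severity label that starts a message, e.g. 'ERROR(RSC-005): ...'."""
    label = message.split("(", 1)[0].strip()
    return Severity[label]  # raises KeyError for an unrecognised label


def most_severe_first(messages: Iterable[str]) -> List[str]:
    """Return the messages sorted so that fatal errors and errors come first."""
    return sorted(messages, key=label_of, reverse=True)
```

Sorting this way only changes the reading order. Whether a given message has to be fixed still depends on the guidance for its category.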

WCAG Parsing Requirements

Although it is generally best practice to fix any errors in an EPUB publication, WCAG does not enforce strict validity — not all errors are critical to the accessible reading of content.

WCAG only requires invalid content be fixed where the errors or warnings are from problems that will cause accessibility issues. Some examples include:

  • duplicate IDs that break ARIA attribute references (e.g., in custom controls or description links);
  • duplicate IDs that break control labels or table headings;
  • more than one role attribute specified on an element.
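Duplicate IDs, the first two examples in the list above, are straightforward to look for in a well-formed XHTML content document. The following Python sketch is one possible approach and is not how EPUBCheck performs the check. The function name duplicate_ids is hypothetical, and the sketch assumes the document parses as XML without a DTD (named entities such as &nbsp; would cause a parse error).

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import List


def duplicate_ids(xhtml: str) -> List[str]:
    """Return the id values that occur on more than one element."""
    root = ET.fromstring(xhtml)
    # Count every id attribute in the document, whatever element carries it.
    counts = Counter(
        element.get("id")
        for element in root.iter()
        if element.get("id") is not None
    )
    return sorted(value for value, seen in counts.items() if seen > 1)
```

A result of this kind only shows that an ID is repeated. Deciding whether the repetition breaks an ARIA reference, a control label or a table heading still requires looking at how the ID is used.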

In the past, success criterion 4.1.1 was used as a catch-all for HTML parsing errors (i.e., well-formedness errors) that could affect the ability of assistive technologies to process a document. As modern HTML parsing now standardizes the method for handling these well-formedness errors, 4.1.1 always passes conformance evaluations. Parsing errors that affect the accessibility of publications, such as those described above, are covered by other success criteria and should be reported under their more applicable success criterion.

For more information, refer to the explainer for success criterion 4.1.1.

Arguments

An argument is an additional option that can be specified when EPUBCheck is run. Arguments are typically included after calling the EPUBCheck jar file and before specifying the publication or file to validate.

The following example shows how to request that EPUBCheck include usage messages in its output using the -u argument:

> java -jar epubcheck.jar -u accessible_epub_3.epub

Many of EPUBCheck's arguments have both a verbose and compact form. The verbose form has two dashes in front of the name, while the compact is a single dash followed by a letter. Having EPUBCheck output usage messages can be accomplished either using the compact -u argument, as in the last example, or the verbose --usage.
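When EPUBCheck is launched from a script instead of being typed by hand, the same ordering applies: the jar file first, then any arguments, then the publication. The following Python sketch only assembles the command. The function name build_command is hypothetical, and it assumes that the java executable is on the system path.

```python
from typing import List, Sequence


def build_command(jar: str, target: str, options: Sequence[str] = ()) -> List[str]:
    """Assemble: java -jar <jar> [options...] <target>."""
    # Arguments go after the jar file and before the file to validate.
    return ["java", "-jar", jar, *options, target]
```

The resulting list can be passed to subprocess.run from the Python standard library. Passing ["-u"] or ["--usage"] as the options produces two commands that differ only in the form of the argument.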

Although EPUBCheck includes many arguments that allow it to be run in different ways and for different purposes, this section only identifies some of the most useful.

Note

For a complete list of all the arguments EPUBCheck accepts, run the program with only the help (-h) argument: java -jar epubcheck.jar -h

Version (-v)

The version argument is used to specify what version of EPUB to validate against. It takes either the value "2.0" or "3.0" to indicate whether the content conforms to EPUB 2 or EPUB 3:

> java -jar epubcheck.jar -v 2.0 ...

It is not necessary to specify this argument when validating full publications as EPUBCheck will automatically determine the correct version from the package document. It is typically paired with the mode argument to validate individual files within a publication.

The version argument does not have a verbose form. The use of --version is reserved for discovering which version of EPUBCheck is installed.

Mode (--mode or -m)

EPUBCheck is capable of running on more than just packaged EPUB files. The mode argument is used in these cases to guide the program to the correct type of validation to perform.

The mode argument is typically paired with a version argument as in the following example for validating an EPUB 2 package document:

> java -jar epubcheck.jar -mode opf -v 2.0 package.xml

The various mode values are explained in the following list.

opf

The opf value indicates that an EPUB 2 or 3 package document is to be validated.

> java -jar epubcheck.jar -mode opf -v 3.0 package.xml

xhtml

The xhtml value indicates that an EPUB 2 or 3 XHTML content document is to be validated.

> java -jar epubcheck.jar -mode xhtml -v 3.0 chapter01.xhtml

svg

The svg value indicates that an EPUB 2 or 3 SVG content document is to be validated.

> java -jar epubcheck.jar -mode svg -v 2.0 heart.svg

nav

The nav value indicates that an EPUB 3 navigation document is to be validated.

> java -jar epubcheck.jar -mode nav nav.xhtml

EPUBCheck does not validate individual EPUB 2 NCX documents.

mo

The mo value indicates that an EPUB 3 media overlay document is to be validated. EPUB 2 does not have an equivalent feature.

> java -jar epubcheck.jar -mode mo chapter01.smil

exp

The exp value is used to indicate that an unpackaged EPUB 2 or 3 publication is to be validated (i.e., the contents are "expanded" in their own directory). In this mode, the directory containing the unzipped files is used instead of a file name.

> java -jar epubcheck.jar -mode exp accessible_epub_3

It is not necessary to specify the version argument for expanded EPUB validation, as EPUBCheck will automatically determine the version from the package document.
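The pairing rules described in this list can be checked before a command is ever run. The following Python sketch is a hedged illustration: the names MODES and mode_command are hypothetical, the table of modes reflects only what this page states (nav and mo apply to EPUB 3 only), and the --mode spelling is taken from the heading of this section. The sketch assumes that java is on the system path.

```python
from typing import List, Optional

# Mode values described on this page. The flag records whether the
# page documents the mode as applying to EPUB 2 as well as EPUB 3.
MODES = {
    "opf": True,
    "xhtml": True,
    "svg": True,
    "nav": False,  # EPUB 3 navigation documents only
    "mo": False,   # EPUB 3 media overlays only
    "exp": True,
}


def mode_command(
    jar: str, mode: str, target: str, version: Optional[str] = None
) -> List[str]:
    """Assemble a single-file (or expanded-directory) validation command."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    if version is not None and version not in ("2.0", "3.0"):
        raise ValueError(f"unsupported version value: {version}")
    if version == "2.0" and not MODES[mode]:
        raise ValueError(f"mode {mode} applies to EPUB 3 only")
    command = ["java", "-jar", jar, "--mode", mode]
    if version is not None:
        command += ["-v", version]
    command.append(target)
    return command
```

For the exp mode the target is the directory containing the unzipped files, and the version can be left out because it is read from the package document.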

Usage Messages (--usage or -u)

As covered in the section on message severity, usage messages are not turned on by default. The usage argument must be specified to enable them.

> java -jar epubcheck.jar -u accessible_epub_3.epub

Save (--save or -s)

The save argument provides a helpful option to create a packaged EPUB file from an expanded directory, avoiding the troubles that can arise from manually zipping an EPUB. It only works when the expanded mode argument is specified:

> java -jar epubcheck.jar -mode exp -s accessible_epub_3

If EPUBCheck completes without any fatal error or error messages, it will put a copy of the zipped EPUB file in the same directory that contains the folder with the publication (e.g., if run on the folder c:\epubs\accessible_epub_3 the saved file will be located at c:\epubs\accessible_epub_3.epub).
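The location of the saved file follows directly from the name of the expanded folder. The following Python sketch reproduces the naming rule from the example above. The function name saved_epub_path is hypothetical, and Windows-style paths are used only because the examples on this page use them.

```python
from pathlib import PureWindowsPath


def saved_epub_path(expanded_dir: str) -> PureWindowsPath:
    """Return where the packaged file is expected for an expanded folder."""
    folder = PureWindowsPath(expanded_dir)
    # The file sits beside the folder and takes the folder's name plus ".epub".
    return folder.parent / (folder.name + ".epub")
```

The extension is appended to the full folder name instead of replacing a suffix, so a folder name that already contains a dot is kept intact.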

Redirecting Output

Reading the output from EPUBCheck in a command-line window can be a challenge, especially when there are a lot of messages to work through. Fortunately, all command-line interfaces have a feature that allows the output from a program to be captured and redirected to a file. Reading EPUBCheck messages in a text editor is much simpler, as line wrap can be turned off and the text is more easily scrolled and searched.

To redirect the output of EPUBCheck, a greater-than character (>) is used at the end of the command followed by the file to redirect the output to.

The following command will redirect the output to the file output.txt:

> java -jar epubcheck.jar accessible_epub_3.epub > output.txt

There are no restrictions on the output file's name or path, provided the target directory can be written to (e.g., operating systems do not allow writing files into special system directories).

Each time EPUBCheck is run this way, it overwrites the previous file with the new results, so it is not necessary to specify a new file name each time. Many text editors will also automatically reload the new output when EPUBCheck exits, further simplifying the process.
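The same capture-and-overwrite behaviour can be reproduced when EPUBCheck is started from a script. The following Python sketch is an assumption-laden illustration, not a feature of EPUBCheck: the function name run_to_file is hypothetical, and the error stream is merged into the file purely as a precaution in case some messages are written there, which this page does not state.

```python
import subprocess
from typing import Sequence


def run_to_file(command: Sequence[str], output_path: str) -> int:
    """Run a command and write everything it prints to output_path."""
    # Opening with "wb" truncates the file, so each run replaces the last report.
    with open(output_path, "wb") as report:
        completed = subprocess.run(
            command, stdout=report, stderr=subprocess.STDOUT
        )
    return completed.returncode
```

A call such as run_to_file(["java", "-jar", "epubcheck.jar", "accessible_epub_3.epub"], "output.txt") corresponds to the redirected command shown earlier, assuming java is on the system path.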

Limitations

Although EPUBCheck is capable of capturing the most serious usability issues that can arise with EPUB publications, it has a number of limitations on what it can effectively check. These include:

As a result, manual checking of EPUBs for usability in reading systems is always recommended.

And, more importantly, to ensure the accessibility of publications, the Ace and SMART tools also need to be run. Refer to their respective pages in the knowledge base for more information on testing for accessibility.

Alternatives

If the idea of running EPUBCheck from the command line is overwhelming, there are simpler options available. The following are some commonly used alternatives:

A list of additional alternatives is available on the EPUBCheck wiki.

Related Links