HTML Tidy

Documentation Introduction

This page covers nearly everything you need to know about HTML Tidy. If you’re on macOS, Linux, or UNIX, you can also run man tidy to read the purpose-built documentation for the version of Tidy that you have installed.

You can find configuration quick references in the API and Quick Reference Site.

If you’re a developer using libtidy, please consult the API and Quick Reference Site and the libtidy Introduction page.

And if you simply want to use Tidy, then please read on.

What Tidy does

Tidy corrects and cleans up HTML content by fixing markup errors. Here are a few examples:

  • Mismatched end tags:

    <h2>subheading</h3>

    …is converted to:

    <h2>subheading</h2>

  • Misnested tags:

    <p>here is a para <b>bold <i>bold italic</b> bold?</i> normal?

    …is converted to:

    <p>here is a para <b>bold <i>bold italic</i> bold?</b> normal?

  • Missing end tags:

    <h1>heading
    <h2>subheading</h2>
    

    …is converted to:

    <h1>heading</h1>
    <h2>subheading</h2>
    

    …and

    <h1><i>italic heading</h1>

    …is converted to:

    <h1><i>italic heading</i></h1>

  • Mixed-up tags:

    <i><h1>heading</h1></i>
    <p>new paragraph <b>bold text
    <p>some more bold text
    

    …is converted to:

    <h1><i>heading</i></h1>
    <p>new paragraph <b>bold text</b>
    <p><b>some more bold text</b>
    
  • Tag in the wrong place:

    <h1><hr>heading</h1>
    <h2>sub<hr>heading</h2>
    

    …is converted to:

    <hr>
    <h1>heading</h1>
    <h2>sub</h2>
    <hr>
    <h2>heading</h2>
    
  • Missing “/” in end tags:

    <a href="#refs">References<a>

    …is converted to:

    <a href="#refs">References</a>

  • List markup with missing tags:

    <body>
    <ul>
    <li>1st list item
    <li>2nd list item
    

    …is converted to:

    <body>
    <ul>
    <li>1st list item</li>
    <li>2nd list item</li>
    </ul>
    

    Note that Tidy will warn about the missing ul end tag, but not about the optional li end tag.

  • Missing quotation marks around attribute values

    Tidy inserts quotation marks around all attribute values for you. It can also detect when you have forgotten the closing quotation mark, although this is something you will have to fix yourself.

  • Unknown/proprietary attributes

    Tidy has a comprehensive knowledge of the attributes defined in HTML5. That often allows you to spot where you have mis-typed an attribute.

  • Tags lacking a terminating >

    This is something you then have to fix yourself as Tidy cannot determine where the > was meant to be inserted.

Running Tidy

Running Tidy in a Terminal (Console)

This is the syntax for invoking Tidy from the command line:

tidy [[options] filename]

Tidy defaults to reading from standard input, so if you run Tidy without specifying the filename argument, it will just sit there waiting for input to read.

Tidy defaults to writing to standard output. So you can pipe output from Tidy to other programs, as well as pipe output from other programs to Tidy. You can page through the output from Tidy by piping it to a pager, e.g.:

tidy file.html | less

To have Tidy write its output to a file instead, either use the -o filename or -output filename option, or redirect standard output to the file. For example:

tidy -o output.html index.html
tidy index.html > output.html

Both of those run Tidy on the file index.html and write the output to the file output.html, while writing any error messages to standard error.

Tidy defaults to writing its error messages to standard error (that is, to the console where you’re running Tidy). To page through the error messages along with the output, redirect standard error to standard output, and pipe it to your pager:

tidy index.html 2>&1 | less

To have Tidy write the errors to a file instead, either use the -f filename or -file filename option, or redirect standard error to a file:

tidy -o output.html -f errs.txt index.html
tidy index.html > output.html 2> errs.txt

Both of those run Tidy on the file index.html and write the output to the file output.html, while writing any error messages to the file errs.txt.

Writing the error messages to a file is especially useful if the file you are checking has many errors; reading them from a file instead of the console or pager can make it easier to review them.

You can use the -m or -modify option to modify (in-place) the contents of the input file you are checking; that is, to overwrite those contents with the output from Tidy. For example:

tidy -f errs.txt -m index.html

That runs Tidy on the file index.html, modifying it in place and writing the error messages to the file errs.txt.

Caution: If you use the -m option, you should first ensure that you have a backup copy of your file.
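
The caution above can be built into a small wrapper. The following is a minimal shell sketch, not part of Tidy itself: the function name tidy_in_place, the .bak suffix, and the return value 3 for a failed backup are all assumptions chosen for illustration. It copies the file first and runs tidy -m only if the copy succeeded.

```shell
# Hypothetical helper (not part of Tidy): back up a file, then tidy it in place.
# The ".bak" suffix and the return value 3 are arbitrary choices for this sketch.
tidy_in_place() {
  file=$1
  # Stop here if the backup cannot be made, so the original is never at risk.
  cp -p -- "$file" "$file.bak" || return 3
  # -m overwrites the input file with Tidy's output, as described above.
  tidy -m "$file"
}
```

For example, tidy_in_place index.html leaves the original contents in index.html.bak, and the function returns Tidy's own exit status.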

Running Tidy in Scripts

If you run Tidy from Perl, bash, or another scripting language, it can be useful to inspect the exit status that Tidy returns: 0 if everything is fine, 1 if there were warnings, and 2 if there were errors. This is an example using Perl:

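# TIDY is assumed to be a pipe filehandle opened earlier in the script.
# close() returns false if tidy exited with a non-zero status.
if (close(TIDY) == 0) {
  my $exitcode = $? >> 8;   # the exit code is in the high byte of $?
  if ($exitcode == 1) {
    printf STDERR "tidy issued warning messages\n";
  } elsif ($exitcode == 2) {
    printf STDERR "tidy issued error messages\n";
  } else {
    die "tidy exited with code: $exitcode\n";
  }
} else {
  printf STDERR "tidy detected no errors\n";
}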
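
For shell scripts, the same check can be sketched as follows. This is a minimal illustration, not part of Tidy: the function names describe_tidy_status and check_html are assumptions, and the messages simply mirror those in the Perl example. It relies only on the exit codes described above.

```shell
# Hypothetical helpers (not part of Tidy) that mirror the Perl example.

# Print a short description of a Tidy exit status.
describe_tidy_status() {
  case "$1" in
    0) echo "tidy detected no errors" ;;
    1) echo "tidy issued warning messages" ;;
    2) echo "tidy issued error messages" ;;
    *) echo "tidy exited with code: $1" ;;
  esac
}

# Run Tidy on one file, discard its output, and describe the result.
check_html() {
  rc=0
  # "|| rc=$?" captures a non-zero status without aborting under "set -e".
  tidy "$1" >/dev/null 2>&1 || rc=$?
  describe_tidy_status "$rc"
}
```

Calling check_html index.html prints one of the messages above, depending on how Tidy exited.
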

Tidy Options and Configuration

From the Terminal (Console)

To get a list of available options, use:

tidy -help

To get a list of all configuration settings, use:

tidy -help-config

To see the default configuration values, use:

tidy -export-default-config

To read the help output a page at a time, pipe it to a pager, e.g.:

tidy -help | less
tidy -help-config | less

Single-letter options other than -f may be combined; for example:

tidy -f errs.txt -imu foo.html

Using a configuration file

The most convenient way to configure Tidy is by using a separate configuration file.

Assuming you have created a Tidy configuration file named config.txt (the name and extension don’t matter), you can instruct Tidy to use it via the command line option -config config.txt; for example:

tidy -config config.txt file1.html file2.html

Alternatively, you can name the default config file via the environment variable named HTML_TIDY, the value of which is the absolute path for the config file.
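
Because HTML_TIDY must hold an absolute path, a relative file name has to be resolved before the variable is exported. The shell sketch below shows one way to do that; the function name use_tidy_config is hypothetical, and config.txt is just the example name used above.

```shell
# Hypothetical helper (not part of Tidy): point HTML_TIDY at a config file.
use_tidy_config() {
  case "$1" in
    /*) HTML_TIDY=$1 ;;        # already an absolute path
    *)  HTML_TIDY=$PWD/$1 ;;   # resolve relative to the current directory
  esac
  export HTML_TIDY
}
```

After use_tidy_config config.txt, Tidy runs started from the same shell session inherit the variable and, per the description above, treat that file as the default configuration.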

You can also set config options on the command line by preceding the name of the option immediately (no intervening space) with the string “--”; for example:

tidy --break-before-br true --show-warnings false

You can find Quick Reference documentation for your version of Tidy that describes the full set of configuration options on our API and Quick Reference Page.

Sample Configuration File

The following is an example of a Tidy config file.

// sample config file for HTML tidy
indent: auto
indent-spaces: 2
wrap: 72
markup: yes
output-xml: no
input-xml: no
show-warnings: yes
numeric-entities: yes
quote-marks: yes
quote-nbsp: yes
quote-ampersand: no
break-before-br: no
uppercase-tags: no
uppercase-attributes: no
char-encoding: latin1
new-inline-tags: cfif, cfelse, math, mroot,
  mrow, mi, mn, mo, msqrt, mfrac, msubsup, munderover,
  munder, mover, mmultiscripts, msup, msub, mtext,
  mprescripts, mtable, mtr, mtd, mth
new-blocklevel-tags: cfoutput, cfquery
new-empty-tags: cfelse

Featured Options and Solutions

Indenting output for readability

Indenting the source markup of an HTML document makes the markup easier to read. Tidy can indent the markup for an HTML document while recognizing elements whose contents should not be indented. In the example below, Tidy indents the output while preserving the formatting of the <pre> element:

Input:

<html>
 <head>
 <title>Test document</title>
 </head>
 <body>
 <p>This example shows how Tidy can indent output while preserving
 formatting of particular elements.</p>

 <pre>This is
 <em>genuine
       preformatted</em>
    text
 </pre>
 </body>
 </html>

Output:

<html>
  <head>
    <title>Test document</title>
  </head>

  <body>
    <p>This example shows how Tidy can indent output while preserving
    formatting of particular elements.</p>
<pre>
This is
<em>genuine
       preformatted</em>
   text
</pre>
  </body>
</html>

Tidy’s indenting behavior is not perfect and can sometimes cause your output to be rendered by browsers in a different way than the input. You can avoid unexpected indenting-related rendering problems by setting indent:no or indent:auto in a config file.

Preserving original indenting not possible

Tidy is not capable of preserving the original indenting of the markup from the input it receives. That’s because Tidy starts by building a clean parse tree from the input, and that parse tree doesn’t contain any information about the original indenting. Tidy then pretty-prints the parse tree using the current config settings. Trying to preserve the original indenting from the input would interact badly with the repair operations needed to build a clean parse tree, and would considerably complicate the code.

Encodings and character references

Tidy defaults to assuming you want output to be encoded in UTF-8. But Tidy offers you a choice of other character encodings: US ASCII, ISO Latin-1, and the ISO 2022 family of 7 bit encodings.

Tidy doesn’t yet recognize the use of the HTML <meta> element for specifying the character encoding.

The full set of HTML character references is defined. Cleaned-up output uses named character references for characters when appropriate. Otherwise, characters outside the normal range are output as numeric character references.

Accessibility

Tidy offers advice on potential accessibility problems for people using non-graphical browsers. Have a look at our rescued HTML Tidy Accessibility Checker page.

Cleaning up presentational markup

Some tools generate HTML with presentational elements such as <font>, <nobr>, and <center>. Tidy’s -clean option will replace those elements with <style> elements and CSS.

Some HTML documents rely on the presentational effects of <p> start tags that are not followed by any content. Tidy deletes such <p> tags (as well as any headings that don’t have content). So do not use <p> tags simply for adding vertical whitespace; instead use CSS, or the <br> element. However, note that Tidy won’t discard <p> tags that are followed by any non-breaking space (that is, the &nbsp; named character reference).

Teaching Tidy about new tags

You can teach Tidy about new tags by declaring them in the configuration file. The syntax is:

new-inline-tags: tag1, tag2, tag3
new-empty-tags: tag1, tag2, tag3
new-blocklevel-tags: tag1, tag2, tag3
new-pre-tags: tag1, tag2, tag3

The same tag can be declared as both empty and inline, or as both empty and block-level, to define a new empty inline or empty block-level element. Declaring a tag as both inline and block-level is not advised.

Note that the new tags can only appear where Tidy expects inline or block-level tags respectively. That means you can’t place new tags within the document head or other contexts with restricted content models.
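
As an illustration, the shell sketch below writes a small configuration file that can then be passed to Tidy with -config. The tag names are taken from the sample configuration file earlier on this page (cfelse is declared both inline and empty); the function name write_tag_config and the file names in the usage comment are assumptions.

```shell
# Hypothetical helper (not part of Tidy): write a config file declaring new tags.
write_tag_config() {
  cat > "$1" <<'EOF'
// new tags, reusing names from the sample config file
new-inline-tags: cfif, cfelse
new-empty-tags: cfelse
new-blocklevel-tags: cfoutput, cfquery
EOF
}

# Usage (file names are placeholders):
#   write_tag_config tags.txt
#   tidy -config tags.txt page.html
```

Keeping the declarations in a file rather than on the command line makes it easier to reuse the same tag list across runs.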

Ignoring PHP, ASP, and JSTE instructions

Tidy will gracefully ignore many cases of PHP, ASP, and JSTE instructions within element content and as replacements for attributes, and preserve them as-is in output; for example:

<option <% if rsSchool.Fields("ID").Value
  = session("sessSchoolID")
  then Response.Write("selected") %>
  value='<%=rsSchool.Fields("ID").Value%>'>
  <%=rsSchool.Fields("Name").Value%>
  (<%=rsSchool.Fields("ID").Value%>)
</option>

But note that Tidy may report missing attributes when those are “hidden” within the PHP, ASP, or JSTE code. If you use PHP, ASP, or JSTE code to create a start tag, but place the end tag explicitly in the HTML markup, Tidy won’t be able to match them up, and will delete the end tag. In that case you are advised to make the start tag explicit and to use PHP, ASP, or JSTE code for just the attributes; for example:

<a href="<%=random.site()%>">do you feel lucky?</a>

Tidy can also get things wrong if the PHP, ASP, or JSTE code includes quotation marks; for example:

value="<%=rsSchool.Fields("ID").Value%>"

Tidy will see the quotation mark preceding ID as ending the attribute value, and proceed to complain about what follows.

Tidy allows you to control whether line wrapping on spaces within PHP, ASP, and JSTE instructions is enabled; see the wrap-php, wrap-asp, and wrap-jste config options.

Correcting well-formedness errors in XML markup

Tidy can help you to correct well-formedness errors in XML markup. Tidy doesn’t yet recognize all XML features, though; for example, it doesn’t understand CDATA sections or DTD subsets.

Building Tidy

Source code

Tidy’s source code can be found at https://github.com/htacg/tidy-html5. There are sometimes several branches, but in general Master is the most recently updated version. Note that because it is “cutting edge,” it may have bugs or other unstable behavior. If you prefer a stable, officially released version, be sure to have a look at Releases on the GitHub page.

In general you can use the Download ZIP button on the GitHub page to download the most recent version of a branch. If you prefer Git then you can use, e.g.:

git clone git@github.com:htacg/tidy-html5.git

…to clone the repository to your working machine.

Build the tidy command-line tool and libtidy library

On Linux, BSD, macOS, and Windows, you can build and install the tidy command-line tool from the source code using the following steps (where the commands differ by platform, both variants are shown):

  1. cd {your-tidy-html5-directory}/build/cmake

  2. cmake ../.. [-DCMAKE_INSTALL_PREFIX=/path/for/install]

  3. Windows: cmake --build . --config Release
    Unix/OS X: make

  4. Install, if desired:
    Windows: cmake --build . --config Release --target INSTALL
    Unix/OS X: [sudo] make install

Note that make install must be run either as root or via sudo.

FAQs

What now?

If you see a pop-up window with text similar to the following:

HTML Tidy for Windows <vers 1st August 2002; built on Aug 8 2002, at 15:41:13>
Parsing Console input <stdin>

…and do not know what to do next, read on.

Tidy is waiting for your HTML to come in so that it can parse it. Tidy is fundamentally a tool that reads in HTML, cleans it up, and then writes it out again. It was developed as a program you run from the console prompt, but there are GUI encapsulations available, e.g. HTML-Kit, which you might prefer.

From the console prompt you can run Tidy like this:

C> tidy -m mywebpage.html

In this case, the -m option requests Tidy to write the tidied file back to the same filename as it read from (mywebpage.html). Tidy will give you a breakdown of the problems it found and the version of HTML the file appears to be using.

To get a listing of Tidy command line options, just type tidy -?. To see a listing of configuration options, try tidy -help-config. To get more info on the config options, see the applicable Quick Reference.

How to get support and/or file a bug report and/or feature request

For support and/or to file a bug report for HTACG’s HTML Tidy, please use our bug tracker. For general Tidy support, including for different versions of Tidy and for products that use libtidy, a good location is the original W3C mailing list html-tidy@w3.org.

Best practice to submit a bug report

Prior to submitting a bug report, please check that the bug is not already known. Many are. If you are not sure, just ask. If it is a new bug, make sure to include at least the following information in your report:

  • A description of what you think went wrong.
  • The HTML Tidy version (find it out by running tidy -v), and operating system you are running.
  • The input that exposes the bug. A small HTML document that reproduces the problem is best.
  • The configuration options you’ve used. Command line options like -asxml, configuration files, etc. You may use tidy -show-config to get an overview of the active Tidy settings.
  • Your e-mail address for further questions and comments.

This information is necessary to reproduce whatever is failing; without it we cannot help you.

Please include only one bug per report. Reports with multiple bugs are less easy to track and some bugs may get missed.

Best practice to submit a feature request

If you want Tidy to do something new that it doesn’t do today (or to stop doing something), then it is probably a feature request.

As with bugs, please be sure that the feature has not already been requested. If the feature has already been requested, you can add your comments to the issue tracker. If the feature has not already been requested, send the same information as for a bug report, but place special emphasis on the desired output for a given input, desired options, etc. Please be as specific as possible about what you want Tidy to do.

How Do I Control the Output Layout?

There are three primary options that control how Tidy formats your markup:

  • indent
  • indent-attributes
  • vertical-space

Briefly, indent sets the level of left-to-right indenting and, somewhat, how often elements are put onto a new line. The options are yes, no, and auto.

indent-attributes is a flag that, when set, tells Tidy to put each attribute on a new line.

vertical-space is a flag that, when set, tells Tidy to add some empty lines for readability.

The default for all three is no. These options may be used in any combination to control how you want your markup to look. The best thing is to experiment a bit to see what you like. Be aware that indent: yes is not recommended for production use, as it will cause visual changes in most browsers.

To get Tidy Classic --indent auto layout, use the following options:

indent: auto
indent-attributes: no
vertical-space: yes
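
The same three settings can also be passed on the command line using the -- prefix described under Tidy Options and Configuration. A minimal shell sketch follows; the wrapper name tidy_classic is hypothetical.

```shell
# Hypothetical wrapper (not part of Tidy): apply the layout settings above
# on the command line, passing any further arguments through to tidy.
tidy_classic() {
  tidy --indent auto --indent-attributes no --vertical-space yes "$@"
}
```

For example, tidy_classic -o output.html index.html formats index.html with these settings and writes the result to output.html.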

You can read about more pretty print options in the applicable Quick Reference.

What version of Tidy should I Use?

The current HTACG builds are recommended. You can find these in the GitHub repository or on our website.

Please continue to report examples where Tidy does not catch some ill-formed HTML, or (worse) generates ill-formed HTML. These cases have been significantly reduced. That said, be sure to test Tidy with some representative files from your environment.

For building a front end (e.g. GUI or language binding), the simplest approach is to use libtidy. For more information about building and coding with libtidy, see the Introduction To libtidy.

How do I Run a regression test?

You might ask, “Why should I run a regression test?” If you are a Tidy user, you might want to compare a new version of Tidy to the version you are currently running. This is a good idea if you are using Tidy in production applications such as web publishing. If you are a Tidy developer, it is a good idea to run the regression test suite to make sure your fix or enhancement doesn’t add new bugs.

Detecting new bugs is easier said than done because sometimes they are subtle and can only be seen in browsers (or one particular browser you don’t even have). You can catch most crashes and many layout problems by running the test suite as described here.

The basic process is simple: run the test suite before and after making changes to libtidy and compare the output markup and messages. Be aware that the test scripts for Windows (alltest.cmd) and Linux/Unix (testall.sh) place the output files in tidy/test/tmp. If you forget to run the before test, you can always download a binary or check out the previous version of the branch you are testing.

Here are the steps to evaluate the impact of a libtidy change.

Note: these steps may or may not be accurate as of 2015-October-16. Please submit a bug report if you verify whether or not these instructions still work before we do.

Regression test for Windows

Before making changes:

C:\tidy\test> alltest.cmd
C:\tidy\test> ren tmp baseline

After making changes and building Tidy:

C:\tidy\test> alltest.cmd
C:\tidy\test> windiff tmp baseline

Regression test for Mac/Linux/Unix

Before making changes:

~/tidy/test$ ./testall.sh
~/tidy/test$ mv tmp baseline

After making changes and building Tidy:

~/tidy/test$ ./testall.sh
~/tidy/test$ diff -u tmp baseline > diff.txt

License

HTML parser and pretty printer

Copyright © 1998-2003 World Wide Web Consortium (Massachusetts Institute of Technology, European Research Consortium for Informatics and Mathematics, Keio University). All Rights Reserved.

Copyright © 2003-2015 by additional contributors.

This software and documentation is provided “as is,” and the copyright holders and contributing author(s) make no representations or warranties, express or implied, including but not limited to, warranties of merchantability or fitness for any particular purpose or that the use of the software or documentation will not infringe any third party patents, copyrights, trademarks or other rights.

The copyright holders and contributing author(s) will not be held liable for any direct, indirect, special or consequential damages arising out of any use of the software or documentation, even if advised of the possibility of such damage.

Permission is hereby granted to use, copy, modify, and distribute this source code, or portions hereof, documentation and executables, for any purpose, without fee, subject to the following restrictions:

  1. The origin of this source code must not be misrepresented.
  2. Altered versions must be plainly marked as such and must not be misrepresented as being the original source.
  3. This Copyright notice may not be removed or altered from any source or altered source distribution.

The copyright holders and contributing author(s) specifically permit, without fee, and encourage the use of this source code as a component for supporting the Hypertext Markup Language in commercial products. If you use this source code in a product, acknowledgment is not required but would be appreciated.