An experiment on SGML-based syntax for HTML5


HTML5 is not SGML-based, and there will be no official DTD for it. SGML can express only a rather limited set of rules for syntax. Yet, a DTD is useful for validation.

SGML validation helps to detect unintentional low-level mistakes in code, such as a forgotten end tag (e.g., a missing </i> may turn large parts of the text italic) or a mistyped attribute name (e.g., aling for align).
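To illustrate the first kind of mistake, here is a minimal sketch in Python. It is not SGML validation and does not use a DTD. It only uses the standard html.parser module to list elements that were opened but never closed. The set REQUIRES_END_TAG and the names EndTagChecker and find_unclosed are hypothetical, chosen for this illustration only.

```python
from html.parser import HTMLParser

# Assumption: a small, hypothetical set of elements whose end tag is required.
# A real DTD would state this for every element.
REQUIRES_END_TAG = {"i", "b", "em", "strong", "a"}


class EndTagChecker(HTMLParser):
    """Tracks start tags that never get a matching end tag."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # list of (tag name, line number), oldest first

    def handle_starttag(self, tag, attrs):
        if tag in REQUIRES_END_TAG:
            line, _column = self.getpos()
            self.open_tags.append((tag, line))

    def handle_endtag(self, tag):
        # Close the most recently opened element with this name, if any.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i][0] == tag:
                del self.open_tags[i]
                break


def find_unclosed(markup):
    """Return (tag, line) pairs for elements still open at end of input."""
    checker = EndTagChecker()
    checker.feed(markup)
    checker.close()
    return checker.open_tags


if __name__ == "__main__":
    sample = "<p>Some <i>emphasis\n<p>More <b>bold</b> text</p>"
    for tag, line in find_unclosed(sample):
        print(f"line {line}: <{tag}> is never closed")
```

A validator driven by a DTD does much more than this, since the DTD also says which elements may appear inside which. The sketch only shows why a forgotten </i> is the kind of error that a mechanical check catches easily.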

SGML validation also helps you check that you use only the tags and attributes you intended to use and, when dealing with old pages, to detect use of features that you would rather get rid of. This is best achieved when you are in full command of the tags and attributes that are allowed. My DTD generator is a step in that direction.
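The idea of being in command of the allowed tags and attributes can be sketched as follows. This is only an illustration in Python using the standard html.parser module, not the DTD generator and not SGML validation. The ALLOWED table and the names AllowlistChecker and check are hypothetical. A DTD would express the same information as element and attribute declarations.

```python
from html.parser import HTMLParser

# Hypothetical allowlist: tag name -> set of permitted attribute names.
# In practice you would list exactly the features you intend to use.
ALLOWED = {
    "p": {"align", "class"},
    "i": {"class"},
    "a": {"href", "class"},
}


class AllowlistChecker(HTMLParser):
    """Reports tags and attributes that are not in the allowlist."""

    def __init__(self, allowed):
        super().__init__()
        self.allowed = allowed
        self.problems = []  # list of (line number, message)

    def handle_starttag(self, tag, attrs):
        line, _column = self.getpos()
        if tag not in self.allowed:
            self.problems.append((line, f"tag <{tag}> not allowed"))
            return
        for name, _value in attrs:
            if name not in self.allowed[tag]:
                self.problems.append(
                    (line, f"attribute {name!r} not allowed on <{tag}>")
                )


def check(markup, allowed=ALLOWED):
    """Return a list of (line, message) pairs for disallowed markup."""
    checker = AllowlistChecker(allowed)
    checker.feed(markup)
    checker.close()
    return checker.problems


if __name__ == "__main__":
    sample = '<p aling="center">Hi <i>there</i> <font>x</font></p>'
    for line, message in check(sample):
        print(f"line {line}: {message}")
```

With this allowlist, the mistyped aling and the unwanted font tag are both reported, while align on p would pass. Changing the table changes what counts as an error, which is the kind of control a customized DTD gives you.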

Note that the HTML5 mode in the W3C Markup Validator as well as in the service applies a fixed set of rules, which generally reflects the current state of HTML5 drafts with some delay. This means, among other things, that it reports as errors the use of many tags and attributes that have been in HTML for a long time and are universally or almost universally supported by browsers. To people who wish or need to keep using such features extensively, such checkers are of limited usefulness. It is difficult to pick out the real syntax errors from a pile of messages expressing dislike for some markup.

How to use it

To use my experimental HTML5 DTD, more exactly a DTD for a markup language closely resembling HTML5, take the following steps:


You can alternatively use the permissive HTML DTD, which additionally contains features mentioned in HTML5 drafts but declared obsolete there, such as the font tag and the align attribute.

You can also use my DTD generator to select a set of tags as you like.

Limitations and features