HTML5 is not SGML-based, and there will be no official DTD for it. SGML can express only a rather limited set of rules for syntax. Yet, a DTD is useful for validation.
SGML validation helps to detect unintentional low-level mistakes in code, such as a forgotten end tag (e.g., a missing </i> may turn large parts of text to italic) or a mistyped attribute name (e.g., aling for align).
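As a rough illustration of these two kinds of mistake, here is a minimal Python sketch. It is not SGML validation and does not use any DTD: it relies on the standard html.parser module, the ALLOWED table is a made-up, deliberately tiny rule set, and it treats every end tag as required, which real HTML does not.

```python
from html.parser import HTMLParser

# Hypothetical, deliberately tiny rule set: element name -> allowed attribute names.
ALLOWED = {"p": {"align", "class", "id"}, "i": {"class", "id"}}


class TinyChecker(HTMLParser):
    """Collects messages about unknown attributes and unclosed elements."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of currently open element names
        self.messages = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag and attribute names in lowercase.
        for name, _value in attrs:
            if name not in ALLOWED.get(tag, set()):
                self.messages.append(f"unknown attribute '{name}' on <{tag}>")
        self.open_elements.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_elements:
            self.messages.append(f"unexpected end tag </{tag}>")
            return
        # Everything opened after the matching start tag was never closed.
        while True:
            top = self.open_elements.pop()
            if top == tag:
                break
            self.messages.append(f"missing end tag for <{top}>")


def check(markup):
    """Return a list of messages for the given markup fragment."""
    checker = TinyChecker()
    checker.feed(markup)
    checker.close()
    # Elements still open at the end of input were never closed either.
    for tag in reversed(checker.open_elements):
        checker.messages.append(f"missing end tag for <{tag}>")
    return checker.messages


for message in check('<p aling="center">Some <i>italic text.</p>'):
    print(message)
```

For the sample fragment, the sketch reports the mistyped aling attribute and the missing </i>. A real validator does the same kind of lookup, but against the full set of declarations in a DTD.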
SGML validation also helps you check that you use only the tags and attributes you intended to use and, when dealing with old pages, detect the use of features that you would rather get rid of. This is best achieved when you are in full command of the tags and attributes that are allowed. My DTD generator is a step in that direction.
Note that the HTML5 mode in the W3C Markup Validator, as well as in the Validator.nu service, applies a fixed set of rules, which generally reflects the current state of HTML5 drafts with some delay. This means, among other things, that it reports as errors the use of many tags and attributes that have been in HTML for a long time and are universally or almost universally supported by browsers. To people who wish or need to keep using such features extensively, such checkers are of limited usefulness. It is difficult to pick out the real syntax errors from a pile of messages expressing dislike for some markup.
To use my experimental HTML5 DTD, or more exactly a DTD for a markup language closely resembling HTML5, take the following steps:
doctype
declaration, if present):<!DOCTYPE HTML SYSTEM "absurl/html5.dtd">
http://
) of the DTD.
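The doctype step above can be sketched in Python as follows. This is only an illustration under assumptions: the function name set_doctype and the example.com URL are hypothetical, the declaration is assumed to go at the very start of the document, and the regular expression handles only simple doctype declarations without an internal subset.

```python
import re

# Matches a simple doctype declaration at the very start of the document
# (leading whitespace allowed). Declarations with an internal subset
# ([...]) are not handled by this sketch.
_DOCTYPE = re.compile(r"\A\s*<!DOCTYPE[^>]*>\s*", re.IGNORECASE)


def set_doctype(document, dtd_url):
    """Return the document with its doctype replaced by (or, if absent,
    prefixed with) a declaration that points at the DTD at dtd_url."""
    declaration = f'<!DOCTYPE HTML SYSTEM "{dtd_url}">\n'
    body = _DOCTYPE.sub("", document, count=1)
    return declaration + body


# Hypothetical absolute URL; substitute the real location of the DTD.
page = "<!DOCTYPE html>\n<title>Test</title>\n<p>Hello"
print(set_doctype(page, "http://example.com/dtd/html5.dtd"))
```

The document can then be submitted to a DTD-based validator in the usual way.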
You can alternatively use the permissive HTML DTD, which additionally contains features mentioned in HTML5 drafts but declared obsolete there, such as the font tag and the align attribute.
You can also use my DTD generator to select a set of tags as you like.
NAME or ID or IDREF in HTML 4.01) is more restrictive than in HTML5. Without this restriction, validators would not e.g. check the uniqueness of id attributes.
The frame attribute is allowed in the table element, despite being declared obsolete in HTML5. This is needed to describe the shorthand attribute border in SGML.
data- attributes are not allowed, as the rule for allowing them cannot be expressed in SGML. (It would be possible to add the feature of allowing a given set of additional attributes.)
aria- attributes are not supported.
datasrc, datafld, and dataformatas (obsolete per HTML5) are not supported.
The math element is defined as having just flow content, and no other MathML markup is recognized (partly because it would complicate things a lot and would require the problematic entities).
svg.
The rb element (obsolete per HTML5) is not supported.
hidden="" as opposed to hidden or hidden="hidden").
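Two of the points in the list above, the uniqueness check on id values and the open-ended data- attribute names, can be illustrated with a small Python sketch. The attribute set and function names are hypothetical, and the prefix test is a simplification of the HTML5 rule for data- names. The point is only that a DTD declares a finite, enumerated list of attribute names, whereas the data- rule matches names by pattern.

```python
from collections import Counter

# Hypothetical subset of attribute names, standing in for the finite list
# of attributes that a DTD declares for an element.
DECLARED_ATTRIBUTES = {"id", "class", "title", "hidden"}


def is_declared(name):
    """DTD-style check: look the name up in an enumerated list."""
    return name in DECLARED_ATTRIBUTES


def is_allowed_with_data_rule(name):
    """Pattern-style check (simplified): declared names, plus any name that
    starts with data- and has at least one character after the prefix."""
    has_data_prefix = name.startswith("data-") and len(name) > len("data-")
    return is_declared(name) or has_data_prefix


def duplicate_ids(id_values):
    """Return the id values that occur more than once, sorted. This is the
    kind of check a validator can make when id is declared as ID."""
    counts = Counter(id_values)
    return sorted(value for value, n in counts.items() if n > 1)


print(is_declared("data-price"))                # False
print(is_allowed_with_data_rule("data-price"))  # True
print(duplicate_ids(["intro", "nav", "intro"])) # ['intro']
```

Allowing a given set of additional attributes, as mentioned in the list, would correspond to adding names to the enumerated set. That still would not cover arbitrary data- names.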