Obviously, since some characters such as < are used with a very special meaning in HTML, there must be some way of expressing them as data characters, i.e. for the cases where they should appear as such, for example as part of the document text itself or in a URL. The convention is that the following notations are used:
character | notation | usual name(s) of the character |
---|---|---|
< | &lt; | less than character, left angle bracket |
> | &gt; | greater than character, right angle bracket |
& | &amp; | ampersand |
Technically speaking, it is not always necessary to use the escape notations for the characters listed above. It is, however, easier and safer to follow the simple rules, which always work.
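The "always escape" rule is easy to mechanize. The following Python sketch is only an illustration: the function name escape_data is hypothetical, and the comparison with html.escape from Python's standard library is an added cross-check, not something the HTML specifications prescribe.

```python
import html


def escape_data(text: str) -> str:
    """Replace the three markup-significant characters with escape notations."""
    # The ampersand must be handled first; otherwise the ampersands
    # introduced by the other two replacements would be escaped again.
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    return text


sample = "if a < b && b > c"
print(escape_data(sample))
# With quote=False the standard library performs the same three replacements.
print(escape_data(sample) == html.escape(sample, quote=False))
```

Note that the semicolon is always written, in line with the habit recommended in this document.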
There was a notation &quot; for the double quote (") in HTML 2.0, but it does not belong to HTML 3.2 (for certain technical reasons). The double quote can be typed as such within normal text, and (in principle at least) within quoted strings as well if single quotes are used as the outermost quotes.
Notice that the semicolon is part of the escape sequence. In principle, it is necessary only if the following character would otherwise be recognized as part of the name. In practice, it is best to adopt the habit of always terminating an escape sequence with a semicolon.
In escape sequences, the case of letters is significant. For example, the ampersand & may not be represented as &AMP; (this escape sequence is undefined), and the escape sequences &auml; and &Auml; denote two distinct characters, a umlaut (a dieresis, the letter a with two dots above it) in lower case and in upper case (ä and Ä); notice the principle of uppercasing only the first letter in the escape notation (&AUML; is undefined).
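The case sensitivity of the names can be checked against a table of defined names. The sketch below uses name2codepoint from Python's html.entities module; that table holds the HTML 4 entity names, which is an assumption of convenience here (it is a superset of the ISO Latin 1 names discussed in this document). The helper name lookup is hypothetical.

```python
from html.entities import name2codepoint
from typing import Optional


def lookup(name: str) -> Optional[str]:
    """Return the character denoted by &name; or None if the name is undefined."""
    # Dictionary lookup is case sensitive, just like the escape notations.
    code = name2codepoint.get(name)
    return None if code is None else chr(code)


for name in ("auml", "Auml", "AUML", "amp", "AMP"):
    result = lookup(name)
    if result is None:
        print(f"&{name}; is undefined")
    else:
        print(f"&{name}; denotes {result!r}")
```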
The need for the above-mentioned escape sequences arises from the syntax of HTML. In fact there are escape sequences for all characters in the ISO Latin 1 character set. Some examples:
notation | character |
---|---|
&copy; | copyright sign, © |
&reg; | registered trademark sign, ® |
&nbsp; | non-breaking space |
However, there is usually little reason to use escape sequences other than &lt; and &gt; and &amp;. Using &auml; instead of ä might seem to give some character code independency, but it does not; if a browser can display &auml; correctly, it can also correctly display a document in which the character ä is specified directly. But notice that sometimes you cannot input some special characters directly due to keyboard restrictions, and in such cases you may have use for notations like &auml;.
And please notice that "character ä" means the ISO Latin 1 character with name "small letter a with diaeresis" (diaeresis = umlaut), with code 344 in octal, 228 in decimal. It can be entered into an HTML document in various ways. It is possible that pressing a key labeled with ä or Ä is not among those ways. For instance, on a Macintosh with Scandinavian keyboard the ä key normally produces a character quite different from ä in ISO Latin 1. Various programs may or may not handle this by performing character code conversions.
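The code values mentioned above, and the kind of mismatch a different platform character code can cause, can be illustrated in Python. This is a sketch under an assumption: Python's mac_roman codec is used here as a stand-in for "the character code of a Macintosh"; which code a particular system or program actually uses is not something this example can tell.

```python
# "\u00e4" is the letter a with diaeresis, written as an escape so that
# the example does not depend on the encoding of this text itself.
A_UMLAUT = "\u00e4"

code = ord(A_UMLAUT)
print(code, oct(code))  # 228 0o344

# The same character as a single byte in two different character codes.
latin1_bytes = A_UMLAUT.encode("latin-1")
mac_bytes = A_UMLAUT.encode("mac_roman")
print(latin1_bytes, mac_bytes)

# Interpreting the Macintosh byte as ISO Latin 1 does not give back the letter,
# which is why a character code conversion is needed somewhere along the way.
misread = mac_bytes.decode("latin-1")
print(misread == A_UMLAUT)  # False
```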
Some browsers support other escape sequences than those mentioned above, for example &trade; and &cbsp;. The use of such notations is strongly discouraged. (Notation &trade; refers to a symbol which does not belong to ISO Latin 1 at all; you may wish to use the HTML 3.2 conformant notation <SUP>(TM)</SUP> instead. Notation &cbsp; stands for "conditional breaking space", not in ISO Latin 1 and possibly not intended to be a character at all.)
This name concept occurs in the description of HTTP-EQUIV and NAME attributes of the META element and in the description of NAME attribute of the PARAM element.
In other contexts, a string which is used to name something may contain other characters as well but then it must be quoted.
It is of course possible that, due to software or hardware limitations, not all colors can be presented. On some devices, the actual rendering might be just black and white or different shades of grey.
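When a color is specified as the value of an attribute, there are two possibilities: one of the predefined color names, or a numerical RGB value in hexadecimal notation of the form "#RRGGBB".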
The following table lists the predefined color names and their numerical equivalents.
Black = "#000000" | Green = "#008000" |
Silver = "#C0C0C0" | Lime = "#00FF00" |
Gray = "#808080" | Olive = "#808000" |
White = "#FFFFFF" | Yellow = "#FFFF00" |
Maroon = "#800000" | Navy = "#000080" |
Red = "#FF0000" | Blue = "#0000FF" |
Purple = "#800080" | Teal = "#008080" |
Fuchsia = "#FF00FF" | Aqua = "#00FFFF" |
These colors were originally picked as being the standard 16 colors supported with the Windows VGA palette. The HTML 3.2 Reference Specification contains a section on colors with sample images in each of the 16 colors.
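As an illustration of how the two forms of a color value relate, here is a minimal Python sketch that turns either form into red, green and blue components. The function name parse_color is hypothetical, and treating color names as case insensitive is an assumption made for this example.

```python
import re
from typing import Tuple

# The 16 predefined names and their numerical equivalents, as in the table above.
COLOR_NAMES = {
    "black": "#000000", "green": "#008000",
    "silver": "#C0C0C0", "lime": "#00FF00",
    "gray": "#808080", "olive": "#808000",
    "white": "#FFFFFF", "yellow": "#FFFF00",
    "maroon": "#800000", "navy": "#000080",
    "red": "#FF0000", "blue": "#0000FF",
    "purple": "#800080", "teal": "#008080",
    "fuchsia": "#FF00FF", "aqua": "#00FFFF",
}

HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def parse_color(value: str) -> Tuple[int, int, int]:
    """Return (red, green, blue), each 0..255, for a color name or #RRGGBB value."""
    value = value.strip()
    # Assumption: names are matched without regard to case.
    hex_value = COLOR_NAMES.get(value.lower(), value)
    if not HEX_COLOR.fullmatch(hex_value):
        raise ValueError(f"not a color name or #RRGGBB value: {value!r}")
    # Each pair of hexadecimal digits gives one component.
    red, green, blue = (int(hex_value[i:i + 2], 16) for i in (1, 3, 5))
    return (red, green, blue)


print(parse_color("Teal"))
print(parse_color("#C0C0C0"))
```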
Pixel values used in several contexts like width specifications refer to screen pixels. The physical size of a pixel depends on the user's screen.
People often ask "for what resolution should I write?". See the WDG Web Authoring FAQ, question "For what screen size should I write?", for a short answer.
A browser should multiply the pixel values by an appropriate factor when rendering to very high resolution devices such as laser printers. For instance if a browser has a display with 75 pixels per inch and is rendering to a laser printer with 600 dots per inch, then it should multiply the pixel values given in HTML attributes by a factor of 8.
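The arithmetic in the example above can be written out as a small Python sketch. The function name scale_pixels and its parameter names are hypothetical; rounding to the nearest whole dot is an assumption, since the text above only specifies the multiplication.

```python
def scale_pixels(pixels: int, screen_ppi: float, device_dpi: float) -> int:
    """Scale a pixel value given in HTML to dots on a higher resolution device."""
    # 600 dots per inch on the printer versus 75 pixels per inch on the
    # display gives a factor of 600 / 75 = 8.
    factor = device_dpi / screen_ppi
    return round(pixels * factor)


# A width of 100 pixels on a 75 ppi display, rendered on a 600 dpi printer.
print(scale_pixels(100, 75, 600))  # 800
```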
The HTML 2.0 specification says:
Use of the non-breaking space and soft hyphen indicator characters is discouraged because support for them is not widely deployed.
This is somewhat misleading. The soft hyphen should really be avoided; it serves no useful purpose in HTML. But as regards the non-breaking space, it seems to be honored rather well in its basic meaning described above. And although the HTML 3.2 Reference Specification is not explicit about the matter in general, it suggests, in the discussion of the NOWRAP attribute of TH and TD elements, that &nbsp; should act as a non-breaking space within table cells at least.
If you use non-breaking spaces, use them instead of normal spaces, not in addition to them. For instance, if you wish to prevent a line break between version and 3, type version&nbsp;3 (not version &nbsp;3).
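The rule "instead of, not in addition to" can be sketched in Python as follows. The function name keep_together is hypothetical; it is meant for short phrases that must stay on one line, not for whole paragraphs.

```python
def keep_together(phrase: str) -> str:
    """Join the words of a phrase with &nbsp; in place of the ordinary spaces."""
    # split() without arguments discards all whitespace between the words,
    # so no normal space is left next to the inserted &nbsp; notations.
    return "&nbsp;".join(phrase.split())


print(keep_together("version 3"))  # version&nbsp;3
```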
On the other hand, within a table in HTML 3.2, &nbsp; can have a quite different meaning, which can be described as non-empty space: on several browsers, when a table is presented with borders, cells with empty contents are drawn without them, and spaces alone do not constitute contents - but &nbsp; does! So there is a difference between <TD></TD> and <TD>&nbsp;</TD>. (Netscape also ignores background color suggestions for a table cell unless there is some content, at least &nbsp;, in the cell.) Notice that there can be better ways to deal with empty cells than to use no-break spaces.
For further confusion, some people use &nbsp; to force spaces into the visible presentation of a document, e.g. by putting one &nbsp; or a few of them at the beginning of a paragraph to get its first line indented. This actually works on most browsers, but it is unwise to rely on that, and it is normally useless to try to enforce such presentation features anyway. Indentation can be rather successfully suggested using stylesheets. (And consider what happens when a user has carefully designed a user stylesheet which presents paragraphs that way. If you use the &nbsp; hack, that user - who presumably really cares about the presentation of paragraphs - will see the first lines of paragraphs on your pages doubly indented!) The trick of using &nbsp; between words inside a paragraph to create wider spacing is probably less risky. Other tricks which utilize the common but non-guaranteed treatment of &nbsp; by browsers include using it to create a "flexible pseudo-table" and to try to make options in a SELECT menu be of equal width.
See also notes on the no-break space in ISO-8859 briefing and resources by Alan Flavell.
You can begin a comment with the four-character sequence <!-- (less than sign, exclamation sign, two hyphens) and terminate it with the three-character sequence --> (two hyphens, greater than sign). Don't use the character pair -- or the character > within a comment. For example:
<!-- Written by Jukka Korpela -->

The reason for the above rule for not using > within a comment is not the syntax of HTML but known deficiencies of popular browsers. A practical consequence is that you should not try to "comment out" parts of your document; any HTML markup in such parts would confuse many browsers.
For a more thorough discussion of comment syntax, see document HTML comments by WDG.
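The practical rule given above (no -- and no > inside a comment) can be checked mechanically. The following Python sketch is only an illustration of that rule of thumb, not of the full comment syntax; the function name make_comment is hypothetical.

```python
def make_comment(text: str) -> str:
    """Wrap text in HTML comment delimiters, rejecting risky contents."""
    # Both restrictions follow the practical advice above: "--" because of
    # the comment syntax, ">" because of known browser deficiencies.
    if "--" in text or ">" in text:
        raise ValueError("comment text must not contain '--' or '>'")
    return f"<!-- {text} -->"


print(make_comment("Written by Jukka Korpela"))
```

An attempt to "comment out" markup, such as make_comment("<P>old text</P>"), is rejected by this check because the text contains >.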
It is generally preferable to put metainformation about the document into HTML elements, such as META. Consider making information about purpose, author, creation and last update time etc. a visible part of the document itself, too.
Thus, comments should be inserted in rare cases only, e.g. to comment the HTML code itself to explain things that may look odd. Remember that a comment is part of an HTML file, to be transmitted whenever the document is delivered. Therefore, to avoid wasting bandwidth, if you have a long story to tell, put it into a separate document and insert just its URL into a comment.
HTML editors and converters often insert a few comment lines into the beginning of an HTML file. Such indications can be helpful and should not be removed.