Learning HTML 3.2 by Examples, section 3 General remarks on the syntax of HTML:

Character set

The character repertoire available to the author of HTML documents is not fixed exactly, but according to the specifications it should contain the ISO Latin 1 set, also known as ISO 8859-1 because it belongs to the ISO 8859 set of standards. Notice that the encoding of characters may vary. The default encoding is the one specified in ISO 8859-1, and it is the only encoding that browsers are required to support. (The HTTP protocol specifies how information about encoding is to be passed along with a document.)
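The difference between a character and its encoding can be illustrated with a short Python sketch. It is only an illustration and not part of the HTML specifications; UTF-8 is an assumption chosen here purely for contrast with ISO 8859-1.

```python
# One character from the ISO Latin 1 repertoire: a with diaeresis (code value 228).
ch = "\u00e4"

# In ISO 8859-1 each character is a single byte equal to its code value.
latin1_bytes = ch.encode("iso-8859-1")

# UTF-8 (used here only for contrast) encodes the same character as two bytes.
utf8_bytes = ch.encode("utf-8")

print(latin1_bytes)  # b'\xe4'
print(utf8_bytes)    # b'\xc3\xa4'

# Decoding bytes under the wrong encoding yields two wrong characters.
garbled = utf8_bytes.decode("iso-8859-1")
print(len(garbled))  # 2
```

This is why the encoding information passed along with a document matters: the same bytes mean different characters under different encodings.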

In addition to character repertoire and encoding (of characters by bit combinations), there is a special feature which is fixed in HTML: the interpretation of numerical character escapes of the form &#n; where n is a number. Such an escape is to be interpreted as the character corresponding to n in ISO 10646 and Unicode. In practice, browsers cannot represent all ISO 10646 characters, but the specifications imply that if a browser presents &#n; as a character, it must use the ISO 10646 character. (Unfortunately, browsers often violate this.)
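The intended interpretation of &#n; can be sketched in Python as follows. The function name decode_numeric_escapes is hypothetical, and the sketch handles only the decimal form described above. It relies on the fact that Python's chr() maps a number to the Unicode character with that code.

```python
import re

def decode_numeric_escapes(text: str) -> str:
    """Replace each escape of the form &#n; with the character whose code is n."""
    # chr() interprets n as a Unicode (ISO 10646) code value, regardless of
    # how the surrounding document happens to be encoded.
    return re.sub(r"&#(\d+);", lambda m: chr(int(m.group(1))), text)

print(decode_numeric_escapes("J&#228;rvi, Espa&#241;a"))  # Järvi, España
```

Note that the result does not depend on any document encoding: 228 always denotes the same character.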

In practice, you should use ISO Latin 1 characters only. Neither now nor in the near future can you expect general support for extensions to it, although support for some national alphabets may exist nationally. Support for ISO Latin 1 should exist in all browsers, but there are problems even with this. You may of course decide to stick to the ASCII character set, which is a subset of ISO Latin 1, especially if you do not need letters with diacritic marks (or, in general, letters other than the English letters a to z).
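To check which of these repertoires a piece of text stays within, one might use a sketch like the following. The function name repertoire is hypothetical, and the check is a simplification: it looks only at the highest code value and does not reject control characters.

```python
def repertoire(text: str) -> str:
    """Classify text as 'ASCII', 'ISO Latin 1', or 'other' by its highest code value."""
    highest = max((ord(c) for c in text), default=0)
    if highest < 128:
        return "ASCII"        # code values 0..127
    if highest < 256:
        return "ISO Latin 1"  # code values up to 255
    return "other"            # outside ISO Latin 1

print(repertoire("plain text"))    # ASCII
print(repertoire("Espa\u00f1a"))   # ISO Latin 1
print(repertoire("\u03b1\u03b2"))  # other (Greek letters)
```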

The printable characters of ASCII (with code values from 32 to 126 in decimal) are the following:

  ! " # $ % & ' ( ) * + , - . /
0 1 2 3 4 5 6 7 8 9 : ; < = > ?
@ A B C D E F G H I J K L M N O
P Q R S T U V W X Y Z [ \ ] ^ _
` a b c d e f g h i j k l m n o
p q r s t u v w x y z { | } ~ 

The other printable characters of ISO Latin 1 (with code values from 160 to 255 in decimal) are the following:

  ¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ ­ ® ¯
° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿
À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï
Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß
à á â ã ä å æ ç è é ê ë ì í î ï
ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ

Note: The presentation of some characters in copies of this document may be defective, e.g. due to lack of font support. Naturally, the appearance of characters varies from one font to another.
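The two tables above can be regenerated from the code values alone, as in this minimal Python sketch. The function name code_table and the layout of 16 characters per row are assumptions made to match the tables. How the characters actually appear still depends on the font and the terminal.

```python
def code_table(first: int, last: int, per_row: int = 16) -> str:
    """Lay out the characters with code values first..last in rows of per_row."""
    rows = []
    # Start each row at a multiple of per_row so columns line up by code value.
    for start in range(first - first % per_row, last + 1, per_row):
        cells = [chr(c) for c in range(start, start + per_row) if first <= c <= last]
        rows.append(" ".join(cells))
    return "\n".join(rows)

print(code_table(32, 126))   # printable ASCII characters
print(code_table(160, 255))  # the other printable ISO Latin 1 characters
```

The first character of each table (the space, code 32, and the no-break space, code 160) is invisible, which is why the first rows appear to start with a blank.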

If your keyboard or text editor does not allow you to enter (i.e. to type directly) some ISO Latin 1 characters such as ä or ñ, you can use the character escape conventions.
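Such escapes can also be produced mechanically. The following Python sketch converts every non-ASCII character in a string to a numerical escape of the form &#n;. The function name escape_non_ascii is hypothetical; the "xmlcharrefreplace" error handler is a standard part of Python's codec machinery.

```python
def escape_non_ascii(text: str) -> str:
    """Replace every non-ASCII character with a decimal escape of the form &#n;."""
    # The "xmlcharrefreplace" handler substitutes &#n; for each character
    # that the target encoding (here ASCII) cannot represent.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

print(escape_non_ascii("J\u00e4rvi, Espa\u00f1a"))  # J&#228;rvi, Espa&#241;a
```

The result contains ASCII characters only, so it can be typed or transmitted even where ISO Latin 1 support is unreliable.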

Some practical warnings to those who create HTML documents on microcomputers:

Date of last update: 2010-12-16.
This page belongs to the free information site IT and communication, section Web authoring and surfing, by Jukka "Yucca" Korpela.