The ISO Latin 1 character repertoire – a description with usage notes

General information about ISO Latin 1

Introduction

This document describes the ISO Latin 1 (ISO 8859-1) characters in detail from a practical point of view, with usage notes. The notes are largely based on the Unicode standard. One of the purposes is to remove common misconceptions and uncertainty regarding the meanings of characters. Too often people base their conceptions on some visible form (glyph) of a character and may therefore identify e.g. German sharp s with Greek letter beta!

For basic concepts and terms on character sets, please refer to A tutorial on character code issues by the same author. See also information about other ISO 8859 character sets.

For the different methods of presenting ISO Latin 1 characters in HTML documents, please consult e.g. Table of Character Entities for ISO Latin-1. For additional historical notes on some of the characters, see Character histories.

See also Unicode charts in PDF format, where the ISO Latin 1 characters appear in blocks Basic Latin (= Ascii) and Latin 1 Supplement.

Code table

The following table summarizes ISO Latin 1. Each cell contains a character which acts as a link to a description of that character. If your browser shows links underlined, the presentation probably looks a bit messy; notice that on most browsers, the underlining can be turned off from the browser settings.


The upper half (first six rows) of the table contains the printable ASCII characters. Even for this well-established character repertoire, some characters might be different on different devices (in addition to the normal, expected variation in glyph design).

See the detailed descriptions of the characters below for reliable information about each character.

Detailed information on characters

List of characters in code number order

The following table contains all ISO Latin 1 characters in code position order.

The character names used in this document are the official names as in the original (1987) version of the ISO 8859-1 standard, followed, where the names differ, by the official (primary) Unicode name (from version 2.0 onwards) in parentheses. The revised (1998) version of ISO 8859-1 uses the Unicode names. The following minor differences are not mentioned in the list: in Unicode,

all names beginning with CAPITAL LETTER or SMALL LETTER are prefixed with the word LATIN
the ISO 8859-1 names ending with WITH GRAVE ACCENT, WITH ACUTE ACCENT, and WITH CIRCUMFLEX ACCENT lack the word ACCENT

The reference spelling of the names is in small capitals (even for the names of lowercase letters!). However, the names are regarded as case-insensitive.
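
The two systematic differences listed above are mechanical, so they can be sketched in a few lines of Python. The function name `iso_name_to_unicode` is made up for this illustration, and it covers only those two rules; names that differ in other ways (the ones shown with a Unicode name in parentheses in the list) are not handled. The result is compared with Python's `unicodedata` module, which reports Unicode names.

```python
import unicodedata

# Suffixes from which Unicode drops the final word ACCENT.
ACCENT_SUFFIXES = (" WITH GRAVE ACCENT", " WITH ACUTE ACCENT", " WITH CIRCUMFLEX ACCENT")


def iso_name_to_unicode(iso_name: str) -> str:
    """Apply the two systematic differences to a 1987-style ISO 8859-1 name."""
    name = iso_name.upper()  # names are case-insensitive; compare in capitals
    if name.startswith(("CAPITAL LETTER ", "SMALL LETTER ")):
        name = "LATIN " + name
    if name.endswith(ACCENT_SUFFIXES):
        name = name[: -len(" ACCENT")]
    return name


if __name__ == "__main__":
    for iso_name, char in [
        ("CAPITAL LETTER A WITH GRAVE ACCENT", "\u00c0"),
        ("SMALL LETTER e WITH ACUTE ACCENT", "\u00e9"),
        ("CIRCUMFLEX ACCENT", "^"),  # a standalone name, left unchanged
    ]:
        converted = iso_name_to_unicode(iso_name)
        print(converted, converted == unicodedata.name(char))
```

Note that the suffix test deliberately includes the word WITH, so that the standalone characters CIRCUMFLEX ACCENT, GRAVE ACCENT and ACUTE ACCENT keep their names.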

All ISO 8859-1 characters, in code number order
dec	oct	hex	glyph	official ISO 8859-1 (and Unicode) name
32	40	20		SPACE
33	41	21	!	EXCLAMATION MARK
34	42	22	"	QUOTATION MARK
35	43	23	#	NUMBER SIGN
36	44	24	$	DOLLAR SIGN
37	45	25	%	PERCENT SIGN
38	46	26	&	AMPERSAND
39	47	27	'	APOSTROPHE
40	50	28	(	LEFT PARENTHESIS
41	51	29	)	RIGHT PARENTHESIS
42	52	2A	*	ASTERISK
43	53	2B	+	PLUS SIGN
44	54	2C	,	COMMA
45	55	2D	-	HYPHEN, MINUS SIGN (HYPHEN-MINUS)
46	56	2E	.	FULL STOP
47	57	2F	/	SOLIDUS
48	60	30	0	DIGIT ZERO
49	61	31	1	DIGIT ONE
50	62	32	2	DIGIT TWO
51	63	33	3	DIGIT THREE
52	64	34	4	DIGIT FOUR
53	65	35	5	DIGIT FIVE
54	66	36	6	DIGIT SIX
55	67	37	7	DIGIT SEVEN
56	70	38	8	DIGIT EIGHT
57	71	39	9	DIGIT NINE
58	72	3A	:	COLON
59	73	3B	;	SEMICOLON
60	74	3C	<	LESS-THAN SIGN
61	75	3D	=	EQUALS SIGN
62	76	3E	>	GREATER-THAN SIGN
63	77	3F	?	QUESTION MARK
64	100	40	@	COMMERCIAL AT
65	101	41	A	CAPITAL LETTER A
66	102	42	B	CAPITAL LETTER B
67	103	43	C	CAPITAL LETTER C
68	104	44	D	CAPITAL LETTER D
69	105	45	E	CAPITAL LETTER E
70	106	46	F	CAPITAL LETTER F
71	107	47	G	CAPITAL LETTER G
72	110	48	H	CAPITAL LETTER H
73	111	49	I	CAPITAL LETTER I
74	112	4A	J	CAPITAL LETTER J
75	113	4B	K	CAPITAL LETTER K
76	114	4C	L	CAPITAL LETTER L
77	115	4D	M	CAPITAL LETTER M
78	116	4E	N	CAPITAL LETTER N
79	117	4F	O	CAPITAL LETTER O
80	120	50	P	CAPITAL LETTER P
81	121	51	Q	CAPITAL LETTER Q
82	122	52	R	CAPITAL LETTER R
83	123	53	S	CAPITAL LETTER S
84	124	54	T	CAPITAL LETTER T
85	125	55	U	CAPITAL LETTER U
86	126	56	V	CAPITAL LETTER V
87	127	57	W	CAPITAL LETTER W
88	130	58	X	CAPITAL LETTER X
89	131	59	Y	CAPITAL LETTER Y
90	132	5A	Z	CAPITAL LETTER Z
91	133	5B	[	LEFT SQUARE BRACKET
92	134	5C	\	REVERSE SOLIDUS
93	135	5D	]	RIGHT SQUARE BRACKET
94	136	5E	^	CIRCUMFLEX ACCENT
95	137	5F	_	LOW LINE
96	140	60	`	GRAVE ACCENT
97	141	61	a	SMALL LETTER a
98	142	62	b	SMALL LETTER b
99	143	63	c	SMALL LETTER c
100	144	64	d	SMALL LETTER d
101	145	65	e	SMALL LETTER e
102	146	66	f	SMALL LETTER f
103	147	67	g	SMALL LETTER g
104	150	68	h	SMALL LETTER h
105	151	69	i	SMALL LETTER i
106	152	6A	j	SMALL LETTER j
107	153	6B	k	SMALL LETTER k
108	154	6C	l	SMALL LETTER l
109	155	6D	m	SMALL LETTER m
110	156	6E	n	SMALL LETTER n
111	157	6F	o	SMALL LETTER o
112	160	70	p	SMALL LETTER p
113	161	71	q	SMALL LETTER q
114	162	72	r	SMALL LETTER r
115	163	73	s	SMALL LETTER s
116	164	74	t	SMALL LETTER t
117	165	75	u	SMALL LETTER u
118	166	76	v	SMALL LETTER v
119	167	77	w	SMALL LETTER w
120	170	78	x	SMALL LETTER x
121	171	79	y	SMALL LETTER y
122	172	7A	z	SMALL LETTER z
123	173	7B	{	LEFT CURLY BRACKET
124	174	7C	|	VERTICAL LINE
125	175	7D	}	RIGHT CURLY BRACKET
126	176	7E	~	TILDE
Code positions here (127 - 159 decimal) are reserved for control characters
160	240	A0		NO-BREAK SPACE
161	241	A1	¡	INVERTED EXCLAMATION MARK
162	242	A2	¢	CENT SIGN
163	243	A3	£	POUND SIGN
164	244	A4	¤	CURRENCY SIGN
165	245	A5	¥	YEN SIGN
166	246	A6	¦	BROKEN BAR
167	247	A7	§	PARAGRAPH SIGN, SECTION SIGN (SECTION SIGN)
168	250	A8	¨	DIAERESIS
169	251	A9	©	COPYRIGHT SIGN
170	252	AA	ª	FEMININE ORDINAL INDICATOR
171	253	AB	«	LEFT ANGLE QUOTATION MARK (LEFT-POINTING DOUBLE ANGLE QUOTATION MARK)
172	254	AC	¬	NOT SIGN
173	255	AD		SOFT HYPHEN
174	256	AE	®	REGISTERED TRADE MARK SIGN (REGISTERED SIGN)
175	257	AF	¯	MACRON
176	260	B0	°	RING ABOVE, DEGREE SIGN (DEGREE SIGN)
177	261	B1	±	PLUS-MINUS SIGN
178	262	B2	²	SUPERSCRIPT TWO
179	263	B3	³	SUPERSCRIPT THREE
180	264	B4	´	ACUTE ACCENT
181	265	B5	µ	MICRO SIGN
182	266	B6	¶	PILCROW SIGN
183	267	B7	·	MIDDLE DOT
184	270	B8	¸	CEDILLA
185	271	B9	¹	SUPERSCRIPT ONE
186	272	BA	º	MASCULINE ORDINAL INDICATOR
187	273	BB	»	RIGHT ANGLE QUOTATION MARK (RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK)
188	274	BC	¼	VULGAR FRACTION ONE QUARTER
189	275	BD	½	VULGAR FRACTION ONE HALF
190	276	BE	¾	VULGAR FRACTION THREE QUARTERS
191	277	BF	¿	INVERTED QUESTION MARK
192	300	C0	À	CAPITAL LETTER A WITH GRAVE ACCENT
193	301	C1	Á	CAPITAL LETTER A WITH ACUTE ACCENT
194	302	C2	Â	CAPITAL LETTER A WITH CIRCUMFLEX ACCENT
195	303	C3	Ã	CAPITAL LETTER A WITH TILDE
196	304	C4	Ä	CAPITAL LETTER A WITH DIAERESIS
197	305	C5	Å	CAPITAL LETTER A WITH RING ABOVE
198	306	C6	Æ	CAPITAL DIPHTHONG A WITH E (LATIN CAPITAL LETTER AE)
199	307	C7	Ç	CAPITAL LETTER C WITH CEDILLA
200	310	C8	È	CAPITAL LETTER E WITH GRAVE ACCENT
201	311	C9	É	CAPITAL LETTER E WITH ACUTE ACCENT
202	312	CA	Ê	CAPITAL LETTER E WITH CIRCUMFLEX ACCENT
203	313	CB	Ë	CAPITAL LETTER E WITH DIAERESIS
204	314	CC	Ì	CAPITAL LETTER I WITH GRAVE ACCENT
205	315	CD	Í	CAPITAL LETTER I WITH ACUTE ACCENT
206	316	CE	Î	CAPITAL LETTER I WITH CIRCUMFLEX ACCENT
207	317	CF	Ï	CAPITAL LETTER I WITH DIAERESIS
208	320	D0	Ð	CAPITAL ICELANDIC LETTER ETH (LATIN CAPITAL LETTER ETH)
209	321	D1	Ñ	CAPITAL LETTER N WITH TILDE
210	322	D2	Ò	CAPITAL LETTER O WITH GRAVE ACCENT
211	323	D3	Ó	CAPITAL LETTER O WITH ACUTE ACCENT
212	324	D4	Ô	CAPITAL LETTER O WITH CIRCUMFLEX ACCENT
213	325	D5	Õ	CAPITAL LETTER O WITH TILDE
214	326	D6	Ö	CAPITAL LETTER O WITH DIAERESIS
215	327	D7	×	MULTIPLICATION SIGN
216	330	D8	Ø	CAPITAL LETTER O WITH OBLIQUE STROKE (LATIN CAPITAL LETTER O WITH STROKE)
217	331	D9	Ù	CAPITAL LETTER U WITH GRAVE ACCENT
218	332	DA	Ú	CAPITAL LETTER U WITH ACUTE ACCENT
219	333	DB	Û	CAPITAL LETTER U WITH CIRCUMFLEX ACCENT
220	334	DC	Ü	CAPITAL LETTER U WITH DIAERESIS
221	335	DD	Ý	CAPITAL LETTER Y WITH ACUTE ACCENT
222	336	DE	Þ	CAPITAL ICELANDIC LETTER THORN (LATIN CAPITAL LETTER THORN)
223	337	DF	ß	SMALL GERMAN LETTER SHARP s (LATIN SMALL LETTER SHARP S)
224	340	E0	à	SMALL LETTER a WITH GRAVE ACCENT
225	341	E1	á	SMALL LETTER a WITH ACUTE ACCENT
226	342	E2	â	SMALL LETTER a WITH CIRCUMFLEX ACCENT
227	343	E3	ã	SMALL LETTER a WITH TILDE
228	344	E4	ä	SMALL LETTER a WITH DIAERESIS
229	345	E5	å	SMALL LETTER a WITH RING ABOVE
230	346	E6	æ	SMALL DIPHTHONG a WITH e (LATIN SMALL LETTER AE)
231	347	E7	ç	SMALL LETTER c WITH CEDILLA
232	350	E8	è	SMALL LETTER e WITH GRAVE ACCENT
233	351	E9	é	SMALL LETTER e WITH ACUTE ACCENT
234	352	EA	ê	SMALL LETTER e WITH CIRCUMFLEX ACCENT
235	353	EB	ë	SMALL LETTER e WITH DIAERESIS
236	354	EC	ì	SMALL LETTER i WITH GRAVE ACCENT
237	355	ED	í	SMALL LETTER i WITH ACUTE ACCENT
238	356	EE	î	SMALL LETTER i WITH CIRCUMFLEX ACCENT
239	357	EF	ï	SMALL LETTER i WITH DIAERESIS
240	360	F0	ð	SMALL ICELANDIC LETTER ETH (LATIN SMALL LETTER ETH)
241	361	F1	ñ	SMALL LETTER n WITH TILDE
242	362	F2	ò	SMALL LETTER o WITH GRAVE ACCENT
243	363	F3	ó	SMALL LETTER o WITH ACUTE ACCENT
244	364	F4	ô	SMALL LETTER o WITH CIRCUMFLEX ACCENT
245	365	F5	õ	SMALL LETTER o WITH TILDE
246	366	F6	ö	SMALL LETTER o WITH DIAERESIS
247	367	F7	÷	DIVISION SIGN
248	370	F8	ø	SMALL LETTER o WITH OBLIQUE STROKE (LATIN SMALL LETTER O WITH STROKE)
249	371	F9	ù	SMALL LETTER u WITH GRAVE ACCENT
250	372	FA	ú	SMALL LETTER u WITH ACUTE ACCENT
251	373	FB	û	SMALL LETTER u WITH CIRCUMFLEX ACCENT
252	374	FC	ü	SMALL LETTER u WITH DIAERESIS
253	375	FD	ý	SMALL LETTER y WITH ACUTE ACCENT
254	376	FE	þ	SMALL ICELANDIC LETTER THORN (LATIN SMALL LETTER THORN)
255	377	FF	ÿ	SMALL LETTER y WITH DIAERESIS
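
A list of this shape can be regenerated with Python's standard library, which may be handy for checking a row. This is only a sketch: `latin1_rows` is a name invented here, and the names it yields are the current Unicode names from the `unicodedata` module, not the 1987 ISO 8859-1 names shown above.

```python
import unicodedata


def latin1_rows():
    """Yield (dec, oct, hex, glyph, Unicode name) for the graphic ISO 8859-1 positions."""
    # 32..126 and 160..255; positions 127..159 are reserved for control characters.
    for code in list(range(32, 127)) + list(range(160, 256)):
        char = bytes([code]).decode("latin-1")
        yield code, format(code, "o"), format(code, "02X"), char, unicodedata.name(char)


if __name__ == "__main__":
    for row in latin1_rows():
        print("\t".join(str(field) for field in row))
```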

Detailed descriptions of the characters

Legend: The entries consist of

A (large) glyph for the character. The appearance naturally depends on the font used by your browser.
The name of the character, as defined in the ISO 8859-1 standard itself. In some cases, the Unicode name is given too, in parentheses. Not all name differences between ISO 8859-1 and Unicode standards are given; see notes on character names used in this document. Quite often various jargon names are used for the characters, especially for those in the "lower half" (ASCII range); see entry ASCII in The Jargon Lexicon.
The U+nnnn notation for the character. Here that notation also acts as a link to an entry in Indrek Hein's online character database (which has its own legend). Note that from this notation you can see code position in hexadecimal.
A numeric character reference for use in the HTML language. (For "symbolic" entity references, like &copy; for the copyright sign, see Table of Character Entities for ISO Latin-1.) This also gives the code position in decimal (normal base 10 notation), which is needed e.g. if you use "Alt-0nnn" for typing characters on Windows.
The code position in octal. This is needed e.g. if you use "C-Q-nnn" for typing characters in Emacs.
A link to this legend.
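
As a small illustration of how the notations in the legend relate to each other, the following Python sketch derives all three from a single character. The function name `notations` is an assumption made for this example; `html.unescape` is the standard library call that resolves character references.

```python
import html


def notations(char: str) -> tuple[str, str, str]:
    """Return the U+nnnn notation, the numeric character reference and the octal code."""
    code = ord(char)
    return f"U+{code:04X}", f"&#{code};", format(code, "o")


if __name__ == "__main__":
    uplus, ncr, octal = notations("\u00a9")  # the copyright sign
    print(uplus, ncr, octal)
    # The numeric and the symbolic reference resolve to the same character.
    print(html.unescape(ncr) == html.unescape("&copy;") == "\u00a9")
```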

Descriptions of characters

	SPACE	`U+0020`	`&#32;`	octal: 40

This is the well-known space character, or blank. The abbreviation SP is often used for the name of the character. The ISO 8859-1 standard defines this character formally as follows:

This character may be interpreted as a graphic character, a control character or as both. As a graphic character it has the visual representation consisting of the absence of a graphic symbol.

In different programs for processing and displaying texts, spaces in data may be handled in different ways. In particular, the inter-word gaps can be of different widths in visual presentation. In the HTML language, spaces are treated as "collapsible".
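
What "collapsible" means can be approximated in Python. This is a rough sketch only: real HTML white-space processing depends on the element and on style sheets, and the function name `collapse_spaces` is invented for the illustration. It also shows that the no-break space (code position 160) is a different character and is left alone.

```python
import re

# Characters treated here as collapsible white space: space, tab, LF, FF, CR.
_COLLAPSIBLE = re.compile(r"[ \t\n\f\r]+")


def collapse_spaces(text: str) -> str:
    """Replace each run of collapsible white space with a single SPACE."""
    return _COLLAPSIBLE.sub(" ", text)


if __name__ == "__main__":
    print(repr(collapse_spaces("two   words\n  here")))
    print(repr(collapse_spaces("10\u00a0\u00a0km")))  # no-break spaces are kept
```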

!	EXCLAMATION MARK	`U+0021`	`&#33;`	octal: 41

This character is basically used as a punctuation character at the end of an exclamation. It is also used in mathematics to denote a factorial (as in "5!" which denotes 1×2×3×4×5). Many other special usages exist; e.g. in the C programming language, the exclamation mark denotes a "not" operator (negation).

Other names (mentioned in the Unicode standard): factorial, bang.

Cf. inverted exclamation mark (¡).

This character is also used as a substitute for a similar-looking character, latin letter retroflex click (U+01C3) used in the orthography of some African languages, to denote a click sound, e.g. in the name "!Kung" (denoting a people in southern Africa). In principle the two characters are distinct, despite similarity in glyph appearance.
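
That the two characters are distinct in data, however similar they look, can be checked with Python's `unicodedata` module. The strings compared are just examples chosen for this sketch.

```python
import unicodedata

exclamation = "!"
click = "\u01c3"

for char in (exclamation, click):
    # Print code position, general category and character name.
    print(f"U+{ord(char):04X}", unicodedata.category(char), unicodedata.name(char))

# The two spellings of the name compare as different strings.
print("!Kung" == "\u01c3Kung")
```

The general category also differs: the exclamation mark is classified as punctuation, the click character as a letter, which matters e.g. when a program splits text into words.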

"	QUOTATION MARK	`U+0022`	`"`	octal: 42

This punctuation character is a "symmetric" quotation mark as opposed to "smart" or "asymmetric" quotation marks. That is, when this character is used to mark quotations, the opening quote is identical with the closing quote. Its glyph should be "neutral" (vertical) to reflect this. (The Unicode standard says about the quotation mark: "neutral (vertical), used as opening or closing quotation mark".) However, the appearance varies. It is sometimes difficult to find out what really happens, since text processing programs (word processors) like MS Word typically convert a quotation mark to a different character, often to a language-specific quotation mark, perhaps to a "smart" (curved) quotation mark in English text, a guillemet (« or ») in French text, etc. Entering the ISO Latin 1 quotation mark (ASCII quotation mark) can then be difficult; you might need to use some special "Insert Symbol" function. But you should take that path if your text really contains the ISO Latin 1 quotation mark, e.g. if your text discusses C or JavaScript code or Unix commands where that very character needs to be used. Using a "smart" (curved) quotation mark wouldn't be smart at all in such cases.

In Unicode, there are several pairs of asymmetric quotation marks, but of them, only the double angle quotation marks « and » belong to ISO Latin 1. Notice in particular that left and right double quotation marks (U+201C, U+201D) do not belong to ISO Latin 1 (although they belong to the so-called Windows character set).
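
Repertoire membership of this kind can be tested by trying to encode a character. The sketch below assumes that Python's `latin-1` codec stands for ISO Latin 1 and that its `cp1252` codec stands for the Windows character set mentioned above; `in_repertoire` is a helper named for this example.

```python
def in_repertoire(char: str, encoding: str) -> bool:
    """Return True if the character can be encoded in the given encoding."""
    try:
        char.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    samples = {
        "quotation mark": "\u0022",
        "left angle quotation mark": "\u00ab",
        "left double quotation mark": "\u201c",
        "right double quotation mark": "\u201d",
    }
    for label, char in samples.items():
        print(label, in_repertoire(char, "latin-1"), in_repertoire(char, "cp1252"))
```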

The rules for using quotation marks vary greatly from one language to another and even within a language. But when ISO Latin 1 is used, there are not many choices: you have to live with " and ' and « and ». It is much better to use these characters for quotations, even if they are regarded as typographically inferior, than to try to "construct" smart quotes from characters which are not quotes. See general reasons for being strict about meanings of characters. For example, section Quotation Marks in NASA SP-7084 should be read with caution in this respect. Please also notice that even in English there are styles different from the one described there; for example, single quotes (to be presented using apostrophes in ISO Latin 1) might be used as normal quotes and quotation marks as inner quotes.

The Unicode standard explicitly says that APL quote is identical with the quotation mark. In addition to that, the quotation mark is used in many other programming and command languages, typically to delimit string constants. In some of such languages, a string can be delimited using either quotation marks or apostrophes with no change in meaning, whereas in some others there is a definite difference. For example, in the C language, quotation marks delimit string constants whereas apostrophes delimit character constants; in Perl, quotation marks allow variable substitution within the string whereas apostrophes indicate a pure literal.

In practice, the quotation mark is also widely used as the following symbols, although they are in principle distinct from it (and each other) in Unicode:

double prime (U+2033), which is used to denote seconds (when expressing times or angles) and inches
ditto mark (U+3003)
modifier letter double prime (U+02BA), used e.g. to transliterate Cyrillic "hard sign" (tverdyj znak).

In ASCII, the quotation mark was intended to have secondary usage as diaeresis. See notes on diacritics.

#	NUMBER SIGN	`U+0023`	`&#35;`	octal: 43

In English and some other natural languages, this character is sometimes used in conjunction with ordinal numbers, as in "item #42" (meaning "item number 42"). Such usage is not very common; more often, abbreviations like "nr.", "no.", "n.", or "Nº" are used instead.

In programming languages, markup languages, etc., this character has many different uses. In some of these uses, # relates to ordinal numbers (e.g. in HTML, &#n; denotes the character which occupies code position n in Unicode) while in others it might be just a separator or have some special meaning assigned to it more or less arbitrarily. It is used e.g. in Web addresses (URL references); the URL syntax specification calls it the "crosshatch" character. That name is also mentioned in the Unicode standard, along with the following names: pound sign, hash, octothorpe. For more information on the names as well as usage, see entry Number sign in encyclopedia.laborlawtalk.com.
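
Two of the uses mentioned above can be tried out with Python's standard library: `html.unescape` resolves a numeric character reference, and `urllib.parse.urlsplit` separates the part of a Web address that follows the # character. The address used is a made-up example.

```python
import html
from urllib.parse import urlsplit

# &#n; denotes the character at code position n.
print(html.unescape("&#35;"))    # the number sign itself
print(html.unescape("&#163;"))   # the pound sign, a separate character

# In a URL reference, # separates the address from a fragment identifier.
parts = urlsplit("http://www.example.com/latin1.html#number")
print(parts.path, parts.fragment)
```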

The number sign has also been used as a surrogate for music sharp sign (U+266F), due to some similarity in appearance.

The number sign character unambiguously occupies code position 23 hexadecimal in ISO 8859-1 and in Unicode, although the Unicode standard confusingly mentions "pound sign" as an alternative name to it. Here the word "pound" means a unit of weight (pound avoirdupois, usually abbreviated "lb"), not a currency unit. However, in ASCII that code position was primarily assigned to the pound sterling sign, and some programs and devices might reflect this in their behavior (displaying £ when the data contains #). The ASCII standard said:

The symbol £ is assigned to position 23 [hexadecimal] - -. In a situation where there is no requirement for the symbol £ the symbol # (number sign) may be used in position 23. - - The chosen allocation of [a symbol to this position] for international information exchange shall be agreed between the interested parties.

Notice that the pound sign (as a currency symbol) belongs to ISO Latin 1 as a completely independent symbol in its own code position.

For notes on different names and usages for the number sign, see section names of "&", "@", and "#" in the alt.usage.english FAQ.

$	DOLLAR SIGN	`U+0024`	`&#36;`	octal: 44

This character is a famous currency symbol, but its exact meaning is not quite clear. The Unicode standard explicitly says:

this code is unambiguously dollar sign, not "currency" sign or any other currency symbol

But this is obviously to be interpreted mainly as a warning against the use of the sign to denote a currency generically; cf. the (general) currency sign, which belongs to ISO Latin 1 as a completely independent symbol in its own code position; see also notes on the dollar sign in Character histories. It is not intended to limit the use to denote only those currencies which are named "dollar", still less US dollar only. In fact, the English word "dollar" has a rather general meaning, covering "taler" as well as numerous coins patterned after the taler (such as the Spanish peso). The Unicode standard mentions "milreis" and "escudo" as alternative names for dollar sign, so obviously the symbol can be used to denote those currencies, too.

For historical notes on the origin of the $ character itself, see section Origin of the dollar sign in the alt.usage.english FAQ.

In computing, this character has secondary uses which may have nothing to do with any currency. It can, for example, be a character allowed in identifiers and used to signal a reserved or otherwise special identifier.

According to the Unicode standard, a glyph for the dollar sign may have one or two vertical bars.

The dollar sign unambiguously occupies code position 24 hexadecimal in ISO 8859-1 and in Unicode. However, in ASCII the situation was more vague, and some programs and devices might reflect that in their behavior (e.g. displaying ¤ when the data contains $). The ASCII standard says:

The - - symbol $ is assigned to position 24 [hexadecimal] - - Where there is no requirement for the symbol $ the symbol ¤ (currency sign) may be used in position 24. The chosen allocation of [a symbol to this position] for international information exchange shall be agreed between the interested parties.

%	PERCENT SIGN	`U+0025`	`&#37;`	octal: 45

This character is basically used after numbers, in the meaning 'in the hundred' or 'of each hundred'. It is commonly used immediately after a number (e.g., 50%), but quite often the official spelling requires a space (e.g., 50 %), though this depends on authority. For the historical origin, see notes on the origin of the percent sign in The History of Mathematical Symbols by Douglas Weaver.

For use with SI units, see Guide for the Use of the International System of Units (SI), section 7.10.2.

In programming languages, for example, the percent sign has very different uses which have nothing to do with percentages, e.g. as a modulus operator in C or as indicating an identifier as a hash in Perl.

Per mille sign (U+2030) and per ten thousand sign (U+2031) do not belong to ISO Latin 1. For the former, misconceptions about this may have arisen from confusion with the so-called Windows character set, which does contain the per mille sign.
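
The situation can be checked by trying to encode the two characters. As an assumption of this sketch, Python's `latin-1` codec is taken to represent ISO Latin 1 and its `cp1252` codec the Windows character set.

```python
per_mille = "\u2030"
per_ten_thousand = "\u2031"

for encoding in ("latin-1", "cp1252"):
    for char in (per_mille, per_ten_thousand):
        try:
            # Show the encoded byte when the character is in the repertoire.
            print(encoding, f"U+{ord(char):04X}", char.encode(encoding))
        except UnicodeEncodeError:
            print(encoding, f"U+{ord(char):04X}", "not in repertoire")
```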

&	AMPERSAND	`U+0026`	`&#38;`	octal: 46

In natural languages, this character normally means just 'and'. In other contexts, it has many other uses. The visual appearance of this character varies a lot; see Adobe's page The ampersand.

'	APOSTROPHE	`U+0027`	`&#39;`	octal: 47

This character has mixed usage, usually as a punctuation character. Most commonly, it is used either as an apostrophe as in English "don't" or as a single quote. (In Unicode version 1.0, this character was named "apostrophe-quote" to reflect this.) As regards use as a single quote, see notes on the use of the quotation mark.

According to Unicode, this character has a "neutral (vertical)" glyph, but in practice it may get displayed as curved. It is sometimes difficult to find out what really happens, since text processing programs (word processors) like MS Word typically convert an apostrophe to a different character, often to a language-specific quotation mark. Entering the ISO Latin 1 apostrophe (ASCII apostrophe) can then be difficult; you might need to use some special "Insert Symbol" function. But you should take that path if your text really contains the ISO Latin 1 apostrophe, e.g. if your text discusses C or JavaScript code or Unix commands where that very character needs to be used. Using a "smart" (curved) single quote wouldn't be smart at all in such cases.

In the future, as support for Unicode becomes wider, the use of this character should mostly be replaced by the use of more specific characters.

Version 2.0 of the Unicode standard said that "the preferred character for apostrophe" is the character modifier letter apostrophe (U+02BC); but this was changed in version 2.1 to the following:

U+02BC modifier letter apostrophe is preferred where the character is to represent a modifier letter (for example, in transliterations to indicate a glottal stop). In the latter case, it is also referred to as a letter apostrophe.

U+2019 right single quotation mark is preferred where the character is to represent a punctuation mark, as in "We've been here before." In the latter case, U+2019 is also referred to as a punctuation apostrophe.

The Unicode standard also discusses, in chapter 6, Punctuation, the use of quotation marks in different languages, implying that the preferred characters for opening and closing single quotation mark as used in English are left single quotation mark (U+2018) and right single quotation mark (U+2019).

The rules for using the apostrophe vary from one language to another, and even from one authority to another. For a good summary of one usage style in English, see section Apostrophe in NASA SP-7084.

Unicode defines modifier letter prime (U+02B9) and prime (U+2032) as distinct characters. The former is used mainly in linguistics to denote primary stress or palatalization (e.g. when transliterating Cyrillic soft sign). The latter is used to denote minutes or feet. When only ISO Latin 1 character repertoire is available, apostrophe can be used as a surrogate for those characters. It might look natural to use acute accent for some of such purposes, but since the whole idea is to use a replacement due to character repertoire restrictions, it is best to use a replacement that works most widely (due to being an ASCII character).
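
The surrogate use described above amounts to a character mapping applied before the text is stored as ISO Latin 1. A minimal Python sketch follows; the table `PRIME_FALLBACK` and the function `to_latin1` are names invented here, and the mapping covers only the two characters discussed.

```python
# Map modifier letter prime and prime to the apostrophe (U+0027).
PRIME_FALLBACK = str.maketrans({"\u02b9": "'", "\u2032": "'"})


def to_latin1(text: str) -> bytes:
    """Substitute apostrophes for primes, then encode as ISO 8859-1."""
    return text.translate(PRIME_FALLBACK).encode("latin-1")


if __name__ == "__main__":
    print(to_latin1("6\u2032 tall"))  # prime used for feet
    print(to_latin1("den\u02b9"))     # modifier letter prime in a transliteration
```

The substitution loses information, since both characters end up as the same byte; it is a fallback for a restricted repertoire, not a reversible conversion.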

In ASCII, the apostrophe was intended to have secondary usage as acute accent. See notes on diacritics. See also notes on the apostrophe in Character histories.

(	LEFT PARENTHESIS	`U+0028`	`&#40;`	octal: 50

This punctuation character is used as an opening delimiter for parenthetic remarks in natural languages. The rules for using such vary from one language to another, and even from one authority to another. For a good summary of one usage style in English, see section Parentheses in NASA SP-7084.

In other languages, there are various uses such as an opening delimiter for a list of parameters. Called opening parenthesis in Unicode version 1.0.

)	RIGHT PARENTHESIS	`U+0029`	`&#41;`	octal: 51

Used as a closing delimiter for parenthetic remarks opened with left parenthesis (() in natural languages. In other languages, there are various uses such as a closing delimiter for a list of parameters. Called closing parenthesis in Unicode version 1.0.

*	ASTERISK	`U+002A`	`&#42;`	octal: 52

The asterisk has various uses, including the following:

In natural languages, an asterisk or a sequence of asterisks is sometimes used as a reference to a footnote or a margin note*. Several other symbols, such as daggers and (superscript-style) digits and letters are used for such purposes too. Due to glyph problems discussed below, it is probably best to avoid the use of asterisks for such purposes and use some other notations.
*The footnote or margin note itself begins with the asterisk or sequence of asterisks.
The asterisk is sometimes used when indicating the year or date of birth, e.g. * 1952.
Especially in command languages, the asterisk is often used as a wildcard character which matches any string of characters. For example, *.txt as a command argument might refer to all file names ending with .txt.
In regular expressions, the asterisk often denotes possible repetition. For example, depending on the particular regexp syntax, ab* might denote the set of strings consisting of an a followed by any number (incl. zero) of b's, i.e. a, ab, abb, abbb etc.
In mathematics, the asterisk has several uses as an operator symbol of some kind. Generally such uses are surrogate notations for various star-like symbols with more specific semantics. Often ** indicates exponentiation.
In linguistics, a leading asterisk before a word can be used to indicate a reconstructed form (e.g.: the word king probably derives from old Germanic *kuningaz); it may also indicate an ungrammatical sentence.
In Usenet postings and some other contexts, the asterisk might occasionally be used for *emphasis* (though using _underlines_ is more common).
One of the early uses was to make a series of asterisks a "Check protector", to flank the amount of a check so one could not kite or change the value. That method was applied in punch cards and printers too, and it's still often used e.g. in password input, to help the user count characters but protect the password from prying eyes.
The asterisk is sometimes used to indicate a "masked out" character, as in "G*d".
In several programming languages, asterisk is the multiplication symbol, but it may also have other uses. For example, int *p; declares p as a pointer to int in C, in addition to use as a multiplying operator and other uses.
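
Three of the computing uses listed above can be tried in Python, which here stands in for the command and programming languages mentioned: `fnmatch` implements shell-style wildcards, `re` implements one particular regexp syntax, and `*` and `**` are Python's own multiplication and exponentiation operators. The file names and strings are made-up examples.

```python
import fnmatch
import re

# Wildcard: * matches any string of characters.
names = ["notes.txt", "notes.doc", ".txt"]
print(fnmatch.filter(names, "*.txt"))

# Regular expression: b* means any number (including zero) of b's.
for candidate in ["a", "ab", "abbb", "ba"]:
    print(candidate, bool(re.fullmatch(r"ab*", candidate)))

# Operators: multiplication and exponentiation.
print(6 * 7, 2 ** 10)
```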

When writing or quoting expressions in programming, command or other languages which have the asterisk as part of language syntax, the asterisk must of course be preserved. On the other hand, such usage should not be extended to other contexts, unless the limitations of the character repertoire prevent the use of better symbols. Specifically, in ISO Latin 1 there is a separate multiplication sign, and in some contexts the middle dot (·) is an adequate multiplication symbol.

The glyphs for the asterisk vary, but generally it appears in a more or less superscript style, perhaps in a rather small size. And it is difficult to say what an asterisk should look like, given its mixed usage. When used as an operator of some kind, it should be vertically positioned the same way as e.g. the plus sign. When used as a reference sign, and perhaps in some other uses too, it should appear in superscript style. It seems that most font designs reflect the latter style, making expressions like a*b look somewhat odd. If you cannot use a symbol with less ambiguous meaning, you might try to help things by using a font where the asterisk looks more operator-like, such as the Courier font, though even the Courier * is somewhat raised. Quite often it might be better to use a monospace font for all expressions (like a*b) quoted from programming, command etc. languages.

The name is sometimes misspelled as "asterick" or (intentionally) confused with the name Asterix (Astérix).

The Unicode standard mentions that asterisk is called "star" on phone keypads. It also mentions that the asterisk is distinct from arabic five pointed star (U+066D), asterisk operator (U+2217), and heavy asterisk (U+2731). Note that this list of Unicode characters resembling the asterisk in appearance is far from complete; see e.g. the Dingbats.

+	PLUS SIGN	`U+002B`	`&#43;`	octal: 53

The well-known plus sign, primarily used to denote addition and as a unary plus. Notice that in ISO Latin 1 the combination of plus and minus is available as a separate character, plus-minus sign (±).

,	COMMA	`U+002C`	`&#44;`	octal: 54

Primarily this character is a punctuation symbol in natural languages. The rules for using it vary from one language to another, and even from one authority to another. For a good summary of one usage style in English, see section Comma in NASA SP-7084.

Notice that in numbers, some languages (mainly English) use comma as thousands separator (e.g. "1,234" means one thousand two hundred thirty-four) whereas in many other languages it is used as a decimal point (e.g. "1,234" means the same as "1.234" in English). The Unicode standard mentions "decimal separator" as another name for the comma.
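
The two readings of "1,234" can be made explicit in code. The Python sketch below uses a hypothetical helper, `parse_number`, whose `decimal_separator` argument is an assumption of this example. In the second convention it also treats the full stop as the grouping character, which is a further assumption; other grouping styles are ignored.

```python
def parse_number(text: str, decimal_separator: str) -> float:
    """Parse a number written with ',' and '.' under the given convention."""
    if decimal_separator == ".":
        # English style: the comma only groups thousands.
        normalized = text.replace(",", "")
    elif decimal_separator == ",":
        # The full stop groups thousands and the comma marks decimals.
        normalized = text.replace(".", "").replace(",", ".")
    else:
        raise ValueError("decimal_separator must be '.' or ','")
    return float(normalized)


if __name__ == "__main__":
    print(parse_number("1,234", "."))  # one thousand two hundred thirty-four
    print(parse_number("1,234", ","))  # a little over one
```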

In ASCII, the comma was intended to have secondary usage as cedilla. See notes on diacritics.

The comma should not be confused with the Unicode character single low-9 quotation mark (U+201A), which is used in quotations in some usages.

-	HYPHEN, MINUS SIGN (HYPHEN-MINUS)	`U+002D`	`&#45;`	octal: 55

This character is a dual-purpose character: it can be used as a hyphen (punctuation character) or as a minus sign (mathematical symbol). It can usually be called "hyphen" or "minus" depending on the context, but when referred to as a character in a character repertoire, the best term is probably hyphen-minus.

The rules for using the hyphen vary from one language to another, and even from one authority to another. For a good summary of one usage style in English, see section Hyphen in NASA SP-7084.

Hyphens are also widely used as a replacement for various dashes when dashes themselves are not in the available character repertoire; see my document On the use of some MS Windows characters in HTML for the use of hyphens as a replacement for em dash and en dash.

The Unicode standard mentions "hyphen or minus sign" as a synonym for this character. An older version also mentioned "hyphus". It is best to avoid these synonyms, since the former makes statements ambiguous and the latter is just an invented word which is hardly ever used in reality.

In situations where sufficient support for Unicode can be safely assumed (very rarely at present!), it is best to replace the use of hyphen-minus by Unicode hyphen (U+2010) or non-breaking hyphen (U+2011) or minus sign (U+2212) or, if hyphen-minus had been used e.g. in place of a dash symbol, some other Unicode character such as en dash (U+2013) or em dash (U+2014) or horizontal bar (U+2015). More information: Hyphens and dashes.

Cf. to soft hyphen.

.	FULL STOP	`U+002E`	`.`	octal: 56

This character is probably better known under the name "period" (which was the name used for it in Unicode version 1.0) and is commonly used as a punctuation character but also for other purposes. The Unicode standard mentions the alternative names "dot" and "decimal point" too.

The Unicode standard uses (in section 3.3) this character to illustrate that "a character may have a broader range of use than the most literal interpretation of its name might indicate". It says:

U+002E full stop can represent a sentence period, an abbreviation period, a decimal number separator in English, a thousands number separator in German, and so on.

In addition to such usage, note that programming languages and other notations may use the full stop for purposes that do not correspond to natural-language punctuation (or the name "full stop"!) at all. In particular, it is often used as a separator between components of a hierarchic name, so that foo.bar could denote the bar component of a structure named foo (which might be read as "foo's bar").

The rules for using the period vary from one language to another, and even from one authority to another. For a good summary of one usage style in English, see section Period in NASA SP-7084.

The Unicode standard mentions that this character "may be rendered as a raised decimal point in old style numbers". This is probably to be taken as a warning against interpreting such a character as a middle dot (·).

There is a separate character horizontal ellipsis (U+2026) in Unicode, but it does not belong to ISO Latin 1. One may therefore wish to use three full stops (...) instead as three points of ellipsis. You might consider using no-break spaces between the full stops (. . .) in such cases; for HTML documents, it might be better to use CSS to suggest increased spacing between characters.

/	SOLIDUS	`U+002F`	`/`	octal: 57

This character is much more widely known as "slash" (which was its name in Unicode version 1.0). It is sometimes called "virgule" or even "shilling" (which are alternative names mentioned in the Unicode standard) or "diagonal". Do not confuse it with the reverse solidus (backslash, "\").

Solidus is used for many different purposes, typically as a separator of some kind. Ambiguities easily arise. For example, a date notation like 3/4 might mean the 3rd of April - or the 4th of March; in ISO 8601 notation, the solidus is used when expressing a time interval (e.g. 1998-03-04/04-03 unambiguously means 'from 4th of March to 3rd of April in 1998'). Sometimes the solidus separates alternatives, e.g. in a fill-out form, with the suggestion to strike out the inapplicable alternative(s). In natural languages, the solidus is often used in a very confusing way, so that "foo/bar" might mean "foo or bar", "foo alias bar", or "foo and bar", or something else; it seems to be fashionable to use it instead of the word "or", perhaps because the solidus symbol is less definite. In HTML (and in other SGML based markup languages), start and end tags are distinguished from each other by the presence of a solidus in the end tag, so that e.g. </cite> means 'end of cite element'. In Web addresses and other URLs, the solidus is a separator between hierarchic components; this usage is historically based on similar usage in pathnames in hierarchic file systems.
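The date ambiguity can be made concrete with a small Python sketch. The year 1998 is added to the "3/4" example as an assumption, so that the string can be parsed as a full date; the two format strings stand for the day-first and month-first reading conventions.

```python
from datetime import date, datetime

text = "3/4/1998"  # year added for illustration only

# The same characters, read under two different conventions
day_first = datetime.strptime(text, "%d/%m/%Y").date()
month_first = datetime.strptime(text, "%m/%d/%Y").date()

print(day_first.isoformat())    # 1998-04-03
print(month_first.isoformat())  # 1998-03-04

# An ISO 8601 date has only one reading
iso = date.fromisoformat("1998-03-04")
```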

Unicode defines fraction slash (U+2044) and division slash (U+2215) as characters distinct from solidus and from each other. (Notice that rules for using the solidus in various languages do not yet make this distinction. See e.g. section Slash in NASA SP-7084.) When only ISO Latin 1 character repertoire is available, solidus can be used as a surrogate for fraction slash. For division slash, the division sign is perhaps preferable.

Notice that for three commonly used fractions there are separate "vulgar fraction" characters in ISO Latin 1.
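As an illustration, the following Python sketch finds those three characters by name using the standard `unicodedata` module and shows how they relate to the fraction slash: their compatibility decompositions consist of digits around U+2044.

```python
import unicodedata

# Scan the Latin 1 range for characters whose Unicode name marks them as fractions.
# The empty default avoids an exception for unnamed control characters.
fractions = [
    chr(cp) for cp in range(0x100)
    if "VULGAR FRACTION" in unicodedata.name(chr(cp), "")
]

for ch in fractions:
    decomposed = unicodedata.normalize("NFKC", ch)  # digit, FRACTION SLASH, digit
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.numeric(ch), decomposed)
```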

0	DIGIT ZERO	`U+0030`	`0`	octal: 60

A digit. Definitely distinct from the letter O.

1	DIGIT ONE	`U+0031`	`1`	octal: 61

A digit. Definitely distinct from the letter l (el). Cf. to superscript one (¹).

2	DIGIT TWO	`U+0032`	`2`	octal: 62

A digit. Cf. to superscript two (²).

3	DIGIT THREE	`U+0033`	`3`	octal: 63

A digit. Cf. to superscript three (³).

4	DIGIT FOUR	`U+0034`	`4`	octal: 64

A digit.

5	DIGIT FIVE	`U+0035`	`5`	octal: 65

A digit.

6	DIGIT SIX	`U+0036`	`6`	octal: 66

A digit.

7	DIGIT SEVEN	`U+0037`	`7`	octal: 67

A digit.

8	DIGIT EIGHT	`U+0038`	`8`	octal: 70

A digit.

9	DIGIT NINE	`U+0039`	`9`	octal: 71

A digit.

:	COLON	`U+003A`	`:`	octal: 72

This character is used as a punctuation symbol in natural and other languages. The rules for using it vary from one language to another, and even from one authority to another. For a good summary of one usage style in English, see section Colon in NASA SP-7084.

The colon is also used when presenting ratios (proportions) as in "2:3", but in Unicode the character ratio (U+2236) should be used instead in such contexts. For the history, see notes on symbols for ratio and proportion in The History of Mathematical Symbols by Douglas Weaver.

;	SEMICOLON	`U+003B`	`;`	octal: 73

<	LESS-THAN SIGN	`U+003C`	`<`	octal: 74

This character basically denotes a mathematical relation. It is used for some secondary purposes as well, such as an angle bracket. See also notes on using < and > as brackets.

ISO Latin 1 does not contain a less-than or equal to (U+2264) character as Unicode does. The usual workaround is to use the character pair <= (less-than sign followed by equals sign).

=	EQUALS SIGN	`U+003D`	`=`	octal: 75

This character is used to denote equality both in mathematics (as in 2+2=4) and in other areas. It is distinct from the Unicode character identical to (U+2261).

>	GREATER-THAN SIGN	`U+003E`	`>`	octal: 76

This character basically denotes a mathematical relation. It is used for some secondary purposes as well, such as an angle bracket. See also notes on using < and > as brackets.

ISO Latin 1 does not contain a greater-than or equal to (U+2265) character as Unicode does. The usual workaround is to use the character pair >= (greater-than sign followed by equals sign).

?	QUESTION MARK	`U+003F`	`?`	octal: 77

This character is basically used as a punctuation character at the end of a direct question. The rules for using it vary from one language to another, and even from one authority to another. For a good summary of one usage style in English, see section Question Mark in NASA SP-7084.

In some languages, some space is left before the question mark. In formal notations, the question mark has special meanings. In regular expressions, for example, it typically marks the preceding item as optional, whereas in the filename patterns of many command languages it is a wildcard character that represents any single character.
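The difference between such notations is easy to overlook, so here is a small Python sketch that contrasts two of them: shell-style filename patterns (via the standard `fnmatch` module), where the question mark stands for any single character, and regular expressions (via `re`), where it makes the preceding item optional. The sample words are arbitrary.

```python
import fnmatch
import re

# Filename-pattern notation: "?" matches exactly one arbitrary character
glob_cat = fnmatch.fnmatchcase("cat", "c?t")   # True
glob_ct = fnmatch.fnmatchcase("ct", "c?t")     # False, one character is required

# Regular-expression notation: "?" makes the preceding item optional
re_color = re.fullmatch(r"colou?r", "color") is not None    # True
re_colour = re.fullmatch(r"colou?r", "colour") is not None  # True
re_cat = re.fullmatch(r"c?t", "cat") is not None            # False, matches only "t" or "ct"

print(glob_cat, glob_ct, re_color, re_colour, re_cat)
```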

Cf. to inverted question mark (¿).

@	COMMERCIAL AT	`U+0040`	`@`	octal: 100

In English, this character was originally used in conjunction with unit prices in the meaning 'each'. Its name still reflects such usage, which is relatively rare. The origin and the earliest use of this character are debated.

This character has become most widely known as a separator in Internet E-mail addresses, where it can be read as "at" rather naturally, as in jukkakk@gmail.com. It has many other special uses, too, for example in Perl to indicate that a symbol denotes an array.

A large number of names are used in different languages when referring to this character, quite often using words which try to describe the visual appearance or connotations. See A Natural History of the @ Sign by Scott Herron. The page also contains names for the character in different languages, but they are mostly jargon names or just incorrectly recorded.

In several national variants of ASCII, there is some other character in the code position of this character.

A	CAPITAL LETTER A	`U+0041`	`A`	octal: 101

A basic Latin letter.

B	CAPITAL LETTER B	`U+0042`	`B`	octal: 102

A basic Latin letter. As regards using B in place of script capital b (U+212C) (denoting Bernoulli function), see notes on letterlike symbols.

C	CAPITAL LETTER C	`U+0043`	`C`	octal: 103

A basic Latin letter. Notice that the copyright sign (©), appearing as C in a circle, is a separate symbol. As regards using C e.g. in place of double-struck capital c (U+2102), denoting the set of complex numbers, see notes on letterlike symbols.

D	CAPITAL LETTER D	`U+0044`	`D`	octal: 104

A basic Latin letter.

E	CAPITAL LETTER E	`U+0045`	`E`	octal: 105

A basic Latin letter. As regards using E e.g. in place of script capital e (U+2130), denoting electromotive force, see notes on letterlike symbols.

F	CAPITAL LETTER F	`U+0046`	`F`	octal: 106

A basic Latin letter. As regards using F e.g. in place of script capital f (U+2131), denoting Fourier transform, see notes on letterlike symbols.

G	CAPITAL LETTER G	`U+0047`	`G`	octal: 107

A basic Latin letter.

H	CAPITAL LETTER H	`U+0048`	`H`	octal: 110

A basic Latin letter. As regards using H e.g. in place of script capital h (U+210B), denoting Hamiltonian function, see notes on letterlike symbols.

I	CAPITAL LETTER I	`U+0049`	`I`	octal: 111

A basic Latin letter. As regards using I e.g. in place of black letter capital i (U+2111), denoting imaginary part, see notes on letterlike symbols.

J	CAPITAL LETTER J	`U+004A`	`J`	octal: 112

A basic Latin letter.

K	CAPITAL LETTER K	`U+004B`	`K`	octal: 113

A basic Latin letter. Also used to denote the temperature unit kelvin. (Although Unicode also has the character kelvin sign (U+212A), it is included for compatibility only and is canonically equivalent to the letter K.)
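The relationship between the kelvin sign and the letter K can be inspected with Python's standard `unicodedata` module. In the sketch below, the sample string "300 K" is made up for illustration; the point is that Unicode normalization replaces the kelvin sign by the plain letter.

```python
import unicodedata

KELVIN_SIGN = "\u212A"

# The decomposition field names the code point the kelvin sign maps to
decomposition = unicodedata.decomposition(KELVIN_SIGN)  # "004B", i.e. the letter K

# Normalizing a text that uses the kelvin sign yields the ordinary letter
raw = "300 " + KELVIN_SIGN
normalized = unicodedata.normalize("NFC", raw)

print(decomposition, normalized == "300 K")
```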

L	CAPITAL LETTER L	`U+004C`	`L`	octal: 114

A basic Latin letter. It can be used, among other things, to denote "litre"; the primary symbol for "litre" is small letter l, but since it is easily confused with digit 1 in many fonts, a capital L is often used instead. Notice that the pound sign (£), historically a variant of L, is a separate symbol. As regards using L in place of script capital l (U+2112), denoting Laplace transform, see notes on letterlike symbols.

M	CAPITAL LETTER M	`U+004D`	`M`	octal: 115

A basic Latin letter. As regards using M in place of script capital m (U+2133), denoting M-matrix, see notes on letterlike symbols.

N	CAPITAL LETTER N	`U+004E`	`N`	octal: 116

A basic Latin letter. As regards using N e.g. in place of double-struck capital n (U+2115), denoting the set of natural numbers, see notes on letterlike symbols.

O	CAPITAL LETTER O	`U+004F`	`O`	octal: 117

A basic Latin letter. Naturally, this character is distinct from the digit zero (0).

P	CAPITAL LETTER P	`U+0050`	`P`	octal: 120

A basic Latin letter. Notice that the sound recording copyright symbol (U+2117), appearing as P in a circle, does not belong to ISO Latin 1. As regards using P e.g. in place of script capital p (U+2118), denoting e.g. power set, see notes on letterlike symbols.

Q	CAPITAL LETTER Q	`U+0051`	`Q`	octal: 121

A basic Latin letter. As regards using Q e.g. in place of double-struck capital q (U+211A), denoting the set of rational numbers, see notes on letterlike symbols.

R	CAPITAL LETTER R	`U+0052`	`R`	octal: 122

A basic Latin letter. Notice that the registered sign (®), appearing as R in a circle, is a separate symbol. As regards using R e.g. in place of black letter capital r (U+211C), denoting real part, see notes on letterlike symbols.

S	CAPITAL LETTER S	`U+0053`	`S`	octal: 123

A basic Latin letter.

T	CAPITAL LETTER T	`U+0054`	`T`	octal: 124

A basic Latin letter.

U	CAPITAL LETTER U	`U+0055`	`U`	octal: 125

A basic Latin letter.

V	CAPITAL LETTER V	`U+0056`	`V`	octal: 126

A basic Latin letter.

W	CAPITAL LETTER W	`U+0057`	`W`	octal: 127

A basic Latin letter.

X	CAPITAL LETTER X	`U+0058`	`X`	octal: 130

A basic Latin letter.

Y	CAPITAL LETTER Y	`U+0059`	`Y`	octal: 131

A basic Latin letter.

Z	CAPITAL LETTER Z	`U+005A`	`Z`	octal: 132

A basic Latin letter. As regards using Z e.g. in place of double-struck capital z (U+2124), denoting the set of integers, see notes on letterlike symbols.

[	LEFT SQUARE BRACKET	`U+005B`	`[`	octal: 133

This character is sometimes used as an opening delimiter for parenthetic remarks of some special kind in natural languages, especially when such remarks are nested or they present editorial insertions, corrections, and comments in quoted material and in reference citations. In other languages, there are various uses such as an opening delimiter for an array subscript list. Called opening square bracket in Unicode version 1.0.

In several national variants of ASCII, there is some other character in the code position of this character.

\	REVERSE SOLIDUS	`U+005C`	`\`	octal: 134

This character has various uses in technical contexts, e.g. as a separator in hierarchical file names on Windows and in several "escape notations". The reverse solidus was taken into character repertoires for special usage. Later the use expanded to many technical areas. Note that the reverse solidus is especially suitable for use in "escape notations" just because it is, in a sense, an artificial creation: since it is not used in normal text, it will less likely be confused with normal data characters than other characters used for "escaping". However, confusion may still arise when different notational systems that use the reverse solidus (for different purposes) are combined; see e.g. notes on the reverse solidus in Web authoring.

In Unicode, the reverse solidus is regarded as distinct from set minus (U+2216), which is used in mathematics as an operator on sets (meaning set difference), but conceivably \ can be used as a surrogate for that character.

Called backslash in Unicode version 1.0 and very widely in actual practice.

Rather often the reverse solidus is confused with the solidus (slash) character "/". They are similar in shape, just slanted differently. But they are quite distinct characters and have different uses.

In several national variants of ASCII, there is some other character in the code position of this character.

]	RIGHT SQUARE BRACKET	`U+005D`	`]`	octal: 135

This character is sometimes used as a closing delimiter for special parenthetic remarks in natural languages. In other languages, there are various uses such as a closing delimiter for an array subscript list. Called closing square bracket in Unicode version 1.0.

In several national variants of ASCII, there is some other character in the code position of this character.

^	CIRCUMFLEX ACCENT	`U+005E`	`^`	octal: 136

This character is a spacing character which basically represents a diacritic mark. As such, it has little use: it can be used in order to mention the diacritic. Due to its spacing nature, it cannot be used to construct a character with a circumflex accent (such as â). However, a keyboard key labeled with the accent might work as a "composition key" which can be used to type characters with the corresponding diacritic.

In practice, circumflex accent is used for a variety of technical purposes e.g. in programming and command languages. It might, for example, be used as an exponentiation operator.

In ASCII, this character had the primary name "upward arrow head", and "circumflex accent" is used there as secondary name only. See notes on diacritics and notes on the circumflex in Character histories.

Called spacing circumflex in Unicode version 1.0.

In several national variants of ASCII, there is some other character in the code position of this character.

_	LOW LINE	`U+005F`	`_`	octal: 137

Probably the most typical use of this character is to make long identifiers more readable in programming languages. Due to their general syntax, such languages generally do not allow spaces in identifiers; but several programming languages allow underscores in identifiers. For example, one could write number_of_events in such languages.

In plain text, e.g. in Usenet discussions, it is customary to use a low line before and after a word or phrase to denote emphasis (e.g. "this is _very_ important") due to lack of better methods.

Called spacing underscore in Unicode version 1.0. The most usual name in practice is probably just "underscore".

This is a spacing character, so it cannot be used to underline text (except through specific processing which goes beyond simple text presentation). See however notes on the low line in Character histories.

`	GRAVE ACCENT	`U+0060`	`` ` ``	octal: 140

This character is a spacing character which basically represents a diacritic mark. As such, it has little use: it can be used in order to mention the diacritic. Due to its spacing nature, it cannot be used to construct a character with a grave accent (such as à). However, a keyboard key labeled with the accent might work as a "composition key" which can be used to type characters with the corresponding diacritic.

Sometimes the grave accent is used as a single quote, especially to create the appearance of "smart" (asymmetric) quotes, using the grave accent instead of an opening single quote and either the apostrophe or (less often) the acute accent instead of a closing single quote, as in `this' or `this´. (This is why the grave accent is often called "backquote" in jargon.) Such usage is definitely incorrect nowadays, although it reflects an old idea of dual use for the character; see notes on the grave accent in Character histories. See ASCII and Unicode Quotation Marks by Markus Kuhn. In ISO Latin 1, the apostrophe is the only adequate surrogate for a single quote.

In different notation systems, the grave accent may have various technical uses which have nothing to do with accents. For example, in many Unix shells, the grave accent is a quoting character with a special meaning, "command substitution" (sometimes even called "grave command"!).

In several national variants of ASCII, there is some other character in the code position of this character.

a	SMALL LETTER A	`U+0061`	`a`	octal: 141

A basic Latin letter. Cf. to feminine ordinal indicator (ª).

b	SMALL LETTER B	`U+0062`	`b`	octal: 142

A basic Latin letter.

c	SMALL LETTER C	`U+0063`	`c`	octal: 143

A basic Latin letter. Cf. to cent sign (¢) and c with cedilla (ç).

d	SMALL LETTER D	`U+0064`	`d`	octal: 144

A basic Latin letter.

e	SMALL LETTER E	`U+0065`	`e`	octal: 145

A basic Latin letter. Notice that the estimated symbol (U+212E, also called "EEC sign", similar in appearance to "e" but typically larger), used in European packaging, does not belong to ISO Latin 1. As regards using e e.g. in place of script small e (U+212F) ("error"), see notes on letterlike symbols.

f	SMALL LETTER F	`U+0066`	`f`	octal: 146

A basic Latin letter.

g	SMALL LETTER G	`U+0067`	`g`	octal: 147

A basic Latin letter. As regards using it e.g. in place of script small g (U+210A), used as real number symbol, see notes on letterlike symbols.

h	SMALL LETTER H	`U+0068`	`h`	octal: 150

A basic Latin letter. The Planck constant h exists as a separate symbol planck constant (U+210E) in Unicode but as a compatibility character only; it can be presented as an h in italics (e.g. using the markup <I>h</I> in HTML).
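The compatibility status of the Planck constant character can be inspected with Python's standard `unicodedata` module, as in the sketch below. The `<font>` tag in the decomposition indicates that the character differs from the plain letter h only in font style, which is why compatibility normalization (NFKC) maps it to h while canonical normalization (NFC) leaves it alone.

```python
import unicodedata

PLANCK_CONSTANT = "\u210E"

# A decomposition tagged "<font>" is a compatibility decomposition
decomposition = unicodedata.decomposition(PLANCK_CONSTANT)

nfc = unicodedata.normalize("NFC", PLANCK_CONSTANT)    # unchanged
nfkc = unicodedata.normalize("NFKC", PLANCK_CONSTANT)  # plain "h"

print(decomposition, nfc == PLANCK_CONSTANT, nfkc)
```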

i	SMALL LETTER I	`U+0069`	`i`	octal: 151

A basic Latin letter.

j	SMALL LETTER J	`U+006A`	`j`	octal: 152

A basic Latin letter.

k	SMALL LETTER K	`U+006B`	`k`	octal: 153

A basic Latin letter.

l	SMALL LETTER L	`U+006C`	`l`	octal: 154

A basic Latin letter. Used e.g. as a symbol for "litre", but see notes on capital L. As regards using l e.g. in place of script small l (U+2113) (used as symbol for litre), see notes on letterlike symbols.

m	SMALL LETTER M	`U+006D`	`m`	octal: 155

A basic Latin letter.

n	SMALL LETTER N	`U+006E`	`n`	octal: 156

A basic Latin letter.

o	SMALL LETTER O	`U+006F`	`o`	octal: 157

A basic Latin letter. Cf. to masculine ordinal indicator (º). As regards using o e.g. in place of script small o (U+2134), used to denote "order", see notes on letterlike symbols.

The letter o has often been used as a "list bullet". However, it might be read - especially by an automatic speech generator - as a word ("oh"), and in any case the use of a letter for such a purpose is illogical. Therefore, use e.g. a hyphen-minus or an asterisk instead. Notice that in the HTML language, you can just use logical elements (UL and LI) to set up lists and leave it up to browsers to present them.

p	SMALL LETTER P	`U+0070`	`p`	octal: 160

A basic Latin letter.

q	SMALL LETTER Q	`U+0071`	`q`	octal: 161

A basic Latin letter.

r	SMALL LETTER R	`U+0072`	`r`	octal: 162

A basic Latin letter.

s	SMALL LETTER S	`U+0073`	`s`	octal: 163

A basic Latin letter. Cf. to sharp s (ß).

t	SMALL LETTER T	`U+0074`	`t`	octal: 164

A basic Latin letter.

u	SMALL LETTER U	`U+0075`	`u`	octal: 165

A basic Latin letter.

v	SMALL LETTER V	`U+0076`	`v`	octal: 166

A basic Latin letter.

w	SMALL LETTER W	`U+0077`	`w`	octal: 167

A basic Latin letter.

x	SMALL LETTER X	`U+0078`	`x`	octal: 170

A basic Latin letter.

The letter x has often been used as a multiplication sign when ASCII characters only are available. In ISO Latin 1, there is no reason to do so; use multiplication sign (×) instead.
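To confirm that the multiplication sign really is available in ISO Latin 1, the following Python sketch encodes a made-up expression with the standard `latin-1` codec; the character occupies a single byte (hexadecimal D7), whereas the same text cannot be encoded in ASCII.

```python
import unicodedata

expression = "3 \u00D7 4"  # arbitrary example using the multiplication sign

encoded = expression.encode("latin-1")   # one byte per character
name = unicodedata.name("\u00D7")

try:
    expression.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False  # the multiplication sign is outside ASCII

print(encoded, name, ascii_ok)
```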

Another common usage of x is as a "wildcard"; cf. to the "wildcard" use of asterisk (*). For example, when referring to computer program versions, "4.x" means any version number beginning with "4." (i.e. any minor version of major version 4).

y	SMALL LETTER Y	`U+0079`	`y`	octal: 171

A basic Latin letter.

z	SMALL LETTER Z	`U+007A`	`z`	octal: 172

A basic Latin letter.

{	LEFT CURLY BRACKET	`U+007B`	`{`	octal: 173

This character is (rarely) used as an opening delimiter for parenthetic remarks in natural languages, especially when such remarks are nested. In other languages, there are various uses such as an opening delimiter for a comment or a parameter list.

Called opening curly bracket in Unicode version 1.0. In practice, the word "brace" is often used instead of "curly bracket".

In several national variants of ASCII, there is some other character in the code position of this character.

|	VERTICAL LINE	`U+007C`	`|`	octal: 174

This character is probably most typically used in formal languages (such as BNF) between alternatives, corresponding to the word "or". In mathematics, vertical lines are used around an expression to denote its absolute value, e.g. |-42| = 42. In some dictionaries, a vertical line is used to indicate a possible hyphenation point; there is also a quite different dictionary usage: to separate the invariable part of a word from the rest in a paragraph that describes several words that begin the same way (e.g. imitat|e ... -ion ... -ive ...). Several other usages exist, too, especially in technical contexts. In Unix shells, for example, this character is used to denote "piping" (e.g. ls | more means "execute the ls program directing its output to the more program as input").

Called vertical bar in Unicode version 1.0 and in most contexts in practice. However, the word "line" is preferable to "bar", since in Unicode there are several vertical bar symbols, and even light vertical bar (U+2758) is intended to be thicker than vertical line!

In some old fonts (and keyboards), this character appears as a broken vertical line. But notice that in ISO Latin 1, broken bar (¦) is a completely distinct character. See also notes on the vertical line in Character histories.

In several national variants of ASCII, there is some other character in the code position of this character.

}	RIGHT CURLY BRACKET	`U+007D`	`}`	octal: 175

This character is (rarely) used as a closing delimiter for parenthetic remarks in natural languages, especially when such remarks are nested. In other languages, there are various uses such as a closing delimiter for a comment or a parameter list.

Called closing curly bracket in Unicode version 1.0. In practice, the word "brace" is often used instead of "curly bracket".

In several national variants of ASCII, there is some other character in the code position of this character.

~	TILDE	`U+007E`	`~`	octal: 176

This is a spacing character of mixed usage. The word tilde is of Spanish origin and refers to a curly diacritic mark (though in Spanish, the word often denotes the acute accent too!). The name of this character thus reflects one of the originally intended uses. But currently such use has little to do with tilde as a Latin 1 character; it can be basically used just in order to mention the diacritic, but even for that it's not particularly adequate for reasons explained below. Due to its spacing nature, it cannot be used to construct a character with tilde (such as ã or ñ). However, a keyboard key labeled with the accent might work as a "composition key" which can be used to type characters with the corresponding diacritic.

In practice, tilde is used for a variety of technical purposes according to specific rules e.g. in programming and command languages. For example, in many Unix shells ~ denotes the user's home directory. Reflecting this tradition, on many Web servers people's Web pages are named in a manner which involves the tilde character. If possible, the use of tilde in Web addresses should be avoided, since it causes various problems, but it has become rather common. In Windows systems, the mapping of Windows filenames to DOS compatible filenames ("8+3 characters") uses tilde; e.g. LONGFILENAME.TXT may get mapped to LONGFI~1.TXT. In the C language, it denotes a bitwise operator that complements each bit. In Perl, it is used in matching operators. So there's little in common between such meanings.
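Two of these unrelated meanings can be tried out in Python, which is used here only as a convenient stand-in: its `~` operator is a bitwise complement like the one in C, and `os.path.expanduser` imitates the shell's home directory shorthand. The sample value is arbitrary, and the 8-bit mask is an assumption made to keep the result easy to read.

```python
import os.path

value = 0b00001111

# Bitwise complement: every bit is inverted; the mask keeps the low 8 bits only
complemented = ~value & 0xFF

# Without a mask, Python's unbounded integers give -(value + 1)
unmasked = ~value

# Shell-like meaning: "~" at the start of a path stands for the home directory
home = os.path.expanduser("~")

print(f"{complemented:08b}", unmasked, home)
```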

Tilde is often used as a symbol for negation in formal logic, but for that purpose, not sign would be more logical.

Tilde is not the same as tilde operator (U+223C), which is used in meanings like 'varies with', 'is proportional to', 'is similar to', etc. Typically, the glyphs for tilde operator and tilde look rather similar but the latter might be positioned higher with respect to the baseline, reflecting its origin as a diacritic. However, in many fonts tilde does not look like a diacritic at all but rather like an operator.

The Unicode standard contains (on p. 149 in version 3) the following characterization:

Tilde. U+007E tilde can be used either as a Spacing Clone of Combining Tilde [i.e. spacing counterpart of U+0303] - - or more often as a center line tilde similar in appearance to U+223C tilde operator. Two common uses are to indicate an approximate value or in dictionaries to repeat the defined term in the definition of the ~. Although U+007E tilde is ambiguous in its rendering, modern fonts generally render it with a center line glyph, as shown in the code charts.

The Unicode 3.0 standard also mentions (p. 161) that the spacing form of the diacritic tilde is denoted unambiguously by small tilde (U+02DC). Thus, that character would be more adequate (if it can be reliably used) when you wish to mention the diacritic.

In ASCII, the tilde character had the primary name "overline" (and a corresponding appearance; cf. to MACRON); "tilde" was a secondary name only. See notes on diacritics and notes on the tilde in Character histories. See also Mark Pilgrim's History of the tilde.

The Unicode standard includes the tilde in a table of "Unicode Dash Characters", mentioning the synonym swung dash for it.

In several national variants of ASCII, there is some other character in the code position of this character.

	NO-BREAK SPACE	`U+00A0`	` `	octal: 240

This character is used in place of a normal space character as a "binding space", to prevent a line break between words or other expressions. The reason is that programs which process texts, even if the processing is otherwise quite simple, very often reformat the text as regards division into lines. This means that normal spaces may be replaced by line breaks. In some cases, e.g. when a statement ends with an expression like "number 7.", such processing would lead to unesthetic results. The use of no-break space instead of normal space between "number" and "7." is expected to prevent that.
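The effect can be observed with Python's standard `textwrap` module, which treats only the ASCII whitespace characters as possible break points. In the sketch below, the sample sentence and the line width of 40 are made up so that the break would otherwise fall between "number" and "7.".

```python
import textwrap

NBSP = "\u00A0"

with_space = "The answer is given in statement number 7."
with_nbsp = "The answer is given in statement number" + NBSP + "7."

# With a normal space, "7." is left alone on the second line
lines_space = textwrap.wrap(with_space, width=40)

# With a no-break space, "number" and "7." move to the next line together
lines_nbsp = textwrap.wrap(with_nbsp, width=40)

print(lines_space)
print(lines_nbsp)
```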

The ISO 8859-1 standard says this in technical language as follows:

NO-BREAK SPACE (NBSP)

A graphic character the visual representation of which consists of the absence of a graphic symbol, for use when a line break is to be prevented in the text as presented.

Unicode Technical Report #14: Line Breaking Properties specifies more exact semantics. It (normatively) defines that the no-break space character belongs to Non-breaking ("Glue") characters, to which the following applies:

The action of these characters is to glue together both left and right neighbor character such that they are kept on the same line. If they follow a space character, they still allow a break.

In the HTML language, a no-break space may have other meanings, too. It may have special effects there, and it is generally treated as "non-collapsible" space as opposed to normal space characters, which are "collapsible" in HTML in the sense that any sequence of spaces is equivalent to a single space.

Text processing programs often treat no-break spaces as "non-expandable" in visual formatting, in the following sense: When formatting text lines so that they are justified on both sides, i.e. of equal width, the programs use varying amounts of spacing between words. But when a no-break space is used between words, the programs often keep the spacing between them as constant (and relatively narrow). This is usually quite adequate, since if two words should not be separated by a line break, it's usually good that there won't be much horizontal spacing either.

¡	INVERTED EXCLAMATION MARK	`U+00A1`	`¡`	octal: 241

This character (¡) is used in Spanish, Asturian and Galician at the beginning of an exclamation (which is terminated by a "normal" exclamation mark). Example:

¡Buenos días, señor!

For information (in Spanish) on related grammatical rules, see e.g. De los signos de puntuación on the Spanish language pages by Ricardo Soca.

¢	CENT SIGN	`U+00A2`	`¢`	octal: 242

This character is a currency symbol used in many countries. It is most widely known as the symbol for "cent" as one hundredth of the US dollar.

In the English language, this character is written immediately after a number, e.g. 75¢. It is never used when writing a sum of money which begins with dollar sign ($); in such cases, cents are indicated as fractions of dollar, e.g. $0.75, $49.95.

The currency unit euro is divided into 100 cents; there seems to be no indication that the cent sign would be recommended as a symbol for cent in that meaning. See notes on the euro sign and associates.

£	POUND SIGN	`U+00A3`	`£`	octal: 243

This character is a currency symbol. The Unicode standard mentions the names "pound sterling" and "Irish punt" but marks it as distinct from the lira sign (U+20A4), which has been used as the symbol for Turkish lira and (previously) Italian lira. The pound sign has one crossbar whereas the lira sign has two. On the other hand, the Unicode standard says that "preferred character for lira is 00A3 £".

See notes on the number sign (#), especially as regards its use as a pound sign in some contexts.

¤	CURRENCY SIGN	`U+00A4`	`¤`	octal: 244

This character is a currency symbol to which no definite semantics has been assigned. It is used very rarely. The most natural semantics for it would probably be that it is a generic currency symbol: a placeholder for actual currency symbols. But there is very little such usage in reality. However, localization settings in Microsoft products may use the currency sign in patterns used to specify the formatting of monetary quantities. For example, "1,1 ¤" might be a setting that tells the system to put the currency symbol (to be specified in another setting) after the number and separated from it with a space.

In the 1960s, the international currency sign ¤ was substituted for the dollar sign $ in an internationalized version of ASCII (ISO 646). The dollar sign was restored later, however, and it seems that nobody actually used the currency sign for anything. For some odd reason, it was included in ISO Latin 1 in a code position of its own.

¥	YEN SIGN	`U+00A5`	`¥`	octal: 245

This character (¥) is a currency symbol, with an alternative name "yuan", reflecting its dual use for the currencies of Japan and China. A glyph for the character may have one or two crossbars (with no difference in meaning).

¦	BROKEN BAR	`U+00A6`	`¦`	octal: 246

In some old fonts (and keyboards), the vertical line character appears as a broken line. But in ISO Latin 1, the broken bar is a completely distinct character. Its Unicode 1.0 name is vertical broken bar. There seems to be no good information about the intended or actual usage of this character. The Unicode standard mentions that an alternative name used in typography is "parted rule".

It is advisable to avoid using this character, since its code position is occupied by another character in ISO Latin 9 (alias ISO 8859-15), which will probably widely replace ISO Latin 1 at least in European usage.
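The clash of code positions can be checked by decoding the same byte under both character sets. The following is a minimal Python sketch that relies only on the standard codecs; the variable names are illustrative.

```python
# Decode one and the same byte under ISO 8859-1 and ISO 8859-15.
byte = bytes([0xA6])  # octal 246, the BROKEN BAR position in ISO Latin 1

as_latin1 = byte.decode("iso8859-1")
as_latin9 = byte.decode("iso8859-15")

# Show the resulting Unicode code point and character for each decoding.
print(f"ISO 8859-1:  U+{ord(as_latin1):04X} {as_latin1}")
print(f"ISO 8859-15: U+{ord(as_latin9):04X} {as_latin9}")
```

Under ISO 8859-15 the byte decodes to latin capital letter s with caron (U+0160), so text that contains the broken bar changes meaning if it is relabeled from one encoding to the other.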

§	PARAGRAPH SIGN, SECTION SIGN (SECTION SIGN)	`U+00A7`	`§`	octal: 247

This character (§) is used as a section sign especially in the US, and as a paragraph sign in some European usage, especially when referring to paragraphs in laws, contracts, rules, etc. (For that reason, § is sometimes used to symbolize law in general.) The varying names reflect the variation in usage. Cf. pilcrow sign (¶).

¨	DIAERESIS	`U+00A8`	`¨`	octal: 250

This character (¨) is a spacing character which basically represents a diacritic mark. As such, it has little use: it can be used in order to mention the diacritic.

The official spelling "diaeresis" conforms to British English; the American spelling "dieresis" is often used in practice. In Unicode 1.0, the name is spacing diaeresis.

The name "umlaut" or "Umlaut" is often used, especially when referring to the use of diaeresis in languages like German where it reflects a phonetic phenomenon called Umlaut. For more information on this, see a news article with subject Umlaut, ablaut, etc. by Christian Weisgerber. As regards the appearance, especially when used to denote Umlaut in handwritten text, diaeresis often takes a form which looks like tilde or macron.

©	COPYRIGHT SIGN	`U+00A9`	`©`	octal: 251

This character (©) consists of the letter C in a circle, and it means "copyright". It can be used instead of or in addition to the word "copyright", partly because the character is in principle language-neutral and universal. Example:

© 1996 - 1998 Jukka Kalervo Korpela

The example is a copyright notice which satisfies the formalities for copyright protection in some countries. In most countries, there is no such formality requirement, but a notice might still be useful; see 10 Big Myths about copyright explained.

The sound recording copyright symbol (U+2117) (P within a circle) does not belong to ISO Latin 1.

ª	FEMININE ORDINAL INDICATOR	`U+00AA`	`ª`	octal: 252

This character (ª) looks like the letter "a" used as a superscript, often underlined. It is used in Spanish when denoting the feminine ending (-a) of an ordinal number, e.g. in "1ª", read "primera". Cf. masculine ordinal indicator (º).

«	LEFT ANGLE QUOTATION MARK (LEFT-POINTING DOUBLE ANGLE QUOTATION MARK)	`U+00AB`	`«`	octal: 253

This punctuation character is often called left guillemet, and it is a quotation mark which is usually used as an opening quotation mark, sometimes as closing.

Angle quotation marks, namely this character and right-pointing double angle quotation mark (»), are often called guillemets. They are mainly used in books. They are used in either a "symmetric" or an "asymmetric" way, i.e. the opening mark can be similar to the closing mark, or one of the marks can be the opening mark and the other the closing mark. This mainly depends on language. Some examples:

In German usage, they are used asymmetrically so that they "point to" the quoted text: »this way«.
In French, the usage is asymmetric, too, but "pointing away" from the quoted text: « this way ». Notice that French uses spaces to separate the marks from the quoted text; no-break spaces (or "thin no-break spaces", not present in ISO Latin 1) can be used in such a context, to prevent bad line breaks.
In Finnish and Swedish the usage is symmetric, using the right angle quotation mark: »this way».
For information on the usage in some other languages, see the document Anführungs- und "Abführungszeichen". Note that in several languages the usage of quotation marks varies, e.g. so that guillemets are used in books and other quotation marks in letters, memos, etc.
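The three conventions listed above can be sketched as a small lookup table. This is only an illustration: the function name, the table, and the language keys are hypothetical, and the French entry assumes that the ISO Latin 1 no-break space is used inside the marks.

```python
NBSP = "\u00a0"  # NO-BREAK SPACE, available in ISO Latin 1

# (opening mark, closing mark, padding between mark and quoted text)
GUILLEMET_STYLES = {
    "de": ("\u00bb", "\u00ab", ""),    # »this way«
    "fr": ("\u00ab", "\u00bb", NBSP),  # « this way », with no-break spaces
    "fi": ("\u00bb", "\u00bb", ""),    # »this way»
}

def quote_with_guillemets(text, lang):
    """Wrap text in guillemets following one of the conventions above."""
    opening, closing, pad = GUILLEMET_STYLES[lang]
    return f"{opening}{pad}{text}{pad}{closing}"

for lang in ("de", "fr", "fi"):
    print(lang, quote_with_guillemets("this way", lang))
```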

For some additional notes on guillemets, see Microsoft's Character design standards - Punctuation for Latin 1.

This character is not the same as the much less-than sign (U+226A); the latter, if needed when only ISO Latin 1 is available, can be simulated using two less-than signs (<<).

In Unicode 1.0, the name is left pointing guillemet. The Unicode standard also mentions that in typography this character is called "chevrons".

¬	NOT SIGN	`U+00AC`	`¬`	octal: 254

This character denotes logical negation, or "not" operator. It is probably used mainly in sentential logic, and even there, the tilde sign is probably more often used to denote negation.

The Unicode standard also mentions that in typography this character is called "angled dash". Presumably reflecting some typographic tradition, MS Word displays an "optional hyphen" (i.e., an invisible hyphenation hint) as ¬ when in "show formatting" ("Show ¶") mode. The idea is probably that the not sign looks like a hyphen with a special mark on it. Note that Word's "optional hyphen" is internally represented as a control code (unit separator), not as a soft hyphen (see next entry).

	SOFT HYPHEN	`U+00AD`	``	octal: 255

The soft hyphen character, for which the abbreviation SHY is often used, seems to have different and contradictory meanings in ISO 8859-1 and in Unicode. In the former, SHY is a hyphen-like graphic character to be used at the end of line to indicate that word division has occurred; in the latter, it is an invisible hyphenation hint, a "discretionary hyphen" (which is an alternative name for the character in Unicode). There seems to be little support in widely used programs for soft hyphen in either meaning; the most notable exception is Internet Explorer 5, which treats the soft hyphen as a hyphenation hint - but IE 4 as well as most other Web browsers seem to treat the soft hyphen simply as a graphic character which is always displayed. Thus, you should probably just forget this character. But if you want to get even more confused with it, see my essay Soft hyphen (SHY) - a hard problem?.

®	REGISTERED TRADE MARK SIGN (REGISTERED SIGN)	`U+00AE`	`®`	octal: 256

This character consists of the letter R in a circle; it is not classified as a letter but as "other symbol". It is used after a name or other expression, to indicate that it is a registered trade mark (at least in some country). In some countries, the law may require the acknowledgement of (registered) trade marks when mentioning product names in some contexts. See TM Basics by INTA.

There is considerable variation in glyphs for this character. The R inside the circle may have different shapes, but in addition to that, the size and position may vary. For example, in the Lucida Sans Unicode font, ® is a small superscript, whereas in Verdana, ® extends below baseline (making the R in the symbol line up with the baseline) and is relatively large.

The trade mark sign (U+2122) (letters TM in superscript style), used for trade marks which have not been registered but established by continuous use, does not belong to ISO Latin 1. Some confusion has been caused by the fact that the trade mark sign belongs to the so-called Windows character set.
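The difference between the two repertoires can be tested by trying to encode the characters. In this minimal Python sketch, the helper name in_repertoire is hypothetical, and "cp1252" is assumed to be the codec corresponding to what is here called the Windows character set.

```python
REGISTERED_SIGN = "\u00ae"
TRADE_MARK_SIGN = "\u2122"

def in_repertoire(ch, encoding):
    """Return True if the character can be encoded in the given encoding."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# Report, for each character, whether each encoding can represent it.
for ch in (REGISTERED_SIGN, TRADE_MARK_SIGN):
    print(f"U+{ord(ch):04X}",
          "latin-1:", in_repertoire(ch, "latin-1"),
          "cp1252:", in_repertoire(ch, "cp1252"))
```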

¯	MACRON	`U+00AF`	`¯`	octal: 257

This character (¯) is a spacing character with a rather indefinite meaning.

The Unicode standard mentions "overline" and "APL overbar" as synonyms for this character. The latter is not problematic: it simply refers to use in the APL programming language. The former is strange, since Unicode also contains a character with the primary name overline (U+203E), in the General Punctuation block. Probably this is to be interpreted so that macron is distinct from overline. Notice that combining (nonspacing) macron and overline are also distinct characters (U+0304 and U+0305), in the Combining Diacritical Marks block; and combining overline (U+0305) is shown in the Unicode standard with a longer glyph than combining macron (U+0304) (despite the latter having the synonym "long" - it probably refers to the use of the diacritic to denote that a vowel is pronounced long), with the explicit statement that combining overline "connects on left and right".

Thus, it might seem that macron is intended to be a (spacing) diacritic mark, in addition to its special use in APL. And, for example, the transliteration rules for Greek letters in the standard ISO 843 use "i" and "o" with a line above for eta and omega, with the alternative of writing the line above after the letter (i¯, o¯). However, in Unicode there is the separate character modifier letter macron (U+02C9), which is classified under "miscellaneous phonetic modifiers"!

In Unicode 1.0, the name is spacing macron.

As a note mostly of historical value, it needs to be remarked that in ISO 646, the primary name for tilde is "overline" and the primary glyph for it looks like overline. Luckily, such usage seems to be rare if not nonexistent.

°	RING ABOVE, DEGREE SIGN (DEGREE SIGN)	`U+00B0`	`°`	octal: 260

This character denotes degrees. It is used both for temperature degrees (e.g. 100 °F, 38 °C) and when expressing angles in degrees (e.g. 90° angle). Notice that when a temperature is expressed in kelvins, the degree sign is not used; the symbol of kelvin is simply K (e.g. 311 K).

According to the rules of the SI system of units (see Guide for the Use of the International System of Units (SI)), a space should be used between a numeric value and a unit symbol, with the exception of angle notations like 30°22'8". Note that when the degree sign is used for temperatures, the normal rule applies. A no-break space can be used instead of a normal space to prevent undesired line breaks.
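The spacing rule can be sketched as two small formatting helpers. The function names are hypothetical, and the sketch assumes that a no-break space is wanted between the number and the unit.

```python
NBSP = "\u00a0"  # NO-BREAK SPACE, prevents a line break before the unit

def format_temperature(value, scale):
    """Temperature: number, no-break space, degree sign plus scale letter."""
    return f"{value}{NBSP}\u00b0{scale}"

def format_angle(degrees):
    """Plane angle: the degree sign follows the number without a space."""
    return f"{degrees}\u00b0"

print(format_temperature(38, "C"))
print(format_angle(90))
```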

Despite the name of this character in the ISO 8859-1 standard, and despite some fonts showing it as a ring above, it should not be regarded as a diacritic mark, or as anything else than the degree sign for that matter. The reason is that the Unicode standard, in addition to specifying "degree sign" as the only name for it, specifically distinguishes it from ring above (U+02DA), which is listed under "spacing clones of diacritics". It is also distinct from ring operator (U+2218). And it is not to be regarded as superscript 0 either.

However, in practice you may find the degree sign used for different purposes. The Unicode standard even mentions (in 14.2 Letterlike Symbols): "Legacy data encoded in ISO/IEC 8859-1 (Latin-1) or other 8-bit character sets may also have represented the numero sign by a sequence of 'N' followed by the degree sign (U+00B0 DEGREE SIGN). Implementations interworking with legacy data should be aware of such alternative representations for the numero sign when converting data." This statement should be understood as describing legacy data rather than adequate use of the degree sign. If you wish to imitate the appearance of the numero sign, U+2116, then it's probably better to use "N" followed by the letter "o" in smaller font and underlined: No.

The degree sign is not the same as masculine ordinal indicator (º) although the glyphs for the two characters may look similar. Bob Baumel has written some notes on this common confusion.
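Since the glyphs may be nearly indistinguishable, one way to tell the characters apart is to look at their code points and names. The following short Python sketch uses the standard unicodedata module; the dictionary name is illustrative.

```python
import unicodedata

# Characters that may look like a small raised ring or circle.
lookalikes = ["\u00b0", "\u00ba", "\u02da", "\u2218", "\u2070"]

# Map each character to its Unicode name.
names = {ch: unicodedata.name(ch) for ch in lookalikes}

for ch, name in names.items():
    print(f"U+{ord(ch):04X}  {name}")
```

Only the first two of these characters belong to ISO Latin 1.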

±	PLUS-MINUS SIGN	`U+00B1`	`±`	octal: 261

This character means 'plus or minus'. It has different uses:

It is sometimes used to refer to two quantities at the same time, as in "the solutions of the equation x²-4=0 are ±2", meaning that the solutions are +2 and -2.
It is also used to indicate an interval of uncertainty in measurements and estimates, as in "according to the measurements, the weight is 42.4 kg ± 0.5 kg"; this means that the weight is expected to be between 42.4 - 0.5 and 42.4 + 0.5 kilograms. Typically, this does not specify absolute limits; the quantity after the ± sign is often some statistical measure like standard deviation. Note that according to the Guide for the Use of the International System of Units (SI), section 7.7 Clarity in writing values of quantities, notations like 42.4 ± 0.5 kg should not be used; you should either repeat the unit as above or use parentheses: (42.4 ± 0.5) kg to make it "completely clear to which unit symbols the numerical values of the quantities belong".
Yet another (informal) usage seems to be to let ± denote 'about, circa' (e.g. "he is ±50 years old"), which can be quite confusing. The English Style Guide of the Translation Service of the EU explicitly says, in section Mathematical symbols, that ± should not be used to mean "about" or "approximately", only for technical tolerances.
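The two notations that the SI guide accepts for a value with an uncertainty can be sketched as follows. The function name and its parameters are hypothetical.

```python
def format_with_uncertainty(value, uncertainty, unit, repeat_unit=True):
    """Format value ± uncertainty so that the unit is unambiguous."""
    if repeat_unit:
        # Repeat the unit after both numbers: 42.4 kg ± 0.5 kg
        return f"{value} {unit} \u00b1 {uncertainty} {unit}"
    # Or group the numbers in parentheses: (42.4 ± 0.5) kg
    return f"({value} \u00b1 {uncertainty}) {unit}"

print(format_with_uncertainty(42.4, 0.5, "kg"))
print(format_with_uncertainty(42.4, 0.5, "kg", repeat_unit=False))
```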

In Unicode 1.0, the name is plus-or-minus sign.

²	SUPERSCRIPT TWO	`U+00B2`	`²`	octal: 262

This character (²) is digit 2 as superscript. Alternative name: "squared". Example of use: m² (square meter). In Unicode 1.0, the name is superscript digit two.

³	SUPERSCRIPT THREE	`U+00B3`	`³`	octal: 263

This character (³) is digit 3 as superscript. Alternative name: "cubed". Example of use: m³ (cubic meter). In Unicode 1.0, the name is superscript digit three.

´	ACUTE ACCENT	`U+00B4`	`´`	octal: 264

This character is a spacing character which basically represents a diacritic mark. Due to its spacing nature, it cannot be used to construct a character with an acute accent (such as á). However, a keyboard key labeled with the accent might work as a "composition key" which can be used to type characters with the corresponding diacritic.
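The difference between the spacing accent and a combining one can be seen in Unicode normalization. The sketch below uses Python's standard unicodedata module; note that the combining acute accent (U+0301) is a Unicode character outside ISO Latin 1, brought in here only for comparison.

```python
import unicodedata

SPACING_ACUTE = "\u00b4"    # ACUTE ACCENT, the ISO Latin 1 character
COMBINING_ACUTE = "\u0301"  # COMBINING ACUTE ACCENT, not in ISO Latin 1

# The spacing accent stays a separate character after the letter.
with_spacing = unicodedata.normalize("NFC", "a" + SPACING_ACUTE)
# The combining accent composes with the letter into a single character.
with_combining = unicodedata.normalize("NFC", "a" + COMBINING_ACUTE)

print(len(with_spacing), len(with_combining))
```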

Thus the accent itself as a character has little use. It can be used in order to mention the diacritic. In some dictionaries, the acute accent is used to indicate a stressed syllable; usages vary as regards its placement before or after the stressed syllable or the stressed vowel.

In Unicode 1.0, the name is spacing acute.

Sometimes the acute accent is used as a single quote, especially to create the appearance of "smart" (asymmetric) quotes, using grave accent instead of an opening single quote and acute accent instead of a closing single quote. Such usage is definitely incorrect. In ISO Latin 1, the apostrophe is the only adequate surrogate for a single quote. See ASCII and Unicode Quotation Marks by Markus Kuhn.

The acute accent should not be used instead of the apostrophe in expressions like "don't" or "Jim's" or "o'clock". In writing computer code (source programs, scripts, commands, ...) people sometimes make the mistake of typing the acute accent instead of an apostrophe in quoted strings like 'foo'; typically this results in errors because compilers and other programs treat the acute accent as a normal data character, not as a valid delimiter. See also notes on the use of apostrophe (rather than acute accent) as a surrogate for various characters not in ISO Latin 1.

µ	MICRO SIGN	`U+00B5`	`µ`	octal: 265

This character (µ) corresponds to the prefix "micro-". It is used in the metric system and, more generally, in the SI system of units to denote 'millionth of'. More exactly, it corresponds to a numeric multiplier of 10^-6 (ten to the power -6). For example, "µm" means 'micrometer', i.e. one millionth of a metre. The old, unsystematic name for that unit, "micron", with an old abbreviation consisting of the "µ" character alone, is still used sometimes.

When this character does not belong to the character repertoire in use, e.g. writing texts in ASCII characters only, it is customary to use the letter "u" instead due to some glyph similarity.

This character is historically based on the Greek letter mu (my). In Unicode, these characters are however distinct. But on the other hand, Unicode defines micro sign as a compatibility character which has greek small letter mu (U+03BC) as its compatibility decomposition. The situation is somewhat confusing, though; the Unicode standard version 3.0 said:

- - some pairs of characters might have been treated as canonical equivalents but are left unequivalent for compatibility with legacy differences. This situation pertains to U+00B5 µ MICRO SIGN (cf. U+03BC μ GREEK SMALL LETTER MU) - -

The Unicode Standard Version 3.0, p. 74

So they are just compatibility equivalents, not canonical equivalents. The difference, as explained in the Unicode standard, is that replacing a character by its compatibility equivalent may remove formatting information whereas replacing by canonical equivalent will not.
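The practical effect of the distinction shows up in the Unicode normalization forms: the canonical forms leave the micro sign alone, whereas the compatibility forms replace it with the Greek letter. A minimal Python sketch using the standard unicodedata module, with illustrative variable names:

```python
import unicodedata

MICRO_SIGN = "\u00b5"
GREEK_SMALL_MU = "\u03bc"

# The <compat> tag marks a compatibility (not canonical) decomposition.
decomposition = unicodedata.decomposition(MICRO_SIGN)

nfc = unicodedata.normalize("NFC", MICRO_SIGN)    # canonical: unchanged
nfkc = unicodedata.normalize("NFKC", MICRO_SIGN)  # compatibility: becomes mu

print(decomposition)
print(f"NFC:  U+{ord(nfc):04X}")
print(f"NFKC: U+{ord(nfkc):04X}")
```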

Unicode version 3.0 used a (strongly) slanted glyph for the micro sign and an upright glyph for the mu letter there, as well as in the tables of glyphs (in blocks Latin 1 Supplement and Greek). And this reflected some actual font designs where the micro sign is slanted. However, there seems to be a widespread opinion that the symbol for the SI prefix should not be slanted, or should be just a little slanted.

In Unicode version 4.0, the sample glyphs for the micro sign and the letter mu look very similar, if not identical. In many fonts, however, there are differences, which vary from hardly noticeable to substantial. See a test page showing the micro sign and the mu letter in different fonts.

One might use U+03BC rather than U+00B5 even if the character is used as a micro prefix, provided that the Unicode character U+03BC can be used reliably in the context.

¶	PILCROW SIGN	`U+00B6`	`¶`	octal: 266

This character is a "section sign in some European usage", as the Unicode standard puts it. But for example, according to the description of Spanish punctuation in De los signos de puntuación by Ricardo Soca, such usage is now outdated, and the character is used as a marker for special notes. ("Ahora se emplea en lo impreso para señalar alguna observación especial.")

In old manuscripts, there was a tendency to present a new paragraph by writing a pilcrow sign and continuing in-line, presumably because of the considerable cost of the recording media in those days.

Moreover, the pilcrow sign appears as paragraph sign (and is typically called that way) in some US usage, in much the same way as the paragraph sign (§) is often used in Europe, e.g. so that clause 6 of an agreement or verdict is referred to by "¶ 6" and clauses from 20 to 28 are referred to by "¶¶ 20-28".

In Unicode 1.0, the name is paragraph sign. Cf. paragraph sign (§); since usages vary, confusion between the two characters is quite possible.

Many text processing programs display paragraph breaks as ¶ when requested to "show formatting". This does not mean that the data itself (as e.g. saved onto disk) would contain such characters; it's usually just a visual indication on the screen.

·	MIDDLE DOT	`U+00B7`	`·`	octal: 267

In the Unicode standard, this character has alternative names "Georgian comma" and "Greek middle dot" and (in typography) "midpoint", which suggest some intended uses but obviously shouldn't be taken as an exhaustive list.

Unicode contains the character greek ano teleia (U+0387). However, it has the middle dot as its canonical decomposition, and its representative glyph is vertically centered. Thus it is debatable whether either of these characters is adequate for use as a Greek punctuation character, since the Greek ano teleia is an upper dot, not a middle dot. It seems that this character was not properly included in the ISO 8859-7 set or in Unicode. However, in several fonts greek ano teleia is an upper dot, not a middle dot, so it is a better punctuation character for Greek texts when it is available.
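One consequence of the canonical decomposition is that software which normalizes text will silently turn greek ano teleia into the middle dot. A short Python sketch using the standard unicodedata module, with illustrative variable names:

```python
import unicodedata

ANO_TELEIA = "\u0387"
MIDDLE_DOT = "\u00b7"

# No <compat> tag in the result: the decomposition is canonical.
decomposition = unicodedata.decomposition(ANO_TELEIA)

# Even the canonical composed form (NFC) replaces the character.
normalized = unicodedata.normalize("NFC", ANO_TELEIA)

print(decomposition)
print(f"U+{ord(normalized):04X}")
```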

Examples of other uses:

In the SI system of units, a middle dot, called "half-high dot" or "raised dot" in that context, can be used when denoting the product of two or more units, e.g. "N·m" (newton multiplied by meter). An alternative is to use a space, e.g. "N m". (See section II.4 in the official definition The International System of Units (SI).)
In mathematics, a middle dot is often used as a multiplication symbol. If such a symbol is needed - note that in algebra it is often implied: ab means a multiplied by b - then it is better to use the multiplication sign (×).
In chemistry, a middle dot is used in some cases to separate major parts of a complex formula (components of a double salt). Example: K₂SO₄·Al₂(SO₄)₃
In Catalan the middle dot is used to distinguish between "ll" and "l·l" which are pronounced differently. Examples: "Avel·lí", "Apel·les", "Col·legio". Such usage might be the main reason for including this character into ISO Latin 1, since the repertoire is basically designed for writing words in Western European languages. In Unicode, there are separate characters latin capital letter l with middle dot (U+013F) and latin small letter l with middle dot (U+0140), but they are compatibility equivalent to letter L or l followed by the middle dot. However, see Microsoft's Character design standards - Math symbols for Latin 1, which expresses a different view on Catalan middle dots.
As a surrogate for hyphenation point (U+2027), i.e. to indicate correct word breaking as in dic·tion·ar·ies (e.g. in WWWebster).
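As regards the Catalan usage in the list above, the compatibility equivalence means that text written with the separate Unicode letter can be mapped to ISO Latin 1 characters by compatibility normalization. A minimal Python sketch, using the name "Avel·lí" from the examples; the variable names are illustrative.

```python
import unicodedata

L_WITH_MIDDLE_DOT = "\u0140"  # LATIN SMALL LETTER L WITH MIDDLE DOT

# The name written with the separate Unicode letter (not in ISO Latin 1).
word = "Ave" + L_WITH_MIDDLE_DOT + "l\u00ed"

# NFKC replaces the letter by "l" followed by MIDDLE DOT (U+00B7).
latin1_form = unicodedata.normalize("NFKC", word)

print(unicodedata.decomposition(L_WITH_MIDDLE_DOT))
print(latin1_form, latin1_form.encode("latin-1"))
```

Canonical normalization (NFC) would leave the letter unchanged, since the equivalence is only a compatibility one.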

It is, however, debatable whether the middle point is adequate for use as a multiplication dot in principle. See notes on multiplication symbols in Characters in SI notations.

Note that a raised decimal point should not be interpreted as a middle dot; see notes about full stop (.).

The middle dot is distinct from the following characters: bullet (U+2022), one dot leader (U+2024), bullet operator (U+2219), dot operator (U+22C5), hyphenation point (U+2027). None of these characters belongs to ISO Latin 1; note that Microsoft's Character design standards - Math symbols for Latin 1 gets things very wrong when it treats U+2219 as “period centered - bullet operator” and even identifies it with the middle dot as used in Catalan.

The middle dot character is sometimes used as a small bullet, but it is not visually suitable for such use, since the glyph for middle dot is typically a rather small dot (though sometimes it is displayed as a big dot resembling a bullet). In HTML authoring, there is no need for a list bullet character, since you simply present an unordered list using the UL and LI elements, leaving it to browsers to present them (using bullets or otherwise). (Moreover, the bullet character is in the so-called Windows character set, so it can be used in text processing fairly safely if desired; but see my document On the use of some MS Windows characters in HTML.)

¸	CEDILLA	`U+00B8`	`¸`	octal: 270

This character (¸) is a spacing character which basically represents a diacritic mark. As such, it has little use: it can be used in order to mention the diacritic. Due to its spacing nature, it cannot be used to construct a character with a cedilla. However, a keyboard key labeled with the accent might work as a "composition key" which can be used to type characters with the corresponding diacritic.

Notice that in ISO Latin 1, the only letter with cedilla which you can use is c with cedilla (Ç and ç). There does not seem to be much secondary use for the cedilla character either. In Unicode 1.0, the name is spacing cedilla.

¹	SUPERSCRIPT ONE	`U+00B9`	`¹`	octal: 271

This character (¹) is digit 1 as superscript. In Unicode 1.0, the name is superscript digit one.

º	MASCULINE ORDINAL INDICATOR	`U+00BA`	`º`	octal: 272

This character (º) looks like the letter "o" used as a superscript, often underlined. It is used in Spanish when denoting the masculine ending (-o) of an ordinal number, e.g. in "1º", read "primero". Cf. feminine ordinal indicator (ª).

This character is definitely not superscript 0 or degree sign.

»	RIGHT ANGLE QUOTATION MARK (RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK)	`U+00BB`	`»`	octal: 273

This character is often called right guillemet, and it is a quotation mark which is usually used as closing quotation mark, sometimes as opening. See usage notes in the description of the left-pointing double angle quotation mark («). Cf. quotation mark (").

This character is relatively often used on Web pages as an arrow-like symbol, due to its visual appearance. Such usage is questionable, since the character is a quotation mark and could be processed according to that by various programs. See general reasons for being strict about meanings of characters.

This character is not the same as the much greater-than sign (U+226B); the latter, if needed when only ISO Latin 1 is available, can be simulated using two greater-than signs (>>).

In Unicode 1.0, the name is right pointing guillemet.

¼	VULGAR FRACTION ONE QUARTER	`U+00BC`	`¼`	octal: 274

This character (¼) denotes "1/4" as one character. See notes on vulgar fractions. In Unicode 1.0, the name is fraction one quarter.

½	VULGAR FRACTION ONE HALF	`U+00BD`	`½`	octal: 275

This character (½) denotes "1/2" as one character. See notes on vulgar fractions. In Unicode 1.0, the name is fraction one half.

¾	VULGAR FRACTION THREE QUARTERS	`U+00BE`	`¾`	octal: 276

This character (¾) denotes "3/4" as one character. See notes on vulgar fractions. In Unicode 1.0, the name is fraction three quarters.

¿	INVERTED QUESTION MARK	`U+00BF`	`¿`	octal: 277

This character is used in Spanish and (less regularly) in Catalan and Galician, at the beginning of a question (which is terminated by a "normal" question mark). Example:

¿Cómo está usted?

A synonym for the character is "turned question mark".

For information (in Spanish) on related grammatical rules, see e.g. De los signos de puntuación on the Spanish language pages by Ricardo Soca.

À	CAPITAL LETTER A WITH GRAVE ACCENT	`U+00C0`	`À`	octal: 300

This is a separate character composed of a basic Latin letter and a diacritic mark.
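In Unicode terms, such a letter is a single code point that has a canonical decomposition into the base letter followed by a combining diacritic (the combining marks themselves lie outside ISO Latin 1). A minimal Python sketch using the standard unicodedata module; the variable names are illustrative.

```python
import unicodedata

a_grave = "\u00c0"  # À as a single ISO Latin 1 character

# NFD splits the letter into "A" plus COMBINING GRAVE ACCENT (U+0300).
decomposed = unicodedata.normalize("NFD", a_grave)
# NFC puts the two pieces back together into the single character.
recomposed = unicodedata.normalize("NFC", decomposed)

print([f"U+{ord(c):04X}" for c in decomposed])
print(recomposed == a_grave)
```

The same applies to the other letters in this list that are described with the same sentence.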

Á	CAPITAL LETTER A WITH ACUTE ACCENT	`U+00C1`	`Á`	octal: 301

This is a separate character composed of a basic Latin letter and a diacritic mark.

Â	CAPITAL LETTER A WITH CIRCUMFLEX ACCENT	`U+00C2`	`Â`	octal: 302

This is a separate character composed of a basic Latin letter and a diacritic mark.

Ã	CAPITAL LETTER A WITH TILDE	`U+00C3`	`Ã`	octal: 303

This is a separate character composed of a basic Latin letter and a diacritic mark.

Ä	CAPITAL LETTER A WITH DIAERESIS	`U+00C4`	`Ä`	octal: 304

This is a separate character composed of a basic Latin letter and a diacritic mark.

Å	CAPITAL LETTER A WITH RING ABOVE	`U+00C5`	`Å`	octal: 305

This is a separate character composed of a basic Latin letter and a diacritic mark. It is used in Danish, Norwegian, and Swedish. See usage notes in the description of the corresponding small letter, å.

This character is sometimes used in physics to denote the angstrom (ångström) unit. (In Unicode, there is a separate character angstrom sign (U+212B), which has Å as its canonical decomposition.) The angstrom unit itself should be replaced by regular SI units: 1 Å is 0.1 nanometres.
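How the angstrom sign relates to the letter Å can be inspected in the Unicode character database. The Python sketch below uses the standard unicodedata module; a decomposition value without a tag such as <compat> denotes a canonical mapping, which is why even NFC normalization replaces the sign with the letter.

```python
import unicodedata

ANGSTROM_SIGN = "\u212b"
A_WITH_RING = "\u00c5"

# A decomposition value without a <compat> tag is a canonical one.
decomposition = unicodedata.decomposition(ANGSTROM_SIGN)

# Canonical normalization maps the sign to the ISO Latin 1 letter.
normalized = unicodedata.normalize("NFC", ANGSTROM_SIGN)

print(decomposition)
print(f"U+{ord(normalized):04X}")
```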

Æ	CAPITAL DIPHTHONG A WITH E (LATIN CAPITAL LETTER AE)	`U+00C6`	`Æ`	octal: 306

This character (Æ) is a separate character which historically originated as a ligature of the basic Latin letters A and E. See usage notes in the description of the corresponding small letter, æ.

Ç	CAPITAL LETTER C WITH CEDILLA	`U+00C7`	`Ç`	octal: 307

This is a separate character composed of a basic Latin letter and a diacritic mark.

È	CAPITAL LETTER E WITH GRAVE ACCENT	`U+00C8`	`È`	octal: 310

This is a separate character composed of a basic Latin letter and a diacritic mark.

É	CAPITAL LETTER E WITH ACUTE ACCENT	`U+00C9`	`É`	octal: 311

This is a separate character composed of a basic Latin letter and a diacritic mark.

Ê	CAPITAL LETTER E WITH CIRCUMFLEX ACCENT	`U+00CA`	`Ê`	octal: 312

This is a separate character composed of a basic Latin letter and a diacritic mark.

Ë	CAPITAL LETTER E WITH DIAERESIS	`U+00CB`	`Ë`	octal: 313

This is a separate character composed of a basic Latin letter and a diacritic mark.

Ì	CAPITAL LETTER I WITH GRAVE ACCENT	`U+00CC`	`Ì`	octal: 314

This is a separate character composed of a basic Latin letter and a diacritic mark.

Í	CAPITAL LETTER I WITH ACUTE ACCENT	`U+00CD`	`Í`	octal: 315

This is a separate character composed of a basic Latin letter and a diacritic mark.

Î	CAPITAL LETTER I WITH CIRCUMFLEX ACCENT	`U+00CE`	`Î`	octal: 316

This is a separate character composed of a basic Latin letter and a diacritic mark.

Ï	CAPITAL LETTER I WITH DIAERESIS	`U+00CF`	`Ï`	octal: 317

This is a separate character composed of a basic Latin letter and a diacritic mark.

Ð	CAPITAL ICELANDIC LETTER ETH (LATIN CAPITAL LETTER ETH)	`U+00D0`	`Ð`	octal: 320

This character (Ð) is a letter which was included into ISO Latin 1 due to its use in Icelandic and Faeroese. Although its appearance is typically that of the letter D with stroke, it is not regarded as a letter with a diacritic. It is also distinct from latin capital letter d with stroke (U+0110), which appears in some other ISO Latin alphabets, and from latin capital letter african d (U+0189), although these letters may all look similar.
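Although the three capital letters may look alike, they are different characters with different lowercase counterparts, which makes the distinction visible in case conversion. A short Python sketch; the lowercase letters other than ð are Unicode characters outside ISO Latin 1, and the dictionary name is illustrative.

```python
# Capital letters that may look identical in many fonts.
capitals = {
    "eth": "\u00d0",            # LATIN CAPITAL LETTER ETH
    "d with stroke": "\u0110",  # LATIN CAPITAL LETTER D WITH STROKE
    "african d": "\u0189",      # LATIN CAPITAL LETTER AFRICAN D
}

# Show each capital with the code point of its lowercase counterpart.
for label, ch in capitals.items():
    lower = ch.lower()
    print(f"{label}: U+{ord(ch):04X} -> U+{ord(lower):04X} {lower}")
```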

See usage notes in the description of the corresponding small letter, ð.

In Unicode 1.0, the name is "Latin capital letter ETH".

Ñ	CAPITAL LETTER N WITH TILDE	`U+00D1`	`Ñ`	octal: 321

This is a separate character composed of a basic Latin letter and a diacritic mark.

Ò	CAPITAL LETTER O WITH GRAVE ACCENT	`U+00D2`	`Ò`	octal: 322

This is a separate character composed of a basic Latin letter and a diacritic mark.

Ó	CAPITAL LETTER O WITH ACUTE ACCENT	`U+00D3`	`Ó`	octal: 323

This is a separate character composed of a basic Latin letter and a diacritic mark.

Ô	CAPITAL LETTER O WITH CIRCUMFLEX ACCENT	`U+00D4`	`Ô`	octal: 324

This is a separate character composed of a basic Latin letter and a diacritic mark.

Õ	CAPITAL LETTER O WITH TILDE	`U+00D5`	`Õ`	octal: 325

This is a separate character composed of a basic Latin letter and a diacritic mark. Used in Portuguese to denote nasal "o". Also used in Estonian but denoting a different (non-nasal) vowel.

Ö	CAPITAL LETTER O WITH DIAERESIS	`U+00D6`	`Ö`	octal: 326

This is a separate character composed of a basic Latin letter and a diacritic mark. Cf. small o with diaeresis.

×	MULTIPLICATION SIGN	`U+00D7`	`×`	octal: 327

This character is a mathematical symbol denoting multiplication. Examples: "2×2 makes 4", where "×" can be read as "times"; "a 5×10 metres area", where "×" can be read as "by". In biology, this character is used when naming hybrids, e.g. Salix ×capreola indicates that the species results from hybridization, and Agrostis stolonifera × Polypogon monspeliensis is a "hybrid formula" that indicates the hybrid of two named species. The Unicode standard mentions an alternative name "z notation Cartesian product", reflecting the usage for Cartesian (direct) product of sets. Cf. middle dot (·).

Ø	CAPITAL LETTER O WITH OBLIQUE STROKE (LATIN CAPITAL LETTER O WITH STROKE)	`U+00D8`	`Ø`	octal: 330

This character is classified as a letter. It is used e.g. in Danish. Cf. the corresponding small letter, ø. Despite its name--which reflects its origin--it is not regarded as a letter with a diacritic mark.

This letter is not a suitable symbol for the empty set or for diameter, for which there are separate characters in Unicode, namely empty set (U+2205) and diameter sign (U+2300).

Ù	CAPITAL LETTER U WITH GRAVE ACCENT	`U+00D9`	`Ù`	octal: 331

This is a separate character composed of a basic Latin letter and a diacritic mark.

Ú	CAPITAL LETTER U WITH ACUTE ACCENT	`U+00DA`	`Ú`	octal: 332

This is a separate character composed of a basic Latin letter and a diacritic mark.

Û	CAPITAL LETTER U WITH CIRCUMFLEX ACCENT	`U+00DB`	`Û`	octal: 333

This is a separate character composed of a basic Latin letter and a diacritic mark.

Ü	CAPITAL LETTER U WITH DIAERESIS	`U+00DC`	`Ü`	octal: 334

This is a separate character composed of a basic Latin letter and a diacritic mark.

Ý	CAPITAL LETTER Y WITH ACUTE ACCENT	`U+00DD`	`Ý`	octal: 335

This is a separate character composed of a basic Latin letter and a diacritic mark.

Þ	CAPITAL ICELANDIC LETTER THORN (LATIN CAPITAL LETTER THORN)	`U+00DE`	`Þ`	octal: 336

This character (Þ) is the capital letter corresponding to the Latin small letter thorn, þ.

ß	SMALL GERMAN LETTER SHARP s (LATIN SMALL LETTER SHARP S)	`U+00DF`	`ß`	octal: 337

This character is a letter used in German, and it denotes an "s" sound (unvoiced). It is definitely not the Greek letter beta! A synonym for the name is "ess-zed" (or, according to the Unicode standard, "Eszett"), reflecting an assumed origin of the letter ß as a ligature of "s" and "z", although the origin of ß has also been explained as a ligature of "long s" and "short s". According to the Unicode standard, it is originally a ligature of latin small letter long s (U+017F) and normal "s".

When converting German text into uppercase, this letter is converted to the character pair "SS" (two normal "S" letters).
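The following minimal Python sketch illustrates this case mapping. Python is used here only for illustration, and the word "straße" is an arbitrary example; the snippet relies on Python's built-in str.upper(), which follows the Unicode full case mapping.

```python
# Illustrative only: str.upper() applies the Unicode full case mapping,
# under which sharp s (U+00DF) becomes the two letters "SS".
word = "stra\u00dfe"          # "straße", an arbitrary example word
upper = word.upper()

print(upper)                  # STRASSE
print(len(word), len(upper))  # 6 7

# Lowercasing the result does not restore the sharp s.
round_trip = upper.lower()
print(round_trip == word)     # False
```

Since the uppercase form is one character longer and the conversion cannot be undone by lowercasing, programs should not assume that case conversion preserves string length.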

The use of this character has been affected (reduced, in favor of "ss") by the German orthography reform (basically carried out in 1998 - 2005, later modified slightly). In Swiss German, this character is not used at all; instead, "ss" is written.

à	SMALL LETTER A WITH GRAVE ACCENT	`U+00E0`	`à`	octal: 340

This is a separate character composed of a basic Latin letter and a diacritic mark.

This character sometimes appears in languages which do not usually have accented letters, since they use the loanword (French preposition) "à" in a punctuation-like manner, e.g. "5 à 7" meaning '5 to 7', '5--7'.

á	SMALL LETTER A WITH ACUTE ACCENT	`U+00E1`	`á`	octal: 341

This is a separate character composed of a basic Latin letter and a diacritic mark. Rather often it appears as a misspelling of à in Finnish usage at least.

â	SMALL LETTER A WITH CIRCUMFLEX ACCENT	`U+00E2`	`â`	octal: 342

This is a separate character composed of a basic Latin letter and a diacritic mark. Used in several languages, e.g. French; also used in latinization of Cyrillic letters according to ISO 9.

ã	SMALL LETTER A WITH TILDE	`U+00E3`	`ã`	octal: 343

This is a separate character composed of a basic Latin letter and a diacritic mark. Used in Portuguese.

ä	SMALL LETTER A WITH DIAERESIS	`U+00E4`	`ä`	octal: 344

This is a separate character composed of a basic Latin letter and a diacritic mark.

å	SMALL LETTER A WITH RING ABOVE	`U+00E5`	`å`	octal: 345

This is a separate character composed of a basic Latin letter and a diacritic mark.

This character was formed from "ao" (by writing the "o" above the "a") in the 15th century for use in Swedish orthography. It denotes a sound roughly similar to the one denoted by "a" in English "hall".

The character was taken into use in Norwegian in 1917 and in Danish in 1948, replacing "aa". For example, the name Håkon was previously written Haakon, and the old spelling is still widely used e.g. in texts in English. It is also used in Walloon.

æ	SMALL DIPHTHONG A WITH E (LATIN SMALL LETTER AE)	`U+00E6`	`æ`	octal: 346

This character historically originated as a ligature of the basic Latin letters a and e. Despite this background, and despite the old (Unicode 1.0) name latin small ligature ae, this character is not to be regarded as a ligature but as a separate letter which is not decomposable.
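As a rough check of this non-decomposable status, the sketch below queries Python's standard unicodedata module (which reflects whatever Unicode version the Python installation ships with) and compares æ with å, a letter that does have a decomposition. The variable names are illustrative only.

```python
import unicodedata

ae = "\u00e6"      # æ, LATIN SMALL LETTER AE
a_ring = "\u00e5"  # å, LATIN SMALL LETTER A WITH RING ABOVE

# An empty string means the character has no decomposition mapping.
print(repr(unicodedata.decomposition(ae)))   # ''
print(unicodedata.decomposition(a_ring))     # 0061 030A

# Normalizing to decomposed form (NFD) leaves æ as a single character,
# whereas å becomes "a" followed by a combining ring above.
ae_nfd = unicodedata.normalize("NFD", ae)
a_ring_nfd = unicodedata.normalize("NFD", a_ring)
print(len(ae_nfd), len(a_ring_nfd))          # 1 2
```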

The word "diphthong" is also misleading in this context, since the character does not necessarily, or even usually, denote a combination of vowels pronounced as a diphthong.

This character is used:

In some languages (Danish, Norwegian, Icelandic, Faroese, Old English, French) as a letter of its own.
As a ligature of a and e, especially when writing Latin words (e.g. "Caesar" might be written as "Cæsar"; but this is not the classical Latin spelling).
In the International phonetic alphabet by IPA, as a phonetic symbol (for a vowel like the one denoted by "a" in the English word "hat").

The Unicode standard mentions an alternative name "ash", which comes from Old English "æsc".

ç	SMALL LETTER C WITH CEDILLA	`U+00E7`	`ç`	octal: 347

This is a separate character composed of a basic Latin letter and a diacritic mark. It is used e.g. in French to denote an "s" sound. It is also used in the international phonetic alphabet by IPA to denote an unvoiced palatal fricative.

è	SMALL LETTER E WITH GRAVE ACCENT	`U+00E8`	`è`	octal: 350

This is a separate character composed of a basic Latin letter and a diacritic mark. Used in several languages, e.g. French; also used in latinization of Cyrillic letters according to ISO 9.

é	SMALL LETTER E WITH ACUTE ACCENT	`U+00E9`	`é`	octal: 351

This is a separate character composed of a basic Latin letter and a diacritic mark.

ê	SMALL LETTER E WITH CIRCUMFLEX ACCENT	`U+00EA`	`ê`	octal: 352

This is a separate character composed of a basic Latin letter and a diacritic mark.

ë	SMALL LETTER E WITH DIAERESIS	`U+00EB`	`ë`	octal: 353

This is a separate character composed of a basic Latin letter and a diacritic mark. Typically used to indicate that the letter e preserves its own phonetic value instead of being combined with the preceding vowel. Also used in latinization of Cyrillic letters according to ISO 9.

ì	SMALL LETTER I WITH GRAVE ACCENT	`U+00EC`	`ì`	octal: 354

This is a separate character composed of a basic Latin letter and a diacritic mark. Used in Italian and Malagasy.

í	SMALL LETTER I WITH ACUTE ACCENT	`U+00ED`	`í`	octal: 355

This is a separate character composed of a basic Latin letter and a diacritic mark.

î	SMALL LETTER I WITH CIRCUMFLEX ACCENT	`U+00EE`	`î`	octal: 356

This is a separate character composed of a basic Latin letter and a diacritic mark.

ï	SMALL LETTER I WITH DIAERESIS	`U+00EF`	`ï`	octal: 357

This is a separate character composed of a basic Latin letter and a diacritic mark.

ð	SMALL ICELANDIC LETTER ETH (LATIN SMALL LETTER ETH)	`U+00F0`	`ð`	octal: 360

This character is a letter which was included into ISO Latin 1 due to its use in Icelandic and Faeroese. It is also used in Old English and in the international phonetic alphabet by IPA. It denotes the voiced sound which is denoted by "th" in modern English (as in the word "the").

This character is distinct from latin small letter d with stroke (U+0111), which appears in some other ISO Latin alphabets and is used in Sámi, Croatian, and Vietnamese. Do not confuse it with greek small letter delta (U+03B4) or partial differential (U+2202) either.

Cf. to the corresponding capital letter, Ð.

ñ	SMALL LETTER N WITH TILDE	`U+00F1`	`ñ`	octal: 361

This is a separate character composed of a basic Latin letter and a diacritic mark.

ò	SMALL LETTER O WITH GRAVE ACCENT	`U+00F2`	`ò`	octal: 362

This is a separate character composed of a basic Latin letter and a diacritic mark.

ó	SMALL LETTER O WITH ACUTE ACCENT	`U+00F3`	`ó`	octal: 363

This is a separate character composed of a basic Latin letter and a diacritic mark.

ô	SMALL LETTER O WITH CIRCUMFLEX ACCENT	`U+00F4`	`ô`	octal: 364

This is a separate character composed of a basic Latin letter and a diacritic mark.

õ	SMALL LETTER O WITH TILDE	`U+00F5`	`õ`	octal: 365

This is a separate character composed of a basic Latin letter and a diacritic mark. Used in Portuguese and in phonetic writing to denote nasal "o". Also used in Estonian but denoting a different (non-nasal) vowel.

ö	SMALL LETTER O WITH DIAERESIS	`U+00F6`	`ö`	octal: 366

This is a separate character composed of a basic Latin letter and a diacritic mark. It is used in several languages, including German. In English, it has occasionally been used to indicate that the vowel "o" is pronounced separately, e.g. in "coöperation". Such usage is however rare nowadays.

÷	DIVISION SIGN	`U+00F7`	`÷`	octal: 367

This character is a mathematical symbol denoting division. Its intended scope of use is unclear. One might think that in ISO Latin 1, which lacks the Unicode character division slash (described as "generic division operator"), one could probably use it as the normal division operator (as in "100÷5 makes 20"). Cf. to the discussion of slashes in the description of solidus. In some numeric keypads of computer keyboards, there is a key with the ÷ symbol, which means division in calculator usage but generates the solidus / when used for character input.

However, it has been reported that the symbol is also used to denote subtraction:

DIVISION SIGN <÷> should not be used for division, as it is also used for subtraction, the sign is known as "minus" in Denmark. Use SOLIDUS </> instead.

Source: Danish language locale for Denmark, Narrative Cultural Specification.

A Swedish standard on keyboards, SS 66 22 41 version (utgåva) 2, calls the division sign amerikanskt divisionstecken 'American division sign'. The notes on the origin of symbols for division in The History of Mathematical Symbols by Douglas Weaver say about this character: "The Anglo-American symbol for division is of 17th century origin, and has long been used on the continent of Europe to indicate subtraction." And CWA 14094 mentions the division sign as an example of symbols that have culturally dependent meanings. Thus, it is advisable to avoid using this character, except on special occasions where its meaning can be made clear.

ø	SMALL LETTER O WITH OBLIQUE STROKE (LATIN SMALL LETTER O WITH STROKE)	`U+00F8`	`ø`	octal: 370

This character is classified as a letter. It is used in Danish, Norwegian and Faroese and in the International phonetic alphabet by IPA. Cf. to the corresponding capital letter, Ø. Despite its name--which reflects its origin--it is not regarded as a letter with a diacritic mark.

ù	SMALL LETTER U WITH GRAVE ACCENT	`U+00F9`	`ù`	octal: 371

This is a separate character composed of a basic Latin letter and a diacritic mark. Used in French and Italian.

ú	SMALL LETTER U WITH ACUTE ACCENT	`U+00FA`	`ú`	octal: 372

This is a separate character composed of a basic Latin letter and a diacritic mark.

û	SMALL LETTER U WITH CIRCUMFLEX ACCENT	`U+00FB`	`û`	octal: 373

This is a separate character composed of a basic Latin letter and a diacritic mark. Used in several languages, e.g. French; also used in latinization of Cyrillic letters according to ISO 9.

ü	SMALL LETTER U WITH DIAERESIS	`U+00FC`	`ü`	octal: 374

This is a separate character composed of a basic Latin letter and a diacritic mark.

ý	SMALL LETTER Y WITH ACUTE ACCENT	`U+00FD`	`ý`	octal: 375

This is a separate character composed of a basic Latin letter and a diacritic mark. Used in Czech, Slovak, Icelandic, Faroese, Welsh, and Malagasy.

þ	SMALL ICELANDIC LETTER THORN (LATIN SMALL LETTER THORN)	`U+00FE`	`þ`	octal: 376

This character (þ), originally a runic letter, was included into ISO Latin 1 due to its use in Icelandic. It is also used in Old English. It denotes the unvoiced sound which is denoted by "th" in modern English (as in the word "mouth"). The character was taken into Latin script from runic letters, but it is now regarded as distinct from runic letter thurisaz thurs thorn (U+16A6). For historical background and other notes, see the document On the status of the Latin letter ÞORN and of its sorting order by Michael Everson and others.

The Unicode 2.0 standard also mentions IPA in the usage notes for this character, but this is probably a mistake, for the following reasons: This character is not listed in the cross-reference section of the description of the IPA extensions block (which lists e.g. latin small letter eth). The International phonetic alphabet by IPA does not contain any character resembling thorn. On the other hand, there are two characters for dental fricatives, unvoiced and voiced; the latter is obviously eth while the former looks like greek small letter theta (U+03B8), which is actually listed in the cross-reference section mentioned above. (Unicode 3.0 has a more vague statement than "IPA" about the usage, namely "phonetics".)

Cf. to the corresponding capital letter, Þ. See also notes on the "character pair" þÿ below.

ÿ	SMALL LETTER Y WITH DIAERESIS	`U+00FF`	`ÿ`	octal: 377

This is a separate character composed of a basic Latin letter and a diacritic mark. It is used in French in some names like L'Haÿ; the diaeresis indicates, as usual in French, that each vowel keeps its own pronunciation. Sometimes it is also (incorrectly) used in Dutch in place of latin small ligature ij (U+0133).

Notice that the corresponding capital letter latin capital letter y with diaeresis (U+0178) does not belong to ISO Latin 1. (See some related notes in my document On the use of some MS Windows characters in HTML.)

In most cases where you see a ÿ character, it does not really mean the letter. Instead, it's just the octet 255 decimal displayed according to ISO Latin 1 interpretation of that data. This octet has special uses, due to being the largest numeric value when octets are interpreted as unsigned integers, or the number -1 when octets are interpreted as signed 8-bit integers in two's complement notation. Edward Welbourne has remarked (in an E-mail message):

The lowercase y-dieresis - - has one other use, outside French. Because its ISO 8859 Latin-1 character code is 255, it is very useful in test-data for programs written in C mis-using the built-in getc() function (and its kin): this returns an integer, but a common folly is to store its return value in a character variable... at which point a y-dieresis is indistinguishable from end-of-file. Including a y-dieresis in test data is an easy way to spot this mistake when it has been made!
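The numeric coincidence behind this can be sketched in Python. The snippet does not reproduce the C getc() mistake itself; it only shows, under that caveat, how the single octet 255 reads as ÿ, as 255, or as -1 depending on interpretation. The variable names are illustrative.

```python
octet = bytes([255])

# Interpreted as ISO Latin 1 text, the octet is small y with diaeresis.
as_latin1 = octet.decode("latin-1")
print(as_latin1)              # ÿ

# Interpreted as an unsigned 8-bit integer, it is the largest value.
unsigned = int.from_bytes(octet, "big", signed=False)
print(unsigned)               # 255

# Interpreted as a signed 8-bit integer (two's complement), it is -1,
# the value conventionally used for end-of-file in C.
signed = int.from_bytes(octet, "big", signed=True)
print(signed)                 # -1
```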

If you see the character pair þÿ, it is practically impossible that anyone really meant to write a word containing those characters in succession. In addition to being rather rare, globally speaking, they are used in very different contexts. The HTML 4.01 Specification describes, in section Character encodings:

- - to maximize chances of proper interpretation, it is recommended that documents transmitted as UTF-16 always begin with a ZERO-WIDTH NON-BREAKING SPACE character (hexadecimal FEFF, also called Byte Order Mark (BOM)) which, when byte-reversed, becomes hexadecimal FFFE, a character guaranteed never to be assigned. Thus, a user-agent receiving a hexadecimal FFFE as the first bytes of a text would know that bytes have to be reversed for the remainder of the text.

A bit technical, but the idea is that the Unicode character U+FEFF is used as an empty "starter" character so that if the byte (octet) order accidentally gets reversed, the problem can be detected and perhaps even fixed programmatically. And in the UTF-16 encoding, that character is presented as two octets with values FE and FF hexadecimal (254 and 255 decimal). If these octets are interpreted as ISO Latin 1 characters, we get þÿ. Thus, if you see that character pair, you are probably viewing the beginning of a UTF-16 encoded document, using a program which does not interpret it in the intended way but according to the ISO Latin 1 encoding.
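The effect described above can be reproduced with a short Python sketch. The helper guess_utf16_byte_order is a hypothetical name introduced here for illustration, and it assumes the data is known to be UTF-16; it is not a general encoding detector.

```python
import codecs

bom = "\ufeff"  # the character used as byte order mark

# The same character serialized in the two possible octet orders.
big_endian = bom.encode("utf-16-be")
little_endian = bom.encode("utf-16-le")
print(big_endian.hex(), little_endian.hex())   # feff fffe

# Misreading the big-endian octets as ISO Latin 1 gives the telltale pair.
seen_as_latin1 = big_endian.decode("latin-1")
print(seen_as_latin1)                          # þÿ


def guess_utf16_byte_order(data):
    """Hypothetical helper: report the octet order signalled by a leading BOM."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return "big-endian"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "little-endian"
    return None  # no BOM present; the order must be known by other means


sample = "\ufeffabc".encode("utf-16-le")
print(guess_utf16_byte_order(sample))          # little-endian
```

In the reversed (little-endian) order the same misreading yields ÿþ instead of þÿ.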

The characters grouped by type, with annotations

The groups

Basic Latin letters (A - Z, a - z)
Diacritics (accents etc.) and letters with them
Other letters
Digits (0 - 9), superscript digits (¹ ² ³), and vulgar fractions (¼ ½ ¾)
Punctuation
Currency symbols
Mathematical, logical and physical symbols
Space characters
Other symbols

Note: Some characters appear in more than one category in this classification, due to different uses. (For example, hyphen-minus has dual use as punctuation symbol and as mathematical symbol.)

Basic Latin letters (A - Z, a - z)

These are the letters which are conventionally called the Latin letters. This letter repertoire was in practice selected for the purpose of writing the English language. (Notice that the letter w is not part of the alphabet of the Latin language.)

Notice that although many of the characters are often presented using glyphs similar to those for Greek and Russian characters, for example, these character repertoires are by definition distinct. For example, the Latin letter A is not the same as the Greek capital letter alpha or the first capital letter of the Cyrillic alphabet, although the same glyph could be used for all of them and although they might, under some circumstances, be pronounced similarly.
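A small Python sketch can make the distinction concrete. It uses the standard unicodedata module to print the code positions and names of the three look-alike capital letters; the helper in_latin1 is a hypothetical name used only in this example.

```python
import unicodedata

# Three characters that are often drawn with the same glyph.
lookalikes = ["\u0041", "\u0391", "\u0410"]

for ch in lookalikes:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
# U+0041 LATIN CAPITAL LETTER A
# U+0391 GREEK CAPITAL LETTER ALPHA
# U+0410 CYRILLIC CAPITAL LETTER A

# As data, they are three different characters.
distinct_count = len(set(lookalikes))
print(distinct_count)   # 3


def in_latin1(ch):
    """Hypothetical helper: can this character be encoded in ISO Latin 1?"""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


encodable = [in_latin1(ch) for ch in lookalikes]
print(encodable)        # [True, False, False]
```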

There is a large number of various derivatives of Latin letters, such as letters with diacritics (some of which belong to ISO Latin 1) and various symbols which historically originated as forms of letters (letterlike symbols) or as ligatures (such as the ampersand, &, which was originally a ligature of e and t).

Several basic Latin letters are in use as such as symbols for physical units and other special purposes. For example, the symbol for the SI unit ampere is regarded as identical with the capital letter A, and similarly the symbol for the SI prefix kilo- is identical with small letter k.

There are also many letterlike symbols which have been historically formed from letters, such as double-struck capital r (U+211D) used to denote the set of real numbers in mathematics. Quite a few of them have their own code positions and names in Unicode, either in the Letterlike Symbols block or elsewhere. Depending on the symbol and context, they can be regarded as merely glyph variants of the basic letters or as completely independent symbols or as something between. When ISO Latin 1 repertoire only is available, there isn't much choice: either you use the normal letter (such as "R" as a symbol of the set of real numbers) or you avoid using the symbol at all, expressing things verbally (e.g. "the set of real numbers"). In the first case, you should try to make things clear to readers, perhaps including a separate description of the notations used. You might additionally try to use a specific font to suggest that the letter is used in a special meaning. - Notice, however, the following independent (non-letter) characters belong to ISO Latin 1 and can be used for their proper meanings: ¢ (originally formed from "c"), £ (originally formed from "L"), ¥ (originally formed from "Y"), © (originally formed from "C"), and ® (originally formed from "R").

Diacritics (accents etc.) and letters with them

Loosely speaking, a diacritic mark is a sign such as an accent (e.g. acute accent ´) attached to a character (such as letter e) to create a new character (such as é). Most diacritics are placed above a letter.

Often a diacritic mark indicates some change in the pronunciation as compared with the base letter. However, the rules for this are language-dependent, and sometimes they imply no phonetic difference. This means that e.g. the definition of "diacritic" in WWWebster is somewhat misleading when it says: "indicating a phonetic value different from that given the unmarked or otherwise marked element". J. C. Wells has written a survey of the use of diacritics in some languages: Orthographic diacritics and multilingual computing.

Quite often a keyboard has no separate key for a letter with a diacritic, even if the keyboard is capable of sending such a character (i.e. the code of a letter with a diacritic). It might be possible to compose such a character using auxiliary "composition keys". Depending on the software in use and the intended data format, it might also be possible to use some "escape" notation to denote the character.

Various approaches to enabling the use of letters with diacritics have been suggested and tried in different systems and standards:

In ASCII, there are some characters which have both a primary use and a secondary meaning as a diacritic. The idea was that the secondary meaning applies when the character is preceded or followed by the ASCII backspace control code (BS, FE₀, control-H, code 8). Thus, for example, letter "e" followed by backspace followed by apostrophe (') would mean letter "e" with acute accent (é). This method has not been implemented and used widely, and it should be considered as very obsolete. However, similar methods are still sometimes used e.g. when one needs to simulate accented letters in pure US-ASCII: one just types "e'" and expects the reader or a program to take it as presenting "é". The following table summarizes how some ASCII characters were meant to have dual use:

dec	oct	hex	ASCII primary name	secondary use
34	42	22	quotation mark (")	diaeresis (¨)
39	47	27	apostrophe (')	acute accent (´)
44	54	2C	comma (,)	cedilla (¸)
94	136	5E	upward arrow head	circumflex accent (^)
126	176	7E	overline	tilde (~)

In various national variants of ASCII (as well as in some other character sets), letters with diacritics were introduced into various code positions. For example, in some national variants "é" might appear in the code position occupied by right square bracket (]) in US-ASCII, whereas in some other it might replace grave accent (`). Obviously, this caused problems in contexts where one would have needed the replaced characters as well. Naturally, the repertoire of added characters was selected according to the needs of particular languages. These methods are still in use, although their importance is decreasing.
In ISO Latin 1, a number of letters with diacritics appear as separate characters in their own code positions. Practically speaking, the repertoire of such characters covers those characters used in national variants of ASCII.
In Unicode, the approach in ISO Latin 1 is applied more widely, introducing a large number of letters with diacritics. In addition to that, a general mechanism for expressing such letters is defined. Unlike the ASCII approach described above, it uses a special class of characters, "nonspacing diacritics". For example, in Unicode one can use "é" as a character of its own as in ISO Latin 1 (and with the same code position). But alternatively one could present it as a combination of two printable characters, normal letter "e" and combining acute accent (U+0301). This way, one could present a very large number of letters with diacritics. However, this approach is generally not supported yet.
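The two Unicode representations of "é" mentioned in the last item can be compared with the following Python sketch. It uses the standard unicodedata module; the normalization forms NFC and NFD are the Unicode mechanisms for converting between the precomposed and the combining representation, and the variable names are illustrative.

```python
import unicodedata

precomposed = "\u00e9"   # é as a single character, as in ISO Latin 1
combined = "e\u0301"     # "e" followed by COMBINING ACUTE ACCENT

# As raw character sequences the two are different.
print(precomposed == combined)   # False

# Normalization converts between the two representations.
nfc = unicodedata.normalize("NFC", combined)
nfd = unicodedata.normalize("NFD", precomposed)
print(nfc == precomposed)        # True
print(nfd == combined)           # True

# Only the precomposed form fits into ISO Latin 1.
latin1_octets = precomposed.encode("latin-1")
print(latin1_octets)             # b'\xe9'
```

A program that compares strings from different sources would normally normalize both sides to the same form first.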

In ISO Latin 1, there are several characters which are "precomposed" from a basic Latin letter and a diacritic:

Vowels with accents (grave, acute, circumflex, tilde, diaeresis)
À	Á	Â	Ã	Ä	à	á	â	ã	ä
È	É	Ê		Ë	è	é	ê		ë
Ì	Í	Î		Ï	ì	í	î		ï
Ò	Ó	Ô	Õ	Ö	ò	ó	ô	õ	ö
Ù	Ú	Û		Ü	ù	ú	û		ü
	Ý					ý			ÿ

Other letters with diacritics in ISO Latin 1 are:
Å å ("a" with ring above)
Ç ç ("c" with cedilla)
Ñ ñ ("n" with tilde)

The meanings of an accent or other diacritic are generally different in different languages. For example, an accent on a vowel may indicate that the vowel is stressed, or that it is long, or that it is otherwise phonetically different from the sound denoted by the base letter. Sometimes accents are used just to make a distinction between words which would otherwise be similar, as in Italian "è" 'is', as opposed to "e" 'and', or in several word pairs in Spanish. (Proposed changes to Spanish orthography would reduce such use of accents.) To take a further example, o with diaeresis (ö) is sometimes used in English (e.g. in the word "coöperation") to signal that the letter "o" is pronounced separately instead of being combined with the preceding vowel; in German it denotes the vowel "o umlaut" which is quite distinct from "o" in pronunciation but appears as identical to "o" at the first sorting level in alphabetic order; in Swedish it denotes a separate sound too but is positioned as the last letter of the alphabet. There are some additional notes on usage in the descriptions of the spacing diacritics.

The exact rules for using diacritics vary, depending on the language, and even within a language. In particular, in the French language, which uses diacritics extensively, there has been a reform of the official orthography in the 1990s; see the official document Rectifications de l'orthographe. It should also be noted that although it has been rather common in French to omit diacritics from capital letters, such usage seems to have been caused basically by technical difficulties. But the document Accentuation des majuscules (on the Web site of l'Académie Française) states that diacritics should be used with capital letters, too. For Spanish, Ortografía de la lengua española by Real Academia Española expresses the same principle, even saying that the academy has never established a different rule on this. Thus, an upper case letter should have a diacritic according to the normal rules of the language.

ISO Latin 1 contains the following diacritics as separate and spacing characters:

´ acute accent

` grave accent

^ circumflex accent

~ tilde

¨ diaeresis

¸ cedilla

It might be argued that the ISO 8859-1 standard is ambiguous regarding whether these characters denote spacing or non-spacing characters. But Unicode and ISO 10646 definitely specify them as spacing.

In Unicode, there are other diacritics, too, such as breve and caron (hacek).

The term spacing as a property of a character means that the character is presented visually using a separate glyph which occupies its own space (smaller or larger), as opposed to being graphically combined with other characters using e.g. overprinting.

In addition to spacing diacritics like those mentioned above, Unicode also contains nonspacing diacritics. They are also (and officially, in Unicode terminology) called combining. A spacing diacritic like circumflex accent (^), apart from its secondary technical usages for quite different purposes, is useful only for mentioning a circumflex. It can be used e.g. to say that "the letter â is formed from the letter a by attaching the circumflex ^ to it" (although the visual appearance of ^ in a font may significantly differ from the circumflex in â). It cannot be used to form the letter â. For instance, "a^" is simply a sequence of two characters; although some programs may convert it to "â", this is something that takes place outside character set issues. In contrast, the combining circumflex accent (U+0302) in Unicode has, as part of its defined meaning, the property that when following a letter, it is logically combined with it to produce a letter with a diacritic. In Unicode technical terms, a character like "â" is a "decomposable character" which is equivalent to the two-character decomposition consisting of the letter "a" followed by the combining circumflex accent (U+0302). In Unicode, there is a very large number of "precomposed" characters like "â" formed from a base character and an embedded diacritic, but sequences of base characters and combining diacritics allow an even wider repertoire to be presented. However, in practice, even those systems which have relatively good support for Unicode rarely support combining diacritics.
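The difference between the spacing circumflex and the combining circumflex shows up in the character properties that Python's standard unicodedata module exposes. The sketch below is illustrative only; the property values come from the Unicode data bundled with the Python installation.

```python
import unicodedata

spacing = "^"          # U+005E CIRCUMFLEX ACCENT, a spacing character
combining = "\u0302"   # COMBINING CIRCUMFLEX ACCENT

# General category: "Sk" is a modifier symbol, "Mn" a nonspacing mark.
print(unicodedata.category(spacing))      # Sk
print(unicodedata.category(combining))    # Mn

# Canonical combining class: 0 means the character does not combine.
print(unicodedata.combining(spacing))     # 0
print(unicodedata.combining(combining))   # 230

# Composition (NFC) merges only the combining sequence into "â".
two_chars = unicodedata.normalize("NFC", "a" + spacing)
one_char = unicodedata.normalize("NFC", "a" + combining)
print(len(two_chars), len(one_char))      # 2 1
print(one_char == "\u00e2")               # True
```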

Other letters

The feminine ordinal indicator (ª) and the masculine ordinal indicator (º) can be regarded as letters, too, since they correspond to letters "a" and "o" in specific situations.

The following characters are regarded as independent letters, although some of them are historically combinations of two letters or a letter and a diacritic:
Æ æ (letter ae)
Ð ð (eth)
Þ þ (thorn)
Ø ø (o with stroke)
ß (sharp s)

Notice that the following characters are not regarded as letters, despite being historically formed from one or more letters:
¢ £ ¥ © ® µ

Digits (0 - 9), superscript digits (¹ ² ³), and vulgar fractions (¼ ½ ¾)

The "normal" digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 are often called Arabic digits (especially to distinguish them from Roman numerals like XIV). In fact, Western Europeans adopted them from the Arabs, who had adopted them from scripts used in India. In these processes, the shapes of digits changed, however. The digits used in Arabic writing have shapes which differ from those of these "Arabic" digits, and they are classified as separate characters in Unicode: they are "Arabic-Indic digits" in block Arabic. There are also several other sets of digits in Unicode, for use in different scripts.

In Unicode, there are distinct characters for digits used as superscripts or subscripts. Only the superscripts corresponding to 1, 2 and 3, that is ¹ and ² and ³, belong to ISO Latin 1; the others are in block Superscripts and Subscripts in Unicode. Notice that ISO Latin 1 repertoire contains two characters which may look like superscript 0: the degree sign (°) and the masculine ordinal indicator (º).

When using the ISO Latin 1 character repertoire only, it is probably best to use superscript ¹ or ² or ³ only if all superscripts used in a document can be expressed that way. Otherwise, i.e. when you need to use some other method for presenting other superscripts (such as the SUP element when authoring in HTML), it is probably best to use that method throughout, for uniformity.

The so-called vulgar fractions are characters denoting fractional numbers as single characters. In ISO Latin 1, there are such characters for the fractions 1/4, 1/2, 3/4 (namely ¼ ½ ¾). This reflects the character repertoire on many typewriters. Depending on the font, the bar (which corresponds to fraction slash) can be horizontal or slanted.

Analogously with the situation with superscript digits, when using the ISO Latin 1 character repertoire only, it is probably best to use vulgar fractions only if all fractions used in a document can be expressed that way. Otherwise, i.e. when you need to use some other method for presenting other fractions, it is probably best to use that method throughout, for uniformity. You could use simply expressions like 2/3 and 1/4. (In the HTML language, you might use the SUP markup for the numerator and the SUB markup for the denominator, thereby suggesting a presentation which somewhat resembles vulgar fractions in appearance. However, such markup may cause uneven line spacing. See also section Fractions in Math in HTML.)

A practical problem with the vulgar fraction characters is that their appearance is often hard to read, especially on computer screens.

In Unicode, both the superscripts and the vulgar fractions are compatibility characters, so that e.g. the compatibility decomposition of ¾ is 3/4 presented in "fraction style".
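These compatibility decompositions can be inspected with Python's standard unicodedata module, as in the sketch below. The tags such as <fraction> and <super> come from the Unicode character database; the variable names are illustrative.

```python
import unicodedata

three_quarters = "\u00be"   # ¾
superscript_two = "\u00b2"  # ²

# The decomposition mapping carries a tag naming the compatibility style.
print(unicodedata.decomposition(three_quarters))    # <fraction> 0033 2044 0034
print(unicodedata.decomposition(superscript_two))   # <super> 0032

# Compatibility normalization (NFKD) replaces the characters accordingly:
# ¾ becomes "3", FRACTION SLASH (U+2044), "4"; ² becomes a plain "2".
fraction_nfkd = unicodedata.normalize("NFKD", three_quarters)
print([f"U+{ord(c):04X}" for c in fraction_nfkd])   # ['U+0033', 'U+2044', 'U+0034']
squared_nfkd = unicodedata.normalize("NFKD", superscript_two)
print(squared_nfkd)                                 # 2

# Canonical normalization (NFD) leaves them untouched.
fraction_nfd = unicodedata.normalize("NFD", three_quarters)
print(fraction_nfd == three_quarters)               # True

# The numeric value is available as a character property.
value = unicodedata.numeric(three_quarters)
print(value)                                        # 0.75
```

Note that compatibility normalization loses the superscript formatting, so it is not a neutral operation on text where ² means "squared".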

Punctuation

The following ISO Latin 1 characters can be classified as punctuation characters:

! exclamation mark

¡ inverted exclamation mark

? question mark

¿ inverted question mark

" quotation mark

' apostrophe (used as single quote, too)

« left-pointing double angle quotation mark

» right-pointing double angle quotation mark

( left parenthesis

) right parenthesis

[ left square bracket

] right square bracket

{ left curly bracket

} right curly bracket

, comma

. full stop (period)

: colon

; semicolon

- hyphen-minus

soft hyphen

For some typographic notes on punctuation characters, see Microsoft's Character design standards - Math symbols for Latin 1.

Punctuation rules

Punctuation rules vary from one language to another. Even within a language, there might be differences in the recommended rules, depending on style and authority. For the English language, the following resources contain well thought-of recommendations:

English Style Guide by the Translation Service of the European Commission (EU)
NASA SP-7084: Grammar, Punctuation, and Capitalization; A Handbook for Technical Writers and Editors by Mary K. McCaskill
Basic Punctuation and Mechanics, by Craig Waddell

As regards some other languages:

French has punctuation rules differing a lot from English. See Règles de typographie française and Composition des textes scientifiques (also available in Word format)
For German usage, see Rund um die Satzzeichen, where section Anführungs- und "Abführungszeichen" also illustrates the use of double and single quotation marks in several other languages.
Spanish: De los signos de puntuación by Ricardo Soca and Ortografía de la lengua española available in PDF format from the Real Academia Española website.
Finnish: Merkit (punctuation rules as published in Kielikello in 1993).

Paired punctuation and directionality

The parentheses, brackets and braces, i.e. characters ()[]{}, are classified as "paired punctuation" characters in Unicode. This means that the characters ([{ are regarded as defined logically, as opening punctuation, and the characters )]} correspondingly closing. Thus, although e.g. the name of "(" is "left parenthesis", it is really by definition "opening parenthesis".

This means that if the writing direction is from right to left, as in Hebrew and Arabic, the mirror images of the "normal" glyphs of these characters are used. Thus, a "left parenthesis", "(", would appear mirrored, so that it looks like what we are used to regarding as a right parenthesis, ")".
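
The classification can be checked against the Unicode character database. This is a minimal sketch, assuming Python's standard unicodedata module; the names OPENING and CLOSING are introduced here for illustration only.

```python
import unicodedata

OPENING = "([{"
CLOSING = ")]}"

# General category Ps = opening punctuation, Pe = closing punctuation.
# mirrored() returns 1 for characters whose glyph is mirrored in
# right-to-left text.
for ch in OPENING + CLOSING:
    print(ch, unicodedata.name(ch),
          unicodedata.category(ch), unicodedata.mirrored(ch))
```

The official names still say "left" and "right", but the category and the mirroring property reflect the logical definition.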

Currency symbols

$	dollar sign
¢	cent sign
£	pound sign
¤	currency sign
¥	yen sign

For informative notes on actual usage of various symbols and abbreviations for currencies of the world, see e.g.

the money table in WWWebster.
World Currencies and Abbreviations by Paul L. Allen

It depends on language-specific rules how currency symbols are attached to numbers. In English, the dollar and pound signs are usually written before the number (e.g. $1000), whereas in many other languages currency symbols are written after the number and separated from it with a space. In Portuguese, the dollar sign was used as an escudo symbol in place of the decimal point (e.g. 30$00 for 30 escudos); the escudo is no longer in use.

Currencies can be denoted in several ways: words (in some language), currency symbol characters, or various abbreviations. The optimal choice depends on the context and intentions. When uniqueness, definiteness, and internationality (as neutrality with respect to national languages) are essential, the three-letter codes as defined in ISO 4217 should be used.

Note: ISO Latin 1 does not contain the symbol for the currency unit euro, the euro sign (U+20AC). A newer member of the ISO 8859 family of character repertoires, ISO 8859-15 alias ISO Latin 9 (!), contains the euro sign in place of the currency sign (¤).
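
The difference between the two repertoires at this code position can be demonstrated as follows. This is a minimal sketch, assuming Python and its built-in codecs named "iso-8859-1" and "iso-8859-15"; the helper can_encode is hypothetical and introduced only for this illustration.

```python
CURRENCY_SIGN = "\u00a4"
EURO_SIGN = "\u20ac"

def can_encode(text, encoding):
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

print(can_encode(EURO_SIGN, "iso-8859-1"))    # euro sign is not in ISO Latin 1
print(can_encode(EURO_SIGN, "iso-8859-15"))   # but it is in ISO Latin 9

# The same octet means different characters in the two encodings.
octet = EURO_SIGN.encode("iso-8859-15")
print(octet.hex(), octet.decode("iso-8859-1") == CURRENCY_SIGN)
```

Data that contains the octet A4 (hexadecimal) is therefore ambiguous unless the encoding is known.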

Mathematical, logical and physical symbols

%	percent sign
+	plus sign
-	hyphen-minus
±	plus-minus
<	less-than sign
>	greater-than sign
=	equals sign
¬	not sign
¯	macron
×	multiplication sign
÷	division sign
°	degree sign
µ	micro sign

Notes:

Asterisk is used as a multiplication symbol in several programming languages.
Solidus (slash) is often used as a division symbol.
The exclamation mark is, in addition to its primary use for punctuation, also used to denote a factorial (originally as a workaround; see notes on the factorial symbol in The History of Mathematical Symbols by Douglas Weaver).
When presenting real numbers, the full stop is used (in English) as a decimal point (e.g. "1.5") whereas many other languages use comma (e.g. "1,5").

A Brief History of the Notation of Boole's Algebra by Michael Schroeder contains, in section Algebraic Notation, information on the history of some mathematical symbols.

Space characters

ISO Latin 1 contains only two space characters: normal space and no-break space. In Unicode, there are other space characters too, such as em space (U+2003), many of which are defined to have some specific width. The other characters are those in the range U+2000 to U+200B (in the General Punctuation block), ideographic space (U+3000), and zero width no-break space (U+FEFF). For a brief summary, see the document Unicode spaces.

Quite often the phrase whitespace (characters) is used to denote a set of characters or codes which are treated as "empty space". The exact definition varies but typically covers some control codes such as horizontal tab and linefeed, too. See e.g. the definition of "whitespace" in the HTML 4.0 Specification.
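
To illustrate how one particular definition of "whitespace" relates to the space characters, here is a minimal sketch assuming Python; its str.isspace() method is just one of many possible definitions and is not the one used in HTML.

```python
import unicodedata

SPACE = "\u0020"
NO_BREAK_SPACE = "\u00a0"
EM_SPACE = "\u2003"
IDEOGRAPHIC_SPACE = "\u3000"

# Space characters proper have the general category Zs (space separator).
for ch in (SPACE, NO_BREAK_SPACE, EM_SPACE, IDEOGRAPHIC_SPACE):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

# Horizontal tab and linefeed are control codes (category Cc), yet this
# definition of whitespace covers them, too.
for ch in "\t\n":
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), ch.isspace())

# A consequence: splitting on whitespace also breaks at a no-break space.
words = "10\u00a0km away".split()
print(words)
```

The last line shows why it matters which definition a program applies: the no-break space prevents a line break in rendering, but it does not prevent this kind of splitting.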

Other symbols

These characters are hard to classify:

# number sign

& ampersand

* asterisk

/ solidus (slash)

\ reverse solidus (backslash)

@ commercial at

_ low line (underscore)

| vertical line

¦ broken bar

§ section sign

© copyright sign

® registered sign

¯ macron

Explanations and notations

Index to explanations

What ISO Latin 1 was designed for
About names
On the meanings of characters
Why should we be so strict about meanings of characters?
The notation U+nnnn

What ISO Latin 1 was designed for

ISO Latin 1 is an 8-bit extension of the 7-bit ASCII character repertoire. Since some of the 256 (respectively 128) code positions that are representable using 8 (respectively 7) bits are reserved for control characters, ISO Latin 1 contains 191 printable characters, 95 of which are ASCII characters.
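
The counts can be verified with a short computation. This is a minimal sketch, assuming Python's "iso-8859-1" codec, which maps the code positions reserved for control characters to Unicode control characters (general category Cc); the function name graphic_characters is hypothetical.

```python
import unicodedata

def graphic_characters(code_positions):
    """Decode the given octets as ISO 8859-1 and drop control characters."""
    text = bytes(code_positions).decode("iso-8859-1")
    return [ch for ch in text if unicodedata.category(ch) != "Cc"]

latin1 = graphic_characters(range(256))   # all 8-bit code positions
ascii_part = graphic_characters(range(128))   # the 7-bit subset

print(len(latin1), len(ascii_part))
```

The 65 excluded positions are 0 to 31, 127, and 128 to 159 (decimal). The space, the no-break space and the soft hyphen are counted among the 191 characters here.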

ISO Latin 1 was designed mainly for use with languages of western Europe. These languages use Latin alphabets with some extensions. More exactly, ISO Latin 1 was designed with the following languages in mind: Danish, Dutch, English, Faeroese, Finnish, French, German, Icelandic, Irish, Italian, Norwegian, Portuguese, Spanish and Swedish. However, for Finnish and French it is not quite sufficient; see my notes on ISO Latin 9. See also Coverage of European languages by ISO Latin alphabets.

Many other languages, for example Indonesian and Swahili, can be written with the ISO Latin 1 character repertoire.

After the addition of letters for those languages, there were still many code positions available. A set of special characters, such as the copyright symbol (©) and pound sterling symbol (£), were added. No "free positions" were left for possible special use. There is no obvious logic in the repertoire of characters added, but presumably the idea was to select characters which are often needed in texts written in the above-mentioned languages.

The ISO 8859-1 standard was originally approved in 1987. As of this writing, the newest version of the ISO 8859-1 standard is ISO/IEC 8859-1:1998, dated 1998-04-16. Disclaimer: I have not yet been able to compare the versions in detail. My document is based on the 1987 version. However, according to a Usenet posting by Markus Kuhn, the main change is that the names have been made identical to those in UCS (i.e., in ISO 10646 and Unicode).

As early as 1982, ECMA (originally established as European Computer Manufacturers' Association) began work on a standard with aims similar to those that led to the ISO 8859 standardization, and in March 1985, ECMA published Standard ECMA-94 8-Bit Single Byte Coded Graphic Character Sets - Latin Alphabets No. 1 to No. 4. It is largely compatible with parts 1 through 4 of ISO 8859. The 2nd edition of ECMA-94 (June 1986) is available on the Web in PDF and PostScript formats.

About names

As explained in the legend for the character list, there are some differences between the ISO 8859-1 names and Unicode names for some characters, and even variation between Unicode versions. It is probably best to use Unicode names as defined in the newest version, due to the increasing importance of Unicode.

In addition to official names, there is a large number of unofficial names for characters, and they vary from one context, culture, and group of people to another. For a collection of some of the jargon, see pronunciation guide for unix. For example, for the tilde character ~ it lists the following Unix and C jargon names: twiddle, tilda, tildee, wave, squiggle, swung dash, approx, wiggle, enyay, home, worm, not. For communicative purposes, such jargon names should be avoided at least outside contexts and communities where they are generally known and uniquely understood. And in fact, if you use them in your ordinary environment, are you sure you can smoothly switch to standard names when needed?

On the meanings of characters

The official definition of the ISO Latin 1 character repertoire in the ISO 8859-1 standard "does not define and does not restrict the meanings of graphic characters", except for the following characters: space, no-break space, soft hyphen. It says that "the names chosen to denote graphic characters are intended to reflect their customary meaning", but as far as the ISO 8859-1 standard is concerned, you might use most of the characters for whatever you like. The price to pay for this "liberalism" is that you cannot assume that other people and computer programs will interpret the characters the same way as you.

On the other hand, the Unicode standard contains quite detailed notes on the use of characters. Some of the notes related to characters in the ISO Latin 1 repertoire are available, in PDF format, online as parts Basic Latin and Latin-1 Supplement of Unicode charts. It seems reasonable to use ISO Latin 1 characters according to the semantics specified in the Unicode standard.

Why should we be so strict about meanings of characters?

Let us first make it clear that in various formal languages, programming languages, command languages, markup languages, etc., special meanings can be assigned to characters quite independently of their normal meanings in everyday language.

For example, in normal language the ampersand character (&) means simply 'and', as its origin (the Latin word "et") suggests. But various technical meanings have been assigned to it. For example, in the C programming language it can mean an "address of" operator; in Unix command language, it may tell "run the program in the background"; in SGML based languages (such as HTML), it is used for so-called entity references (e.g. &copy; is an entity reference which means the copyright symbol ©); and in LaTeX it can be used to specify tabulation.
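
As an illustration of the HTML case, the following minimal sketch assumes Python's standard html module. It shows the ampersand acting as the start of an entity reference, and the consequence that an ampersand meant as plain text must itself be written as a reference.

```python
import html

# In HTML source, "&copy;" is an entity reference, not literal text.
decoded = html.unescape("&copy; 2000")
print(decoded)

# Conversely, a literal ampersand in text has to be escaped in HTML.
escaped = html.escape("AT&T")
print(escaped)
```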

However, such usages are based on specifications--often official standards--for such languages. The specifications form, for human beings and for programs, a firm basis for interpreting the characters in a consistent manner--in a specific context.

In the absence of a specific agreement on anything else, in normal textual data all characters should be used consistently in the Unicode meanings intended for such usage. The basic reason is that those meanings are what we can assume text processing software to apply in the long run. Whatever such software might do otherwise, perhaps honoring some special markup which uses some characters in special meanings, it must ultimately process "raw text data" too. And at that important level, the Unicode meanings come into the picture.

If we don't stick to standardized meanings for characters, there is really nothing to base text processing on. You cannot even perform such a simple transformation as converting text into lower case if you don't know which characters are really letters and which aren't. I explain this in some detail with an example in my character code tutorial as follows:

You should never use a character just because it "looks right" or "almost right". Characters with quite different purposes and meanings may well look similar, or almost similar, in some fonts at least. Using a character as a surrogate for another for the sake of apparent similarity may lead to great confusion. Consider, for example, the so-called sharp s (ess-zed), which is used in the German language. Some people who have noticed such a character in the ISO Latin 1 repertoire have thought "wow, here we have the beta character!". In many fonts, the sharp s (ß) really looks more or less like the Greek lowercase beta character (β). But it must not be used as a surrogate for beta. You wouldn't get very far with it, really; what's the big idea of having beta without alpha and all the other Greek letters? More seriously, the use of sharp s in place of beta would confuse text searches, spelling checkers, speech synthesizers, indexers, etc.; an automatic converter might well turn sharp s into ss; and some font might present sharp s in a manner which is very different from beta.
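
How differently software treats the two characters can be seen from a few simple operations. This is a minimal sketch, assuming Python's built-in string methods and its standard unicodedata module; the constant names are introduced here for illustration.

```python
import unicodedata

SHARP_S = "\u00df"   # ß, an ISO Latin 1 character
BETA = "\u03b2"      # β, not in ISO Latin 1

for ch in (SHARP_S, BETA):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch),
          "upper:", ch.upper(), "casefold:", ch.casefold())

# A search for beta never finds a sharp s used as its surrogate.
print(BETA in "the \u00df-function")
```

Case conversion turns the sharp s into "SS" but beta into capital beta, so text in which one has been used for the other is silently corrupted by such routine processing.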

In practice, one often needs to make compromises due to lack of adequate support for rich enough character repertoires, such as using the quotation mark as double prime. But using, say, sharp s for beta definitely goes too far.

Similarly, for example, in many notational systems the less-than sign and greater-than sign are used as brackets due to the restrictedness of the character repertoire which was generally available when the notation was originally designed. (For example, in HTML they are used to delimit tags, as in <HTML LANG="en">.) But this does not make those characters into brackets any more than the letter l (el) was turned into digit 1 (one) just because many typewriters lacked the latter and the former was used in place of it. Consequently, it is appropriate to use the names "less-than sign" and "greater-than sign" for "<" and ">", even in contexts where they do not indicate the mathematical relations suggested by the names. Calling them by names reserved for other characters would lead to confusion, especially when support for large character repertoires becomes more and more widespread and people will be able to use real angle brackets, too.
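
The distinction is visible in the Unicode character properties. This is a minimal sketch, assuming Python's standard unicodedata module; the two angle bracket characters chosen for comparison (U+2329 and U+27E8) are examples picked for this illustration and lie outside ISO Latin 1.

```python
import unicodedata

LESS_THAN = "<"
ANGLE_BRACKETS = ["\u2329", "\u27e8"]

# The less-than sign is a mathematical symbol (category Sm) ...
print(unicodedata.name(LESS_THAN), unicodedata.category(LESS_THAN))

# ... whereas real angle brackets are opening punctuation (category Ps),
# like the left parenthesis.
for ch in ANGLE_BRACKETS:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
```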

The notation U+nnnn

Unicode characters are commonly referred to using a notation like
U+nnnn
where nnnn is a four-digit hexadecimal (base 16) number specifying the code position of the character in Unicode. For example, the space character has the same code number in Unicode as in ISO 8859-1, namely 32 decimal, 20 hexadecimal; thus, it can be denoted as U+0020. Generally, a notation like U+nnnn is needed for referring to characters uniquely in contexts where one cannot reliably present the character itself.
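
Converting between a character and this notation is straightforward. This is a minimal sketch, assuming Python; the function names u_plus and from_u_plus are hypothetical and introduced only for this illustration.

```python
def u_plus(ch):
    """Return the U+nnnn notation for a single character."""
    # At least four hexadecimal digits, padded with leading zeros.
    return f"U+{ord(ch):04X}"

def from_u_plus(notation):
    """Return the character denoted by a string such as 'U+0020'."""
    if not notation.upper().startswith("U+"):
        raise ValueError(f"not in U+nnnn form: {notation!r}")
    return chr(int(notation[2:], 16))

print(u_plus(" "))        # the space character, 32 decimal
print(u_plus("\u20ac"))   # the euro sign
print(from_u_plus("U+00DF") == "\u00df")
```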

Originally created 2000-03-31. Structurally changed 2018-10-16. Minor modifications 2018-12-15.

This page belongs to the free information site IT and communication by Jukka "Yucca" Korpela.

À	Á	Â	Ã	Ä	à	á	â	ã	ä
È	É	Ê		Ë	è	é	ê		ë
Ì	Í	Î		Ï	ì	í	î		ï
Ò	Ó	Ô	Õ	Ö	ò	ó	ô	õ	ö
Ù	Ú	Û		Ü	ù	ú	û		ü
	Ý					ý			ÿ
