The ISO Latin 1 character repertoire – a description with usage notes, section 3 The characters grouped by type, with annotations:

Diacritics (accents etc.) and letters with them

Loosely speaking, a diacritic mark is a sign such as an accent (e.g. acute accent ´) attached to a character (such as letter e) to create a new character (such as é). Most diacritics are placed above a letter.

Often a diacritic mark indicates some change in the pronunciation as compared with the base letter. However, the rules for this are language-dependent, and sometimes they imply no phonetic difference. This means that e.g. the definition of "diacritic" in WWWebster is somewhat misleading when it says: "indicating a phonetic value different from that given the unmarked or otherwise marked element". J. C. Wells has written a survey of the use of diacritics in some languages: Orthographic diacritics and multilingual computing.

Quite often a keyboard has no separate key for a letter with a diacritic, even if the keyboard is capable of sending such a character (i.e. the code of a letter with a diacritic). It might be possible to compose such a character using auxiliary "composition keys". Depending on the software in use and the intended data format, it might also be possible to use some "escape" notation to denote the character.
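As one illustration of such "escape" notations, HTML character references can denote é without typing the character itself. The sketch below uses Python's standard html module. HTML is only an assumed example of a data format here, since the text above does not name one.

```python
import html

# Three HTML notations for the same character, e with acute accent (U+00E9):
# a named reference, a decimal reference and a hexadecimal reference.
references = ["&eacute;", "&#233;", "&#xE9;"]
decoded = [html.unescape(ref) for ref in references]

# A string escape in Python source code is another notation of the same kind.
python_escape = "\u00e9"

for ref, ch in zip(references, decoded):
    print(ref, "->", ch, ch == python_escape)
```

Which notations are available depends entirely on the software and data format. The notations above have no meaning in plain text.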

Various approaches to enabling the use of letters with diacritics have been suggested and tried in different systems and standards:

In ASCII, there are some characters which have both a primary use and a secondary meaning as a diacritic. The idea was that the secondary meaning applies when the character is preceded or followed by the ASCII backspace control code (BS, FE₀, control-H, code 8). Thus, for example, letter "e" followed by backspace followed by apostrophe (') would mean letter "e" with acute accent (é). This method was never widely implemented or used, and it should be considered obsolete. However, similar methods are still sometimes used, e.g. when one needs to simulate accented letters in pure US-ASCII: one just types "e'" and expects the reader or a program to take it as representing "é". The following table summarizes how some ASCII characters were meant to have dual use:

dec	oct	hex	ASCII primary name	secondary use
34	42	22	quotation mark (")	diaeresis (¨)
39	47	27	apostrophe (')	acute accent (´)
44	54	2C	comma (,)	cedilla (¸)
94	136	5E	upward arrow head	circumflex accent (^)
126	176	7E	overline	tilde (~)
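The dual-use idea in the table above can be sketched as a small decoder. This is a hypothetical illustration, not a historical implementation: the function name decode_overstrike, the mapping to Unicode combining marks, and the restriction to the order "letter, backspace, mark" are all assumptions of this sketch.

```python
import unicodedata

# Secondary (diacritic) readings of ASCII characters from the table above,
# mapped to Unicode combining marks. The mapping is this sketch's assumption.
OVERSTRIKE_MARKS = {
    '"': "\u0308",  # diaeresis
    "'": "\u0301",  # acute accent
    ",": "\u0327",  # cedilla
    "^": "\u0302",  # circumflex accent
    "~": "\u0303",  # tilde
}

def decode_overstrike(text: str) -> str:
    """Turn 'letter, backspace, mark' triples into letters with diacritics."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        is_triple = (
            i + 2 < len(text)
            and ch.isalpha()
            and text[i + 1] == "\b"
            and text[i + 2] in OVERSTRIKE_MARKS
        )
        if is_triple:
            # Replace the triple with base letter + combining mark.
            out.append(ch + OVERSTRIKE_MARKS[text[i + 2]])
            i += 3
        else:
            out.append(ch)
            i += 1
    # NFC composes base + mark into a precomposed character where one exists.
    return unicodedata.normalize("NFC", "".join(out))

print(decode_overstrike("caf" + "e\b'"))
```

Outside such triples the characters keep their primary meaning, so an ordinary apostrophe or comma is left untouched.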

In various national variants of ASCII (as well as in some other character sets), letters with diacritics were introduced into various code positions. For example, in some national variants "é" might appear in the code position occupied by right square bracket (]) in US-ASCII, whereas in some others it might replace grave accent (`). Obviously, this caused problems in contexts where one would have needed the replaced characters as well. Naturally, the repertoire of added characters was selected according to the needs of particular languages. These methods are still in use, although their importance is decreasing.
In ISO Latin 1, a number of letters with diacritics appear as separate characters in their own code positions. Practically speaking, the repertoire of such characters covers those characters used in national variants of ASCII.
In Unicode, the approach in ISO Latin 1 is applied more widely, introducing a large number of letters with diacritics. In addition to that, a general mechanism for expressing such letters is defined. Unlike the ASCII approach described above, it uses a special class of characters, "nonspacing diacritics". For example, in Unicode one can use "é" as a character of its own as in ISO Latin 1 (and with the same code position). But alternatively one could present it as a combination of two printable characters, normal letter "e" and combining acute accent (U+0301). This way, one could present a very large number of letters with diacritics. However, this approach is generally not supported yet.
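The two Unicode presentations of "é" can be compared with a minimal sketch using Python's standard unicodedata module. The variable names are illustrative only. Normalization forms NFC and NFD are the standard Unicode way to convert between the two presentations, although the text above does not mention them.

```python
import unicodedata

precomposed = "\u00e9"   # é as a single character, as in ISO Latin 1
decomposed = "e\u0301"   # letter e followed by combining acute accent

# As raw character sequences the two presentations differ.
print(len(precomposed), len(decomposed), precomposed == decomposed)

# Normalization converts between them: NFC composes, NFD decomposes.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", precomposed)
print(nfc == precomposed, nfd == decomposed)

# The precomposed character has the same code position as in ISO Latin 1.
latin1_code = precomposed.encode("latin-1")[0]
print(hex(latin1_code), hex(ord(precomposed)))
```

A program that compares strings without normalizing them first would treat the two presentations as different text.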

In ISO Latin 1, there are several characters which are "precomposed" from a basic Latin letter and a diacritic:

Vowels with accents (grave, acute, circumflex, tilde, diaeresis)
À	Á	Â	Ã	Ä	à	á	â	ã	ä
È	É	Ê		Ë	è	é	ê		ë
Ì	Í	Î		Ï	ì	í	î		ï
Ò	Ó	Ô	Õ	Ö	ò	ó	ô	õ	ö
Ù	Ú	Û		Ü	ù	ú	û		ü
	Ý					ý			ÿ

Other letters with diacritics in ISO Latin 1 are:
Å å ("a" with ring above)
Ç ç ("c" with cedilla)
Ñ ñ ("n" with tilde)
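The repertoire listed above can be cross-checked against the Unicode character database. The following sketch assumes that a "precomposed" letter is one with a canonical decomposition in Unicode; the function name latin1_precomposed is hypothetical.

```python
import unicodedata

def latin1_precomposed():
    """Map each precomposed letter in the upper half of ISO Latin 1
    to its base letter and the Unicode name of its combining diacritic."""
    result = {}
    for code in range(0xA0, 0x100):
        ch = chr(code)
        decomp = unicodedata.decomposition(ch)
        # Compatibility decompositions carry a tag such as "<compat>";
        # only untagged (canonical) decompositions are of interest here.
        if ch.isalpha() and decomp and not decomp.startswith("<"):
            base, mark = (chr(int(h, 16)) for h in decomp.split())
            result[ch] = (base, unicodedata.name(mark))
    return result

table = latin1_precomposed()
for ch, (base, mark_name) in table.items():
    print(ch, "=", base, "+", mark_name)
```

Letters such as Ø and Æ do not appear in the result, since Unicode does not define them as a base letter plus a diacritic.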

The meanings of an accent or other diacritic are generally different in different languages. For example, an accent on a vowel may indicate that the vowel is stressed, or that it is long, or that it is otherwise phonetically different from the sound denoted by the base letter. Sometimes accents are used just to make a distinction between words which would otherwise be similar, as in Italian "è" 'is', as opposed to "e" 'and', or in several word pairs in Spanish. (Proposed changes to Spanish orthography would reduce such use of accents.) To take a further example, o with diaeresis (ö) is sometimes used in English (e.g. in the word "coöperation") to signal that the letter "o" is pronounced separately instead of being combined with the preceding vowel; in German it denotes the vowel "o umlaut" which is quite distinct from "o" in pronunciation but appears as identical to "o" at the first sorting level in alphabetic order; in Swedish it denotes a separate sound too but is positioned as the last letter of the alphabet. There are some additional notes on usage in the descriptions of the spacing diacritics.
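The difference between the German and Swedish treatment of "ö" in alphabetic order can be sketched with two simplified sort keys. Both functions are hypothetical illustrations. They ignore many details of real collation, which is normally left to a library implementing language-specific collation rules.

```python
import unicodedata

def german_key(word):
    """First level: compare with diacritics removed, so that o-umlaut
    sorts like o. The original word only breaks ties."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (stripped, word)

# Swedish alphabet: a-ring, a-umlaut and o-umlaut are separate letters after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyz\u00e5\u00e4\u00f6"

def swedish_key(word):
    """Rank each letter by its position in the Swedish alphabet;
    anything else sorts after all letters."""
    composed = unicodedata.normalize("NFC", word.lower())
    last = len(SWEDISH_ALPHABET)
    return [SWEDISH_ALPHABET.index(c) if c in SWEDISH_ALPHABET else last
            for c in composed]

words = ["zebra", "\u00f6l", "ost", "oben"]
german_order = sorted(words, key=german_key)
swedish_order = sorted(words, key=swedish_key)
print(german_order)
print(swedish_order)
```

With these keys the word beginning with "ö" falls among the o-words in the German ordering and after "zebra" in the Swedish ordering.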

The exact rules for using diacritics vary, depending on the language, and even within a language. In particular, in the French language, which uses diacritics extensively, there has been a reform of the official orthography in the 1990s; see the official document Rectifications de l'orthographe. It should also be noted that although it has been rather common in French to omit diacritics from capital letters, such usage seems to have been caused basically by technical difficulties. The document Accentuation des majuscules (on the Web site of l'Académie Française) states that diacritics are to be used with capital letters, too. For Spanish, Ortografía de la lengua española by Real Academia Española expresses the same principle, even saying that the academy has never established a different rule on this. Thus, an upper case letter should have a diacritic according to the normal rules of the language.

ISO Latin 1 contains the following diacritics as separate and spacing characters:

´	acute accent
`	grave accent
^	circumflex accent
~	tilde
¨	diaeresis
¸	cedilla

It might be argued that the ISO 8859-1 standard is ambiguous regarding whether these characters denote spacing or non-spacing characters. But Unicode and ISO 10646 definitely specify them as spacing.

In Unicode, there are other diacritics, too, such as breve and caron (hacek).

The term spacing as a property of a character means that the character is presented visually using a separate glyph which occupies its own space (smaller or larger), as opposed to being graphically combined with other characters using e.g. overprinting.

In addition to spacing diacritics like those mentioned above, Unicode also contains nonspacing diacritics. They are also (and officially, in Unicode terminology) called combining. A spacing diacritic like circumflex accent (^), apart from its secondary technical usages for quite different purposes, is useful only for mentioning a circumflex. It can be used e.g. to say that "the letter â is formed from the letter a by attaching the circumflex ^ to it" (although the visual appearance of ^ in a font may significantly differ from the circumflex in â). It cannot be used to form the letter â. For instance, "a^" is simply a sequence of two characters; although some programs may convert it to "â", this is something that takes place outside character set issues. In contrast, the combining circumflex accent (U+0302) in Unicode has, as part of its defined meaning, the property that when following a letter, it is logically combined with it to produce a letter with a diacritic. In Unicode technical terms, a character like "â" is a "decomposable character" which is equivalent to the two-character decomposition consisting of the letter "a" followed by the combining circumflex accent (U+0302). In Unicode, there is a very large number of "precomposed" characters like "â" formed from a base character and an embedded diacritic, but sequences of base characters and combining diacritics allow an even wider repertoire to be presented. However, in practice, even those systems which have relatively good support for Unicode rarely support combining diacritics.
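The distinction between the spacing and the combining circumflex can be observed in the Unicode character database. The sketch below uses Python's standard unicodedata module. The helper describe and the variable names are illustrative assumptions.

```python
import unicodedata

SPACING_CIRCUMFLEX = "^"          # U+005E, spacing
COMBINING_CIRCUMFLEX = "\u0302"   # U+0302, combining (nonspacing)

def describe(ch):
    """Return the name, general category and combining class of a character."""
    return (unicodedata.name(ch),
            unicodedata.category(ch),
            unicodedata.combining(ch))

print(describe(SPACING_CIRCUMFLEX))
print(describe(COMBINING_CIRCUMFLEX))

# "a^" remains a sequence of two characters under normalization ...
spaced = unicodedata.normalize("NFC", "a" + SPACING_CIRCUMFLEX)
# ... whereas "a" followed by U+0302 is equivalent to the precomposed letter.
combined = unicodedata.normalize("NFC", "a" + COMBINING_CIRCUMFLEX)
print(len(spaced), len(combined))
```

A combining class of zero for "^" reflects that it never attaches to a preceding letter, whereas the nonzero class of U+0302 marks it as a combining character.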

Next part: Other letters

Originally created 2000-03-31. Structurally changed 2018-10-16. Minor modifications 2018-12-15.

This page belongs to the free information site IT and communication by Jukka "Yucca" Korpela.
