The ISO Latin 1 character repertoire – a description with usage notes, section 3 The characters grouped by type, with annotations:

Diacritics (accents etc.) and letters with them

Loosely speaking, a diacritic mark is a sign such as an accent (e.g. acute accent ´) attached to a character (such as letter e) to create a new character (such as é). Most diacritics are placed above a letter.

Often a diacritic mark indicates some change in the pronunciation as compared with the base letter. However, the rules for this are language-dependent, and sometimes they imply no phonetic difference. This means that e.g. the definition of "diacritic" in WWWebster is somewhat misleading when it says: "indicating a phonetic value different from that given the unmarked or otherwise marked element". J. C. Wells has written a survey of the use of diacritics in some languages: Orthographic diacritics and multilingual computing.

Quite often a keyboard has no separate key for a letter with a diacritic, even if the keyboard is capable of sending such a character (i.e. the code of a letter with a diacritic). It might be possible to compose such a character using auxiliary "composition keys". Depending on the software in use and the intended data format, it might also be possible to use some "escape" notation to denote the character.
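
As a small illustration of such "escape" notations, the following Python sketch writes the letter é in several ways without typing it directly. The choice of Python string escapes and HTML character references is only an assumption made for the example; other software and data formats have notations of their own.

```python
import html

# Three Python string escapes that all denote e with acute accent (U+00E9).
by_hex = "\xe9"
by_code_point = "\u00e9"
by_name = "\N{LATIN SMALL LETTER E WITH ACUTE}"

# Two HTML character references for the same character,
# decoded with the standard library.
from_entity = html.unescape("&eacute;")
from_numeric = html.unescape("&#233;")

all_forms = [by_hex, by_code_point, by_name, from_entity, from_numeric]
print(all_forms)

# True when every notation produced one and the same character.
print(len(set(all_forms)) == 1)
```

Whichever notation is used, the result is the same single character; the notation exists only in the source text, not in the data.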

Various approaches to enabling the use of letters with diacritics have been suggested and tried in different systems and standards:

In ISO Latin 1, there are several characters which are "precomposed" from a basic Latin letter and a diacritic:

Vowels with accents (grave, acute, circumflex, tilde, diaeresis)
À Á Â Ã Ä à á â ã ä
È É Ê   Ë è é ê   ë
Ì Í Î   Ï ì í î   ï
Ò Ó Ô Õ Ö ò ó ô õ ö
Ù Ú Û   Ü ù ú û   ü
  Ý         ý     ÿ

Other letters with diacritics in ISO Latin 1 are:
Å å ("a" with ring above)
Ç ç ("c" with cedilla)
Ñ ñ ("n" with tilde)
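
The lists above can be cross-checked against the Unicode character database, which records for each such letter the base letter and the diacritic it is composed of. The following is a minimal sketch using Python's standard unicodedata module; the variable names are made up for the example.

```python
import unicodedata

# Collect the characters in U+00C0..U+00FF (the letter part of the upper
# half of ISO Latin 1) for which Unicode records a canonical decomposition.
precomposed = {}
for code in range(0xC0, 0x100):
    ch = chr(code)
    mapping = unicodedata.decomposition(ch)
    # An empty string means "no decomposition"; a leading "<" would mark
    # a compatibility mapping, which is not what we are looking for.
    if mapping and not mapping.startswith("<"):
        base, mark = (chr(int(part, 16)) for part in mapping.split())
        precomposed[ch] = (base, unicodedata.name(mark))

for ch, (base, mark_name) in precomposed.items():
    print(ch, "=", base, "+", mark_name)

print(len(precomposed), "letters with diacritics")
```

Letters such as Ø and ß do not appear in the output, since Unicode does not define them as a base letter plus a diacritic.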

The meanings of an accent or other diacritic are generally different in different languages. For example, an accent on a vowel may indicate that the vowel is stressed, or that it is long, or that it is otherwise phonetically different from the sound denoted by the base letter. Sometimes accents are used just to make a distinction between words which would otherwise be similar, as in Italian "è" 'is', as opposed to "e" 'and', or in several word pairs in Spanish. (Proposed changes to Spanish orthography would reduce such use of accents.) To take a further example, o with diaeresis (ö) is sometimes used in English (e.g. in the word "coöperation") to signal that the letter "o" is pronounced separately instead of being combined with the preceding vowel; in German it denotes the vowel "o umlaut", which is quite distinct from "o" in pronunciation but is treated as identical to "o" at the first sorting level in alphabetic order; in Swedish it denotes a separate sound too but is positioned as the last letter of the alphabet. There are some additional notes on usage in the descriptions of the spacing diacritics.
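
The difference between the German and Swedish treatment of ö in alphabetic order can be sketched with two deliberately simplified sort keys. This is not a real collation algorithm: the function names and the word list are invented for the example, the German key models only the first sorting level, and the Swedish key knows only the 29 letters listed in it.

```python
import unicodedata

def german_first_level_key(word):
    """Ignore diacritics, so that "ö" is compared as "o" (first level only)."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))

# Simplified Swedish alphabet: å, ä and ö are separate letters after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"

def swedish_key(word):
    """Compare letters by their position in the Swedish alphabet."""
    composed = unicodedata.normalize("NFC", word.lower())
    # Characters outside the alphabet get -1 and therefore sort first.
    return [SWEDISH_ALPHABET.find(c) for c in composed]

words = ["Zoo", "Öl", "Ofen"]
german_order = sorted(words, key=german_first_level_key)
swedish_order = sorted(words, key=swedish_key)

print(german_order)   # ['Ofen', 'Öl', 'Zoo']
print(swedish_order)  # ['Ofen', 'Zoo', 'Öl']
```

In practice, language-specific ordering is best left to a collation library; the sketch only shows that the same character can have a different position depending on the language.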

The exact rules for using diacritics vary, depending on the language, and even within a language. In particular, in the French language, which uses diacritics extensively, there was a reform of the official orthography in the 1990s; see the official document Rectifications de l'orthographe. It should also be noted that although it has been rather common in French to omit diacritics from capital letters, such usage seems to have been caused mainly by technical difficulties. The document Accentuation des majuscules (on the Web site of l'Académie Française) states that diacritics are to be used with capital letters, too. For Spanish, Ortografía de la lengua española by Real Academia Española expresses the same principle, even saying that the academy has never established a different rule on this. Thus, an upper case letter should have a diacritic according to the normal rules of the language.

ISO Latin 1 contains the following diacritics as separate and spacing characters:

´ acute accent
` grave accent
^ circumflex accent
~ tilde
¨ diaeresis
¸ cedilla

It might be argued that the ISO 8859-1 standard is ambiguous regarding whether these characters denote spacing or non-spacing characters. But Unicode and ISO 10646 definitely specify them as spacing.

In Unicode, there are other diacritics, too, such as breve and caron (hacek).

The term spacing as a property of a character means that the character is presented visually using a separate glyph which occupies its own space (smaller or larger), as opposed to being graphically combined with other characters using e.g. overprinting.
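
How Unicode classifies the six characters listed above can be checked from its character database. The sketch below uses Python's standard unicodedata module and, for contrast, includes one nonspacing mark, U+0302 COMBINING CIRCUMFLEX ACCENT; the helper name is_nonspacing_mark is made up for the example.

```python
import unicodedata

def is_nonspacing_mark(ch):
    """True if Unicode's general category for ch is Mn (Mark, nonspacing)."""
    return unicodedata.category(ch) == "Mn"

# The six spacing diacritics of ISO Latin 1: acute, grave, circumflex,
# tilde, diaeresis, cedilla.
spacing_diacritics = "\u00b4\u0060\u005e\u007e\u00a8\u00b8"

# One nonspacing mark for comparison.
combining_circumflex = "\u0302"

for ch in spacing_diacritics + combining_circumflex:
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),      # general category, e.g. Sk or Mn
        unicodedata.combining(ch),     # 0 for characters that do not combine
        is_nonspacing_mark(ch),
    )
```

The six Latin 1 characters are reported with a symbol category and combining class 0, whereas U+0302 is reported as a nonspacing mark with a nonzero combining class.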

In addition to spacing diacritics like those mentioned above, Unicode also contains nonspacing diacritics. They are also (and officially, in Unicode terminology) called combining. A spacing diacritic like circumflex accent (^), apart from its secondary technical usages for quite different purposes, is useful only for mentioning a circumflex. It can be used e.g. to say that "the letter â is formed from the letter a by attaching the circumflex ^ to it" (although the visual appearance of ^ in a font may significantly differ from the circumflex in â). It cannot be used to form the letter â. For instance, "a^" is simply a sequence of two characters; although some programs may convert it to "â", this is something that takes place outside character set issues.

In contrast, the combining circumflex accent (U+0302) in Unicode has, as part of its defined meaning, the property that when following a letter, it is logically combined with it to produce a letter with a diacritic. In Unicode technical terms, a character like "â" is a "decomposable character" which is equivalent to the two-character decomposition consisting of the letter "a" followed by the combining circumflex accent (U+0302). In Unicode, there is a very large number of "precomposed" characters like "â" formed from a base character and an embedded diacritic, but sequences of base characters and combining diacritics allow an even wider repertoire to be presented. However, in practice, even those systems which have relatively good support for Unicode rarely support combining diacritics.
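
The equivalence between "â" and its decomposition, and the different status of "a^", can be tried out with Unicode normalization. The following is a minimal sketch using Python's standard unicodedata module; the variable names are invented for the example, and the letter q was picked only as a base letter for which no precomposed form with circumflex exists.

```python
import unicodedata

precomposed = "\u00e2"       # â as a single precomposed character
combining_seq = "a\u0302"    # a followed by COMBINING CIRCUMFLEX ACCENT
spacing_seq = "a^"           # a followed by the spacing circumflex

# Composition (NFC) and decomposition (NFD) map the equivalent forms
# to each other.
composed = unicodedata.normalize("NFC", combining_seq)
decomposed = unicodedata.normalize("NFD", precomposed)
print(composed == precomposed)       # True
print(decomposed == combining_seq)   # True

# The spacing circumflex takes no part in this: "a^" stays two characters.
print(unicodedata.normalize("NFC", spacing_seq) == spacing_seq)  # True

# A base letter plus combining mark without any precomposed counterpart
# remains a two-character sequence even after composition.
q_circumflex = unicodedata.normalize("NFC", "q\u0302")
print(len(q_circumflex))  # 2
```

Note that the plain comparison precomposed == combining_seq is false; programs that need to treat the two forms as the same text have to normalize before comparing.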


Originally created 2000-03-31. Structurally changed 2018-10-16. Minor modifications 2018-12-15.
This page belongs to the free information site IT and communication by Jukka "Yucca" Korpela.