ISO 8859-7 vs. windows-1253

ISO 8859-7 (ISO Latin/Greek alphabet) and windows-1253 (CP 1253) are eight-bit character codes which can be used for texts in (modern) Greek. They both contain ASCII as a subset but differ somewhat in the "upper half" of the code space. This document lists the differences in detail and comments on them. It also suggests that when either of these codes is used, the character repertoire be restricted to the intersection of the repertoires covered by the two codes. This raises the question of how capital alpha with tonos should be presented.

The reader is assumed to have basic knowledge about character code concepts. If in doubt, please consult my tutorial on character code issues and, for the practical side of the matter in HTML authoring, general instructions for using different 8-bit character codes for HTML documents.

What is ISO 8859-7

ISO 8859-7 is an international standard which is, at least according to a document by the Unicode Consortium, "equivalent to ISO-IR-126, ELOT 928, and ECMA 118". The authoritative specification is the ISO 8859-7 standard itself, which is not available online, but Roman Czyborra's famous ISO 8859 Alphabet Soup contains a short description of ISO 8859-7, including an image showing glyphs for the upper half of the code table. There is also a description of ISO 8859-7 on a Microsoft Web page.

The preferred MIME name for the ISO 8859-7 code, or "charset", is ISO-8859-7.

What is windows-1253

Windows-1253, on the other hand, is a code defined by Microsoft. It is, however, officially registered with IANA. The registration entry refers, in addition to printed documents, to http://www.microsoft.com/globaldev, which contains a document titled Microsoft Windows Code Page : 1253 (Greek).

A fundamental difference: code area 128 - 159 (decimal)

Generally, in 8-bit character codes (as well as in Unicode), code positions from 128 to 159 in decimal (80 to 9F in hexadecimal) have been reserved for control codes, or "control characters". This applies to ISO 8859-7, too.

But in the "Windows character sets", such as windows-1253, some of these positions have been assigned to printable characters. There are even differences between various Windows character sets (windows-1250 through windows-1258). In windows-1253, the following positions in the area have been assigned:

code Unicode name
0x80 U+20AC EURO SIGN
0x82 U+201A SINGLE LOW-9 QUOTATION MARK
0x83 U+0192 LATIN SMALL LETTER F WITH HOOK
0x84 U+201E DOUBLE LOW-9 QUOTATION MARK
0x85 U+2026 HORIZONTAL ELLIPSIS
0x86 U+2020 DAGGER
0x87 U+2021 DOUBLE DAGGER
0x89 U+2030 PER MILLE SIGN
0x8B U+2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK
0x91 U+2018 LEFT SINGLE QUOTATION MARK
0x92 U+2019 RIGHT SINGLE QUOTATION MARK
0x93 U+201C LEFT DOUBLE QUOTATION MARK
0x94 U+201D RIGHT DOUBLE QUOTATION MARK
0x95 U+2022 BULLET
0x96 U+2013 EN DASH
0x97 U+2014 EM DASH
0x99 U+2122 TRADE MARK SIGN
0x9B U+203A SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
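
The list above can be reproduced with a short script. The following is only a sketch, assuming that Python's built-in codecs "iso8859_7" and "cp1253" follow the Unicode Consortium mapping tables; the helper name c1_area is made up for this illustration.

```python
import unicodedata

def c1_area(codec):
    """Map each byte 0x80..0x9F to its decoded character, or None if undefined."""
    result = {}
    for byte in range(0x80, 0xA0):
        try:
            result[byte] = bytes([byte]).decode(codec)
        except UnicodeDecodeError:
            # The codec has no assignment for this position.
            result[byte] = None
    return result

iso = c1_area("iso8859_7")
win = c1_area("cp1253")

# Positions that windows-1253 assigns to printable (non-control) characters.
win_printable = sorted(
    b for b, ch in win.items()
    if ch is not None and unicodedata.category(ch) != "Cc"
)

for b in win_printable:
    ch = win[b]
    print(f"0x{b:02X} U+{ord(ch):04X} {unicodedata.name(ch)}")
```

With the ISO 8859-7 codec, the same positions decode to control characters only, which is the "fundamental difference" described in this section.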

For reasons analogous to those presented in the document On the use of some MS Windows characters in HTML, the characters listed above should be avoided except (1) in documents which will be processed in one computer system only and (2) in situations where one can rely on adequate code conversions or the use of Unicode encodings.

Other differences: capital alpha with tonos and some special characters

code  ISO 8859-7                                    windows-1253
0xA1  U+2018 LEFT SINGLE QUOTATION MARK             U+0385 GREEK DIALYTIKA TONOS
0xA2  U+2019 RIGHT SINGLE QUOTATION MARK            U+0386 GREEK CAPITAL LETTER ALPHA WITH TONOS
0xA4  unassigned                                    U+00A4 CURRENCY SIGN
0xA5  unassigned                                    U+00A5 YEN SIGN
0xAE  unassigned                                    U+00AE REGISTERED SIGN
0xB5  U+0385 GREEK DIALYTIKA TONOS                  U+00B5 MICRO SIGN
0xB6  U+0386 GREEK CAPITAL LETTER ALPHA WITH TONOS  U+00B6 PILCROW SIGN

Note: In an old version of the ISO 8859-7:1987 to Unicode mapping table, the characters in positions 0xA1 and 0xA2 were mapped to U+02BD MODIFIER LETTER REVERSED COMMA and U+02BC MODIFIER LETTER APOSTROPHE, respectively. This seems to have been an oversight, but it may have affected some interpretations of the code.
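
The differences in the area 0xA0 to 0xFF can also be checked mechanically. The sketch below compares what Python's built-in codecs "iso8859_7" and "cp1253" decode each byte to; the helper names are hypothetical. Depending on which revision of the ISO 8859-7 mapping the installed codec follows, positions such as 0xA4 and 0xA5 may be reported as assigned rather than unassigned, so the output need not match the table line by line.

```python
import unicodedata

def decode_or_none(byte, codec):
    """Decode a single byte, returning None where the codec has no assignment."""
    try:
        return bytes([byte]).decode(codec)
    except UnicodeDecodeError:
        return None

def describe(ch):
    """Format a character as 'U+XXXX NAME', or 'unassigned' for None."""
    if ch is None:
        return "unassigned"
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '(unnamed)')}"

# Collect every position in the upper half where the two codecs disagree.
differences = {}
for byte in range(0xA0, 0x100):
    iso = decode_or_none(byte, "iso8859_7")
    win = decode_or_none(byte, "cp1253")
    if iso != win:
        differences[byte] = (iso, win)

for byte, (iso, win) in differences.items():
    print(f"0x{byte:02X}  {describe(iso)}  |  {describe(win)}")
```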

Which one to use?

Some programs can process windows-1253 encoded data but not ISO 8859-7 encoded data. This applies for example to the version of Internet Explorer 4.0 I'm using (on WinNT); it's the "international", or English, version.

There are probably also programs which accept ISO 8859-7 but not windows-1253. And naturally there are programs which accept both, but they are not the problem here.

The safest approach would be to write the document using only those characters that appear in both codes in the same positions. Thus, one would dispense with the characters discussed above. The most common of them is probably GREEK CAPITAL LETTER ALPHA WITH TONOS.
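
A text can be checked against this restriction automatically. The following sketch, assuming Python's built-in codecs "iso8859_7" and "cp1253", flags every character that is either missing from one of the codes or encoded at different positions in them; the function names are invented for this example.

```python
def common_byte(ch):
    """Return the byte that encodes ch identically in both codes, else None."""
    try:
        iso = ch.encode("iso8859_7")
        win = ch.encode("cp1253")
    except UnicodeEncodeError:
        # The character is missing from at least one of the codes.
        return None
    return iso if iso == win else None

def unsafe_characters(text):
    """List the distinct characters of text that fall outside the common repertoire."""
    return sorted({ch for ch in text if common_byte(ch) is None})

print(unsafe_characters("Καλημέρα κόσμε"))   # ordinary Greek text
print(unsafe_characters("\u0386λφα"))        # starts with capital alpha with tonos
```

The second call reports U+0386, since capital alpha with tonos sits at 0xB6 in ISO 8859-7 but at 0xA2 in windows-1253.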

Several methods for presenting capital alpha with tonos have been suggested:

In contexts like E-mail message headers and HTTP headers where the encoding used should be announced, one could then in principle use either iso-8859-7 or windows-1253. The former would refer to an international standard and the latter to a code invented by a software vendor. On the other hand, that vendor's products are rather widely used, so announcing a document as windows-1253 encoded might be a more practical solution. But this suggestion applies only to the information about encoding; the above recommendation of not using "Windows specific" or otherwise unsafe characters still applies.

For example, if a Web page is announced with
Content-Type: text/html; charset=iso-8859-7
then people using IE 4.0 might need to manually change the encoding to windows-1253 in order to be able to read it. There are probably more people with this problem than there are people (typically, on Unix systems) with the opposite problem.

Naturally you could also make a document available in different encodings. See an example of this at the end of the document Using national and special characters in HTML. In that example, the "windows-1253" and "ISO 8859-7" versions are actually identical, applying the principle suggested above: the code positions which have different meanings in those codes are not used. The server has just been configured to send them with different information about encoding, i.e. with different charset attributes in Content-Type headers.
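
The effect of the announced encoding can be illustrated as follows. This is a sketch only: it uses Python's standard email.message.Message class to pick the charset parameter out of a Content-Type value, and the function name charset_from_content_type is hypothetical. It shows that data restricted to the common repertoire is read the same way under either announcement, whereas a byte outside that repertoire is not.

```python
from email.message import Message

def charset_from_content_type(header_value):
    """Extract the charset parameter from a Content-Type value (None if absent)."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()

# A body restricted to the common repertoire (no capital alpha with tonos etc.).
body = "Καλημέρα".encode("iso8859_7")

as_iso = body.decode(charset_from_content_type("text/html; charset=iso-8859-7"))
as_win = body.decode(charset_from_content_type("text/html; charset=windows-1253"))
print(as_iso == as_win)   # the same text under either announcement

# A byte outside the common repertoire is interpreted differently.
risky = b"\xb6"
print(repr(risky.decode("iso-8859-7")), repr(risky.decode("windows-1253")))
```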

A note on "ano teleia"

Neither of the codes contains an adequate symbol for the Greek punctuation character ano teleia (upper dot). Obviously it was intended that the middle dot character be used instead, but this is not a good solution. Using a period in superscript style (<sup>.</sup>) is not a logical solution either but it might result in better appearance.

In Unicode, there is a separate character named GREEK ANO TELEIA, U+0387. Although it is canonically equivalent to the middle dot character (U+00B7), so that Unicode normalization replaces it with the middle dot, the glyph for it seems to be better suited for use as an upper dot in most fonts where it is available.
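
The relationship between the two characters can be inspected with Python's standard unicodedata module. In this sketch (the constant and function names are made up for the illustration), the character database reports a plain, untagged decomposition of U+0387 to U+00B7, and neither 8-bit codec can encode U+0387 itself, while both place the middle dot at the same position.

```python
import unicodedata

ANO_TELEIA = "\u0387"
MIDDLE_DOT = "\u00b7"

print(unicodedata.name(ANO_TELEIA))
print(unicodedata.decomposition(ANO_TELEIA))
# Normalization replaces the ano teleia with the middle dot.
print(unicodedata.normalize("NFC", ANO_TELEIA) == MIDDLE_DOT)

def encodable(ch, codec):
    """Tell whether ch can be represented in the given codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

for codec in ("iso8859_7", "cp1253"):
    print(codec, encodable(ANO_TELEIA, codec), encodable(MIDDLE_DOT, codec))
```

One practical consequence is that an ano teleia typed as U+0387 may silently turn into U+00B7 in any software that normalizes text.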

Sources

ISO 8859-7:1987 to Unicode
from ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-7.TXT (accessed 1999-09-08).
cp1253 to Unicode table
from ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1253.TXT

Note: ISO 8859-7 was updated in 2003. The update added the following assignments, for code positions that were previously unassigned, to ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-7.TXT:

0xA4  0x20AC  #       EURO SIGN
0xA5  0x20AF  #       DRACHMA SIGN

Related information

Disclaimer: This is only a list of documents which seem to be more or less relevant to the topic. I cannot judge their accuracy and applicability.


Date of last update: 2004-07-14.

Jukka Korpela