ISO 8859-7 (ISO Latin/Greek alphabet) and windows-1253 (CP 1253) are eight-bit character codes which can be used for texts in (modern) Greek. They both contain ASCII as a subset but differ somewhat in the "upper half" of the code space. This document lists the differences in detail and comments on them. It also suggests that when either of these codes is used, the character repertoire be restricted to the intersection of the repertoires covered by the two codes. This raises the question of how capital alpha with tonos should be presented.
The reader is assumed to have basic knowledge about character code concepts. If in doubt, please consult my tutorial on character code issues and, for the practical side of the matter in HTML authoring, general instructions for using different 8-bit character codes for HTML documents.
ISO 8859-7 is an international standard which is, at least according to a document by the Unicode consortium, "equivalent to ISO-IR-126, ELOT 928, and ECMA 118". The authoritative specification is the ISO 8859-7 standard, which is not available online, but Roman Czyborra's famous ISO 8859 Alphabet Soup contains a short description of ISO 8859-7, including an image showing glyphs for the upper half of the code table. There is also a description of ISO 8859-7 on a Microsoft Web page.
The preferred MIME name for the ISO 8859-7 code, or "charset", is ISO-8859-7.
Windows-1253, on the other hand, is a code defined by Microsoft. It is, however, officially registered at IANA. The registration entry refers, in addition to printed documents, to http://www.microsoft.com/globaldev, which contains a document titled Microsoft Windows Code Page : 1253 (Greek).
Generally, in 8-bit character codes (as well as in Unicode), code positions from 128 to 159 in decimal (80 to 9F in hexadecimal) have been reserved for control codes, or "control characters". This applies to ISO 8859-7, too.
But in the "Windows character sets", such as windows-1253, some of these positions have been assigned to printable characters. There are even differences between various Windows character sets (windows-1250 through windows-1258). In windows-1253, the following positions in the area have been assigned:
code | Unicode | name |
---|---|---|
0x80 | U+20AC | EURO SIGN |
0x82 | U+201A | SINGLE LOW-9 QUOTATION MARK |
0x83 | U+0192 | LATIN SMALL LETTER F WITH HOOK |
0x84 | U+201E | DOUBLE LOW-9 QUOTATION MARK |
0x85 | U+2026 | HORIZONTAL ELLIPSIS |
0x86 | U+2020 | DAGGER |
0x87 | U+2021 | DOUBLE DAGGER |
0x89 | U+2030 | PER MILLE SIGN |
0x8B | U+2039 | SINGLE LEFT-POINTING ANGLE QUOTATION MARK |
0x91 | U+2018 | LEFT SINGLE QUOTATION MARK |
0x92 | U+2019 | RIGHT SINGLE QUOTATION MARK |
0x93 | U+201C | LEFT DOUBLE QUOTATION MARK |
0x94 | U+201D | RIGHT DOUBLE QUOTATION MARK |
0x95 | U+2022 | BULLET |
0x96 | U+2013 | EN DASH |
0x97 | U+2014 | EM DASH |
0x99 | U+2122 | TRADE MARK SIGN |
0x9B | U+203A | SINGLE RIGHT-POINTING ANGLE QUOTATION MARK |
For reasons analogous to those presented in the document On the use of some MS Windows characters in HTML, the characters listed above should be avoided except (1) in documents which will be processed in one computer system only and (2) in situations where one can rely on adequate code conversions or the use of Unicode encodings.
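The table above can be cross-checked against a codec implementation. The following is a minimal sketch, assuming that Python's built-in `cp1253` and `iso8859_7` codecs correspond to windows-1253 and ISO 8859-7; the function name `c1_assignments` is made up for this illustration.

```python
import unicodedata


def c1_assignments(codec="cp1253"):
    """Return {code: character} for printable characters in 0x80..0x9F."""
    found = {}
    for code in range(0x80, 0xA0):
        try:
            ch = bytes([code]).decode(codec)
        except UnicodeDecodeError:
            continue  # position is left undefined by this codec
        if unicodedata.category(ch) != "Cc":  # skip control characters
            found[code] = ch
    return found


if __name__ == "__main__":
    for code, ch in c1_assignments().items():
        print(f"0x{code:02X} | U+{ord(ch):04X} | {unicodedata.name(ch)}")
```

Calling `c1_assignments("iso8859_7")` should give an empty result, since that codec maps the whole range to control characters.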
code | ISO 8859-7 | windows-1253 |
---|---|---|
0xA1 | U+2018 LEFT SINGLE QUOTATION MARK | U+0385 GREEK DIALYTIKA TONOS |
0xA2 | U+2019 RIGHT SINGLE QUOTATION MARK | U+0386 GREEK CAPITAL LETTER ALPHA WITH TONOS |
0xA4 | unassigned | U+00A4 CURRENCY SIGN |
0xA5 | unassigned | U+00A5 YEN SIGN |
0xAE | unassigned | U+00AE REGISTERED SIGN |
0xB5 | U+0385 GREEK DIALYTIKA TONOS | U+00B5 MICRO SIGN |
0xB6 | U+0386 GREEK CAPITAL LETTER ALPHA WITH TONOS | U+00B6 PILCROW SIGN |
Note: In an old version of the ISO 8859-7:1987 to Unicode mapping table, characters in positions 0xA1 and 0xA2 were mapped to U+02BD MODIFIER LETTER REVERSED COMMA and U+02BC MODIFIER LETTER APOSTROPHE, respectively. This seems to have been an oversight, but it may have affected some interpretations of the code.
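A comparison of this kind can also be computed. The sketch below assumes Python's built-in `iso8859_7` and `cp1253` codecs; the helper names are hypothetical. The codec tables may follow a later revision of ISO 8859-7 than the 1987 edition, so positions listed as unassigned above (such as 0xA4 and 0xA5) might be reported with other values.

```python
import unicodedata


def decode_or_none(code, codec):
    """Decode one byte, returning None if the position is undefined."""
    try:
        return bytes([code]).decode(codec)
    except UnicodeDecodeError:
        return None


def differing_positions(a="iso8859_7", b="cp1253", start=0xA0, end=0xFF):
    """Return {code: (char_in_a, char_in_b)} where the two codecs disagree."""
    diffs = {}
    for code in range(start, end + 1):
        ca, cb = decode_or_none(code, a), decode_or_none(code, b)
        if ca != cb:
            diffs[code] = (ca, cb)
    return diffs


def describe(ch):
    """Format a character as 'U+XXXX NAME', or 'unassigned' for None."""
    if ch is None:
        return "unassigned"
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '(no name)')}"


if __name__ == "__main__":
    for code, (ca, cb) in differing_positions().items():
        print(f"0x{code:02X} | {describe(ca)} | {describe(cb)}")
```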
Some programs can process windows-1253 encoded data but not ISO 8859-7 encoded data. This applies for example to the version of Internet Explorer 4.0 I'm using (on WinNT); it's the "international", or English, version.
There are probably also programs which accept ISO 8859-7 but not windows-1253. And naturally there are programs which accept both, but they are not the problem here.
The safest approach would be to write the document using only those characters that appear in both codes in the same positions. Thus, one would dispense with the characters discussed above. The most common of them is probably GREEK CAPITAL LETTER ALPHA WITH TONOS.
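Whether a given text stays within this safe intersection can be checked mechanically. The following is a rough sketch, assuming Python's `iso8859_7` and `cp1253` codecs as stand-ins for the two codes; `unsafe_chars` is a name invented here.

```python
import unicodedata


def unsafe_chars(text):
    """List (index, character, name) for characters outside the intersection.

    A character counts as safe only if both codecs encode it to the same byte.
    """
    problems = []
    for index, ch in enumerate(text):
        try:
            same = ch.encode("iso8859_7") == ch.encode("cp1253")
        except UnicodeEncodeError:
            same = False  # missing from at least one of the two codes
        if not same:
            problems.append((index, ch, unicodedata.name(ch, "UNNAMED")))
    return problems


if __name__ == "__main__":
    print(unsafe_chars("Αθήνα"))  # no problems expected
    print(unsafe_chars("Άλφα"))   # capital alpha with tonos is reported
```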
Several methods for presenting capital alpha with tonos have been suggested. One of them is a numeric character reference for the character (&#902; in HTML, denoting Ά), which should work irrespective of the character encoding used. And it actually works quite often, though far from universally. Here is a test (using a very big font for clarity) of how it works on your browser with its current settings, for this document which is announced to be iso-8859-1 encoded:
Ά
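Such references can be generated automatically for just those characters that are not safe in both codes. This is only a sketch, assuming Python's `iso8859_7` and `cp1253` codecs and input text whose markup-significant characters have already been escaped; the function names are hypothetical.

```python
def is_shared(ch):
    """True if ch has the same single-byte code in both encodings."""
    try:
        return ch.encode("iso8859_7") == ch.encode("cp1253")
    except UnicodeEncodeError:
        return False


def with_char_refs(text):
    """Replace characters outside the intersection with decimal references."""
    return "".join(ch if is_shared(ch) else f"&#{ord(ch)};" for ch in text)


if __name__ == "__main__":
    print(with_char_refs("Άλφα"))  # &#902;λφα
```

The result consists only of characters that have identical codes in ISO 8859-7 and windows-1253, so it can be encoded with either.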
In contexts like E-mail message headers and HTTP headers where the encoding used should be announced, one could then in principle use either iso-8859-7 or windows-1253. The former would refer to an international standard and the latter to a code invented by a software vendor. On the other hand, that vendor's products are rather widely used, so announcing a document as windows-1253 encoded might be a more practical solution. But this suggestion applies only to the information about encoding; the above recommendation of not using "Windows specific" or otherwise unsafe characters still applies.
For example, if a Web page is announced with
Content-Type: text/html; charset=iso-8859-7
then people using IE 4.0 might need to manually change the encoding to windows-1253 in order to be able to read it. There are probably more people with this problem than there are people (typically, on Unix systems) with the opposite problem.
Naturally you could also make a document available in different encodings. See an example of this at the end of the document Using national and special characters in HTML. In that example, the "windows-1253" and "ISO 8859-7" versions are actually identical, applying the principle suggested above: the code positions which have different meanings in those codes are not used. The server has just been configured to send them with different information about encoding, i.e. with different charset attributes in Content-Type headers.
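The idea of serving identical data under two labels can be illustrated as follows. This is a simplified sketch rather than a server configuration: `response_for` is a hypothetical helper, and the mapping from charset names to Python codec names is an assumption of the example.

```python
# Map MIME charset names to the Python codec names assumed to match them.
CODECS = {"iso-8859-7": "iso8859_7", "windows-1253": "cp1253"}


def response_for(text, charset):
    """Return (Content-Type header line, encoded body) for the charset."""
    body = text.encode(CODECS[charset])
    header = f"Content-Type: text/html; charset={charset}"
    return header, body


if __name__ == "__main__":
    page = "<p>Αθήνα</p>"  # uses only characters shared by both codes
    for name in CODECS:
        header, body = response_for(page, name)
        print(header, body.hex())
```

As long as the text avoids the differing code positions, the two bodies are byte for byte the same and only the header differs.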
Neither of the codes contains an adequate symbol for the Greek punctuation character ano teleia (upper dot). Obviously it was intended that the middle dot character be used instead, but this is not a good solution. Using a period in superscript style (<sup>.</sup>) is not a logical solution either, but it might result in better appearance. In Unicode, there is a separate character named greek ano teleia, U+0387. Although it is canonically equivalent to the middle dot character (U+00B7), the glyph for it seems to be better suited for use as an upper dot in most fonts where it is available.
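The relationship between the two characters can be inspected with Python's standard `unicodedata` module. The sketch below also checks, under the assumption that the `iso8859_7` and `cp1253` codecs represent the two codes, that only the middle dot is encodable; `encodable` is a helper invented for the example.

```python
import unicodedata

ANO_TELEIA = "\u0387"
MIDDLE_DOT = "\u00b7"


def encodable(ch, codec):
    """True if the codec has a code position for ch."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    print(unicodedata.name(ANO_TELEIA))
    # The Unicode data gives U+0387 a decomposition to U+00B7,
    # so normalization replaces it with the middle dot.
    print(unicodedata.decomposition(ANO_TELEIA))
    print(unicodedata.normalize("NFC", ANO_TELEIA) == MIDDLE_DOT)
    for codec in ("iso8859_7", "cp1253"):
        print(codec, encodable(ANO_TELEIA, codec), encodable(MIDDLE_DOT, codec))
```

One practical consequence is that text passed through Unicode normalization loses the distinction, so the choice of U+0387 may not survive processing.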
ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-7.TXT (accessed 1999-09-08).
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1253.TXT
Note: ISO 8859-7 was updated in 2003, and the following assignments were added to ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-7.TXT (for code positions that were previously unassigned):
0xA4 0x20AC # EURO SIGN
0xA5 0x20AF # DRACHMA SIGN
Disclaimer: This is only a list of documents which seem to be more or less relevant to the topic. I cannot judge their accuracy and applicability.
Date of last update: 2004-07-14.
Jukka Korpela