This document describes the character encodings supported by the Nvu web page editor. Each encoding is briefly described, with comments on its usefulness on the web.
For the concept of character encoding, please consult my tutorial on character code issues or the Nvu help.
By default, Nvu saves an HTML document in ISO-8859-1 (ISO Latin 1) encoding. You can choose another encoding by selecting
in the menu. This first opens a dialog window where you can select from a large set of encodings.
The encodings appear with a common name followed by a more official
(MIME) name in parentheses. However, not all names in parentheses
are official, and some differ from the exact official spelling.
When you save a document in a particular encoding, Nvu generates
a meta
tag that specifies that encoding,
using its official name, e.g.
<meta content=
.
(Usually such tags are written with the http-equiv and content
attributes in the reverse order, but the order is not significant.)
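To illustrate that the attribute order does not matter, here is a minimal Python sketch that extracts the declared charset from a meta tag. The helper name charset_from_meta and both sample tags are assumptions made for illustration, using ISO-8859-1 as the example charset; the exact text that Nvu writes may differ.

```python
from html.parser import HTMLParser


class MetaCharsetParser(HTMLParser):
    """Records the charset declared in a <meta http-equiv="content-type"> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)  # a dict lookup ignores the order of the attributes
        if (attr.get("http-equiv") or "").lower() != "content-type":
            return
        # The content value looks like "text/html; charset=ISO-8859-1"
        for part in (attr.get("content") or "").split(";"):
            name, _, value = part.strip().partition("=")
            if name.lower() == "charset" and value:
                self.charset = value.strip()


def charset_from_meta(html_text):
    """Hypothetical helper: return the charset named in a meta tag, or None."""
    parser = MetaCharsetParser()
    parser.feed(html_text)
    return parser.charset


# Sample tags (illustrative): content attribute first, then the more usual order
content_first = '<meta content="text/html; charset=ISO-8859-1" http-equiv="content-type">'
usual_order = '<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">'

print(charset_from_meta(content_first))
print(charset_from_meta(usual_order))
```

Both calls report the same charset name, since a parser looks attributes up by name rather than by position.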
Nvu represents characters as entity references
(e.g., &eacute; for é)
or as character references
(e.g., &#1049; for Й)
to the extent that characters cannot be written as such in the
selected encoding. Moreover, it may use
such representations for some characters even if they could appear
as such. This depends on the settings; select
,
,
,
to view and modify them.
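The fallback described above can be sketched in Python. This is only an illustration of the general idea, not Nvu's actual implementation: the function name to_references is hypothetical, and the sketch uses Python's standard html.entities table (HTML 4 entity names) to decide when a named entity exists.

```python
from html.entities import codepoint2name


def to_references(text, encoding):
    """Hypothetical helper: keep characters the encoding can represent,
    and replace the others with entity or numeric character references."""
    out = []
    for ch in text:
        try:
            ch.encode(encoding)  # raises if the encoding lacks this character
            out.append(ch)
        except UnicodeEncodeError:
            name = codepoint2name.get(ord(ch))
            # Prefer a named entity when one exists, else a decimal reference
            out.append(f"&{name};" if name else f"&#{ord(ch)};")
    return "".join(out)


print(to_references("é Й", "iso-8859-1"))  # é is representable, Й is not
print(to_references("é Й", "koi8-r"))      # Й is representable, é is not
print(to_references("é Й", "ascii"))       # neither is representable
```

The sketch does not escape markup-significant characters such as & and <, which a real serializer would also have to handle.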
Practical notes:
The following table presents the entries in the character encoding menu.
The second column contains the name actually appearing in the meta tag
that Nvu generates. Practical notes are given in the third column. The
word “unregistered” means that the encoding is not registered according
to MIME specifications.
Menu entry | Charset name | Notes | |
---|---|---|---|
Arabic (IBM-864) | IBM864 | DOS code page for Arabic, cp864 | |
Arabic (ISO-8859-6) | ISO-8859-6 | ISO Latin/Arabic | |
Arabic (MacArabic) | x-mac-arabic | Macintosh encoding for Arabic, unregistered | |
Arabic (Windows-1256) | windows-1256 | Windows Arabic | |
Armenian (ARMSCII-8) | armscii-8 | “Armenian ASCII”, unregistered | |
Baltic (ISO-8859-13) | ISO-8859-13 | ISO Latin 7, “Baltic Rim” | |
Baltic (ISO-8859-4) | ISO-8859-4 | ISO Latin 4, “North European” | |
Baltic (Windows-1257) | windows-1257 | Windows Baltic | |
Celtic (ISO-8859-14) | ISO-8859-14 | ISO Latin 8; no wide support | |
Central European (IBM-852) | IBM852 | DOS code page for Central European, cp852 | |
Central European (ISO-8859-2) | ISO-8859-2 | ISO Latin 2 | |
Central European (MacCE) | x-mac-ce | Macintosh encoding for Central European, unregistered | |
Central European (Windows-1250) | windows-1250 | Windows Latin 2 | |
Chinese Simplified (GB18030) | gb18030 | Newer encoding for Chinese in Simplified writing system | |
Chinese Simplified (GB2312) | gb2312 | Older encoding for Chinese in Simplified writing system | |
Chinese Simplified (GBK) | x-gbk | An extension of GB2312 (MIME name: GBK) | |
Chinese Simplified (HZ) | HZ-GB-2312 | An encoding designed for E-mail | |
Chinese Simplified (ISO-2022-CN) | ISO-2022-CN | ISO 2022 based encoding for Chinese | |
Chinese Traditional (Big5) | Big5 | Chinese encoding, used especially in Taiwan | |
Chinese Traditional (Big5-HKSCS) | Big5-HKSCS | Chinese encoding, used especially in Hong Kong | |
Chinese Traditional (EUC-TW) | x-euc-tw | Chinese encoding, unregistered | |
Croatian (MacCroatian) | x-mac-croatian | Macintosh encoding for Croatian, unregistered | |
Cyrillic (IBM-855) | IBM855 | DOS code page for Cyrillic, cp855 | |
Cyrillic (ISO-8859-5) | ISO-8859-5 | ISO Latin/Cyrillic | |
Cyrillic (ISO-IR-111) | ISO-IR-111 | ECMA Cyrillic | |
Cyrillic (KOI8-R) | KOI8-R | Russian version of KOI8 | |
Cyrillic (MacCyrillic) | x-mac-cyrillic | Macintosh encoding for Cyrillic, unregistered | |
Cyrillic (Windows-1251) | windows-1251 | Windows Cyrillic | |
Cyrillic/Russian (CP-866) | IBM866 | DOS code page for Russian | |
Cyrillic/Ukrainian (KOI8-U) | KOI8-U | Ukrainian version of KOI8 | |
Cyrillic/Ukrainian (MacUkrainian) | x-mac-ukrainian | Macintosh encoding for Ukrainian | |
Farsi (MacFarsi) | x-mac-farsi | Macintosh encoding for Farsi (Persian), unregistered | |
Georgian (GEOSTD8) | GEOSTD8 | Encoding for the Georgian language, unregistered | |
Greek (ISO-8859-7) | ISO-8859-7 | ISO Latin/Greek | |
Greek (MacGreek) | x-mac-greek | Macintosh encoding for Greek, unregistered | |
Greek (Windows-1253) | windows-1253 | Windows Greek | |
Gujarati (MacGujarati) | x-mac-gujarati | Macintosh encoding for Gujarati, unregistered | |
Gurmukhi (MacGurmukhi) | x-mac-gurmukhi | Macintosh encoding for Gurmukhi, unregistered | |
Hebrew (IBM-862) | IBM862 | DOS code page for Hebrew, cp862 | |
Hebrew (ISO-8859-8-I) | ISO-8859-8-I | ISO-8859-8 (ISO Latin/Hebrew) in logical order | |
Hebrew (MacHebrew) | x-mac-hebrew | Macintosh encoding for Hebrew, unregistered | |
Hebrew (Windows-1255) | windows-1255 | Windows Hebrew | |
Hindi (MacDevanagari) | x-mac-devanagari | Macintosh encoding for Devanagari, unregistered | |
Icelandic (MacIcelandic) | x-mac-icelandic | Macintosh encoding for Icelandic, unregistered | |
Japanese (EUC-JP) | EUC-JP | Common Japanese encoding | |
Japanese (ISO-2022-JP) | ISO-2022-JP | Another common Japanese encoding | |
Japanese (Shift_JIS) | Shift_JIS | Yet another common Japanese encoding | |
Korean (EUC-KR) | EUC-KR | Common Korean encoding | |
Nordic (ISO-8859-10) | ISO-8859-10 | ISO Latin 6, “Nordic” (Sámi etc.); no wide support | |
Romanian (ISO-8859-16) | ISO-8859-16 | ISO Latin 10; no wide support | |
Romanian (MacRomanian) | x-mac-romanian | Macintosh encoding for Romanian, unregistered | |
South European (ISO-8859-3) | ISO-8859-3 | ISO Latin 3, for Maltese and Esperanto; no wide support | |
Thai (ISO-8859-11) | ISO-8859-11 | ISO Latin/Thai | |
Thai (TIS-620) | TIS-620 | Encoding for Thai, national standard | |
Thai (Windows-874) | windows-874 | Windows Thai | |
Turkish (IBM-857) | IBM857 | DOS code page for Turkish, cp857 | |
Turkish (ISO-8859-9) | ISO-8859-9 | ISO Latin 5 | |
Turkish (MacTurkish) | x-mac-turkish | Macintosh encoding for Turkish, unregistered | |
Turkish (Windows-1254) | windows-1254 | Windows Turkish | |
Unicode (UTF-16 Big Endian) | UTF-16BE | UTF-16 Big Endian (high byte first) | |
Unicode (UTF-16 Little Endian) | UTF-16LE | UTF-16 Little Endian (low byte first) | |
Unicode (UTF-16) | UTF-16 | UTF-16, with endianness to be inferred | |
Unicode (UTF-32 Big Endian) | UTF-32BE | UTF-32 Big Endian | |
Unicode (UTF-32 Little Endian) | UTF-32LE | UTF-32 Little Endian | |
Unicode (UTF-32) | UTF-32 | UTF-32, with endianness to be inferred | |
Unicode (UTF-8) | UTF-8 | UTF-8, the preferred Unicode encoding on the Internet | |
User Defined | x-user-defined | Unspecified encoding, usually for use with specific font | |
Western (IBM-850) | IBM850 | DOS code page for West European languages, cp850 | |
Western (ISO-8859-1) | ISO-8859-1 | ISO Latin 1, the default encoding | |
Western (ISO-8859-15) | ISO-8859-15 | ISO Latin 9, with euro sign, not widely supported | |
Western (MacRoman) | x-mac-roman | Macintosh encoding for Western European, unregistered | |
Western (Windows-1252) | windows-1252 | Windows Latin 1 | |
Vietnamese (TCVN) | x-viet-tcvn5712 | TCVN 5712, VISCII-2, unregistered | |
Vietnamese (Windows-1258) | windows-1258 | Windows Vietnamese | |
Vietnamese (VISCII) | VISCII | “Vietnamese extension to ASCII” | |
Vietnamese (VPS) | x-viet-vps | VPS, unregistered |
Note: Nvu generates the unregistered name x-gbk for the GBK
encoding, although this encoding has a MIME registration
under the name GBK.
To fix this, change the name in the meta tag
in the Source mode in Nvu and save the file.
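If many files need the same correction, the edit could also be scripted. The following is a minimal sketch under stated assumptions: the function name fix_gbk_charset is hypothetical, it works on the document text as a string, and it replaces every charset=x-gbk occurrence, including any that might appear in body text rather than in the meta tag.

```python
import re


def fix_gbk_charset(html_text):
    """Hypothetical helper: replace the unregistered name x-gbk
    with the registered name GBK in charset declarations."""
    return re.sub(
        r"(charset\s*=\s*)x-gbk\b",  # keep the "charset=" prefix as group 1
        r"\1GBK",
        html_text,
        flags=re.IGNORECASE,
    )


# Illustrative meta tag; the exact text Nvu writes may differ
sample = '<meta content="text/html; charset=x-gbk" http-equiv="content-type">'
print(fix_gbk_charset(sample))
```

Reading and writing the file is left out here; when adding it, the file would need to be opened with the GBK encoding so that the rest of the content is preserved.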