Issues in Unicode email

Issues in OE

In Outlook Express (OE), if you have the default encoding set to ISO-8859-1 but you have entered characters beyond the repertoire (ISO Latin 1) supported by that encoding, OE may convert the message into HTML format or into a different encoding, such as UTF-8. This depends on the program settings. However, if all the extra characters are Windows Latin 1 characters, such as the en dash and the curly quotation marks, OE may simply map them to “nearest” ISO Latin 1 characters. Thus, for example, “–” may get turned into a hyphen, and “š” may get simplified to plain “s.” There is no warning about this. This is not nice at all.
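To see which characters in a draft are at risk, one can test each character against the two encodings. The following is a minimal sketch, not a description of how OE works internally; the function names `can_encode` and `classify_chars` are made up for this illustration, and it relies on Python's built-in `iso-8859-1` and `cp1252` (windows-1252) codecs.

```python
def can_encode(ch, charset):
    """Return True if the single character ch is representable in charset."""
    try:
        ch.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


def classify_chars(text):
    """Sort the characters of text that fall outside ISO-8859-1 into two lists.

    windows_only: representable in windows-1252 but not in ISO-8859-1
                  (the group that may get silently "simplified").
    beyond_both:  representable in neither encoding.
    """
    windows_only = []
    beyond_both = []
    for ch in text:
        if can_encode(ch, "iso-8859-1"):
            continue  # plain ISO Latin 1 character, no problem
        if can_encode(ch, "cp1252"):
            windows_only.append(ch)
        else:
            beyond_both.append(ch)
    return windows_only, beyond_both
```

For example, `classify_chars("a – š ā")` puts the en dash and “š” in the first list and “ā” in the second.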

Thus, it is probably best to set the encoding used by OE to windows-1252 rather than ISO-8859-1. Windows-1252 is what OE really thinks it’s using. As a drawback, you will send windows-1252 even when ISO-8859-1 would suffice, and some recipients might be unable to handle windows-1252.

If a message contains characters outside the repertoire of the default encoding, OE often asks the user what to do, suggesting UTF-8 as the default choice. This is fine, but there are exceptions, in addition to the above-mentioned feature. If all the extra characters are Latin letters, OE maps them to their ASCII simplifications, e.g. ā and ă both to a, ı (dotless i) to i, and ć and č both to c. There is no warning about this. The only cure is to manually set the encoding to something suitable (typically UTF-8, or in some cases ISO-8859-2 or some other ISO-8859 encoding) before sending the message.
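Choosing a suitable encoding by hand amounts to finding one that can represent every character in the message. A rough sketch of that check follows; the function name `pick_charset` and the candidate list are assumptions made for illustration, and the strict `encode` call raises an error instead of simplifying anything.

```python
def pick_charset(text, candidates=("iso-8859-1", "iso-8859-2", "utf-8")):
    """Return the first candidate encoding that can represent all of text.

    The candidate order is a hypothetical preference list; UTF-8 comes last
    because it can represent any Unicode text and so acts as the fallback.
    """
    for charset in candidates:
        try:
            text.encode(charset)  # strict mode: fails rather than simplifying
            return charset
        except UnicodeEncodeError:
            continue
    raise ValueError("no candidate encoding can represent the text")
```

With these candidates, a text containing only ć and č fits ISO-8859-2, whereas ā and ı (dotless i) are in neither ISO-8859-1 nor ISO-8859-2, so the sketch falls through to UTF-8.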

Issues in Thunderbird

These problems seem to have been fixed in Thunderbird. The rest of this document is thus just historical.

Thunderbird has settings that you should probably check and fix, because they may affect the encoding of the email messages that you send, in unexpected and perhaps unpleasant ways. Use the command Tools→Options and click on Fonts to open the Fonts & Languages settings.

Under the heading Character encoding there, make sure that both checkboxes are unchecked.

If the first checkbox is checked, Thunderbird will incorrectly ignore MIME headers that specify the character encoding. It will interpret the message body as being in the encoding that has been set as the default in Thunderbird. This may seriously distort the data, so that you immediately see that there is something wrong, or it may just cause some characters to be interpreted incorrectly. There is no good reason for the presence of this option. Notice that even when the checkbox is not checked, you can change the interpretation of a particular message, if you think that its MIME headers contain incorrect information about the encoding. (Just select View/Character encoding.)
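The effect of ignoring the MIME header can be reproduced with Python's standard `email` package. This is only a sketch of the principle, not of Thunderbird's code; the function `decode_body`, its `ignore_header` flag (standing in for the first checkbox), and the sample message are invented for the illustration.

```python
from email import message_from_bytes


def decode_body(raw, default_charset="iso-8859-1", ignore_header=False):
    """Decode the body of a single-part message given as raw bytes.

    The default encoding is used only when the message declares no charset,
    or when ignore_header is True (the hypothetical "first checkbox").
    """
    msg = message_from_bytes(raw)
    declared = msg.get_content_charset()  # charset parameter, or None
    if ignore_header or declared is None:
        charset = default_charset
    else:
        charset = declared
    body_bytes = msg.get_payload(decode=True)  # undo quoted-printable/base64
    return body_bytes.decode(charset, errors="replace")


# A made-up message: the byte E8 is "č" in ISO-8859-2 but "è" in ISO-8859-1.
SAMPLE = (
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=iso-8859-2\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"=E8aj\r\n"
)
```

Here `decode_body(SAMPLE)` yields the Czech word “čaj”, while `decode_body(SAMPLE, ignore_header=True)` yields “èaj”: a single misinterpreted character of the kind that is easy to overlook.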

If the second checkbox is checked (as it may be by default), Thunderbird will respond to a message using the character encoding that has been set as the default in its settings. This is not friendly at all. If someone sends you, say, an ISO-8859-2 encoded message and your email program can handle it, then it would be appropriate and polite to send the response in ISO-8859-2, too. It is not very probable that the recipient’s email program cannot handle your default encoding if it is ISO-8859-1, but why take the risk? Besides, changing the encoding in responses often messes up quotations, since not all email programs can perform character encoding conversions when inserting quoted text.
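The polite behavior described above can be expressed as a simple rule: reply in the sender's encoding whenever it can represent the reply, and switch only when it cannot. The sketch below assumes a hypothetical helper `reply_charset` with UTF-8 as the fallback; it does not reflect any particular email program's logic.

```python
def reply_charset(incoming_charset, reply_text, fallback="utf-8"):
    """Pick the encoding for a reply, preferring the sender's own encoding.

    Falls back (to UTF-8 by assumption) when the incoming message declared
    no encoding, declared an unknown one, or when the reply contains
    characters that the sender's encoding cannot represent.
    """
    if incoming_charset is None:
        return fallback
    try:
        reply_text.encode(incoming_charset)
        return incoming_charset
    except (UnicodeEncodeError, LookupError):
        return fallback
```

Since the quoted text came from the sender's message, it normally fits the sender's encoding; the fallback is needed only when your own additions go beyond it.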

The default encoding is set in the same window under the same heading, using the two dropdown menus. They both typically default to “Western (ISO-8859-1),” which is usually fine. When sending email, Thunderbird will usually ask you to select a different encoding when needed, i.e. when the default encoding does not contain all the characters needed (but see below for an exception). The default encoding for incoming mail affects only incoming mail that does not specify its encoding, provided that the first checkbox discussed above is not checked. You might consider setting it to “Western (Windows-1252),” but this is really what “Western (ISO-8859-1)” means here.

If you have set the default encoding to ISO-8859-1 and you use the additional characters in Windows Latin 1, Thunderbird will silently switch to windows-1252. This is not a big problem, but sometimes it implies that you are sending a windows-1252 encoded message when you didn’t mean to and perhaps shouldn’t.


I wrote this document (on September 25, 2006) as an appendix to the page Errata and annotations for Unicode Explained. It relates to subsection “Sending Unicode Email” (on pages 53–54). Updated November 25, 2011.

Jukka K. Korpela