Fixing Garbled Text in Win Outlook from Tiger Mail

When sending rich text with attachments, Tiger Mail produces messages with mixed character encodings. Most mail clients handle this without trouble. However, when the message contains accented characters used in European languages, or sometimes even special formatting or punctuation characters in English text, Windows Outlook may display those characters as Chinese characters, question marks, or other incorrect symbols.

Fix A: Plain Text Only. The easiest way to avoid this problem is to send email as plain text (Format > Make Plain Text) instead of rich text.

Fix B: Rich Text - Terminal. You can set Mail's default encoding for outgoing messages to UTF-8. To do this, quit Mail, open Terminal, type the following command, and press Return:

defaults write com.apple.mail NSPreferredMailCharset "UTF-8"

Mail's encoding can also be set to UTF-8 for a particular message by using the Message > Text Encoding menu before sending.

Fix C: Rich Text - Dingbat. Another way to force UTF-8 is to include a Unicode dingbat in the body of every message, such as character 2701 (a scissors symbol) from the Character Palette. This may be the only option if your text has no accented characters or smart punctuation in it. If you don't want the dingbat to show, color it white. If you use a signature, you can try putting the dingbat there, but make sure that no graphic comes between it and the message text. (Note: You must insert the dingbat from the Character Palette, looking it up by its code number. Switching your font and typing on the keyboard will *not* produce a Unicode dingbat.)
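A single dingbat can force UTF-8 because none of the older single-byte encodings contains it. The Python sketch below checks whether a sample body fits in a few encodings. It assumes, as this fix implies, that Mail falls back to UTF-8 when the text contains a character its default encoding cannot hold; the `fits` helper and the sample body are illustrative only.

```python
# Sketch: which encodings can represent a message body containing a dingbat?
# Assumption: Mail switches to UTF-8 when some character does not fit its
# default encoding. This script only tests representability.

DINGBAT = "\u2701"  # code point 2701, the scissors dingbat from the Character Palette


def fits(text: str, encoding: str) -> bool:
    """Return True if every character of text can be encoded in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


body = "pensé qu'il " + DINGBAT  # hypothetical sample body
for enc in ("iso-8859-1", "cp1252", "mac_roman", "utf-8"):
    print(f"{enc:>10}: {'fits' if fits(body, enc) else 'cannot encode'}")
```

Only UTF-8 can encode this body; dropping `DINGBAT` makes it fit ISO-8859-1 again.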

Some older mail clients may not understand UTF-8, and for these the plain text solution would be more appropriate.

Other options you can try are replacing UTF-8 in the Terminal command with ISO-8859-1 or Windows-1252. This may help especially if you also have recipients who use webmail systems that cannot handle UTF-8. However, I have had reports that in various circumstances the Terminal command does not work for these encodings, and you will have to set the encoding for each message individually. Also, the limitations of webmail, combined with the bugs in Outlook, may mean that using Mail for rich text messages simply will not work for all your recipients. In that case, it is best to consider an alternative such as Thunderbird or Entourage, or to use webmail yourself.
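The practical difference between these two fallback encodings is smart punctuation. Windows-1252 assigns curly quotes, dashes, and the euro sign to bytes 80 to 9F, where ISO-8859-1 has only control codes. The Python sketch below (the character list and the `byte_or_none` helper are illustrative) prints each character's byte in both encodings. It suggests ISO-8859-1 is the riskier choice if you type smart quotes, but says nothing about what any particular webmail system accepts.

```python
# Sketch: how some characters common in rich text map to bytes in
# ISO-8859-1 versus Windows-1252 (Python's codec name is "cp1252").
from typing import Optional

SAMPLES = {
    "left curly quote": "\u201c",
    "right curly quote": "\u201d",
    "en dash": "\u2013",
    "euro sign": "\u20ac",
    "e acute": "\u00e9",
    "no-break space": "\u00a0",
}


def byte_or_none(ch: str, encoding: str) -> Optional[str]:
    """Return the encoded byte(s) as uppercase hex, or None if ch is not representable."""
    try:
        return ch.encode(encoding).hex().upper()
    except UnicodeEncodeError:
        return None


for name, ch in SAMPLES.items():
    latin1 = byte_or_none(ch, "iso-8859-1") or "missing"
    cp1252 = byte_or_none(ch, "cp1252") or "missing"
    print(f"{name:<18} ISO-8859-1: {latin1:<8} Windows-1252: {cp1252}")
```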

Note that you cannot always tell how your message is being received from how it looks when quoted back to you in a reply, so it is best to verify whether the recipient actually has problems reading your text before trying to fix anything.

Also, you cannot tell what is happening to the encoding of your outgoing mail by looking at Message > Text Encoding. It always shows Automatic or Default (unless, of course, you change it manually), even when you have set a specific encoding in Terminal. In addition, Mail > Preferences > Appearance > Default Encoding applies only to incoming messages, not to outgoing ones.

To check whether your mail is being given a uniform encoding, select the message in your Sent folder, choose View > Message > Raw Source, and check whether all the "charset=" statements are the same (a rich text message normally has two of them).
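If you want to check several saved messages, the comparison can be scripted. A minimal Python sketch, assuming you have saved the Raw Source text to a file (the file name and the `charsets_in` helper are hypothetical), uses the standard `email` package to list the charset declared by each MIME part:

```python
# Sketch: list the charset declared by each MIME part of a saved raw message.
# Assumption: Mail's Raw Source view has been saved to a plain-text file.
import email
import sys


def charsets_in(raw_source: str) -> list:
    """Return (content type, charset) pairs for every part that declares a charset."""
    msg = email.message_from_string(raw_source)
    found = []
    for part in msg.walk():  # the top-level message first, then every subpart
        charset = part.get_content_charset()  # lower-cased, or None if absent
        if charset is not None:
            found.append((part.get_content_type(), charset))
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 check_charsets.py saved_raw_source.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        parts = charsets_in(f.read())
    for ctype, charset in parts:
        print(f"{ctype}: charset={charset}")
    if len({charset for _, charset in parts}) > 1:
        print("The parts declare different charsets.")
    else:
        print("All declared charsets match.")
```

A message that prints, say, `iso-8859-1` for the plain-text part and `utf-8` for the HTML part is the mixed-encoding case described above.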

Duplicated Texts: Some older mail clients such as Outlook Express and Eudora have been reported to display two copies of rich text messages with attachments coming from Mail. Why this happens is not known, but a possible fix is to manually set the encoding to ISO-8859-1 before sending.

Webmail: If your problem involves what recipients see in webmail rather than in Windows Outlook, the fixes here may or may not work. How different webmail systems handle encodings and HTML can only be determined by experiment.

Boring Technical Details for Garbled Text

Here is how this problem happens. Essentially, there are two bugs in Outlook. The first causes it to confuse the two encodings in a multipart message from Mail and to read Latin-1 characters beyond ASCII as if they were UTF-8. For example, in the French phrase

pensé qu'il

it sees é + space + q as a series of 3 bytes, E9 20 71, forming one character. (In UTF-8, a lead byte whose first hex digit is E, binary 1110xxxx, signals a 3-byte character.)

E9 20 71 is not in fact a valid UTF-8 sequence, but Windows or Outlook has another bug: it doesn't check whether the sequence is valid. It looks at the binary for the last two bytes of this sequence, which is

(E9) 00100000 01110001

and reads only the last 6 bits of each, assuming that the first 2 bits are 10 (as they must be in a valid UTF-8 continuation byte) instead of 00 and 01. So it interprets this as (E9) 10100000 10110001, or E9 A0 B1, which is valid UTF-8 for 頱 (U+9831). Thus "pensé qu'il" becomes "pens頱u'il".
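The two bugs can be modeled in a few lines of Python. This is a hypothetical imitation of the reported symptom, not Outlook's actual code: the decoder honors UTF-8 lead bytes but never checks continuation bytes, keeping only their low 6 bits. Showing a stray continuation byte as "?" is an added assumption.

```python
# Model of the lenient decoding described above (hypothetical; it imitates the
# reported symptom and is not Outlook's code). Lead bytes are honored, but
# continuation bytes are never checked: only their low 6 bits are kept, as if
# their top two bits had been 10.


def lenient_utf8_decode(data: bytes) -> str:
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                    # 0xxxxxxx: ASCII, one byte
            out.append(chr(b))
            i += 1
            continue
        if 0xC0 <= b <= 0xDF:           # 110xxxxx: lead byte of a 2-byte character
            extra, cp = 1, b & 0x1F
        elif 0xE0 <= b <= 0xEF:         # 1110xxxx: lead byte of a 3-byte character
            extra, cp = 2, b & 0x0F
        elif 0xF0 <= b <= 0xF7:         # 11110xxx: lead byte of a 4-byte character
            extra, cp = 3, b & 0x07
        else:                           # 10xxxxxx or F8-FF: cannot start a character
            out.append("?")             # assumption: shown as a question mark
            i += 1
            continue
        if i + extra >= len(data):      # text ends before the sequence is complete
            out.append("?")
            break
        for cont in data[i + 1 : i + 1 + extra]:
            cp = (cp << 6) | (cont & 0x3F)  # the bug: top two bits are ignored
        # Results outside the Unicode range (or surrogates) cannot be shown.
        out.append("?" if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF else chr(cp))
        i += 1 + extra
    return "".join(out)


latin1 = "pensé qu'il".encode("iso-8859-1")
print(latin1.hex(" ").upper())          # 70 65 6E 73 E9 20 71 75 27 69 6C
print(lenient_utf8_decode(latin1))      # pens頱u'il

try:
    latin1.decode("utf-8")              # a strict decoder rejects E9 20 71
except UnicodeDecodeError as err:
    print("strict decoder:", err.reason)
```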

Other accented characters may give different results, including question marks or the character disappearing entirely. The no-break space, which Mail uses frequently in rich text, is encoded as A0 in Latin-1 and may appear as a ? in Outlook, because in UTF-8 the byte A0 can only be a continuation byte and cannot stand alone as a character.
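The no-break space case can be checked directly in Python. The sketch shows the byte in each encoding and what a strict UTF-8 decoder does with a lone A0: it substitutes the replacement character U+FFFD. Treating Outlook's "?" as the same kind of substitution is an assumption.

```python
# Sketch: why a Latin-1 no-break space is not a character when read as UTF-8.
nbsp_latin1 = "\u00a0".encode("iso-8859-1")
nbsp_utf8 = "\u00a0".encode("utf-8")

print(nbsp_latin1.hex().upper())   # A0: one byte in Latin-1
print(nbsp_utf8.hex().upper())     # C2A0: in UTF-8, A0 only follows a lead byte
print(f"{nbsp_latin1[0]:08b}")     # 10100000: the 10xxxxxx continuation pattern

# A strict UTF-8 decoder cannot start a character with A0 and substitutes U+FFFD.
print(nbsp_latin1.decode("utf-8", errors="replace") == "\ufffd")  # True
```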

Some reports indicate that Outlook has yet a third bug: its reply function quotes the UTF-8 copy of the text but sends it labeled as Latin-1, turning the quoted text into garbage wherever it contains non-ASCII characters.
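What that garbage looks like can be reproduced in Python, assuming the UTF-8 bytes are passed through unchanged and only the charset label is wrong. Each accented character becomes two Latin-1 characters:

```python
# Sketch of the reported reply bug: UTF-8 bytes read as if they were Latin-1.
# Assumption: the bytes are unchanged and only the declared charset is wrong.

quoted = "pensé qu'il"
utf8_bytes = quoted.encode("utf-8")          # é becomes the two bytes C3 A9
garbled = utf8_bytes.decode("iso-8859-1")    # each byte read as its own character
print(garbled)                               # pensÃ© qu'il
```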