One of the best-kept secrets about MacOS is the built-in support it contains for reading and writing languages beyond English, including ones that use non-Latin scripts and characters. This document explains these capabilities and provides various resources to help users exploit them to the maximum degree possible. Comments and additions from readers are most welcome.
These comments are based on OS X 10.3, Panther (build 7B85), issued 10/24/03. A similar text relating to 10.2 (Jaguar), as well as 10.0 and 10.1, can be found here.
OS X is a complex animal. For the first time, Apple has deployed Unicode on a broad scale in its operating system, which provides the potential for great linguistic flexibility. At the same time, OS X operates in 3 different modes -- Classic, Carbon, and Cocoa -- each of which has different capabilities, and successive versions have brought very substantial improvements. The Darwin OS, based on Unix, is also available. So it is sometimes difficult to generalize about how applications, languages, and modes work together. Basic Apple documentation can be found in the Help menu of the Finder if you put "languages" in the Question box.
Unlike OS 9 and earlier Mac systems, which were produced in localized versions for foreign countries, OS X offers the choice of 15 system languages out of the box -- English, Japanese, French, German, Spanish, Italian, Dutch, Swedish, Danish, Norwegian, Finnish, Traditional Chinese, Simplified Chinese, Korean, and Brazilian Portuguese. These languages, which affect system-wide menus and dialogues, can also be changed, for your next login, via the Languages menu of the International pane in System Preferences. Just move your preferred language to the top of the list.
"Fast User Switching," activated in the Accounts preferences, enables you to quickly rotate your screen, with an interesting "cube effect," among different system languages if you set up separate users for them. Be careful to keep your keyboard the same for all login and logout operations, or you can find your password will not work.
If you poke the "Edit" button in the Language menu to see all varieties available, you get a list of 103. These relate primarily to user preferences regarding menus and dialogues for applications. Also OS X uses the order of languages set by the user in this menu to determine default fonts and collation and which encodings are available in Mail.app. So if Chinese is ahead of Japanese in this list, Chinese fonts should normally get first choice by the system in any ambiguous situation. You should make sure that any languages you want to read or write are on the list.
If you move a language for which you do not have the localization files installed to the top of the Languages pane, you may get strange behavior or disable your system. To recover from this, you may need to reinstall, or at least to install the files for the language you chose. Or you may be able to manually change the preference file (Users/username/Library/Preferences/.GlobalPreferences.plist).
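The manual preference-file repair can be sketched in Python. This is only an illustration under stated assumptions: it assumes the language order is stored as a list under the AppleLanguages key of .GlobalPreferences.plist, and the function name promote_language is hypothetical. The demonstration works on an in-memory sample, not on the real file, which should be backed up before any edit.

```python
import plistlib

def promote_language(plist_bytes, language):
    """Return plist bytes with `language` moved to the front of AppleLanguages.

    Assumption: the preferred-language order lives in a list stored under
    the "AppleLanguages" key of the preference file.
    """
    prefs = plistlib.loads(plist_bytes)
    languages = list(prefs.get("AppleLanguages", []))
    if language in languages:
        languages.remove(language)
    languages.insert(0, language)
    prefs["AppleLanguages"] = languages
    return plistlib.dumps(prefs)

# Demonstration on an in-memory sample rather than the real preference file.
sample = plistlib.dumps({"AppleLanguages": ["fi", "en", "ja"]})
repaired = promote_language(sample, "en")
print(plistlib.loads(repaired)["AppleLanguages"])
```

Putting a language whose localization files are known to be installed (English here) back at the top is the same change the Languages pane would make.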
To install a missing system localization, you can do so directly by running the installers found on the second OS X 10.3 CD. If you have an Install DVD, instructions can be found here.
Many applications have their own localizations independent of the system. To activate these, do File/Get Info on the application's icon, select the Languages tab, and choose the language you want the application to be in. Apple's information on how to localize applications can be found here.
Note that the system language is distinct from the keyboard language, which determines what you can type. The latter is set from the Input Menu in the International Pane. Also the language of the login function is fixed at whatever is chosen upon installation, though you can probably change this by logging in as Root and setting it there, or with the program TinkerTool System, or by reinstalling.
If you do not want all 15 system languages, be sure to do a Custom Install. To get rid of system languages after they have been installed (normally to liberate hard drive space, about 50MB per language), you can check out the programs Monolingual and DeLocalizer.
If you buy a machine with OS X outside the US, you should be aware that the OS 9/Classic (and possibly AppleWorks) that comes with it will most likely be in the local language only. If you want English instead, you will probably need to buy and install another copy of this software, which is not part of OS X. The install disks for the iMac G5 reportedly include a nearly full set of localizations (the same as for OS X, except Portuguese) for AppleWorks, and those for the Mac Mini have a full set of localizations for OS 9/Classic.
Use the Formats Tab of the International pane to set your preferred locale for date, time, and number formats.
Language input can be done in either OS X proper or in Classic. OS X switches automatically to OS 9, operating in "Classic" mode, whenever you open an application which is only designed for the older systems. In this situation your system language is that of the OS 9 which is being used, and you have access to all the language kits that you have installed on it. See the OS 9 section above for installation info. If your system does not appear to offer an OS 9 Custom Install with language kits, look on the CD or on your hard drive in the folder OS9 Applications/Apple Extras for an installer. If you want to use the carbonized AppleWorks in Classic mode, with many of the System 9 language kits available, control-click on the application icon, select Show Package Contents, open Contents and then MacOSClassic, and double-click on AppleWorks6.
If you are in OS X proper you can select over 50 keyboards covering Arabic, Armenian, Bulgarian, Catalan, Cherokee, Chinese (simplified and traditional), Croatian, Czech, Danish, Dari, Devanagari, Dutch, English, Estonian, Faroese, Finnish, French, German, Greek, Gujarati, Gurmukhi (Punjabi), Hawaiian, Hebrew, Hungarian, Icelandic, Inuktitut, Irish, Italian, Japanese, Korean, Latvian, Lithuanian, Macedonian, Northern Sami, Norwegian, Pashto, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, Thai, Turkish, Ukrainian, Uzbek, Vietnamese, and Welsh, plus Unicode Hex and US Extended (formerly called Extended Roman).
In addition to the keyboards, you can choose a Japanese Kana palette, the Character Palette, the Input Mode Palette (a floating version of part of the "flag" menu), and the Keyboard Viewer (formerly KeyCaps). To activate these you go to the Desktop menu, then to System Preferences, International, and Input Menu. The Options button of the Input Menu lets you see the possible keyboard shortcuts for switching scripts and keyboards.
To have all possible languages, make sure you do a Custom Install and check the box "Fonts for Additional Languages." If you do not do a Custom Install, you could be missing some fonts that may be useful or essential for Arabic, Hebrew, Thai, Cyrillic, Devanagari, Gujarati, Punjabi, Armenian, Cherokee, and Inuktitut. If you have the 3 CD system installer, you can add these separately by inserting the third OS X 10.3 disk, opening the "AdditionalFonts.pkg" and running the installer. If you have an Install DVD, note the name of the installation DVD as it appears on the desktop (such as "iMac Software"), then do this: From the Finder's Go menu, choose Go to Folder. Type: "/volumes/{name of your DVD}/System/Installation/Packages". Click Go. This procedure is also explained here.
To type "accented characters" you do not necessarily need to switch to a specialized language keyboard. The standard Mac US keyboard has "dead keys" for 5 common accents activated via the Option key, and the US Extended keyboard has the same for many other diacritical marks. A Windows-style US International keyboard is also available here. Opening the Keyboard Viewer (located in "flag" menu if activated in the Input Menu pane) and poking the physical Option key will indicate how this works. Also here is a chart.
Many of the available keyboards can be used with all the Carbon and Cocoa programs that run on OS X. Just as in OS 9, there is a "Flag" menu at the top right where you select the language. When Chinese or Korean are selected you will get the input methods familiar from OS 9, but the options formerly in the "pencil menu" are now found at the bottom of the "flag" menu. The Japanese IM differs from the other two in that its options are listed separately under Kotoeri in the "flag" menu. For some (unfortunately somewhat outdated) info on how to use these input methods and links to manuals, see the section above on Typing Foreign Language Texts in OS 9. For Chinese (and sometimes also Japanese and Korean as concerns applications) the key info site is the
OS X Kotoeri includes an interesting "reverse conversion" command that will convert kanji text into kana, which can then also be transliterated into romaji. The Japanese IM can switch between Roman and direct Kana input via its Preferences pane (first tab, first item), and also allows you to choose your Roman input keyboard (first tab, last item). For Chinese and Korean Roman input it appears you are confined to English Qwerty.
Because of the Unicode capabilities of OS X, Chinese/Japanese/Korean users potentially have access to a vastly increased number of characters (used mainly in historical documents) compared to OS 9. See the section on Unicode below for more information. For users who need the capability of composing Asian languages in vertical, right-to-left format, or with "Ruby" annotations, Word 2004 or NeoOffice/J are probably the most practical choices. Also, not all Asian fonts have proper typographical features for vertical text -- the Hiragino Japanese fonts that come with OS X do, however.
"Unicode" keyboards, including Vietnamese, Turkish, Thai, Arabic, Persian, Hebrew, all Afghan and Indic languages, Icelandic, Faroese, Sami, Greek, Romanian, Serbian, Croatian, Slovenian, Armenian, Cherokee, Inuktitut, Welsh and keyboards with "extended" in their names, normally require a Unicode-savvy program to function, which excludes MS Office X (prior to the 2004 version), AppleWorks, and many DTP programs. Examples of Unicode-savvy Apple programs are iWork (Pages and Keynote), TextEdit, Stickies, Address Book, Mail, iChat, iTunes, WorldText 1.x and the Finder.
Devanagari input/display has a bug where repha plus consonant plus nukta does not produce correct ligatures. This can be fixed by using the font Devanagari MTS (from the OS 9 Indian Kit) instead of Devanagari MT. Devanagari half-form consonants used in alternative conjuncts, created by typing Halant plus Nukta, do not work in 10.3.0, but this is fixed in one of the later updates. Also Devanagari filenames are not displayed correctly in the Finder.
Unicode word processors and similar programs worth looking at include Nisus Writer Express (with automatic keyboard and font activation for any language chosen), Mellel (with excellent Hebrew/Arabic support), NoteTaker, ThinkFree Write, AbiWord, and NeoOffice/J. Word 2004 has Unicode support but (like the Adobe CS products) can't do RTL or complex scripts or handle all combining diacritics in certain fonts.
A fully Unicode-capable page layout and design application is Create, which can use all OS X keyboards for publishing to print or to web. Adobe InDesign CS and Illustrator CS (as well as Photoshop CS) are also generally Unicode-savvy, but unfortunately support for RTL, complex scripts, and combining diacritics is lacking. Special versions of Adobe apps for Middle East languages can be found here. For TeX, have a look at the program XeTeX.
Unicode-capable HTML editors include GoodPage, Web Minimalist, Taco HTML Edit, DreamWeaver MX 2004 (but Arabic support is very limited), Mozilla Composer, and OmniWeb. Some other programs that can use the Unicode keyboards are the OmniOutliner 1.2 outlining application, the OmniGraffle diagramming/charting program, and the notepad program MoosePad.
Unicode-savvy database programs include iData, FileMaker Pro 7, MySQL 4.1, FrontBase, PostgreSQL, and OpenBaseSQL.
For video editing, Final Cut Express 2 (and Pro) and iMovie HD have the ability to do titles in Unicode scripts, but iMovie 4 and earlier do not. iMovie HD can do Arabic/Hebrew only in the 3DSpin option; in the other options the letters appear in reversed order. LiveType has some limitations. Rolling Credits can do movie credits in all Unicode scripts.
The music composing programs Finale and Sibelius are not yet Unicode-savvy. Neither is the score editor of Apple's Logic Express/Pro 7.
Input of RTL (Right-to-left) scripts like Arabic and Hebrew poses special challenges for word processors and other programs. The program Mellel mentioned above is especially designed to deal with these. Pages can handle copy/paste well, but keyboard input is probably too buggy to use. In TextEdit, for best results use rich text mode and activate the menu item Format/Text/Writing Direction/Right to Left. For other programs it may help to use the add-on Direction Service or Writing Direction Menu.
Non-Apple hacks/add-ons are also available for doing non-Unicode Greek and Thai. Also see this site for an add-on that does Greek, Turkish, Croatian and Rumanian.
Unicode Mail programs are covered in the Email section further down the page.
For more info on the significance of Unicode and on using the US Extended and Unicode Hex keyboards, see the section on Unicode below.
If you want to make your own keyboards, there are a couple of different approaches, often depending on whether a Unicode keyboard is required. Apple Tech Note 2056 has some information on various options. For Unicode keyboards, you can compose an XML .keylayout file along the lines of those contained in /System/Library/Keyboard Layouts/Unicode.bundle/Contents/Resources. An online utility for doing this can be found here. Other keyboard editing programs can be found here and here. For non-Unicode scripts, you can take an existing keyboard from OS 9, rename it as a .rsrc file, and put it into /Library/Keyboard Layouts/. You can also modify such keyboards using the ResEdit program. Here are some instructions for editing a kchr resource and this site has similar instructions in French.
If there is a Unicode keyboard that you want to use in a non-Unicode-savvy app, like AppleWorks, you may be able to modify it to work in some cases. For example, in the Slovenian.keylayout file, change the id code number to something positive and also change the keyboard group number from 126 to 29 (for the CE script). Then change the name to SlovenianCE.
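The three changes described for the Slovenian layout can be sketched as a small text rewrite. This is a hedged illustration, not Apple's procedure: it assumes the id, group, and name values are attributes on the opening keyboard element of the .keylayout file, and the sample file content, the id numbers, and the function name retarget_keylayout are all hypothetical.

```python
import re

def retarget_keylayout(xml_text, new_id, new_group, new_name):
    """Rewrite the id, group and name attributes of the first <keyboard> tag.

    Only the opening tag is touched, so the rest of the file (including any
    DOCTYPE line and all key maps) is left byte-for-byte unchanged.
    """
    def rewrite_tag(match):
        tag = match.group(0)
        for attr, value in (("id", new_id), ("group", new_group), ("name", new_name)):
            tag = re.sub(
                r'\b%s="[^"]*"' % attr,
                lambda m, a=attr, v=value: '%s="%s"' % (a, v),
                tag,
            )
        return tag

    return re.sub(r"<keyboard\b[^>]*>", rewrite_tag, xml_text, count=1)

# Hypothetical, heavily shortened stand-in for a real .keylayout file.
SAMPLE = '''<?xml version="1.0" encoding="UTF-8"?>
<keyboard group="126" id="-12345" name="Slovenian" maxout="1">
</keyboard>
'''

converted = retarget_keylayout(SAMPLE, 12345, 29, "SlovenianCE")
print(converted)
```

Work on a copy saved under the new name so that the original Unicode layout remains available.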
Keyboards for Runic Scripts, Czech and Slovak QWERTY, Lao, Tibetan, Urdu, Biblical Hebrew, Esperanto, Azeri, Pinyin, Hausa, Mongolian Cyrillic, Manchu, Old Persian, and Navajo (plus alternative keyboards for Farsi/Persian, Brazilian, Polish, Canadian, Arabic, French, UK, Spanish, and US International) can be found here. Also available are a super-comprehensive Latin Extended Keyboard and a keyboard for Aramaic. Another source for QWERTY keyboards in several languages, plus Armenian, Georgian, and Thaana, is here.
This site has Vietnamese keyboards that may work better than Apple's with MS Office 2004 and Adobe CS products for certain fonts.
Here is a Windows style US-International keyboard for OS X and OS 9.
For a set of Windows-style keyboards in several languages, download the Logitech Control Center. Do not install this, but do Control-Click on the package to get at the contents, and copy LCCKCHR.rsrc from Resources to one of your Keyboard Layouts folders.
For a non-Apple Cyrillic keyboard and font that covers Slavonic, Old Church Slavonic, Russian, Byelorussian, Ukrainian, Serbian, Bulgarian, Macedonian, and non-Slavic Cyrillic scripts, check out Slavija.org.
For a non-Apple Chinese input system, you can check out PanALEX. The Chinese IM has a large number of input options, and there is onscreen help available in English. For Taiwanese, see Jason Cox's Page. An experimental input method for Vietnamese Nom is available here.
If you need to make unusual accented characters, like macroned vowels, in an app which is not Unicode-savvy, like AppleWorks or Word, you can try the Czech or Slovak keyboards and use one of the fonts ending in CE.
For information on IPA fonts and keyboards or keyboards for Ancient Greek, see the Other Resources by Language section at the end of this page.
To install keyboards that you download or create yourself, put them in Users/username/Library/Keyboard Layouts (or in Library/Keyboard Layouts if all usernames need access to them). Then go to System Preferences/International/Input Menu and check the box for the new keyboard. You may need to log out and log in again to have it appear.
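The copy step can also be scripted. The following is a minimal sketch, assuming only the folder names given above; the function name install_keyboard_layout and the example file are hypothetical, and the demonstration runs in a scratch directory rather than a real Library folder.

```python
import shutil
import tempfile
from pathlib import Path

def install_keyboard_layout(layout_file, library_root):
    """Copy a layout file into '<library_root>/Keyboard Layouts'.

    The folder is created if it does not exist yet. Returns the path of
    the installed copy.
    """
    target_dir = Path(library_root) / "Keyboard Layouts"
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(layout_file, target_dir))

# Demonstration in a scratch directory. For a real install, library_root
# would be Path.home() / "Library" (one user) or Path("/Library") (all users).
with tempfile.TemporaryDirectory() as scratch:
    source = Path(scratch) / "Example.keylayout"
    source.write_text("<keyboard/>", encoding="utf-8")
    installed = install_keyboard_layout(source, Path(scratch) / "Library")
    print(installed.name, installed.exists())
```

The checkbox in the Input Menu pane still has to be ticked by hand afterwards.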
A number of online pseudo-keyboards, covering Armenian, Bengali, Cherokee, Cirth, Devanagari, Etruscan, Georgian, Gothic, Gujarati, Gurmukhi, Kannada, Khmer, Lao, Malayalam, Myanmar, Ogham, Old Italic, Old Persian Cuneiform, Oriya, Runic, Tamil, Telugu, Tengwar, and Ugaritic, are available here. Many of these can be used with the more advanced browsers to create strings of odd scripts for copy/paste operations. For best results, use Opera 6.
For a source of physical keyboard overlays for various languages, see DataCal or Hooleon or SpeedSkin.
OS X includes a system-wide spell-checker, which is accessible from any Cocoa program via the Edit/Spelling menu. In addition to US English, 10.3 has dictionaries for Australian, British, and Canadian English, German, Spanish, French, Italian, Dutch, Portuguese, and Swedish. A non-Apple Cocoa spell-checker covering Breton, Catalan, Czech, Danish, Dutch, German, Greek, Esperanto, Faroese, French, Icelandic, Italian, Norwegian, Polish, Portuguese, Russian, Slovak, Spanish, Swedish, Ukrainian, and Welsh is CocoaSpell. Hebrew spell checking can be found here, and Finnish here. Persian/Farsi spell checking (and much other useful stuff for this language) can be had at the Iranian Mac User Group.
MS Office 2004 comes with proofing tools for English, French, Spanish, Italian, Japanese, Norwegian, German, Danish, Swedish, Portuguese, Finnish, and Dutch. Pages can only use Apple's own dictionaries for spellchecking.
The stand-alone spell-checker Excalibur can be used in both Cocoa and Carbon environments, and has dictionaries available for British, Catalan, Danish, French, Dutch, German, Haitian, Indonesian, Italian, Manx, Norwegian, Portuguese, Spanish, and Swedish. SpellCatcherX does English, Dutch, French, German, Italian, Portuguese, Swedish, and Spanish.
The Safari add-on Live Dictionary offers Chinese/Japanese plus access to FreeDict dictionaries for Afrikaans, Czech, Danish, English, French, German, Greek, Hungarian, Irish, Italian, Japanese, Latin, Nederlands (Dutch), Portuguese, Russian, Serbo-Croatian, Slovak, Spanish, Swahili, Swedish, Turkish and Welsh.
For reading multilingual web pages, Safari, the browser which comes with OS X, is one of the best. The latest versions of Mozilla/Firefox, Opera, Camino, and Netscape also offer good performance with a wide variety of scripts. The Mac-only browser OmniWeb is also excellent and can do Arabic and Hebrew starting with version 5.1. Internet Explorer is not recommended.
When appropriate fonts are installed, the better browsers have many encoding choices and can display a large number of languages and scripts, even on the same page. They convert all incoming characters to Unicode, and then search all installed fonts for corresponding glyphs. Opera 6, the latest version of Mozilla, and (with some glitches) Safari are the only browsers which can read pages that employ Unicode beyond the Basic Multilingual Plane (BMP). A good source for info on several OS X browsers is the Mac Multilingual Browser Page.
For information on Safari support for International Domain Names, and possible security issues, see this article.
Some scripts (e.g. Bengali, Telugu, Myanmar, Khmer, Lao, Tibetan) are still put on the web using non-standard encodings and embedded font (.eot, .pfr) technology that only works with Windows browsers. Reading these may require the downloading of custom fonts (usually available from the site itself) and experimenting with browsers, encodings, and font preferences. The Opera browser seems to work better than others in such situations.
A full list of fonts included with OS X is here. OS X, unlike OS 9, can make routine use of big Windows fonts which contain characters for dozens of languages. Note, however, that viewing complex scripts which require reordering, contextual shaping, or stacking of characters (such as Arabic, Devanagari, Tibetan, Classic Mongolian, and Thai) requires a combination of font and rendering engine technology. On the Mac this is accomplished via an AAT (Apple Advanced Typography) font and ATSUI, while Windows uses an OpenType font plus Uniscribe. The result is that when you select a Windows font in OS X, complex scripts are unlikely to display correctly, and an Apple font should be used if available. Mellel is an OS X app that uses some OpenType layout tables. Instructions for using Apple's font tools to add some AAT features to other fonts can be found here.
The largest easily available Windows font I am aware of is Code2000 with 30,000 characters. If you have access to Arial Unicode MS, provided with certain MS products, this is still larger, with 50,000 characters. The multilingual capabilities of various browsers under OS X can be demonstrated by installing these and going to UTF-8 Sampler or Alan Wood's Unicode Sample Pages.
OS X has a built-in font inspector called the Character Palette, found in the Flag (keyboards) menu. This shows all the characters in a selected font in any Unicode range, and allows you to copy/paste them into documents. Note that copy/paste will not work for many characters into non-Unicode-savvy apps like Word X and AppleWorks. Similar utilities are UnicodeChecker and Unicode Font Info. Some specialized fonts, for example those for music symbols, will not display properly in Character Palette (or Keyboard Viewer). The best way to input from these is to use a program like PopChar or Font Explorer.
The behavior of fonts used for non-Roman scripts and languages like Vietnamese can sometimes be adjusted to suit particular needs. Open the Font panel, select the font, hit the "gear wheel" at lower left, and select "Typography" to see any options which may be available.
TextEdit can save plain text in 86 different encodings. To see them all, open the encoding menu in the Save dialogue and check "Customize Encoding Menu." The Chinese Text Converter, located in Applications/Utilities/Asia Text Extras, appears to provide the capability to translate just about any encoding into any other. Cyclone and Codepage Converter are alternatives for this function.
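What such converters do can be sketched in a few lines of Python: decode the bytes using the source encoding, then re-encode the resulting text in the target encoding. This is a generic illustration using Python's built-in codecs, not a description of how any of the programs named above work internally; the function name convert_encoding is hypothetical.

```python
def convert_encoding(data, source_encoding, target_encoding):
    """Decode bytes from one encoding and re-encode them in another.

    Raises UnicodeDecodeError if the bytes are not valid in the source
    encoding, and UnicodeEncodeError if the target encoding lacks a
    character that appears in the text.
    """
    text = data.decode(source_encoding)
    return text.encode(target_encoding)

# The Chinese character for "horse", first as Big5 bytes, then as UTF-8.
big5_bytes = "馬".encode("big5")
utf8_bytes = convert_encoding(big5_bytes, "big5", "utf-8")
print(big5_bytes.hex(), "->", utf8_bytes.hex())
```

Conversion is lossless only when the target encoding contains every character in the text, which is why Unicode encodings are the safest target.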
OS X currently has only English text-to-speech. The program Speechissimo offers French, German, Spanish, and Italian. Cepstral has UK English, Canadian French, German, and Americas Spanish. A Chinese text-to-speech program can be found here.
Further details on OS X Unicode reading and input capabilities, including CJK Extension B in Plane 2 and scripts in Plane 1, are contained in the section below on Unicode.
OS X also allows the sharing of files named in multiple languages over a network. If you use the Go/Connect to Server dialogue, after you enter a password you can select the character set from among Arabic, Central European, Chinese, Croatian, Cyrillic, Greek, Hebrew, Icelandic, Japanese, Korean, Romanian, Thai, Turkish, and Western.
OS X includes the Darwin OS, based on the FreeBSD variety of Unix, which can be accessed via the Terminal program in Applications/Utilities. Terminal offers the choice of 3 shells (csh, bash, zsh) and 15 encodings (including 4 Japanese, 2 Chinese, EUC Korean, Latin 1, 2, and 9, and UTF-8). Use zsh for best results for inputting anything beyond ASCII. To see file names in their proper script, use ls -v. The Unix X Window GUI is included with OS X (as an optional install on the 3rd CD), and in principle this can be internationalized by modifying various parameter files. Open Office is a suite of programs designed to run in X Window which should eventually have multilingual capabilities. For info on this, you can consult the OO OS X testing forum at OOoDocs and Apple's X11-user mailing list.
Finally, OS X, via the program Virtual PC for Mac, allows you to run Windows XP Home Edition (and other Windows OSs), which have extensive language capabilities of their own.
The Mail program included with OS X is fully Unicode-savvy and automatically searches for glyphs in installed fonts for whatever encoding is indicated on the incoming text. The user can change the encoding for received messages from the Message/Text Encodings menu, and these can also be selected for outgoing messages. The range of encodings you have to choose from in Mail depends on the languages you have on the list in System Preferences/International/Languages, which you can change using the Edit button. One shortcoming is that Mail cannot set the default encoding for incoming messages, which is tedious if you get a lot of mail with the wrong charset specified. The default encoding for outgoing messages in Mail is sensitive to the order of languages in System Preferences/International/Languages, especially for Russian, Greek, Chinese, Japanese, and Korean. Before sending email in these you should test it with a message to yourself to see whether the default encoding is what your recipients will expect, and set it manually or adjust the preferences if necessary.
A Unicode-savvy mail client similar to Mail is GyazMail. To activate the outgoing encoding choices, you must go to View/Customize Toolbar when in "new message" mode and add the Encoding selector to the toolbar. GyazMail reportedly works better than Mail when communicating with cellphones in Japan. GNUMail.app has similar capabilities. Another alternative is to use the Mail programs included in Mozilla or Netscape 7 for OS X. The Entourage mail program that comes with MS Office is also Unicode-savvy.
For email in OS 9, the situation is complex because of the more limited language capabilities in this environment. One technique worth trying is to employ the mail client contained in Netscape or Mozilla, since these are programs which lots of people have, and the encoding for both sending and receiving can be set via the View/Character Set menu item. Outlook Express has similar facilities, with the encoding set in the Format/Character Set menu.
The mail client Eudora requires the addition of special "tables" to function properly with many languages. When these are installed you can choose character sets via the Message/Change/Transliteration menu. John Delacour offers means for Eudora to do Unicode UTF-8 here.
The email program Magellan is especially designed to handle multilingual text, as is PowerMail.
Many OS 9 mail programs have OS X versions, but without Unicode capability.
When doing Webmail, you are at the mercy of the behavior of the particular browser and web site being used when it comes to faithful transmission of non-English mail text. It is best to explore the settings for the site to see if anything special exists for unusual scripts, and set the encoding of the browser as best you can before composing or reading. Trial and error may be required to get it right, and sending yourself a test message is a good idea. .Mac webmail essentially does only Roman and Japanese script, and has a button at the bottom of the page for choosing which one. For the best multilingual email experience, use one of the standard mail programs rather than webmail.
Traditionally computer systems could deal with only a limited number of distinct characters at once. Handling diverse languages meant remapping the same 256 codes to different characters for each one, using a font specifically designed for it. Successful communication over the internet sometimes required synchronizing the fonts at each end and translating among a couple dozen mutually incompatible character set standards, a list of which you can find in the "character encoding menu" of any browser or email program.
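The remapping problem is easy to demonstrate: the same single byte stands for a different character in each legacy character set. The sketch below uses four of Python's built-in codecs as stand-ins for the older standards; the choice of byte and codecs is purely illustrative.

```python
# One byte value, interpreted under four legacy 8-bit character sets.
BYTE = b"\xe0"
CODECS = ("latin-1", "mac_roman", "cp1251", "iso8859_7")

decoded = {codec: BYTE.decode(codec) for codec in CODECS}

for codec, char in decoded.items():
    # Show the character together with the Unicode number it maps to.
    print(f"{codec:10} {char}  U+{ord(char):04X}")
```

A receiver who guesses the wrong character set sees the wrong character, which is why both ends had to agree on the encoding and use a matching font.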
The development of Unicode, which is the agreed international standard for the unique encoding of all the characters used in different languages, changes this situation radically for the better. By creating a single character set that covers all scripts, Unicode allows the reading and writing of texts in any language, or the simultaneous display of many languages, without changing encodings and fonts. It should eventually become the common basis for text processing across all platforms and programs. A recent New York Times interview provides some useful general info.
The basic principle of Unicode is to assign a unique number (usually expressed in hexadecimal form) to every character. 1.1 million "codepoints" have been allocated for this purpose, divided among 17 "planes" with about 65,000 characters each. All characters in common use have been assigned to Plane 0, also known as the Basic Multilingual Plane (BMP), and some others have been placed into Planes 1, 2, and 14, as part of an ongoing process. Under the current version, Unicode 4.0, just over 96,000 characters have been allocated (plus 136,000 codepoints reserved for private use), and another 90 or so scripts are in the pipeline under consideration by various committees. For further information see the Roadmap to Unicode and Michael Everson's Paper Leaks in the Unicode Pipeline.
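The arithmetic behind these figures can be checked directly. In this small sketch the constant and function names are illustrative only: each plane holds 65,536 (hexadecimal 10000) codepoints, 17 planes give the "1.1 million" total, and the plane of a character is its code number divided by the plane size.

```python
PLANE_SIZE = 0x10000   # 65,536 codepoints per plane
PLANE_COUNT = 17       # planes 0 through 16

def plane_of(code_point):
    """Return the Unicode plane number (0-16) that contains code_point."""
    if not 0 <= code_point < PLANE_SIZE * PLANE_COUNT:
        raise ValueError("outside the Unicode codespace")
    return code_point // PLANE_SIZE

print(PLANE_SIZE * PLANE_COUNT)  # 1114112 codepoints in total

# A BMP character, a Plane 1 (Gothic) character, and the start of Plane 2.
for cp in (0x99AC, 0x10330, 0x20000):
    print(f"U+{cp:04X} is in plane {plane_of(cp)}")
```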
In practice Unicode data is represented by one of several possible "transformation formats," or UTFs. There are two common ones, UTF-16 and UTF-8. However, only UTF-8 is normally used over the internet. Unfortunately some Mac programs use the word Unicode in their encoding menus to mean UTF-16, so users need to watch out for this and specifically select UTF-8 when dealing with Unicode web pages and email. (Email also often has an additional "content transfer encoding," either "base64" or "quoted-printable," which is not related to language or character set issues.) Here is a summary of some UTF details.
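The difference between the two formats shows up as soon as a character is turned into bytes. The sketch below uses Python's built-in codecs; "utf-16-be" is the big-endian form of UTF-16 without a byte order mark, chosen here only to keep the output short.

```python
def show_bytes(text):
    """Return the UTF-8 and UTF-16 (big-endian) byte sequences as hex strings."""
    return text.encode("utf-8").hex(), text.encode("utf-16-be").hex()

# An ASCII letter and the Chinese character for "horse" (U+99AC).
for sample in ("A", "馬"):
    utf8_hex, utf16_hex = show_bytes(sample)
    print(f"{sample}: UTF-8 = {utf8_hex}, UTF-16 = {utf16_hex}")
```

The same character yields different bytes in each format, so a program that reads UTF-8 data as UTF-16 (or the reverse) will display garbage even though both are "Unicode."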
Mac OS 9, actually beginning with OS 8.5, includes some limited support for Unicode. For example, the most advanced OS 9 browsers like Mozilla 1.2 can read UTF-8 web pages with Chinese, Japanese, Korean, Cyrillic, Greek, Arabic, Hebrew, and Devanagari, plus some languages using accented Roman if fonts such as Everson Mono Unicode and Gentium are installed. The MacBrowsers Page provides more info.
On the input side, if the Unicode language kit is installed, you can see two new items at the bottom of the Keyboard (flag) menu: Unicode Hex Input and Extended Roman (U). There are currently two (experimental) text editors available online which can make use of this OS 9 input support:
Mozilla 1.2 Composer can be used in a similar way.
Also OS 9.1/9.2 includes a Unicode text editor called World Text, which works only on those systems. It is basically a Unicode-savvy version of SimpleText, but can work with files larger than 32K and also embed pictures, sound, and movies. A carbonized version of this is available in OS X. This page has more info.
To use the Unicode Hex Input system, you hold down the Option key and (for a character in the BMP) enter the 4-digit Unicode hex code. For example, 99AC gives the Chinese character for "horse." Any Mac or Windows TrueType Unicode font containing the characters you want to generate can be used.
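The mapping that Unicode Hex Input performs for BMP characters can be sketched as follows. The function names are hypothetical and the code only illustrates the number-to-character relationship, not the input method itself.

```python
def from_hex_input(hex_digits):
    """Return the character for a 4-digit hexadecimal Unicode code."""
    if len(hex_digits) != 4:
        raise ValueError("expected exactly 4 hex digits")
    return chr(int(hex_digits, 16))

def to_hex_input(char):
    """Return the 4-digit hex code to type for a single BMP character."""
    code_point = ord(char)
    if code_point > 0xFFFF:
        raise ValueError("character is outside the BMP")
    return format(code_point, "04X")

print(from_hex_input("99AC"))  # the Chinese character for "horse"
print(to_hex_input("馬"))
```

The second function is handy for finding out what to type when you already have the character, for instance copied from a web page.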
The US Extended input system (called Extended Roman in 10.1 and earlier) lets you access a more limited set of characters, namely Unicode Extended Latin-A, via various key sequences. You can see how this works with the KeyCaps utility or on this page. Also available is a non-Apple Latin Extended Keyboard.
For a guide to Unicode fonts (produced essentially for Windows) of various sizes and capabilities, see Alan Wood's Font Page.
Mac OS X, covered in the section above, has a much broader level of Unicode support than OS 9. Under OS X 10.1 and higher, with appropriate fonts installed, TextEdit can read characters in Unicode Planes 1 through 16, in addition to the usual Plane 0. The Unicode Hex Input system can also type characters from Planes 1 and above if you know the pair of 4-digit Hex "surrogates" which represent them (just input the two sequences in succession). The same range of characters can be copy/pasted from the Character Palette. Custom keyboards, based on XML text files, can be created to access and input any desired set of Unicode characters.
OmniWeb 4.2 and higher, Opera 6, Safari, and Mozilla can display characters beyond the BMP in UTF-8. Both OmniWeb and Opera 6 can read such characters (assuming the font for them is installed) if a web page is encoded in UTF-16. An example is at Tex Texin's Unicode Examples Page.
One way to find the surrogate pairs for a given character code (or the character represented by a pair of surrogates) is to use Michael Kaplan's UTF-32 to 16 Translator.
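The surrogate pair for a character beyond the BMP can also be worked out by hand using the standard UTF-16 arithmetic. The Python sketch below shows that calculation in both directions; the function names are our own and the code is not related to the translator mentioned above.

```python
def to_surrogates(code_point: int):
    """Split a code point from Planes 1-16 into its two 4-digit hex surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points beyond the BMP have surrogate pairs")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return format(high, "04X"), format(low, "04X")


def from_surrogates(high_hex: str, low_hex: str) -> int:
    """Combine two hex surrogates back into the original code point."""
    high = int(high_hex, 16)
    low = int(low_hex, 16)
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# U+10400 is the first character of the Deseret block in Plane 1.
print(to_surrogates(0x10400))
```

The two values returned are the sequences you would enter one after the other with the Unicode Hex Input keyboard.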
A beta test font for Plane 1 and some other areas (planes 0 and 15) is Code2001, which contains characters for Old Persian Cuneiform, Deseret, Tengwar, Cirth, Old Italic, Gothic, Aegean Numbers, Cypriot Syllabary, Pollard Script, and Ugaritic.
The Hiragino Japanese font in OS X includes a small number of characters in Unicode Plane 2 (including about 300 from JIS X 0213), which can be accessed via the JLK character palette or the Unicode Hex Input keyboard (and some are in the phonetic input dictionary as well). A much more complete font for Plane 2 is Simsun (Founder Extended), which comes with MS Office XP and includes 37,000 Chinese Mainland, HKSAR, and Taiwan characters (in addition to another 28,000 Chinese and many western language characters from Plane 0). Info on its contents can be found here.
To see what Unicode characters are available on your system, a good utility is UnicodeChecker. It covers all 17 Unicode planes, can be searched by character block or name, and characters can be copy/pasted into TextEdit.
The sort order of filenames in OS X is based on a Unicode system. The full list can be found here, and the Apple modifications are here and here.
Numbers come before Latin and Greek comes after.
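A plain sort by Unicode code point reproduces this coarse ordering, as the Python sketch below suggests. This is only an approximation offered for illustration: the real Finder order follows the tables referred to above and differs in its details, and the file names are invented.

```python
# Invented file names: one starting with a digit, one Latin, one Greek.
names = [
    "beta.txt",
    "\u03b1\u03bb\u03c6\u03b1.txt",  # a Greek name ("alpha" in Greek letters)
    "2003 notes.txt",
]

# Python compares strings code point by code point, so digits (U+0030...)
# precede Latin letters (U+0041...), which precede Greek letters (U+0370...).
ordered = sorted(names)
print([name.encode("unicode_escape").decode("ascii") for name in ordered])
```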
In OS X, Symbol and Zapf Dingbat characters are also produced using Unicode fonts, so that special keyboards (10.1) or the Character Palette (10.2) need to be activated in order to type them (you cannot just select the font as was possible in OS 9). This is explained in TIL 106731. If you need Wingdings-like symbols, use the Webdings font and look in the Unicode Private Use range in the Character Palette.
For codepoints in the Unicode Private Use Area (PUA) used by Apple, see this page.
The ability of applications to use OS X's excellent Unicode support varies widely. Cocoa programs like TextEdit are normally "Unicode-savvy": They can accept Unicode input via keyboard or copy/paste, and save and open Unicode text. Carbon programs are usually only "Unicode-aware" and lack key features. For example, Word X does not accept direct Unicode input, but it can save text as UTF-16 and HTML as UTF-8 (although it can only open the latter). Some Carbon programs, including AppleWorks, and almost all Classic programs, are "Unicode-deaf" and can neither input, save, nor open Unicode text.
The Language Display Capabilities of iTunes should be the same as those of OS X or WinXP/2000, that is to say just about any language for which you can find a font. But the iPod is more limited and its capabilities can differ by model. For the most recent iPods (as of 2/2005), the technical specs say that Menu Languages are Danish, Dutch, English, Finnish, French, German, Italian, Japanese, Korean, Norwegian, Simplified Chinese, Spanish, Swedish and Traditional Chinese. Additional language support for display of song, album and artist information includes Bulgarian, Croatian, Czech, Greek, Hungarian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Turkish and Ukrainian.
A non-Apple Russian localization for the iPod can be found here.
Out of the box the iPod cannot display song names in Hebrew, Arabic, Thai, Hindi, or other languages not on the lists above, and some earlier iPods may not even do all of those. There are no downloads to fix this, and it is not known when Apple will add support for these and other missing languages. A non-Apple hack for Thai can be found here and for Hebrew here.
Correct display of the language in song titles in both iTunes and on the iPod depends on the language being properly encoded and identified in the ID3 tags of the song. If it isn't working right, you can try to fix the tags. These docs give some info on doing this:
A program that may be useful for fixing certain tags is Unicode Rewriter.
If fixing the tags doesn't do the trick, the only alternative is to type titles in manually. If iTunes crashes when you edit ID3 tags, try removing any Visual plug-ins.
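To give an idea of what goes wrong with a badly identified tag, the Python sketch below assumes one common cause: the title was stored in a legacy encoding (Windows Cyrillic, cp1251, is used here purely as an example) but is read back as if it were Latin-1. The function names are hypothetical, and the sketch works on plain strings only; it does not read or write ID3 tags in real files.

```python
def misread_as_latin1(title: str, real_encoding: str = "cp1251") -> str:
    """Show what is displayed when the stored bytes are treated as Latin-1."""
    return title.encode(real_encoding).decode("latin-1")


def repair_title(garbled: str, real_encoding: str = "cp1251") -> str:
    """Undo the misreading: recover the raw bytes, then decode them properly."""
    return garbled.encode("latin-1").decode(real_encoding)


original = "\u041a\u0438\u043d\u043e"   # a short Russian title, for illustration
garbled = misread_as_latin1(original)   # accented Latin letters instead of Cyrillic
restored = repair_title(garbled)        # the Cyrillic text again

assert restored == original
```

If the real encoding is guessed wrongly the result will still be garbage, so some trial and error may be needed before retyping titles by hand.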
WinXP/2000, unlike OS X, does not have all languages enabled by default. Instructions for enabling Asian languages in Windows can be found here and here.
If an iPod has its menus in the wrong language, you can change them to another one by going to Main Menu > Settings > Language or by doing a Reset. Doing a Restore may also work, but it will erase the contents of the iPod.
iTunes localizations are currently (version 4.7) available in Chinese, Danish, Dutch, Finnish, French, German, Korean, Italian, Japanese, Norwegian, Spanish, and Swedish. They can be downloaded individually using the links at the bottom of this page.
You cannot buy music from the iTunes store in another country. Because of licensing agreements you can only buy music from the store which services the billing address for your credit card. As of 2/05, iTunes stores are only available for the US, Canada, UK, Ireland, France, Germany, Austria, Belgium, Finland, Greece, Luxembourg, the Netherlands, Portugal, and Spain. If you use a multilingual store (like Canada or Belgium) and are getting it in the wrong language, you may need to do the following: While your system is set to your preferred language, use the button at the bottom of the main store page to change to a store in another country, then change back to your own country.
Correct display of languages in iPod Notes depends on the encoding. Languages other than those of Western Europe and Chinese/Japanese/Korean may require UTF-8 encoding and a special header on the notes files. For detailed info see the Encoding section of the Notes Reader Guide. Correct display of vCards may require UTF-8 or UTF-16 encoding.
For a language translation program for the iPod, check iLingo.
OS 9 can handle input for a considerable number of languages
out of the box: Danish, Dutch, Finnish, French, German, Italian,
Norwegian, Portuguese, Spanish, Swedish, Czech, Hungarian, Polish, Slovak,
Russian, Ukrainian, Bulgarian, Hebrew, Arabic, Persian, Hindi, Nepali, Gujarati,
Punjabi, Chinese, Korean, and Japanese.
By default, multilingual input capabilities are invisible to the Mac OS 9 user. To turn
them on, you need to go to the Keyboard control panel and select at least
one keyboard layout in addition to the default. Your menu bar will then include a new
flag icon at the far right. Clicking on this menu will show whichever languages or alternative keyboards
you have checked, and selecting one will activate it.
The Keyboard control panel should allow you to choose Danish, Dutch, Finnish, French, German, Italian,
Norwegian, Portuguese, Spanish, and Swedish. For the others, you will need to install their Language Kits. To do this you should get out
your system disk and do a Custom Install for the kits of interest.
Afterwards also look in the CD Extras file on the system disk, where the
Language Kits CD Extras folder has some additional fonts and keyboards.
A nice tutorial on installing the kits is at NISUS Software.
If your system does not appear to offer a Custom Install with language kits, look on the CD in the folder OS9 Applications/Apple Extras or in the folder Software Installers/Language Kits for an installer. Also look for a folder OS9 Applications/Apple Extras on your hard drive if you have used Restore disks to install your OS9. Some of the latest versions of OS 9 seem to be missing Persian and Unicode Hex input, and Punjabi is called Gurmukhi. The Persian and Unicode folders from standard 9.2.1 can be found here.
After you install a language kit, except for Chinese/Japanese/Korean, you need to go back to the Keyboard Control Panel and check that the keyboards you want to use are activated. Click on the "Script" menu to choose the script you are interested in to make its keyboards visible for selection. If you are naming files in another language, you may want to go to the Fonts tab of the Appearances Control Panel and choose an appropriate font.
The keyboard shortcut for switching from one "script" to another is Command+Space. The scripts are Roman, Central European, Cyrillic, Arabic, Hebrew, Devanagari, Gurmukhi, and Gujarati. By going to the Keyboard Control Panel and poking the Options button, you can activate another shortcut, Command+Option+Space, which switches from one language to another within a given script. Note that if the same keyboard shortcut is used for an application, the script/language switching function will take precedence, and the first of these cannot be turned off. The only workaround is to deactivate the other scripts or to always type a space before invoking Command+Space (a common shortcut for "zoom") in the application.
Software for working in many other languages not included in the standard US OS 9 is available on the Internet and commercially. See the "Other Resources" section at the end of this page for some suggestions about where to look.
Prior to OS 9, working in a foreign language usually required installing separate software packages. Apple itself at one time sold individual Cyrillic, Arabic, Indian, Hebrew, Chinese, Japanese, and Korean language kits, and these can sometimes be found on eBay if you need one for an old system. Some more limited multilingual access was included starting
with OS 8.5. An explanation can be found at
See Hermessoft and Evertype for add-ons that do a number of unusual languages in older systems.
Note that standard US OS 9 and earlier comes with only one "system language," in which all the standard menus and dialogues are written. "Localized" versions, with the system in a language other than English, are sometimes available in other countries. Those produced were French, German, Spanish, Italian, Japanese, Dutch, Danish, Portuguese, Chinese, Korean, Norwegian, Swedish, and Finnish. If you want one of these on your machine you will normally have to buy it separately and install it in place of the existing system (or on a separate partition). However, the install disks provided with the Mac Mini, introduced in January 2005, reportedly have all OS 9/Classic localizations included.
Whether a particular application has a localized version for a particular language depends on whether its authors have provided it. You may need to use the Language Register program in Applications/Utilities to enable all the features of a localized application. If having a non-Roman system font is critical for your purpose, a hack to allow this is found here.
Setting up your browser to read foreign language web sites usually requires having the proper
font installed for the language in question and setting some browser parameters.
In particular, you need to go into Preferences/Fonts and make sure the right
font is selected under the right language. You may also need to go to the
Character Set item under the View menu and set it for the right language.
Sometimes experimentation is necessary because there is more than one choice
for a language. The "user defined" option can be useful where
your language/font does not fit one of the other categories.
For a nice explanation of how to do this, see Alan Wood's page on Setting
Up Mac Browsers for Multilingual Support at
For best results with foreign languages, you should use the latest browsers, such as Netscape 7 or Mozilla 1.2, rather than the versions of IE and Netscape that come installed with OS 9.
If you want to write in a foreign language, the procedure is usually pretty
simple. You first need to open a Mac text editor or word processor that
is able to handle other scripts (known as "WorldScript-savvy").
SimpleText, WorldText, AppleWorks 5, Nisus Writer, Word Perfect 3.5, BBEdit 6.0, Mariner Write 2.0,
and Word 2001 are examples. For scripts that run from right to left, like
Arabic and Hebrew, we have heard that Nisus
Writer and Mariner Write give
the best results.
Then you go to the Flag (keyboard) menu and select the language. To see
how the keys are mapped, go to Keycaps in the Apple menu. A small keyboard
will appear on the screen with the foreign letters in place of the usual
ones (you may have to adjust the font in the Keycaps Font menu to get this
right). You can type directly into the document from the real keyboard or
type on the screen keyboard and copy/paste the result.
You might want to print out a copy of Keycaps for the language you are using.
You can try using the Mac's built-in screen capture function (Command-Shift-3
or Command-Shift-4), or other third-party capture utilities you may have,
but sometimes these make the keys go back to normal and won't work. One
program I found that seems to do the trick is Gif-gIf-giF.
When you can't find a keyboard with specific letters that you need, all is not lost. Many fonts contain characters beyond those which the keyboard can access or which require obscure key sequences. For example, Mac Central European fonts have macron vowels which are otherwise hard to find. Useful utilities for locating and typing these are PopChar and FontBuddy.
For the more unusual scripts that use keyboards you can download helpful
manuals from the Apple web site: Cyrillic (Part 030-7977), Indian (U96600-025),
Arabic (030-7912), and Hebrew (030-7978). For a full list, see.
What about Chinese, Japanese, and Korean, which may require the use of thousands of ideographs? For these the Mac kits
include special input methods, operated via an additional "pencil" menu located next to the Keyboard menu. Each language has several options for generating final text.
Finding English-language
documentation on how these methods function requires some extra work, since the manuals
are not provided on the OS 9 CD. "Help" in Japanese and Chinese is, however,
available from the on-screen Help Center, and in Korean via the "pencil" menu.
Fortunately, manuals for Chinese input are available at the Apple site (Parts 034-0602 and 030-4900). And
a good explanation has also been put into the Chinese-Mac
FAQ:
The Mac's Traditional Chinese input system covers about 13,000 characters and gives you the choice of three modes using strokes/radicals (Cangjie, Jianjie, and Dayi), two using phonetics (Pinyin and Zhuyin/Bopomofo), plus Big5 hex codes. For Simplified Chinese, covering about 6,700 characters, there are two modes using strokes/radicals (Wubi Xing and Wubi Hua) and one using phonetics (ABC/Pinyin), plus GB numeric codes (Quwei). In both cases, hitting the space bar after input generates a list of possible characters for selection.
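For readers curious about the numeric codes mentioned here, the Python sketch below derives a Big5 hex code and a GB Quwei code from a character using Python's own "big5" and "gb2312" codecs. The function names are ours, and we are assuming that the codes the Mac input methods expect agree with these standard tables, which we have not checked against Apple's documentation.

```python
def big5_hex(ch: str) -> str:
    """Return the Big5 code of a Traditional Chinese character as hex digits."""
    return ch.encode("big5").hex().upper()


def quwei(ch: str) -> str:
    """Return the 4-digit GB 2312 Quwei (row and cell) code of a character."""
    raw = ch.encode("gb2312")
    if len(raw) != 2:
        raise ValueError("not a double-byte GB 2312 character")
    # Each GB 2312 byte is the row or cell number plus 0xA0.
    row, cell = raw[0] - 0xA0, raw[1] - 0xA0
    return f"{row:02d}{cell:02d}"


# U+554A is the first Chinese character in the GB 2312 table (row 16, cell 01).
first_hanzi_code = quwei("\u554a")
print(first_hanzi_code)
```

Characters that are missing from one of these tables raise a UnicodeEncodeError, which is a reminder that each input system only covers its own character set.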
Many users find Apple's Traditional "pinyin" input option to be too primitive for their needs, as it cannot parse sentences or phrases, but requires each character to be chosen separately. Hanin and BoPoMoFo are non-Apple input methods which are considerably "smarter" in this regard. Info on Hanin can be found on the Chinese-Mac page. Another system is Cihui.
For Korean the manual can also be obtained from Apple (Part U95602-004). The Mac's "Power Input Method" provides the user with Jamo, Romaja, and (Japanese) Kana phonetic keyboards to create Hangul characters. You can also transform Hangul and Romaja into Hanja (Korean Chinese characters) using a 5,000 character dictionary: Typing option-return or control-return after your input will generate a list of Hanja for selection.
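How individual Jamo combine into a single Hangul syllable can be illustrated with the composition rule defined by the Unicode standard. The Python sketch below applies that rule; it is not a description of how the Power Input Method works internally, and the function name and index arguments are our own.

```python
import unicodedata


def compose_syllable(lead_index: int, vowel_index: int, tail_index: int = 0) -> str:
    """Build a Hangul syllable from Jamo indices (19 leads, 21 vowels, 28 tails)."""
    if not (0 <= lead_index < 19 and 0 <= vowel_index < 21 and 0 <= tail_index < 28):
        raise ValueError("Jamo index out of range")
    return chr(0xAC00 + (lead_index * 21 + vowel_index) * 28 + tail_index)


# Lead HIEUH (index 18), vowel A (index 0), tail NIEUN (index 4).
by_formula = compose_syllable(18, 0, 4)

# The same result from Unicode normalization of the three conjoining Jamo.
by_normalization = unicodedata.normalize("NFC", "\u1112\u1161\u11ab")

assert by_formula == by_normalization
```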
The OS 9.1/9.2 Korean Language Kit fixes some font problems with 9.04's Hanja converter and adds a second input method for Korean, called "Hangul Direct," but without any English menu or explanation regarding how it works. A rough translation of the "pencil" menu is at:
During May, 2001 updated fonts for Korean and Traditional Chinese were made available for System 9.1 only (also via the Software Update control panel).
For Japanese the manual is also available online (Part 030-4174, 22MB). This is particularly important because (unlike for Chinese and Korean) the "pencil" menu cannot be switched to English. For a rough translation of the key menus see:
The Mac's "Kotoeri" input method offers a choice of Romaji, Hiragana, and Katakana phonetic keyboards. Romaji can be transformed automatically into either Hiragana or Katakana, and hitting the space bar twice will convert Hiragana to a list of possible Kanji (Japanese Chinese characters). You can also key in Kanji directly via a dictionary that lets you search for about 6000 characters by radical or by any of four numerical codes. For info on other input systems favored by intensive Japanese users, ATOK and EG Word, see the Other Resources section at the end of this page.
A fourth language, Classical Vietnamese, uses a subgroup of about 3000 Traditional Chinese characters (Chu Han) or a different, unique character set derived from these (Chu Nom -- about 2000 glyphs). For Nom one must download a special font (look under "Vietnamese" in the "Other Resources" section) and use it with the Japanese Language Kit.
For info on the (limited) Unicode capabilities of OS 9, see the section devoted to this topic above.
1. OS X - Can't Find KeyCaps Utility: This is now the Keyboard Viewer Palette and is activated in the System Prefs/International/Input Menu pane and then selected from the "Flag" menu. If you obtain KeyCaps from an older version of OS X it should also function.
2. OS X - Can't Find Pencil Menus For Asian Language Input Kits: These are now at the bottom of the "Flag" menu.
3. OS X - Cherokee and Inuktitut Keyboards Don't Seem To Work: You have to activate Caps Lock for these keyboards to generate the expected non-Roman characters.
4. OS X - Kotoeri Preferences and Word Register Don't Appear: Install the Japanese system localization files from the 2nd install CD.
5a. OS X - Can't Input Japanese with Dvorak, AZERTY, etc: Select the keyboard you want to use in the Kotoeri Preferences (at bottom of "flag" menu), first tab, last item.
6a. OS X - Can't Input Certain Accented Characters with US Extended and Finnish Extended Keyboards: This is a Cocoa bug. The workaround is to input the characters directly with the Unicode Hex keyboard, or to type them in the KeyCaps from Jaguar or in the name of a new folder in the Finder and copy/paste them to the Cocoa app. Fixed in 10.3.
6b. OS X - Can't Type S-Comma with Romanian Keyboard: The S-Comma only works with ISO hardware keyboards sold in Europe. Download an alternative Romanian keyboard here.
7. OS X - In Mail I Can't Find the Encoding I Need: Go to System Preferences/International/Languages and add the language for which the encoding is used to the list using the Edit button.
8. OS X - System Language is set to Y, but Folder Names Are Still In English: Try going to Finder Preferences/Advanced and unchecking the box "Show All File Extensions." If not enough folders change to what you want, try opening Terminal, make sure you are in the directory containing the folder in question, and for any non-localized folder type "touch Nameoffolder/.localized".
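If you have several folders to mark, the same step as the "touch" command can be scripted. The Python sketch below creates the empty ".localized" file inside a folder; the function name is hypothetical, and the example path is only a placeholder to be replaced with a folder of your own.

```python
from pathlib import Path


def mark_localized(folder) -> Path:
    """Create an empty .localized file inside the folder, like the touch command."""
    folder_path = Path(folder)
    if not folder_path.is_dir():
        raise NotADirectoryError(str(folder_path))
    marker = folder_path / ".localized"
    marker.touch(exist_ok=True)  # does nothing harmful if the file already exists
    return marker


# Example call (placeholder path, adjust before running):
# mark_localized(Path.home() / "Nameoffolder")
```

As with the Terminal command, the file itself stays empty; the sketch makes no claim about which folders the Finder will actually display under a localized name.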
9. OS X - Character Palette Keeps Popping Up When I Don't Want It: See this article. (Trash the com.apple.HIToolbox .plist in Users/username/Library/Preferences/ByHost/)
10. OS X - Keyboard Keeps Switching Every Time I Change Apps: Try going to the Options button in the Input Menu pane and unchecking "match keyboard with text."
11. OS X - I Use ATOK For Japanese But Kotoeri Won't Go Away: See this article.
12. OS X - Kana/Kanji Conversion Has Stopped Working in Japanese IM: See this article.
13a. OS X - Mozilla/Netscape Can't Read Thai: Probably a Mozilla bug. Easiest fix is to remove the Thonburi font. If you must have it, replace the Panther version with the one from Jaguar.
13b. OS X - Safari 1.2 Suddenly Won't Display Arabic Correctly: Remove the fonts Arial and Times New Roman from Users/username/Library/Fonts (installed by Office 2004). Or create a custom style sheet arabic.css with the text "* { font-family: "Geeza Pro" !important }"
14. OS X - Keyboard Choice in Preferences Won't Stick: Try disabling automatic login, then logging in as Root and resetting your keyboard there. Also try cleaning your System and User caches with Cocktail or Panther CacheCleaner. Also try trashing com.apple.HIToolbox.plist in both system and user preferences.
15. OS X - Can't Type Accented Characters Like I Always Did in Windows: Macs use the Option key to access various accent dead keys. For a simple keyboard that works like the Windows US International, try USIntl.rsrc available here. (Use a Mac to access this link.)
16. OS X - Display Is Gibberish, Especially In Safari And Mail: Search your system for the font Helvetica Fractions and try removing this.
17. OS X - My Username Is In Asian Characters Instead of What It Should Be: Try adding a new user so that you have an odd number of them, or change the names of all your users so they begin with different letters.
19. OS X - Character, Kana, Keyboard View, or Input Mode Palette Misbehaving: Try trashing the file Users/username/Library/Preferences/com.apple.X.plist belonging to the application causing trouble.
20. OS X - Can't Copy/Paste Language X from Browser Y Into Program Z: Try using Mozilla instead of browser Y. Try using TextEdit instead of Program Z. Try pasting into TextEdit, changing to Rich Text, and then copy/pasting into Program Z.
21. OS X - Some Keyboards Missing: Install the Additional Fonts Package from the 3rd Install CD.
22. OS X - Can't Type Cyrillic or Central European in Program X: Make sure you install the "additional fonts" in the 3rd install CD. Some programs require that you select fonts which end in CY or CE for correct input in these scripts.
23. OS X - Can't Read Webpages in X Language: Install Additional Fonts Packages from 3rd Install CD, and use a good multilingual browser like Mozilla, Opera, or Safari (not IE or iCab).
24. OS X - I Get an Accented E When I Type ?: Somehow the Canadian keyboard has been activated. Go to System Prefs/International/Input Menu and change the selection to U.S.
25. Classic - Can't Install Extra Keyboard in Classic System File on Machine That Doesn't Boot Into OS 9: See these instructions. Also you can try copying the Classic System file and moving the copy to a machine that does boot into OS 9, installing the extra keyboard there, moving it back to the original machine, and replacing the original with the modified version.
26. OS 9 - Can't Find Language Kits: Look on the CD in the folder OS9 Applications/Apple Extras for an installer. Also look for a folder with the same name on your hard drive if you have used Restore disks to install your OS9.
27. OS 9 - Need Non-Roman Fonts in System to Have Certain Menus and File Names Show Up Correctly: Information on how to hack your system to use non-roman fonts can be found here. (Note that this does not translate system dialogues, etc. into another language.)
28. General - My Foreign Language Web Page is Fine Locally But Garbage on the Server: Make sure the FTP or other program you use to upload the page is set to allow double-byte text. If using Fetch, make sure you uncheck the item "Translate ISO Characters" in Customize > Preferences > Misc.