Polyglot Emacs 20.4

A look at multilingual Emacs.

Ken'ichi Handa's Mule (multilingual Emacs) first appeared at the end of 1992. After almost five years, the Mule enhancements were included with GNU Emacs 20.x. For those of us who have yearned for multi-script capability since our first encounter with a computer (more than twenty years ago), it has been a long and often frustrating wait. We are now at GNU Emacs version 20.4 and things are finally beginning to look interesting to people who wish to work with multiple scripts on Linux. I am using Emacs for translation, exegesis and preparing reference material in multiple scripts.

Who wants to install a special Japanese Linux just to be able to read a Japanese source file for a translation job or read and write Japanese e-mail? I want to be able to include Chinese bibliographies, text and notes in the papers I write. I would also like to be able to include Tibetan or Greek scripts for philosophical or technical terms, along with their transliteration into Latin script. When I am discussing the structure of Chinese characters, I want to be able to make comparisons with Egyptian hieroglyphs. I want my quotations of French and German to look like French and German. I want to be able to publish all this on web pages as well as print it on my PostScript printer. Some high-priced programs are coming out that address these issues, but Emacs 20.4 is here now. It runs on the best generally available operating system in the world, GNU/Linux, and it is free.

Figure 1

As an example of using Emacs to prepare multi-script reference material, let's look at the Buddhist numerical lists I am currently working on. Without much difficulty, I can write this list with a pen on a piece of paper. See Figure 1 for a bitmap of a page from my handwritten list. (Apologies for my poor calligraphy.) To get these scripts into a computer text file requires an input method specific to each script. Fortunately, Emacs comes with Quail, which has a method to input each script in this example and many more.

To invoke Quail for Devanagari, use the Mule menu or type:

ctrl-x return ctrl-\ devanagari-transliteration

Three other choices are available for Devanagari. For Tibetan, you will want tibetan-wylie. For Chinese, more than twenty methods are available. I use chinese-py-b5 for traditional characters and chinese-tonepy for the simplified characters. The Quail Japanese input method is adequate for short strings, but not for extensive input. This is one place where I feel Linux needs a free input method editor that equals or surpasses Microsoft's free Japanese IME for Windows. Wnn 4.2, the last free version of Wnn, worked well. I have used it in the past with Mule, but so far I have not been able to get it to work with Emacs.
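If you settle on one input method for most of your work, the choice can be recorded in your init file. What follows is only a sketch, assuming Leim is installed so that the named methods exist. The command name my-tibetan-input is hypothetical and chosen purely for illustration, as is the choice of default.

```elisp
;; Sketch for ~/.emacs, assuming Leim is installed.

;; Name a default input method.  With this set, ctrl-\
;; (toggle-input-method) can switch it on and off in a buffer
;; without asking for a method name first.
(setq default-input-method "chinese-py-b5")

;; Hypothetical convenience command: switch the current buffer
;; straight to the Tibetan Wylie method.
(defun my-tibetan-input ()
  "Switch the current buffer to the tibetan-wylie input method."
  (interactive)
  (set-input-method "tibetan-wylie"))
```

With this in place, meta-x my-tibetan-input return selects Tibetan input for the buffer you are in.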

Soon, I hope to add the Korean, Thai, Lao and Vietnamese equivalents of each term in each list. All these are supported by Emacs. Finally, I hope to add the Mongolian script, which is not yet available in Emacs.

As you start to use several different input methods, you may soon find the command to invoke them, ctrl-x return ctrl-\, cumbersome. That key sequence runs the command set-input-method, which I rebind to f3 with:

meta-x global-set-key return f3 set-input-method return

While you're at it, you might as well bind the command universal-coding-system-argument to something handy—I use f2.
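A binding made with meta-x global-set-key lasts only for the current session. To keep both bindings, they can be placed in an init file. A minimal sketch, assuming ~/.emacs is your init file and that f2 and f3 are not already bound to something you need:

```elisp
;; Sketch for ~/.emacs: make the two rebindings permanent.

;; f3 prompts for an input method, like ctrl-x return ctrl-\.
(global-set-key [f3] 'set-input-method)

;; f2 prompts for a coding system to apply to the next command.
(global-set-key [f2] 'universal-coding-system-argument)
```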

universal-coding-system-argument is the command that lets you specify which coding system Emacs should use when it executes your next command. If you do much multi-script work, that next command will probably be ctrl-x ctrl-v return, which revisits the file you just visited. On the revisit, Emacs decodes the file with the coding system you specified. From the main Emacs menu, you can select Mule/Set Coding System/Next Command to do the same thing. (See Emacs Manual 31.4.5 for details on rebinding keys.)
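The two-step sequence of choosing a coding system and then revisiting the file could also be wrapped into one command. This is a sketch rather than anything shipped with Emacs: the function name my-revisit-with-coding-system is hypothetical, and it relies on the standard variable coding-system-for-read, which overrides the decoding used for file reads while it is bound.

```elisp
;; Hypothetical helper, not part of Emacs: revisit the current
;; file and decode it with a coding system read from the minibuffer.
(defun my-revisit-with-coding-system (coding-system)
  "Revisit the current buffer's file, decoding it with CODING-SYSTEM."
  ;; "z" reads a coding system name with completion.
  (interactive "zCoding system for revisiting file: ")
  (or buffer-file-name
      (error "This buffer is not visiting a file"))
  ;; Bind the override only for the duration of the revisit.
  (let ((coding-system-for-read coding-system))
    ;; find-alternate-file is the command behind ctrl-x ctrl-v.
    (find-alternate-file buffer-file-name)))
```

For a Japanese file, for example, meta-x my-revisit-with-coding-system return iso-2022-jp return would reread the file as ISO-2022-JP.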

For information on each input method and sometimes a list of the characters you can use with it, type ctrl-h I. As usual in Emacs, tab will give you a list of choices if you don't know the exact name of the input method you are after. ctrl-g escapes from whatever you are doing in the minibuffer. I said “sometimes” because some lists are missing. For example, in response to ctrl-h I ipa, Emacs returns “Input method: ipa (IPA in mode line) for IPA International Phonetic Alphabet for English, French, German and Italian” but provides no list of the actual symbols. For Devanagari, on the other hand, a full list of the letters of the script is presented. What is not given is how to invoke several operations that are essential to entering the script properly. If you are familiar with the script, you can probably hack your way through. If, like me, you are a beginner and merely attempting to input it from a transcription in Latin script, even assuming your transcription is precise, you will not be pleased. Detailed descriptions of the various input methods are needed.

Start with Tibetan. Type the Wylie transliteration and the script appears—very smooth. For beginners, it is easier than writing Tibetan by hand. For the time being, I had to give up trying to input Devanagari. You may have better luck.

All these input methods come in a package called Leim. As of this writing, Leim is bundled with Emacs-20.3.92 from ftp://ftp.etl.go.jp/pub/mule/.notready/, but it must be downloaded as a separate package for the Emacs pre-release from ftp://alpha.gnu.org/. Either way, if Emacs cannot find all the files included with Leim when it compiles on your system, you won't have any input methods. Let's hope the distributions will include Leim by default and give you an option to exclude it.

To use the multi-script capabilities of the new Emacs, another essential ingredient is Intlfonts 1.x. The current version is 1.2. This package provides all the fonts you need to display all the scripts. It, too, must be installed before you will find any joy in multi-script work.

The latest version of ps-print.el that comes with Emacs allows you to at least dump your multiple-script files to the laser printer. Currently, you must be content with “not scalable” bit-mapped fonts where one size fits all, but this is an essential first step. Perhaps CJK-TeX by Werner Lemberg (xlwy01@uxp1.hrz.uni-dortmund.de) or Omega, the new, purportedly internationalized TeX, will generously give us the ability to produce high-class, camera-ready, multi-script output for printing and a description of how to do it.

Figure 2

In Figure 2, we see the same text as in Figure 1 entered into an Emacs buffer, or as much of it as I could enter without a better understanding of the Devanagari input method.

It could be argued that we should not look to Emacs for help in printing, aside from the minimum requirement of being able to dump multi-script texts to the printer. Maybe the same holds for Web publishing; I don't know. However, it is frustrating to create a document in five or more scripts perfectly well in Emacs and then be unable to print it at camera-ready quality or publish it on the Web. There, the only character set with some ongoing support that would allow all five scripts to be included is Unicode, particularly in its UTF-8 encoding.

If there were an option to map the multi-script file in Emacs to the Unicode character set and then save it in UTF-8 encoding, the file would be directly available as content for an XML or HTML document. True, some browsers cannot understand Unicode yet, and the browser user may not have all the fonts installed, but this is bound to change for the better soon. One thing is for sure: few will be able to read it in Mule internal format or even in a mix of ISO-2022 character sets/encodings.

Regarding a Unicode converter for Emacs, Miyashita Hisashi took a first shot at converting the Mule internal encoding to Unicode with his MULE-UCS converter, but as of this writing I have not been able to install it. I noticed on the Unicode mailing list that Mark Leisher (mleisher@crl.nmsu.edu) also has one under construction. Hopefully, by the time this article is printed, we will be able to produce a Unicode/UTF-8-encoded file from Emacs.