Introduction to Internationalization Programming
Typically, a programmer works on the source code, and a translator deals with the corresponding .po file, which may be created with the copy command from the .pot file or directly from the source with the xgettext program.
Glance over a .po file, and you will see it has a header and entries for translating. Here is an example of an entry:
# This is my own commentary #: counter.c:25 #, c-format msgid "You typed %d %s \n" msgstr "Vous avez tapé %d %s\n"
It's simple. You translate phrases from msgid and put the results into the msgstr fields. If a line starts with #:, it is a reference to the source; if it starts with #, it shows an entry's attributes. You can add your own comments in lines starting with two symbols: # (the pound sign and then a white space).
After copying the template .pot file information (.po) or creating a new one with the xgettext command, you can start translating. Generally, this job can be done by another person using his or her favorite editor. (X)Emacs is not a bad choice for this job, but KBabel, part of KDE, is an even better one. If you are going to participate in a team of translators, it is highly recommended that you use KBabel. Describing KBabel is beyond the scope of this article, but you can read more about it in “The KBabel Handbook” (see Resources).
Translation is a kind of art. Writing a correct phrase can be difficult, and you sometimes may doubt your ability with a particular language. So, you may want to leave some entries untranslated, or having translated them doubtfully, mark them as “fuzzy”. With KBabel or (X)Emacs, you easily can find such entries and edit them again later. Do not worry; only fully translated entries will be compiled later by msgfmt and become usable in programs. This simply means that an entry may be marked “translated”, “untranslated” or “fuzzy”, and as software changes quickly, there is also an “obsolete” attribute.
Languages are flexible. English messages are not always perfect either. In our case, the message “You typed 0 digit” is incorrect. GNU gettext can manage translating problems like word order, plural forms and ambiguities, but you have to use extra functions that hold more arguments than gettext().
Once you have translated the file, you should convert it into a .mo file that gettext will use if you run the program with the corresponding locale. Do not forget to put this file in the right place, in our case:
mv counter.mo fr/LCMESSAGES/
Now the counter can speak French! (See Listing 1.)
Programs evolve, and if their source code is changed, the corresponding .po files also have to be updated. Using only xgettext in this case is not an ideal solution. All translated messages will be lost, because it overwrites .po files. In this case, you should use the program msgmerge. This program merges two .po files, keeps translations already made (if the new strings match with the old), updates entries' attributes and adds new strings. Of course, these new strings will be untranslated entries. A typical call is simple:
msgmerge old.po new.po > up-to-date.po
In this article, the input method is not described, although it is also important. Generally, non-X11 software doesn't need to worry about i18n input methods, because it is the responsibility of the console and X terminal emulators.
For input in X11, three methods exist: Xsi, Ximp and XIM. The first two are old-fashioned; the last one is the de facto standard. Their descriptions are beyond the scope of this article; however, the source code for the rxvt program provides an excellent example.
Modern tools provide their own special subroutines for input of internationalized strings, using gettext for output messages. To make program code 8-bit transparent for internal proposals, Unicode is used. Qt, for instance, works in such a way, providing additional functions for input and output of i18n strings correctly (see Resources).
You also may want to look at the source code of mutt, which is a good i18n program (www.mutt.org). This program supports aliases for charsets.
Using Unicode in your programs is described by Tomohiro Kubota (see Resources). Happy i18ning!
- Brent Laster's Professional Git (Wrox)
- Machine Learning Everywhere
- Smoothwall Express
- Own Your DNS Data
- Bash Shell Script: Building a Better March Madness Bracket
- Returning Values from Bash Functions
- From vs. to + for Microsoft and Linux
- Simple Server Hardening
- Understanding OpenStack's Success
- Tech Tip: Really Simple HTTP Server with Python