Internationalizing Messages in Linux Programs
Linux is becoming more popular every day. Until now, the typical Linux user has been a system administrator, student or UNIX hacker. New projects such as GNOME, KDE and GNUstep are preparing the way for a different, less technically prepared user.
Running software in English is usually not a problem for someone with at least moderate computer skills, but end users need (and want) software that speaks their own language in order to be productive or feel comfortable with the system. Moreover, many programs need to know local conventions for things such as dates or money amounts in order to be useful and complete.
This article is an introduction to the GNU gettext system, a set of tools and libraries for both programmers and translators that enables them to produce multilingual programs whose textual messages appear in the user's chosen language. We will deal only with languages that use one of the ISO-8859-X character sets; languages such as Japanese and Chinese require extra care and are not covered here.
Two words appear frequently when talking about support of different languages in programs: internationalization and localization. Since writing these words over and over (without spelling errors) is annoying and time-consuming, people abbreviate them as I18N and L10N. The 18 and 10 indicate the number of letters between the first and the last letter of each word.
Internationalizing a program means taking the necessary steps to make it aware of different languages and national standards.
The process of localization takes place when an internationalized program is given the information needed to behave correctly with a certain language and set of cultural habits.
The first thing to do, for both programmers and end users, is to configure the Linux machine to use locales. Most users need only follow the Locales mini-HOWTO, downloadable from ftp://sunsite.unc.edu/pub/Linux/docs/ and mirrors. Recent distributions (for example, Red Hat 5.0) include everything needed to support locales.
Once the system is enabled to support locales, you must specify the particular standards and languages you wish to use. This is done through a set of environment variables. Each one controls a specific aspect of the locale system:
LANG specifies the global locale, but can be overridden by the following variables.
LC_COLLATE specifies the locale used for sorting and comparing.
LC_CTYPE specifies the character set in use, so that isupper('À') returns true in an Italian locale.
LC_MONETARY provides information about representing money in a specific locale.
LC_NUMERIC gives information about numbers: how digits are divided and separated in groups, what the decimal point is, etc.
LC_TIME specifies which locale to use to represent time: AM/PM or 24-hour values, for example.
LC_MESSAGES indicates the language you prefer for programs' text messages.
LC_ALL overrides any previous indication and sets a global locale.
Examples of values for global locale are:
en_US indicates English in the United States.
it_IT is for Italian in Italy.
fr_CA is for French in Canada.
The locale used by default, unless overridden by the previous variables, is called the C (or POSIX) locale. The behavior of a locale-aware program is easy to illustrate with date, for example (see Listing 1). First, with LC_ALL unset, the response is in English. Next, LC_ALL is set in turn to obtain an Italian response, a French one (French in Canada is specified), then an English one (English in Canada). In the Italian case, the “No such file or directory” message is not translated, which means the Italian locale information is not available on the system; therefore, the default is used instead.