Internationalizing Messages in Linux Programs

An introduction to the GNU gettext system for producing multilingual programs.

Linux is becoming increasingly popular each day. Until now, the typical Linux user has been a system administrator, student or UNIX hacker. New projects such as GNOME, KDE and GNUStep are preparing the way for a different, less technically prepared user.

Running software in English is usually not a problem for someone with at least moderate computer skills, but end users need (and want) software that speaks their own language in order to be productive or feel comfortable with the system. Moreover, many programs need to know local conventions for things such as dates or money amounts in order to be useful and complete.

This article is an introduction to the GNU gettext system, a set of tools and libraries for both programmers and translators that enables them to produce multilingual programs with textual messages in specified languages. We will deal with languages that use one of the ISO-8859-X character sets, except for Japanese and Chinese as they require extra care.

Definitions

Two words appear frequently when talking about support of different languages in programs: internationalization and localization. Since writing these words over and over (without spelling errors) is annoying and time-consuming, people abbreviate them as I18N and L10N. The 18 and 10 indicate the number of letters between the first and the last letter of each word.

Internationalizing a program means taking the necessary steps to make it aware of different languages and national standards.

The process of localization takes place when an internationalized program is given the information needed to behave correctly with a certain language and set of cultural habits.

First Things First

The first thing to do, for both programmers and end users, is configure the Linux machine to use locales. Most users need only follow the Locales mini-HOWTO downloadable from ftp://sunsite.unc.edu/pub/Linux/docs/ and mirrors. Recent distributions (for example, Red Hat 5.0) include everything to support locales.

Once the system is enabled to support locales, you must specify the particular standards and languages you wish to use. This is done through a set of environment variables. Each one controls a specific aspect of the locale system:

  • LANG specifies the global locale, but can be overridden by the following variables.

  • LC_COLLATE specifies the locale used for sorting and comparing.

  • LC_CTYPE specifies the character set in use, so that isupper('<\#192>') returns true in an Italian locale.

  • LC_MONETARY provides information about representing money in a specific locale.

  • LC_NUMERIC gives information about numbers: how digits are divided and separated in groups, what the decimal point is, etc.

  • LC_TIME specifies which locale to use to represent time: AM/PM or 24-hour values, for example.

  • LC_MESSAGES indicates the language you prefer for programs' text messages.

  • LC_ALL overrides any previous indication and sets a global locale.

Examples of values for global locale are:

  • en_US indicates English in the United States.

  • it_IT is for Italian in Italy.

  • fr_CA is for French in Canada.

Basically, to use the standards of language LL in country CC, the locale value will be LL_CC.

Listing 1.

The locale used by default, unless overridden by the previous variables, is called the C (or POSIX) locale. Thus, it is very easy to illustrate the behavior of a locale-aware program by using date, for example (see Listing 1). First, without setting the LC_ALL variable, the response is in English. Next, LC_ALL is set to obtain an Italian response, a French one (French in Canada is specified), then an English one (English in Canada). The “No such file or directory” for the Italian locale is not translated, which means the Italian information is not available; therefore, the default is used instead.

______________________

White Paper
Fabric-Based Computing Enables Optimized Hyperscale Data Centers

Today’s modular x86 servers are compute-centric, designed as a least common denominator to support a wide range of IT workloads. Those generic, virtualized IT workloads have much different resource optimization requirements than hyperscale and cloud applications. They have resulted in a “one size fits all” enterprise IT architecture that is not optimized for a specific set of IT workloads, and especially not emerging hyperscale workloads, such as web applications, big data, and object storage. In this report, you will learn how shifting the focus from traditional compute-centric IT architectures to an innovative disaggregated fabric-based architecture can optimize and scale your data center.

Learn More

Sponsored by AMD

White Paper
Red Hat White Paper: Using an Open Source Framework to Catch the Bad Guy

Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6

Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.

Learn more about catching the bad guy in this free white paper.

Learn More

Sponsored by DLT Solutions