Internationalizing Messages in Linux Programs

An introduction to the GNU gettext system for producing multilingual programs.

Dealing with Messages in C Programs

Let's take a first look at the GNU gettext package. If you don't have it installed on your system, you can download it from ftp://prep.ai.mit.edu/pub/gnu/ or its mirrors.

When writing multilingual programs with this package, strings are “wrapped” in a function call instead of being coded directly in the source. The function is called gettext and accepts exactly one string argument and returns a string.

Despite its simplicity, gettext is very effective: the string passed as an argument is looked up in a table to find a corresponding translation. If a translation is found, then gettext returns it; otherwise, the passed string is returned and the program will continue to use a default language.

Our first internationalized Hello, world! program could be:

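#include <stdio.h>
#include <locale.h>
#include <libintl.h>
int main(void) {
        setlocale(LC_ALL, "");      /* adopt the user's locale settings */
        textdomain("hello-world");  /* select the message catalog */
        /* print through "%s" so a translation is never used as a format */
        printf("%s", gettext("Hello, world!\n"));
        return 0;
}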

Always remember to include <libintl.h> in each C program that makes use of the gettext package.

The function textdomain should be called before using gettext. Its purpose is to select the correct “database” of messages (a more appropriate term would be “message catalog”) for the program to use.
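
As a rough sketch of the behavior described so far, the fragment below selects a domain and checks whether a lookup fell back to the original string. The helper names is_untranslated and current_domain are invented for this illustration; setlocale, textdomain and gettext are the real library calls. It assumes no hello-world catalog has been installed yet, in which case gettext simply hands back its argument.

```c
#include <libintl.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: selects the "hello-world" domain and reports
 * whether looking up msgid fell back to the untranslated string.
 * Returns 1 on fallback, 0 if a different (translated) text came back. */
int is_untranslated(const char *msgid)
{
    setlocale(LC_ALL, "");        /* adopt the user's locale settings */
    textdomain("hello-world");    /* select the message catalog */
    return strcmp(gettext(msgid), msgid) == 0;
}

/* Hypothetical helper: textdomain(NULL) only queries the currently
 * selected domain; it does not change it. */
const char *current_domain(void)
{
    return textdomain(NULL);
}
```

On a system without a translation for this domain, is_untranslated("Hello, world!\n") should return 1, and current_domain() should then report hello-world.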

Then, each translatable string must be used as a parameter of gettext. Writing gettext("foobar") each time can be annoying. That's why many programmers use this macro:

#define _(x) gettext(x)

By doing so, the overhead introduced by internationalization of messages is quite small: instead of writing "foobar", one can just write _("foobar"). That's only three characters more per translatable string, with the added advantage that redefining this one macro is enough to remove the gettext code from the module completely.
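
One way to take gettext out of a module through the macro is a build-time switch, sketched below. ENABLE_NLS is an assumed name for that switch and greeting is a made-up function; neither comes from the text above. When the switch is off, _ expands to its argument and no gettext call remains in the compiled code.

```c
/* ENABLE_NLS is an assumed build-time switch (for example passed as
 * -DENABLE_NLS on the compiler command line). */
#ifdef ENABLE_NLS
#  include <libintl.h>
#  define _(x) gettext(x)   /* look the string up in the catalog */
#else
#  define _(x) (x)          /* no lookup: the literal is used as-is */
#endif

/* Hypothetical function with one translatable string. */
const char *greeting(void)
{
    return _("Hello, world!\n");
}
```

Because the strings stay marked with _ either way, xgettext can still extract them even from a build that has translation switched off.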

Translating Messages

Once a program has been internationalized, the localization process can begin. The first thing to do is extract all the strings needing translation from the source code.

This automatic process is carried out by xgettext. The result is an editable .po (portable object) file. xgettext scans the source files passed as parameters and extracts each translatable string marked by the programmer with gettext or some other identifier.

Listing 2.

In our case, we can invoke xgettext in this way:

xgettext -a -d hello-world -k_ -s -v hello-world.c

The resulting hello-world.po is shown in Listing 2.

I suggest you take a look at the gettext info documentation to learn about other useful switches. The ones I used here are defined in this way:

  • -a extracts all strings.

  • -d outputs the results in hello-world.po (the default is messages.po).

  • -k instructs xgettext to look for _ when searching translatable strings (the defaults gettext and gettext_noop are still looked for).

  • -s generates a sorted output and removes duplicates.

  • -v tells xgettext to be verbose when it generates messages.

At this point, the translator can simply fill hello-world.po with the translated messages without any knowledge of the source code. In fact, a program can be internationalized and compiled before any new languages are added.

A portable object must be compiled into a machine object (a .mo file) to be useful. This is done with the command:

msgfmt -o hello-world.mo -v hello-world.po

Figure 1. A block diagram representing all the steps necessary to obtain a .mo file from a C source. The most critical part is running tupdate (see below) to merge the new, untranslated strings with the previous work without losing it.

The final step is copying hello-world.mo to a suitable location, where it can be found by the gettext system. On my Linux box, the default location is /usr/share/locale/LL/LC_MESSAGES/ or /usr/share/locale/LL_CC/LC_MESSAGES/, where LL is the language and CC is the country. For example, the Italian translation should be placed in /usr/share/locale/it/LC_MESSAGES/hello-world.mo.
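
While testing, it may be more convenient to keep the catalog outside the system directory. The sketch below uses bindtextdomain, a libintl call not covered above, to point a domain at another base directory, and then builds the path where the .mo file would be expected, following the LL/LC_MESSAGES layout just described. The helper name catalog_path and the example directory are assumptions made for illustration.

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: binds `domain` to the base directory `localedir`
 * and writes the path where the catalog for language `lang` is expected
 * into `buf`. Returns 0 on success, -1 on failure or truncation. */
int catalog_path(const char *domain, const char *localedir,
                 const char *lang, char *buf, size_t size)
{
    /* bindtextdomain returns the directory now bound to the domain */
    const char *dir = bindtextdomain(domain, localedir);
    if (dir == NULL)
        return -1;

    int n = snprintf(buf, size, "%s/%s/LC_MESSAGES/%s.mo", dir, lang, domain);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

For instance, catalog_path("hello-world", "/tmp/locale", "it", buf, sizeof buf) would fill buf with /tmp/locale/it/LC_MESSAGES/hello-world.mo, which is where the Italian catalog would then need to be copied.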

textdomain must be called at the beginning of the program, so that the system can select the proper .mo file according to the current locale variables. In order of precedence (highest first), they are LC_ALL, LC_MESSAGES and LANG.
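
The precedence order can be pictured with the small sketch below. It only mimics the lookup order given above and is not the code the gettext library itself uses; the function name effective_message_locale is made up, and treating an empty variable as unset is an assumption of this sketch.

```c
#include <stdlib.h>

/* Hypothetical helper: returns the value of the first of LC_ALL,
 * LC_MESSAGES and LANG that is set and non-empty, or NULL if none is. */
const char *effective_message_locale(void)
{
    static const char *const vars[] = { "LC_ALL", "LC_MESSAGES", "LANG" };

    for (size_t i = 0; i < sizeof vars / sizeof vars[0]; i++) {
        const char *value = getenv(vars[i]);
        if (value != NULL && value[0] != '\0')
            return value;   /* higher-precedence variable wins */
    }
    return NULL;
}
```

With LANG=en_US and LC_MESSAGES=it_IT in the environment and LC_ALL unset, this helper returns it_IT; setting LC_ALL overrides both.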

A .mo file can be shared among many programs if the programmers decide to make it so. This is true with GNU fileutils, for example.
