Internationalizing Messages in Linux Programs

An introduction to the GNU gettext system for producing multilingual programs.
Dealing with Messages in C Programs

Let's have a first look at the package GNU gettext. If you don't have it installed on your system, you can download it from ftp://prep.ai.mit.edu/pub/gnu/ or its mirrors.

When writing multilingual programs with this package, strings are “wrapped” in a function call instead of being coded directly in the source. The function is called gettext and accepts exactly one string argument and returns a string.

Despite its simplicity, gettext is very effective: the string passed as an argument is looked up in a table to find a corresponding translation. If a translation is found, then gettext returns it; otherwise, the passed string is returned and the program will continue to use a default language.

Our first, internationalized Hello, world! program could be:

#include <stdio.h>
#include <libintl.h>
void main(void) {
        textdomain("hello-world");
        printf(gettext("Hello, world!\n"));
}

Always remember to include <libintl.h> in each C program that makes use of the gettext package.

The function textdomain should be called before using gettext. Its purpose is to select the correct “database” of messages (a more appropriate term would be “message catalog”) for the program to use.

Then, each translatable string must be used as a parameter of gettext. Writing gettext("foobar") each time can be annoying. That's why many programmers use this macro:

#define _(x) gettext(x)

By doing so, the overhead introduced by internationalization of messages is quite small: instead of writing "foobar", one can just write _("foobar"). That's only three characters more per translatable string, with the advantage that this macro eliminates the gettext code from the module completely.

Translating Messages

Once a program has been internationalized, the localization process can begin. The first thing to do is extract all the strings needing translation from the source code.

This automatic process is carried out by xgettext. The result is an editable .po (portable object) file. xgettext scans the source files passed as parameters and extracts each translatable string marked by the programmer with gettext or some other identifier.

Listing 2.

In our case, we can invoke xgettext in this way:

xgettext -a -d hello-world -k_ -s
-v hello-world.c

The resulting hello-world.po is shown in Listing 2.

I suggest you take a look at the gettext info documentation to learn about other useful switches. The ones I used here are defined in this way:

  • -a extracts all strings.

  • -d outputs the results in hello-world.po (the default is messages.po).

  • -k instructs xgettext to look for _ when searching translatable strings (the defaults gettext and gettext_noop are still looked for).

  • -s generates a sorted output and removes duplicates.

  • -v tells xgettext to be verbose when it generates messages.

At this point, the translator can simply fill hello-world.po with the messages without any knowledge of the source code. In fact, a program can be internationalized and compiled, before adding the new languages.

A portable object must be compiled into a machine object (a .mo file) to be useful. This is done with the command:

msgfmt -o hello-world.mo -v hello-world.po

Figure 1

Figure 1. A block diagram representing all the steps necessary to obtain a .mo file from a C source. The most critical part is running tupdate (see below) to merge the new, untranslated strings with the previous work without losing it.

The final step is copying hello-world.mo to a suitable location, where it can be found by the gettext system. On my Linux box, the default location is /usr/share/locale/LL/ LC_MESSAGES/ or /usr/share/locale/LL_CC/LC_MESSAGES/, where LL is the language and CC is the country. For example, the Italian translation should be placed in /usr/share/locale/it/ LC_MESSAGES/hello-world.mo.

textdomain must be called in the beginning of the program, so that the system can select the proper .mo file according to the current locale variables. In order of precedence (higher precedence first), they are LC_ALL, LC_MESSAGES and LANG.

A .mo file can be shared among many programs if the programmers decide to make it so. This is true with GNU fileutils, for example.

______________________

White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState