Internationalizing Messages in Linux Programs

An introduction to the GNU gettext system for producing multilingual programs.

Dealing with Messages in C Programs

Let's have a first look at the package GNU gettext. If you don't have it installed on your system, you can download it from ftp://prep.ai.mit.edu/pub/gnu/ or its mirrors.

When writing multilingual programs with this package, strings are “wrapped” in a function call instead of being coded directly in the source. The function is called gettext and accepts exactly one string argument and returns a string.

Despite its simplicity, gettext is very effective: the string passed as an argument is looked up in a table to find a corresponding translation. If a translation is found, then gettext returns it; otherwise, the passed string is returned and the program will continue to use a default language.
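The fallback behavior described above can be observed directly. The following is only a sketch: the helper name is_translated is hypothetical and not part of the gettext package, and it assumes that a translation always differs from the original string.

```c
#include <assert.h>
#include <libintl.h>
#include <string.h>

/* Hypothetical helper: returns 1 if gettext() produced a string that
 * differs from msgid (a translation was found), 0 if it fell back to
 * the original.  Caveat: a catalog entry whose translation is identical
 * to msgid also reports 0. */
int is_translated(const char *msgid)
{
        assert(msgid != NULL);
        return strcmp(gettext(msgid), msgid) != 0;
}
```

When no catalog can be found for the selected domain, is_translated returns 0 for every string and the program simply keeps printing its built-in messages.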

Our first internationalized Hello, world! program could be:

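#include <stdio.h>
#include <locale.h>
#include <libintl.h>
int main(void) {
        setlocale(LC_ALL, "");        /* pick up the user's locale settings */
        textdomain("hello-world");    /* select the message catalog */
        printf("%s", gettext("Hello, world!\n"));
        return 0;
}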

Always remember to include <libintl.h> in each C program that makes use of the gettext package.

The function textdomain should be called before using gettext. Its purpose is to select the correct “database” of messages (a more appropriate term would be “message catalog”) for the program to use.

Then, each translatable string must be used as a parameter of gettext. Writing gettext("foobar") each time can be annoying. That's why many programmers use this macro:

#define _(x) gettext(x)

By doing so, the overhead introduced by internationalization of messages is quite small: instead of writing "foobar", one can just write _("foobar"). That's only three characters more per translatable string, with the added advantage that redefining this one macro (for example, as #define _(x) (x)) removes every gettext call from the module.
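One possible way to exploit the macro is to make it switchable at build time. This is a sketch under assumptions: the ENABLE_NLS symbol and the print_greeting function are illustrative names chosen here, not something defined by the article.

```c
#include <assert.h>
#include <stdio.h>

/* ENABLE_NLS is an assumed build-time switch (e.g. cc -DENABLE_NLS). */
#ifdef ENABLE_NLS
#  include <libintl.h>
#  define _(x) gettext(x)       /* look the string up in the catalog */
#else
#  define _(x) (x)              /* no-op: use the string as written */
#endif

/* Hypothetical example function: writes a greeting through the _()
 * wrapper and returns whatever fputs() returns. */
int print_greeting(FILE *out)
{
        assert(out != NULL);
        return fputs(_("Hello, world!\n"), out);
}
```

Built without the switch, the module contains no reference to gettext at all, yet the source still marks every translatable string.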

Translating Messages

Once a program has been internationalized, the localization process can begin. The first thing to do is extract all the strings needing translation from the source code.

This automatic process is carried out by xgettext. The result is an editable .po (portable object) file. xgettext scans the source files passed as parameters and extracts each translatable string marked by the programmer with gettext or some other identifier.

Listing 2.

In our case, we can invoke xgettext in this way:

xgettext -a -d hello-world -k_ -s -v hello-world.c

The resulting hello-world.po is shown in Listing 2.

I suggest you take a look at the gettext info documentation to learn about other useful switches. The ones I used here are defined in this way:

  • -a extracts all strings.

  • -d outputs the results in hello-world.po (the default is messages.po).

  • -k instructs xgettext to look for _ when searching for translatable strings (the defaults gettext and gettext_noop are still looked for).

  • -s generates a sorted output and removes duplicates.

  • -v tells xgettext to be verbose when it generates messages.

At this point, the translator can simply fill hello-world.po with the translated messages without any knowledge of the source code. In fact, a program can be internationalized and compiled before any new languages are added.

A portable object must be compiled into a machine object (a .mo file) to be useful. This is done with the command:

msgfmt -o hello-world.mo -v hello-world.po

Figure 1. A block diagram representing all the steps necessary to obtain a .mo file from a C source. The most critical part is running tupdate (see below; later gettext releases call this tool msgmerge) to merge the new, untranslated strings with the previous work without losing it.

The final step is copying hello-world.mo to a suitable location, where it can be found by the gettext system. On my Linux box, the default location is /usr/share/locale/LL/LC_MESSAGES/ or /usr/share/locale/LL_CC/LC_MESSAGES/, where LL is the language and CC is the country. For example, the Italian translation should be placed in /usr/share/locale/it/LC_MESSAGES/hello-world.mo.
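While testing, it can be handy to keep the .mo file outside the system directory. The libintl function bindtextdomain tells gettext where to look for a given domain. The sketch below wraps the start-up calls in a hypothetical helper named init_messages; the directory passed to it is the caller's choice and is only an assumption in the usage shown afterward.

```c
#include <assert.h>
#include <libintl.h>
#include <locale.h>

/* Hypothetical helper: selects the catalog for `domain` and makes
 * gettext search for it under `localedir` instead of the system
 * default.  Returns 0 on success, -1 if either libintl call fails. */
int init_messages(const char *domain, const char *localedir)
{
        assert(domain != NULL && localedir != NULL);
        setlocale(LC_ALL, "");          /* adopt the user's locale settings */
        if (bindtextdomain(domain, localedir) == NULL)
                return -1;
        if (textdomain(domain) == NULL)
                return -1;
        return 0;
}
```

Calling init_messages("hello-world", "./locale") would make gettext look for the Italian catalog in ./locale/it/LC_MESSAGES/hello-world.mo. An absolute directory is the safer choice in a real program, because a relative one depends on the current working directory.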

textdomain must be called at the beginning of the program, so that the system can select the proper .mo file according to the current locale variables. In order of precedence (highest first), they are LC_ALL, LC_MESSAGES and LANG.
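The precedence rule can be written out as a small lookup. This is only an illustration of the order stated above, not the library's actual implementation; the function names are hypothetical, and GNU gettext additionally consults a LANGUAGE variable, which this sketch ignores.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: returns the value of the first variable in
 * `names` that is set and non-empty, or NULL if there is none. */
const char *first_nonempty_env(const char *const names[], size_t count)
{
        assert(names != NULL);
        for (size_t i = 0; i < count; i++) {
                const char *value = getenv(names[i]);
                if (value != NULL && value[0] != '\0')
                        return value;
        }
        return NULL;
}

/* Applies the order LC_ALL, LC_MESSAGES, LANG; falls back to "C". */
const char *messages_locale(void)
{
        static const char *const order[] = { "LC_ALL", "LC_MESSAGES", "LANG" };
        const char *value = first_nonempty_env(order, sizeof order / sizeof order[0]);
        return value != NULL ? value : "C";
}
```

With LC_ALL unset, LC_MESSAGES=it_IT and LANG=en_US, messages_locale returns "it_IT", so the Italian catalog would be the one selected.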

A .mo file can be shared among many programs if the programmers decide to make it so. This is true with GNU fileutils, for example.
