Introduction to Internationalization Programming

Olexiy describes the basic aspects of i18n and provides a sample i18n output program.
Making a Program I18n-Compliant

If you are going to write a real i18n program, it is wise to assume that you know nothing about any specific language and to take character sets into account. Ideographic languages have many more than 26 letters: Japanese has about 2,000, and Chinese has about 5,000. To deal with such characters, the POSIX locale model provides multibyte characters and wide characters (wchar_t); the latter were designed with Unicode in mind. To measure and convert between the two, functions such as mblen(), mbstowcs(), wctomb(), mbtowc() and wcstombs() are used. However, using Unicode is beyond the scope of this article.

Producing real multilingual software is a complex task. Hopefully, the GNU gettext system, which now conforms with Sun XView, will help you write i18n programs.

Figure 1. Producing an i18n Program

Figure 1 represents all necessary steps for producing an i18n program:

  • To create an i18n version, you have to edit a non-i18n program. If you use a special editor mode you will create an additional file at the same time, called a POT file, where PO stands for portable object, and the letter T is for template.

  • If you merely make a revision of an existing i18n program, or if a POT file does not exist, you have to use the xgettext program to produce it.

  • Copy the template file into ll.po, where ll refers to a certain language.

  • Translate messages into the language ll.

  • Create an mo file with the msgfmt program (mo stands for machine object). Sometimes you will see gmo files instead (g stands for GNU).

  • Compile your source; put the binary program and the mo files into the right places. This and the previous steps are better accomplished with a Makefile.

Before looking briefly at all the steps of a simple program, please read these golden rules of internationalization:

1) Put the following lines into the non-executable part of your program. In the executable part, mark messages as _("message") instead of "message"; in the non-executable part, mark them as N_("message"). Wherever a string declared as a constant in the non-executable part is output, make sure it is passed through gettext at that point:

#include <libintl.h>
#include <locale.h>
#define _(str) gettext (str)
#define gettext_noop(str) (str)
#define N_(str) gettext_noop (str)

2) Start your program by setting the locale:

setlocale (LC_ALL, "");

3) Indicate the message catalog name, and if necessary, its location:

bindtextdomain (PACKAGE, LOCALEDIR);
textdomain (PACKAGE);

PACKAGE and LOCALEDIR usually are provided either by config.h or by the Makefile.

4) To test a character's class or convert its case, use calls like isalpha(), isupper(), ..., tolower() and toupper() rather than comparisons against hard-coded character ranges.

5) To compare strings, use the strcoll() and strxfrm() functions instead of strcmp().

6) To guarantee portability with old locale implementations, use variables of type unsigned char for characters, or compile your program with the -funsigned-char option.

Let's start with a simple program in which these rules are ignored (Listing 2) and then internationalize it. The program prints a prompt, reads a string from the terminal and counts the digits in it. It prints the count to the terminal and then exits.

Listing 2. A Non-I18n Program

Because the program is small, we can easily change it according to the rules in any editor; for a large program, it is better to use special tools. Editors such as (X)Emacs or vi with a po mode can create the counter.pot file at the same time that you are changing the program source!

The changed file is shown in Listing 3. Lines 4-8 are added according to rule 1. Including locale.h may not be necessary, because its definitions may already be included through libintl.h. Writing gettext and gettext_noop many times is annoying, so we use the macros defined in lines 6-8. gettext_noop handles strings that are initialized at compile time: it marks them so that xgettext can extract them, and they are then passed through gettext when the program runs.

Listing 3. I18n Version of the Program Shown in Listing 2

Without line 15 (rule 2), the program will ignore your locale settings and run in the C locale. Note that sometimes it is necessary to set only specific locale categories, such as LC_CTYPE and LC_MESSAGES. See man setlocale and Table 1 in this article for more information.

Table 1. Categories of Locale and Shell Variables

Lines 16 and 17 were inserted according to rule 3. Usually the parameters of these calls are provided in either a Makefile or a special file (like config.h) that holds configuration information, but in this program we put in the names directly. Line 16 makes the search for message catalogs start in the current directory. If that call is omitted, Linux will use the default location, /usr/share/locale.

A call to textdomain() must be present in any i18n program. It tells the gettext system the name of the message catalog that holds the translated messages.

Lines 19, 25 and 26 were changed according to rule 1. Lines 19 and 25 are simple: instead of using strings directly, we pass them through gettext so that the message catalog is consulted. Line 26 demonstrates the exception. We cannot pass strings defined in the non-executable part through gettext where they are defined, because the compiler initializes their values before the program runs. The problem is solved according to rule 1. We marked the strings with N_ in line 12 to make them recognizable by xgettext, and we used _(mess) instead of mess in line 26, as with normal strings. Nothing more needs to be changed, because the program already uses isdigit() (see rule 4).

Now the program is internationalized. Compiling and running it, however, produces exactly the same result as the previous non-i18n one. The messages from the counter.pot file still have to be translated into a specific language.

There is another way to create an initial .pot file. Once you have an i18n program, you can use xgettext. It scans the source files and extracts the marked strings for translation. In our case, we can invoke it like this:

xgettext -o counter.pot -k_ -kN_ counter.c

where -o names the output file, and -k_ and -kN_ tell xgettext to extract the strings marked with _ and N_. Consult info xgettext for more details.



linux i18n support sucks

Wow, linux internationalization support sucks.
There's so little documentation, and no standard system support.
Windows is much better in this aspect.