Introduction to Internationalization Programming
In the old days when only a few people used computers, they were, for the most part, English speakers. Today, computers are widely available, and differences in languages, traditions and cultures need to be reflected in the world of programming. This article introduces the GNU gettext system.
The idea of using the same program but changing its properties according to the cultural traditions of different peoples is called internationalization. However, because programmers like to make words shorter, instead of typing 20 characters they type only four: i18n. I18n means programming designed to handle many languages.
Once you've written an i18n program, you may want to add a new language. This is not an i18n problem. In general, you need a person who will translate messages from the program for a specific nation. This problem is called localization, or i10n. I10n refers to the implementation of a specific language for an internationalized software, or in other words, the creation of localized objects according to the specific region's rules.
Although each organization and company that designs and distributes software tries to implement this in its own way, in general, the i18n idea is simple. Software should be created with two parts in mind: common and nation-dependent. This second part is known as localized objects.
Hopefully, standards will make life more comfortable. The basic concept of locale was introduced by the ISO (International Standard Organization) with the C standard in 1990, which was expanded in 1995. POSIX also has rules for i18n, so the term POSIX locale is used together with National Language Support (NLS). Formally, NLS is not a part of POSIX but has some functions that help when using the POSIX locale. X11 has its own i18n implementation, but the common way for programmers is to move the X11 i18n/i10n “up a level” into the POSIX/NLS locale. [Other software has its own i18n and i10n. See “Bridging the Digital Divide in South Africa” for one way to handle it].
What should we take into account when speaking of locale? Of course, the name of the language, but that is not enough. Everybody knows there are differences between American and British English, so we also have to know where the particular language is used, or in other words, the territory, taking into account individual traditions and cultural rules.
Every language has its own system of writing, and sometimes even several. Languages have alphabets, or character repertoires, but computers deal with digits. So, a character should be associated with a digit. This kind of association is called a coded character set (CCS). There are plenty of them, and each has its own name, such as ASCII, ISO-8859-1, KOI8-U. Instead of CCS, the term charset is often used. There is no special standard for the name of a charset, so ISO-8859-1, ISO8859-1 and iso8859-1 all refer to the same thing. There are some definitions from IANA, the organization that also is responsible for the registration of charsets (see Resources). As you probably know, the X11 system has its own system for charset naming, and their document “Logical Font Description Conversion” (described in Jim Flower's article, see Resources) provides a good name and alias charset creating system.
Charsets are important. Some countries have several different charsets for the same language! In the Ukraine, for instance, the same text may be displayed nicely in koi8-u but may be absolutely unreadable if a terminal uses the Ukrainian charsets iso8859-5, cp1251 or Unicode. In those cases, we would have to convert the text from one charset into another.
In order to take all of this into account, the POSIX locale defines some things that all together are called locale categories. They are shown in Table 1. Knowing them is important; C functions work differently with different locales! Categories are reflected in shell as environment variables with the same name. An example of using LC_ALL is shown in Listing 1.
Listing 1. Example of Using LC_ALL
The syntax to build a locale name looks like this:
language[_territory][.codeset][@modifier]
where language is represented by two lowercase letters, such as en for English and fr for French; territory is represented by two uppercase letters, such as GB for United Kingdom and FR for France, and in these two cases, euro would be the modifier. So, you can change your locale by setting the corresponding environment variables. See Listing 1, where we use the programs date, cat and our example program, counter. Note, we use only language and territory; we cannot change the charset for the terminal with this command. Now imagine that the program messages are written in one charset but are output in another. POSIX does not have functions to determine the current charset, but XPG has nl_langinfo(). In some distributions, the man page for this function may be missing (Debian does not have it, but SuSE and Red Hat do). In any case, you can obtain additional information from /usr/include/langinfo.h. To determine the current charset, use the following code:
#include <locale.h>
#include <langinfo.h>
...
setlocale(LC_ALL,"");
printf ("Current charset = %s\n",
nl_langinfo(CODESET));
To convert from one coding into another for the correct output, you
can use the conv() function. For more details, consult
“Introduction to i18n” (see Resources).
In order to provide output information, a message catalog for that locale was created. This means that all software messages are kept separately from a program that may have (and must have) its own catalog. NLS provides a set of utilities for creating and supporting such catalogs, as well as functions for extracting information according to three keys: 1) program name, 2) current categories of locale and 3) pointer to a particular message to be output.
There are two general realisations of the NLS mechanism:
X/Open Portability Guide XPG3/XPG4/XPG5 with the functions catopen(), catgets() and catclose() and the gencat utility.
SUN XView with functions gettext() and textdomain(). The GNU Project has its own fully compatible release called GNU gettext.
Usually programs, as well as system libraries, use one (or even two) NLS systems.
Although XPG5 is included in the UNIX specification version 2, and all versions of UNIX systems support it, with Linux, GNU gettext is the most popular solution.
The POSIX locale has the following components:
Locale API, i.e., subroutines like setlocale(), isalpha(), etc.
Shell environment variables to manage locale categories.
The locale utility to get information about the current locale; see man locale for more details.
Objects of localization. The default directory for their location is /usr/share/locale/.
Today’s modular x86 servers are compute-centric, designed as a least common denominator to support a wide range of IT workloads. Those generic, virtualized IT workloads have much different resource optimization requirements than hyperscale and cloud applications. They have resulted in a “one size fits all” enterprise IT architecture that is not optimized for a specific set of IT workloads, and especially not emerging hyperscale workloads, such as web applications, big data, and object storage. In this report, you will learn how shifting the focus from traditional compute-centric IT architectures to an innovative disaggregated fabric-based architecture can optimize and scale your data center.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
| Non-Linux FOSS: Seashore | May 10, 2013 |
| Trying to Tame the Tablet | May 08, 2013 |
| Dart: a New Web Programming Experience | May 07, 2013 |
- RSS Feeds
- New Products
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Drupal Is a Framework: Why Everyone Needs to Understand This
- A Topic for Discussion - Open Source Feature-Richness?
- Home, My Backup Data Center
- Dart: a New Web Programming Experience
- Developer Poll
- What's the tweeting protocol?
- May 2013 Issue of Linux Journal: Raspberry Pi
- Reply to comment | Linux Journal
3 hours 52 min ago - Reply to comment | Linux Journal
4 hours 39 min ago - Web Hosting IQ
6 hours 13 min ago - Thanks for taking the time to
7 hours 49 min ago - Linux is good
9 hours 47 min ago - Reply to comment | Linux Journal
10 hours 4 min ago - Web Hosting IQ
10 hours 34 min ago - Web Hosting IQ
10 hours 35 min ago - Web Hosting IQ
10 hours 36 min ago - Reply to comment | Linux Journal
13 hours 36 min ago
Enter to Win an Adafruit Prototyping Pi Plate Kit for Raspberry Pi

It's Raspberry Pi month at Linux Journal. Each week in May, Adafruit will be giving away a Pi-related prize to a lucky, randomly drawn LJ reader. Winners will be announced weekly.
Fill out the fields below to enter to win this week's prize-- a Prototyping Pi Plate Kit for Raspberry Pi.
Congratulations to our winners so far:
- 5-8-13, Pi Starter Pack: Jack Davis
- 5-15-13, Pi Model B 512MB RAM: Patrick Dunn
- Next winner announced on 5-21-13!
Free Webinar: Linux Backup and Recovery
Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.
In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.




Comments
linux i18n support sucks
Wow, linux internationalization support sucks.
There's so little documentation, and no standard system support.
Windows is much better in this aspect.