Internationalizing Those Bash Scripts
The first software that I was actually paid to develop was a 2-page shell script that prompted the user for a dozen or so pieces of information, before launching a set of cooperating processes. Those processes formed the core of a performance evaluation suite for the public telephone network - a rather sizable system for its day with high visibility.
Thinking through that assignment and the greater application, I can say with complete certainty that none of its stakeholders were contemplating [human] language independence - that is, how to render prompts, error messages, progress diagnostics, etc. in a language other than US-English. Even if we had been thinking that progressively, the level of facilitation provided by development languages/platforms was either very limited or non-existent.
Fast forward to 2010: language independence - or Internationalization as it has come to be known - is now expected of commercial-grade software. That shell script that I had proudly written back in 1982 was one of a few application modules that interacted directly with the user or generated progress diagnostics. That is exactly the sort of shell script that would compel us to consider Internationalization.
My motivation to offer up this column is grounded in a recent experience. Our team was asked to assess the 'Internationalization readiness' of a large-scale legacy system - that is, identify modules that were not internationalized and needed to be, and estimate the effort to apply all required changes. The gap was found mainly in modules implemented in interpreted languages such as Rexx, Tcl and the bash shell. I found that while documentation on Internationalization was generally available for most programming languages used in this application, there wasn't much to be found for shell scripting (at least nothing that provided a "how-to" with code samples). One of the more complete online resources I found was an appendix to a bash-scripting guide, which started out with the following sentence: "Localization is an undocumented Bash feature." Well, at least it offered some hope, basic information and code fragments. This column distills what I thought was missing, in a complete but summary form.
The Big Picture (in a small frame).
First, let's agree on a common vocabulary - terms that begin to lay out a framework for the effort and code samples presented thereafter.
- Message Catalog: is an indexed repository of natural language messages used by Internationalized applications. The Message Catalog provides for the decoupling of the [human] language content and the application code. When an application needs to access a message at run time, something in the underlying processing stack knows how to retrieve it based on a unique key. The format and maintenance details of a Message Catalog are typically development-platform specific, but the goal is always the same - decouple and centralize the application's natural language text.
- Internationalization: the term Internationalization (hereafter referred to as its commonly known abbreviation: I18N - "I - eighteen letters - N") applies to the steps that software designers/developers take in order to make an application language-independent. At the coding level, user readable text is never compiled into the application or intermixed with a markup language. Instead, the application code refers to such content through unique message-catalog keys.
- Localization: (sometimes abbreviated as "L10N") applies to the process of adapting an application to specific target languages. If I18N has been applied, Localization should not involve re-coding, but rather focuses on language translation and re-deployment. Stated another way, Localization is simply the process of adding support for a new language - translating Message Catalog content from one language to another.
- Locale: is the part of a user's environment that defines location, country and culture information - most noticeably, the user's language preference. The Locale is typically installed and configured as part of the underlying operating system or rendering application such as a browser.
So let's summarize. We internationalize so that we can localize. I18N is a design and coding time effort that requires developers to adhere to certain design and coding practices with one primary goal in mind - decoupling language sensitive content from the source code. For every language that needs to be supported, a Localization effort is performed - creating a new Message Catalog for that language.
The good news here is that the I18N process need not start from first principles. Most modern development languages, including Bash, offer features that facilitate the basics - leaving the developer with the task of deciding how to integrate these basics into the lifecycle process and the code base.
In and Out of Scope
The only soft prerequisites for getting the most out of this material are a general understanding of I18N (independent of programming language, as presented above) and a basic familiarity with shell scripting.
In the grand scheme, I18N/L10N goes beyond natural language independence. Although not the focus of this column, a Locale can include preferences that define date/time format, currency symbols, time zones, non-working days, … which all serve to drive aspects of processing and presentation. The process, coding and testing examples presented here only focus on language preference. It also should be noted that a rather simple example of Localization is presented - US English to Italian - languages that share the same alphabet (more or less). This precludes the need to cover details like extended character sets and the role of localized I/O devices such as keyboards. Other deeper and broader areas of I18N can be researched for further study through online and other resources. Here are some examples:
| Unicode character encoding standards | http://www.unicode.org |
| Decent intro to I18N | http://www.debian.org/doc/manuals/intro-i18n/ |
| W3C related I18N Material | http://www.w3.org/International/ |
| Advanced Bash Scripting Guide | http://www.tldp.org/LDP/abs/html/ |
The Moving Parts of I18N in Bash
Building on the fundamentals outlined above, let's move on to a real example. This section demonstrates how I18N and Localization are supported and applied in a bash environment, using a simple bash script to drive home concepts and details.
First, what sort of shell script elements are sensitive to natural language support? Well, the short answer is anything that a human user visually reviews as part of using an application. So that would include:
- Textual prompts to the user
- Error messages
- Progress or error diagnostics diverted to log files or presented on a console
- Help text, and other usage information and interactive documentation.
Just how does Bash facilitate I18N and Localization? We'll begin answering that question by presenting a shell script that cannot be considered internationalized. The short script below doesn't have much of a commercial value, but that "quality" will allow us to focus on the task at hand - identifying and applying changes to language sensitive areas. This script generates and displays a random number within a range provided by the user, and logs its activity.
File: orig-rand.sh
#!/bin/bash
function random {
    typeset low=$1 high=$2
    echo $(( ($RANDOM % ($high - $low) ) + $low ))
}
# (1)
echo "Hello, I can generate a random number between 2 numbers that you provide"
# (2)
echo -n "What is your low number? "
read low
# (3)
echo -n "What is your high number? "
read high
if [[ $low -ge $high ]]
then
    # (4)
    echo "1st number should be lower than the second - leaving early." >&2
    exit 1
fi
rand=$(random $low $high )
# (5)
echo "from/to generated (by/at): $low / $high $rand (${LOGNAME} / $(date))" >> /tmp/POC
# (6)
echo "Your Random Number Is: $rand "
exit 0
Running the script produces the expected output.
$: orig-rand.sh
Hello, I can generate a random number between 2 numbers that you provide
What is your low number? 50
What is your high number? 125
Your Random Number Is: 95
$:
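One detail of the random function is worth spelling out: because the modulo is taken over (high - low), the result runs from low up to, but not including, high. The sketch below shows an inclusive variant purely for illustration. The name random_inclusive is hypothetical and not part of the original script, and it assumes integer arguments with low no greater than high.

```shell
#!/bin/bash
# Hypothetical variant of the article's random function: returns a value in
# the closed range [low, high]. Assumes integer arguments with low <= high.
function random_inclusive {
    typeset low=$1 high=$2
    typeset span=$(( high - low + 1 ))
    # $RANDOM yields 0..32767, so spans wider than 32768 are not fully covered
    echo $(( (RANDOM % span) + low ))
}

# Example call: random_inclusive 50 125
```

Either variant serves the I18N discussion equally well; the choice only affects whether the high number itself can be drawn.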
Commented lines (1) through (6) have been flagged as requiring change, as they contain natural language. With this content identified, we can move on to creating a Message Catalog that can be used by an altered, internationalized script. To introduce the format, here's an example Message Catalog. It contains 2 messages - a greeting and an error message. The general format of the file consists of key/value line pairs: the "msgid" line names a key, and the "msgstr" line associates a natural language value with it. Each Message Catalog supports exactly one language - in this case, US-English.
File: en.po
msgid "Main Greeting"
msgstr "Welcome, what do you want to do today?"

msgid "Missing File Error"
msgstr "File Not Found"
Message Catalogs like this can be constructed manually, post processed and installed in the environment to support one or more applications. (These Message Catalogs reside in files that are otherwise referred to as Portable Object files, and by convention, are named with a .po suffix).
Now let's construct a Message Catalog to maintain the user viewable content found in the example script above. Notice there are 6 distinct messages that line up with the content that was embedded in the original script.
File: en.po
msgid "Greeting"
msgstr "Hello, I can generate a random number between 2 numbers that you provide"

msgid "Low Number Prompt"
msgstr "What is your low number"

msgid "High Number Prompt"
msgstr "What is your high number"

msgid "Input Error"
msgstr "1st number should be lower than the second - leaving early."

msgid "Result Title"
msgstr "Your Random Number Is: "

msgid "Activity Log"
msgstr "from/to generated (by/at): "
Okay, at least as far as the Message Catalog is concerned, we now have US English content covered. Now let's assemble one for another language - Italian.
File: it.po
msgid "Greeting"
msgstr "Ciao, posso generare un numero casuale fra il numero 2 che assicurate"

msgid "Low Number Prompt"
msgstr "Che cosa il vostro numero basso"

msgid "High Number Prompt"
msgstr "Che cosa il vostro alto numero"

msgid "Input Error"
msgstr "il primo numero dovrebbe essere più basso del secondo - andando presto."

msgid "Result Title"
msgstr "Il vostro numero casuale :"

msgid "Activity Log"
msgstr "da/al generato a (da/a):"
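Once a catalog contains characters outside plain ASCII (accented Italian vowels, for example), the gettext tools generally expect the .po file to begin with a header entry that declares its character set. The sketch below writes such a file from a here-document. The file name it-utf8.po and the choice of UTF-8 are assumptions made for illustration, and only one message is shown.

```shell
#!/bin/bash
# Sketch: write an Italian catalog that begins with a header entry declaring
# its character set. The UTF-8 charset and the file name it-utf8.po are
# assumptions for illustration; only one message is included.
cat > it-utf8.po <<'EOF'
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"

msgid "Input Error"
msgstr "il primo numero dovrebbe essere più basso del secondo - andando presto."
EOF
```

The header is simply an entry whose msgid is the empty string; the remaining entries follow the same msgid/msgstr pattern as before.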
Notice that the "msgid" values are constant and have not changed. They will be used by a modified script - an internationalized script. Now that the language catalogs exist, what needs to be done to make them accessible by Internationalized scripts? Linux provides a utility called "msgfmt" that creates 'message object files' (*.mo) from portable object files (*.po), without changing the portable object files. Refer to the installed or online manual page for complete command line usage details. Executing the following commands will generate and install the message object files for both US-English and Italian.
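# create the target directories first (assumed not to exist yet)
mkdir -p $HOME/locale/it/LC_MESSAGES $HOME/locale/en/LC_MESSAGES
msgfmt -o rand.sh.mo it.po
cp -p rand.sh.mo $HOME/locale/it/LC_MESSAGES/
msgfmt -o rand.sh.mo en.po
cp -p rand.sh.mo $HOME/locale/en/LC_MESSAGES/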
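When more languages are added, the compile-and-install steps can be wrapped in a small function. This is only a sketch: the function name install_catalog, the LOCALE_BASE variable, and the assumption that each source file is named after its language code (it.po, en.po) are illustrative choices, not something the tools require.

```shell
#!/bin/bash
# Sketch: compile and install one catalog per language. The function name,
# the LOCALE_BASE variable, and the <lang>.po naming are assumptions.
LOCALE_BASE="${LOCALE_BASE:-$HOME/locale}"

function install_catalog {
    typeset lang="$1" domain="$2"
    typeset dest="$LOCALE_BASE/$lang/LC_MESSAGES"
    [[ -f "$lang.po" ]] || { echo "missing $lang.po" >&2; return 1; }
    mkdir -p "$dest" || return 1
    # msgfmt compiles the .po source into the binary .mo form
    msgfmt -o "$dest/$domain.mo" "$lang.po"
}

# Example: for lang in en it; do install_catalog "$lang" rand.sh; done
```

Writing the .mo file straight into its destination avoids the intermediate copy, and the source .po files are left untouched either way.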
Now that the Message Catalogs for two languages are installed, how can a bash script leverage them? The other Linux utility critical to our example is called "gettext".
Given a directory and file naming organization for the Message Catalogs, gettext provides access to the messages stored in the catalog. First, depicting how Message Catalogs must be stored on the file system, see the listing below. For each 2 letter language code ('en' and 'it' in our example), some number of "text domain" message object files are stored under a subdirectory called LC_MESSAGES. By convention, a text domain is related to a single application, but this is an organizational decision to be made when localizing.
Directory/file listing:
en
en/LC_MESSAGES
en/LC_MESSAGES/rand.sh.mo
it
it/LC_MESSAGES
it/LC_MESSAGES/rand.sh.mo
As shown above, we chose to install the Message Catalogs under the user's HOME directory, in a subdirectory called locale. System-wide locale data distributed with Linux is normally found under /usr/lib/locale (message catalogs for installed applications typically live under /usr/share/locale). Here's what some of the /usr/lib/locale listing looks like on my distribution:
aa_DJ
aa_DJ/LC_MESSAGES
aa_DJ.utf8
aa_DJ.utf8/LC_MESSAGES
aa_ER
aa_ER/LC_MESSAGES
aa_ER@saaho
... many others not shown
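Retrieving a message stored in a Message Catalog is very straightforward - the following lines demonstrate basic access. See the installed or online manual page for complete command line usage. Two environment variables are required: TEXTDOMAINDIR must point to the base of the Message Catalog directory, and TEXTDOMAIN must name the text domain (rand.sh here, the base name of the installed .mo file). If TEXTDOMAIN is not set, gettext typically finds no catalog and simply echoes the key back.
$: export TEXTDOMAINDIR=/home/lji/locale
$: export TEXTDOMAIN=rand.sh
$: gettext -s "Greeting"
Hello, I can generate a random number between 2 numbers that you provide
$: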
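Because gettext prints the key itself when it cannot find a translation, a missing catalog or text domain can go unnoticed. A rough way to surface the problem is to compare the result with the key, as sketched below. The helper name i18n_lookup_checked is hypothetical, and the check assumes that no translation is identical to its key, which holds for the symbolic keys used in this column.

```shell
#!/bin/bash
# Sketch: gettext prints the key itself when no translation is found, so a
# result equal to the key probably signals a missing catalog or domain.
# i18n_lookup_checked is a hypothetical helper name.
function i18n_lookup_checked {
    typeset key="$1" text
    text=$(gettext -s "$key")
    if [[ "$text" == "$key" ]]
    then
        echo "warning: no translation for '$key' (TEXTDOMAIN=${TEXTDOMAIN:-unset})" >&2
    fi
    echo "$text"
}

# Example call: i18n_lookup_checked "Greeting"
```

The warning goes to standard error, so the function can still be used inside command substitution without polluting the retrieved text.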
Notice that the invocation above compelled the 'gettext' utility to present the US-English copy of the message. This was driven by the language preference value assigned to the user's Locale. Without elaborating on the details, the 'locale' Linux utility displays the following values. Of course, the first value drives language preference.
$: locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
... other values not shown.
$:
So if you're following along, the next natural question to ask is how to alter language preference. How can we test access to our Italian Message Catalog? Once again, without elaborating on the details, setting the environment variable LC_ALL to a value that includes language and country codes overrides every Locale attribute. Notice the updated output from the 'locale' utility after Italian/Italy (it/IT) has been assigned as the language/country.
$: export LC_ALL="it_IT.UTF-8"
$: locale
LANG=it_IT.UTF-8
LC_CTYPE="it_IT.UTF-8"
LC_NUMERIC="it_IT.UTF-8"
LC_TIME="it_IT.UTF-8"
... other values not shown.
$:
Now if the same 'gettext' command is executed, we would expect to display the equivalent Italian content, and we do as shown below.
$: gettext -s "Greeting"
Ciao, posso generare un numero casuale fra il numero 2 che assicurate
$:
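Setting LC_ALL only helps if the named locale is actually installed on the system; when it is not, gettext will likely keep returning untranslated text. The sketch below checks the output of 'locale -a' before relying on a locale. The helper name locale_available is hypothetical, and the normalization (lowercasing and dropping hyphens) is an assumption meant to bridge spellings such as it_IT.UTF-8 and it_IT.utf8.

```shell
#!/bin/bash
# Sketch: check that a locale is installed before relying on it.
# locale_available is a hypothetical helper; names are compared after
# lowercasing and dropping hyphens, since "locale -a" often prints
# it_IT.utf8 where users type it_IT.UTF-8.
function locale_available {
    typeset want
    want=$(echo "$1" | tr -d '-' | tr '[:upper:]' '[:lower:]')
    locale -a 2>/dev/null | tr -d '-' | tr '[:upper:]' '[:lower:]' | grep -qxF -- "$want"
}

# Example: locale_available it_IT.UTF-8 || echo "it_IT.UTF-8 not installed" >&2
```

A locale can also be applied to a single command without exporting it, for example by prefixing the command with LC_ALL=it_IT.UTF-8, which leaves the rest of the session unchanged.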
So if the 'msgfmt' and 'gettext' utilities are the core of basic I18N and Localization in the bash shell, what's the best way of internationalizing the original example script and other scripts like it? The first step I took was to build a thin convenience library, which offers 4 useful functions. I chose this general approach for two reasons: it insulates the lowest level details from the application code, and promotes code reuse by offering developers a straightforward way of dealing with these common natural-language sensitive operations:
- displaying text to standard output
- displaying an error message
- prompting a user for a response
- logging a message to a file
The library code below sets the TEXTDOMAINDIR environment variable and implements 4 functions.
Source code for i18n-lib.sh
#!/bin/bash
##
# Thin library around basic I18N facilitated function
# basic text display, file logging, error display, and prompting
export TEXTDOMAINDIR=/home/lji/locale
###############################################
##
## Display some text to stderr
## $1 is assumed to be the Message Catalog key
function i18n_error {
    echo "$(gettext -s "$1")" >&2
}
###############################################
##
## Display some text to stdout
## $1 is assumed to be the Message Catalog key
## rest of args are used as misc information
function i18n_display {
    typeset key="$1"
    shift
    echo "$(gettext -s "$key") $*"
}
###############################################
## Append a log message to a file.
## use $1 as target file to append to
## use $2 as catalog key
## rest of args are used as misc information
function i18n_fileout {
    [[ $# -lt 2 ]] && return 1
    typeset file="$1"
    typeset key="$2"
    shift 2
    echo "$(gettext -s "$key") $*" >> "${file}"
}
###############################################
## Prompt the user with a message and echo back the response.
## $1 is assumed to be the Message Catalog key
function i18n_prompt {
    typeset rv
    [[ $# -lt 1 ]] && return 1
    # read -p writes the prompt to stderr, so it stays visible
    # when the caller captures the response with $( )
    read -r -p "$(gettext "$1"): " rv
    echo "$rv"
}
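The display and logging functions above append their extra arguments after the translated text, which fixes the word order for every language. One possible extension, sketched here under stated assumptions, treats the catalog entry as a printf format so that each translation can place the values where its grammar needs them. The function i18n_printf and the "Range Report" key are hypothetical and are not part of the library above.

```shell
#!/bin/bash
# Sketch: treat the catalog entry as a printf format so each translation
# can place values where its grammar needs them. i18n_printf and the
# "Range Report" key are hypothetical additions, not part of the library.
function i18n_printf {
    typeset key="$1" fmt
    shift
    fmt=$(gettext "$key")
    # The format string comes from the catalog, so catalogs must be trusted content
    printf "$fmt\n" "$@"
}

# With a catalog entry such as
#   msgid "Range Report"
#   msgstr "Random number between %s and %s: %s"
# a call would look like: i18n_printf "Range Report" "$low" "$high" "$rand"
```

The trade-off is that translators must keep the same number of %s placeholders in every language, so this style calls for a little more care during Localization.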
So how can we transform the original sample script to leverage this library - that is, internationalize it? See the re-implemented script below. There are 4 noticeable changes:
- The TEXTDOMAIN environment variable is set to the base application value
- Our I18N library file is sourced in.
- The user is given the opportunity to select Italian as the preferred language.
- All "echo" statements that directed natural-language content were replaced by calls to functions offered by the I18N library.
File: i18n-rand.sh
#!/bin/bash
##
# POC around i18n/Localization in a bash script
# (1)
export TEXTDOMAIN=rand.sh
I18NLIB=i18n-lib.sh
# (2)
# source in I18N library - shown above
if [[ -f $I18NLIB ]]
then
    . "$I18NLIB"
else
    echo "ERROR - $I18NLIB NOT FOUND" >&2
    exit 1
fi
## Start of example script
function random {
    typeset low=$1 high=$2
    echo $(( ($RANDOM % ($high - $low) ) + $low ))
}
# (3)
## ALLOW USER TO SET LANG PREFERENCE
## assume lang and country code follows
if [[ "$1" = "-lang" ]]
then
    export LC_ALL="${2}_${3}.UTF-8"
fi
# (4)
# Display initial greeting
i18n_display "Greeting"
# ask for input
low=$(i18n_prompt "Low Number Prompt" )
high=$(i18n_prompt "High Number Prompt" )
# check for error condition and display error if found
if [[ $low -ge $high ]]
then
    i18n_error "Input Error"
    exit 1
fi
rand=$(random $low $high )
# Log what was just done
i18n_fileout "/tmp/POC" "Activity Log" "$low / $high $rand (${LOGNAME} / $(date))"
# Display Results
i18n_display "Result Title" $rand
exit 0
Now we can prove that it all works. Two test runs appear below - one using the English content and the other the Italian content.
$: i18n-rand.sh
Hello, I can generate a random number between 2 numbers that you provide
What is your low number? 100
What is your high number? 1000
Your Random Number Is: 615
## now specify Italian as language preference
$: i18n-rand.sh -lang it IT
Ciao, posso generare un numero casuale fra il numero 2 che assicurate
Che cosa il vostro numero basso? 500
Che cosa il vostro alto numero? 1000
Il vostro numero casuale : 601
$:
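The script trusts whatever follows -lang on the command line. A slightly more defensive version of that step might validate the arguments before building LC_ALL, as in the sketch below. The helper name set_lang_from_args is hypothetical, and the two-letter patterns are an assumption that fits codes such as it/IT but not every valid locale name.

```shell
#!/bin/bash
# Sketch: validate the "-lang <language> <COUNTRY>" arguments before
# building LC_ALL. set_lang_from_args is a hypothetical helper; the
# two-letter patterns are an assumption that matches codes like it/IT.
function set_lang_from_args {
    [[ "$1" == "-lang" ]] || return 0    # option absent: keep the current Locale
    if [[ "$2" =~ ^[[:lower:]]{2}$ && "$3" =~ ^[[:upper:]]{2}$ ]]
    then
        export LC_ALL="${2}_${3}.UTF-8"
    else
        echo "usage: -lang <language> <COUNTRY>, e.g. -lang it IT" >&2
        return 1
    fi
}

# Example, near the top of a script: set_lang_from_args "$@" || exit 1
```

Rejecting malformed input early avoids exporting a nonsensical LC_ALL value that would then apply to every command the script runs.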
The content of the log file is as expected. Notice that this script was not the only processing affected by changing the Locale. The output of the 'date' command shows the Italian abbreviations of Sunday (dom) and June (giu). Yes, Linux and all of its utilities are to be considered internationalized.
from/to generated (by/at): 50 / 125 95 (lji / Sun Jun 10 12:57:38 EDT 2010)
from/to generated (by/at): 100 / 1000 615 (lji / Sun Jun 10 12:57:59 EDT 2010)
da/al generato a (da/a): 500 / 1000 601 (lji / dom giu 10 12:58:48 EDT 2010)
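Whether log files should follow the user's language is a design decision this column leaves open. Where logs are meant to be parsed by tools or read by a support team, a timestamp that does not vary with the Locale may be preferable. The sketch below shows one way to get that; the helper name log_timestamp and the ISO 8601 UTC layout are assumptions chosen for illustration.

```shell
#!/bin/bash
# Sketch: produce a locale-independent timestamp for log lines.
# log_timestamp is a hypothetical helper; the ISO 8601 UTC layout
# is an assumption chosen for illustration.
function log_timestamp {
    # LC_ALL=C applies to this one command only
    LC_ALL=C date -u +%Y-%m-%dT%H:%M:%SZ
}

# Example: echo "generated $rand at $(log_timestamp)" >> /tmp/POC
```

Because the format is purely numeric, the output has the same shape whether the caller runs under an English or an Italian Locale.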
Summary/Conclusions
Just as information exchange standards such as XML allow systems to be more interoperable, at its core, I18N allows applications to be more usable - by a broader, more global user base. I'm not suggesting that every trivial shell script necessarily warrants I18N, but because all commercial software is potentially a global commodity, language independence is something that needs to be considered - and considered early in the design/development process. The lack of such planning would be quite shortsighted in 2010. As with all core application services, I18N is much less expensive (overwhelmingly so) to address at the outset of a project rather than to shoehorn in a solution deep into a product lifecycle.
Every modern development language supports I18N / Localization in a unique way. But whether your application is a major web site or a 2-page shell script, the same general concepts always apply. Optimally, architects and designers set the tone by providing a convenient way for developers to leverage the existing I18N and Localization tools/APIs. Lead developers can and should implement a thin convenience wrapper around the low level details of obtaining content from a Message Catalog. Offering functionality at this level goes a long way to encourage developers to apply a common solution across all applications and prevent code bloat.
It may be sparsely documented, but there is real support in Linux and its bash shell for creating and using Message Catalogs. As a relatively small part of large-scale applications, shell scripts that present a textual interface, or control progress and error logging, are often forgotten in a sea of browser accessible content. It's just easy to forget the shell scripts. My hope is that the minor investment in time and effort put into assembling this material can be leveraged on development efforts that include shell scripts.
Miscellaneous Notes
- The code samples used here were built and tested on SUSE Linux 10.
- Google Translate (http://www.google.com/translate_t) was used to translate the base English Message Catalog into Italian, so the results may not be the most appropriate, in-context translations. More often than not, language translation for Localization is performed by a human translator who is familiar with the application and its customer base.
Louis Iacona has been designing and developing software since 1982 on UNIX/Linux and other platforms. Most recently, his efforts have focused on Java/J2EE constructed solutions for enterprise-scoped applications. Louis is currently on assignment at Je
Comments
human translator for Italian
The above Google translation is ugly indeed. Here are my 2 cents:
Trouble with gettext on Ubuntu 10.04
I'm having trouble getting this to work on my system. I followed all the directions, but when I get to the part where I'm supposed to call "gettext," after setting the environment variable TEXTDOMAINDIR, I just get:
toby@toby-laptop:~/Desktop/i18n$ gettext -s "Greeting"
Greeting
No message... Just an echo of the msgid... I'm running Ubuntu 10.04
It looks like the author
It looks like the author forgot a step.
export TEXTDOMAIN=rand.sh
should be performed when the step shown as export TEXTDOMAINDIR=/home/lji/locale is done (substituting the appropriate directory name). This issue is discussed here:
http://stackoverflow.com/questions/3848142/internationalizing-bash-scripts
re: TEXTDOMAIN
Hello - well, I'm truly sorry if that's the case ...
The code here was tested on the SuSe distro.
The TEXTDOMAIN setting either came along for the ride from the system profile (and went unnoticed by me), or was not required in SuSe.
I'll wager on the prior scenario.
Thanks for reading and commenting!
--Lou Iacona
Louis.Iacona@Verizon.net
Thanks for this info
Nice article, but it only scratches the surface. utf-8 is a nightmare.
I have serious problems getting utf-8 characters from bash and perl scripts on a RedHat 5 server into Oracle, DB2 and MySQL. Even though you use the same character encoding in the databases, they are converted differently.
The ugly solution is to use blobs and store binary data. We usually just turn off
utf-8 via /etc/sysconfig/i18n so it does not confuse the 8-bit ASCII characters.
Thanks for this info
Using utf-8, bash, Qt and PostgreSQL I have not any problems of character encoding or similar.
yes its true let me know some
yes its true
let me know some info about linux :D