As I indicated earlier, Unicode is a complex standard, and it has taken some time for different languages and technologies to support it. For example, Perl 5.6.x used Unicode internally, but input and output operations couldn't easily use it, which made such support basically useless. Perl 5.8 by contrast has excellent Unicode support, allowing developers to write regular expressions that depend on Unicode properties.
There are still some problems, however. A major problem that developers have to deal with is the issue of input encoding vs. storage encoding, such as when your terminal might use Latin-1 but the back end might use UTF-8. This sort of arrangement means you can continue to use your old (non-Unicode) terminal program and fonts but connect to and use your Unicode-compliant back-end program.
Various implementations also have some holes, which might not be obvious when you first start to work on a project. For example, I recently worked on a J2EE project that used PostgreSQL on its back end and stored all of the characters in Unicode. Everything was fine until we decided to compare the user's input string with text in the database in a case-insensitive fashion. Unfortunately, the PostgreSQL function we used doesn't handle case insensitivity correctly for Unicode strings. We found a workaround in the end, but it was both embarrassing and frustrating to encounter this.
Collating, or sorting, is also a difficult issue—one that has bitten me on a number of occasions. Unicode defines a character set, but it does not indicate in which order the characters in that set should be sorted. Until recently, for example, “ch” was sorted as its own separate letter in Spanish-speaking countries; this was not true for speakers of English, German and French. The sort order thus depends not only on the character set, but on the locale in which the character set is being applied. You may need to experiment with the LANG and LC_ALL environment variables (among others) to get things to work the way you expect.
Unicode is clearly the way of the future; most operating systems now support it to a certain degree, and it is becoming an entrenched standard in the computer world. Unfortunately, Unicode requires unlearning the old practice of equating characters and bytes and handling a great deal of new complexities and problems.
If you only need to use a single language on your web site, then consider yourself lucky. But if you want to use even a single non-ASCII character, you will soon find yourself swimming in the world of Unicode. It's worth learning about this technology sooner rather than later, given that it is slowly but surely making its way into nearly every open-source system and standard.
Reuven M. Lerner (email@example.com) is a consultant specializing in web/database technologies. His first book, Core Perl, was published by Prentice Hall in January 2002. His next book, about open-source web/development environments, will be published by Apress in late 2003. Reuven lives with his wife and daughters in Modi'in, Israel.
Getting Started with DevOps - Including New Data on IT Performance from Puppet Labs 2015 State of DevOps Report
August 27, 2015
12:00 PM CDT
DevOps represents a profound change from the way most IT departments have traditionally worked: from siloed teams and high-anxiety releases to everyone collaborating on uneventful and more frequent releases of higher-quality code. It doesn't matter how large or small an organization is, or even whether it's historically slow moving or risk averse — there are ways to adopt DevOps sanely, and get measurable results in just weeks.
Free to Linux Journal readers.Register Now!
|Secure Server Deployments in Hostile Territory, Part II||Jul 29, 2015|
|Hacking a Safe with Bash||Jul 28, 2015|
|KDE Reveals Plasma Mobile||Jul 28, 2015|
|Huge Package Overhaul for Debian and Ubuntu||Jul 23, 2015|
|diff -u: What's New in Kernel Development||Jul 22, 2015|
|Shashlik - a Tasty New Android Simulator||Jul 21, 2015|
- Secure Server Deployments in Hostile Territory, Part II
- Hacking a Safe with Bash
- KDE Reveals Plasma Mobile
- Huge Package Overhaul for Debian and Ubuntu
- Home Automation with Raspberry Pi
- The Controversy Behind Canonical's Intellectual Property Policy
- Shashlik - a Tasty New Android Simulator
- Embed Linux in Monitoring and Control Systems
- diff -u: What's New in Kernel Development
- Purism Librem 13 Review