Indian Language Solutions for GNU/Linux
South Asia, home to nearly one-sixth of humanity, is still struggling to obtain regional-language solutions that would make computing accessible to everyone. Although most of its people are poor and have little purchasing power, such solutions could open the floodgates to greater computing power and much-needed efficiency in a critical region of the globe. However, some call Indic and other South Asian scripts the final challenge for full i18n (internationalization) support.
Some Indian regional languages have more speakers than the national languages of entire countries elsewhere. Hindi, with 366 million speakers, is second only to Mandarin Chinese. Telugu has 69 million speakers; Marathi, 68 million; and Tamil, 66 million. Sixteen of the world's top 70 languages are Indian languages with more than 10 million speakers each. Some languages spoken in India are also spoken beyond its borders: Bengali has 207 million speakers in India and Bangladesh, and Urdu has 60 million in Pakistan and India.
The Simputer is a simple, relatively inexpensive Linux computer designed for people in Indian villages. Its development is governed by a hardware license, the Simputer General Public License, modeled on the GPL. Although the license provides for free publication of the specifications, it requires licensees to make a one-time royalty payment before they sell Simputers.
dhvani is a text-to-speech system for Indian languages developed by Simputer Trust developers and others. The project promises a better phonetic engine, a Java port and a language-independent framework soon (see sourceforge.net/projects/dhvani). Meanwhile, IMLI is a browser created by the Simputer Trust for the IML markup language. It is designed to make creating Indian-language content easy and is integrated with the text-to-speech engine.
In Kerala, a southern state with an impressive 90% literacy rate whose language, Malayalam, is spoken by 35 million people, senior local government official Ajay Kumar (email@example.com) is leading an initiative to make GNU/Linux Malayalam-friendly: “We propose to develop a renderer for our language. Specifically, we are looking for a renderer for Pango (the generic engine used with the GTK toolkit).”
He adds that in nine months' time, “we want to create an atmosphere where language computing in Malayalam improves.” He also says, “We are confident that once we deliver the basic framework, others will start localizing more applications in Malayalam.”
At the toolkit level, GTK and Qt are the most widely used. GTK already has a good framework through the Pango Project and offers basic support for Indian languages. Qt now also has Unicode support for all languages, but its rendering of Indic scripts is not yet ready.
International efforts also are helping India. Yudit, the free Unicode text editor, now offers support for three South Indian languages: Malayalam, Kannada and Telugu. Delhi-based GNU/Linux veteran Raj Mathur commented, “The current version of Yudit has complete support for Malayalam and other Indic languages. It can also use OpenType layout tables of Malayalam fonts. I think Yudit is the first application that can use OpenType tables for Malayalam.”
K Ratheesh was a student at the Indian Institute of Technology-Madras (in the South Indian city of Chennai) when he worked on enabling the GNU/Linux console for local languages a couple of years ago. He said:
As the [then] current PSF format didn't support variable width fonts, I have made a patch in the console driver so that it will load a user-defined multiglyph mapping table so that multiple glyphs can be displayed for a single character code. All editing operations also will be taken care of.
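To make the idea of a multiglyph mapping table concrete, here is a minimal Python sketch of the lookup such a table performs. It is not Ratheesh's patch, which lived in the kernel console driver; it assumes character codes are Unicode code points and that glyph numbers index into a hypothetical console font. `MULTIGLYPH_TABLE`, `FALLBACK_GLYPH` and all glyph values are made-up placeholders. The `screen_column` helper hints at why editing operations had to change: a character drawn with several glyph cells must be skipped or erased as a unit.

```python
# Sketch of a multiglyph mapping table for a fixed-width console.
# Assumption: character codes are Unicode code points; glyph numbers are
# indices into a hypothetical console font. All values are placeholders.

MULTIGLYPH_TABLE: dict[int, list[int]] = {
    0x0D15: [200, 201],       # MALAYALAM LETTER KA -> two glyph cells (placeholder)
    0x0D2E: [202, 203, 204],  # MALAYALAM LETTER MA -> three glyph cells (placeholder)
}

FALLBACK_GLYPH = 0  # hypothetical "unknown character" glyph


def glyphs_for_char(ch: str) -> list[int]:
    """Return the glyph cells used to draw one character."""
    code = ord(ch)
    if code in MULTIGLYPH_TABLE:
        return MULTIGLYPH_TABLE[code]
    if code < 256:
        return [code]  # assume an identity mapping for 8-bit codes
    return [FALLBACK_GLYPH]


def glyphs_for_text(text: str) -> list[int]:
    """Expand a string into the sequence of glyph cells written to the screen."""
    cells: list[int] = []
    for ch in text:
        cells.extend(glyphs_for_char(ch))
    return cells


def screen_column(text: str, index: int) -> int:
    """Screen column where the character at `index` starts, so cursor
    movement and deletion can skip every cell a wide character occupies."""
    return sum(len(glyphs_for_char(ch)) for ch in text[:index])


if __name__ == "__main__":
    line = "\u0D15\u0D2Ea"
    print(glyphs_for_text(line))   # [200, 201, 202, 203, 204, 97]
    print(screen_column(line, 2))  # 5
```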
In Indian languages, there are various consonant/vowel modifiers that result in complex character clusters. “So I have extended the patch to load user-defined, context-sensitive parse rules for glyphs and character codes as well. Again, all editing operations will behave according to the parse rule specifications”, Ratheesh commented.
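Context-sensitive parse rules can be illustrated in the same spirit. The Python sketch below is an illustration under stated assumptions, not the patch's rule format (the patch loaded user-defined rules, while this example hard-codes one). It uses Devanagari, the script used for Hindi, and treats a consonant, any consonants joined to it by a virama, an optional vowel sign and an optional modifier as one cluster. The names `clusters` and `delete_last_cluster` are hypothetical, and real syllable rules cover more cases, such as nukta, independent vowels and joiners.

```python
import re

# Hypothetical, simplified parse rule for Devanagari (hard-coded here).
CONSONANT = "\u0915-\u0939"   # KA .. HA
VIRAMA = "\u094D"             # SIGN VIRAMA, joins consonants into conjuncts
VOWEL_SIGN = "\u093E-\u094C"  # dependent vowel signs AA .. AU
MODIFIER = "\u0901-\u0903"    # CANDRABINDU, ANUSVARA, VISARGA

# A cluster: consonant, then any virama+consonant pairs, then an optional
# trailing virama or vowel sign, then an optional modifier.
# Any other character forms a one-character cluster.
CLUSTER = re.compile(
    f"[{CONSONANT}](?:{VIRAMA}[{CONSONANT}])*"
    f"(?:{VIRAMA}|[{VOWEL_SIGN}])?[{MODIFIER}]?"
    "|.",
    re.DOTALL,
)


def clusters(text: str) -> list[str]:
    """Split text into the units that editing operations treat as one."""
    return CLUSTER.findall(text)


def delete_last_cluster(text: str) -> str:
    """Backspace: remove the whole final cluster, not just its last code point."""
    return "".join(clusters(text)[:-1])


if __name__ == "__main__":
    word = "\u0928\u092E\u0938\u094D\u0924\u0947"  # the word "namaste"
    print(clusters(word))             # ['न', 'म', 'स्ते']
    print(delete_last_cluster(word))  # नम
```

Because backspace removes a whole cluster, deleting from the end of "namaste" leaves the first two syllables intact rather than a dangling vowel sign or half consonant.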
Ratheesh also said, “Even though the patch has been developed keeping Indian languages in mind, I feel it will be applicable to many other languages (such as Chinese) that require wider fonts on console or user-defined parsing at I/O level.”
The package, containing the patch, some documentation, utilities and sample files, then weighed in at around 100KB.