Using Unicode to Power the World's Largest Democracy
With the US elections coming up this November, there has been a lot of discussion about the use of open source in the technology that facilitates the electoral process. The focus of these discussions has been electronic voting machines. But halfway across the planet, in the state of Maharashtra, India, the focus is on another aspect of the election process: the voter list. In a multilingual country such as India, access to the voter list in local languages is as pivotal as the democratic act of voting itself.
"Through the voter list, we wish to strengthen the hands of the Chief Election Commissioner (CEC), the statutory body orchestrating the great dance of democracy. We also believe that the citizen must be given due rights and respect as compared to what the official machinery can presently afford," explains Professor Jitendra Shah, whose IndicTrans team came up with the idea. Prof. Shah is a well-known name in the localization development community in India for his gnubhaaratii "Indianized" live-CD distro. He has also converted legacy documents to Unicode for the director of IT in the Maharashtra government.
Getting back to the issue at hand, Prof. Shah explains that the voter list data is already computerized and available in local languages. But the system has no provision for a public interface in Indian languages. He believes that only Linux and free software, localized in all Indian languages, together with the Unicode standard, can provide an affordable universal interface. "It will provide access to people who wish to work with proprietary software as well as those who wish to use free software," he says.
The voter list will be only a starting point in a broader move toward Indian-language-enabled e-governance initiatives.
The first issue concerns the voter list as such. In their current form, the lists are composed in Indian languages (for example, in Marathi for Maharashtra) and stored in databases. The storage has traditionally used ISCII, a national, font-independent standard. But when the data has to be displayed, it inevitably is rendered in a vendor-specific font encoding, such as ISFOC. In addition, to avoid exposing this data to the public, the CEC does not publish the electronic data files. (As a side note, entire lists can easily be downloaded from the state sites for Delhi, Kerala, Andhra Pradesh and others.) Hence, in Maharashtra, the Electoral Office converts the rolls to PDF files that are then displayed on its Web site. Looking up one's name requires visually scanning page after page.
This method makes extracting information difficult for users who are not computer-savvy. Although certain tools can extract the information from the PDF files, they work only on some of the files, not all of them.
Now, suppose your name was misspelled. To get the entry corrected, you would have to write a letter to the CEC by hand. And guess what? In their current state, the files can be amended only with proprietary software.
What Prof. Shah is proposing is "citizen-friendly access". According to him, the same data should be available in a public, font-independent standard that is multilingual and supported by all major operating systems: Unicode.
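Why does a font-independent encoding matter to a citizen looking for a name? Once names are stored as Unicode text rather than as glyph codes or PDF pages, an ordinary string search is enough. The minimal Python sketch below assumes a hypothetical CSV export of a roll; the column names and records are invented for illustration. It applies NFC normalization before comparing, so canonically equivalent spellings, such as the precomposed nukta letter क़ (U+0958) and KA followed by NUKTA (U+0915 U+093C), match each other.

```python
import csv
import io
import unicodedata

# HYPOTHETICAL sample roll: column names and records are invented for
# illustration and do not come from any real electoral roll.
SAMPLE_ROLL = """serial,name,part
1,सुनील पाटील,12
2,अनिता देशमुख,12
3,सुनीता पाटील,14
"""


def normalize(text: str) -> str:
    """NFC-normalize so canonically equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", text).strip()


def find_voter(rows: list[dict[str, str]], query: str) -> list[dict[str, str]]:
    """Return every row whose name contains the query as a substring."""
    q = normalize(query)
    return [row for row in rows if q in normalize(row["name"])]


rows = list(csv.DictReader(io.StringIO(SAMPLE_ROLL)))
for row in find_voter(rows, "पाटील"):
    print(row["serial"], row["name"], row["part"])
```

A plain substring match is the simplest possible lookup; a real search service would probably also want fuzzy matching for variant spellings, which is beyond this sketch.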
Prof. Shah also believes the CEC should relax its policy of restricting access to this data. If the CEC decides to share the data (with appropriate security checks, of course), it would be imperative to shift to standards that can be accessed without proprietary software.
Prof. Shah asserts that public information, such as voter lists, must be available in formats that follow open public standards, and must be open to amendment and interaction without any expenditure on, or binding to, a specific vendor's closed software system. How much access the lay citizen is given is, of course, subject to the law of the land.
Access to free software ensures that an official at any administrative level can make amendments and submit them to a higher authority for approval.
Now that we understand the difficulties in the existing system and how we want the situation to be handled, let's look at the technology that bridges the gap.
As already stated, the government has to adopt an accepted standard for maintaining its records. Unicode is the best option available, and according to Prof. Shah, both the state and the central governments have accepted it as their future direction. Most likely, all e-governance projects will be funded for conversion from older or non-standard encodings to Unicode. For people not yet convinced of Unicode's usability, Prof. Shah has tabulated various configurations to demonstrate its compatibility. The table is available here.
As a proof of concept, Prof. Shah and his team have converted the voter list content from the non-standard, font-encoded format to the standard Unicode format. You can view screenshots, see sample converted files and search the sample data here.
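The team's converter itself isn't described here, but one reason font-encoded text is awkward to convert is well known: legacy Indian-language fonts store glyphs in visual order. The short-i vowel sign ि is drawn to the left of its consonant and is typically stored before it, whereas Unicode stores text in logical order, consonant first. The Python sketch below illustrates only that reordering step. The byte values in its glyph table are invented and do not correspond to ISFOC or any real font, and it handles a single consonant after ि; conjuncts such as स्थि, where the sign precedes the whole cluster, need more work in a real converter.

```python
# HYPOTHETICAL glyph table: these byte codes are invented for illustration and
# do not match ISFOC or any real legacy font. A real converter needs the actual
# glyph table of the specific font the data was typed in.
GLYPH_TO_UNICODE = {
    0x41: "\u0926",  # द DA
    0x42: "\u0932",  # ल LA
    0x43: "\u0940",  # ी II vowel sign (already follows its consonant visually)
    0x44: "\u092A",  # प PA
    0x45: "\u093F",  # ि I vowel sign (drawn, and stored, BEFORE its consonant)
}
I_SIGN = "\u093F"


def legacy_to_unicode(data: bytes) -> str:
    """Map glyph bytes to Unicode, then move ि after the following consonant."""
    try:
        chars = [GLYPH_TO_UNICODE[b] for b in data]
    except KeyError as exc:
        raise ValueError(f"unmapped glyph byte {exc.args[0]:#04x}") from None

    out = []
    i = 0
    while i < len(chars):
        if chars[i] == I_SIGN and i + 1 < len(chars):
            # Visual order is "ि + consonant"; Unicode logical order is
            # "consonant + ि". This simple swap ignores conjunct clusters.
            out.append(chars[i + 1])
            out.append(I_SIGN)
            i += 2
        else:
            out.append(chars[i])
            i += 1
    return "".join(out)


# Visual-order bytes for the name दिलीप: ि, द, ल, ी, प
print(legacy_to_unicode(bytes([0x45, 0x41, 0x42, 0x43, 0x44])))
```

Once the text is in logical-order Unicode, it can be searched, sorted and re-rendered in any Unicode font, which is exactly what the glyph-ordered original cannot offer.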
To demonstrate the power and usefulness of Unicode, the team has also converted the Marathi files into Gujarati. This shows that conversion between Indian scripts is not exclusive to ISCII; Unicode can achieve the same.
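How the team performed the Marathi-to-Gujarati conversion isn't stated. One property that makes it straightforward is that Unicode's Indic blocks were laid out following ISCII, so Devanagari (U+0900 to U+097F) and Gujarati (U+0A80 to U+0AFF) keep most corresponding letters at the same relative position, 0x180 apart. The Python sketch below is a naive offset-based transliteration, not the team's method. It changes the script only, not spelling conventions, and leaves unchanged the danda (।, ॥), which Gujarati shares with Devanagari, as well as any character whose shifted code point is not an assigned Gujarati character.

```python
import unicodedata

DEVANAGARI_FIRST, DEVANAGARI_LAST = 0x0900, 0x097F
GUJARATI_OFFSET = 0x0A80 - 0x0900  # 0x180: the blocks share an ISCII-derived layout
SHARED_DANDAS = {0x0964, 0x0965}  # । and ॥ are used by Gujarati too


def devanagari_to_gujarati(text: str) -> str:
    """Naive script conversion by code-point offset (script only, not spelling)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if DEVANAGARI_FIRST <= cp <= DEVANAGARI_LAST and cp not in SHARED_DANDAS:
            target = chr(cp + GUJARATI_OFFSET)
            # Use the Gujarati character only if it is actually assigned;
            # otherwise keep the Devanagari original.
            if unicodedata.name(target, None) is not None:
                out.append(target)
                continue
        out.append(ch)
    return "".join(out)


print(devanagari_to_gujarati("दिलीप पाटील, भाग १२।"))
```

Characters outside the Devanagari block, such as Latin letters, punctuation and spaces, pass through untouched, so mixed-script records survive the conversion.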
Prof. Shah's team is now adept at converting various formats of data to and from Unicode. They parse the files from their software-specific structure, say .rtf, .dbf or .html, into a generic structure; depending on the format, this step may be implicit or explicit. The information is then converted, either as files or on the fly, and can be restored to its original structure whenever needed.
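The parse, convert and restore cycle can be sketched in a few lines of Python. This is a hypothetical illustration, not the team's code: it uses a regular expression to split simple HTML into tag and text segments (the generic structure), runs a caller-supplied converter over the text segments only, and joins the segments back so the markup comes out untouched. A regex is fine for a sketch, but it won't cope with comments, scripts or attribute values containing ">"; a real tool would use a proper parser for each format.

```python
import re
from typing import Callable

# Splits markup into tags and text; the capturing group keeps the tags in
# the result. Good enough for a sketch, not for comments, scripts, or
# attribute values that contain ">".
TAG = re.compile(r"(<[^>]*>)")

Segment = tuple[str, str]  # (kind, value), where kind is "tag" or "text"


def parse(markup: str) -> list[Segment]:
    """Software-specific structure -> generic list of (kind, value) segments."""
    return [
        ("tag" if TAG.fullmatch(token) else "text", token)
        for token in TAG.split(markup)
        if token
    ]


def convert(segments: list[Segment], converter: Callable[[str], str]) -> list[Segment]:
    """Apply the converter to text segments only; tags pass through unchanged."""
    return [(kind, converter(value) if kind == "text" else value) for kind, value in segments]


def restore(segments: list[Segment]) -> str:
    """Generic structure -> original structure, by concatenating the segments."""
    return "".join(value for _, value in segments)


sample = "<p>name: <b>x</b></p>"
print(restore(convert(parse(sample), str.upper)))  # <p>NAME: <b>X</b></p>
```

Here str.upper merely stands in for a real encoding converter; any function from str to str can be plugged in, and because only text segments are touched, the original file structure is restored exactly.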