Using Unicode to Power the World's Largest Democracy
With the US elections coming up this November, there has been a lot of
discussion about the use of open source in the technology that
facilitates the electoral process. The focus of these discussions has been
electronic voting machines. But halfway across
the planet, in the State of Maharashtra, India, the focus is on
another aspect of the election process--the voter list. In
a multilingual country such as India, access to the voter list in local
languages is as pivotal as the democratic activity of voting itself.
The Objective
"Through the voter list, we wish to strengthen the hands of the
Chief Election Commissioner (CEC), the statutory body orchestrating
the great dance of democracy. We also believe that the citizen must
be given due rights and respect as compared to what the official
machinery can presently afford", explains Professor Jitendra Shah, whose
IndicTrans team came up with
the idea. Prof. Shah is a well-known name in the localization development
community in India for his gnubhaaratii "Indianized" live-CD distro. He
also has converted legacy documents to Unicode for the director of IT in the
Maharashtra government.
Getting back to the issue at hand, Prof. Shah explains that the voter list
data already is computerized and available in local languages. But there is
no provision in the system for a public interface in Indian languages.
He believes that Linux and free software, localized in all Indian
languages, and the Unicode standard alone can provide an affordable universal
interface. "It will provide access to people who wish to work with
proprietary software as well as those who wish to use free software",
he says.
The voter list will be only a starting point in the move to generalize
Indian language-enabled e-governance initiatives.
Understanding the Issues in Detail
The first issue concerns the voter lists themselves. In their current form,
the lists are composed in Indian languages--for example, in Marathi for
Maharashtra--and stored in databases. The data traditionally has been
stored in ISCII (Indian Script Code for Information Interchange), a national,
font-independent standard. But when the data has to be displayed, it inevitably has to be
rendered in a vendor-specific font encoding, such as ISFOC. In addition, so
as not to expose this data to the public, the CEC does not publish the electronic
data files. (As a side note, whole lists
can be downloaded easily from the state sites for Delhi, Kerala, Andhra
Pradesh and others.) In Maharashtra, the Electoral Office instead converts the
rolls to PDF files that then are displayed on its Web site. Looking up
one's name requires visually scanning page after page.
This method doesn't make extracting information easy for unskilled
computer users. Although it is possible to use certain tools to extract the
information from the PDF files, this works for only some of the files,
not all of them.
Now, suppose your name was misspelled. To get the information updated,
you would have to write a letter to the CEC by hand. And guess what? In
their current state, the files cannot be amended without some
proprietary software.
The Ideal World Case
What Prof. Shah is proposing is "citizen-friendly access". According
to him, the same data should be available in a public, font-independent standard
that is multilingual and accepted in all major operating systems--Unicode.
Shah also believes the CEC should relax its policy of restricting access to this
data. If the CEC decides to share the data--with appropriate security
checks, of course--it would be imperative to shift to standards that can
be accessed without proprietary software.
Prof. Shah asserts that public information, such as voter lists, must be
available in formats that follow open public standards and must be available
for amending and interaction without any expenditure on or binding to a
closed software system of a specific vendor. It is, of course, subject to
the law of the land as to how much access is given to the lay citizen.
Access to free software ensures that an official at any administrative
level could make the amendments and submit the same to a higher authority
for approval.
Proof of Concept
Now that we understand the difficulties in the existing system and
how we want the situation to be handled, let's look at the technology that
bridges the gap.
As already stated, the government has to adopt an accepted standard
for maintaining its records. Unicode is the best option available, and
according to Prof. Shah, both the state and the central government
have accepted it as their future direction. All e-Governance projects
most likely will be funded for conversion from older standards
or non-standards to Unicode. For people not yet convinced
about the usability of Unicode, Prof. Shah has tabulated various
configurations to demonstrate its compatibility. The table is
available here.
As a proof of concept, Prof. Shah and his team have converted
the content for the voter list from the non-standard font-encoded
format to the standard Unicode format. You can view screenshots,
see sample converted files and search on sample data
here.
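To illustrate why Unicode text makes lookup easier than scanning PDF pages, here is a minimal Python sketch of a name search. It is not the IndicTrans team's search engine; the function names and the sample records are hypothetical, and the only technique shown is normalizing both the stored names and the query to Unicode NFC before doing a plain substring match.

```python
import unicodedata
from typing import Iterable, List


def normalize(text: str) -> str:
    """Return a canonical form so equivalent spellings compare equal."""
    # NFC gives one representation to sequences that are canonically
    # equivalent, for example a letter followed by a combining nukta.
    return unicodedata.normalize("NFC", text).casefold()


def search_names(records: Iterable[str], query: str) -> List[str]:
    """Return every record that contains the query as a substring."""
    needle = normalize(query)
    return [name for name in records if needle in normalize(name)]


# Hypothetical sample records for illustration only, not real voter data.
SAMPLE = ["राम पाटील", "सीता जोशी"]

if __name__ == "__main__":
    for match in search_names(SAMPLE, "पाटील"):
        print(match)
```

Because the names are ordinary Unicode strings, the same search works in any Unicode-aware tool or operating system, with no dependence on a particular vendor's font.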
As a demonstration of the power and use of Unicode, the team also has
converted the Marathi files into Gujarati. This shows that interlingual
translatability is not exclusive to ISCII; Unicode
can achieve the same.
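One way such a Marathi-to-Gujarati conversion can be approached is sketched below in Python. This is an assumption about the technique, not a description of the team's actual software. It relies on the fact that the Unicode blocks for the Indic scripts follow the ISCII layout, so a Devanagari character and its Gujarati counterpart usually sit at the same offset within their blocks. The function name is hypothetical, and the sketch deliberately leaves a character unchanged when no identically named Gujarati character exists.

```python
import unicodedata

DEVANAGARI_START = 0x0900
DEVANAGARI_END = 0x097F
GUJARATI_START = 0x0A80
# Distance between the two blocks (0x180).
OFFSET = GUJARATI_START - DEVANAGARI_START


def devanagari_to_gujarati(text: str) -> str:
    """Transliterate Devanagari characters to Gujarati by block offset."""
    out = []
    for ch in text:
        cp = ord(ch)
        if DEVANAGARI_START <= cp <= DEVANAGARI_END:
            candidate = chr(cp + OFFSET)
            src_name = unicodedata.name(ch, "")
            expected = src_name.replace("DEVANAGARI", "GUJARATI")
            # Shift only when the target is assigned and carries the
            # same character name; otherwise keep the original.
            if src_name and unicodedata.name(candidate, "") == expected:
                ch = candidate
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(devanagari_to_gujarati("मराठी"))
```

The name check is conservative: Devanagari letters with no Gujarati equivalent, and punctuation shared by both scripts such as the danda, pass through untouched. A production converter would also need language-specific rules that a pure code-point shift cannot express.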
Prof. Shah's team now is adept at converting various formats of data to
and from Unicode. They parse the files from their software-specific
structure, say .rtf, .dbf or .html, to a generic structure. This
may be done implicitly or explicitly. Then, the information is converted either as
files or on the fly. The same can be restored to its original structure
whenever needed.
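The parse, convert and restore steps described above can be sketched for the .html case using Python's standard html.parser module. This is an illustrative sketch under stated assumptions, not the team's code: the class, the functions and the two-entry mapping table are all hypothetical, and real legacy font encodings generally need glyph reordering and many-to-one rules rather than a one-to-one table.

```python
from html import escape
from html.parser import HTMLParser
from typing import Callable

# Hypothetical toy table standing in for a legacy font-encoding map.
TOY_TABLE = str.maketrans({"k": "\u0915", "m": "\u092e"})


def toy_convert(text: str) -> str:
    """Placeholder converter: maps two legacy codes to Devanagari."""
    return text.translate(TOY_TABLE)


class TextConverter(HTMLParser):
    """Rebuilds an HTML document, converting only its text nodes."""

    def __init__(self, convert: Callable[[str], str]) -> None:
        super().__init__(convert_charrefs=True)
        self.convert = convert
        self.parts = []  # output fragments, joined at the end

    def handle_starttag(self, tag, attrs):
        # Reuse the tag exactly as written so the structure is kept.
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")

    def handle_data(self, data):
        # Text arrives unescaped; convert it, then escape it again.
        self.parts.append(escape(self.convert(data), quote=False))


def convert_html(source: str, convert: Callable[[str], str]) -> str:
    """Parse the markup, convert the text, restore the structure."""
    parser = TextConverter(convert)
    parser.feed(source)
    parser.close()
    return "".join(parser.parts)


if __name__ == "__main__":
    print(convert_html("<td class='name'>km</td>", toy_convert))
```

Keeping the converter as a plain function that is passed in means the same text conversion could be reused behind a different parser for another file format. The sketch does not treat script or style content specially, so it suits only simple data pages.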
The Response
The Election Office has yet to decide on the adoption of the solution. The
software solution has been proposed first for standalone
machines serving telephone help lines. It may be extended for use on the
Internet.
Prof. Shah feels that the bureaucracy is hesitant about the technology
because many commercial software vendors have promised a lot but
so far have not delivered. Often, support is a problem. The need for support is
felt even more acutely in the open-source domain. Plus, there is the added
responsibility that comes with the freedom associated with
open-source software, which the bureaucracy is not equipped to handle.
Prof. Shah, acting as a teacher, finds the bureaucrats to be quite
amenable to open source. This is partly because his opinions come with no price tag
and no hidden agenda other than making the democratic process more
democratic.
Mayank Sharma is a 21-year-old technology writer and developer from
India. He does his bit to highlight and strengthen localization efforts in
India and is working on connecting FLOSS with students and the education
system.
Comments
Voter list a breaking news!
Some updates:
We have shifted the old demo to a new more reliable server. It is now available here: http://demo.indictranstech.com/voterlist/
Some recent news items on this specific issue ask the Election Commission why, in spite of this good solution existing, it is not hosting it on its site as a service and is still providing an inferior solution based on non-standard fonts.
- http://www.loksatta.com/daily/20090321/ipl09.htm (Loksatta Marathi daily - 21st March)
- Asian Age (Mumbai) - 24th March 2009 page no.4 (Online copy not available)
Let’s hope Election Commission wakes up now!
Link to the demo
See http://demo.binyasit.com/indictrans/voterlist/ for a live demo of the Unicode based voter list search engine (mentioned in the article) with data from Dharavi Constituency (Mumbai)
Why aren't some of the linked pages in UTF8/16?
There's probably a good reason but some of the links like this one say they are using iso8859-1 but leave my browser showing what I assume are nonsense characters. Is this because of some proprietary encoding or am I ill equipped font-wise?
Re: Why aren't some of the linked pages in UTF8/16?
Dear Sir
Elx happens to be the first and the only Linux distribution company of
India. The desktop has been rated as the closest to Windows XP by the
technical gurus in the USA. We are a three-year-old organization. We have
commercially launched the products in the last week of May in India.
Positive reviews have started coming up in the media. In the past we
received rave reviews in the western media too, something which is
unheard of for Indian products. As per DesktopLinux.com, "Elx matches
Windows XP function per function...scores higher in terms of
functionality...scores higher in terms of manageability...". PC Magazine
Brazil has rated Elx as "...Superior to Red Hat."
The desktop, namely Biz Desk 4.0, costs Rs 750 and comprises the
OS and everyday usage application software.
We would be keen to give solution to your organization for this
Problem. Kindly let us know when and to whom should we give the
demonstration of Elx Biz Desktop 4.0.
With warm regards
Devkant Aggarwal
(Sr. Marketing Executive)
Everyones's Linux Pvt. Ltd.
1506-1507,Devika Towers
Nehru Place
New Delhi-110019
Mobile:011-31254780
Ph:011-51013188 / 87
Re: Why aren't some of the linked pages in UTF8/16?
It's not you. I have the right fonts and the page still looked like gibberish until I changed the encoding to utf-8 in Konqueror.
-- Jaldhar
Re: Why aren't some of the linked pages in UTF8/16?
Looks like the page is encoded in utf-8 but the encoding tag is wrong in the document