Systran Internet Translation Technologies was born during the Cold War, when the US government wanted to translate large quantities of Russian texts quickly. At the end of the sixties, it became a private company called Systran, located in La Jolla, California.
In the nineties, Systran decided to drop its mainframe platform (OS/390 running MVS) and port the whole system to UNIX. By then, PCs had become powerful enough to host the translation engines. An automatic code converter was used to migrate most of the assembly code to C.
The original port was made to Solaris, but the company quickly switched to cheaper hardware: PCs running Slackware (it has since moved to Red Hat). Linux was chosen for several reasons: it runs on a variety of hardware; it provides all the tools a developer may need; natural language processing works on large bodies of text and requires powerful tools; the translation engine uses a large set of rules, so the migration produced large C/C++ programs that needed powerful tools such as Make and gcc/g++; clients like AltaVista have a large audience and need a robust application on a stable system; and, finally, cost.
To these can be added the fact that drivers for newer hardware appear more quickly on Linux than on other platforms, and that Linux uses marginally fewer resources. It provides a homogeneous configuration that is easy to replicate, and it scales well. Linux also comes with a firewall, sendmail, Apache, mod_perl and PostgreSQL, all of which are needed for Systran's on-line services (http://www.systranlinks.com/, http://www.systranet.com/). Moreover, environments like GNOME or KDE make it possible to put Linux in the hands of non-programmers as well, which matters because many of the Systran staff are linguists rather than programmers. Finally, POSIX compliance means Systran can easily port its software to other UNIX variants.
Systran software is behind most of the automatic translation done in the world. Clients include not only US government agencies and European institutions, but also AltaVista, Microsoft, Apple, Lycos and AOL.
Machine translation lies at the confluence of linguistics and computer science. Developing a product essentially means translating all the rules of human language into computer language. The main problem is a linguistic one, since you need to start with an accurate description of the languages concerned: a description of the source language (used in the analysis phase) and one of the target language (used in the synthesis phase).
The code is divided into four parts: 1) the analysis of the source language; 2) the synthesis of the target language; 3) the transfer rules; and 4) the procedures common to all translation engines, i.e., memory management, command-line handling, dictionary lookup procedures, filters, pre-processing, post-processing, etc.
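To make the four-part division concrete, here is a toy Python sketch of a French-to-English pipeline with a shared pre-processing step followed by analysis, transfer and synthesis. It is not Systran's code: the tiny lexicons, the part-of-speech tags and the single word-order rule (French noun-adjective becomes English adjective-noun) are hypothetical, chosen only to show why each language gets its own description.

```python
# Toy three-stage translation pipeline (hypothetical data, not Systran's engine).

# Source-language description used by analysis: word -> part of speech.
FRENCH_LEXICON = {"le": "DET", "chat": "NOUN", "noir": "ADJ", "mange": "VERB"}

# Transfer rules: French word -> English word (one sense only, for brevity).
TRANSFER = {"le": "the", "chat": "cat", "noir": "black", "mange": "eats"}


def preprocess(text):
    """Common procedure shared by all engines: lowercase and split on spaces."""
    return text.lower().split()


def analyze(tokens):
    """Analysis: attach a part-of-speech tag to each source token."""
    return [(tok, FRENCH_LEXICON.get(tok, "UNK")) for tok in tokens]


def transfer(analyzed):
    """Transfer: replace each source word with its target equivalent, keeping tags."""
    return [(TRANSFER.get(tok, tok), tag) for tok, tag in analyzed]


def synthesize(transferred):
    """Synthesis: apply an English word-order rule (NOUN ADJ -> ADJ NOUN)."""
    out = list(transferred)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "NOUN" and out[i + 1][1] == "ADJ":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair we just reordered
        else:
            i += 1
    return " ".join(word for word, _ in out)


def translate(text):
    return synthesize(transfer(analyze(preprocess(text))))


print(translate("Le chat noir mange"))  # the black cat eats
```

Keeping the stages separate means a new target language needs a new synthesis description and new transfer rules, while the analysis of French can be reused.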
The dictionaries used are very specific: they include not only the translation of the words (e.g., manger = to eat) but also syntactic and lexical information, such as “this verb is transitive; it can be used in this specific context, in which case it means this.” There are three kinds of dictionaries. The first two are internal: one holds simple word stems and the other complex or idiomatic expressions. The third kind is external: these dictionaries are created on demand for a specific customer on a specific theme. Systran also has resource files that contain verb inflections, or declensions for languages that have them, as well as specific priority rules for the external (customer) dictionary and stylistic indications. All this is coded in C, although the newer extensions are generally coded in C++.
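The layering of dictionaries can be illustrated with a small lookup routine. This Python sketch assumes, hypothetically, that longer multiword matches are preferred over shorter ones, that at each match length a customer (external) entry is checked before an internal idiom, and that the stem dictionary is the fallback. The entries and the `lookup` and `gloss` functions are illustrative, not Systran's actual data or priority rules.

```python
# Hypothetical layered dictionary lookup: customer and idiom spans, then stems.

# Internal dictionary 1: simple stems, with translation plus lexical information.
STEMS = {
    "manger": {"en": "to eat", "pos": "VERB", "transitive": True},
    "pomme": {"en": "apple", "pos": "NOUN"},
    "terre": {"en": "earth", "pos": "NOUN"},
}

# Internal dictionary 2: complex or idiomatic expressions, keyed by word tuples.
IDIOMS = {
    ("pomme", "de", "terre"): {"en": "potato", "pos": "NOUN"},
}

# External dictionary: built on demand for one customer and one theme
# (here, an imagined electrical-engineering client).
CUSTOMER = {
    ("terre",): {"en": "ground", "pos": "NOUN"},
}

MAX_PHRASE = 3  # longest multiword entry we try to match


def lookup(tokens, i):
    """Return (entry, number_of_tokens_consumed) for the tokens starting at i."""
    # Try longer spans first so multiword expressions beat single words.
    for length in range(min(MAX_PHRASE, len(tokens) - i), 0, -1):
        span = tuple(tokens[i:i + length])
        if span in CUSTOMER:  # at equal length, customer entries win
            return CUSTOMER[span], length
        if span in IDIOMS:
            return IDIOMS[span], length
    # Fall back to the stem dictionary; None if the word is unknown.
    return STEMS.get(tokens[i]), 1


def gloss(sentence):
    """Look up each span of a space-separated sentence, left to right."""
    tokens = sentence.split()
    i, out = 0, []
    while i < len(tokens):
        entry, used = lookup(tokens, i)
        out.append(entry["en"] if entry else tokens[i])
        i += used
    return out


print(gloss("pomme de terre"))  # ['potato']
print(gloss("terre"))           # ['ground']
```

The ordering of the checks inside `lookup` is where priority rules of the kind the article mentions would live; changing it changes which dictionary wins.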
To produce the rules, linguists use a graphical interface written with GTK. The data is stored in an ASCII file, which a Perl program processes to generate the macroinstructions used in the code. The dictionaries are built semi-automatically using the rules discovered during the analysis of the language. A unilingual master dictionary is created for each language: terminology is entered, and Systran's tools automatically add the relevant linguistic information on the basis of tables. For example, “automatically” would be recognized as an adverb because it ends in “ally”. Bilingual dictionaries are then built as a simple double-entry list, which retrieves the relevant syntactic information from the unilingual master dictionary.
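The table-driven enrichment step and the double-entry bilingual list can be sketched as follows. Only the “ally” rule comes from the article; the other suffix rows, the tag names and the function names are assumptions added for illustration. Real tables would also need exceptions (“rally”, for instance, ends in “ally” but is not an adverb).

```python
# Hypothetical table-driven enrichment of a unilingual master dictionary,
# followed by a bilingual list that borrows linguistic data from it.

# Suffix rules: (suffix, part of speech); the first matching rule wins.
# Only "ally" comes from the article; the other rows are illustrative.
SUFFIX_RULES = [
    ("ally", "ADV"),
    ("tion", "NOUN"),
    ("ize", "VERB"),
]


def guess_pos(word):
    """Return the part of speech implied by the first matching suffix, or None."""
    for suffix, pos in SUFFIX_RULES:
        if word.endswith(suffix):
            return pos
    return None


def build_master(terms):
    """Build a unilingual master dictionary from a plain list of terms."""
    return {term: {"pos": guess_pos(term)} for term in terms}


def build_bilingual(pairs, master):
    """Turn (source, target) pairs into entries carrying the source word's data.

    The bilingual list stores only the word pairs; part of speech and other
    syntactic information are copied from the master dictionary.
    """
    return {src: {"target": tgt, **master.get(src, {})} for src, tgt in pairs}


master_en = build_master(["automatically", "translation", "normalize"])
en_fr = build_bilingual([("automatically", "automatiquement")], master_en)
print(en_fr["automatically"])  # {'target': 'automatiquement', 'pos': 'ADV'}
```

Because the bilingual list holds only pairs, correcting an entry in the master dictionary automatically benefits every language pair built on it.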
Only at the last stage are the dictionaries compiled into binary format, to increase processing speed at runtime. When you clicked the Translate button on AltaVista, you probably never thought the process behind it was so complex!
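Compiling a dictionary to binary generally trades a flexible text representation for one that can be searched without parsing. The sketch below shows one possible scheme, not Systran's: entries are sorted and packed into fixed-width records with Python's `struct` module so that a lookup can binary-search the raw bytes directly. The record widths, the layout and the `compile_dictionary` and `binary_lookup` names are assumptions.

```python
import struct

# Hypothetical on-disk layout: a 4-byte little-endian record count, then
# fixed-width (key, value) records, each field NUL-padded UTF-8.
KEY_SIZE = 32
VALUE_SIZE = 32
HEADER = struct.Struct("<I")
RECORD = struct.Struct(f"<{KEY_SIZE}s{VALUE_SIZE}s")


def compile_dictionary(entries):
    """Serialize a {source: target} mapping into sorted fixed-width records."""
    parts = [HEADER.pack(len(entries))]
    for key in sorted(entries):  # sorting enables binary search at runtime
        k = key.encode("utf-8")
        v = entries[key].encode("utf-8")
        if len(k) > KEY_SIZE or len(v) > VALUE_SIZE:
            raise ValueError(f"entry too long for fixed-width record: {key!r}")
        parts.append(RECORD.pack(k, v))  # struct pads short fields with NULs
    return b"".join(parts)


def binary_lookup(blob, key):
    """Binary-search the compiled bytes for key; return the target or None."""
    (count,) = HEADER.unpack_from(blob, 0)
    target = key.encode("utf-8")
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        k, v = RECORD.unpack_from(blob, HEADER.size + mid * RECORD.size)
        k = k.rstrip(b"\0")
        if k == target:
            return v.rstrip(b"\0").decode("utf-8")
        if k < target:  # UTF-8 byte order matches code-point sort order
            lo = mid + 1
        else:
            hi = mid
    return None


blob = compile_dictionary({"manger": "to eat", "pomme": "apple", "chat": "cat"})
print(binary_lookup(blob, "pomme"))  # apple
```

Fixed-width records waste some space, but they let the runtime jump straight to the middle record without reading anything else, which is the point of compiling at the last stage.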
Systran is preparing a free Linux release with all the features of the Systran Personal Windows edition.
—Thunus F., Director of Systran Luxembourg
Doc Searls is Senior Editor of Linux Journal