Systran Internet Translation Technologies was born during the Cold War, when the US government needed to translate large quantities of Russian texts quickly. At the end of the sixties, it became a private company called Systran, located in La Jolla, California.
In the nineties, Systran decided to retire its IBM mainframe running MVS and port the whole system to UNIX. By then, PCs had become powerful enough to host the translation engines. An automatic translator was used to migrate most of the assembly code to C.
The original port was made to Solaris, but they quickly switched to cheaper hardware: PCs running Slackware (they've since moved to Red Hat). Linux was chosen for several reasons:
- It runs on a wide variety of hardware.
- It provides all the tools a developer may need. Natural language processing works on large bodies of text, and the translation engine uses a large set of rules, so the migration produced large C/C++ programs that called for powerful tools such as Make and gcc/g++.
- Clients like AltaVista have a large audience and need a robust application on a stable system.
- Cost.
To these can be added the fact that drivers for new hardware appear more quickly on Linux than on other platforms, and that Linux uses marginally fewer resources. Linux provides a homogeneous configuration that is easy to replicate, and it is very scalable. It also comes with a firewall, sendmail, Apache, mod_perl and PostgreSQL, all of which are needed for Systran's on-line services (http://www.systranlinks.com/, http://www.systranet.com/). Moreover, environments like GNOME and KDE make it possible to put Linux in the hands of non-programmers as well. This is important because a large number of Systran's staff are linguists rather than programmers. Finally, POSIX compliance ensures that Systran can port easily to other flavors of UNIX.
Systran software is behind most of the automatic translation done in the world. Clients include not only US government agencies and European institutions, but also AltaVista, Microsoft, Apple, Lycos and AOL.
Machine translation sits at the confluence of linguistics and computer science. Developing a product means translating all the rules of a human language into computer language. The main problem is a linguistic one, since you need to start with an accurate description of the languages concerned: a description of the source language (the analysis phase) and one of the target language (the synthesis phase).
The code is divided into four parts: 1) the analysis of the source language; 2) the synthesis of the target language; 3) the transfer rules; and 4) the procedures common to all translation engines, i.e., memory management, command-line handling, dictionary lookup, filters, pre-processing, post-processing and so on.
The dictionaries used are very specific; they include not only the translation of each word (i.e., manger = to eat) but also syntactic and lexical information, such as “this verb is transitive; it can be used in this specific context, in which case it means this.” There are three kinds of dictionaries. The first two are internal: one holds simple word stems, the other complex or idiomatic expressions. The third kind is external; these are created on demand for a specific customer on a specific theme. Systran also has resource files that contain the inflections of verbs (or the declensions, for languages that have them), as well as specific priority rules for the external (customer) dictionary and stylistic indications. All this is coded in C, although the newer extensions are generally coded in C++.
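As a rough illustration of what such a dictionary record might carry, here is a hypothetical sketch. The field layout, file name and lookup helper are my own inventions for illustration, not Systran's actual format:

```shell
#!/bin/sh
# Hypothetical dictionary format (fields invented for illustration):
# headword|part of speech|features|translation
cat > dict.txt <<'EOF'
manger|verb|transitive|to eat
maison|noun|feminine|house
EOF

# Look up the translation field for a headword.
lookup() {
    grep "^$1|" dict.txt | cut -d'|' -f4
}

lookup manger   # prints "to eat"
```

A real entry would carry far richer syntactic context than a single features field, but the idea of pairing each headword with machine-readable grammatical information is the same.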
To produce the rules, linguists use a graphical interface built with GTK. The data is stored in an ASCII file that is run through a Perl program to generate the macroinstructions used in the code. The dictionaries are built semi-automatically, using the rules discovered during the analysis of the language. A unilingual master dictionary is created for each language; terminology is entered, and Systran's tools automatically add the relevant linguistic information on the basis of tables. For example, “automatically” would be recognized as an adverb because it ends in “ally”. Bilingual dictionaries are then built by creating a simple double-entry list, which retrieves the relevant syntactic information from the master unilingual dictionaries.
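The suffix heuristic described above can be sketched in a few lines of shell. The suffix table here is a guess at the kind of rules involved, not Systran's real tables:

```shell
#!/bin/sh
# A minimal sketch of table-driven, suffix-based part-of-speech guessing.
# The suffix rules below are illustrative only.
guess_pos() {
    case "$1" in
        *ally|*ly)         echo adverb ;;
        *tion|*ment|*ness) echo noun ;;
        *ize|*ise|*ify)    echo verb ;;
        *ous|*able|*ful)   echo adjective ;;
        *)                 echo unknown ;;
    esac
}

guess_pos automatically   # prints "adverb"
guess_pos translation     # prints "noun"
```

A heuristic like this only makes a first guess ("friendly" would be mis-tagged as an adverb), which is why the real tools treat it as a starting point for linguists to verify.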
Only at the last stage are the dictionaries compiled into a binary format, in order to increase processing speed at runtime. When you click the Translate button on AltaVista, you probably never imagine the process behind it is so complex!
Systran is preparing a free Linux release with all the features of the Systran Personal Windows edition.
—Thunus F., Director of Systran Luxembourg
Doc Searls is Senior Editor of Linux Journal
Practical Task Scheduling Deployment
July 20, 2016 12:00 pm CDT
One of the best things about the UNIX environment (aside from its stability and efficiency) is the vast array of software tools available to help you do your job. Traditionally, a UNIX tool does only one thing, but does that one thing very well. For example, grep is very easy to use and can search vast amounts of data quickly. The find tool can locate a particular file or files based on all kinds of criteria. It's easy to string these tools together to build even more powerful ones, such as a tool that finds all of the .log files in the /home directory and searches each one for a particular entry. This erector-set mentality means UNIX system administrators always seem to have the right tool for the job.
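The find-plus-grep combination just described can be written as a short pipeline; the directory and search pattern below are illustrative choices:

```shell
#!/bin/sh
# Search every .log file under a directory tree for a pattern.
# Usage: search_logs <directory> <pattern>
search_logs() {
    find "$1" -type f -name '*.log' -exec grep -H "$2" {} +
}

# e.g.: search_logs /home "connection refused"
```

Using `-exec ... {} +` batches many file names into each grep invocation, and `-H` makes grep print the file name even when it receives a single file.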
Cron traditionally has been considered another such tool for job scheduling, but is it enough? This webinar considers that very question. The first part builds on a previous Geek Guide, Beyond Cron, and briefly describes how to know when it might be time to consider upgrading your job scheduling infrastructure. The second part presents an actual planning and implementation framework.
Join Linux Journal's Mike Diehl and Pat Cameron of Help Systems.
Free to Linux Journal readers. Register Now!
- SUSE LLC's SUSE Manager
- Murat Yener and Onur Dundar's Expert Android Studio (Wrox)
- My +1 Sword of Productivity
- Managing Linux Using Puppet
- Non-Linux FOSS: Caffeine!
- Doing for User Space What We Did for Kernel Space
- SuperTuxKart 0.9.2 Released
- Google's SwiftShader Released
- Parsing an RSS News Feed with a Bash Script
- Rogue Wave Software's Zend Server
With all the industry talk about the benefits of Linux on Power and the performance advantages offered by its open architecture, you may be considering a move in that direction. If you are thinking about analytics, big data and cloud computing, you would be right to evaluate Power. The idea of using commodity x86 hardware and replacing it every three years is an outdated cost model: it considers neither the total cost of ownership nor the advantages of real processing power, high availability and multithreading like a demon.
This ebook takes a look at some of the practical applications of the Linux on Power platform and ways you might bring all the performance power of this open architecture to bear for your organization. There are no smoke and mirrors here, just hard, cold, empirical evidence provided by independent sources. I also consider some innovative ways Linux on Power will be used in the future. Get the Guide