Let Linux Speak
“User root is now on-line”. Words to be dreaded when one is away from the terminal and not logged in from anywhere else. But how does one know what is going on with one's machine when not in front of it? If only the machine could tell you. In this article I discuss a tool which enables your machine to do just that.
It all started a year back, when, thumbing through one of those odd electronic magazines, I came across an ad for a little speech synthesizer. This device was essentially a low cost serial-based text-to-speech synthesizer using the SPO256-AL2 chip. I believe this was the same chip used in the original Mattel “Speak & Spell” toy.
After a couple of months, I thought about it again and decided I just had to have it. Certainly, the price was right (about US $50.00), and serial ports grew on my main Linux machine like branches on a tree. So I ordered one. After a few weeks, I called and was told my order had just been hand-made and would be out in a few days. It is a delight to find hand-made electronics in these modern times—almost like the days when furniture manufacturing involved real craftsmanship.
In any case, the unit arrived as promised, complete with schematics, a disk filled with DOS programs and a thin manual. The disk I have yet to look at; after all, this was for use with a Linux machine. The board slid into a PC slot easily enough. The card uses the PC slot for power only. An RS-232 connector in the back connects to a serial port. A separate stand-alone power unit and case is available for $29.00 more. But having another power pack to plug in was enough to keep me awake at night. A slot I could afford; though I now foresee the time when I will fill up all eight slots in the machine.
The board has its own built-in speaker and an RCA jack. The RCA jack I quickly adapted to feed the background music (BGM) source on my PBX at home. (Okay, so it's really a Panasonic digital hybrid key system, to be technical, although it has ambitions.) I connected the serial port and got a brief noise as DTR was raised. I shortly learned this was supposed to say “Okay”, but the impedance-matching on the RCA jack was poor.
Next, I changed the stty settings on the port to match the speed I had selected for the device via DIP switches, and, with high expectations, I tried a simple test:
echo "Hello, my name is Rochester" >/dev/ttyS2
The monotone response I received back sounded a little like “Hewlo, my name is Rokheestar” and reminded me of my last visit to Atlanta, where they use a deliberately harsh-sounding cybernetic voice on the inter-terminal shuttle trains. Hmmm, maybe it is time to look at the manual, and maybe even that disk...
Several limitations and problems became immediately obvious. The first was that the text-to-speech algorithm handled words only. Numbers were simply spoken as a series of digits; hence 91 became “nine one” instead of “ninety-one”. This can be solved with some simple look-up tables and text substitution.
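As a rough illustration of the look-up-table idea (a sketch only, not the server's actual code), the following Python turns runs of digits into words before the text reaches the device. The names number_to_words and speak_numbers are hypothetical, and the range is limited to 0 through 999,999 to keep the tables short.

```python
import re

# Look-up tables for the irregular small numbers and the tens.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer from 0 to 999,999 (hypothetical helper)."""
    if not 0 <= n < 1_000_000:
        raise ValueError("number out of range for this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = " " + number_to_words(rest) if rest else ""
        return ONES[hundreds] + " hundred" + tail
    thousands, rest = divmod(n, 1000)
    tail = " " + number_to_words(rest) if rest else ""
    return number_to_words(thousands) + " thousand" + tail


def speak_numbers(text):
    """Replace each run of digits with its spoken form."""
    def expand(match):
        value = int(match.group())
        if value >= 1_000_000:
            return match.group()  # too large for the tables: leave as digits
        return number_to_words(value)
    return re.sub(r"\d+", expand, text)


print(speak_numbers("Channel 91 is busy"))
```

Numbers too large for the tables are passed through unchanged, so the device falls back to reading them digit by digit.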
Second, while the device technically acts as a text-to-phoneme speech device, it offers no special means, such as control or escape sequences, of direct access to the phonetic elements and sounds it can produce; the text-to-speech code hides them. This second limitation can be worked around by using alternate spellings, not necessarily phonetic ones, that steer the internal algorithm toward different phonetic choices. A little experimentation was required to get a good idea of how the device actually translated text to speech.
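The alternate-spelling trick amounts to a small pronunciation dictionary applied word by word. The Python below is a minimal sketch of that idea; the RESPELL entries are invented for illustration, since the spellings that actually sound right can only be found by listening to the device.

```python
import re

# Hypothetical respellings; real entries come from trial and error
# with the synthesizer, not from any published table.
RESPELL = {
    "rochester": "rotchester",
    "linux": "linnucks",
}


def respell(text, table=RESPELL):
    """Swap whole words for alternate spellings, ignoring case."""
    def swap(match):
        word = match.group()
        return table.get(word.lower(), word)
    return re.sub(r"[A-Za-z]+", swap, text)


print(respell("Hello, my name is Rochester"))
```

Matching whole words only keeps an entry from rewriting part of a longer word that happens to contain it.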
Since extensive table substitution was now needed, I considered the next logical step: developing a driver as a front end for the device. Ideally, any driver should be able to read straight text the way a person normally would. First, numbers should be pronounced as numbers and not as digits. Similarly, many common numeric constructs used in normal text (such as currency amounts, standard formatted date and time fields, percentages and telephone numbers) have pronunciation rules I wished to encapsulate and emulate properly. The Internet has its own idioms, like x@y.z, which should be pronounced as “x at y dot z”. I decided to cover all of these, as well as in-line text substitution for correct word pronunciation.
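A few of these rules can be sketched as regular-expression passes over the text. The Python below is only an illustration under simple assumptions (US-style dollar amounts, plain e-mail addresses); the function names are hypothetical, and dates, times and telephone numbers are left out.

```python
import re


def _count(n, unit):
    """Return e.g. '1 dollar' or '5 cents' with a simple plural rule."""
    return f"{n} {unit}" + ("" if n == 1 else "s")


def speak_email(text):
    """Read an address as 'x at y dot z'."""
    def expand(match):
        local, domain = match.group(1), match.group(2)
        return (local.replace(".", " dot ") + " at "
                + domain.replace(".", " dot "))
    return re.sub(r"\b([\w.+-]+)@([\w-]+(?:\.[\w-]+)+)", expand, text)


def speak_dollars(text):
    """Read '$29.05' as '29 dollars and 5 cents'."""
    def expand(match):
        spoken = _count(int(match.group(1)), "dollar")
        cents = match.group(2)
        if cents and int(cents):
            spoken += " and " + _count(int(cents), "cent")
        return spoken
    return re.sub(r"\$(\d+)(?:\.(\d\d))?", expand, text)


def speak_percent(text):
    """Read '10%' as '10 percent'."""
    return re.sub(r"(\d+)\s*%", r"\1 percent", text)


def normalize(text):
    # Addresses go first so the dots in domain names are consumed
    # before any numeric rule can see them.
    return speak_percent(speak_dollars(speak_email(text)))


print(normalize("Send $50.00 to x@y.z, 10% off"))
```

The digits that remain after these passes can then be handed to whatever stage spells numbers out as words.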
In the end, I decided on a server sitting on a TCP socket. The server would accept a connection from the user application on a known port and pronounce any text received according to a reasonable set of rules (as stated above). I added an escape mode to allow for spelling words out and single-digit announcement modes. I could establish a simple telnet session with the server, then test the device by typing text.
The TCP server offered another advantage. Only one application can be serviced by the device at a time; otherwise, speech from multiple sources would be garbled together. The use of a TCP session ensures that only one connection is accepted by the server and kept active until closed by the client. Other client applications block in the listen backlog while waiting for the current application to finish talking. The simplicity of backlogging, compared with the use of lock files, was the reason I chose a full server instead of a task initiated by inetd.
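The one-client-at-a-time behavior falls out of an ordinary accept loop that never forks or threads. The Python below is a minimal sketch of that design, not the published server: the port number 4500 and the function names are assumptions, and speak stands in for whatever routine writes a line to the serial device.

```python
import socket


def make_listener(host="127.0.0.1", port=4500, backlog=5):
    """Open the listening socket; port 4500 is a made-up example."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    # Clients that connect while another is talking wait in this queue.
    listener.listen(backlog)
    return listener


def serve_one(listener, speak):
    """Accept one client and speak its lines until it disconnects."""
    conn, _addr = listener.accept()
    with conn, conn.makefile("r", encoding="ascii", errors="replace") as lines:
        for line in lines:
            speak(line.rstrip("\r\n"))


def serve_forever(listener, speak):
    """Handle clients strictly one after another."""
    while True:
        serve_one(listener, speak)
```

Because serve_one does not return until its client closes the connection, no locking is needed around the device. A call such as serve_forever(make_listener(), print) gives a stand-in that can be exercised with telnet before any hardware is attached.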
With the server in place, it was only a matter of time before speech synthesis would pervade other system services. The first use I made of the server was to monitor my BBS system. By connecting it to the user login quota manager, I could have the device announce as users logged in and out. Similarly, the traditional sysop page can be carried over this device.
Eventually I tied the SPO server into my implementation of the wall command and then created other utilities to provide verbal monitoring of my Internet server. Verbal monitoring would watch for and announce new e-mail for me, as well as basic system stats such as uptime and disk usage every hour. As all this speech can be annoying at night, I added a simple muting schedule to the server. Most curious and entertaining is my replacement for shutdown, called simply “down”.
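A muting schedule can be as small as one comparison against the clock. The Python below is a sketch under assumed quiet hours of 22:00 to 07:00; the function name and default times are illustrative, not taken from the server.

```python
from datetime import time


def is_muted(now, quiet_start=time(22, 0), quiet_end=time(7, 0)):
    """Return True if 'now' falls inside the quiet period.

    The period may wrap past midnight (start later than end).
    Equal start and end times mean the device is never muted.
    """
    if quiet_start <= quiet_end:
        return quiet_start <= now < quiet_end
    return now >= quiet_start or now < quiet_end


print(is_muted(time(23, 30)))
```

A server would check this before passing each announcement to the device and silently drop the text while muted.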
For system monitoring, the speech device has proven to be quite a useful tool—not a nuisance. The server was developed for the ability to read written text and properly pronounce common usages and conventions, and while I use this capability minimally, others might have more occasion for it. The pronunciation dictionary can be expanded as needed to cover a wider range of words as they are identified in everyday use.
One use for the device which was suggested to me is as a screen reader for visually-impaired computer users. Another application I am looking at is in parking incoming phone calls and paging or announcing calls through the telephone system. I have often wished the board included a DTMF tone generator and a SLIC, so I may look at modifying the schematics provided.
The SPO256-AL2 text-to-speech board described here may be purchased through B.G. Micro, P.O. Box 280298, Dallas, TX 75228, (214) 271-5546. The Computalker lists for around $50.00 (U.S.) as a PC card or $80.00 (U.S.) stand-alone with a power adapter. Chips are available separately, and I believe the Computalker may be purchased in kit form.
While the SPO is serial-based and can be used on almost any machine or OS, I originally obtained it for use on my main server, which runs Linux. For this reason, the speech server was developed and tested under Linux. The server was originally developed using libraries and part of the code base of my BBS package, so these are included as part of the published source. I am working on a more portable public source implementation that should be more widely compatible with non-Linux systems as well. I must go now, as I am being paged...
David Sugar is best known for WorldVU, a public BBS system for Linux. He is currently employed as director of software engineering for Fortran Corp. and uses Linux for commercial telephony development. He maintains his own Internet server under Linux.