Experimenting with Bison Just for the hoc of It

by Jack Dennon

The hoc interpreter was created by Brian Kernighan and Rob Pike and published in their 1984 book The UNIX Programming Environment. Their hoc project demonstrated the strategy of starting first with a small test program and then adding capabilities step-by-step to grow the program into a useful tool.

Using Steve Johnson's famous yacc (yet another compiler-compiler) parser-generator, the authors present development details and program sources for their high order calculator, implementing an interpreted C-like programming language suited to arithmetic applications.

The book presents six stages of step-by-step development. Stage 1 uses yacc to generate the parser for a four-function calculator. Stage 2 adds single-letter variables and error recovery. Stage 3 introduces arbitrary variable names and built-in functions such as sqrt(), cos() and so on.

In the next development step, Stage 4, the interpreter undergoes a major enhancement. It becomes a compiler, the output of which is threaded code in the form of a stored-program "machine" that can be executed. Stage 5 adds control flow and relational operators, so that the stored program can have loops and conditional branching. Stage 6, the final development stage presented in the book, provides recursively callable functions and procedures and adds the ability to print character strings and read values from stdin or from a named file.

In this article we continue development, presenting a stage 7 version of hoc designed for i386-family IBM-compatible PCs running GNU/Linux. The features added provide capabilities tailored mainly to scientific and engineering applications. As in the original hoc, our numeric data type is the double, augmented now with arrays of doubles, strings and arrays of strings.

For experimental purposes we have a home-brew device driver module for the speaker. SVGAlib is used for screen graphics, which is yet another reason that the program runs with the set-user-ID bit set. Since our goal is to have fun while learning about Bison, the PC and device driver modules, hoc7 is best run on a computer dedicated to a single-user, where we have root-access and can blame only ourselves for any havoc created while experimenting.

Although we speak here mainly to further development, yet for a sizeable range of applications the present version of hoc7 is usable out of the box, so to speak, as an alternative to BASIC. The code, of course, is GPLed. So, recalling those times when the command line was king, if "you pine for the days when men were men and wrote there own device drivers", and you prefer to arrange things to suit yourself, this project may offer those opportunities.

Installation

Download the file hoc7n.tgz from www.seasurf.com/~jdennon, create a directory on your system for the source with the command mkdir hoc7, for example, and then copy the compressed file hoc7n.tgz into that directory. Go to that directory and decompress the files with the command tar -xvzf hoc7n.tgz.

Go to the </dev> directory, promote yourself to superuser with the command su and type in root's password.

The hoc7 programs that use the speaker driver expect their device file to be in the /dev directory, so put it there with mknod spkr c 13 0. Then restore your normal user status with exit.

Now go to the directory that contains the device driver module for your version of the Linux kernel. For example, for Linux 2.4, use the command cd drivers2_4. Compile the speaker device driver module with the command mcomp.sh spkdrv.

Promote yourself to superuser with the command su, and type in root's password. Install the device driver with the command insmod spkdrv.o, then restore normal user status and return to the hoc sources directory with the command cd ..

Now create the hoc7 executable with the commands

   
        make clean
        make

The hoc7 executable needs to be "set user ID", so promote yourself to superuser and then use the command ./enable.sh hoc7 to change the file's owner to root and mark it set-user-ID.

Testing, Testing

To test the hoc7 interpreter, go to the examples directory with the command cd examples and run, for example, the following stirl program that calculates approximate factorial numbers using Stirling's approximation:

        '  stirling's factorial approximation formula
        '  provides a fast approximation to factorial(x)
        '  that gets better as x becomes larger.
        '
        func stirl() {
                return (sqrt(2*$1*PI) * ($1/E)^$1*(1 + 1/(12*$1)))
        }
        '
        '  calculate factorial(x)
        func fac() {
                if ($1 <= 0) { return (1) }
                return ($1 * fac($1-1))
        }
        '
        ' display the ratio of factorial(x) to stirling's approximation of
        ' the factorial, for x from 1 to 10.
        '
                print "\n Compare Stirling's approximation with the\n"
                print " factorials that it approximates:\n\n"
                i = 0
                print "        i    ratio: factorial(i)/stirling(i)\n"
                while ((i=i+1) <= 10) {
                        print " i = %4.0f ",i
                        ratio = fac(i)/stirl(i)
                        print "   ratio  = %12.9f",ratio
                        print "\n"
                }

You can execute this program with the command ../hoc7 stirl. You should get a response that looks like this:

         Compare Stirling's approximation with the
         factorials that it approximates:
                i    ratio: factorial(i)/stirling(i)
         i =    1    ratio  =  1.001019278
         i =    2    ratio  =  1.000518836
         i =    3    ratio  =  1.000278990
         i =    4    ratio  =  1.000171400
         i =    5    ratio  =  1.000115396
         i =    6    ratio  =  1.000082810
         i =    7    ratio  =  1.000062254
         i =    8    ratio  =  1.000048479
         i =    9    ratio  =  1.000038808
         i =   10    ratio  =  1.000031761
Speaker Test

To test the speaker driver, execute the testplay program with the command ../hoc7 testplay. The program should sound musical notes on the PC speaker.

The hoc7 executable could, of course, be placed in your local bin directory, for example, and your shell path set so that the command hoc7 testplay would work as expected. For now, we assume that you are running hoc7 just where make has left it, in the hoc7n directory.

Mouse via SVGAlib

Mouse drivers borrowed from SVGAlib access the device via the symbolic link /dev/mouse. To check your existing setup, use the command ls -l /dev/mouse. To point this symbolic link to a serial mouse on ttyS0, known also as COM1, promote yourself to superuser, then use the command ln -s /dev/ttyS0 /dev/mouse. Restore normal user status with the command, then go to the examples directory and test access to the serial mouse on COM1 with the command ../hoc7 mouse1.

Use control-c to exit from the mouse program. The program mouse2 runs the same test for a serial mouse connected to COM2.

PS/2 Mouse

The SVGAlib on my Slackware Linux 2.2 and 2.4 systems can access a PS/2 type mouse. For a PS/2 type mouse, use the command ln -s /dev/psaux /dev/mouse to create the required symbolic link in your /dev directory. To test the PS/2 type mouse use the command ../hoc7 mouseps2.

Setup information for SVGAlib mouse drivers is in the file /etc/vga/libvga.config. If your mouse responds incorrectly, it may be that the gpm (general purpose mouse) dæmon is running. If so, you will need to promote yourself to superuser and tell the dæmon to drop off. The README in the examples directory shows how to do that.

A Circuit Design Application

Selection of pull-up resistor R1, timing resistor R2 and timing capacitor C1, for the 555-based oscillator circuit shown in Figure 1, is the purpose of cfg555, callable with the command ../hoc7 cfg555 -.

Figure 1. 555-based Oscillator Circuit

The program calculates any one of R1, R2, C1 or the frequency of oscillation, given the other three quantities. The program cfg555sr does the same and also looks up the nearest standard 1% metal film resistor values, such as Mouser catalog number 271-52.3K.

Graphics

For a rudimentary test of SVGAlib screen graphics, try the command ../hoc7 grfplot -. It should plot the letter V as two lines of pixels. Use control-C to exit from grfplot. The final dash (-) is necessary, otherwise the program will not pause for the keyboard. If you get a diagnostic message saying svgalib: Cannot open /dev/console., it probably means your hoc7 executable needs to have the SUID bit set, as described earlier. That bit enables hoc7 to gain access to the console device.

An Aerodynamics Application

For a more ambitious graphics example, try the command ../hoc7 joncoh joncohdata -. This program transforms perfect-fluid flow past a circle into the corresponding flow past an airfoil, using the method of Robert T. Jones and Doris Cohen for calculating the Joukowski conformal transformation.

Serial Port Access to Weeder Technology I/O Boards

Weeder Technology I/O boards can be plugged into the serial port of a PC to provide low cost data acquisition and process control. The boards are addressable in such a fashion that up to 32 of these boards can be series-connected to a single RS-232 computer port.

Boards available provide digital I/O, analog input, analog output, stepper motor control and a counter/timer. Also available is a board for connecting and addressing a third-party RS-232 device. Here is a hoc7 test program that reads the Weeder WTDIO digital I/O board:

        ' wtdioread - read Weeder Technology WTDIO module.
        ' The Weeder WTDIO board is connected to ttyS1 (COM2).
        ' Reads all pins of the 14-bit WTDIO input port.
        fd = open("com2","9600,n,8,1")
        
        cmd$ = "AR\r"           'cmd: read all pins of card A
        write(fd,cmd$)
        a$ = readline(fd,128)   'read echo'd card address and data
        print "%s\n",a$
        close(fd)
        stop(0)

When called with the command ../hoc7 wtdioread, the LED on the WTDIO board should flash and the response at the console will be a line of data such as

        A3FFE

The letter A is the WTDIO address set on the board's dipswitches. The hex value 3FFE exhibits 14 bits of data read from the wire terminal strip on the WTDIO board. In this example, the data 3FFE was created by leaving all inputs open except for the lowest order bit which was connected to ground.

Interactive Execution

When called without arguments, the interpreter will read the keyboard and execute your program statements one line at a time. Sometimes this is useful. For example, if you want to know the value of the built-in constant named DEG, call up hoc7 without arguments and then type in the statement print DEG. The program should respond 57.29578.

The reserved word DEG is a built-in constant that states the number of degrees in one radian. You can exit hoc7 with either control-d or control-c.

Data Types in hoc7

The hoc7 interpreter recognizes four types of data. The numeric type is the double. We have scalar doubles and vector doubles, that is, single-dimensioned arrays. An array of doubles is declared with a statement such as vector x[100].

The interpreter also has string scalars and vectors. A string scalar can be created with a$ = "Now is the time". Vectors of strings are also supported. A vector of string pointers is declared with a statement such as vector a$[10], and it can be assigned to with statements such as a$[3] = "for all good men".

Error Recovery

Compile-time parse errors print a line number that points to where the error was detected. Execution-time error handling is even less verbose. Some run-time errors truly put your debugging skills to the test by bombing off without comment. You do sometimes see cross words from the kernel. A good strategy is to expand slowly a hoc7 program that is known to execute correctly, the very strategy Kernighan and Pike demonstrated at the outset.

Stops on First Parse Error

Back in the batch processing days, a compiler bailing out on the first syntax error was considered poor form. You were supposed to keep going to find as many errors as possible before giving up. In an interactive environment, however, that approach is less appropriate. Where we have quick turn-around for edits and recompiles, it is perhaps better to correct the first error encountered and then try again. Correcting the first error is certainly the better strategy where subsequent errors depend on, or are caused by, the first error.

Use Your Own Editor

You must, of course, create and correct your hoc7 programs with your own text editor. Separate text editing is now considered old-fashioned, yet most text editors do display the number of the current text line, something true of my favorite editor as described in the book Build Your Own LINUX C Toolbox. Given the line number of the error, you can usually home in quickly on the place that needs correction. A somewhat integrated development environment could be created by passing the line number to the text editor, an exercise perhaps for another day.

Syntax Quirks

Do check the example programs and note the style used in placing braces. Although it was not by design intent, the parser does seem to have some ideas about style. For example, the arctan test program shows the style of if-else statements that works. Note also how the unary minus sign must be enclosed in parentheses. That requirement was introduced on purpose to get rid of the last of the shift-reduce conflicts.

Speaking of shift-reduce conflicts brings us the subject of Bison, our parser-generator that is the GNU equivalent of yacc. We use Bison to create our parser, then we compile the parser along with other run-time code to create the hoc7 interpreter.

Bison

So what is a "shift-reduce conflict?" For the short answer, it is the result of an ambiguity in the grammatical specification of the language. For a longer answer, see A Bison Tutorial.

Add Yet Another Feature

To provide an example of how to go about adding a feature to hoc7, let us add a qsort verb such that hoc7 can be used to sort an array of numbers into ascending order with a statement such as qsort(n,x), where n gives the number of values we have in the array x . To define the syntax of this statement, make the following modifications to the hoc.y grammar file.

To the Bison declarations, add %token <sym> QSORT, and then to the existing list of "statement" definitions, add this rule to define the grammar of the qsort statement:

        stmt:   QSORT '(' expr ',' VEC ')'
        {
                mcode((Inst)varpush,"varpush");
                mcode((Inst)$5,"->vector");
                mcode((Inst)sortcode,"sortcode");
        }

The mcode() function appends an instruction to the threaded code. The first mcode argument is the address of the code that performs the instruction; the second argument is a string containing the name of the instruction. The string identifies in readable fashion the instruction as printed in an optional listing of the treaded code machine that can be created by setting PRINT_MACHINE nonzero in the hoc.h header file. This feature can be used to help debug the compiler.

To make the terminal symbol QSORT known to yylex(), our lexical analyzer, edit the file init.c to add

        "qsort",        QSORT,

to the list of keywords.

The sortcode instruction for the threaded code machine is implemented by the following function that we can place in the code.c source file.

/*-----------------------------------------------------------------*/
void sortcode(void)     /* qsort(n, values) */
{
        Datum d1, d2;
        int n;
        double *dptr;
        
        d1 = pop();     /* values */
        d2 = pop();     /* n */
        
        n = (int)d2.val;        /* number of values in array */
        dptr = (double *)d1.sym->u.vec;
        quicksort(dptr, n);
}
/*-----------------------------------------------------------------*/

Also, add the function prototype

        extern void sortcode(void);     /* sort(n, values) */

to the header file hoc.h.

Quicksort

To create the quicksort() function that is used by sortcode, call your text editor with a command such as able sort.c, and type in the text below that has been copied almost verbatim from page 33 of Kernighan and Pike's The Practice of Programming.

        /*-------------------------------------------------------------*/
        /* sort.c - quicksort doubles for hoc7 */
        #include "hoc.h"
        void swap(double v[], int i, int j);
        void quicksort(double v[], int n)       /* sort n values in v[] */
        {
                int i, last;
        
                if ( n <= 1 )
                        return;                 /* nothing to do */
                
                swap(v, 0, rand()%n);           /* move pivot to v[0] */
                last = 0;
                for (i = 1; i < n; i++)         /* partition */
                        if ( v[i] < v[0] )
                                swap(v, ++last, i);
                        
                swap(v, 0, last);       /* restore pivot */
                quicksort(v, last);     /* sort left subarray */
                quicksort(v+last+1, n-last-1);  /* sort right subarray */
        }
        /* swap: interchange v[i] and v[j] */
        void swap(double v[], int i, int j)
        {
                double temp;
        
                temp = v[i];
                v[i] = v[j];
                v[j] = temp;
        }
        /*-------------------------------------------------------------*/

Modify the hoc7 Makefile to have this new additional source file compiled and linked. Call your editor with a command such as able Makefile, add sort.o to the OBJS macro and add the same object file to the list of targets, like so:

sort.o          : hoc.h

Your Makefile should now look like this:

        # Makefile for hoc7.
        OBJS = hoc7.o main.o yylex.o code.o init.o math.o symbol.o \
                 graphics.o sound.o comport.o morse.o sort.o
        #CFLAGS = -g -O2
        CC = gcc
        hoc7 : $(OBJS)
                gcc -o hoc7 $(OBJS) -lvgagl -lvga -lm
        
        
        hoc7.o          : hoc7.c hoc.h
        hoc7.c          : hoc.y  hoc.h
                                bison -v -d hoc.y
                                cp hoc.tab.c hoc7.c
        main.o          : hoc.h
        yylex.o         : hoc.y hoc.tab.h
        code.o          : hoc.h hoc.tab.h
        init.o          : hoc.h
        math.o          : hoc.h
        symbol.o        : hoc.h
        graphics.o      : hoc.h
        sound.o         : hoc.h
        comport.o       : hoc.h
        morse.o         : hoc.h
        sort.o          : hoc.h

Your new hoc7 interpreter can now be compiled and linked with the command make. To test the new gsort verb, call your editor with a command such as able testsort, and type in the text

        'testsort
        vector a[10]
        a[0] = 7
        a[1] = 2
        a[2] = 9
        a[3] = 4
        a[4] = 3
        a[5] = 6
        qsort(6,a)
        for (i = 0; i < 6; i++) { print a[i] }

When executed with the command hoc7 testsort, this program defines an array of numbers, sorts the array into ascending order and then prints the sorted array. The result should be

2 3 4 6 7 9
Goto Strikes Again

A simple testgoto program worked correctly, but the more complex gotofails program revealed a problem with the goto statement. The function gotocode() implements the goto statement. When used, it clashed with the interpreter's execute() function.

In a simple context the goto worked as expected: control transferred to the labeled location. However, when called from within an if-statement embedded in a for-loop, the code appeared to violate a design premise of the execute() function. The test program gotofails in the examples directory shows what happens when goto is used inside an if-statement, which is itself inside a for-loop. It seemed that you could jump out of the for-loop, but you could not jump to an another location within the for-loop.

The jump itself worked--in the sense that it transferred control to the labeled location--but then later, when control reached the bottom of the for-loop, the return location saved on the stack was unearthed as we unrolled the recursive calls to execute(). This caused control to continue from the next instruction after the goto. In other words, the stack caused the goto to be treated like a call.

A global flag named goto_active has been defined in code.c to fix this problem in the ifcode() and gotocode() functions. Time will tell whether this is a real fix or just an invitation to yet another problem.

Debugging

To debug the interpreter, you can cause the threaded code and also the symbol table, to be dumped to stdout. To cause the symbol table to be dumped, set PRINT_SYM_TAB nonzero in the header file hoc.h and then recompile. To create a dump of the threaded code, set PRINT_MACHINE nonzero. For the input string

        16 + 5 * 10

dumping the threaded code produced

      0   805f08c    804ca00  constpush
      1   805f090    8065ce8  16.000000
      2   805f094    804ca00  constpush
      3   805f098    8065d30  5.000000
      4   805f09c    804ca00  constpush
      5   805f0a0    8065d78  10.000000
      6   805f0a4    804ccd0  mul
      7   805f0a8    804cc30  add
      8   805f0ac    804bd10  discard
      9   805f0b0          0  stop

Line 0, for example, states that the contents of location 805f08c is the value 804ca00, which points to the constpush threaded code instruction. The constpush instruction expects to find its data via the next machine word, so the next location, at line 1, points to the value 16 that yylex(), our lexical analyzer, will have placed in the symbol table.

If PRINT_THREAD is defined nonzero in hoc.h, then as the threaded code is executed it will print a trace such as this example from the one-line program above.

   805f08c    804ca00   constpush
   805f094    804ca00   constpush
   805f09c    804ca00   constpush
   805f0a4    804ccd0   mul
   805f0a8    804cc30   add
   805f0ac    804bd10   discard
Bison Trace

Bison provides a trace feature that prints to stderr a blow-by-blow account of the parsing operations used to translate the given grammar rules into a parse table. The trace can be activated by setting the global symbol yydebug nonzero in the main() routine, located in the source file main.c. The trace that is produced details each step of the parsing operation, somewhat as we did in tracing the grammar of the two-function calculator in our Bison tutorial.

There we followed the action by repeatedly referring to the parse table. The parse table Bison created for hoc7 is listed in the file hoc7.output. The first portion of that file summarizes how the rules of precedence were used by Bison to resolve grammar conflicts.

Summary

Like the BASIC programming language, the hoc language can be used successfully after studying a few example programs and, like BASIC, it offers the possibility of obtaining usable results with a minimal amount of scaffolding. Unlike BASIC, however, the hoc language is rather easy to modify to suit yourself, the point this article has sought to demonstrate.

Sidelight on Threaded Code

A compiler that generates threaded code is rather closely related to a "real" compiler that generates machine code. A threaded code compiler is indeed the front-end of a real compiler. During work preliminary to his creation of UNIX, Ken Thompson wrote the B compiler for his cast-away PDP-7 generated threaded code. In a small tour de force, Dennis Ritchie used that compiler to create a genuine compiler that ran on Ken's PDP-7 and translated source programs written in B into machine instructions for the GE-635. Threaded code was thus involved in bootstrapping both UNIX and the C programming language.

Acknowledgement

When Alan Cox replied to my e-mailed question, his answer revealed the blunder that kept stalling all my attempts to port the device driver to Linux 2.4.

Resources

The hoc7 interpreter sources are in the compressed file hoc7n.tgz that can be downloaded from www.seasurf.com/~jdennon.

A detailed description of the original hoc interpreter can be found in the book The UNIX Programming Environment by Brian W. Kernighan and Rob Pike (1984. Prentice-Hall, ISBN 0-13-937681-X). This book is still in print and is available at Amazon.

Details on using Bison, including instructive example applications, can be found in the book The Bison Manual, by Charles Donnelly and Richard M. Stallman (1995. The Free Software Foundation, ISBN 1-882114-45-0).

The printed version of this book can be ordered on the GNU.org web site, or the book can be read on-line.

For heavy-duty compiler theory, find a copy of Principles of Compiler Design, by Alfred V. Aho and Jeffrey D. Ullman (1977. Addison-Wesley, ISBN 0-201-00022-9). This book is out of print, but Amazon does list some used copies.

A good source of information on installing and using SVGAlib is the book Linux Graphics Programming with SVGAlib, by Jay Link (2000. The Coriolis Group, ISBN 1-57610-524-5). Jay's book also provides details on using the SVGAlib routines that access the mouse.

Copying

The original hoc code published by Kernighan and Pike carries Bell Labs' copyright of 1984. Conditions on copying the additions that create hoc7 are detailed in the General Public License (GPL), published by the Free Software Foundation and referenced in the COPYING file included with the sources.

Jack Dennon studied mechanical engineering at Oregon State University in Corvallis, mayhem at the Infantry School at Fort Benning, low flying at the Army Aviation School at Fort Rucker. Nowadays he uses DOS to maintain legacy sawmill systems, studies how to replace them with GNU/Linux and helps his wife homeschool their four children.

Load Disqus comments