Quantcast
Username/Email:  Password: 

Experimenting with Bison Just for the hoc of It

A stage 7 version of hoc designed for running GNU/Linux and suitable for scientific and engineering applications.

The hoc interpreter was created by Brian
Kernighan and Rob Pike and published in their 1984 book
The UNIX Programming Environment. Their hoc
project demonstrated the strategy of starting first with a small
test program and then adding capabilities step-by-step to grow the
program into a useful tool.Using Steve Johnson's famous yacc (yet another
compiler-compiler) parser-generator, the authors present
development details and program sources for their high order
calculator, implementing an interpreted C-like programming language
suited to arithmetic applications.The book presents six stages of step-by-step development.
Stage 1 uses yacc to generate the parser for a four-function
calculator. Stage 2 adds single-letter variables and error
recovery. Stage 3 introduces arbitrary variable names and built-in
functions such as sqrt(), cos() and so on.In the next development step, Stage 4, the interpreter
undergoes a major enhancement. It becomes a compiler, the output of
which is threaded code in the form of a stored-program "machine"
that can be executed. Stage 5 adds control flow and relational
operators, so that the stored program can have loops and
conditional branching. Stage 6, the final development stage
presented in the book, provides recursively callable functions and
procedures and adds the ability to print character strings and read
values from stdin or from a named file.In this article we continue development, presenting a stage 7
version of hoc designed for i386-family IBM-compatible PCs running
GNU/Linux. The features added provide capabilities tailored mainly
to scientific and engineering applications. As in the original hoc,
our numeric data type is the double, augmented now with arrays of
doubles, strings and arrays of strings.For experimental purposes we have a home-brew device driver
module for the speaker. SVGAlib is used for screen graphics, which
is yet another reason that the program runs with the set-user-ID
bit set. Since our goal is to have fun while learning about Bison,
the PC and device driver modules, hoc7 is best run on a computer
dedicated to a single-user, where we have root-access and can blame
only ourselves for any havoc created while experimenting.Although we speak here mainly to further development, yet for
a sizeable range of applications the present version of hoc7 is
usable out of the box, so to speak, as an alternative to BASIC. The
code, of course, is GPLed. So, recalling those times when the
command line was king, if "you pine for the days when men were men
and wrote there own device drivers", and you prefer to arrange
things to suit yourself, this project may offer those
opportunities.InstallationDownload the file hoc7n.tgz from
www.seasurf.com/~jdennon,
create a directory on your system for the source with the command
mkdir hoc7, for example, and then copy the
compressed file hoc7n.tgz into that directory. Go to that directory
and decompress the files with the command tar -xvzf
hoc7n.tgz
.Go to the </dev> directory, promote yourself to
superuser with the command su and type in root's
password.The hoc7 programs that use the speaker driver expect their
device file to be in the /dev directory, so put it there with
mknod spkr c 13 0. Then restore your normal user
status with exit.Now go to the directory that contains the device driver
module for your version of the Linux kernel. For example, for Linux
2.4, use the command cd drivers2_4. Compile the
speaker device driver module with the command mcomp.sh
spkdrv
.Promote yourself to superuser with the command
su, and type in root's password. Install the
device driver with the command insmod spkdrv.o,
then restore normal user status and return to the hoc sources
directory with the command cd ..Now create the hoc7 executable with the commands

   
        make clean
        make

The hoc7 executable needs to be "set user ID", so promote
yourself to superuser and then use the command ./enable.sh
hoc7
to change the file's owner to root and mark it
set-user-ID.Testing, TestingTo test the hoc7 interpreter, go to the examples directory
with the command cd examples and run, for
example, the following stirl program that calculates approximate
factorial numbers using Stirling's approximation:

        '  stirling's factorial approximation formula
        '  provides a fast approximation to factorial(x)
        '  that gets better as x becomes larger.
        '
        func stirl() {
                return (sqrt(2*$1*PI) * ($1/E)^$1*(1 + 1/(12*$1)))
        }
        '
        '  calculate factorial(x)
        func fac() {
                if ($1 <= 0) { return (1) }
                return ($1 * fac($1-1))
        }
        '
        ' display the ratio of factorial(x) to stirling's approximation of
        ' the factorial, for x from 1 to 10.
        '
                print "\n Compare Stirling's approximation with the\n"
                print " factorials that it approximates:\n\n"
                i = 0
                print "        i    ratio: factorial(i)/stirling(i)\n"
                while ((i=i+1) <= 10) {
                        print " i = %4.0f ",i
                        ratio = fac(i)/stirl(i)
                        print "   ratio  = %12.9f",ratio
                        print "\n"
                }

You can execute this program with the command
../hoc7 stirl. You should get a response that
looks like this:

         Compare Stirling's approximation with the
         factorials that it approximates:
                i    ratio: factorial(i)/stirling(i)
         i =    1    ratio  =  1.001019278
         i =    2    ratio  =  1.000518836
         i =    3    ratio  =  1.000278990
         i =    4    ratio  =  1.000171400
         i =    5    ratio  =  1.000115396
         i =    6    ratio  =  1.000082810
         i =    7    ratio  =  1.000062254
         i =    8    ratio  =  1.000048479
         i =    9    ratio  =  1.000038808
         i =   10    ratio  =  1.000031761

Speaker TestTo test the speaker driver, execute the testplay program with
the command ../hoc7 testplay. The program should
sound musical notes on the PC speaker.The hoc7 executable could, of course, be placed in your local
bin directory, for example, and your shell path set so that the
command hoc7 testplay would work as expected.
For now, we assume that you are running hoc7 just where make has
left it, in the hoc7n directory.Mouse via SVGAlibMouse drivers borrowed from SVGAlib access the device via the
symbolic link /dev/mouse. To check your existing setup, use the
command ls -l /dev/mouse. To point this symbolic
link to a serial mouse on ttyS0, known also as COM1, promote
yourself to superuser, then use the command ln -s
/dev/ttyS0 /dev/mouse
. Restore normal user status with
the command, then go to the examples directory and test access to
the serial mouse on COM1 with the command ../hoc7
mouse1
.Use control-c to exit from the mouse program. The program
mouse2 runs the same test for a serial mouse connected to
COM2.PS/2 MouseThe SVGAlib on my Slackware Linux 2.2 and 2.4 systems can
access a PS/2 type mouse. For a PS/2 type mouse, use the command
ln -s /dev/psaux /dev/mouse to create the
required symbolic link in your /dev directory. To test the PS/2
type mouse use the command ../hoc7
mouseps2
.Setup information for SVGAlib mouse drivers is in the file
/etc/vga/libvga.config. If your mouse responds incorrectly, it may
be that the gpm (general purpose mouse) dæmon is running. If
so, you will need to promote yourself to superuser and tell the
dæmon to drop off. The README in the examples directory shows
how to do that.A Circuit Design ApplicationSelection of pull-up resistor R1, timing resistor R2 and
timing capacitor C1, for the 555-based oscillator circuit shown in
Figure 1, is the purpose of cfg555, callable with the command
../hoc7 cfg555 -.
Figure 1. 555-based Oscillator Circuit
The program calculates any one of R1, R2, C1 or the frequency
of oscillation, given the other three quantities. The program
cfg555sr does the same and also looks up the nearest standard 1%
metal film resistor values, such as Mouser catalog number
271-52.3K.GraphicsFor a rudimentary test of SVGAlib screen graphics, try the
command ../hoc7 grfplot -. It should plot the
letter V as two lines of pixels. Use control-C to exit from
grfplot. The final dash (-) is necessary, otherwise the program
will not pause for the keyboard. If you get a diagnostic message
saying svgalib: Cannot open /dev/console., it
probably means your hoc7 executable needs to have the SUID bit set,
as described earlier. That bit enables hoc7 to gain access to the
console device.An Aerodynamics ApplicationFor a more ambitious graphics example, try the command
../hoc7 joncoh joncohdata -. This program
transforms perfect-fluid flow past a circle into the corresponding
flow past an airfoil, using the method of Robert T. Jones and Doris
Cohen for calculating the Joukowski conformal
transformation.Serial Port Access to Weeder Technology I/O
BoardsWeeder Technology I/O boards can be plugged into the serial
port of a PC to provide low cost data acquisition and process
control. The boards are addressable in such a fashion that up to 32
of these boards can be series-connected to a single RS-232 computer
port.Boards available provide digital I/O, analog input, analog
output, stepper motor control and a counter/timer. Also available
is a board for connecting and addressing a third-party RS-232
device. Here is a hoc7 test program that reads the Weeder WTDIO
digital I/O board:

        ' wtdioread - read Weeder Technology WTDIO module.
        ' The Weeder WTDIO board is connected to ttyS1 (COM2).
        ' Reads all pins of the 14-bit WTDIO input port.
        fd = open("com2","9600,n,8,1")
        
        cmd$ = "AR\r"           'cmd: read all pins of card A
        write(fd,cmd$)
        a$ = readline(fd,128)   'read echo'd card address and data
        print "%s\n",a$
        close(fd)
        stop(0)

When called with the command ../hoc7
wtdioread
, the LED on the WTDIO board should flash and
the response at the console will be a line of data such as

        A3FFE

The letter A is the WTDIO address set on the board's
dipswitches. The hex value 3FFE exhibits 14 bits of data read from
the wire terminal strip on the WTDIO board. In this example, the
data 3FFE was created by leaving all inputs open except for the
lowest order bit which was connected to ground.Interactive ExecutionWhen called without arguments, the interpreter will read the
keyboard and execute your program statements one line at a time.
Sometimes this is useful. For example, if you want to know the
value of the built-in constant named DEG, call up hoc7 without
arguments and then type in the statement print
DEG
. The program should respond
57.29578.The reserved word DEG is a built-in constant that states the
number of degrees in one radian. You can exit hoc7 with either
control-d or control-c.Data Types in hoc7The hoc7 interpreter recognizes four types of data. The
numeric type is the double. We have scalar doubles and vector
doubles, that is, single-dimensioned arrays. An array of doubles is
declared with a statement such as vector
x[100]
.The interpreter also has string scalars and vectors. A string
scalar can be created with a$ = "Now is the
time"
. Vectors of strings are also supported. A vector of
string pointers is declared with a statement such as
vector a$[10], and it can be assigned to with
statements such as a$[3] = "for all good
men"
.Error RecoveryCompile-time parse errors print a line number that points to
where the error was detected. Execution-time error handling is even
less verbose. Some run-time errors truly put your debugging skills
to the test by bombing off without comment. You do sometimes see
cross words from the kernel. A good strategy is to expand slowly a
hoc7 program that is known to execute correctly, the very strategy
Kernighan and Pike demonstrated at the outset.Stops on First Parse ErrorBack in the batch processing days, a compiler bailing out on
the first syntax error was considered poor form. You were supposed
to keep going to find as many errors as possible before giving up.
In an interactive environment, however, that approach is less
appropriate. Where we have quick turn-around for edits and
recompiles, it is perhaps better to correct the first error
encountered and then try again. Correcting the first error is
certainly the better strategy where subsequent errors depend on, or
are caused by, the first error.Use Your Own EditorYou must, of course, create and correct your hoc7 programs
with your own text editor. Separate text editing is now considered
old-fashioned, yet most text editors do display the number of the
current text line, something true of my favorite editor as
described in the book Build Your Own LINUX C
Toolbox
. Given the line number of the error, you can
usually home in quickly on the place that needs correction. A
somewhat integrated development environment could be created by
passing the line number to the text editor, an exercise perhaps for
another day.Syntax QuirksDo check the example programs and note the style used in
placing braces. Although it was not by design intent, the parser
does seem to have some ideas about style. For example, the arctan
test program shows the style of if-else statements that works. Note
also how the unary minus sign must be enclosed in parentheses. That
requirement was introduced on purpose to get rid of the last of the
shift-reduce conflicts.Speaking of shift-reduce conflicts brings us the subject of
Bison, our parser-generator that is the GNU equivalent of yacc. We
use Bison to create our parser, then we compile the parser along
with other run-time code to create the hoc7 interpreter.BisonSo what is a "shift-reduce conflict?" For the short answer,
it is the result of an ambiguity in the grammatical specification
of the language. For a longer answer, see
A Bison Tutorial.Add Yet Another FeatureTo provide an example of how to go about adding a feature to
hoc7, let us add a qsort verb such that hoc7 can be used to sort an
array of numbers into ascending order with a statement such as
qsort(n,x), where
n gives the number of values we have in the
array x . To define the syntax of this
statement, make the following modifications to the hoc.y grammar
file.To the Bison declarations, add %token <sym>
QSORT
, and then to the existing list of "statement"
definitions, add this rule to define the grammar of the qsort
statement:

        stmt:   QSORT '(' expr ',' VEC ')'
        {
                mcode((Inst)varpush,"varpush");
                mcode((Inst)$5,"->vector");
                mcode((Inst)sortcode,"sortcode");
        }

The mcode() function appends an instruction to the threaded
code. The first mcode argument is the address of the code that
performs the instruction; the second argument is a string
containing the name of the instruction. The string identifies in
readable fashion the instruction as printed in an optional listing
of the treaded code machine that can be created by setting
PRINT_MACHINE nonzero in the hoc.h header file. This feature can be
used to help debug the compiler.To make the terminal symbol QSORT known to yylex(), our
lexical analyzer, edit the file init.c to add

        "qsort",        QSORT,

to the list of keywords.The sortcode instruction for the threaded code machine is
implemented by the following function that we can place in the
code.c source file.

/*-----------------------------------------------------------------*/
void sortcode(void)     /* qsort(n, values) */
{
        Datum d1, d2;
        int n;
        double *dptr;
        
        d1 = pop();     /* values */
        d2 = pop();     /* n */
        
        n = (int)d2.val;        /* number of values in array */
        dptr = (double *)d1.sym->u.vec;
        quicksort(dptr, n);
}
/*-----------------------------------------------------------------*/

Also, add the function prototype

        extern void sortcode(void);     /* sort(n, values) */

to the header file hoc.h.
QuicksortTo create the quicksort() function that is used by sortcode,
call your text editor with a command such as able
sort.c
, and type in the text below that has been copied
almost verbatim from page 33 of Kernighan and Pike's The
Practice of Programming
.

        /*-------------------------------------------------------------*/
        /* sort.c - quicksort doubles for hoc7 */
        #include "hoc.h"
        void swap(double v[], int i, int j);
        void quicksort(double v[], int n)       /* sort n values in v[] */
        {
                int i, last;
        
                if ( n <= 1 )
                        return;                 /* nothing to do */
                
                swap(v, 0, rand()%n);           /* move pivot to v[0] */
                last = 0;
                for (i = 1; i < n; i++)         /* partition */
                        if ( v[i] < v[0] )
                                swap(v, ++last, i);
                        
                swap(v, 0, last);       /* restore pivot */
                quicksort(v, last);     /* sort left subarray */
                quicksort(v+last+1, n-last-1);  /* sort right subarray */
        }
        /* swap: interchange v[i] and v[j] */
        void swap(double v[], int i, int j)
        {
                double temp;
        
                temp = v[i];
                v[i] = v[j];
                v[j] = temp;
        }
        /*-------------------------------------------------------------*/

Modify the hoc7 Makefile to have this new additional source
file compiled and linked. Call your editor with a command such as
able Makefile, add sort.o to the OBJS macro and
add the same object file to the list of targets, like so:

sort.o          : hoc.h

Your Makefile should now look like this:

        # Makefile for hoc7.
        OBJS = hoc7.o main.o yylex.o code.o init.o math.o symbol.o \
                 graphics.o sound.o comport.o morse.o sort.o
        #CFLAGS = -g -O2
        CC = gcc
        hoc7 : $(OBJS)
                gcc -o hoc7 $(OBJS) -lvgagl -lvga -lm
        
        
        hoc7.o          : hoc7.c hoc.h
        hoc7.c          : hoc.y  hoc.h
                                bison -v -d hoc.y
                                cp hoc.tab.c hoc7.c
        main.o          : hoc.h
        yylex.o         : hoc.y hoc.tab.h
        code.o          : hoc.h hoc.tab.h
        init.o          : hoc.h
        math.o          : hoc.h
        symbol.o        : hoc.h
        graphics.o      : hoc.h
        sound.o         : hoc.h
        comport.o       : hoc.h
        morse.o         : hoc.h
        sort.o          : hoc.h

Your new hoc7 interpreter can now be compiled and linked with
the command make. To test the new gsort verb,
call your editor with a command such as able
testsort
, and type in the text

        'testsort
        vector a[10]
        a[0] = 7
        a[1] = 2
        a[2] = 9
        a[3] = 4
        a[4] = 3
        a[5] = 6
        qsort(6,a)
        for (i = 0; i < 6; i++) { print a[i] }

When executed with the command hoc7
testsort
, this program defines an array of numbers, sorts
the array into ascending order and then prints the sorted array.
The result should be

2 3 4 6 7 9

Goto Strikes AgainA simple testgoto program worked correctly, but the more
complex gotofails program revealed a problem with the goto
statement. The function gotocode() implements the goto statement.
When used, it clashed with the interpreter's execute()
function.In a simple context the goto worked as expected: control
transferred to the labeled location. However, when called from
within an if-statement embedded in a for-loop, the code appeared to
violate a design premise of the execute() function. The test
program gotofails in the examples directory shows what happens when
goto is used inside an if-statement, which is itself inside a
for-loop. It seemed that you could jump out of the for-loop, but
you could not jump to an another location within the
for-loop.The jump itself worked--in the sense that it transferred
control to the labeled location--but then later, when control
reached the bottom of the for-loop, the return location saved on
the stack was unearthed as we unrolled the recursive calls to
execute(). This caused control to continue from the next
instruction after the goto. In other words, the stack caused the
goto to be treated like a call.A global flag named goto_active has been defined in code.c to
fix this problem in the ifcode() and gotocode() functions. Time
will tell whether this is a real fix or just an invitation to yet
another problem.DebuggingTo debug the interpreter, you can cause the threaded code and
also the symbol table, to be dumped to stdout. To cause the symbol
table to be dumped, set PRINT_SYM_TAB nonzero in the header file
hoc.h and then recompile. To create a dump of the threaded code,
set PRINT_MACHINE nonzero. For the input string

        16 + 5 * 10

dumping the threaded code produced

      0   805f08c    804ca00  constpush
      1   805f090    8065ce8  16.000000
      2   805f094    804ca00  constpush
      3   805f098    8065d30  5.000000
      4   805f09c    804ca00  constpush
      5   805f0a0    8065d78  10.000000
      6   805f0a4    804ccd0  mul
      7   805f0a8    804cc30  add
      8   805f0ac    804bd10  discard
      9   805f0b0          0  stop

Line 0, for example, states that the contents of location
805f08c is the value 804ca00, which points to the constpush
threaded code instruction. The constpush instruction expects to
find its data via the next machine word, so the next location, at
line 1, points to the value 16 that yylex(), our lexical analyzer,
will have placed in the symbol table.If PRINT_THREAD is defined nonzero in hoc.h, then as the
threaded code is executed it will print a trace such as this
example from the one-line program above.

   805f08c    804ca00   constpush
   805f094    804ca00   constpush
   805f09c    804ca00   constpush
   805f0a4    804ccd0   mul
   805f0a8    804cc30   add
   805f0ac    804bd10   discard

Bison TraceBison provides a trace feature that prints to stderr a
blow-by-blow account of the parsing operations used to translate
the given grammar rules into a parse table. The trace can be
activated by setting the global symbol yydebug nonzero in the
main() routine, located in the source file main.c. The trace that
is produced details each step of the parsing operation, somewhat as
we did in tracing the grammar of the two-function calculator in our
Bison tutorial.There we followed the action by repeatedly referring to the
parse table. The parse table Bison created for hoc7 is listed in
the file hoc7.output. The first portion of that file summarizes how
the rules of precedence were used by Bison to resolve grammar
conflicts.SummaryLike the BASIC programming language, the hoc language can be
used successfully after studying a few example programs and, like
BASIC, it offers the possibility of obtaining usable results with a
minimal amount of scaffolding. Unlike BASIC, however, the hoc
language is rather easy to modify to suit yourself, the point this
article has sought to demonstrate.Sidelight on Threaded CodeA compiler that generates threaded code is rather closely
related to a "real" compiler that generates machine code. A
threaded code compiler is indeed the front-end of a real compiler.
During work preliminary to his creation of UNIX, Ken Thompson wrote
the B compiler for his cast-away PDP-7 generated threaded code. In
a small tour de force, Dennis Ritchie used that compiler to create
a genuine compiler that ran on Ken's PDP-7 and translated source
programs written in B into machine instructions for the GE-635.
Threaded code was thus involved in bootstrapping both UNIX and the
C programming language.AcknowledgementWhen Alan Cox replied to my e-mailed question, his answer
revealed the blunder that kept stalling all my attempts to port the
device driver to Linux 2.4.ResourcesThe hoc7 interpreter sources are in the compressed file
hoc7n.tgz that can be downloaded from
www.seasurf.com/~jdennon.A detailed description of the original hoc interpreter can be
found in the book The UNIX Programming
Environment
by Brian W. Kernighan and Rob Pike (1984.
Prentice-Hall, ISBN 0-13-937681-X). This book is still in print and
is available at Amazon.Details on using Bison, including instructive example
applications, can be found in the book The Bison
Manual
, by Charles Donnelly and Richard M. Stallman
(1995. The Free Software Foundation, ISBN 1-882114-45-0).The printed version of this book can be ordered on the
GNU.org web
site
, or the book can be read
on-line.For heavy-duty compiler theory, find a copy of
Principles of Compiler Design, by Alfred V.
Aho and Jeffrey D. Ullman (1977. Addison-Wesley, ISBN
0-201-00022-9). This book is out of print, but Amazon does list
some used copies.A good source of information on installing and using SVGAlib
is the book Linux Graphics Programming with
SVGAlib
, by Jay Link (2000. The Coriolis Group, ISBN
1-57610-524-5). Jay's book also provides details on using the
SVGAlib routines that access the mouse.CopyingThe original hoc code published by Kernighan and Pike carries
Bell Labs' copyright of 1984. Conditions on copying the additions
that create hoc7 are detailed in the General Public License (GPL),
published by the Free Software Foundation and referenced in the
COPYING file included with the sources.Jack Dennon studied
mechanical engineering at Oregon State University in Corvallis,
mayhem at the Infantry School at Fort Benning, low flying at the
Army Aviation School at Fort Rucker. Nowadays he uses DOS to
maintain legacy sawmill systems, studies how to replace them with
GNU/Linux and helps his wife homeschool their four children.

email: jdennon@seasruf.net

______________________

Comments

Post new comment

  • Allowed HTML tags: <a> <em> <strong> <cite> <code> <pre> <ul> <ol> <li> <dl> <dt> <dd> <i> <b>
  • Lines and paragraphs break automatically.
  • Use to create page breaks.

More information about formatting options