A Bison Tutorial: Do We Shift or Reduce?

 in
Reading, using and making Bison grammar files.

A shift-reduce conflict is the result of an ambiguity in the grammatical specification of a language, in our case, a programming language. The terms "shift" and "reduce" are explained in the course of this article.

For an example of grammatical specification, turn to page 201 of the book Principles of Compiler Design by Aho and Ullman and consider their two-function calculator that can add (+) and multiply (*). Here we have numbered the grammar rules that define the syntax for that calculator.

  1. expr : expr '+' term

  2. | term

  3. term : term '*' factor

  4. | factor

  5. factor: '(' expr ')'

  6. | id

An "id" is a numeric data value that has been entered by the user. When the user enters the number 16, for example, that creates a token type of id with a token value or semantic value of 16. A factor can be an id, that is, a number, or (|) it can be an expression enclosed in parentheses. A term can be a factor, or it can be a term followed by an asterisk followed by a factor. An expression can be a term, or it can be an expression followed by a plus sign followed by a term.

These six rules define the grammar of a two-function calculator designed to process input strings such as

16 + 5 * 10 \n

When the calculator finds the newline character, we will arrange for it to print the result of the indicated operations.

The Parsing Table

Grammar rules are the input data for the compiler--in our case, Bison--that creates a parsing table from the grammar rules. The parsing table defines a state machine that says what to do next at each step, as we work our way through an input string such as

id * id + id \n

where id represents some number that the user has entered. The newline character (\n) represents the end of the input stream.

In the parsing table below, E means expr, T means term and F stands for factor. A blank or empty space in the parse table represents an error exit. Column headings in the Action portion of the table are called terminal symbols. A terminal symbol is like an atom; it cannot be subdivided. Terminal symbols are also called tokens. Column headings in the Goto portion are called nonterminal symbols. A nonterminal is like a molecule; it is a grouping of terminal and nonterminal symbols.

State               Action                     Goto
_____   _______________________________     ___________      
        id    +     *     (     )    \n     E    T    F

    0   s5               s4                 1    2    3
    1        s6                     acc
    2        r2    s7          r2    r2
    3        r4    r4          r4    r4
    4   s5               s4                 8    2    3
    5        r6    r6          r6    r6
    6   s5               s4                      9    3
    7   s5               s4                          10
    8        s6               s11
    9        r1   s7           r1    r1
   10        r3   r3           r3    r3
   11        r5   r5           r5    r5

At the outset, we set state 0 on our stack and look ahead at the input stream. State 0 in our parsing table, with a look-ahead token of id, says s5, meaning shift and cover the stack with 5. To shift means to move the next input token onto the stack. The first two steps show what happens when we do that.

     Stack                                Input stream

     0                                    id * id + id \n 
     0 id 5                                  * id + id \n

Consulting the parse table for State 5 with a look-ahead of *, we find it says r6, meaning that we are to reduce using grammar rule 6. Reduce means apply the grammar rule to the items at the top of the stack. Reducing according to grammar rule 6, we replace id 5 on the stack with F and have

     0 F                                      * id + id \n

Now there is no state at the top of the stack; this tells us to look at the Goto table for state 0, column F, where we find that we are to cover the stack with state 3. This gives us

     0 F 3                                    * id + id \n

State 3 with a look-ahead of * says to reduce using rule 4. Rule 4 says replace F 3 with T and have

    0 T                                       * id + id \n

This stack tells us to look at the Goto table for state 0 at column T. There we find we are to cover the stack with state 2.

    0 T 2                                     * id + id \n

State 2 with a look-ahead token of * says shift and cover the stack with 7, giving us

    0 T 2 * 7                                    id + id \n

State 7 with a look-ahead of id says shift and cover the stack with 5.

    0 T 2 * 7 id 5                                  + id \n

State 5 with a look-ahead of + says reduce using rule 6. Rule 6 says replace id 5 with F.

    0 T 2 * 7 F                                      + id \n

This stack now tells us to look at the Goto table for state 7 at column F. There we find that we are to cover the stack with state 10.

    0 T 2 * 7 F 10                                   + id \n

State 10 with a look-ahead token of + says to reduce using rule 3. Rule 3 says to replace T * F with the product T, that is, remove two values from the stack, multiply them together and put the product back on the stack.

    0 T                                               + id \n

This stack tells us to look at the Goto table for state 0 at column T. There we find we are to cover the stack with state 2.

    0 T 2                                              + id \n

State 2 with a look-ahead of + says to reduce using rule 2. Rule 2 says to replace T 2 with E.

   0 E                                                 + id \n

This stack tells us to look at the Goto table for state 0 at column E. There we find we are to cover the stack with state 1.

   0 E 1                                               + id \n

State 1 with a look-ahead of + says shift and cover the stack with state 6.

   0 E 1 + 6                                             id \n

State 6 with a look-ahead of id says shift and cover the stack with state 5.

   0 E 1 + 6 id 5                                           \n

State 5 with a look-ahead of \n says reduce using rule 6. Rule 6 says replace id 5 with F.

   0 E 1 + 6 F                                              \n

This stack tells us to look at the Goto table for state 6 at column F. There we see that we are to cover the stack with state 3.

   0 E 1 + 6 F 3                                            \n

State 3 with a look-ahead of \n says reduce using rule 4. Rule 4 says replace F 3 with T.

   0 E 1 + 6 T                                              \n

This stack tells us to look at the Goto table for state 6 at column T. There we find we are to cover the stack with state 9.

   0 E 1 + 6 T 9                                            \n

State 9 with a look-ahead of \n says to reduce using rule 1. Rule 1 tells us to replace E + T with E, that is, remove two numbers from the stack, add them together and then put the sum on the stack.

   0 E                                                      \n

This stack tells us to look at the Goto table for state 0 at column E. There we see that we are to cover the stack with state 1.

   0 E 1                                                    \n

State 1 with a look-ahead of \n tells us to accept the result, we are done; it is time to print the result.

Tedious, isn't it? All this for a grammar that is almost too rudimentary to be interesting. The fundamental steps, however, are the same, no matter how elaborate the grammar may be.

The grammar for this two-function calculator was specifically designed to give multiplication (*) a higher precedence than addition (+). To prove that it does, try manually parsing the input string

id + id * id \n

such as we did above. The calculator should evaluate this input as though it had been written id + (id * id)\n.

A shift-reduce conflict occurs, therefore, when the grammar rules are ambiguous such that the parsing table created from those rules has more than one answer for the question, what do we do next?

Rules of precedence provide one way to resolve some types of ambiguities. Aho and Ullman discuss such matters at some length, and they also discuss a number of techniques for creating parsing tables for a given set of grammar rules. But rather than discuss more theory, let's get down to the nitty-gritty.

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Re: A Bison Tutorial: Do We Shift or Reduce?

jackdennon's picture

My browser manages to seriously mangle

the display of this article. What was a

visible newline
disappears and the

parse table for the example two-function

calculator gets badly disfigured.

If yours does the same, you can

download the plain text version of the

article in the compressed file hoc7n.tgz

at

http://www.seasurf.com/~jdennon

Unpack it with

tar -xvzf hoc7n.tgz

and then read it with any text editor.

jdennon@seasurf.com

Webinar
One Click, Universal Protection: Implementing Centralized Security Policies on Linux Systems

As Linux continues to play an ever increasing role in corporate data centers and institutions, ensuring the integrity and protection of these systems must be a priority. With 60% of the world's websites and an increasing share of organization's mission-critical workloads running on Linux, failing to stop malware and other advanced threats on Linux can increasingly impact an organization's reputation and bottom line.

Learn More

Sponsored by Bit9

Webinar
Linux Backup and Recovery Webinar

Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.

In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.

Learn More

Sponsored by Storix