lex and yacc: Tools Worth Knowing
Both tools are coded in a similar manner. There are three sections in each program: declarations, rules and user routines. Each is separated by a line containing only the two characters %%.
For yacc, the declarations section contains the tokens it can use while parsing the input. Each has a unique value greater than 256, and the set of tokens is introduced via %token at the beginning of the line. lex can use the declarations section to define aliases for items that it must recognize while looking for tokens to pass to yacc.
For example, lex needs to know about white space which, while not used in identifying tokens, must be accounted for in some way. Similarly, mathematical symbols such as + or = must be recognized. These are needed in the interpretation of the algebraic statement coded by the user.
Within the rules section, yacc holds its parsing intelligence. This is the section that contains the grammar rules referred to in the English sentence example earlier. In fact, the coding used earlier is typical of a yacc grammar: items to be recognized are separated from the means to recognize them by a colon (:), and alternative means of recognition are separated from each other via a pipe (|) symbol.
lex uses the rules section to contain the regular expressions that allow it to identify tokens to pass to yacc. These expressions may be the aliases from the declaration section, regular expressions, or some combination.
The last section contains C code which may be invoked as each of the tools processes its input.
One requirement is that the yacc tokens be known to the lex program. This is accomplished by including the following statement:
in the lex declarations section and creating it when compiling the yacc program code.
Compilation is accomplished in the following way:
yacc -d yacc.file -create 'y.tab.c and y.tab.h' flex flex.file -create 'lex.yy.c'
The -d option on yacc's command line creates the y.tab.h file needed by lex.yy.c.
To successfully interpret the user's desired process, the program needs to know which well logs were available for processing. This information is available in the ASCII file selected by the user. This text file contains a one-to-one correspondence between curve description and data values. A very small subset of a typical file is shown in Listing 1.
As can be seen, there are several sections including well information (includes some hole parameters), curve information (notes which curves are in the file) and “A” which holds the recorded data values. Each is introduced with a tilde (~). Because the format of the file is fixed by convention, these are easily parsed, and needed values are stored for subsequent processing.
As written for the client, the program is a Motif application. The user selected the file to be processed; it was read in its entirety and numeric text items were converted to double-precision values.
Besides allowing file and curve merging and editing, there is a menu item for curve processing. Upon selecting this menu item, a dialog box is presented containing a list of available curves and arithmetic operations. The user selects curve names, numeric constants and operations which in turn are displayed as an algebraic operation on a text input line. When satisfied with the mathematical operation, the user clicks OK and the lex and yacc magic occurs. The result is stored as a separate curve and can be included in subsequent operations.
lex processed the incoming algebraic statement with the code shown in Listing 2.
Between lines 1 and 16 are declarations to be used in the program generated by lex. In particular, you will notice the inclusion of the header file y.tab.h which contains the following definitions:
#define INTEGER 257 #define FLOAT 258 #define DOUBLE 259 #define NUMBER 260 #define VARIABLE 261 #define EQUAL 262 #define LPAREN 263 #define RPAREN 264 #define PLUS 265 #define MINUS 266 #define TIMES 267 #define DIVIDE 268 #define RAISE 269 #define LHS 270
These definitions are used by both lex and yacc to describe the items yacc expects to receive from lex. They are generated by statements 73 to 77 of the yacc source which will be examined shortly.
From lines 17 to 31 of the lex listing are declarations which amount to aliases for common items that we wish lex to recognize. For example, we declare DIGIT to be any single numeric between 0 and 9 on line 21. Doing this allows us to declare INT (an integer) to be one or more DIGIT's.
Lines 33 to 90 contain the rules by which lex interprets incoming text items. For example, on line 34 we recognize an equal sign (=) and return the token EQUAL to the caller. In y.tab.h, EQUAL is defined to be 262.
As you can see, the lex rules simply recognize text items and then advise the caller what was seen in the input stream.
yacc interprets the token stream passed to it by lex with the following code, only a subset of which is shown in Listing 3. The code for the yacc routine (with the calling subroutine do_arithmetic and its accessory functions) was in excess of 900 lines. For those interested, it is available for your perusal from SSC's public FTP site. Listing 3 is a sample indicating what needed to be done.
Like the lex routine, yacc begins with lines to be included in the output code. Programs written for graphical user interfaces sit in a loop waiting for the user to click on something. When the user's needs are so indicated, the GUI-based program calls a function to perform the required action. These “called functions” are popularly called callbacks. In this program, one of the callbacks was do_arithmetic, which in turn called the yacc routine, which in its turn called the lex routine.
In Listing 3, do_arithmetic is described in the first few lines, and a portion of the code may be seen in lines 428 to 532. They are shown only to give some indication of what was being accomplished.
yacc does the work with its rules section beginning at line 79, and ending at line 426. Although too long to be included completely, you can see that an equation is defined to be something called an lhs (left hand side) EQUAL rhs (right hand side) at line 80. Looking down the listing, you will see that an equation may also be described by an expr (expression). When either of these are encountered, yacc pops a key from an internal stack created by a function called push (see near line 557) and then causes a log curve to be returned to the caller by calling another function called get_curve (not shown here, but included with the yacc and lex code).
Between lines 118 and 139, you can see how some of the tokens yacc expects are processed when recognized. The complete listing has many more.
|Designing Electronics with Linux||May 22, 2013|
|Dynamic DNS—an Object Lesson in Problem Solving||May 21, 2013|
|Using Salt Stack and Vagrant for Drupal Development||May 20, 2013|
|Making Linux and Android Get Along (It's Not as Hard as It Sounds)||May 16, 2013|
|Drupal Is a Framework: Why Everyone Needs to Understand This||May 15, 2013|
|Home, My Backup Data Center||May 13, 2013|
- RSS Feeds
- Dynamic DNS—an Object Lesson in Problem Solving
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Using Salt Stack and Vagrant for Drupal Development
- Designing Electronics with Linux
- New Products
- A Topic for Discussion - Open Source Feature-Richness?
- Drupal Is a Framework: Why Everyone Needs to Understand This
- Validate an E-Mail Address with PHP, the Right Way
- What's the tweeting protocol?
- Kernel Problem
3 hours 11 min ago
- BASH script to log IPs on public web server
7 hours 38 min ago
11 hours 13 min ago
- Reply to comment | Linux Journal
11 hours 46 min ago
- All the articles you talked
14 hours 9 min ago
- All the articles you talked
14 hours 12 min ago
- All the articles you talked
14 hours 14 min ago
18 hours 39 min ago
- Keeping track of IP address
20 hours 30 min ago
- Roll your own dynamic dns
1 day 1 hour ago
Enter to Win an Adafruit Pi Cobbler Breakout Kit for Raspberry Pi
It's Raspberry Pi month at Linux Journal. Each week in May, Adafruit will be giving away a Pi-related prize to a lucky, randomly drawn LJ reader. Winners will be announced weekly.
Fill out the fields below to enter to win this week's prize-- a Pi Cobbler Breakout Kit for Raspberry Pi.
Congratulations to our winners so far:
- 5-8-13, Pi Starter Pack: Jack Davis
- 5-15-13, Pi Model B 512MB RAM: Patrick Dunn
- 5-21-13, Prototyping Pi Plate Kit: Philip Kirby
- Next winner announced on 5-27-13!
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?