Writing a GCC Front End
GCC, the premier free software compiler suite, has undergone many changes in the last few years. One change in particular, the merging of the tree-ssa branch, has made it much simpler to write a new GCC front end.
GCC always has had two different internal representations, trees and RTL. RTL, the register transfer language, is the low-level representation that GCC uses when generating machine code. Traditionally, all optimizations were done in RTL. Trees are a higher-level representation; traditionally, they were less documented and less well known than RTL.
The tree-ssa Project, a long-term reworking of GCC internals spearheaded by Diego Novillo, changes all that. Now, trees are much better although still imperfectly documented, and many optimizations are done at the tree level. A side effect of this work on trees was the clear specification of a tree-based language called GENERIC. All GCC front ends generate GENERIC, which is later lowered to another tree-based representation called GIMPLE, and from there it goes to RTL.
What this means to you is that it is much, much simpler to write a new front end for GCC. In fact, it now is feasible to write a front end for GCC without any knowledge of RTL whatsoever. This article provides a tour of how you would go about connecting your own compiler front end to GCC. The information in this article is specific to GCC 4.0, due to be released in 2005.
For our purposes, compilation is done in two phases: parsing and semantic analysis, followed by code generation. GCC handles the second phase for you, so the question is, what is the best way to implement phase one?
Traditional GCC front ends, such as the C and C++ front ends, generate trees during parsing. Front ends like these typically add their own tree codes for language-specific constructs. Then, after semantic analysis has completed, these trees are lowered to GENERIC by replacing high-level, language-specific trees with lower-level equivalents. One advantage of this approach is that the language-specific trees usually are nearly GENERIC already, so the lowering phase often can avoid generating too much garbage.
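The lowering step described above can be pictured with a toy model. The sketch below is not GCC's tree API; the Code enumeration, the Node structure and the LangSucc code are all made up for illustration. It shows only the general idea of rebuilding a tree while replacing a language-specific code with an equivalent built from generic codes.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree codes. The first three stand in for GENERIC codes;
// LangSucc is a made-up language-specific code meaning "operand + 1".
enum class Code { IntConst, Plus, Mult, LangSucc };

struct Node;
using Tree = std::shared_ptr<Node>;

// As with a GCC tree, every node has the same static type; only the
// code field says what kind of node it really is.
struct Node {
    Code code;
    long value;             // meaningful for IntConst only
    std::vector<Tree> ops;  // operands, if any
};

Tree make(Code code, long value, std::vector<Tree> ops) {
    auto n = std::make_shared<Node>();
    n->code = code;
    n->value = value;
    n->ops = std::move(ops);
    return n;
}

Tree int_const(long v) { return make(Code::IntConst, v, {}); }

// Lowering: rebuild the tree bottom-up, replacing the language-specific
// code with an equivalent expressed in the generic codes.
Tree lower(const Tree &t) {
    std::vector<Tree> ops;
    for (const Tree &op : t->ops) ops.push_back(lower(op));
    if (t->code == Code::LangSucc) {
        assert(ops.size() == 1);  // operand count is checked only at run time
        return make(Code::Plus, 0, {ops[0], int_const(1)});
    }
    return make(t->code, t->value, ops);
}

// Render a tree in prefix form so the result of lowering can be inspected.
std::string show(const Tree &t) {
    switch (t->code) {
    case Code::IntConst: return std::to_string(t->value);
    case Code::Plus:     return "(+ " + show(t->ops[0]) + " " + show(t->ops[1]) + ")";
    case Code::Mult:     return "(* " + show(t->ops[0]) + " " + show(t->ops[1]) + ")";
    case Code::LangSucc: return "(succ " + show(t->ops[0]) + ")";
    }
    return "?";
}
```

For a tree that shows as `(* 3 (succ 2))`, `show(lower(t))` gives `(* 3 (+ 2 1))`. Nodes that already use generic codes are copied through unchanged.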
The primary disadvantage of this approach is trees are typed dynamically. In theory, this might not seem so bad—many dynamically typed environments exist that can be used efficiently by developers, including Lisp and Python. However, these are complete environments, and GCC's heavily macro-ized C code doesn't confer the same advantages.
My preferred approach to writing a front end is to have a strongly typed, language-specific representation of the program, called an abstract syntax tree (AST). This is the approach used by the Ada front end and by gcjx, a rewrite of the front end for the Java programming language.
For instance, gcjx is written in C++ and has a class hierarchy that models the elements of the Java programming language. This code actually is independent of GCC and can be used for other purposes. In gcjx's case, the model can be lowered to GENERIC, but it also can be used to generate bytecode or JNI header files. In addition, it could be used for code introspection of various kinds; in practice, the front end is a reusable library.
This approach provides all the usual advantages of a strongly typed design, and in the GCC context, it results in a program that is easier to understand and debug. The relative independence of the resulting front end from the rest of GCC also is an advantage, because GCC changes rapidly and this loose coupling minimizes your exposure.
Potential disadvantages of this approach are the possibilities that your compiler might do more work than is strictly needed or use more memory. In practice, this doesn't seem to be too important.
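As a rough illustration of the strongly typed approach, here is a minimal sketch of a class hierarchy with two independent consumers. None of these names come from gcjx or GCC; Expr, IntLiteral, Add, TreeLowering and StackCode are hypothetical, and the visitor pattern is simply one common way to arrange such a model. The point is that the model itself knows nothing about any particular back end.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct IntLiteral;
struct Add;

// Each consumer of the model implements this interface.
struct Visitor {
    virtual ~Visitor() = default;
    virtual void visit(const IntLiteral &) = 0;
    virtual void visit(const Add &) = 0;
};

// Base class of the (hypothetical) language model.
struct Expr {
    virtual ~Expr() = default;
    virtual void accept(Visitor &v) const = 0;
};

struct IntLiteral : Expr {
    long value;
    explicit IntLiteral(long v) : value(v) {}
    void accept(Visitor &v) const override { v.visit(*this); }
};

struct Add : Expr {
    std::unique_ptr<Expr> lhs, rhs;
    Add(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
        : lhs(std::move(l)), rhs(std::move(r)) {
        assert(lhs && rhs);  // an Add always has both operands
    }
    void accept(Visitor &v) const override { v.visit(*this); }
};

// Consumer 1: render a tree-like textual form, standing in for lowering.
struct TreeLowering : Visitor {
    std::string out;
    void visit(const IntLiteral &n) override { out += std::to_string(n.value); }
    void visit(const Add &n) override {
        out += "plus_expr(";
        n.lhs->accept(*this);
        out += ", ";
        n.rhs->accept(*this);
        out += ")";
    }
};

// Consumer 2: emit code for an imaginary stack machine, standing in for
// a bytecode generator.
struct StackCode : Visitor {
    std::string out;
    void visit(const IntLiteral &n) override {
        out += "push " + std::to_string(n.value) + "\n";
    }
    void visit(const Add &n) override {
        n.lhs->accept(*this);
        n.rhs->accept(*this);
        out += "add\n";
    }
};

// Build the model for 1 + (2 + 3).
std::unique_ptr<Expr> sample() {
    std::unique_ptr<Expr> inner(new Add(
        std::unique_ptr<Expr>(new IntLiteral(2)),
        std::unique_ptr<Expr>(new IntLiteral(3))));
    return std::unique_ptr<Expr>(new Add(
        std::unique_ptr<Expr>(new IntLiteral(1)), std::move(inner)));
}
```

Because each node kind is its own C++ type, adding a new node kind forces every consumer to handle it at compile time. With dynamically typed trees, that kind of mistake would show up only at run time.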
Before we talk about some details of interfacing your front end to GCC, let's take a look at some of the documentation and source files you need to know. Because it hasn't been a priority in the GCC community to make it simpler to write front ends, some things you need to know are documented only in the source. The documentation references here refer to info pages and not URLs, because GCC 4.0 has not yet been released. Thus, the Web pages reflect earlier versions. Your best bet is to check out a copy of GCC from CVS and dig around in the source.
gcc/c.opt: describes command-line options used by the C family of front ends. More importantly, it describes the format of the .opt files. You'll be writing one of these.
gcc info page, node Spec Files (source file gcc/doc/invoke.texi): describes the spec minilanguage used by the GCC driver. You'll write some specs to tell GCC how to invoke your front end.
gccint info page, node Front End (source file gcc/doc/sourcebuild.texi): describes how to integrate your front end into the GCC build process.
gccint info page, node Tree SSA (source file gcc/doc/tree-ssa.texi): describes GENERIC.
gcc/tree.def, gcc/tree.h: some attributes of trees don't seem to be documented, and reading these files can help. tree.def defines all the tree codes and is, in large part, explanatory comments. tree.h defines the tree node structures, the many accessor macros and declares functions that are useful in building trees of various types.
libcpp/include/line-map.h: line maps are used to represent source code locations in GCC. You may or may not use these in your front end—gcjx does not. Even if you do not use them, you need to build them when lowering to GENERIC, as information in line maps is used when generating debug information.
gcc/errors.h, gcc/diagnostic.h: define the interface to GCC's error formatting functions, which you may choose to use.
gcc/gdbinit.in: defines some GDB commands that are handy when debugging GCC. For instance, the pt command prints a textual representation of a tree. The file .gdbinit also is made in the GCC build directory; if you debug there, the macros immediately are available.
gcc/langhooks.h: lang hooks are a mechanism GCC uses to allow front ends to control some aspects of GCC's behavior. Each front end must define its own copy of the langhooks structures; these structures consist largely of function pointers. GCC's middle and back ends call these functions to make language-specific decisions during compilation. The langhooks structures do change from time to time, but due to the way GCC expects front ends to initialize these structures, you largely are insulated from these changes at the source level. Some of these lang hooks are not optional, so your front end must implement them. Others are ad hoc additions for particular problems. For instance, the can_use_bit_fields_p hook was introduced solely to work around an optimization problem with the current gcj front end.
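The insulation mentioned in the langhooks entry comes from initializing the structure through macros that carry default values. The sketch below is a self-contained imitation of that idea, not GCC's real code: toy_hooks, the TOY_HOOKS_ macros and the mylang functions are hypothetical names, and the real structure has many more fields. A front end overrides only the macros it cares about, so a newly added hook simply picks up its default.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for a hooks structure: mostly function pointers.
struct toy_hooks {
    const char *name;
    bool (*init)();
    bool (*can_use_bit_fields_p)();
};

// Default used for any hook the front end does not override.
inline bool toy_default_true() { return true; }

// One macro per field, each with a default value.
#define TOY_HOOKS_NAME "unnamed"
#define TOY_HOOKS_INIT toy_default_true
#define TOY_HOOKS_CAN_USE_BIT_FIELDS_P toy_default_true

// The initializer lists every field in declaration order. The macros are
// expanded where the initializer is used, so later redefinitions win.
#define TOY_HOOKS_INITIALIZER \
    { TOY_HOOKS_NAME, TOY_HOOKS_INIT, TOY_HOOKS_CAN_USE_BIT_FIELDS_P }

// ---- hypothetical front end: override only what it needs ----
static bool mylang_no_bit_fields() { return false; }

#undef TOY_HOOKS_NAME
#define TOY_HOOKS_NAME "mylang"
#undef TOY_HOOKS_CAN_USE_BIT_FIELDS_P
#define TOY_HOOKS_CAN_USE_BIT_FIELDS_P mylang_no_bit_fields

const toy_hooks hooks = TOY_HOOKS_INITIALIZER;

// ---- stand-ins for language-independent code calling through the hooks ----
std::string toy_language_name() { return hooks.name; }

bool toy_may_use_bit_fields() {
    assert(hooks.init != nullptr);
    if (!hooks.init()) return false;  // default init hook: always succeeds
    return hooks.can_use_bit_fields_p();
}
```

Here the front end never mentions the init hook, yet the structure is fully initialized. If a field were added to toy_hooks along with a default macro, the front-end portion would compile unchanged.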