Writing a GCC Front End
GCC, the premier free software compiler suite, has undergone many changes in the last few years. One change in particular, the merging of the tree-ssa branch, has made it much simpler to write a new GCC front end.
GCC always has had two different internal representations, trees and RTL. RTL, the register transfer language, is the low-level representation that GCC uses when generating machine code. Traditionally, all optimizations were done in RTL. Trees are a higher-level representation; traditionally, they were less documented and less well known than RTL.
The tree-ssa Project, a long-term reworking of GCC internals spearheaded by Diego Novillo, changes all that. Now, trees are much better although still imperfectly documented, and many optimizations are done at the tree level. A side effect of this work on trees was the clear specification of a tree-based language called GENERIC. All GCC front ends generate GENERIC, which is later lowered to another tree-based representation called GIMPLE, and from there it goes to RTL.
What this means to you is that it is much, much simpler to write a new front end for GCC. In fact, it now is feasible to write a front end for GCC one without any knowledge of RTL whatsoever. This article provides a tour of how you would go about connecting your own compiler front end to GCC. The information in this article is specific to GCC 4.0, due to be released in 2005.
For our purposes, compilation is done in two phases, parsing and semantic analysis and then code generation. GCC handles the second phase for you, so the question is, what is the best way to implement phase one?
Traditional GCC front ends, such as the C and C++ front ends, generate trees during parsing. Front ends like these typically add their own tree codes for language-specific constructs. Then, after semantic analysis has completed, these trees are lowered to GENERIC by replacing high-level, language-specific trees with lower-level equivalents. One advantage of this approach is the language-specific trees usually are nearly GENERIC already. The lowering phase often can prevent too much garbage from generating.
The primary disadvantage of this approach is trees are typed dynamically. In theory, this might not seem so bad—many dynamically typed environments exist that can be used efficiently by developers, including Lisp and Python. However, these are complete environments, and GCC's heavily macro-ized C code doesn't confer the same advantages.
My preferred approach to writing a front end is to have a strongly typed, language-specific representation of the program, called an abstract syntax tree (AST). This is the approach used by the Ada front end and by gcjx, a rewrite of the front end for the Java programming language.
For instance, gcjx is written in C++ and has a class hierarchy that models the elements of the Java programming language. This code actually is independent of GCC and can be used for other purposes. In gcjx's case, the model can be lowered to GENERIC, but it also can be used to generate bytecode or JNI header files. In addition, it could be used for code introspection of various kinds; in practice, the front end is a reusable library.
This approach provides all the usual advantages of a strongly typed design, and in the GCC context, it results in a program that is easier to understand and debug. The relative independence of the resulting front end from the rest of GCC also is an advantage, because GCC changes rapidly and this loose coupling minimizes your exposure.
Potential disadvantages of this approach are the possibilities that your compiler might do more work than is strictly needed or use more memory. In practice, this doesn't seem to be too important.
Before we talk about some details of interfacing your front end to GCC, let's take a look at some of the documentation and source files you need to know. Because it hasn't been a priority in the GCC community to make it simpler to write front ends, some things you need to know are documented only in the source. The documentation references here refer to info pages and not URLs, because GCC 4.0 has not yet been released. Thus, the Web pages reflect earlier versions. Your best bet is to check out a copy of GCC from CVS and dig around in the source.
gcc/c.opt: describes command-line options used by the C family of front ends. More importantly, it describes the format of the .opt files. You'll be writing one of these.
gcc info page, node Spec Files (source file gcc/doc/invoke.texi): describes the spec minilanguage used by the GCC driver. You'll write some specs to tell GCC how to invoke your front end.
gccint info page, node Front End (source file gcc/doc/sourcebuild.texi): describes how to integrate your front end into the GCC build process.
gccint info page, node Tree SSA (source file gcc/doc/tree-ssa.texi): describes GENERIC.
gcc/tree.def, gcc/tree.h: some attributes of trees don't seem to be documented, and reading these files can help. tree.def defines all the tree codes and is, in large part, explanatory comments. tree.h defines the tree node structures, the many accessor macros and declares functions that are useful in building trees of various types.
libcpp/include/line-map.h: line maps are used to represent source code locations in GCC. You may or may not use these in your front end—gcjx does not. Even if you do not use them, you need to build them when lowering to GENERIC, as information in line maps is used when generating debug information.
gcc/errors.h, gcc/diagnostic.h: defines the interface to GCC's error formatting functions, which you may choose to use.
gcc/gdbinit.in: defines some GDB commands that are handy when debugging GCC. For instance, the pt command prints a textual representation of a tree. The file .gdbinit also is made in the GCC build directory; if you debug there, the macros immediately are available.
gcc/langhooks.h: lang hooks are a mechanism GCC uses to allow front ends to control some aspects of GCC's behavior. Each front end must define its own copy of the langhooks structures; these structures consist largely of function pointers. GCC's middle and back ends call these functions to make language-specific decisions during compilation. The langhooks structures do change from time to time, but due to the way GCC expects front ends to initialize these structures, you largely are insulated from these changes at the source level. Some of these lang hooks are not optional, so your front end is going to implement them. Others are ad hoc additions for particular problems. For instance, the can_use_bit_fields_p hook was introduced solely to work around an optimization problem with the current gcj front end.
|Making Linux and Android Get Along (It's Not as Hard as It Sounds)||May 16, 2013|
|Drupal Is a Framework: Why Everyone Needs to Understand This||May 15, 2013|
|Home, My Backup Data Center||May 13, 2013|
|Non-Linux FOSS: Seashore||May 10, 2013|
|Trying to Tame the Tablet||May 08, 2013|
|Dart: a New Web Programming Experience||May 07, 2013|
- RSS Feeds
- New Products
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Drupal Is a Framework: Why Everyone Needs to Understand This
- A Topic for Discussion - Open Source Feature-Richness?
- Home, My Backup Data Center
- Developer Poll
- What's the tweeting protocol?
- Dart: a New Web Programming Experience
- Readers' Choice Awards
- Linux is good
1 hour 35 min ago
- Reply to comment | Linux Journal
1 hour 52 min ago
- Web Hosting IQ
2 hours 22 min ago
- Web Hosting IQ
2 hours 23 min ago
- Web Hosting IQ
2 hours 23 min ago
- Reply to comment | Linux Journal
5 hours 24 min ago
- play with linux? i think you mean work-around linux
13 hours 50 min ago
- Where is Epistle?
13 hours 56 min ago
- You forgot OwnCloud
14 hours 26 min ago
- aplikasi free
17 hours 40 min ago
Enter to Win an Adafruit Prototyping Pi Plate Kit for Raspberry Pi
It's Raspberry Pi month at Linux Journal. Each week in May, Adafruit will be giving away a Pi-related prize to a lucky, randomly drawn LJ reader. Winners will be announced weekly.
Fill out the fields below to enter to win this week's prize-- a Prototyping Pi Plate Kit for Raspberry Pi.
Congratulations to our winners so far:
- 5-8-13, Pi Starter Pack: Jack Davis
- 5-15-13, Pi Model B 512MB RAM: Patrick Dunn
- Next winner announced on 5-21-13!
Free Webinar: Linux Backup and Recovery
Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.
In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.