Compiling Java with GCJ
Java has not become as pervasive as the original hype suggested, but it is a popular language, used a lot for in-house and server-side development and other applications. Java has less mind-share in the free software world, although many projects are now using it. Examples of free projects using Java include Jakarta from the Apache Foundation (jakarta.apache.org), various XML tools from W3C (www.w3.org) and Freenet (freenet.sourceforge.net). See also the FSF's Java page (www.gnu.org/software/java).
One reason relatively few projects use Java has been the real or perceived lack of quality, free implementations of Java. Two free Java implementations, however, have been around since the early days of Java. One is Kaffe (www.kaffe.org), originally written by Tim Wilkinson and still developed by the company he cofounded, Transvirtual. The other is GCJ (the GNU Compiler for the Java language), which I started in 1996 at Cygnus Solutions (and which this article discusses). GCJ has been fully integrated and supported as a GCC language since GCC version 3.0.
The traditional way to implement Java is a two-step process: a translation phase and an execution phase. (In this respect Java is like C.) A Java program is compiled by javac, which produces one or more files with the extension .class. Each such file is the binary representation of the information in a single class, including the expressions and statements of the class' methods. All of these have been translated into bytecode, which is basically the instruction set for a virtual, stack-based computer. (Because some chips also have a Java bytecode instruction set, it also can be a real instruction set.)
The execution phase is handled by a Java Virtual Machine (JVM) that reads in and executes the .class files. Sun's version is called plain “java”. Think of the JVM as a simulator for a machine whose instruction set is Java bytecodes.
Using an interpreter (simulator) adds quite a bit of execution overhead. A common solution for high-performance JVMs is to use dynamic translation or just-in-time (JIT) compilers. In that case, the runtime system will notice a method has been called enough times to make it worthwhile to generate machine code for that method on the fly. Future calls to the method will execute the machine code directly.
A problem with JITs is startup overhead. It takes time to compile a method, especially if you want to do any optimization, and this compilation is done each time the application is run. If you decide to compile only the methods most often executed, then you have the overhead of measuring those. Another problem is that a good JIT is complex and takes up a fair bit of space (plus the generated code needs space, which may be on top of the space used by the original bytecode). Little of this space can be in shared memory.
Traditional Java implementation techniques also do not interoperate well with other languages. Applications are deployed differently (a Java Archive .jar file, rather than an executable); they require a big runtime system, and calling between Java and C/C++ is slow and inconvenient.
The approach of the GCJ Project is radically traditional. We view Java as simply another programming language and implement it the way we implement other compiled languages. As Cygnus had been long involved with GCC, which was already being used to compile a number of different programming languages (C, C++, Pascal, Ada, Modula2, Fortran, Chill), it made sense to think about compiling Java to native code using GCC.
On the whole, compiling a Java program is actually much simpler than compiling a C++ program, because Java has no templates and no preprocessor. The type system, object model and exception-handling model are also simpler. In order to compile a Java program, the program basically is represented as an abstract syntax tree, using the same data structure GCC uses for all of its languages. For each Java construct, we use the same internal representation as the equivalent C++ would use, and GCC takes care of the rest.
GCJ can then make use of all the optimizations and tools already built for the GNU tools. Examples of optimizations are common sub-expression elimination, strength reduction, loop optimization and register allocation. Additionally, GCJ can do more sophisticated and time-consuming optimizations than a just-in-time compiler can. Some people argue, however, that a JIT can do more tailored and adaptive optimizations (for example, change the code depending on actual execution). In fact, Sun's HotSpot technology is based on this premise, and it certainly does an impressive job. Truthfully, running a program compiled by GCJ is not always noticeably faster than running it on a JIT-based Java implementation; sometimes it even may be slower, but that usually is because we have not had time to implement Java-specific optimizations and tuning in GCJ, rather than any inherent advantage of HotSpot technology. GCJ is often significantly faster than alternative JVMs, and it is getting faster as people improve it.
A big advantage of GCJ is startup speed and modest memory usage. Originally, people claimed that bytecode was more space-efficient than native instruction sets. This is true to some extent, but remember that about half the space in a .class file is taken up by symbolic (non-instruction) information. These symbols are duplicated for each .class file, while ELF executables or libraries can do much more sharing. But where bytecodes really lose out to native code is in terms of memory inside a JVM with a JIT. Starting up Sun's JVM and JIT compiling and applications' classes take a huge amount of time and memory. For example, Sun's IDE Forte for Java (available in the NetBeans open-source version) is huge. Starting up NetBeans takes 74MB (as reported by the top command) before you actually start doing anything. The amount of main memory used by Java applications complicates their deployment. An illustration is JEmacs (JEmacs.sourceforge.net), a (not very active) project of mine to implement Emacs in Java using Swing (and Kawa, discussed below, for Emacs Lisp support). Starting up a simple editor window using Sun's JDK1.3.1 takes 26MB (according to top). XEmacs, in contrast, takes 8MB.
Running the Kawa test suite using GCJ vs. JDK1.3.1, GCJ is about twice as fast, causes about half the page faults (according to the time command) and uses about 25% less memory (according to top). The test suite is a script that starts the Java environment multiple times and runs too many different things for a JIT to help (which penalizes JDK). It also loads Scheme code interactively, so GCJ has to run it using its interpreter (which penalizes GCJ). This experiment is not a real benchmark, but it does indicate that even in its current status you can get improved performance using GCJ. (As always, if you are concerned about performance, run your own benchmark based on your expected job mix.)
GCJ has other advantages, such as debugging with GDB and interfacing with C/C++ (mentioned below). Finally, GCJ is free software, based on the industry-standard GCC, allowing it to be freely modified, ported and distributed.
Some have complained that ahead-of-time compilation loses the big write-once, run-anywhere portability advantage of bytecodes. However, that argument ignores the distinction between distribution and installation. We do not propose native executables as a distribution format, expect perhaps as prebuilt packages (e.g., RPMs) for a particular architecture. You still can use Java bytecodes as a distribution format, even though they don't have any major advantages over Java source code. (Java source code tends to have fewer portability problems than C or C++ source.) We suggest that when you install a Java application, you should compile it to native code if it isn't already so compiled.
|Designing Electronics with Linux||May 22, 2013|
|Dynamic DNS—an Object Lesson in Problem Solving||May 21, 2013|
|Using Salt Stack and Vagrant for Drupal Development||May 20, 2013|
|Making Linux and Android Get Along (It's Not as Hard as It Sounds)||May 16, 2013|
|Drupal Is a Framework: Why Everyone Needs to Understand This||May 15, 2013|
|Home, My Backup Data Center||May 13, 2013|
- Designing Electronics with Linux
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Dynamic DNS—an Object Lesson in Problem Solving
- Using Salt Stack and Vagrant for Drupal Development
- New Products
- Build a Skype Server for Your Home Phone System
- Validate an E-Mail Address with PHP, the Right Way
- Why Python?
- A Topic for Discussion - Open Source Feature-Richness?
- Tech Tip: Really Simple HTTP Server with Python
2 hours 37 min ago
- Reply to comment | Linux Journal
2 hours 45 min ago
- Understanding the Linux Kernel
5 hours 6 sec ago
7 hours 29 min ago
- Kernel Problem
17 hours 32 min ago
- BASH script to log IPs on public web server
21 hours 59 min ago
1 day 1 hour ago
- Reply to comment | Linux Journal
1 day 2 hours ago
- All the articles you talked
1 day 4 hours ago
- All the articles you talked
1 day 4 hours ago
Enter to Win an Adafruit Pi Cobbler Breakout Kit for Raspberry Pi
It's Raspberry Pi month at Linux Journal. Each week in May, Adafruit will be giving away a Pi-related prize to a lucky, randomly drawn LJ reader. Winners will be announced weekly.
Fill out the fields below to enter to win this week's prize-- a Pi Cobbler Breakout Kit for Raspberry Pi.
Congratulations to our winners so far:
- 5-8-13, Pi Starter Pack: Jack Davis
- 5-15-13, Pi Model B 512MB RAM: Patrick Dunn
- 5-21-13, Prototyping Pi Plate Kit: Philip Kirby
- Next winner announced on 5-27-13!
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?