Compiling Java with GCJ

Although Java isn't a popular choice for free projects, GCJ can make it a viable option.

Java has not become as pervasive as the original hype suggested, but it is a popular language, used a lot for in-house and server-side development and other applications. Java has less mind-share in the free software world, although many projects are now using it. Examples of free projects using Java include Jakarta from the Apache Foundation (jakarta.apache.org), various XML tools from W3C (www.w3.org) and Freenet (freenet.sourceforge.net). See also the FSF's Java page (www.gnu.org/software/java).

One reason relatively few projects use Java has been the real or perceived lack of quality, free implementations of Java. Two free Java implementations, however, have been around since the early days of Java. One is Kaffe (www.kaffe.org), originally written by Tim Wilkinson and still developed by the company he cofounded, Transvirtual. The other is GCJ (the GNU Compiler for the Java language), which I started in 1996 at Cygnus Solutions (and which this article discusses). GCJ has been fully integrated and supported as a GCC language since GCC version 3.0.

Traditional Java Implementation Model

The traditional way to implement Java is a two-step process: a translation phase and an execution phase. (In this respect Java is like C.) A Java program is compiled by javac, which produces one or more files with the extension .class. Each such file is the binary representation of the information in a single class, including the expressions and statements of the class' methods. All of these have been translated into bytecode, which is basically the instruction set for a virtual, stack-based computer. (Because some chips also have a Java bytecode instruction set, it also can be a real instruction set.)

The execution phase is handled by a Java Virtual Machine (JVM) that reads in and executes the .class files. Sun's version is called plain “java”. Think of the JVM as a simulator for a machine whose instruction set is Java bytecodes.
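
To make the "simulator" idea concrete, here is a minimal sketch of an interpreter loop for a stack-based instruction set. The opcodes (PUSH, ADD, MUL, HALT) and the class name ToyStackMachine are invented for illustration; they are not real JVM bytecodes, and a real JVM does far more (class loading, verification, garbage collection).

```java
/** Toy stack machine. The opcodes are hypothetical, not real JVM bytecodes. */
public class ToyStackMachine {
    public static final int PUSH = 0; // push the next program word onto the stack
    public static final int ADD  = 1; // pop two values, push their sum
    public static final int MUL  = 2; // pop two values, push their product
    public static final int HALT = 3; // stop and return the top of the stack

    /** Interprets the program one instruction at a time. */
    public static int run(int[] program) {
        int[] stack = new int[16]; // operand stack; a fixed size is enough for this sketch
        int sp = 0;                // index of the next free stack slot
        int pc = 0;                // program counter
        while (true) {
            int op = program[pc++];
            switch (op) {
                case PUSH:
                    stack[sp++] = program[pc++];
                    break;
                case ADD: {
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a + b;
                    break;
                }
                case MUL: {
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a * b;
                    break;
                }
                case HALT:
                    return stack[--sp];
                default:
                    throw new IllegalStateException("unknown opcode " + op);
            }
        }
    }

    public static void main(String[] args) {
        // Encodes (2 + 3) * 4.
        int[] program = { PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT };
        System.out.println(run(program)); // prints 20
    }
}
```

Every instruction costs a fetch, a dispatch through the switch, and some stack traffic, which is where the overhead of pure interpretation comes from.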

Using an interpreter (simulator) adds quite a bit of execution overhead. A common solution for high-performance JVMs is to use dynamic translation or just-in-time (JIT) compilers. In that case, the runtime system will notice a method has been called enough times to make it worthwhile to generate machine code for that method on the fly. Future calls to the method will execute the machine code directly.
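
The "called enough times" policy can be sketched as a per-method call counter. This is only a toy model: the class name HotMethodSketch and the threshold of 3 are assumptions made for illustration, and setting a flag stands in for actually generating machine code.

```java
/** Toy model of a JIT's hot-method detection. The names and threshold are hypothetical. */
public class HotMethodSketch {
    public static final int THRESHOLD = 3; // assumed value; real JVMs tune this

    private int calls = 0;            // how many times the method ran in the interpreter
    private boolean compiled = false; // true once "machine code" exists for the method

    /** Simulates one call and reports which path handled it. */
    public String invoke() {
        if (compiled) {
            return "native";          // later calls run the generated code directly
        }
        calls++;
        if (calls >= THRESHOLD) {
            compiled = true;          // stand-in for compiling the method on the fly
        }
        return "interpreted";
    }

    public static void main(String[] args) {
        HotMethodSketch method = new HotMethodSketch();
        for (int i = 1; i <= 5; i++) {
            System.out.println("call " + i + ": " + method.invoke());
        }
        // Calls 1 to 3 print "interpreted"; calls 4 and 5 print "native".
    }
}
```

Note that the counter itself is bookkeeping done on every interpreted call, and the compile step happens again on every run of the application.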

A problem with JITs is startup overhead. It takes time to compile a method, especially if you want to do any optimization, and this compilation is done each time the application is run. If you decide to compile only the methods most often executed, then you have the overhead of measuring those. Another problem is that a good JIT is complex and takes up a fair bit of space (plus the generated code needs space, which may be on top of the space used by the original bytecode). Little of this space can be in shared memory.

Traditional Java implementation techniques also do not interoperate well with other languages. Applications are deployed differently (a Java Archive .jar file, rather than an executable); they require a big runtime system, and calling between Java and C/C++ is slow and inconvenient.

The GCJ Solution: Ahead-of-Time Compilation

The approach of the GCJ Project is radically traditional. We view Java as simply another programming language and implement it the way we implement other compiled languages. As Cygnus had been long involved with GCC, which was already being used to compile a number of different programming languages (C, C++, Pascal, Ada, Modula2, Fortran, Chill), it made sense to think about compiling Java to native code using GCC.

On the whole, compiling a Java program is actually much simpler than compiling a C++ program, because Java has no templates and no preprocessor. The type system, object model and exception-handling model are also simpler. In order to compile a Java program, the program basically is represented as an abstract syntax tree, using the same data structure GCC uses for all of its languages. For each Java construct, we use the same internal representation as the equivalent C++ would use, and GCC takes care of the rest.

GCJ can then make use of all the optimizations and tools already built for the GNU tools. Examples of optimizations are common sub-expression elimination, strength reduction, loop optimization and register allocation. Additionally, GCJ can do more sophisticated and time-consuming optimizations than a just-in-time compiler can. Some people argue, however, that a JIT can do more tailored and adaptive optimizations (for example, change the code depending on actual execution). In fact, Sun's HotSpot technology is based on this premise, and it certainly does an impressive job. Truthfully, running a program compiled by GCJ is not always noticeably faster than running it on a JIT-based Java implementation; sometimes it even may be slower, but that usually is because we have not had time to implement Java-specific optimizations and tuning in GCJ, rather than any inherent advantage of HotSpot technology. GCJ is often significantly faster than alternative JVMs, and it is getting faster as people improve it.
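
For readers unfamiliar with the optimizations named above, the following sketch shows by hand what common sub-expression elimination and strength reduction do to a loop. It illustrates the transformations only; it makes no claim about the code GCJ actually generates, and the class and method names are invented for this example.

```java
/** Hand-applied versions of two classic compiler optimizations. */
public class OptimizationSketch {
    /** Straightforward version: recomputes (a + b) twice and multiplies on every iteration. */
    public static long naive(int a, int b, int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (a + b) * i + (a + b);
        }
        return sum;
    }

    /** The same computation after the two transformations. */
    public static long optimized(int a, int b, int n) {
        long sum = 0;
        int t = a + b;  // common sub-expression: computed once, outside the loop
        int term = t;   // value of t * i + t when i is 0
        for (int i = 0; i < n; i++) {
            sum += term;
            term += t;  // strength reduction: an addition replaces the multiplication
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(naive(2, 3, 4));     // prints 50
        System.out.println(optimized(2, 3, 4)); // prints 50
    }
}
```

An optimizing compiler performs rewrites of this kind automatically, so the programmer can keep the readable form.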

A big advantage of GCJ is startup speed and modest memory usage. Originally, people claimed that bytecode was more space-efficient than native instruction sets. This is true to some extent, but remember that about half the space in a .class file is taken up by symbolic (non-instruction) information. These symbols are duplicated for each .class file, while ELF executables or libraries can do much more sharing. But where bytecodes really lose out to native code is in terms of memory inside a JVM with a JIT. Starting up Sun's JVM and JIT-compiling an application's classes takes a huge amount of time and memory. For example, Sun's IDE Forte for Java (available in the NetBeans open-source version) is huge. Starting up NetBeans takes 74MB (as reported by the top command) before you actually start doing anything. The amount of main memory used by Java applications complicates their deployment. An illustration is JEmacs (JEmacs.sourceforge.net), a (not very active) project of mine to implement Emacs in Java using Swing (and Kawa, discussed below, for Emacs Lisp support). Starting up a simple editor window using Sun's JDK1.3.1 takes 26MB (according to top). XEmacs, in contrast, takes 8MB.

Running the Kawa test suite using GCJ vs. JDK1.3.1, GCJ is about twice as fast, causes about half the page faults (according to the time command) and uses about 25% less memory (according to top). The test suite is a script that starts the Java environment multiple times and runs too many different things for a JIT to help (which penalizes JDK). It also loads Scheme code interactively, so GCJ has to run it using its interpreter (which penalizes GCJ). This experiment is not a real benchmark, but it does indicate that even in its current status you can get improved performance using GCJ. (As always, if you are concerned about performance, run your own benchmark based on your expected job mix.)
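
If you want to run your own comparison, a minimal timing harness might look like the sketch below. The dot-product workload, the array length and the repetition count are placeholders chosen for illustration; substitute code that resembles your real job mix. The same source can then be run under each Java implementation you want to compare.

```java
/** Minimal timing harness. The workload and the sizes are placeholders. */
public class TimingSketch {
    /** Placeholder workload: dot product of two equal-length arrays. */
    public static double dot(double[] x, double[] y) {
        double s = 0.0;
        for (int i = 0; i < x.length; i++) {
            s += x[i] * y[i];
        }
        return s;
    }

    public static void main(String[] args) {
        int length = 1000;         // assumed problem size
        int repetitions = 100000;  // assumed repetition count
        double[] x = new double[length];
        double[] y = new double[length];
        for (int i = 0; i < length; i++) {
            x[i] = i;
            y[i] = 1.0;
        }

        double checksum = 0.0;     // printed below so the work cannot be discarded
        long start = System.nanoTime();
        for (int r = 0; r < repetitions; r++) {
            checksum += dot(x, y);
        }
        long elapsedMillis = (System.nanoTime() - start) / 1000000L;

        System.out.println("checksum = " + checksum);
        System.out.println("elapsed  = " + elapsedMillis + " ms");
    }
}
```

Timing the whole process with the time command as well captures startup cost, which a measurement taken inside the program cannot see.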

GCJ has other advantages, such as debugging with GDB and interfacing with C/C++ (mentioned below). Finally, GCJ is free software, based on the industry-standard GCC, allowing it to be freely modified, ported and distributed.

Some have complained that ahead-of-time compilation loses the big write-once, run-anywhere portability advantage of bytecodes. However, that argument ignores the distinction between distribution and installation. We do not propose native executables as a distribution format, except perhaps as prebuilt packages (e.g., RPMs) for a particular architecture. You still can use Java bytecodes as a distribution format, even though they don't have any major advantages over Java source code. (Java source code tends to have fewer portability problems than C or C++ source.) We suggest that when you install a Java application, you should compile it to native code if it isn't already so compiled.

______________________

Comments

Re: Compiling Java with GCJ

I agree with the remarks about the beginning of the article. I would go further and criticize the claim that the two phase approach of Java is similar to C - it is much more similar to Basic with an initial compilation to a state that is later interpreted. Some may not like seeing Java compared to Basic or the original implementation of Pascal with p-code, but that is the Java model.

As a former OS internals developer, I wonder exactly what is supposed to be meant by "when the JVM becomes part of the OS" and how that is supposed to improve startup time to any great extent. It is the class load time and the byte code interpretation time that is the big issue here, not how long it takes to start the process running the JVM. Putting the JVM inside the kernel like a device driver would not be at all helpful in a virtual memory environment. Requiring a service trap from the application code to the kernel to get to the JVM would slow things down much more. If the claim is that putting the JVM on the same distribution CD as the OS will somehow speed things up, that does not make any sense. As for the JVM becoming part of the OS in JDK 1.5, I have not seen any announcement at www.java.sun.com about that. Besides, the JDK is irrelevant at runtime because the JRE is what interprets the byte codes in the class file.

As for transporting classes, a better example would be running Java classes inside a web browser for an applet. That is where the portability of Java classes is worth the cost at runtime to interpret everything or use a "not quite Just In Time" compiler that detects when too much time has been wasted and then optimizes code that may not run again.

In a servlet environment, noticing repeatedly executed code can have a payoff for future requests. For a Java application that runs for a while and then exits, the JIT optimizations are too late for any big payoff in performance.

Sun has resisted having Java be a compiled language all along. Whether this has helped them sell faster hardware is unknown but it certainly has slowed acceptance of Java in many cases. GCJ has the potential to be the perfect solution for cases where Java as source is desired but execution speed is important. This would allow source to be in a language that has many advantages, yet allow the installation to be specific to the hardware and OS for execution speed similar to C++.

Re: Compiling Java with GCJ

Java hasn't been "interpreted" for a long while now. It is compiled "just-in-time", which is a totally different thing. The code that runs is real machine code for the actual processor type it is running on, unlike "p-code" or similar.

I don't know what the "when the JVM becomes part of the OS" comment means either. However, there is a feature in Java1.5 where starting a new Java application will *not* start a new JVM instance. Instead, it just loads the classes associated with the new application into an existing JVM. And a JVM can be left "idle in the background" when no Java apps at all are running, so that when one is started it starts much quicker.

So whether you are running 1, 2, 5 or 50 java-based apps, there is only one JVM. This is possible because using different class-loaders can totally isolate applications from each other; they aren't aware that they are sharing a JVM.

Possible issues that I can see, though, involve:
* process priority ("nice" etc)
* process killing (kill -9)
* JNI libraries loaded by one app crashing the JVM

In many cases, however, sharing a JVM could be beneficial, particularly if the java standard libraries only need to be loaded once (and JIT'ed once).

Yes, the app itself still needs to be "JIT'ed" when run.

Re: Compiling Java with GCJ

Most people don't need to use RMI in a JINI environment. They just want something that works, nicely, for writing apps. They want a rich programming environment with overflow detection, garbage collection, and a nice simple, usable object model. You are right that for a lot of things Java is currently used for, gcj probably won't work, but for a lot of things C and C++ are used for, gcj will work *better*.

Re: Compiling Java with GCJ

Your vision of the uses for Java is very limited. There are times that we want to reuse code for Windows client apps from other types of apps and we don't want to re-write to C++ or VB or something.
GCJ (and other native compilers) are useful in these cases because we want to protect our source code (not all software is free.. some of us need to eat). Platform portability is not needed and not even desirable.
The Java class format and obfuscators are not good at protecting source code well enough. Native compilers are much better.
Also telling a user to copy one file is much easier than telling him to install a JVM, set the classpath, etc..
GCJ + SWT is very attractive.

Re: Compiling Java with GCJ

Hello,
Can I import C++ code into Java code using the CNI interface and compile it to a class file (and how)?
Sbile

Re: Compiling Java with GCJ

"Although Java isn't a popular choice for free projects [...]"

Yeah, sure... Number of projects registered at Sourceforge, by technology:

C (10368)

C++ (9957)

Java (8101)

Perl (4413)

PHP (6103)

Re: Compiling Java with GCJ

Yeah, but how many of those have made it into a Linux distribution?

Re: Compiling Java with GCJ

After reading this, I decided to do some rudimentary benchmarking. Here are my results and comments:

CPU     OS   Compiler        JVM            Parsing     Unparse
500Cel  W2K  Javac 1.4.1_01  JSE 1.4.1_01   1.8 (2.25)  1.6 (2)
400Cel  RH8  Javac 1.4.1_01  JSE 1.4.1_01   2.5         2.3
400Cel  RH8  Jikes           JSE 1.4.1_01   2.5         2.3
400Cel  RH8  IBM-1.4.0       IBM-1.4.0      5.3         2.3
400Cel  RH8  gcj             Native Code    6.8         4.3
400Cel  RH8  Javac 1.4.1_01  J2ME/Personal  46.9        11.0
400Cel  RH8  Javac 1.4.1_01  gnu (gij)      142         6.7
400Cel  RH8  Jikes           Wonka          170         347

Notes: All tests were repeated for 10,000 iterations. Each figure is the average per iteration in milliseconds. The values in parentheses in the first row are interpolated to 400MHz. The tested routine is an XML parser, which is processor intensive. There is very little network/disk utilization. Most of the other JVM projects seem dead. I could not test the WebLogic JRockit JVM as that requires (?) Red Hat Advanced Server. Sun seems to be doing something right as their JVM seems to smoke all the others...

Re: Compiling Java with GCJ

Strange, testing numerical array operation (say dot product of two double array of length 1000, repeated 4 x 1000000 times) gcj is almost as fast as gcc (10sec vs 17 sec) when Sun jdk1.4 is 2.5 to 3 times slower (55 sec) .

This is great stuff for me: developing in Java, with all its comfort, and then compiling it to be as fast as gcc.

Re: Compiling Java with GCJ

To correct my numbers in the previous message:
gcj (-O3, no bounds check): 20 seconds, gcc (-O3): 17 seconds, JDK HotSpot 1.4.1_01: 55 seconds. All this is very approximate, but given the difference there is no need for a statistical test.

Re: Compiling Java with GCJ

Did you test with the server version of hotspot jvm?
java -server
Default is client.

Re: Compiling Java with GCJ

The IBM 1.3.1 JVM smokes the IBM 1.4.0 JVM (and all of the Sun JVMs, last I heard). For some reason, the IBM JVM has gotten slower in the latest revision. Also, what optimizations did you use with gcj? Use -O2 at the minimum. You should also consider -O3, -fno-bounds-check and -fomit-frame-pointer.

On the other hand, I'm currently working with some acoustic modeling code that someone else translated from Fortran to Java. The gcj dynamically linked binary takes 10 times as long to run as the Sun 1.4.1_01 JVM, and statically linking the binary makes it take 20 times longer than the JVM. The source code is identical. I don't know what's wrong. It may be that StrictMath is too young in gcj. StrictMath isn't available in GCJ 3.0 (the default version under Debian Linux), but is available in GCJ 3.2.

When can we expect support for Swing?

I would be really really impressed if you could compile Swing apps.

Re: When can we expect support for Swing?

Actually some Swing applications compile. The main limitation seems to be missing methods. For example, to get one of my applications to compile I had to change code from:

JEditorPane pane=new JEditorPane(url);
pane.setEditable(false);

to:

JEditorPane pane=new JEditorPane(url);
final Class [] params = { Boolean.TYPE };
final Object [] args = { Boolean.FALSE };
try {
    // Look the method up by reflection so the compiler does not need it at build time.
    JEditorPane.class.getMethod(
        "setEditable",params).invoke(pane,args);
} catch (final Throwable ignored) {}

With this change, my application compiled.

Of course compiling and running are two different things. Only about 4% of the swing code is actually implemented. The rest of the methods are place holders.

Bill

Re: When can we expect support for Swing?

Hear! Hear!

Re: Compiling Java with GCJ

Can gcj compile from a jar file and/or .class files rather than from Java source code? I want to compile an app which uses a couple of jar files, and recompiling them from source is hard.

Re: Compiling Java with GCJ

great stuff!

i came from: http://www.rhoads.com/papers/holygrail.jsp

~hugh

Re: Compiling Java with GCJ

Interesting... unfortunately I could not even get HelloWorld to compile using Cygwin.

Oh well, I will try it again on my Linux machine when I get home.

Keep up the good work.

Re: Compiling Java with GCJ

You need to install the libiconv.a package and add all the missing libraries on the command line. It doesn't all seem to be set up under Cygwin.

gcc -c Hello.java
gcc --main=Hello -o Hello Hello.o -l gcj -l iconv -l z

Damn there is still something missing. What is this _WinMain@16 ? Oh well never mind.

Re: Compiling Java with GCJ

that second line should be 'gcj'; not gcc. Then you can leave off all the '-l' options.

Re: Compiling Java with GCJ

I love Java, and I can only thank the GCJ team for the very good job they are doing. Java, as released by Sun, is not open source, and this has limited the language's acceptance in the Linux world: I can only hope that the GCJ project will solve this! On my side, I will for sure start writing my Java code for gcj!

Thanks, and keep up the good work

Enrico

Re: Compiling Java with GCJ

You damn sure my friend....
Unfortunately I am learning by now but one day I will like you :)

Tried to compile a simple

Andre's picture

Tried to compile a simple "Hello world" program, and the exe created is 4.2 MB !! Whoa ! So I stripped it and its size went down to 2.1 MB
....
Isn't this some kind of bloat ? It works well though

that plus has one dll dependency

MoFoQ's picture

yea...for me...it was 3.4MB plus a 980KB dll (libiconv2.dll).
With UPX 1.25, the size of the exe is almost 2MB and the dll is 650KB.
Even after strip (strip hello.exe), it's 2.1MB (and it's 640KB after strip and UPX).

in FPC, it's about 20KB (2.0) (1.x is around 8KB or less) before UPX.

so much bloat.
it needs lipo...badly.
