Eclipse Goes Native
Our first round of changes to libgcj was bug fixing only. We implemented protection domains properly. Then, we made a pass over the entire runtime, fixing bugs related to class loading. Because of the way class loading had been implemented in libgcj, we had to modify all the places in the native code that conceivably might load a class to forward the request to the appropriate class loader.
Once this was done, we were able to start Eclipse using the libgcj bytecode interpreter. At this point the question became, how can we take real advantage of GCJ to compile Eclipse?
The naïve approach to this dilemma, namely precompiling all the classes and linking them all together, had been ruled out by our investigations into Eclipse's internals. This approach would clash with Eclipse's relatively sophisticated class loading strategy.
More investigation revealed that most classes are loaded by instances of the DelegatingURLClassLoader, which is a subclass of the standard URLClassLoader that has been extended to understand Eclipse's plugin architecture. It seemed like the best approach was to modify Eclipse to allow it to load precompiled shared libraries as well as bytecode files. We reasoned that the required changes would be localized due to the way plugin class loading had been structured.
In fact, we had to go one step further and extend libgcj a bit as well. libgcj knew how to load shared libraries invisibly in response to a call to, for example, Class.forName(). However, this magic always happened at the level of the bootstrap class loader. That wouldn't work well for Eclipse or for any other application that defines its own class loaders, so we invented a new gcjlib URL type. This is like a jar URL, but it points to a shared library. We also made some minor extensions to our implementation of URLClassLoader so that gcjlib URLs would be treated specially.
Doing this wasn't enough, however. We also had to solve the linkage problems. In particular, if we compiled a jar file to a shared library, how could we prevent the dlopen() of such a shared library from immediately failing due to unresolved symbols? The solution to this problem was to resurrect and clean up the -fno-assume-compiled option in GCJ. This option, which never had been finished, enabled an alternative ABI that caused GCJ's output to resolve most references at runtime rather than at link time.
The -f-no-assume-compiled option has various limitations and inefficiencies. On the boards for the future is a cleaner way to achieve this same goal. On the GCJ mailing list (see the on-line Resources section) this option is referred to either as the binary compatibility ABI or -findirect-dispatch. This new ABI does everything -fno-assume-compiled does, but in a much more efficient and compatible way. Development is underway and is coming along nicely on this new feature, one of several contributing to GCJ's enterprise readiness.
Once all this was in place, we finally were ready to make our changes to Eclipse. These turned out to be remarkably small. Most of the work involved making the same sort of change in three different places. In essence, we modified Eclipse so that when it's looking for a plugin's jar file, it also looks for a similarly named shared library installed alongside it. If there is one, we rewrite the URL passed to the class loader from a jar URL to a gcjlib URL. All rewriting is done conditionally, so our natively compiled Eclipse still works with an unmodified JVM. In other words, users are not locked in to native compilation if they would rather use a JVM instead.
Once that was done, we wrote our own launcher that understood how to bootstrap the Eclipse platform from shared libraries. This was accomplished in a modest 90 lines of code.
After all that, Eclipse was mysteriously slow. Had we done something wrong? Was GCJ-compiled code substantially worse than the code generated on the fly by the current crop of just-in-time (JIT) compilers? Did -fno-assume-compiled have enormous overhead?
One nice advantage of GCJ is its output generally can be treated in the same way one treats any object code. That is, existing tools such as OProfile can be applied to it directly without any change. And that, in fact, is how we investigated our performance problem.
The first thing we noticed was a large number of exceptions being thrown during platform startup. Amid the grumblings of compiler writers (exceptions should be for exceptional circumstances), and although we were considering changes to the GCJ runtime that would violate Java semantics, we noticed a strange symbol in the OProfile output. It turned out that a small bit of buggy assembly code deep in the libgcj runtime was causing a linear search of exception handling tables rather than the expected binary search. The overhead of this search through the entire program every time an exception was thrown was vast. A fix to the errant assembly code proved this was the problem, and suddenly our natively compiled Eclipse was able to start a second faster than the stock version using a JVM. To quantify it a bit further, the startup time dropped from more than a minute before the fix to less than 15 seconds after it.