Monday, October 11, 2010

The wacky world of compiling Java

Java compilers are supposed to work out for themselves which files need to be recompiled, but in practice this feature is broken.

Consider the following three files:

  • A.java:
    public class A {
       public static final int x = 1;
    }
    
  • B.java:
    public class B {
       public static final int y = A.x * 10 + 1;
    }
    
  • C.java:
    public class C extends B {
       public static void main(String[] argv) {
           System.out.println(Integer.toString(B.y * 10 + 1));
       }
    }
    

Usually, when one class depends on another, the Java compiler can figure out the dependency by looking at the associated class file. The trick in this case is that the only thing B uses from A is a public static final constant, and C in turn uses B's constant. The Java specification requires such compile-time constants to be inlined into the class that uses them, so the resulting class file contains no trace of the dependency. This causes problems for javac.
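The inlining can be observed without touching the build at all. The following is a minimal sketch, not part of the tests described in this post: the names ConstantInliningDemo, initLog, and compute are made up for illustration, and A and B are nested classes so that everything fits in one file. It relies on the language rule that reading a compile-time constant does not trigger initialization of the class that declares it, so the static initializers of A and B never run even though their constants are "used".

```java
public class ConstantInliningDemo {
    // Records which static initializers actually ran (hypothetical helper).
    static final StringBuilder initLog = new StringBuilder();

    static class A {
        static { initLog.append("A "); }
        public static final int x = 1;              // compile-time constant
    }

    static class B {
        static { initLog.append("B "); }
        public static final int y = A.x * 10 + 1;   // constant expression, folded to 11
    }

    // Same expression as in C.main above; the compiler folds it to 111.
    static int compute() {
        return B.y * 10 + 1;
    }

    public static void main(String[] argv) {
        System.out.println(compute());
        // Neither A nor B was initialized, because no real field access happened.
        System.out.println("initialized: [" + initLog.toString().trim() + "]");
    }
}
```

Running it prints 111 followed by "initialized: []". The value was copied into the using code at compile time, which is exactly why a stale copy survives when only A.java changes.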

To illustrate the problem, I ran several tests. In each case I erased all files, created A.java, B.java, and C.java as shown above, and compiled them. Then I modified one or more of the files (by changing the numerical constant) and triggered a new compilation by doing the equivalent of "javac C.java". If the Java compiler could detect dependencies correctly, then it would determine that C.java depends on B.java which depends on A.java, and it would compile the modified file(s) and all dependent file(s). So, for example, if B.java is modified, then B.java and C.java should be recompiled, whereas the old A.class file should be used as-is. I ran these tests using the Java compilers that I have handy: OpenJDK 1.6.0_18, the GNU compiler for Java 4.4.3 (gcj), the Eclipse batch Java compiler 3.5.1, and the Eclipse IDE. The results are tabulated below.

(X = recompiled, - = not recompiled)

OpenJDK 1.6.0_18: javac C.java

                      File(s) recompiled
  Files changed       A.java  B.java  C.java  Result?
  No files            -       -       X       Debatable
  A.java              -       -       X       BROKEN!
  B.java              X       X       X       Too much
  A.java and B.java   X       X       X       Success
  C.java              -       -       X       Success
  All files           X       X       X       Success

OpenJDK 1.6.0_18: javac -Xprefer:newer C.java

                      File(s) recompiled
  Files changed       A.java  B.java  C.java  Result?
  No files            -       -       X       Debatable
  A.java              -       -       X       BROKEN!
  B.java              -       X       X       Success
  A.java and B.java   X       X       X       Success
  C.java              -       -       X       Success
  All files           X       X       X       Success

OpenJDK 1.6.0_18: javac -Xprefer:source C.java

                      File(s) recompiled
  Files changed       A.java  B.java  C.java  Result?
  No files            X       X       X       Too much
  A.java              X       X       X       Success
  B.java              X       X       X       Too much
  A.java and B.java   X       X       X       Too much
  C.java              X       X       X       Too much
  All files           X       X       X       Too much

gcj 4.4.3 (GNU compiler for Java): gcj -C C.java

                      File(s) recompiled
  Files changed       A.java  B.java  C.java  Result?
  No files            -       -       X       Debatable
  A.java              -       -       X       BROKEN!
  B.java              -       X       X       Success
  A.java and B.java   X       X       X       Success
  C.java              -       -       X       Success
  All files           X       X       X       Success

Eclipse batch compiler 3.5.1: ecj C.java

                      File(s) recompiled
  Files changed       A.java  B.java  C.java  Result?
  No files            -       -       X       Debatable
  A.java              -       -       X       BROKEN!
  B.java              -       X       X       Success
  A.java and B.java   X       X       X       Success
  C.java              -       -       X       Success
  All files           X       X       X       Success

Eclipse IDE: files edited within Eclipse, then saved

                      File(s) recompiled
  Files changed       A.java  B.java  C.java  Result?
  No files            -       -       -       Success
  A.java              X       X       X       Success
  B.java              -       X       X       Success
  A.java and B.java   X       X       X       Success
  C.java              -       -       X       Success
  All files           X       X       X       Success

Flavors of fail

The Java compilers err in two ways:

  • Compiling too much — one or more files were compiled even though neither the file nor any of its dependencies were changed. This mistake results in correct build products, but creates extra work for the compiler (namely, a full build every time).
  • Compiling too little — a file that was changed was not recompiled, even though it was a (direct or indirect) dependency of C.java. This mistake results in incorrect build products and a broken application!

As you can see, the only method of compilation that compiled the correct, minimum amount of code in each case was the automatic compilation that is done by the Eclipse IDE. Unfortunately, an IDE cannot practically be used in an automated build system.

The only reliable way to get correct build results from a command-line compiler is to use the -Xprefer:source option of the OpenJDK compiler. What does this option do? It causes the compiler to ignore any existing class files and compile everything from source at each build. Forget incremental compilation.

All the other command-line compilers and options fail to notice that A.java and B.java need to be recompiled when A.java is changed. Presumably the compiler notices, when compiling C.java, that C depends on B.class. But since it cannot tell from B.class that B depends on A, it concludes that B.class is up to date and never even looks at B.java or A.java. The result is that the build is incomplete and the application is broken.
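To make that guess concrete, here is a small sketch of the kind of per-file timestamp check I am assuming the compilers perform. This is an illustration of the presumed logic, not code taken from javac, gcj, or ecj; the class NaiveStalenessCheck and its methods are hypothetical. It creates empty stand-in files in a temporary directory with fixed modification times that mimic "A.java was edited after the last build".

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

public class NaiveStalenessCheck {

    // Assumed rule: recompile only if the class file is missing
    // or older than its own source file.
    static boolean needsRecompile(Path source, Path classFile) {
        try {
            if (!Files.exists(classFile)) {
                return true;
            }
            FileTime sourceTime = Files.getLastModifiedTime(source);
            FileTime classTime = Files.getLastModifiedTime(classFile);
            return sourceTime.compareTo(classTime) > 0;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates an empty file with a fixed modification time.
    static Path touch(Path dir, String name, long millis) throws IOException {
        Path p = Files.createFile(dir.resolve(name));
        Files.setLastModifiedTime(p, FileTime.fromMillis(millis));
        return p;
    }

    // Returns { is A stale?, is B stale? } for the "A.java changed" case.
    static boolean[] runScenario() {
        try {
            Path dir = Files.createTempDirectory("stale-demo");
            long buildTime = 1_000_000_000_000L;
            Path bSource = touch(dir, "B.java", buildTime - 10_000);
            Path aClass = touch(dir, "A.class", buildTime);
            Path bClass = touch(dir, "B.class", buildTime);
            Path aSource = touch(dir, "A.java", buildTime + 60_000); // edited after the build
            return new boolean[] {
                needsRecompile(aSource, aClass),
                needsRecompile(bSource, bClass)
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] argv) {
        boolean[] stale = runScenario();
        // A is stale, but nothing in B.class points at A, so nobody asks.
        System.out.println("A.java stale? " + stale[0]);
        // B looks up to date even though it embeds the old value of A.x.
        System.out.println("B.java stale? " + stale[1]);
    }
}
```

The check on B answers "up to date" because it only compares B.class with B.java. Under this assumption, the stale A.java is only discovered if something leads the compiler to A in the first place, and the inlined constant removes the one thing that would.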

A broken build is a nasty thing because it introduces an inconsistency at a level where it is not expected — the application runs differently than the source code specifies. This can easily result in a debugging wild goose chase (correct source code that mysteriously doesn't run correctly) or a broken deployment (the incorrectly-compiled code passed all tests, but a full build at deployment exposed the broken code). I am not willing to tolerate even a small chance of a broken build.

What can be done?

It is not reasonable to rely on an IDE for building code to be deployed. Among other things, an IDE is hard to automate, and one would need to ensure that developers are all using exactly the same version of the IDE. Therefore, using the Eclipse IDE for deployment builds is not an alternative.

If each class file recorded which other files it depends on (including via static final uses), then it would be easy to teach javac to compile the right thing automatically. I'm no expert on this topic but I fear that this would require a change to the class file format, which more or less needs unanimous approval of the United Nations General Assembly.

If there were a tool that could reliably determine Java inter-class dependencies, it could be used by other tools to deduce which files need to be recompiled given a particular source code change. (What is needed is a sort of gcc -M for Java.) I don't know of such a tool, but feedback would be very welcome.
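Given such a tool, the rest would be straightforward. The sketch below assumes the dependency information already exists; here it is simply written out by hand for the three files in this post, which is precisely the part that is missing in practice. The class RecompileSet and its method are hypothetical, and the code uses Map.of and List.of, so it needs Java 9 or later. It computes the changed files plus everything that depends on them, directly or indirectly.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class RecompileSet {

    // Hand-written source-level dependencies: file -> files it uses.
    // A real tool would have to extract these from the source code.
    static final Map<String, List<String>> DEPENDS_ON = Map.of(
        "A.java", List.of(),
        "B.java", List.of("A.java"),
        "C.java", List.of("B.java"));

    // Returns the changed files plus all direct and indirect dependents.
    static Set<String> filesToRecompile(Set<String> changed) {
        Set<String> dirty = new TreeSet<>(changed);
        Deque<String> work = new ArrayDeque<>(changed);
        while (!work.isEmpty()) {
            String file = work.pop();
            for (Map.Entry<String, List<String>> entry : DEPENDS_ON.entrySet()) {
                // If this entry uses the dirty file, it becomes dirty too.
                if (entry.getValue().contains(file) && dirty.add(entry.getKey())) {
                    work.push(entry.getKey());
                }
            }
        }
        return dirty;
    }

    public static void main(String[] argv) {
        System.out.println(filesToRecompile(Set.of("A.java"))); // [A.java, B.java, C.java]
        System.out.println(filesToRecompile(Set.of("B.java"))); // [B.java, C.java]
        System.out.println(filesToRecompile(Set.of("C.java"))); // [C.java]
    }
}
```

These three answers match the "Success" rows in the Eclipse IDE table above, which is the behavior I would like from a command-line build.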

So to summarize, I don't know a good solution to this problem. In practice, I rely on Eclipse's recompilation for most day-to-day use, and do an automated full build when I want to be sure. But this means that I spend a lot of time waiting for javac when I would rather be coding.

Feedback about this topic would be very welcome. What do you use for compiling Java code? Is it reliable? Has the lack of reliable tools caused you grief? What projects should we keep an eye on?