Java Bootstrap - Part 1: The Java Stack

I have an Obsidian notebook on my desktop right now that covers the basic process of bootstrapping a Java Runtime Environment on a POSIXish system.

I'm fortunate enough to be able to benefit from Andrius Štikonas' work on bootstrapping Java, Rust, and Go - and very unfortunate that he seems to be one of very few people interested in being able to retain a complete ground-up bootstrap.

History

It repeats itself if you look away for just a moment.

Today, OpenJDK is released as a binary JDK distribution for Linux/glibc on several platforms. Eclipse release their Temurin distribution, built on Alpine Linux, for Linux/musl on several platforms as well. Since 2007, the source of large parts of Sun's JDK has been made available under Open Source terms, only eleven years after Java promised to change the world.

The state of Java on the Linux desktop has, much like the concept of "the Linux Desktop", evolved much over the years. While it was designed to allow for truly portable cross-platform deployment of applications, Java's first release was available for Windows 95 and NT on i386 (no Alpha, no MIPS, no PPC) and Solaris on SPARC. Java 1.0 for Macintosh System 7.5 would be released later that year, and JavaSoft advised that they would depend on IBM to produce ports for Windows 3.1 and OS/2, and OSF to support the UNIX systems.

In October 1996, the Blackdown project provided JDK 1.0.2 for Linux. In 1999, Sun Microsystems announced that they would port Java to Linux. Between these two points, the open-source community had to do a lot of work in order to usher in this brave new portable cross-platform future, and even into the late 2000s, there were several different pieces of software filling in the gaps in Sun's official support for Linux.

Part of the reason for these gaps was simple portability - Linux is ported to many more platforms than a commercial software offering will be, and until 2007, Sun did not provide an open-sourced Java Virtual Machine. This meant that Linux/m68k did not have a contemporary Java release that would have the same feature set as Windows NT on a blazing-fast Pentium II, somewhat undercutting the promise of Java's portability.

To understand why this was a pain for many early adopters both of Java and Free Software, it's important to understand what is actually required for a functional Java Runtime Environment - the key set of technologies required for running Java code - and a functional Java Development Kit.

The Java Architecture

Excuse me, that's a load-bearing trauma! We can't unpack that!

Currently we have a problem with JamVM locking up that looks a little like this:

while(entering = LOCKWORD_READ(&mon->entering),
  !(LOCKWORD_COMPARE_AND_SWAP(&mon->entering, entering, entering-1)));

An issue which presents identically has been known to the Gentoo Linux folks for seventeen years but was never very well ventilated or debugged, so it's a little difficult to ascertain whether this is the same cause, or whether there was just a suspiciously identical ppc64 bug many years ago.

the result is that my CPU is being used 100%, but nothing else happens..

The code in question is part of maintaining a lock that complies with Java's memory model. It's difficult right now to tell if the issue is with the native implementation provided with JamVM, or how it's being used in lock.c to provide atomic operations.
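To make the shape of that loop a bit more concrete, here is a minimal sketch of the same read-then-compare-and-swap decrement written with C11 atomics. This is not JamVM's code: monitor_sketch and leave_entering are names I've made up for illustration, and C11's atomic_compare_exchange_weak stands in for JamVM's LOCKWORD_READ and LOCKWORD_COMPARE_AND_SWAP macros.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a monitor; only the one field used here. */
typedef struct {
    _Atomic intptr_t entering;
} monitor_sketch;

/* Atomically decrement mon->entering and return the new value.
 * The compare-and-swap only succeeds if the field still holds the value
 * we last saw. On failure (another thread got in first, or a spurious
 * failure of the weak variant) `seen` is reloaded and we try again. */
intptr_t leave_entering(monitor_sketch *mon)
{
    intptr_t seen = atomic_load(&mon->entering);
    while (!atomic_compare_exchange_weak(&mon->entering, &seen, seen - 1)) {
        /* `seen` now holds the current value; go round again. */
    }
    return seen - 1;
}
```

The loop only terminates when a compare-and-swap finally succeeds, so anything that makes the comparison fail every time shows up as exactly the symptom above: a core pinned at 100% and no progress.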

So - why are we using JamVM? What does JamVM do, and why is it necessary? That's what we're going to get to here.

The absolute bare-bones Java implementation needs three things:

  • an interpreter turns bytecode into code which runs on the target machine.
  • a compiler turns the human-readable Java code into bytecode for the interpreter to run.
  • a Class Library - actually, a set of libraries - provides standard functions that exist everywhere and can be depended upon.

The Class Library needs the least explanation of these - it is similar to the Standard Template Library, and the subsequent C++ Standard Library, of C++. In this respect it provides a number of generic functions and datatypes which exist everywhere. And like the C++ Standard Library, some of these functions were added in later versions of Java, and can't be depended upon to be present in earlier versions. Much as in C++, additional classes can be accessed through additional packages.

The compiler is the next bit we should look at. Java, in its source code form, looks ... mostly human-readable. Of course, as with any programming language, a person without programming experience won't be able to immediately understand the control flow of a Java programme. However, even if you do have programming experience, the interpreter won't be able to understand it at all.

The compiler operates in the same way as any other compiler - it programmatically converts human-readable instructions into instructions that a machine can execute. While there are some compilers that allow the compilation of Java to a normal binary, in almost every case compilers like jikes, gcj, and Sun's javac compile to an intermediate language, called bytecode. This approach would later be used by Microsoft's .NET and its Common Language Runtime.

This bytecode is composed of instructions whose opcodes, each specifying the manipulation to be performed on a piece of data, fit into a single byte, or eight bits; some opcodes are followed by a few operand bytes. This makes bytecode considerably more compact than human-readable code, and gives the interpreter a programme format that it can decode and execute with very little overhead.
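As a rough illustration of how little decoding a one-byte opcode needs, here is a toy stack interpreter. The five opcode values are the real JVM ones, but everything else (the run_bytecode name, the fixed 16-slot stack, the -1 error return) is my own assumption for the sketch, and it has nothing to do with how JamVM is actually structured.

```c
#include <stddef.h>
#include <stdint.h>

/* Real JVM opcode values; each fits in one byte. */
enum {
    OP_ICONST_2 = 0x05,  /* push the int constant 2 */
    OP_BIPUSH   = 0x10,  /* push the following byte as a signed int */
    OP_IADD     = 0x60,  /* pop two ints, push their sum */
    OP_IMUL     = 0x68,  /* pop two ints, push their product */
    OP_IRETURN  = 0xac   /* return the int on top of the stack */
};

/* Execute a straight-line method body and return its result.
 * Returns -1 on an unknown opcode or if the code runs off the end.
 * This sketch trusts its input: there are no stack bounds checks. */
int32_t run_bytecode(const uint8_t *code, size_t len)
{
    int32_t stack[16];
    size_t sp = 0, pc = 0;

    while (pc < len) {
        switch (code[pc++]) {
        case OP_ICONST_2: stack[sp++] = 2; break;
        case OP_BIPUSH:   stack[sp++] = (int8_t)code[pc++]; break;
        case OP_IADD:     sp--; stack[sp - 1] += stack[sp]; break;
        case OP_IMUL:     sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_IRETURN:  return stack[sp - 1];
        default:          return -1;
        }
    }
    return -1;
}
```

Feeding it the five bytes 0x10, 40, 0x05, 0x60, 0xac (bipush 40, iconst_2, iadd, ireturn) returns 42. The whole "decode" step is a single switch on one byte, which is the point of the format.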

The compiler we use in the first stage of the bootstrap is IBM's Jikes compiler. The compiler is necessary to build the class library, which must then ship with an interpreter, referred to in Java as the Virtual Machine. The astute among you may have noted that VM frequently stands for "virtual machine" and wondered if JamVM was named so for this reason. Surprise! It is.

The Java Virtual Machine

This one can't be done in only a few paragraphs.

To quote myself from earlier:

an interpreter turns bytecode into code which runs on the target machine.

Those of you familiar with assembly language, having read about bytecode, may get the idea that execution of bytecode requires simply converting bytecode to machine code for a given platform. That's where this gets a bit complex.

The features of the Java Virtual Machine are specified for a given release of the Java Platform. Some of these features pertain to security, some to memory management, and some to Native Methods and programming interfaces to other languages.

The first thing a JVM must do, before starting to interpret bytecode, is to access the Classes (libraries) that it will need to use. A Class Loader is the first function of the JVM, as such.

The second thing a JVM must do is to implement the tasks that the Java bytecode can instruct it to do. Before this, it verifies that the bytecode is foundationally sensible. While those of you who have played modded Minecraft may laugh at the idea that Java applications aren't meant to crash the host machine, Java applications aren't meant to crash the host machine! Thus, the JVM sanity-checks the bytecode before execution, for such things as type-safety, proper initialisation of data, and appropriate lexical scope.
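A real verifier does far more than this (it tracks the type of every stack slot and local variable across every branch), but one small piece of the idea can be sketched: walk the code once, without running it, and reject anything that would pop from an empty operand stack or push past its declared limit. The function name, the handful of supported opcodes and the straight-line-only restriction are all assumptions of mine for the sake of a short example.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if straight-line code never underflows the operand stack,
 * never exceeds max_stack slots, and ends in an ireturn with a value
 * available. Returns 0 otherwise. Nothing is executed. */
int verify_stack_depth(const uint8_t *code, size_t len, int max_stack)
{
    int depth = 0;
    size_t pc = 0;

    while (pc < len) {
        int pops = 0, pushes = 0;
        size_t operands = 0;

        switch (code[pc]) {
        case 0x05: pushes = 1; break;                /* iconst_2 */
        case 0x10: pushes = 1; operands = 1; break;  /* bipush + 1 byte */
        case 0x60:                                   /* iadd */
        case 0x68: pops = 2; pushes = 1; break;      /* imul */
        case 0xac:                                   /* ireturn */
            return depth >= 1 && pc + 1 == len;
        default:
            return 0;                                /* unknown opcode */
        }
        if (pc + operands >= len) return 0;  /* operand byte is missing */
        if (depth < pops) return 0;          /* would pop an empty stack */
        depth = depth - pops + pushes;
        if (depth > max_stack) return 0;     /* would overflow the stack */
        pc += 1 + operands;
    }
    return 0;  /* fell off the end without returning */
}
```

With a limit of two slots, the sequence bipush 40, iconst_2, iadd, ireturn passes, while a lone iadd followed by ireturn is rejected before a single instruction runs.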

The first JVM released by Sun used a simple interpreter, but Java 1.1 was shipped with a Just-In-Time compiler. This means that each function, say, that is called will be assessed and compiled to machine code on its first invocation, and then cached for later reuse. This saves overall execution time at the cost of a small initial translation for each newly-used function in a programme.
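The compile-once, reuse-afterwards bookkeeping can be sketched without generating any actual machine code. In the sketch below, translate() is a stand-in for the expensive translation step and simply hands back a pre-built C function. All of the names (method_sketch, translate, invoke, translations) are hypothetical and not taken from any real JVM.

```c
#include <stddef.h>

typedef int (*native_fn)(int);

/* Hypothetical per-method record. */
typedef struct {
    const char *name;      /* method name, for illustration only */
    native_fn   compiled;  /* NULL until the first invocation */
    native_fn   source;    /* stand-in for what translation would produce */
} method_sketch;

int translations = 0;      /* how many times translate() has run */

/* Stand-in for the costly bytecode-to-machine-code step. */
native_fn translate(const method_sketch *m)
{
    translations++;
    return m->source;
}

/* Call a method, translating it only the first time it is used. */
int invoke(method_sketch *m, int arg)
{
    if (m->compiled == NULL)        /* first call: pay for translation */
        m->compiled = translate(m);
    return m->compiled(arg);        /* later calls reuse the cached code */
}

/* An example "method body". */
int twice(int x) { return 2 * x; }
```

Invoking the same method_sketch twice runs translate() once; the second call goes straight through the cached pointer, which is where the overall saving comes from.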

The third thing the JVM must do is to be verifiable. Specific inputs must always result in an expected specific output. And that is why JamVM is causing us some issues. On 64-bit POWER, certain assumptions about data structures that served well on other architectures are no longer suitable, and thus testing on a Raptor Engineering Talos II has revealed some problems which will also surely bite us on other platforms if we're not aware of them. These problems did not occur on a commodity x86_64 server, where the initial Java bootstrap was taking place.

Right now, our VM is taking the input of attempting to compile Apache Ant, and producing either core dumps or spinlocks. The output expected is ... well, Apache Ant.

I hope this has been somewhat informative; certainly, I've been able to use it to clear my head while trying to chase down bug reports from over 15 years ago. Future posts will cover further adventures in bootstrap, as well as philosophical treatises on why we even bother.