Friday, June 05, 2009

Biased Locking in Java SE 6.0

As a result of the ensuing discussion in response to this blog entry, it is now pretty clear that the issue described herein is not specific to Java SE 6. Please do not reach any conclusions without reading the entire post, including and especially the comments.

Joern Huxhorn recently informed logback developers that locking in Java SE 6.0 was unfair. Indeed, the Java documentation regularly mentions that one should not make assumptions about Java's thread scheduler. For example, the last paragraph of Thread Priority on the Solaris™ Platform states:
Likewise, no assumptions should be made about the order in which threads are granted ownership of a monitor or the order in which threads wake in response to the notify or notifyAll method. An excellent reference for these topics is Chapter 9, "Threads," in Joshua Bloch's book "Effective Java Programming Language Guide."
If you actually bother to read the book, in Item 51, Joshua Bloch writes:
When multiple threads are runnable, the thread scheduler determines which threads get run and for how long. Any reasonable JVM implementation will attempt some sort of fairness when making this determination.
While Joshua Bloch's statement above is specifically about the JVM scheduler, I think that the notion of "some sort of fairness" applies to locks as well. The general principle is the same.

Here is a small Java application, called LockingInJava, which illustrates the locking behavior in question.

public class LockingInJava implements Runnable {

  static int THREAD_COUNT = 5;
  static Object LOCK = new Object();
  static Runnable[] RUNNABLE_ARRAY = new Runnable[THREAD_COUNT];
  static Thread[] THREAD_ARRAY = new Thread[THREAD_COUNT];

  private int counter = 0;

  public static void main(String args[]) throws InterruptedException {
    printEnvironmentInfo();
    execute();
    printResults();
  }

  public static void printEnvironmentInfo() {
    System.out.println("java.runtime.version = "
        + System.getProperty("java.runtime.version"));
    System.out.println("java.vendor = "
        + System.getProperty("java.vendor"));
    System.out.println("java.version = "
        + System.getProperty("java.version"));
    System.out.println("os.name = "
        + System.getProperty("os.name"));
    System.out.println("os.version = "
        + System.getProperty("os.version"));
  }

  public static void execute() throws InterruptedException {
    for (int i = 0; i < THREAD_COUNT; i++) {
      RUNNABLE_ARRAY[i] = new LockingInJava();
      THREAD_ARRAY[i] = new Thread(RUNNABLE_ARRAY[i]);
    }
    for (Thread t : THREAD_ARRAY) {
      t.start();
    }
    // let the threads run for a while
    Thread.sleep(10000);

    for (int i = THREAD_COUNT - 1; i >= 0; i--) {
      THREAD_ARRAY[i].interrupt();
    }
    Thread.sleep(100); // wait a moment for termination, too lazy for join ;)
  }

  public static void printResults() {
    for (int i = 0; i < RUNNABLE_ARRAY.length; i++) {
      System.out.println("runnable[" + i + "]: " + RUNNABLE_ARRAY[i]);
    }
  }

  public void run() {
    for (;;) {
      synchronized (LOCK) {
        counter++;
        try {
          Thread.sleep(10);
        } catch (InterruptedException ex) {
          break;
        }
      }
    }
  }

  public String toString() {
    return "counter=" + counter;
  }
}
When run under Sun's JDK version 1.6.0_11 on a 64-bit dual-core AMD Opteron running Linux, here is what gets printed on the console.
java.runtime.version = 1.6.0_11-b03
java.vendor = Sun Microsystems Inc.
java.version = 1.6.0_11
os.name = Linux
os.version = 2.6.25-gentoo-r6
runnable[0]: counter=1002
runnable[1]: counter=0
runnable[2]: counter=0
runnable[3]: counter=0
runnable[4]: counter=0
Notice how only the first thread gets any work done. All the other threads are completely starved while waiting for the lock. The same application run under JDK 1.5 or older yields much more uniform results: some threads gain access to the lock more often than others, but all threads get some access. With JDK 1.6, access to locks is dispensed in a biased fashion. In other words, the thread currently holding the lock is always favored over the other competing threads. Sun calls this biased locking. It purportedly makes better use of CPU capabilities.

As for the argument that no guarantees are offered about fairness: while true in letter, it is invalid in spirit. The argument is too punctilious and formalistic. Any developer reading the documentation will naturally assume that while there are no guarantees, the synchronization mechanism will not actively impede fairness.

Biased locking, as introduced in JDK 1.6, will affect thousands of unsuspecting applications. The fact that the bias is intentional does not make it less damaging. It's still a bug.

19 comments:

Matt said...

I get the same results on my MacPro.

Java 5 (32-bit):
$ java LockingInJava
java.runtime.version = 1.5.0_16-b06-284
java.vendor = Apple Inc.
java.version = 1.5.0_16
os.name = Mac OS X
os.version = 10.5.7
runnable[0]: counter=202
runnable[1]: counter=202
runnable[2]: counter=201
runnable[3]: counter=201
runnable[4]: counter=201

Java 6 (64-bit):
java.runtime.version = 1.6.0_07-b06-153
java.vendor = Apple Inc.
java.version = 1.6.0_07
os.name = Mac OS X
os.version = 10.5.7
runnable[0]: counter=1010
runnable[1]: counter=0
runnable[2]: counter=0
runnable[3]: counter=0
runnable[4]: counter=0

Rainer Jung said...

I vaguely remember that there is a biased locking fix in the release notes of the freshly released 1.6.0_14. It could be interesting to retest.

Unknown said...

You might try -XX:-UseBiasedLocking.

Dave (http://blogs.sun.com/dave)
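
For readers unfamiliar with HotSpot switches, the flag above goes on the java command line when launching the test program, for example:

$ java -XX:-UseBiasedLocking LockingInJava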

Unknown said...

Following up in more detail, biased locking is an attempt to defray the local latency costs of the atomic instructions used for normal lock acquisition. It shouldn't have any impact on the contended case or lock succession policies. When the JVM encounters contention it'll immediately revoke the bias (if any) on the object, and the object will revert to normal locking mode. (Even then, biasing is disabled for objects allocated early in execution, so the "LOCK" object should be ineligible for bias.)

Over time the JVM has shifted the trade-off between fairness and overall aggregate throughput toward throughput, at the expense of short-term fairness. If monitors were completely fair, for instance, we'd context switch incessantly. Instead we try to keep "hot" threads on the processor by letting them barge or dominate locks. This also improves cache residency.

If you need a fair lock then the fair forms in j.u.c are excellent and will give you precisely what you want.

Dave (http://blogs.sun.com/dave)
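
To make that suggestion concrete, here is a minimal sketch of the run() method of LockingInJava rewritten with a fair ReentrantLock from java.util.concurrent; the FAIR_LOCK field and the restructured loop are illustrative, not part of the original program:

import java.util.concurrent.locks.ReentrantLock;

// Passing true requests a fair lock: waiting threads acquire it in
// roughly first-come-first-served order, at some throughput cost.
static final ReentrantLock FAIR_LOCK = new ReentrantLock(true);

public void run() {
  for (;;) {
    FAIR_LOCK.lock();
    try {
      counter++;
      Thread.sleep(10);
    } catch (InterruptedException ex) {
      break;
    } finally {
      // always release the lock, even when interrupted
      FAIR_LOCK.unlock();
    }
  }
}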

huxi said...
This comment has been removed by the author.
huxi said...

Just for the sake of completeness:

You can find my related wiki article at http://bit.ly/10uYpr

Ceki said...

Hello Dave,

Thank you for taking the time to respond. I appreciate it. Your explanation about the tradeoff between fairness and overall aggregate throughput is both clear and convincing. I have no quarrel with the general concept of favouring the "hot" thread in order to reduce switching.

However, the utterly lopsided and unfair behavior observed on multi-core systems, as illustrated by the LockingInJava application (see above) on Linux and Mac OS X, is unconscionable. How is it compatible with your statement that "[biased locking] shouldn't have any impact on the contended case or lock succession"? From what I can observe, and contrary to your assertion, when the JVM encounters contention it does not immediately revoke the bias, and the object does not revert to normal locking mode.

It seems that either I fail to understand what you are trying to say, or there is actually a serious problem in the implementation of locks in the JVM for Linux and OS X.

Unknown said...

I think there's an underlying misconception about biased locking. Again, bias should be revoked immediately the 1st time an object is shared (locked by more than one thread). So at the onset of contention the lock should revert immediately to its normal unbiased state. The "bias" in biased locking doesn't give one thread special "priority" over other threads, but instead allows one thread to re-lock an object more efficiently without the use of high-latency atomic instructions. When a 2nd thread tries to lock a biased object (be it contention or just simple sharing) we revert to normal locking. Other than the one-time revocation costs, biased locking should have no impact on contended locking performance. If it does, it's a bug.

You might try -XX:-UseBiasedLocking to disable biased locking. I just ran LockingInJava on a 64x Solaris/SPARC system and observed the same behavior -- extremely skewed progress -- both with and without biased locking. But I'd urge you to give it a try to verify the results on your particular platform.

Finally, if you transliterate LockingInJava to C with pthread mutexes you'll see the same skewed progress behavior, at least on linux and solaris. (I didn't have an opportunity to try macos.)

Ceki said...

Thank you for clarifying the point about the locking bias being revoked upon contention. The "Java SE 6 Performance White Paper" states as much: "Locking attempts by threads other than the one toward which the object is biased will cause an operation whereby the bias is revoked."

As you suggested, I have just run the LockingInJava application with the "-XX:-UseBiasedLocking" switch, with the same (extremely skewed) results. It looks like there is indeed a bug on multi-core processors running *nix operating systems, one that is unaffected by turning off biased locking.

The transliteration to C with pthread mutexes you mention indicates that the problem is not even Java specific. If so, one could further conclude that Linux and Solaris (and potentially OS X) share the same code as far as this particular matter is concerned, which would be as surprising as it is unlikely.

Anyway, do we/you consider the existence of a bug established? And, if so, where do we go from here?

Unknown said...

Hello Ceki,

I was using C/pthreads as an example to support my claim that locking primitives are tending toward extremely unfair behavior on most modern systems. While sometimes undesirable, I don't believe the behavior is broken; in fact, it's arguably required in order to give moderately contended code a fighting chance to scale up on modern multicore processors. (It'll never scale ideally, of course.) Large commercial customers seem to prefer the current behavior. Note that Java monitors don't map directly to native mutexes, so I'm not saying that the Java behavior arises from the policies of the underlying mutexes.

If you need fairness then other primitives are available, such as the fair locks in j.u.c. (Depending on your particular usage you might suffer reduced throughput with such locks, however).

BTW, see also http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4985566. (The comments were mine, as is the "Likewise, no assumptions..." constraint given in the release notes.)

Ceki said...

Thank you for clarifying the point on "java behavior [not necessarily] arising from the policies of the underlying mutexes."

Bug 4985566, which very closely resembles the case I am describing in this blog entry, mentions JDK 1.4 as the Java platform. Subsequently, I ran LockingInJava on Sun's as well as BEA's JDK 1.5 on Linux. Interestingly enough, the results were as skewed as with Sun's JDK 1.6. As mentioned previously, running the same application with IBM's JDK 1.6 yields much more balanced results.

In summary, the issue does not seem to be necessarily related to the "biased locks" introduced in JDK 1.6, but was present in earlier JDKs as well.

As you say, using the fair locks which are part of the j.u.c package can alleviate the problem. However, forcing the Java world to rely on fair locks instead of the primitives present in the Java language, because of the intolerable unfairness of the latter, is imho wrongheaded. The fact that such intolerable unfairness is justified by the phrase "no assumptions should be made about the order in which threads are granted ownership of a monitor" in the Java documentation reminds me of the legal principle that "a manifest abuse of a law is not protected by law."

I recognize that the issue is complicated. However, as Joshua Bloch states, "any reasonable JVM implementation will attempt some sort of fairness when making this determination [about thread allocation]." To be brutally honest, my conclusion is that, at least on the 64-bit Linux platform, and contrary to IBM's JVM, Sun's and BEA's JVMs are not reasonable JVM implementations.

Ceki said...

The previous link to "a manifest abuse of a law is not protected by law" was incorrect. The correct link is included in the previous sentence.

Unknown said...

Playing devil's advocate for a moment, do you feel that the baseline native synchronization operators on windows, macos, linux and solaris are broken?

I'm extremely appreciative of the fact that the formal specification affords an implementer far more latitude than is reasonable to take in practice. We strike balances daily. And I'm fully aware of, and sympathetic toward, the difficulties arising from the varying progress properties exhibited by the various JVMs. But at the same time I have a set of customers who strongly lobbied for the current mechanism.

I could certainly institute some enforced fairness in the JVM, but fairness is defined over some time interval. In the experimental prototypes that I've used to explore fairness policies I've used fairly coarse intervals (~ 1 second). What's your opinion? Specifically, if a thread has dominated a lock for "N" msecs then the next time it tries to acquire the lock it should abdicate to another thread on the entry list of the lock and enqueue itself at the tail of that list. What's your feeling on reasonable "N" values? If the interval is too short then we drive up the context switch rate, so I can't dial it down too far.

The approach is vaguely similar to the anti-languishing mechanism found in the solaris schedulers (and elsewhere). If threads languish too long on the ready queues without being dispatched, their priorities will slowly creep up until they run, after which the priority boost will be rescinded. In our case we're worried about threads languishing on the queue while a thread (or small set of threads) dominates the lock.
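
To make the abdication policy above concrete, here is a hypothetical sketch; the class name AbdicatingLock, the interval N_MS and the bookkeeping fields are all invented for illustration and do not correspond to an actual JVM mechanism:

import java.util.concurrent.locks.ReentrantLock;

// Hypothetical: a thread that has dominated the lock for more than N
// msecs steps aside before re-acquiring, so queued threads get a chance.
class AbdicatingLock {
  private static final long N_MS = 1000;    // the fairness interval "N"
  private final ReentrantLock lock = new ReentrantLock();
  private volatile Thread dominator;        // thread currently dominating
  private volatile long dominationStart;    // when its dominance began

  void lock() {
    Thread self = Thread.currentThread();
    if (self == dominator
        && System.currentTimeMillis() - dominationStart > N_MS
        && lock.hasQueuedThreads()) {
      // Abdicate: a real implementation would enqueue itself at the tail
      // of the entry list; yielding is a crude stand-in.
      dominator = null;
      Thread.yield();
    }
    lock.lock();
    if (dominator != self) {
      dominator = self;
      dominationStart = System.currentTimeMillis();
    }
  }

  void unlock() {
    lock.unlock();
  }
}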

Ceki said...

I am not too familiar with the native synchronization operators on windows, macos, linux and solaris. However, if, on any of those systems, a lock could be released and re-obtained ad infinitum by the same thread while other competing threads starved, I'd say that that specific operating system offered an unreasonable synchronization primitive. For the record, I am not against some unfairness in the scheduler. My objection is against absolute and prolonged unfairness. As you say, the issue is really about striking a balance.

To answer your question about reasonable values for N, any value less than 2 or 3 seconds would probably be perfectly OK. I have also noticed that, in the LockingInJava application above, merely changing Thread.sleep(10) to Thread.sleep(1) within the try/catch block yields rather balanced and acceptable results. I am thus starting to wonder whether the LockingInJava application is representative of a real use case, and by extension, whether there is a problem at all.

In logback, the logging framework I work on, a lock will usually be released after 3 microseconds, although under certain rare circumstances a lock could be retained for about 10 milliseconds. In any case, values of N in the order of a second should be fine.

I am curious: within the context of the scheduler, wouldn't accessing the current time be too costly an operation in itself?

By the way, the legal principle I was referring to is called "a manifest abuse of a right is not protected by law".

Unknown said...

Responding out of order ...

Thanks - it's useful to get a sense of what type of tolerances your application has with respect to fairness.

As for native synchronization primitives, solaris and linux both allow indefinite lock "barging". I'm pretty sure that the latest windows and macos do as well, although I haven't verified those lately.

Regarding Thread.sleep(1) vs sleep(10), sleep times are quantized differently on various platforms. On some systems sleep(1) will round up to at least 10 msecs. I'm assuming you're running on a newer linux kernel (tickless or 1000HZ) or something similar. With a longer sleep period (10 msecs) it'll almost always be the case that the competing threads will have given up on spinning and blocked in the kernel. When the lock holder releases the lock it'll select an "heir presumptive" and unpark that thread. But in your case the dominating thread will try to reacquire the lock before the wakee manages to become active. With shorter sleep times it's more likely that the contending threads will be spinning instead of blocked, in which case they can immediately pounce on the lock.

Regarding your question about short-term scheduling and querying the time as part of the algorithm to avoid starvation: yes, getting the time can be costly. Instead, I might use a global low-frequency thread, or ensure that at least one thread on the entry list blocks using a timed park, and have that thread set a "MustAbdicate" flag that would be checked in the fast-path lock acquire path.

Anonymous said...

With a small change, the locking seems to be fair:

public void run() {
  for (;;) {

    // START critical section
    synchronized (LOCK) {
      counter++;
    }
    // END critical section

    try {
      Thread.sleep(10);
    } catch (InterruptedException ex) {
      break;
    }
  }
}

artemv said...

@Ceki
Don't understand your concern. After all, a bullet point of biased locking is support for the "hot" (in terms of working memory) thread. Once it releases LOCK and immediately competes for it again, it's still considered "hot" and wins, and this is cool, 'cause total throughput is greater -- look at @Matt's results (let's pretend that counter is shared, i.e. not thread-local as in your original code).

But if after LOCK's release the "hot" thread goes to sleep or proceeds with some operation other than LOCK acquisition, its working memory gets cooled down, and next time it competes for LOCK it may or may not win -- look at @Anonymous's example.

I believe biased locking gets more efficient as contention gets higher.

Anonymous said...

Hi Ceki,

I just noticed that your loop condition is wrong. Instead of

for (int i = THREAD_COUNT - 1; i <= 0; i--) {
  THREAD_ARRAY[i].interrupt();
}

it must be i >= 0. The loop in your code is never executed.

Greetings,
Marcus

Eyal Schneider said...

Actually there's another flaw in the code above. The counter variable isn't properly handled: the writes are protected by the lock, while the reads (in toString()) aren't. This doesn't conform to the Java Memory Model rules. Interestingly, the output you observed is legal according to the JMM, and can be explained by thread visibility issues.
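
For completeness, here is a minimal sketch of two possible fixes for the visibility issue raised above, assuming the rest of LockingInJava is unchanged; either change alone suffices:

// Option 1: declare the field volatile, so that increments performed
// under the lock are visible to the thread calling toString().
private volatile int counter = 0;

// Option 2: read under the same LOCK object that guards the increments.
public String toString() {
  synchronized (LOCK) {
    return "counter=" + counter;
  }
}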