Java Concurrency in Practice Notes: Avoid Liveness Hazards

There is often a tension between safety and liveness. We use locking to ensure thread safety, but indiscriminate of locking can cause lock-ordering deadlocks. Similarly, we use thread pool and semaphores to bound resource consumption, but failure to understand the activities being bounded cause resource deadloacks.

Avoiding Liveness Hazards

Deadlock

When thread A holds lock L and tries to acquire lock M, but at the same time thread B holds M and tries to acquire L, both threads will wait forever. This situation is the simplest case of deadlock, when multiple threads wait forever due to a cyclic locking dependency (Think of the treads as the nodes of a directed graph whose edges represent the relation “Thread A is waiting for a resource held by thread B”. If this graph is cyclical, there is a deadlock)

Like many other concurrency hazards, deadlocks rarely manifest themselves immediately. The fact that a class has a potential deadlock doesn’t mean that it ever will deadlock, just that it can. When deadlocks do manifest themselves, it is often as the worst possible time - under heavy production load.

Lock-ordering deadlocks

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
public class LeftRightDeadlock {
private final Object left = new Object();
private final Object right = new Object();
public void leftRight() {
synchronized (left) {
synchronized (right) {
doSomething();
}
}
}
public void rightLeft() {
synchronized (right) {
synchronized (left) {
doSomethingElse();
}
}
}
void doSomething() {
}
void doSomethingElse() {
}
}

LeftRightDeadlock is at the risk for deadlock. If one thread calls leftRight and another calls rightLeft, and their actions are interleaved as shown below, they will deadlock.

Deadlock

The deadlock in LeftRightDeadLock came about because the two threads attempted to acquire the same locks in a different order. If they asked for the locks in the same order, there would be no cyclic locking dependency and therefor no deadlock.

A program will be free of lock-ordering deadlocks if all threads acquire the locks they need in a fixed global order.

Dynamic lock order deadlocks

Sometimes it is not obvious that you have sufficient control over lock ordering to prevent deadlocks.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
public class DynamicOrderDeadlock {
// Warning: deadlock-prone!
public static void transferMoney(Account fromAccount,
Account toAccount,
DollarAmount amount)
throws InsufficientFundsException {
synchronized (fromAccount) {
synchronized (toAccount) {
if (fromAccount.getBalance().compareTo(amount) < 0)
throw new InsufficientFundsException();
else {
fromAccount.debit(amount);
toAccount.credit(amount);
}
}
}
}
}

transferMoney acquires the locks on both Account objects before executing the transfer, ensuring that the balances are updated atomically and without violating invariants such as “an account can not have a negative balance”.

Deadlock can occur if two threads call transferMoney at the same time, one transferring X to Y, and the other doing the opposite:

1
2
A: transferMoney(myAccount, yourAccount, 10);
B: transferMoney(yourAccount, myAccount, 10);

Since the order of arguments is out of our control, to fix the problem we must induce an ordering on the locks and acquire them according to the induced ordering consistently throughout the application.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
public class InduceLockOrder {
private static final Object tieLock = new Object();
public void transferMoney(final Account fromAcct,
final Account toAcct,
final DollarAmount amount)
throws InsufficientFundsException {
class Helper {
public void transfer() throws InsufficientFundsException {
if (fromAcct.getBalance().compareTo(amount) < 0)
throw new InsufficientFundsException();
else {
fromAcct.debit(amount);
toAcct.credit(amount);
}
}
}
int fromHash = System.identityHashCode(fromAcct);
int toHash = System.identityHashCode(toAcct);
if (fromHash < toHash) {
synchronized (fromAcct) {
synchronized (toAcct) {
new Helper().transfer();
}
}
} else if (fromHash > toHash) {
synchronized (toAcct) {
synchronized (fromAcct) {
new Helper().transfer();
}
}
} else {
synchronized (tieLock) {
synchronized (fromAcct) {
synchronized (toAcct) {
new Helper().transfer();
}
}
}
}
}
interface DollarAmount extends Comparable<DollarAmount> {
}
interface Account {
void debit(DollarAmount d);
void credit(DollarAmount d);
DollarAmount getBalance();
int getAcctNo();
}
class InsufficientFundsException extends Exception {
}
}

InduceLockOrder uses System.identityHashCode to generate an ordering on the input arguments to avoid deadlock. But if two objects has the same hascode, then a third “tie breaking” lock is used.

If Account has a unique, immutable, comparable key such as an account number, inducing a lock ordering is even easier: order objects by their key, thus eliminating the need for the tie-breaking lock.

Deadlocks are a serious problem in real systems. A production application may perform billions of lock acquire-release cycles per day. Only one of those needs to be timed just wrong to bring the application to deadlock. DemonstratedDeadLock deadlocks fairly on most systems:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
public class DemonstrateDeadlock {
private static final int NUM_THREADS = 20;
private static final int NUM_ACCOUNTS = 5;
private static final int NUM_ITERATIONS = 1000000;
public static void main(String[] args) {
final Random rnd = new Random();
final Account[] accounts = new Account[NUM_ACCOUNTS];
for (int i = 0; i < accounts.length; i++)
accounts[i] = new Account();
class TransferThread extends Thread {
public void run() {
for (int i = 0; i < NUM_ITERATIONS; i++) {
int fromAcct = rnd.nextInt(NUM_ACCOUNTS);
int toAcct = rnd.nextInt(NUM_ACCOUNTS);
DollarAmount amount = new DollarAmount(rnd.nextInt(1000));
try {
DynamicOrderDeadlock.transferMoney(accounts[fromAcct], accounts[toAcct], amount);
} catch (DynamicOrderDeadlock.InsufficientFundsException ignored) {
}
}
}
}
for (int i = 0; i < NUM_THREADS; i++)
new TransferThread().start();
}
}

Deadlocks between cooperating objects

Multiple lock acquisition is not always as obvious as in LeftRightDeadlock or transferMoney, the two locks need not be acquired by the same method.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
public class CooperatingDeadlock {
// Warning: deadlock-prone!
class Taxi {
@GuardedBy("this") private Point location, destination;
private final Dispatcher dispatcher;
public Taxi(Dispatcher dispatcher) {
this.dispatcher = dispatcher;
}
public synchronized Point getLocation() {
return location;
}
public synchronized void setLocation(Point location) {
this.location = location;
if (location.equals(destination))
dispatcher.notifyAvailable(this);
}
public synchronized Point getDestination() {
return destination;
}
public synchronized void setDestination(Point destination) {
this.destination = destination;
}
}
class Dispatcher {
@GuardedBy("this") private final Set<Taxi> taxis;
@GuardedBy("this") private final Set<Taxi> availableTaxis;
public Dispatcher() {
taxis = new HashSet<Taxi>();
availableTaxis = new HashSet<Taxi>();
}
public synchronized void notifyAvailable(Taxi taxi) {
availableTaxis.add(taxi);
}
public synchronized Image getImage() {
Image image = new Image();
for (Taxi t : taxis)
image.drawMarker(t.getLocation());
return image;
}
}
}

While no method explicitly acquires two locks, callers of setLocation and getImage can acquire two locks just the same.

If a thread calls setLocation in response to an update from a GPS receiver, it first updates the taxi’s location and then checks to see if it has reached its destination. If it has, it informs the dispatcher that it needs a new destination. Since both setLocation and notifyAvailable are synchronized, the thread calling setLocation acquires the Taxi lock and then the Dispatcher lock. Similarly, a thread calling getImage acquires the Dispatcher lock and then each Taxi lock. Two locks are acquired by two threads in different orders, risking deadlock.

Invoking an alien method with a lock held is asking for liveness trouble. The alien method might acquire other locks (risking deadloack) or block for an unexpectedly long time, stalling other threads that need the lock you hold.

Open calls

Calling an alien method with a lock held is difficult to analyze and therefore risky.

Calling a method with no locks held is called an open call, and classes that rely on open calls are more well-behaved and composable than classes that make calls with locks held. Using open calls to avoid deadlock is analogous to using encapsulation to provide thread safety: while one can certainly construct a thread-safe program without any encapsulation, the thread safety analysis of a program that makes effective use of encapsulation is far easier than that of one that does not. Similarly, the liveness analysis of a program that relies exclusively on open calls is far easier than that of one that does not.

Strive to use open calls throughout your program. Programs that rely on open calls are far easier to analyze for deadlock-freedom than those that allow calls to alien methods with locks held.

Restructuring a synchronized block to allow open calls can sometimes have undesirable consequences, since it takes an operation that was atomic and makes it not atomic. In some cases, the loss of atomicity is a problem, and here you will have to use another technique to achieve atomicity. One such technique is to structure a concurrent object so that only one thread can execute the code path following the open call.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
class CooperatingNoDeadlock {
@ThreadSafe
class Taxi {
@GuardedBy("this") private Point location, destination;
private final Dispatcher dispatcher;
public Taxi(Dispatcher dispatcher) {
this.dispatcher = dispatcher;
}
public synchronized Point getLocation() {
return location;
}
public synchronized void setLocation(Point location) {
boolean reachedDestination;
synchronized (this) {
this.location = location;
reachedDestination = location.equals(destination);
}
if (reachedDestination)
dispatcher.notifyAvailable(this);
}
public synchronized Point getDestination() {
return destination;
}
public synchronized void setDestination(Point destination) {
this.destination = destination;
}
}
@ThreadSafe
class Dispatcher {
@GuardedBy("this") private final Set<Taxi> taxis;
@GuardedBy("this") private final Set<Taxi> availableTaxis;
public Dispatcher() {
taxis = new HashSet<Taxi>();
availableTaxis = new HashSet<Taxi>();
}
public synchronized void notifyAvailable(Taxi taxi) {
availableTaxis.add(taxi);
}
public Image getImage() {
Set<Taxi> copy;
synchronized (this) {
copy = new HashSet<Taxi>(taxis);
}
Image image = new Image();
for (Taxi t : copy)
image.drawMarker(t.getLocation());
return image;
}
}
class Image {
public void drawMarker(Point p) {
}
}
}

Resource deadlocks

Just as threads can deadlock when they are each waiting for a lock that the other holds and will not release, they can also deadlock when waiting for resources.

e.g. If a task requires connections to both databases and the two resources are not always requested in the same order, thread A could be holding a connection do database D1 while waiting for a connection to database D2, and thread B could be holding a connection to D2 while waiting for a connection to D1.

Avoiding and diagnosing deadlocks

A lock that never acquires more than one lock at a time cannot experience lock-ordering deadlock. Of course, this is not always practical, but if you can get away with it, it’s a lot less work. If you must acquire multiple locks, lock ordering must be a part of your design: try to minimize the number of potential locking interactions, and follow and document a lock-ordering protocol for locks that may be acquired together.

Timed lock attemps

Another technique for detecting and recovering from deadlocks is to use the timed tryLock feature of the explicit Lock classes instead of intrinsic locking. Where intrinsic locks wait forever if they cannot acquire the lock, explicit locks let you specify a timeout after which tryLock returns failure.

Deadlock analysis with thread dumps

While preventing deadlocks is mostly your problem, the JVM can help identify them when they do happen using thread dumps. A thread dump includes a stack trace for each running thread, similar to the stack trace that accompanies an exception. Thread dumps also include locking information, such as which locks are held by each thread, in which stack frame they were acquired, and which lock a blocked thread is waiting to acquire.

Other liveness hazards

Starvation

Starvation occurs when a thread is perpetually denied access to resources it needs in order to make progress; the most commonly starved resource is CPU cycles. Starvation in Java applications can be caused by inappropriate use of thread priorities. It can slso be caused by executing nonterminating constructs (infinite loops or resource waits that do not terminate) with a lock held, since other threads that need that lock will never be able to acquire it.

Avoid the temptation to use thread priorities, since they increase platform dependence and can cause liveness problems. Most concurrent applications can use the default priority for all threads.

Poor responsiveness

One step removed from starvation is poor responsiveness, which is not uncommon in GUI applications using background threads.

Poor responsiveness can also be caused by poor lock management.

Livelock

Livelock is a form of liveness failure in which a thread, while not blocked, still cannot make progress because it keeps retrying an operation that will always fail.

Livelock often occurs in transactional messaging applications, where the messaging infrastructure rolls back a transaction if a message cannot be processed successfully, and puts it back at the head of the queue. If a bug in the message handler for a partucular type of message causes it to fail, every time the message is dequeued and passed to the buggy handler, the transaction is rolled back. Since the message is not wack at the head of the queue, the handler is called over and over again. (This is sometimes called the poison message problem) This form of livelock often comes from overeager error-recovery code that mistakenly treats an unrecoverable error as a recoverable one.

Livelock can also occur when multiple cooperating threads change their state in response to the others in such a way that no thread can ever make progress.

The solution of this variety of livelock is to introduce some randomness into the retry mechanism. Retrying with random waits and backoffs can be equally effective for avoiding livelock in concurrent applications.

Summary

Liveness failures are a serious problem because there is no way to recover from them short of aborting the application.

The most common form of liveness failure is lock-ordering deadlock. Avoiding lock ordering deadlock starts at design time: ensure that when threads acquire multiple locks, they do so in a consistent order. The best way to do this is by using open calls throughout your program. This greatly reduces the number of places where multiple locks are held at once, and makes it more obvious where those places are.