Java Concurrency in Practice Notes: Building Blocks (Part 2)

Building Blocks

Building an efficient, scalable result cache

Nearly every server application uses some form of caching. Reusing the results of a previous computation can reduce latency and increase throughput, at the cost of some additional memory usage.

Like many other frequently reinvented wheels, caching often looks simpler than it is. A naive cache implementation is likely to turn a performance bottleneck into a scalability bottleneck, even if it does improve single-threaded performance.

First attempt

public class Memoizer1<A, V> implements Computable<A, V> {
    @GuardedBy("this") private final Map<A, V> cache = new HashMap<A, V>();
    private final Computable<A, V> c;

    public Memoizer1(Computable<A, V> c) {
        this.c = c;
    }

    public synchronized V compute(A arg) throws InterruptedException {
        V result = cache.get(arg);
        if (result == null) {
            result = c.compute(arg);
            cache.put(arg, result);
        }
        return result;
    }
}

interface Computable<A, V> {
    V compute(A arg) throws InterruptedException;
}

class ExpensiveFunction
        implements Computable<String, BigInteger> {
    public BigInteger compute(String arg) {
        // after deep thought...
        return new BigInteger(arg);
    }
}

Memoizer1 uses a HashMap to store the results of previous computations. The compute method first checks whether the desired result is already cached, and returns the precomputed value if it is.

HashMap is not thread-safe, so to ensure that two threads do not access the HashMap at the same time, Memoizer1 takes the conservative approach of synchronizing the entire compute method. This ensures thread safety but has an obvious scalability problem: only one thread at a time can execute compute at all. See:

Cache Attempt #1

Second attempt

public class Memoizer2<A, V> implements Computable<A, V> {
    private final Map<A, V> cache = new ConcurrentHashMap<A, V>();
    private final Computable<A, V> c;

    public Memoizer2(Computable<A, V> c) {
        this.c = c;
    }

    public V compute(A arg) throws InterruptedException {
        V result = cache.get(arg);
        if (result == null) {
            result = c.compute(arg);
            cache.put(arg, result);
        }
        return result;
    }
}

Memoizer2 improves on the awful concurrent behavior of Memoizer1 by replacing the HashMap with ConcurrentHashMap. Since ConcurrentHashMap is thread-safe, there is no need to synchronize when accessing the backing Map, thus eliminating the serialization induced by synchronizing compute in Memoizer1.

Memoizer2 certainly has better concurrent behavior than Memoizer1: multiple threads can actually use it concurrently. But it still has some defects as a cache - there is a window of vulnerability in which two threads calling compute at the same time could end up computing the same value. See:

Cache Attempt #2
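The window of vulnerability can be made visible with a small experiment. The sketch below (names like Memoizer2RaceDemo and the latch-based choreography are illustrative, not from the book) uses a CountDownLatch to hold two threads inside Memoizer2-style check-then-act logic at the same time, so the "expensive" computation demonstrably runs twice for the same argument:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class Memoizer2RaceDemo {
    // Forces both threads into the check-then-act window with a latch,
    // then counts how many times the computation actually ran.
    static int runRace() throws InterruptedException {
        final Map<String, Integer> cache = new ConcurrentHashMap<String, Integer>();
        final AtomicInteger computations = new AtomicInteger();
        final CountDownLatch bothMissed = new CountDownLatch(2);
        Runnable task = new Runnable() {
            public void run() {
                try {
                    Integer result = cache.get("f(27)");
                    if (result == null) {               // check...
                        bothMissed.countDown();
                        bothMissed.await();             // both threads are now inside the window
                        computations.incrementAndGet(); // the computation runs redundantly
                        cache.put("f(27)", 27);         // ...then act
                    }
                } catch (InterruptedException ignored) {
                }
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return computations.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runRace()); // 2: both threads computed the same value
    }
}
```

Both threads read a null from the cache before either one puts, so the count is deterministically 2 - exactly the duplicate work Memoizer2 permits.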

Third attempt

The problem with Memoizer2 is that if one thread starts an expensive computation, other threads are not aware that the computation is in progress and so may start the same computation. We’d like to somehow represent the notion that “thread X is currently computing f(27)”, so that if another thread arrives looking for f(27), it knows that the most efficient way to find it is to head over to Thread X’s house, hang out there until X is finished, and then ask it for the result.

We’ve already seen a class that does almost exactly this: FutureTask. FutureTask represents a computational process that may or may not already have completed. FutureTask.get returns the result of the computation immediately if it is available; otherwise it blocks until the result has been computed and then returns it.
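This blocking behavior of FutureTask.get can be seen in a minimal sketch (the class and method names here are illustrative):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.FutureTask;

public class FutureTaskDemo {
    // Wraps a slow computation in a FutureTask and retrieves the result.
    static int computeOnAnotherThread() throws Exception {
        FutureTask<Integer> task = new FutureTask<Integer>(new Callable<Integer>() {
            public Integer call() throws InterruptedException {
                Thread.sleep(50); // stand-in for an expensive computation
                return 42;
            }
        });
        new Thread(task).start(); // the computation runs on another thread
        return task.get();        // blocks until the result is available
    }

    public static void main(String[] args) throws Exception {
        System.out.println(computeOnAnotherThread()); // prints 42
    }
}
```

A second call to task.get() would return the already-computed result immediately - which is exactly the property a cache of in-progress computations needs.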

public class Memoizer3<A, V> implements Computable<A, V> {
    private final Map<A, Future<V>> cache
            = new ConcurrentHashMap<A, Future<V>>();
    private final Computable<A, V> c;

    public Memoizer3(Computable<A, V> c) {
        this.c = c;
    }

    public V compute(final A arg) throws InterruptedException {
        Future<V> f = cache.get(arg);
        if (f == null) {
            Callable<V> eval = new Callable<V>() {
                public V call() throws InterruptedException {
                    return c.compute(arg);
                }
            };
            FutureTask<V> ft = new FutureTask<V>(eval);
            f = ft;
            cache.put(arg, ft);
            ft.run(); // call to c.compute happens here
        }
        try {
            return f.get();
        } catch (ExecutionException e) {
            throw LaunderThrowable.launderThrowable(e.getCause());
        }
    }
}

Memoizer3 redefines the backing Map for the value cache as a ConcurrentHashMap<A, Future<V>>, instead of a ConcurrentHashMap<A, V>. Memoizer3 first checks to see if the appropriate calculation has been started. If not, it creates a FutureTask, registers it in the Map, and starts the computation; otherwise it waits for the result of the existing computation. The result might be available immediately or might be in the process of being computed - but this is transparent to the caller of Future.get.

Final attempt

The Memoizer3 implementation is almost perfect: it exhibits very good concurrency, the result is returned efficiently if it is already known, and if the computation is in progress by another thread, newly arriving threads wait patiently for the result.

Memoizer3 has only one defect - there is still a small window of vulnerability in which two threads might compute the same value. This window is far smaller than in Memoizer2, but because the if block in compute is still a nonatomic check-then-act sequence, it is possible for two threads to call compute with the same value at roughly the same time, both see that the cache does not contain the desired value, and both start the computation.

Cache Attempt #3

public class Memoizer<A, V> implements Computable<A, V> {
    private final ConcurrentMap<A, Future<V>> cache
            = new ConcurrentHashMap<A, Future<V>>();
    private final Computable<A, V> c;

    public Memoizer(Computable<A, V> c) {
        this.c = c;
    }

    public V compute(final A arg) throws InterruptedException {
        while (true) {
            Future<V> f = cache.get(arg);
            if (f == null) {
                Callable<V> eval = new Callable<V>() {
                    public V call() throws InterruptedException {
                        return c.compute(arg);
                    }
                };
                FutureTask<V> ft = new FutureTask<V>(eval);
                f = cache.putIfAbsent(arg, ft);
                if (f == null) {
                    f = ft;
                    ft.run();
                }
            }
            try {
                return f.get();
            } catch (CancellationException e) {
                cache.remove(arg, f);
            } catch (ExecutionException e) {
                throw LaunderThrowable.launderThrowable(e.getCause());
            }
        }
    }
}

Memoizer takes advantage of the atomic putIfAbsent method of ConcurrentMap, closing the window of vulnerability in Memoizer3.
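The key property of putIfAbsent is that when two callers race to install a Future for the same argument, exactly one wins and the loser learns which Future won. A minimal sketch of those semantics (the class and value names are illustrative):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PutIfAbsentDemo {
    // Two competing installs for the same key: only the first one wins,
    // and both callers can tell which mapping ended up in the map.
    static String firstWriterWins() {
        ConcurrentMap<String, String> cache = new ConcurrentHashMap<String, String>();
        String prev1 = cache.putIfAbsent("f(27)", "task-A"); // no mapping yet: installs, returns null
        String prev2 = cache.putIfAbsent("f(27)", "task-B"); // mapping exists: no-op, returns "task-A"
        assert prev1 == null;
        assert "task-A".equals(prev2);
        return cache.get("f(27)");
    }

    public static void main(String[] args) {
        System.out.println(firstWriterWins()); // task-A
    }
}
```

This is why Memoizer checks the return value of putIfAbsent: a null return means this thread installed its FutureTask and must run it; a non-null return means another thread got there first, and this thread should simply wait on that thread's Future.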

Caching a Future instead of a value creates the possibility of cache pollution: if a computation is cancelled or fails, future attempts to compute the result will also indicate cancellation or failure. To avoid this, Memoizer removes the Future from the cache if it detects that the computation was cancelled; it might also be desirable to remove the Future upon detecting a RuntimeException if the computation might succeed on a future attempt.

Memoizer also does not address cache expiration, but this could be accomplished by using a subclass of FutureTask that associates an expiration time with each result and periodically scanning the cache for expired entries.
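One way to sketch such a subclass (ExpiringFutureTask and its field names are hypothetical, not from the book) is to record an expiration time at construction so a periodic sweep can evict stale entries:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.FutureTask;

// Hypothetical sketch: a FutureTask that carries an expiration time,
// so a background thread can periodically scan the cache and evict
// entries whose isExpired() returns true.
public class ExpiringFutureTask<V> extends FutureTask<V> {
    private final long expireAtMillis;

    public ExpiringFutureTask(Callable<V> callable, long ttlMillis) {
        super(callable);
        this.expireAtMillis = System.currentTimeMillis() + ttlMillis;
    }

    public boolean isExpired() {
        return System.currentTimeMillis() >= expireAtMillis;
    }
}
```

A sweeper thread could then iterate the cache's entrySet and remove expired entries, using ConcurrentMap.remove(key, value) so that a freshly replaced entry is not evicted by mistake.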

Part 1 Summary

  • It’s the mutable state, stupid

All concurrency issues boil down to coordinating access to mutable state. The less mutable state, the easier it is to ensure thread safety.

  • Make fields final unless they need to be mutable

  • Immutable objects are automatically thread-safe

Immutable objects simplify concurrent programming tremendously. They are simpler and safer, and can be shared freely without locking or defensive copying.
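As a minimal sketch of the pattern (Point is an illustrative name, not from the book): a final class whose state is all final and fixed at construction needs no synchronization at all.

```java
// A minimal immutable value class: final class, final fields,
// state fixed at construction, no setters.
public final class Point {
    private final int x;
    private final int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public int x() { return x; }
    public int y() { return y; }

    // All state is final and assigned in the constructor, so instances
    // can be shared across threads without locking or defensive copying.
}
```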

  • Encapsulation makes it practical to manage the complexity

You could write a thread-safe program with all data stored in global variables, but why would you want to? Encapsulating data within objects makes it easier to preserve their invariants; encapsulating synchronization within objects makes it easier to comply with their synchronization policy.

  • Guard each mutable variable with a lock

  • Guard all variables in an invariant with the same lock

  • Hold locks for the duration of compound actions

  • A program that accesses a mutable variable from multiple threads without synchronization is a broken program

  • Don’t rely on clever reasoning about why you don’t need to synchronize

  • Include thread safety in the design process - or explicitly document that your class is not thread-safe

  • Document your synchronization policy

(To Be Continued)