Java Concurrency in Practice Notes: Introduction, Thread Safety

(These notes are from reading Brian Goetz's Java Concurrency in Practice.)

Preface

Multicore processors are just becoming inexpensive enough for midrange desktop systems, and many development teams are noticing more and more threading-related bug reports in their projects.

Testing and debugging multithreaded programs can be extremely difficult because concurrency bugs do not manifest themselves predictably. And when they surface, it is often at the worst possible time - in production, under heavy load.

This book’s goal is to give readers a set of design rules and mental models that make it easier - and more fun - to build correct, performant concurrent classes and applications in Java.

Chapter 1. Introduction

Why bother with concurrency? Threads are an inescapable feature of the Java language, and they can simplify the development of complex systems by turning complicated asynchronous code into simpler straight-line code.

Finding the right balance of sequentiality and asynchrony is often a characteristic of efficient people - and the same is true of programs.

Benefits of threads:

  • Exploiting multiple processors
  • Simplicity of modeling
  • Simplified handling of asynchronous events
  • More responsive user interfaces

Risks of threads:

  • Safety hazards
  • Liveness hazards
  • Performance hazards

Threads are everywhere:

Even if your program never explicitly creates a thread, frameworks may create threads on your behalf, and code called from these threads must be thread-safe.

It would be nice to believe that concurrency is an “optional” or “advanced” language feature, but the reality is that nearly all Java applications are multithreaded, and these frameworks do not insulate you from the need to properly coordinate access to application state.

Chapter 2. Thread Safety

Concurrent programming isn’t so much about threads or locks, any more than civil engineering is about rivets and I-beams - these are just mechanisms, means to an end.

Writing thread-safe code is, at its core, about managing access to state, and in particular to shared, mutable state.

Whenever more than one thread accesses a given state variable, and one of them might write to it, they all must coordinate their access to it using synchronization.

The primary mechanism for synchronization in Java is the synchronized keyword, which provides exclusive locking, but the term “synchronization” also includes the use of volatile variables, explicit locks, and atomic variables.
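The other mechanisms mentioned here can be sketched side by side - a volatile flag for visibility and an explicit ReentrantLock for mutual exclusion (the class and method names below are hypothetical, chosen just to illustrate):

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch of two synchronization mechanisms besides the synchronized
// keyword: a volatile flag (guarantees visibility across threads, but
// not atomicity of compound actions) and an explicit ReentrantLock
// (guarantees mutual exclusion, like an intrinsic lock).
public class SyncMechanisms {
    private volatile boolean shutdownRequested;          // visibility only
    private final ReentrantLock lock = new ReentrantLock();
    private long count;                                  // guarded by 'lock'

    public void requestShutdown()        { shutdownRequested = true; }
    public boolean isShutdownRequested() { return shutdownRequested; }

    public void increment() {
        lock.lock();
        try {
            ++count;                     // compound action made atomic by the lock
        } finally {
            lock.unlock();               // always release in a finally block
        }
    }

    public long getCount() {
        lock.lock();
        try {
            return count;
        } finally {
            lock.unlock();
        }
    }
}
```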

If multiple threads access the same mutable state variable without appropriate synchronization, your program is broken. There are three ways to fix it:

  • Don’t share the state variable across threads
  • Make the state variable immutable
  • Use synchronization whenever accessing the state variable
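The second option - making the state immutable - can be sketched as follows (a minimal example with hypothetical names; “mutation” produces a new object instead of changing shared state):

```java
// Sketch of fixing a shared-state problem via immutability: all fields
// are final and set once in the constructor, so the object can be
// freely shared across threads with no synchronization at all.
public final class ImmutablePoint {
    private final int x;
    private final int y;

    public ImmutablePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public int getX() { return x; }
    public int getY() { return y; }

    // "Mutation" returns a new instance; the original is never changed.
    public ImmutablePoint translate(int dx, int dy) {
        return new ImmutablePoint(x + dx, y + dy);
    }
}
```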

It is far easier to design a class to be thread-safe than to retrofit it for thread safety later.

What is thread safety?

At the heart of any reasonable definition of thread safety is the concept of correctness. Correctness means that a class conforms to its specification. A good specification defines invariants constraining an object’s state and postconditions describing the effects of its operations.

A class is thread-safe if it behaves correctly when accessed from multiple threads, regardless of the scheduling or interleaving of the execution of those threads by the runtime environment, and with no additional synchronization or other coordination on the part of calling code.

Thread-safe classes encapsulate any needed synchronization so that clients need not provide their own.

And stateless objects are always thread-safe.
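A stateless class in the spirit of the book's StatelessFactorizer, minus the servlet plumbing (this simplified version is an assumption, not the book's exact code): with no fields, every piece of state lives in local variables confined to the calling thread's stack, so the class is thread-safe by construction.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a stateless (and therefore thread-safe) service object:
// no fields, so concurrent callers cannot interfere with each other.
public class StatelessFactorizer {
    public List<Long> factor(long n) {
        List<Long> factors = new ArrayList<>(); // local: thread-confined
        for (long f = 2; f * f <= n; f++) {
            while (n % f == 0) {                // divide out each prime factor
                factors.add(f);
                n /= f;
            }
        }
        if (n > 1) factors.add(n);              // remaining prime, if any
        return factors;
    }
}
```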

Atomicity

Race condition: The possibility of incorrect results in the presence of unlucky timing. It happens when the correctness of a computation depends on the relative timing or interleaving of multiple threads by the runtime.

It is the invalidation of observations that characterizes most race conditions - using a potentially stale observation to make a decision or perform a computation: check-then-act

Example: race conditions in lazy initialization

@NotThreadSafe
public class LazyInitRace {
    private ExpensiveObject instance = null;

    public ExpensiveObject getInstance() {
        if (instance == null) {  // Check-then-act
            instance = new ExpensiveObject();
        }
        return instance;
    }
}

To avoid race conditions, there must be a way to prevent other threads from using a variable while we’re in the middle of modifying it, so we can ensure that other threads can observe or modify the state only before we start or after we finish, but not in the middle.

Operation A and B are atomic with respect to each other if, from the perspective of a thread executing A, when another thread executes B, either all of B has executed or none of it has. An atomic operation is one that is atomic with respect to all operations, including itself, that operate on the same state.

@ThreadSafe
public class CountingFactorizer implements Servlet {
    private final AtomicLong count = new AtomicLong(0);

    public long getCount() { return count.get(); }

    public void service(ServletRequest request, ServletResponse response) {
        // ...
        count.incrementAndGet();
        // ...
    }
}

When practical, use existing thread-safe objects, like AtomicLong, to manage your class’s state. It is simpler to reason about the possible states and state transitions for existing thread-safe objects than it is for arbitrary state variables, and this makes it easier to maintain and verify thread safety.

Locking

What if we want to add more than one state to our servlet? Imagine that we want to improve the performance of our servlet by caching the most recently computed result.

@NotThreadSafe
public class UnsafeCachingFactorizer extends GenericServlet implements Servlet {
    private final AtomicReference<BigInteger> lastNumber
        = new AtomicReference<BigInteger>();
    private final AtomicReference<BigInteger[]> lastFactors
        = new AtomicReference<BigInteger[]>();

    public void service(ServletRequest req, ServletResponse resp) {
        BigInteger i = extractFromRequest(req);
        if (i.equals(lastNumber.get()))
            encodeIntoResponse(resp, lastFactors.get());
        else {
            BigInteger[] factors = factor(i);
            lastNumber.set(i);
            lastFactors.set(factors);
            encodeIntoResponse(resp, factors);
        }
    }
}

The above approach does not work: although each atomic reference is individually thread-safe, the invariant relating lastNumber and lastFactors can still be violated, because the two updates are not performed as a single atomic operation.

To preserve state consistency, update related state variables in a single atomic operation.

Intrinsic locks

Java provides a built-in locking mechanism for enforcing atomicity: the synchronized block:

synchronized (lock) {
    // Access or modify shared state guarded by lock
}

Every Java object can implicitly act as a lock for purposes of synchronization; these built-in locks are called intrinsic locks or monitor locks.

Which means we could do this:

@ThreadSafe
public class SynchronizedFactorizer extends GenericServlet implements Servlet {
    @GuardedBy("this") private BigInteger lastNumber;
    @GuardedBy("this") private BigInteger[] lastFactors;

    public synchronized void service(ServletRequest req,
                                     ServletResponse resp) {
        BigInteger i = extractFromRequest(req);
        if (i.equals(lastNumber))
            encodeIntoResponse(resp, lastFactors);
        else {
            BigInteger[] factors = factor(i);
            lastNumber = i;
            lastFactors = factors;
            encodeIntoResponse(resp, factors);
        }
    }
}

But that leads to unacceptably poor concurrency: only one request can be serviced at a time.

Reentrancy

When a thread requests a lock that is already held by another thread, the requesting thread blocks. But because intrinsic locks are reentrant, if a thread tries to acquire a lock that it already holds, the request succeeds.

Reentrancy means that locks are acquired on a per-thread rather than per-invocation basis.
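The classic illustration of why reentrancy matters is the book's Widget example (sketched here from memory with a return value added so the behavior is observable): a synchronized subclass method calls the synchronized superclass method on the same object. Without reentrant locks, the second acquisition of the intrinsic lock on `this` would deadlock the thread against itself.

```java
// Sketch of reentrancy: LoggingWidget.doSomething() holds the lock on
// 'this' and then calls super.doSomething(), which is also synchronized
// on 'this'. Reentrancy lets the owning thread re-acquire its own lock.
public class Widget {
    public synchronized String doSomething() {
        return "widget";
    }
}

class LoggingWidget extends Widget {
    @Override
    public synchronized String doSomething() {
        // Lock on 'this' is already held here; the call below
        // re-acquires it, which succeeds because intrinsic locks
        // are reentrant (per-thread, not per-invocation).
        return "logging:" + super.doSomething();
    }
}
```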

Guarding state with locks

For each mutable state variable that may be accessed by more than one thread, all accesses to that variable must be performed with the same lock held. In this case, we say that the variable is guarded by that lock.

The fact that every object has a built-in lock is just a convenience so that you needn’t explicitly create lock objects. It is up to you to construct locking protocols or synchronization policies that let you access shared state safely.

Every shared, mutable variable should be guarded by exactly one lock. Make it clear to maintainers which lock that is.
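One way to make the guarding lock obvious to maintainers is a dedicated private lock object, with a comment (or an @GuardedBy annotation, where available) naming it on each guarded field. A minimal sketch with hypothetical names:

```java
// Sketch of the "one variable, one lock" discipline: 'count' is guarded
// by the private 'lock' object, and every read and write of it happens
// inside a synchronized block on that same lock.
public class GuardedCounter {
    private final Object lock = new Object();
    private long count; // guarded by 'lock'

    public void increment() {
        synchronized (lock) {
            ++count;              // compound read-modify-write, now atomic
        }
    }

    public long get() {
        synchronized (lock) {
            return count;         // reads need the lock too, for visibility
        }
    }
}
```

Using a private lock object rather than `this` keeps the lock encapsulated, so client code cannot accidentally (or maliciously) participate in the class's synchronization policy.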

When a class has invariants that involve more than one state variable, there is an additional requirement: each variable participating in the invariant must be guarded by the same lock. This allows you to access or update them in a single atomic operation, preserving the invariant.

Liveness and performance

Fortunately, it is easy to improve the concurrency of the servlet while maintaining thread safety by narrowing the scope of the synchronized block. You should be careful not to make the scope of the synchronized block too small; you would not want to divide an operation that should be atomic into more than one synchronized block. But it is reasonable to try to exclude from synchronized blocks long-running operations that do not affect shared state, so that other threads are not prevented from accessing the shared state while the long-running operation is in progress.

@ThreadSafe
public class CachedFactorizer extends GenericServlet implements Servlet {
    @GuardedBy("this") private BigInteger lastNumber;
    @GuardedBy("this") private BigInteger[] lastFactors;
    @GuardedBy("this") private long hits;
    @GuardedBy("this") private long cacheHits;

    public synchronized long getHits() {
        return hits;
    }

    public synchronized double getCacheHitRatio() {
        return (double) cacheHits / (double) hits;
    }

    public void service(ServletRequest req, ServletResponse resp) {
        BigInteger i = extractFromRequest(req);
        BigInteger[] factors = null;
        synchronized (this) {
            ++hits;
            if (i.equals(lastNumber)) {
                ++cacheHits;
                factors = lastFactors.clone();
            }
        }
        if (factors == null) {
            factors = factor(i);
            synchronized (this) {
                lastNumber = i;
                lastFactors = factors.clone();
            }
        }
        encodeIntoResponse(resp, factors);
    }
}

It would be safe to use AtomicLong here, but there is less benefit than there was in CountingFactorizer. Atomic variables are useful for effecting atomic operations on a single variable, but since we are already using synchronized blocks to construct atomic operations, using two different synchronization mechanisms would be confusing and would offer no performance or safety benefit.

There is frequently a tension between simplicity and performance. When implementing a synchronization policy, resist the temptation to prematurely sacrifice simplicity (potentially compromising safety) for the sake of performance.

Avoid holding locks during lengthy computations or operations at risk of not completing quickly, such as network or console I/O.
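The usual pattern for this (sketched below with hypothetical names) is to copy the shared state into a local variable inside a short synchronized block, then perform the slow operation with no lock held:

```java
// Sketch of keeping slow work outside the lock: the critical section
// only copies a reference to shared state; formatting (or any slow
// I/O that would go here) happens afterwards, lock-free.
public class StatusReporter {
    private final Object lock = new Object();
    private String status = "starting"; // guarded by 'lock'

    public void setStatus(String s) {
        synchronized (lock) {
            status = s;
        }
    }

    public String snapshotAndFormat() {
        String copy;
        synchronized (lock) {
            copy = status;              // brief critical section: copy only
        }
        // Lengthy formatting or I/O would run here without blocking
        // other threads that need to read or update 'status'.
        return "status=" + copy;
    }
}
```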

(To Be Continued)