Book Reading: Java Concurrency in Practice, Chapter 6

Executing tasks in threads

The first step in organizing a program around task execution is identifying sensible task boundaries. Ideally, tasks are independent activities: work that doesn’t depend on the state, result, or side effects of other tasks. Independence facilitates concurrency, as independent tasks can be executed in parallel if there are adequate processing resources.

Server applications should exhibit both good throughput and good responsiveness under normal load. Further, applications should exhibit graceful degradation as they become overload, rather than simply falling over under heavy load. Choosing good task boundaries, coupled with a sensible task execution policy, can help achieve these goals.

Most server applications offer a natrual choice of task boundary, individual client requests. Using individual requests as task boundaries usually offers both independence and approriate task sizing.

There are number of possible policies for scheduling tasks within an application, some of which explit the potential for concurrency better than others. The simplest is to execute tasks sequentially in a single thread. But processing a web request involves a mixed of computation and I/O. The server must perform socket I/O to read the request and write the response, which can block due to network congestion or connectivity problems. It may also perform file I/O or make database requests, which can also block. In a single-threaded server, blocking not only delays completing the current request, but prevents pending requests from being processed at all.

A more responsive approach is to create a new thread for servicing each request, in this way, for each connection, the main loop creates a new thread to process the request instead of processing it within the main thread. This has some main consequences:

Task processing is offloaded from the main thread, enabling the main loop to resume waiting for the next incoming connection more quickly. This enables new connections to be accepted before previous requests complete, improving responsiveness.
Tasks can be processed in parallel, enabling multiple requests to be serviced simultaneously. This may improve throughput if there are multiple processors, or if tasks need to block for any reason such as I/O completion, lock acquisition, or resource availability.

For production use, however, the thread-per-task approach has some practical drawbacks, especially when a large number of threads may be created:

Thread lifecycle overhead. Thread creation and teardown are not free. The actual overhead varies across platform, but thread creation takes time, introducing latency into request processing, and requires some processing activity by the JVM and OS, If requests are frequent and lightweight, as in most server applications, creating a new thread for each request can consume significant computing resources.

Resource consumption. Active threads consume system resources, especially memory. When there are more runnable threads than available processors, threads sit idle. Having many idel threads can tie up a lot of memory, putting pressure on the garbage collector, and having many threads competing for the CPUs can impose other performance cost as well.

Stability. There is a limit on how many threads can be created. The limit varies by platform and is affected by factors including JVM invocation parameters, the requestd stack size in the Thread constructor, and limits on threads placed by the underlying operation system.

Up to a certain point, more threads can improve throughput, but beyond that point creating more threads just slows down your application, and creating one thread too many can cause your entire application to crash horribly.

The `Executor` framework

In Chapter 5, we saw how to use bounded queues to prevent an overloaded appplication from running out of memory. Thread pools offer the same benefit for thread mangedment, and java.util.concurrent provides a flexible thread pool implementation as part of the Executor framework.

Executor provides a standard means of decoupling task submission from task exectution. It is based on the producer-consumer pattern, where activities that submit tasks are the producers and the threads that execute tasks are the consumers. Using a Executor is usually the easiest path to implementing a producer-consumer design in your application.

The value of decoupling submission from execution is that it lets you easiliy specify, and subsequently change without great difficulty the execution policy for a given class of tasks. An execution policy specifies the “what, where, when, and how” of task exectution, including:

In what thread will tasks be executed?
In what order should tasks be executed (FIFO, LIFO, priority order)?
How many tasks may execute concurrently?
How many tasks may be queued pending execution?
If a task has to be rejected because the system is overloaded, which task should be selected as the victim, and how should the application be notified?
What actions should be taken before or after executing a task?

Execution policies are a resource management tool, and the optimal policy depends on the available computing resources and your quality-of-service requiremnts. Separating the specification of execution policy from task submission makes it practical to select an execution policy at depolyment time that is matched to the available hardware.

A thread pool, as its name suggests, manages a homogeneous pool of worker threads. A thread pool is tightly bound to a work queue holding tasks waiting to be executed.

Executing tasks in pool threads has a number of advantages over the thread-per-approach, Reusing an existing thread instead of creating a new one amortizes thread creation and teardown costs over multiple requests. As an added bonus, since the worker thread often already exists at the time the request arrives, the latency associated with thread creation does not delay task execution, thus improve responsiveness. By properly tuning the size of the thread pool, you can have enough threads to keep the processors busy while not having so many that your application runs out of memory or thrashes due to competition among threads for resources.

The class library provides a flexible thread pool implementation along with some useful predefined configurations:

newFixedThreadPool. A fixed-size thread pool creates threads as tasks are submitted, up to the maximum pool size, and then attempts to keep the pol size constant.

newCachedThreadPool. A cached thread pool has more flexibility to reap idle threads when the current size of the pool exceeds the demand for processing, and to add new threads when demand increases, but places no bounds on the size of the pool.

newSingleThreadExecutor. A single-threaded executor creates a single worker thread to process tasks, replacing it if it dies unexpectedly. Tasks are guaranteed to be processed sequentially according to the order imposed by the task queue (FIFO, LIFO, priority order).

newScheduledThreadPool. A fixed-size thread pool that supports delayed and periodic task execution, similar to Timer.

Since the JVM can’t exit until all the (nondaemon) threads have terminated, failling to shut down an Executor could prevent the JVM from exiting. In shutting down an application, there is a spectrum from graceful shutdown to abrupt shutdown, and various points in between. Since Executors provide a service to applications, they should be able to able to shut down as well, both gracefully and abruptly, and feed back information to the application about the status of tasks that were affected by the shutdown.

To address the issue of execution service lifecycle, the ExecutorService interface extends Executor, adding a number of methods for lifecycle management:

public interface ExecutorService extends Executor {
    void shutdown();
    List<Runnable> shutdownNow();
    boolean isShutdown();
    boolean isTerminated();
    boolean awaitTermination(long timeout, TimeUnit unit) throws InterruptedException;
    // ...
}

The lifecycle implied by ExecutorService has three states–running, shutting down, and terminated. The shutdown method initiates a graceful shutdown: no new tasks are accepted but previously submitted tasks are allowed to complete–including those that have not begun execution. The shutdownNow method initiates an abrupt shutdown: it attempts to cancel outstanding tasks and does not start any tasks that are queued but not begun. Once all tasks have completed, the ExecutorService transitions to the terminated state. You can wait for an ExecutorService to reach the terminated state with awaitTermination, or poll for whether it has yet terminated with isTerminated.

The Timer facility manages the execution of deferred and periodic tasks. However, Timer has some drawbacks, and ScheduledThreadPoolExecutor should be thought of as its replacement.

A Timer creates only a single thread for executing timer tasks. If a timer task takes too long to run, the timing accuracy of other TimerTasks can suffer. Another problem with Timer is that it behaves poorly if a TimerTask throws an unchecked exception. The Timer thread doesn’t catch the exception, so an unchecked exception thrown from a TimerTask terminates the timer thread. Timer also doesn’t resurrect the thread in this situation; instead, it erroneously assumes the entire Timer was cancelled. In this case, TimerTasks that are already scheduled but not yet executed are never run, and new tasks cannot be scheduled.

If you need to build your own scheduling service, you may still be able to take advantage of the library by using a DelayQueue, a BlockingQueue implementation that provides the scheduling functionality of ScheduledThreadPoolExecutor. A DelayQueue manages a collection of Delayed objects. A Delayed has a delay time associate with it: DelayQueue lets you take an element only if its delay has expired. Objects are returned from a DelayQueue ordered by the time associated with their delay.

Finding exploitable parallelism

In most server application, there is an obvious task boundary: a single client request. But sometimes good task boundaries are not quite so obvious. In this ection we develop several versions of a component that admit varying degree of concurrency. Our sample component is the page-rendering portion of a browser application, which takes a page of HTML and renders it into an image buffer.

The simplest approach is to process the HTML document sequentially, but is likely to annoy the user, who may have to wait a long time before all the text is rendered.

A less annoying but still sequential approach involves rendering the text elements first, leaving rectangular placeholders for the images, and after completing the initial pass on the document, going back and downloading the images and drawing them into the associated placeholder.

Downloading an image mostly involves waiting for I/O to complete, and during this time the CPU does little work. So the sequential approach may under-utilize the CPU, and also makes the user wait longer than necessary to see the finished page. We can achieve better utilization and responsiveness by breaking the problem into independent tasks that can execute concurrently.

The Executor framework uses Runnable as its basic task representation. Runnable is a fairly limiting abstraction, and has following differences Callable:

Callable can return a value but Runnable can’t.
Callable can throw checked exception so the caller method can catch and handle; while the Runnable can’t throw and all exceptions need to be handled in run method.
Executor.execute method only takes Runnable and Executor.submit takes both Runnable and Callable, a Future is returned by calling submit, but the Runnable one won’t contain any value.

The lifecycle of a task executed by an Executor has four phases: created, submitted, started, and completed. Since tasks can take a long time to run, we also want to be able to cancel a task. In the Executor framework, tasks that have been submitted but not yet started can always be cancelled, and tasks that have started can sometimes be cancelled if they are responsive to interruption.

There are several ways to create a Future to describe a task. The submit methods in ExecutorService all return a Future, so that you can submit a Runnable or a Callable to an executor and get back a Future that can be used to retrieve the result or cancel the task.

Submitting a Runnable or Callable to an Executor consititutes a safe publicatoin of the Runnable or Callable from the submitting thread to the thread that will eventually execute the task. Similarly, setting the result value for a Future consititues a safe publication of the result from the thread in which it was computed to any thread that retrieves it via get.

As a first step towards making the page renderer more concurrent, let’s divide it into two tasks, one that renders the text and one that downloads all the images. So we can create a Callable to download all the images, and submit it to an ExecutorService. This returns a Future describing the task’s execution; when the main task gets to the point where it needs the images, iw waits for the result by calling Future.get. This allows the text to be rendered concurrently with downloading the image data. But there is no need for user to wait for all the images to be downloaded; they would probably prefer to see individual images drawn as they become available.

Parallelizing sequential heterogeneous tasks can significant improving performance, however, assigning a different type of task to each worker does not scale well. Without finding finer-grained parallelism among similar tasks, this approach will yield diminishing returns. A further problem with dividing heterogeneous tasks among multiple workers is that the tasks may have disparate sizes. Thus, trying to increase concurrency by paralleizing heterogeneous activities can be a lot of work, and there is a limit to how much additional concurrency you can get out of it.

If you have a batch of computations to submit to an Executor and you want to retrieve their results as the become available, you could retain the Future associated with each task and repeatedly poll for completion by calling get with a timeout of zero. This is possible, but tedious. Fortunately there is a better way: a completion service.

CompletionService combines the functionality of an Executor and a BlockingQueue, You can submit Callable tasks to it for execution and use the queue like methods take and poll to retrieve completed results, packaged as Futures, as they become available. Bascially, individual can think of Executor as something like input queue + thread pool and CompletionService as input queue + thread pool + output queue.

And we can use CompletionService to shorter total runtime and improve responsiveness for the page render. We can create a separate task for downloading each image and execute them in a thread pool, turning the sequential download into a parallel one: this reduces the amount of time to download all the images. And by fetching results from the CompletionService and rendering each image as soon as it is available, we can give the user a more dynamic and responsive user interface.

Sometimes, if an activity does not complete within a certain amount of time, the results is no longer needed and the activity can be abandoned. The timed version of Future.get supports this requirement: it returns as soon as the result is ready, but throws TimeoutException if the result is not ready within the timeout period.

When timed tasks run out of time, we may want to stop them so the do not waste computing resources by continuing to compute a result that will not be used. Again, Future can help; if a timed get completes with a TimedoutException, you can cancel the task through the Future. If the task is written to be cancellable, it can be terminated early so as not to consume excessive resources.

Executing tasks in threads

The Executor framework

Finding exploitable parallelism

The `Executor` framework