Sunday, March 13, 2011

Transactions 101

When I was at QCon the other day I was asked a number of questions around transactions that made me realise that I really need to take my JavaOne presentation and turn it into some blog entries as I promised last year. So over the next indeterminate amount of time (hey, I'm busy with things like JUDCon too), we'll take a tour through transactions and try to dispel some of the myths that surround them.

So let's start here with some of the basics. And you don't get more basic than defining what we mean by a transaction. Put simply, a transaction provides an “all-or-nothing” (atomic) property to work that is conducted within its scope, whilst at the same time ensuring that shared resources are isolated from concurrent users. Importantly, application programmers typically only have to start and end a transaction; all of the complex work necessary to provide the transaction’s properties is hidden by the transaction system, leaving the programmer free to concentrate on the more functional aspects of the application at hand.

Let’s take a look at just how a transaction system could help in a real-world application environment. Consider the case of an on-line cinema reservation system. The cinema has many seats that can be reserved individually, and the state of a seat is either RESERVED or UNRESERVED. The cinema service exports two operations, reserveSeat and unreserveSeat (we’ll ignore the other operations that are obviously required to make this service truly usable). Finally we’ll assume that there is a transaction manager service that will be used to manage any transactions that the cinema may require in order to process the user’s requests.

Let’s consider a very simple example: imagine that Mr. Doe wants to reserve a block of seats for his family (1A, 1B and 1C). Now, the service only allows a single seat to be reserved through the reserveSeat operation, so this will require Mr. Doe to call it three times, once for each seat. Unfortunately the reservation process may be affected by failures of software or hardware that could affect the overall consistency of the system in a number of ways. For example, if a failure occurs after reserving 1A, then 1A will be reserved but 1B and 1C will not. Mr. Doe can try to complete the reservation once the cinema service recovers (assuming it eventually does), but by this time someone else may have reserved the remaining seats.

What Mr. Doe really wants is the ability to reserve multiple seats as an atomic (indivisible) block. This means that despite failures and concurrent access, either all of the seats Mr. Doe requires will be reserved for him, or none will. At first glance this may seem like a fairly straightforward thing to achieve, but it actually requires a lot of effort to ensure that these requirements can be guaranteed. Fortunately atomic transactions possess the following (ACID) properties that make them suitable for this kind of scenario:

• Atomicity: Either the transaction completes successfully (commits) or, if it fails (aborts), all of its effects are undone (rolled back).
• Consistency: Transactions produce consistent results and preserve application specific invariants.
• Isolation: Intermediate states produced while a transaction is executing are not visible to others. Furthermore, transactions appear to execute serially, even if they are actually executed concurrently.
• Durability: The effects of a committed transaction are never lost (except by a catastrophic failure).
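To make the atomicity property a little more concrete, here is a minimal sketch of Mr. Doe's block booking. Only reserveSeat, unreserveSeat and the RESERVED/UNRESERVED states come from the example above; the Cinema, SeatTransaction and BlockBooking classes are hypothetical names invented for illustration. It is a toy: everything lives in memory, so it says nothing about durability or crash recovery, and it does not give real isolation either (other users can see a seat as reserved before the block is committed).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory cinema. Only reserveSeat/unreserveSeat and the
// RESERVED/UNRESERVED states come from the text above.
class Cinema {
    enum SeatState { RESERVED, UNRESERVED }

    private final Map<String, SeatState> seats = new HashMap<>();

    Cinema(String... seatIds) {
        for (String id : seatIds) {
            seats.put(id, SeatState.UNRESERVED);
        }
    }

    synchronized void reserveSeat(String id) {
        if (stateOf(id) == SeatState.RESERVED) {
            throw new IllegalStateException("seat already reserved: " + id);
        }
        seats.put(id, SeatState.RESERVED);
    }

    synchronized void unreserveSeat(String id) {
        stateOf(id); // throws if the seat id is unknown
        seats.put(id, SeatState.UNRESERVED);
    }

    synchronized SeatState stateOf(String id) {
        SeatState state = seats.get(id);
        if (state == null) {
            throw new IllegalArgumentException("no such seat: " + id);
        }
        return state;
    }
}

// Toy transaction (an assumption, not a real transaction API): it simply
// remembers which seats it reserved so that it can undo them on rollback.
class SeatTransaction {
    private final Cinema cinema;
    private final Deque<String> reservedSoFar = new ArrayDeque<>();

    SeatTransaction(Cinema cinema) {
        this.cinema = cinema;
    }

    void reserve(String id) {
        cinema.reserveSeat(id);
        reservedSoFar.push(id);
    }

    void commit() {
        reservedSoFar.clear(); // nothing left to undo
    }

    void rollback() {
        while (!reservedSoFar.isEmpty()) {
            cinema.unreserveSeat(reservedSoFar.pop());
        }
    }
}

class BlockBooking {
    // Returns true if every seat was reserved, false if none were.
    static boolean reserveAll(Cinema cinema, String... ids) {
        SeatTransaction tx = new SeatTransaction(cinema);
        try {
            for (String id : ids) {
                tx.reserve(id);
            }
            tx.commit();
            return true;
        } catch (RuntimeException failure) {
            tx.rollback(); // undo whatever was reserved before the failure
            return false;
        }
    }
}
```

If someone else already holds 1C, a call such as BlockBooking.reserveAll(cinema, "1A", "1B", "1C") returns false and leaves 1A and 1B unreserved, rather than leaving Mr. Doe with a partial booking.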

A transaction can be terminated in two ways: committed or aborted (rolled back). When a transaction is committed, all changes made within it are made durable (forced on to stable storage, e.g., disk). When a transaction is aborted, all of the changes are undone. Atomic transactions can also be nested, in which case the effects of a nested transaction are provisional upon the commit/abort of the outermost (top-level) atomic transaction.
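The "provisional" nature of nested transactions is easier to see in code. The following is a rough sketch under some loud assumptions: NestedTx is a hypothetical class, not any product's API, and a plain Map stands in for stable storage. Committing a nested transaction only hands its changes to its parent; nothing reaches the store until the top-level transaction commits.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical nested transaction over a key/value store. The Map passed to
// begin() is a stand-in for stable storage; a real system would force to disk.
class NestedTx {
    private final NestedTx parent;            // null for a top-level transaction
    private final Map<String, String> store;  // shared "durable" state
    private final Map<String, String> pending = new HashMap<>();
    private boolean finished;

    private NestedTx(NestedTx parent, Map<String, String> store) {
        this.parent = parent;
        this.store = store;
    }

    static NestedTx begin(Map<String, String> store) {
        return new NestedTx(null, store);
    }

    NestedTx beginNested() {
        checkActive();
        return new NestedTx(this, store);
    }

    void put(String key, String value) {
        checkActive();
        pending.put(key, value);
    }

    // Reads see this transaction's changes, then its ancestors', then the store.
    String get(String key) {
        for (NestedTx tx = this; tx != null; tx = tx.parent) {
            if (tx.pending.containsKey(key)) {
                return tx.pending.get(key);
            }
        }
        return store.get(key);
    }

    void commit() {
        checkActive();
        finished = true;
        if (parent != null) {
            parent.pending.putAll(pending); // still provisional on the parent
        } else {
            store.putAll(pending);          // top-level: changes become visible
        }
    }

    void abort() {
        checkActive();
        finished = true;
        pending.clear();                    // all changes are undone
    }

    private void checkActive() {
        if (finished) {
            throw new IllegalStateException("transaction already terminated");
        }
    }
}
```

So if a nested transaction marks seat 1A as RESERVED and commits, but the top-level transaction later aborts, the store never sees the reservation.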

Associated with every transaction is a coordinator, which is responsible for governing the outcome of the transaction. The coordinator may be implemented as a separate service or may be co-located with the user for improved performance. It communicates with enlisted participants to inform them of the desired termination requirements, i.e., whether they should accept (commit) or reject (rollback) the work done within the scope of the given transaction. In our cinema example, this means telling them whether to confirm the (provisionally reserved) seats for the user or to release them. A transaction manager factory is typically responsible for managing coordinators for many transactions. The initiator of the transaction (e.g., the client) communicates with a transaction manager and asks it to start a new transaction and associate a coordinator with the transaction.

Traditional transaction systems use a two-phase protocol to achieve atomicity between participants (a three-phase protocol may also be supported, but it rarely is these days). During the first (preparation) phase, each participant must make durable any state changes that occurred within the scope of the transaction, such that these changes can either be rolled back or committed later once the transaction outcome has been determined. Assuming no failures occurred during the first phase, in the second (commitment) phase participants may “overwrite” the original state with the state made durable during the first phase. If any participant was unable to prepare, the coordinator instead tells the participants to roll back.
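Here is a minimal sketch of the coordinator and the two phases described above. The Participant interface, the Coordinator class and the method names are assumptions made for illustration; they are not the API of any particular transaction manager. The sketch also skips everything that makes the real thing hard: there is no logging, no timeout handling and no recovery.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical participant contract for the two-phase protocol.
interface Participant {
    boolean prepare();  // phase 1: make changes durable, vote true to commit
    void commit();      // phase 2: make the prepared changes permanent
    void rollback();    // phase 2: discard the changes (assumed idempotent)
}

// Hypothetical coordinator for a single transaction.
class Coordinator {
    enum Outcome { COMMITTED, ROLLED_BACK }

    private final List<Participant> participants = new ArrayList<>();

    void enlist(Participant participant) {
        participants.add(participant);
    }

    Outcome end() {
        // Phase 1: ask every participant to prepare, stopping at the first "no".
        for (Participant participant : participants) {
            boolean vote;
            try {
                vote = participant.prepare();
            } catch (RuntimeException failure) {
                vote = false; // a failure during prepare counts as a "no" vote
            }
            if (!vote) {
                // Phase 2 (rollback): tell everyone, prepared or not, to undo.
                for (Participant other : participants) {
                    other.rollback();
                }
                return Outcome.ROLLED_BACK;
            }
        }
        // A real coordinator would force its commit decision to a log here.
        // Phase 2 (commit): everyone voted yes.
        for (Participant participant : participants) {
            participant.commit();
        }
        return Outcome.COMMITTED;
    }
}

// Test double that records the calls it receives, in order.
class RecordingParticipant implements Participant {
    private final String name;
    private final boolean vote;
    private final List<String> events;

    RecordingParticipant(String name, boolean vote, List<String> events) {
        this.name = name;
        this.vote = vote;
        this.events = events;
    }

    @Override
    public boolean prepare() {
        events.add(name + ".prepare");
        return vote;
    }

    @Override
    public void commit() {
        events.add(name + ".commit");
    }

    @Override
    public void rollback() {
        events.add(name + ".rollback");
    }
}
```

Enlisting two RecordingParticipants that both vote yes and calling end() drives prepare on both before commit on either; if one of them votes no, neither ever sees a commit.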

In order to guarantee consensus, two-phase commit is necessarily a blocking protocol: after returning the first phase response, each participant that returned a commit response must remain blocked until it has received the coordinator’s phase 2 message. Until it receives this message, any resources used by the participant are unavailable for use by other transactions, since allowing such use may result in non-ACID behavior. If the coordinator fails before delivery of the second phase message, these resources remain blocked until it recovers.
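The blocking behaviour can be illustrated from the participant's side. In this sketch, SeatParticipant and its IN_DOUBT state are hypothetical names, and two simplifications are made: a second transaction is refused outright rather than made to wait, and nothing is written to stable storage during prepare.

```java
// Hypothetical participant guarding a single seat. Between prepare and the
// coordinator's phase 2 message the seat is "in doubt" and nobody else can
// use it.
class SeatParticipant {
    enum State { FREE, IN_DOUBT, RESERVED }

    private State state = State.FREE;
    private String owner; // id of the transaction that prepared the seat

    // Phase 1: vote to commit only if the seat is free, then hold on to it.
    synchronized boolean prepare(String txId) {
        if (state != State.FREE) {
            return false; // held by an in-doubt transaction, or already taken
        }
        state = State.IN_DOUBT;
        owner = txId;
        return true;
    }

    // Phase 2 (commit): the provisional reservation becomes permanent.
    synchronized void commit(String txId) {
        requireInDoubtFor(txId);
        state = State.RESERVED;
    }

    // Phase 2 (rollback): release the seat for other transactions.
    synchronized void rollback(String txId) {
        requireInDoubtFor(txId);
        state = State.FREE;
        owner = null;
    }

    synchronized State state() {
        return state;
    }

    private void requireInDoubtFor(String txId) {
        if (state != State.IN_DOUBT || !txId.equals(owner)) {
            throw new IllegalStateException(txId + " has not prepared this seat");
        }
    }
}
```

If the coordinator for the first transaction never sends its phase 2 message, the seat stays IN_DOUBT and every later prepare is turned away, which is exactly the cost of blocking described above.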

As we’ve mentioned, transactions are required to provide fault tolerance. What this means is that information about running transactions (often referred to as in-flight transactions) and the participants involved must survive failures and be accessible during recovery. This information (the transaction log) is held in some durable state-store. Typically the transaction log is scanned to determine whether there are transactions mentioned in it that require recovery to be performed. If there are, then the information within the log is used to recreate the transaction and the recovery subsystem will then continue to complete the transaction.
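A rough sketch of that recovery scan might look like the following. Everything here is an assumption made for illustration: TxLog, LogRecord and RecoveryManager are hypothetical names, the record kinds are invented, and an in-memory list stands in for the durable state-store. Instead of really contacting participants, recovery just reports the commit messages it would re-send.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical transaction log. A real one lives in a durable store.
class TxLog {
    enum Kind { COMMIT_DECIDED, COMPLETED }

    static final class LogRecord {
        final String txId;
        final Kind kind;
        final List<String> participants;

        LogRecord(String txId, Kind kind, List<String> participants) {
            this.txId = txId;
            this.kind = kind;
            this.participants = participants;
        }
    }

    private final List<LogRecord> records = new ArrayList<>();

    void append(LogRecord record) {
        records.add(record);
    }

    // Scan the log for transactions whose outcome was decided but which
    // never completed. Returns txId -> participants still to be told.
    Map<String, List<String>> needingRecovery() {
        Map<String, List<String>> pending = new LinkedHashMap<>();
        for (LogRecord record : records) {
            if (record.kind == Kind.COMMIT_DECIDED) {
                pending.put(record.txId, record.participants);
            } else {
                pending.remove(record.txId);
            }
        }
        return pending;
    }
}

class RecoveryManager {
    // Re-drives phase 2 for each transaction found by the scan and returns a
    // description of the messages a real system would send.
    static List<String> recover(TxLog log) {
        List<String> redriven = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : log.needingRecovery().entrySet()) {
            for (String participant : entry.getValue()) {
                redriven.add(entry.getKey() + " -> commit " + participant);
            }
            // Mark the transaction as finished so it is not recovered again.
            log.append(new TxLog.LogRecord(
                    entry.getKey(), TxLog.Kind.COMPLETED, Collections.<String>emptyList()));
        }
        return redriven;
    }
}
```

Running recover a second time over the same log finds nothing to do, which is the property you want from a recovery pass that may itself be interrupted and restarted.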

Failures aren’t restricted to just the transaction coordinator. Therefore, participants must retain sufficient information in a durable store so that they too can be recovered in the event of a failure. What information is recorded will obviously depend upon the participant implementation.

OK, so that's ACID in a nutshell. Next time we'll move on to take a look at the protocol that may run either side of two-phase commit: synchronizations.
