Tuesday, March 8, 2011

Must Go Faster

Transaction management systems, like much other middleware, embody a very simple tradeoff: they make life easier for developers at the cost of a runtime overhead. Maximising the ease of use and minimising the overhead are things we spend a fair bit of time thinking about.

Conventional wisdom in the transaction community is that the runtime cost of a classic ACID transaction is dominated by the disk writing needed to create a log for crash recovery purposes. As with much folklore there is some truth in this, but it's not the whole story. For starters, there are some use cases where we don't need to write a log at all.

In transactions that have only one resource it's easy to optimise the 2PC protocol to a 1PC and avoid much of the overhead. But here is the snag: the design of the transaction API in Java EE does not allow developers to communicate metadata about the number of resource managers expected in the transaction ahead of time. In some cases it's not actually known, as it may depend on runtime data values. However, it's disappointing that you still need some parts of the XA protocol overhead even where the transaction is known at design time to be local (native) to a single RM.

xaresource1.start(xid, XAResource.TMNOFLAGS);
xaresource1.end(xid, XAResource.TMSUCCESS);
xaresource1.commit(xid, true); // onePhase = true


is still three network round trips and one disk sync. That's a huge improvement on the eight trips and three syncs needed for a 2PC, but it's nevertheless more than the single trip and sync in the native case with

connection.commit();

Fortunately there is a workaround: define both XA and non-XA ('local-tx' in JBossAS -ds.xml terminology) datasources in the app server and use the local one wherever you know the transaction won't involve other RMs.
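To make the 1PC call count concrete, here is a minimal sketch that records the calls a transaction manager would make on a single XAResource. The class OnePhaseSketch and its methods are hypothetical names for illustration, and the recording proxy merely stands in for a real RM, where each recorded call would be a network round trip. Only the javax.transaction.xa interfaces from the JDK are used.

```java
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

/** Hypothetical illustration only; not how any real TM is implemented. */
final class OnePhaseSketch {

    /** Returns a fake XAResource that appends the name of every method called on it. */
    static XAResource recordingResource(List<String> calls) {
        return (XAResource) Proxy.newProxyInstance(
                OnePhaseSketch.class.getClassLoader(),
                new Class<?>[] { XAResource.class },
                (proxy, method, args) -> {
                    calls.add(method.getName()); // one call ~ one round trip to the RM
                    return null;                 // start/end/commit all return void
                });
    }

    /** The single-resource sequence: no prepare, commit with onePhase = true. */
    static void commitOnePhase(XAResource xar, Xid xid) throws XAException {
        xar.start(xid, XAResource.TMNOFLAGS);
        // ... business logic would use the connection here ...
        xar.end(xid, XAResource.TMSUCCESS);
        xar.commit(xid, true);
    }

    /** Runs the sequence against the recorder and returns the calls seen. */
    static List<String> demo() {
        List<String> calls = new ArrayList<>();
        try {
            // The recorder ignores the Xid, so null is passed; a real RM needs a valid one.
            commitOnePhase(recordingResource(calls), null);
        } catch (XAException e) {
            throw new IllegalStateException(e);
        }
        return calls;
    }
}
```

Calling OnePhaseSketch.demo() yields the three calls start, end and commit, against the single commit a local-tx connection needs.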

Perhaps one day we'll have a less clunky solution - maybe being able to define a single datasource as supporting both XA and non-XA cases, then annotating transactional methods with e.g. @OneResource or @MultiResource to tell the JCA and TM how to manage the connection. Or even being able to escalate an RM local tx to an XA one on demand rather than having to choose in advance, although that would need RM support as well as changes to the XA protocol and Java APIs. Dream On.

Even where it's running with 1PC optimization for a single RM, the transaction manager still provides benefits over the native connection based transaction management. The most critical is the ability to handle certain lifecycle events, in particular beforeCompletion(), a notification phase that allows in-memory caches such as an ORM session to be flushed to stable store and its companion afterCompletion() which allows for resource cleanup. The TM's ability to manage transaction timeouts is also important to prevent poorly written operations from locking up resources for a protracted period.
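The ordering of those lifecycle callbacks can be sketched in a few lines. Everything here is an assumption made for illustration: the Synchronization interface is a local stand-in that mirrors the shape of javax.transaction.Synchronization so the example compiles without a JTA jar, and ToyTransaction is a hypothetical single-resource coordinator, not a real TM.

```java
import java.util.ArrayList;
import java.util.List;

/** Local stand-in mirroring the shape of javax.transaction.Synchronization. */
interface Synchronization {
    void beforeCompletion();
    void afterCompletion(int status);
}

/** Hypothetical toy coordinator for one resource; illustration only. */
final class ToyTransaction {
    static final int STATUS_COMMITTED = 3;  // same values as javax.transaction.Status
    static final int STATUS_ROLLEDBACK = 4;

    private final List<Synchronization> syncs = new ArrayList<>();
    private final Runnable resourceCommit;

    ToyTransaction(Runnable resourceCommit) {
        this.resourceCommit = resourceCommit;
    }

    void registerSynchronization(Synchronization sync) {
        syncs.add(sync);
    }

    int commit() {
        int status = STATUS_COMMITTED;
        try {
            // Caches get a chance to flush before the resource commits.
            for (Synchronization s : syncs) s.beforeCompletion();
            resourceCommit.run();
        } catch (RuntimeException e) {
            status = STATUS_ROLLEDBACK; // a real TM would also roll the resource back
        }
        // Cleanup callbacks run whatever the outcome.
        for (Synchronization s : syncs) s.afterCompletion(status);
        return status;
    }

    /** Shows the callback order for a successful commit. */
    static List<String> demo() {
        List<String> events = new ArrayList<>();
        ToyTransaction tx = new ToyTransaction(() -> events.add("commit"));
        tx.registerSynchronization(new Synchronization() {
            public void beforeCompletion() { events.add("flush"); }
            public void afterCompletion(int status) { events.add("cleanup:" + status); }
        });
        tx.commit();
        return events;
    }
}
```

In the demo, the flush happens before the resource commits and the cleanup happens after it, which is exactly what a bare connection.commit() can't give you.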

As with writing logs for 2PC recovery, the management of timeouts is one of those activities we have to do every time, even though it turns out to be required in only a tiny minority of cases. Managing the rollback of transactions that have exceeded their allotted lifetime is a seemingly trivial overhead compared to the log write, and as a result the code for it received little attention until fairly recently. This is where the folklore came to bite us: conventional wisdom dictated that we should focus the performance tuning effort on the I/O paths and not worry too much about functions that just operated on in-memory structures.

WRONG.

For the reasons outlined above, in a typical app server workload there are an awful lot of transactions containing just a single resource and hence not doing a log write. D'Oh. For those use cases the overhead of the in-memory activity in the TM is actually significant. So we sat down, wrote a highly concurrent 1PC microbenchmark test scenario, put it through a profiler, shuddered and went down the pub.

When we'd recovered a bit we tuned the transaction reaper, a background process responsible for timing out transactions. By deferring much of the work as long as possible it turns out to be possible to skip it entirely for many transactions. A short lived tx does not always need to be inserted into the time ordered reaper queue - it may terminate normally long before the reaper needs to take any action. By being lazy we saved a lot of list sorting and the associated locking overhead.
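The lazy idea can be sketched roughly as follows. This is not the JBossTS reaper code: LazyReaper, Tx and the sweep method are hypothetical names, and the design (a cheap unordered staging queue drained by one reaper thread into a deadline-ordered queue) is just one plausible way to defer the sorting work.

```java
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical sketch of a lazily populated timeout queue. */
final class LazyReaper {

    /** Hypothetical stand-in for a transaction with a deadline. */
    static final class Tx {
        final long deadlineMillis;
        volatile boolean finished;
        Tx(long deadlineMillis) { this.deadlineMillis = deadlineMillis; }
    }

    // Unordered staging area: enlisting is a lock-free append, no sorting.
    private final Queue<Tx> pending = new ConcurrentLinkedQueue<>();
    // Ordered by deadline; touched only by the reaper thread, so no lock needed.
    private final PriorityQueue<Tx> ordered =
            new PriorityQueue<>(Comparator.comparingLong((Tx t) -> t.deadlineMillis));
    // Counts how many transactions ever paid for an ordered insert.
    int sortedInserts = 0;

    /** Called by application threads when a transaction begins. */
    void enlist(Tx tx) { pending.add(tx); }

    /** Called by application threads on normal termination; no queue manipulation. */
    void complete(Tx tx) { tx.finished = true; }

    /** Called periodically by the reaper thread; returns how many transactions timed out. */
    int sweep(long nowMillis) {
        Tx tx;
        while ((tx = pending.poll()) != null) {
            if (!tx.finished) {        // short-lived transactions are simply dropped here
                ordered.add(tx);
                sortedInserts++;
            }
        }
        int timedOut = 0;
        while (!ordered.isEmpty() && ordered.peek().deadlineMillis <= nowMillis) {
            Tx expired = ordered.poll();
            if (!expired.finished) {
                timedOut++;            // a real TM would roll the transaction back here
            }
        }
        return timedOut;
    }
}
```

A transaction that completes before the next sweep never reaches the ordered queue, so it costs one append and one poll rather than a sorted insert under contention.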

As a result the recent JBossTS releases substantially outperform their predecessors in the single resource case, particularly when scaling to a large number of threads. Upgrade and Enjoy.

Next time: More Speed! Very fast I/O for 2PC logging.
