Friday, December 23, 2011

Transactions making a comeback? They were never away!

Over the years that I've been involved with transaction processing theory and practice (way too many years for me to admit!) I've seen transactions used, abused and ignored in a wide variety of applications and situations. I've discussed many of these scars in articles here and elsewhere for decades (ouch! it pains me to even use such a timeframe!) But every new software wave, whether it's Web Services, object-orientation, or even the Web itself, has come to the inevitable conclusion that transactions are needed in one form or another. Maybe not ACID transactions; maybe it's compensation based, or some other variation on the extended transaction theme.

So it is nice to see that being repeated in the past couple of years with Cloud and NoSQL. If you read this blog enough you'll have heard from me and others that although some believe you need to ditch transactions in order to achieve scalability, others aren't convinced that it is always necessary to do so - and I count us amongst the latter group. Complete reliance on transactions is sometimes too much. However, completely ignoring them and pushing consistency and recovery up to the application is sometimes too much as well. This is something that appears to be dawning on a number of transaction-less implementations and hopefully it's a trend that will continue.

Anyone who knows me or has followed things I've written or presented over the years will also know that I think transactions are a great structuring technique for concurrent programming. Of course in distributed systems we tend to take this for granted: you've inherently got multiple clients/users manipulating your data, typically at the same time, so transactions with their ACID properties allow developers to concentrate on the functional aspects of the application or service, whilst letting the transaction system do the hard work on ensuring isolation and consistency in the presence of these concurrent users (and possibly failures).

But distribution isn't the only place where you get concurrency, particularly these days with multi-core processors. Back in the last century (ouch!) some of us in academia and industry had access to multi-processor machines, which weren't as common as you might expect (they were expensive!) But when you had them it quickly became apparent that transactions could help with developing applications that had no distribution in them but were (massively) parallel. From this early work techniques such as software transactional memory were born. Back then it was more of an edge case and people found it hard to understand why you'd need transactions without distribution or even a database involved. Well, the advances in hardware have obviously silenced most of those critics and we're seeing more and more vendors, open source projects, languages etc. looking at transactions as a fundamental component and not just an add-on for some scenarios.

So what does all of this mean? Well first of all I think it's great to see transactions continuing to have an impact in these new waves. Fundamental requirements like fault tolerance, concurrency control, security etc. are fundamental for a reason. Secondly I think it's fair to say that, as with previous waves, we'll see transaction theory and implementations adapt and change to better address some of the new concerns and requirements that are bound to arise. I'm excited by all of this, as I am whenever there's something new to apply transactions to. I'm also excited by the fact that JBossTS (implementation and team) is well placed to help drive some of this, as we've done for ... well ... let's just say for quite a long time and leave it at that!

Monday, December 19, 2011

my last commit

Today I created the tag for the JBossTS 4.16.0.Final release, thereby making my last commit to the JBossTS repository as team lead.

It is the end of an era in more ways than one, as 4.16 is planned to be the last feature release for the 4.x line and the last for the JBossTS brand.

Starting in the new year, the project gets a new name, Narayana, a new major version, 5.0, and a new development team lead, Tom Jenkinson. So, exciting times ahead for the project in 2012 and beyond, with fresh blood and fresh ideas.

As for myself, I start 2012 with a shiny new JBoss role looking at Big Data and noSQL. Who knows, there may even be a new blog for you to subscribe to. But first there is the small matter of a seasonal vacation...

Merry Christmas

Jonathan.

Friday, November 11, 2011

HPTS 2011

I've uploaded most of the presentations and poster sessions for HPTS 2011 to the website now. If you're interested in transactions, NoSQL, eventual consistency, data provenance and a whole raft of topical subjects, then you really should check them out!

Thursday, November 10, 2011

unnecessary

Hot on the heels of thinking about information quality, I came across this little gem regarding Spring and JPA configuration:

"Using multiple session factories in a logical transaction imposes challenging issues concerning transaction management... We quickly abandoned [JTA transaction management] due to the potential cost and complexity of an XA protocol with two-phase commit. Since most of the modules of an application share the same data source, the imposed cost and complexity are definitively unnecessary."

The authors then go on to describe how they built a custom solution that causes the same database connection to be used by the modules, removing the need for transaction coordination across multiple connections.

"Extending Spring’s transaction management with SessionFactory swapping and the Shared Transaction Resource pattern was a very challenging task."

That last bit should probably have read "challenging and unnecessary task".

A good JCA will automatically track connections enlisted with a JTA transaction and will reuse the already enlisted connection to satisfy a further dataSource.getConnection() request. Further, even if it does enlist multiple connections, the JTA will use isSameRM to detect that they relate to the same resource manager and thus still maintain the one phase commit optimisation. All of these challenging tasks are taken care of for you by the application server.
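To illustrate the isSameRM part, here is a minimal sketch (not the actual JBossTS code) of the decision a transaction manager can make when a second XAResource is enlisted; the class and method names are illustrative:

import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

public class EnlistmentSketch {

    public static boolean joinIfSameResourceManager(XAResource alreadyEnlisted,
                                                    XAResource candidate,
                                                    Xid existingBranch) throws XAException {
        if (alreadyEnlisted.isSameRM(candidate)) {
            // same resource manager: join the existing branch rather than creating
            // a new one, so a single branch (and the 1PC optimisation) still suffices
            candidate.start(existingBranch, XAResource.TMJOIN);
            return true;
        }

        // different resource manager: a new branch is needed and the transaction
        // becomes a genuine two-phase commit candidate
        return false;
    }
}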

You probably should not bother to invent a better mousetrap until you've determined that current mousetraps don't catch your mice. The imposed cost and complexity are definitively unnecessary.

musings on peer review

I've been reading up in a new field of study recently and as a result have been thinking about information provenance, reputation and collaboration.

Once upon a time, getting up to speed on a new topic in computing science required finding a good library and wading through a stack of conference proceedings and journals. Now it requires an online search and, critically, the ability to evaluate the quality of the massive quantity of material that comes back.

Formal peer review of publications is gone, unable to scale to online use. Meanwhile online systems for review, comment and reputation management are still in their infancy. The best we have right now is an ad-hoc stew of brands, social graphs and distributed conversations.

Those with enough time to invest can build a mental model of a new field that, after the initial investment in learning the landscape, allows them to maintain an ongoing overview of developments with relatively little effort. Those whose work only tangentially involves a deeply technical topic don't have this luxury. They typically perform searches not to learn about the new field in general, but to get specific solutions to a problem outside their core field of expertise, after which they move on. Such users vastly outnumber the domain experts for niche topics like transactions.

What implications does this new model of communication and information dissemination have for the behaviour of professed experts in technical fields? Clearly our obligation to ensure that information we provide is accurate remains unchanged, and is in any case in our own best interest. Should we consider there to be an implied professional obligation to publish information only in a context that allows feedback to be attached directly to it e.g. blog with comments enabled? Or even allow collaborative editing e.g. wiki? How about taking time out to correct misinformation where we find it - is that now part of our social contract?

Question for the audience: Where do you get your information on transactions, and how do you assess its quality?

Thursday, October 20, 2011

pen or pencil for writing logs?

Those old enough to remember the days of manned space flight in the western world will recall that NASA expended considerable resources to come up with a pen that would work in space. Meanwhile, half a world away, the Russians just used a bunch of pencils.

I can't help feeling that story is going to have a new counterpart in the history books. Whilst HP are working on memristor based persistent RAM, someone else just grafted a bunch of flash and a super-capacitor onto a regular DIMM instead. Now I just need the linux guys to come up with a nice API...

New RAM shunts data into flash in power cuts

Tuesday, October 18, 2011

nested transactions 101

You wait ages for an interesting technical problem, then you get the same one twice in as many weeks. If you are a programmer, you now write a script so that you don't have to do any manual work the third time you encounter the problem. If you are a good programmer, you just use the script you wrote the previous week.

When applied to technical support questions, this approach results in the incremental creation of a FAQ. My last such update was way back in April, on the topic of handling XA commit race conditions between db updates and JMS messages. Like I said, you wait ages for an interesting problem, but another has finally come along. So, this FAQ update is all about nested transactions.

Mark has posted about nested transactions before, back in 2010 and 2009. They have of course been around even longer than that, and JBossTS/ArjunaTS has support for them that actually pre-dates Java - it was ported over from C++. So you'd think people would have gotten the hang of them by now, but not so. Nested transactions are still barely understood and even less used.

Let's deal with the understanding bit first. Many people use the term 'nested transactions' to mean different things. A true nested transaction is used mainly for fault isolation of specific tasks within a wider transaction.

tm.begin();
doStuffWithOuterTransaction();
tm.begin();
try {
    doStuffWithInnerTransaction();
    tm.commit();
} catch(Exception e) {
    handleFailureOfInnerTransaction();
}
doMoreStuffWithOuterTransaction();
tm.commit();

This construct is useful where we have some alternative way to achieve the work done by the inner transaction and can call it from the exception handler. Let's try a concrete example:

tm.begin(); // outer tx
bookTheatreTickets();
tm.begin(); // inner tx
try {
    bookWithFavoriteTaxiCompany();
    tm.commit(); // inner tx
} catch(Exception e) {
    tm.begin(); // inner tx
    bookWithRivalTaxiFirm();
    tm.commit(); // inner tx
}
bookRestaurantTable();
tm.commit(); // outer tx

So, when everything goes smoothly you have behaviour equivalent to a normal flat transaction. But when there is minor trouble in a non essential part of the process, you can shrug it off and make forward progress without having to start over and risk losing your precious theatre seats.

As it turns out, there are a number of reasons this is not widely used.

Firstly, it's not all that common to have a viable alternative method available for the inner update in system level transactions. It's more common for business process type long running transactions, where ACID is frequently less attractive than an extended tx model such as WS-BA anyhow. What about the case where you have no alternative method, don't care if the inner tx fails, but must not commit its work unless the outer transaction succeeds? That's what afterCompletion() is for.
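For the curious, here is a minimal sketch of that afterCompletion() idea, using the standard javax.transaction.Synchronization callback (registered via Transaction.registerSynchronization()); the deferred work itself is purely illustrative:

import javax.transaction.Status;
import javax.transaction.Synchronization;

public class DeferredWork implements Synchronization {

    private final Runnable nonEssentialWork;

    public DeferredWork(Runnable nonEssentialWork) {
        this.nonEssentialWork = nonEssentialWork;
    }

    public void beforeCompletion() {
        // nothing to do before the two-phase commit starts
    }

    public void afterCompletion(int status) {
        if (status == Status.STATUS_COMMITTED) {
            try {
                nonEssentialWork.run(); // the outer tx succeeded, so do the extra work now
            } catch (RuntimeException e) {
                // a failure here cannot affect the already-committed transaction
            }
        }
        // on rollback the work is simply never performed
    }
}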

Secondly, but often of greater practical importance, nested transactions are not supported by any of the widely deployed databases, message queuing products or other resource managers. That severely limits what you can do in the inner transaction. You're basically limited to using the TxOJ resource manager bundled with JBossTS, as described in Mark's posts. Give up any thought of updating your database conditionally - it just won't work. JDBC savepoints provide somewhat nested transaction like behaviour for non-XA situations, but they don't work in XA situations. Nor does the XA protocol, foundation of the interoperability between transaction managers and resource managers, provide any alternative. That said, it's theoretically possible to fudge things a bit. Let's look at that example again in XA terms.

tm.begin(); // outer tx
bookTheatreTickets(); // enlist db-A.
tm.begin(); // inner tx
try {
    bookWithFavoriteTaxiCompany(); // enlist db-B.
    tm.commit(); // inner tx - prepare db-B. Don't commit it though. Don't touch db-A.
} catch(Exception e) {
    // oh dear, the prepare on db-B failed. roll it back. Don't rollback db-A though.
    tm.begin(); // inner tx
    bookWithRivalTaxiFirm(); // enlist db-C
    tm.commit(); // inner tx - prepare db-C but don't commit it or touch db-A
}
bookRestaurantTable(); // enlist db-D
tm.commit(); // outer tx - prepare db-A and db-D. Commit db-A, db-C and db-D.

This essentially fakes a nested transaction by manipulating the list of resource managers in a single flat transaction - we cheated a bit by removing db-B part way through, so the tx is not truly ACID across all four participants, only three. JBossTS does not support this, because it's written by purists who think you should use an extended transaction model instead. Also, we don't want to deal with irate users whose database throughput has plummeted because of the length of time that locks are being held on db-B and db-C.

Fortunately, you may not actually need true nested transactions anyhow. There is another sort of nested transaction, properly known as nested top-level, which not only works with pretty much any environment, but is also handy for many common use cases.

The distinction is founded on the asymmetry of the relationship between the outer and inner transactions. For true nested transactions, failure of the inner tx need not impact the outcome of the outer tx, whilst failure of the outer tx will ensure the inner tx rolls back. For nested top-level, the situation is reversed: failure of the outer transaction won't undo the inner tx, but failure of the inner tx may prevent the outer one from committing. Sound familiar? The most widely deployed use case for nested top-level is ensuring that an audit log entry of the processing attempt is made, regardless of the outcome of the business activity.

tm.begin();
doUnauditedStuff();
writeAuditLogForProcessingAttempt();
doSecureBusinessSystemUpdate();
tm.commit();

The ACID properties of the flat tx don't achieve what we want here - the audit log entry must be created regardless of the success or failure of the business system update, whereas we have it being committed only if the business system update also commits. Let's try that again:

tm.begin(); // tx-A
doUnauditedStuff();
Transaction txA = tm.suspend();
tm.begin(); // new top level tx-B
try {
    writeAuditLogForProcessingAttempt();
    tm.commit(); // tx-B
} catch(Exception e) {
    tm.resume(txA);
    tm.rollback(); // tx-A
    return;
}
tm.resume(txA);
doSecureBusinessSystemUpdate();
tm.commit(); // tx-A

Well, that's a little better - we'll not attempt the business logic processing unless we have first successfully written the audit log, so we're guaranteed to always have a log of any update that does take place. But there is a snag: the audit log will only show the attempt, not the success/failure outcome of it. What if that's not good enough? Let's steal a leaf from the transaction optimization handbook: presumed abort.

tm.begin(); // tx-A
doUnauditedStuff();
Transaction txA = tm.suspend();
tm.begin(); // new top level tx-B
try {
    writeAuditLogForProcessingAttempt("attempting update, assume it failed");
    tm.commit(); // tx-B
} catch(Exception e) {
    tm.resume(txA);
    tm.rollback(); // tx-A
    return;
}
tm.resume(txA);
doSecureBusinessSystemUpdate();
writeAuditLogForProcessingAttempt("processing attempt completed successfully");
tm.commit(); // tx-A

So now we have an audit log that will always contain an entry and will always show whether the update succeeded or not. Also, I'll hopefully never have to answer another nested transaction question from scratch. Success all round I'd say.

memristor based logs

Long time readers will recall that I've been tinkering with shiny toys in the form of SSDs, trying to assess how changes in storage technology cause changes in the way transaction logging should be designed. SSDs are here now, getting cheaper all the time and therefore becoming more 'meh' by the minute. So, I need something even newer and shinier to drool over...

Enter memristors, arguably the coolest tech to emerge from HP since the last Total-e-Server release. Initially intended to compete with flash, memristor technology also has the longer term potential to give us persistent RAM. A server with a power-loss tolerant storage mechanism that runs at approximately main memory speed will fundamentally change the way we think about storage hierarchies, process state and fault tolerance.

Until now the on-chip cache hierarchy and off-chip RAM have both come under the heading of 'volatile', whilst disk has been considered persistent, give or take a bit of RAID controller caching.

Volatile storage is managed by the memory subsystem, with the cache hierarchy within that tier largely transparent to the O/S much less the apps. Yes, performance gurus will take issue with that - understanding cache coherency models is vital to getting the best out of multi-core chips and multi-socket servers. But by and large we don't control it directly - MESI is hardcoded in the CPU and we only influence it with simple primitives - memory fencing, thread to core pinning and such.

Persistent storage meanwhile is managed by the file system stack - the O/S block cache, disk drivers, RAID controllers, on-device and on-controller caches etc. As more of it is in software, we have a little more control over the cache model, via O_DIRECT, fsync, firmware config tweaking and such. Most critically, we can divide the persistent storage into different pools with different properties. The best known example is the age old configuration suggestion for high performance transaction processing: put the log storage on a dedicated device.

So what will the programming model look like when we have hardware that offers persistent RAM, either for all the main memory or, more likely in the medium term, for some subset of it? Will the entire process state survive a power loss at all cache tiers from the on-CPU registers to the disk platters, or will we need fine grained cache control to say 'synchronously flush from volatile RAM to persistent RAM', much as we currently force a sync to disk? How will we resume execution after a power loss? Will we need to explicitly reattach to our persistent RAM and rebuild our transient data structures from its contents, or will the O/S magically handle it all for us? Do we explicitly serialize data moving between volatile and non-volatile RAM, as we currently do with RAM to disk transfers, or is it automatic as with cache to RAM movements? What does this mean for us in terms of new language constructs, libraries and design patterns?

Many, many interesting questions, the answers to which will dictate the programming landscape for a new generation of middleware and the applications that use it. The shift from HDD to SSD may seem minor in comparison. Will everything we know about the arcane niche of logging and crash recovery become obsolete, or become even more fundamental and mainstream? Job prospects aside, on this occasion I'm leaning rather in favour of obsolete.

Sunday, September 25, 2011

Workshop on the Theory of Transactional Memory

I'm just back from the WTTM in Rome. It was good to go to the workshop and listen to presentations from the likes of Maurice Herlihy on advances in both hardware and software transactional memory. I was there representing the Cloud-TM work that we're doing in collaboration with others and specifically around Infinispan and JBossTS.

One of my favourite presentations of the event was on the use of pessimistic concurrency control in STM. Most STM implementations take an optimistic approach, which obviously works well in cases where contention is limited. The problem with pessimistic approaches is that locks are retained for the duration even when updates are infrequent. However, as the presenter showed, with suitable a priori ordering, pessimistic concurrency control can be more efficient even in low contention environments. This was good to hear, because we tend to use pessimistic concurrency control in JBossTS a lot and, as was shown in the presentation, it's also a lot easier for developers to understand and reason about.

Unfortunately I had to leave the workshop before the last session, so I didn't get to listen to Torvald who represents Red Hat on the transactional C++ working group. That's a shame, because I've been following this effort for several years and know that we may be seeing this in gcc sometime soon. And who knows, maybe EE8 will see some STM eventually.

Friday, August 12, 2011

RESTful Transactions in the Cloud: Now they're for real

A few months back Mark posted on the subject of REST, transactions and the cloud. Well now you can test the validity of his claims for yourselves. Red Hat recently announced its OpenShift initiative for deploying applications into the cloud, and we have put together a demonstration to show how easy it is to perform end to end transactional requests whilst remaining RESTful/RESTlike throughout, using little more than a JAX-RS container, an HTTP client library and a transaction coordinator.

As our starting point we use an implementation of the draft specification for a RESTful interface to 2-Phase-Commit transactions which we refer to as REST-AT.

The demonstration starts out by showing local clients (written in various languages such as javascript and Ruby) and local (JAX-RS) web services interacting transactionally using HTTP for all communications. The transaction coordinator that implements REST-AT is written using JAX-RS and also runs locally.

The push to cloud based deployments demands that providers support piecemeal migration of applications into the cloud infrastructure. REST-AT together with OpenShift makes the realisation of this goal, at least for transactional services, a relatively painless exercise. The demonstration code and accompanying video shows how to migrate the transaction coordinator component leaving clients and services outside of the infrastructure. We achieve this by using the OpenShift JBoss AS7 cartridge and deploy REST-AT as a war into the application server that this cartridge provides. In a later post we will show how to migrate Java and Ruby services into the OpenShift infrastructure using the AS7 and Ruby cartridges, respectively.

To begin using OpenShift a good starting point would be to watch the various getting started videos (look for the Zero to Cloud in 30 minutes track or go directly to the demo video). To start using REST-AT check out the quickstarts and the sources for the demo.

Using JBossTS in OpenShift Express

Hi,

I just wanted to let you know that you can use transactions in Red Hat's new OpenShift Express tool. Having had some hands-on time with the system, I can safely say it's one of the most exciting new projects I have seen in a while.

It makes deploying applications into the cloud so easy, plus as it supports deploying onto AS7 you can benefit from all the cool features that have been added there.

As AS7 features transactions (naturally!) I decided to kick the tyres of OpenShift Express a little harder by not just using JPA, EJB, JTA and JSF, but also utilising one of the JBoss Transactions extension features - TXOJ. You can watch my video of this here, and the source code is available here.

Safe to say it was extremely simple to get this up and running; if you follow the video you can too!

For those who don't like videos though, here's a little cheat sheet I put together to get it running:

1. You need an openshift express domain, go to: http://openshift.redhat.com to get one
2. rhc-create-app -a txapp -t jbossas-7.0
3. cd txapp
4. svn export https://anonsvn.jboss.org/repos/labs/labs/jbosstm/trunk/ArjunaJTA/quickstarts/jee_transactional_app
5. mv jee_transactional_app/* .
6. rmdir jee_transactional_app
7. Add an openshift profile to your pom.xml:
   <profile>
     <id>openshift</id>
     <build>
       <plugins>
         <plugin>
           <artifactId>maven-war-plugin</artifactId>
           <configuration>
             <outputDirectory>deployments</outputDirectory>
             <warName>ROOT</warName>
           </configuration>
         </plugin>
       </plugins>
     </build>
   </profile>
8. git add src pom.xml
9. git commit -a -m "jee_transactional_app" && git push
10. Open up firefox and go to http://txapp-<YOUR_DOMAIN>.openshift.redhat.com/jee_transactional_app/

I hope you enjoy using OpenShift Express and can also add a few Transactional Objects to your applications!

Tom

Friday, July 22, 2011

norecoveryxa

"Infamy! Infamy! They've all got it in for me!"

Of all the error and warning messages in JBossTS, none is more infamous than norecoveryxa. Pretty much every JBossTS user has an error log containing this little gem. Usually more than once. A lot more. But not for much longer. It has incurred my wrath and its days are numbered. I've got it in for that annoying little beast.


[com.arjuna.ats.internal.jta.resources.arjunacore.norecoveryxa]
Could not find new XAResource to use for recovering non-serializable XAResource


Transaction logs preserve information needed to complete interrupted transactions after a crash. Boiled down to pure essence, a transaction log is a list of the Xids involved in a transaction. An Xid is simply a bunch of bytes, beginning with a unique tx id and ending with a branch id that is (usually) different for each resource manager in the transaction. During recovery, these Xids are used to tell the resource managers to commit the in-doubt transactions. All of which is fine in theory, but an utter pain in practice.
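To make that concrete, here is a minimal sketch of the recovery-side use of those Xids, assuming we already have an XAResource for the resource manager in hand; belongsToUs() is a stand-in for matching against the recovered logs:

import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

public class RecoveryScanSketch {

    public void recoverInDoubtBranches(XAResource xares) throws XAException {
        // ask the resource manager which branches it still holds in doubt
        Xid[] inDoubt = xares.recover(XAResource.TMSTARTRSCAN | XAResource.TMENDRSCAN);

        for (Xid xid : inDoubt) {
            if (belongsToUs(xid)) {
                xares.commit(xid, false); // second phase of 2PC, not one-phase
            }
        }
    }

    private boolean belongsToUs(Xid xid) {
        // stand-in for matching the Xid's gtrid/node name against our tx logs
        return false;
    }
}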

The core problem is that an Xid is not a lot of use unless you can actually reconnect to the resource manager and get an XAResource object from its driver, as it is the methods on that object to which you pass the Xid. So you need some way of storing connection information also. In rare cases the RM's driver provides a Serializable XAResource implementation, in which case the simplest (although not necessarily fastest) thing to do is serialize it into the tx log file. Recovery is then easy - deserialize the XAResource and invoke commit on it. Well, except for the whole multiple classloaders thing, but that's another story. Besides, hardly any drivers are actually so accommodating. So, we need another approach.

To deal with non-serializable XAResources, the recovery system allows registration of plugins, one per resource manager, that provide a new XAResource instance on demand. In olden times configuring these was a manual task, frequently overlooked. As a result the recovery system found itself unable to get a new XAResource instance to handle the Xid from the transaction logs. Hence norecoveryxa. That problem was solved, at least for xa-datasource use within JBossAS, by having the datasource deployment handler automatically register a plugin with the recovery system. Bye-bye AppServerJDBCXARecovery, you will not be missed.
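For reference, such a plugin boils down to something like the following sketch. It's modelled on the com.arjuna.ats.jta.recovery.XAResourceRecovery interface, but treat the signatures as approximate and check the version shipped with your JBossTS release; the datasource wiring is purely illustrative:

import java.sql.SQLException;

import javax.sql.XADataSource;
import javax.transaction.xa.XAResource;

import com.arjuna.ats.jta.recovery.XAResourceRecovery;

public class MyDatasourceRecovery implements XAResourceRecovery {

    private XADataSource xaDataSource; // wired up elsewhere for the RM in question
    private boolean moreToHandOut;

    public boolean initialise(String config) {
        // parse whatever configuration is needed to reach the resource manager
        moreToHandOut = true;
        return true;
    }

    public boolean hasMoreResources() {
        // one resource manager, so hand out exactly one XAResource per recovery scan
        boolean result = moreToHandOut;
        moreToHandOut = !moreToHandOut;
        return result;
    }

    public XAResource getXAResource() throws SQLException {
        // open a fresh physical connection so recovery can call recover()/commit()/rollback()
        return xaDataSource.getXAConnection().getXAResource();
    }
}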

That leaves cases where the resource manager is something other than an XA aware JDBC datasource. To help diagnose such cases it would be helpful to have something more than a hex representation of the Xid. Whilst the standard XAResource API does not allow for such metadata, we now have an extension that provides the transaction manager with RM type, version and JNDI binding information. This gets written to the tx logs and, in the case of the JNDI name, encoded into the Xid.

As a side benefit, this information allows for much more helpful debug messages and hopefully means I spend less time on support cases. Didn't really think I was doing all this just for your benefit did you?

before:

TRACE: XAResourceRecord.topLevelPrepare for < formatId=131076,
gtrid_length=29, bqual_length=28, tx_uid=0:ffff7f000001:ccd6:4e0da81f:2,
node_name=1, branch_uid=0:ffff7f000001:ccd6:4e0da81f:3>


after:
TRACE: XAResourceRecord.topLevelPrepare for < formatId=131076,
gtrid_length=29, bqual_length=28, tx_uid=0:ffff7f000001:ccd6:4e0da81f:2,
node_name=1, branch_uid=0:ffff7f000001:ccd6:4e0da81f:3>,
product: OracleDB/11g, jndiName: java:/jdbc/TestDB >


The main reason we want the RM metadata though is so that when recovery tries and fails to find an XAResource instance to match the Xid, it can now report the details of the RM to which that Xid belongs. This makes it more obvious which RM is unavailable e.g. because it's not registered for recovery or it is temporarily down.


WARN: ARJUNA16037: Could not find new XAResource to use for recovering
non-serializable XAResource XAResourceRecord < resource:null,
txid:< formatId=131076, gtrid_length=29, bqual_length=41,
tx_uid=0:ffff7f000001:e437:4e294bb6:6, node_name=1,
branch_uid=0:ffff7f000001:e437:4e294bb6:7, eis_name=java:/jdbc/DummyDB >,
heuristic: TwoPhaseOutcome.FINISH_OK,
product: DummyProductName/DummyProductVersion,
jndiName: java:/jdbc/DummyDB>


Despite the name change (part of the new i18n framework), that's still our old nemesis norecoveryxa. It's made of stern stuff and, although now a little more useful, is not so easily vanquished entirely.

Even with all the resource managers correctly registered, it's still possible to be unable to find the Xid during a recovery scan. That's because of a timing window in the XA protocol, where a crash occurs after the RM has committed and forgotten the branch but before the TM has disposed of its no longer needed log file. In such cases no RM will claim ownership of the branch during replay, leaving the TM to assume that the owning RM is unavailable for some reason. With the name of the owning RM to hand, we now have the possibility to match against the name of the scanned RMs, conclude the branch belongs to one of the scanned RMs even though it pleaded ignorance, and hence dispose of it as a safely committed branch.

All of which should ensure that norecoveryxa's days in the spotlight are severely numbered.

Friday, July 8, 2011

rethinking log storage

Transaction coordinators like JBossTS must write log files to fault tolerant storage in order to be able to guarantee correct completion of transactions in the event of a crash. Historically this has meant using files hosted on RAIDed hard disks.

The workload is characterised by near 100% writes, reads for recovery only, a large number of small (sub 1 KB) operations and, critically, the need to guarantee that the data has been forced through assorted cache layers to non-volatile media before continuing.

Unfortunately this combination of requirements has a tendency to negate many of the performance optimizations provided in storage systems, making log writing a common bottleneck for high performance transaction systems. Addressing this can often force users into unwelcome architectural decisions, particularly in cloud environments.

So naturally transaction system developers like the JBossTS team spend a lot of time thinking about how to manage log I/O better. Those who have read or watched my JUDCon presentation will already have some hints of the ideas we're currently looking at, including using cluster based in-memory replication instead of disk storage.

More recently I've been playing with SSD based storage some more, following up on earlier work to better understand some of the issues that arise as users transition to a new generation of disk technology with new performance characteristics.

As you'll recall, we recently took the high speed journalling code from HornetQ and used it as the basis for a transaction log. Relatively speaking, the results were a pretty spectacular improvement over our classic logging code. In absolute terms however we still weren't getting the best out of the hardware.

A few weeks back I got my grubby paws on one of the latest SSDs. On paper its next generation controller and faster interface should have provided a substantial improvement over its predecessor. Not so for my transaction logging tests though - in keeping with tradition it utterly failed to outperform the older generation of technology. On further investigation the reasons for this become clear.

Traditional journalling solutions are based on a) aggregating a large number of small I/Os into a much smaller number of larger I/Os so that the drive can keep up with the load and b) serializing these writes to a single append-only file in order to avoid expensive head seeks.

With SSDs the first of those ideas is still sound, but the number of I/O events the drive can deal with is substantially higher. This requires re-tuning the journal parameters. For some usage it even becomes undesirable to batch the I/Os - until the drive is saturated with requests it's just pointless overhead that delays threads unnecessarily. As those threads (transactions) may have locks on other data any additional latency is undesirable. Also, unlike some other systems, a transaction manager has one thread per tx, as it's the one executing the business logic. It can't proceed until the log write completes, so write batching solutions involve significant thread management and scheduling overhead and often have a large number of threads parked waiting on I/O.

There is another snag though: even where the journalling code can use async I/O to dispatch multiple writes to the O/S in parallel, the filesystem still runs them sequentially as they contend on the inode semaphore for the log file. Thus writes for unrelated transactions must wait unnecessarily, much like the situation that arises in business applications which use too coarse-grained locking. Also, the drive's native command queuing (NCQ) remains largely unused, limiting the ability of the drive controller to execute writes in parallel.

The serialization of writes to the hardware, whilst useful for head positioning on HDDs, is a painful mistake for SSDs. These devices, like a modern CPU, require high concurrency in the workload to perform at their best. So, just as we go looking for parallelization opportunities in our apps, so we must look for them in the design of the I/O logging.

The most obvious solution is to load balance the logging over several journal files when running on an SSD. It's not quite that simple though - care must be taken to avoid a filesystem journal write as that will trigger a buffer flush for the entire filesystem, not just the individual file. Not to mention contention on the filesystem journal. For optimal performance it may pay to put each log file on its own small filesystem. But I'm getting ahead of myself. First there is the minor matter of actually writing a prototype load balancer to test the ideas. Any volunteers?
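In the meantime, here is roughly what the core of such a prototype might look like: a toy round-robin spread of log writes over several journal files. This is not the JBossTS or HornetQ journal, and error handling, recovery and batching are all omitted:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinLogSketch {

    private final FileChannel[] journals;
    private final AtomicInteger next = new AtomicInteger();

    public RoundRobinLogSketch(Path... journalFiles) throws IOException {
        journals = new FileChannel[journalFiles.length];
        for (int i = 0; i < journalFiles.length; i++) {
            // ideally each journal lives on its own small filesystem
            journals[i] = FileChannel.open(journalFiles[i],
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        }
    }

    // Each transaction's log record goes to whichever journal comes up next;
    // the force(false) is the expensive part we are trying to spread out.
    public void writeLogRecord(ByteBuffer record) throws IOException {
        int index = (next.getAndIncrement() & 0x7fffffff) % journals.length;
        FileChannel journal = journals[index];

        synchronized (journal) {
            journal.write(record);
            journal.force(false);
        }
    }
}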

Sunday, June 26, 2011

STM Arjuna

I'd forgotten how long ago I'd promised to talk about some of the STM work we've been doing. Well I haven't been able to do much more on it for a while, but I do have time to at least outline what's possible at the moment. So let's just remember a bit about the current way in which you can use JBossTS to build transactional POJOs without the need for a database; we'll use an example to illustrate:
public class AtomicObject extends LockManager
{
    public AtomicObject()
    {
        super();

        state = 0;

        AtomicAction A = new AtomicAction();

        A.begin();

        if (setlock(new Lock(LockMode.WRITE), 0) == LockResult.GRANTED)
        {
            if (A.commit() == ActionStatus.COMMITTED)
                System.out.println("Created persistent object " + get_uid());
            else
                System.out.println("Action.commit error.");
        }
        else
        {
            A.abort();

            System.out.println("setlock error.");
        }
    }

    public void incr (int value) throws Exception
    {
        AtomicAction A = new AtomicAction();

        A.begin();

        if (setlock(new Lock(LockMode.WRITE), 0) == LockResult.GRANTED)
        {
            state += value;

            if (A.commit() != ActionStatus.COMMITTED)
                throw new Exception("Action commit error.");
            else
                return;
        }

        A.abort();

        throw new Exception("Write lock error.");
    }

    public void set (int value) throws Exception
    {
        AtomicAction A = new AtomicAction();

        A.begin();

        if (setlock(new Lock(LockMode.WRITE), 0) == LockResult.GRANTED)
        {
            state = value;

            if (A.commit() != ActionStatus.COMMITTED)
                throw new Exception("Action commit error.");
            else
                return;
        }

        A.abort();

        throw new Exception("Write lock error.");
    }

    public int get () throws Exception
    {
        AtomicAction A = new AtomicAction();
        int value = -1;

        A.begin();

        if (setlock(new Lock(LockMode.READ), 0) == LockResult.GRANTED)
        {
            value = state;

            if (A.commit() == ActionStatus.COMMITTED)
                return value;
            else
                throw new Exception("Action commit error.");
        }

        A.abort();

        throw new Exception("Read lock error.");
    }

    public boolean save_state (OutputObjectState os, int ot)
    {
        boolean result = super.save_state(os, ot);

        if (!result)
            return false;

        try
        {
            os.packInt(state);
        }
        catch (IOException e)
        {
            result = false;
        }

        return result;
    }

    public boolean restore_state (InputObjectState os, int ot)
    {
        boolean result = super.restore_state(os, ot);

        if (!result)
            return false;

        try
        {
            state = os.unpackInt();
        }
        catch (IOException e)
        {
            result = false;
        }

        return result;
    }

    public String type ()
    {
        return "/StateManager/LockManager/AtomicObject";
    }

    private int state;
}
Yes, quite a bit of code for a transactional integer. But if you consider what's going on here and why, such as setting locks and creating potentially nested transactions within each method, it starts to make sense. So how would we use this class? Well, let's just take a look at a unit test to see:
AtomicObject obj = new AtomicObject();
AtomicAction a = new AtomicAction();

a.begin();

obj.set(1234);

a.commit();

assertEquals(obj.get(), 1234);

a = new AtomicAction();

a.begin();

obj.incr(1);

a.abort();

assertEquals(obj.get(), 1234);
But we can do a lot better in terms of ease of use, especially if you consider what's behind STM: objects are volatile (they don't survive machine crashes). And this is where our approach comes in. Let's look at the example above and create an interface for the AtomicObject:
public interface Atomic
{
    public void incr (int value) throws Exception;

    public void set (int value) throws Exception;

    public int get () throws Exception;
}
And now we can create an implementation of this:
@Transactional
public class ExampleSTM implements Atomic
{
    @ReadLock
    public int get () throws Exception
    {
        return state;
    }

    @WriteLock
    public void set (int value) throws Exception
    {
        state = value;
    }

    @WriteLock
    public void incr (int value) throws Exception
    {
        state += value;
    }

    @TransactionalState
    private int state;
}
Here we now simply use annotations to specify what we want. The @Transactional is needed to indicate that this will be a transactional object. Then we use @ReadLock or @WriteLock to indicate the types of locks that we need on a per method basis. (If you don't define these then the default is to assume @WriteLock). And we use @TransactionalState to indicate to the STM implementation which state we want to operate on and have the isolation and atomicity properties (remember, with STM there's no requirement for the D in ACID). If there's more state in the implementation than we want to recover (e.g., it can be recomputed on each method invocation) then we don't have to annotate it.

But that's it: in terms of behaviour, the ExampleSTM class is equivalent to AtomicObject. So let's take a look at another unit test:
RecoverableContainer theContainer = new RecoverableContainer();
ExampleSTM basic = new ExampleSTM();
Atomic obj = theContainer.enlist(basic);
AtomicAction a = new AtomicAction();

a.begin();

obj.set(1234);

a.commit();

assertEquals(obj.get(), 1234);

a = new AtomicAction();

a.begin();

obj.incr(1);

a.abort();

assertEquals(obj.get(), 1234);
Think of the RecoverableContainer as the unit of software transactional memory, within which we can place objects that it will manage. In this case ExampleSTM. Once placed, we get back a reference to the instance within the unit of memory, and as long as we use this reference from that point forward we will have the A, C and I properties that we want.

Tuesday, June 14, 2011

Ever wondered about transactions and threads?

No? Well you really should! When transaction systems were first developed they were single-threaded (where a thread is defined to be an entity which performs work, e.g., a lightweight process, or an operating-system process.) Executing multiple threads within a single process was a novelty! In such an environment the thread terminating the transaction is, by definition, the thread that performed the work. Therefore, the termination of a transaction is implicitly synchronized with the completion of the transactional work: there can be no outstanding work still going on when the transaction starts to finish.

With the increased availability of both software and hardware multi-threading, transaction services are now being required to allow multiple threads to be active within a transaction (though it’s still not mandated anywhere, so if this is something you want then you may still have to look around the various implementations). In such systems it is important to guarantee that all of these threads have completed when a transaction is terminated, otherwise some work may not be performed transactionally.

Although protocols exist for enforcing thread and transaction synchronization in local and distributed environments (commonly referred to as checked transactions), they assume that communication between threads is synchronous (e.g., via remote procedure call). A thread making a synchronous call will block until the call returns, signifying that any threads created have terminated. However, a range of distributed applications exists (and yours may be one of them) which require extensive use of concurrency in order to meet real-time performance requirements and utilize asynchronous message passing for communication. In such environments it is difficult to guarantee synchronization between threads, since the application may not communicate the completion of work to a sender, as is done implicitly with synchronous invocations.

As we’ve just seen, applications that do not create new threads and only use synchronous invocations within transactions implicitly exhibit checked behavior. That is, it is guaranteed that whenever the transaction ends there can be no thread active within the transaction which has not completed its processing. This is illustrated below, in which vertical lines indicate the execution of object methods, horizontal lines message exchange, and the boxes represent objects.



The figure illustrates a client who starts a transaction by invoking a synchronous ‘begin’ upon a transaction manager. The client later performs a synchronous invocation upon object a that in turn invokes object b. Each of these objects is registered as being involved in the transaction with the manager. Whenever the client invokes the transaction ‘end’ upon the manager, the manager is then able to enter into the commit protocol (of which only the final phase is shown here) with the registered objects before returning control to the client.

However, when asynchronous invocation is allowed, explicit synchronization is required between threads and transactions in order to guarantee checked (safe) behavior. The next figure illustrates the possible consequences of using asynchronous invocation without such synchronization. In this example a client starts a transaction and then invokes an asynchronous operation upon object a that registers itself within the transaction as before. a then invokes an asynchronous operation upon object b. Now, depending upon the order in which the threads are scheduled, it’s possible that the client might call for the transaction to terminate. At this point the transaction coordinator knows only of a’s involvement within the transaction so enters into the commit protocol, with a committing as a consequence. Then b attempts to register itself within the transaction, and is unable to do so. If the application intended the work performed by the invocations upon a and b to be performed within the same transaction, this may result in application-level inconsistencies. This is what checked transactions are supposed to prevent.
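In plain Java terms, the hazard looks something like the following minimal sketch; TransactionManager is the standard JTA interface, while the asynchronous work and its enlistment are hypothetical stand-ins:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import javax.transaction.TransactionManager;

public class UncheckedRaceSketch {

    private final ExecutorService pool = Executors.newSingleThreadExecutor();

    public void client(TransactionManager tm) throws Exception {
        tm.begin();

        pool.submit(() -> {
            // Runs on a different thread: the JTA transaction context is not
            // associated with this thread, and nothing below waits for it, so
            // this work can arrive after the commit and fall outside the tx.
            doTransactionalWork(); // hypothetical business update
        });

        tm.commit(); // an unchecked system will happily complete right away
    }

    private void doTransactionalWork() {
        // stand-in for enlisting a resource and updating it
    }
}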

Some transaction service implementations will enforce checked behavior for the transactions they support, to provide an extra level of transaction integrity. The purpose of the checks is to ensure that all transactional requests made by the application have completed their processing before the transaction is committed. A checked transaction service guarantees that commit will not succeed unless all transactional objects involved in the transaction have completed the processing of their transactional requests. If the transaction is rolled back then a check is not required, since all outstanding transactional activities will eventually rollback if they are not told to commit.

As a result, most (though not all) modern transaction systems provide automatic mechanisms for imposing checked transactions on both synchronous and asynchronous invocations. In essence, transactions must keep track of the threads and invocations (both synchronous and asynchronous) that are executing within them and whenever a transaction is terminated, the system must ensure that all active invocations return before the termination can occur and that all active threads are informed of the termination. This may sound simple, but believe us when we say that it isn’t!

Unfortunately this is another aspect of transaction processing that many implementations ignore. As with things like interposition (for performance) and failure recovery, it is an essential aspect that you really cannot do without. Not providing checked transactions is different from allowing checking to be disabled, which most commercial implementations support. In that case you typically have the ability to turn checked transactions off when you know it is safe, to help improve performance. If you think there is the slightest possibility you'll be using multiple threads within the scope of a single transaction or may make asynchronous transactional requests, then you'd better find out whether your transaction implementation is up to the job. I'm not even going to bother with the almost obligatory plug for JBossTS here ;-)

Thursday, June 2, 2011

WTF is Interposition?

Consider the situation depicted below, where there is a transaction coordinator and three participants. For the sake of this example, let us assume that each of these participants is on a different machine to the transaction coordinator and each other. Therefore, each of the lines not only represents participation within the transaction, but also remote invocations from the transaction coordinator to the participants and vice versa.


In a distributed system there's always an overhead incurred when making remote invocations compared to making a purely local (within the same VM) invocation. Now the overhead involved in making these distributed invocations will depend upon a number of factors, including how congested the network is, the load on the respective machines, the number of transactions being executed, etc. Some applications may be able to tolerate this overhead, whereas others may not. As the number of participants increases, so does the overhead, for fairly obvious reasons.

A common approach to reduce this overhead is to realize that as far as a coordinator is concerned, it does not matter what the participant implementation does. For example, although one participant may interact with a database to commit the transaction, another may just as readily be responsible for interacting with a number of databases: essentially acting as a coordinator itself, as shown below:


In this case, the participant is acting like a proxy for the transaction coordinator (the root coordinator): it is responsible for interacting with the two participants when it receives an invocation from the coordinator and collating their responses (and its own) for the coordinator. As far as the participants are concerned, a coordinator is invoking them, whereas as far as the root coordinator is concerned it only sees participants.

This technique of using proxy coordinators (or subordinate coordinators) is known as interposition. Each domain (machine) that imports a transaction context may create a subordinate coordinator that enrolls with the imported coordinator as though it were a participant. Any participants that are required to enroll in the transaction within this domain actually enroll with the subordinate coordinator. In a large distributed application, a tree of coordinators and participants may be created.

A subordinate coordinator must obviously execute the two-phase commit protocol on its enlisted participants. Thus, it must have its own transaction log and corresponding failure recovery subsystem. It must record sufficient recovery information for any work it may do as a participant and additional recovery information for its role as a coordinator. Therefore, it is impossible for a normal participant to simply be a sub-coordinator because the roles are distinctly different; sub-coordinators are tightly coupled with the transaction system.
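As a rough illustration of the shape of a subordinate coordinator, here is a stripped-down sketch. The Participant interface and Vote type are invented for the example (they are not a JBossTS API), and a real subordinate coordinator would also log its prepare outcome and drive recovery:

import java.util.ArrayList;
import java.util.List;

public class SubordinateCoordinatorSketch implements Participant {

    private final List<Participant> localParticipants = new ArrayList<>();

    // participants in this domain enrol here instead of with the root coordinator
    public void enlist(Participant p) {
        localParticipants.add(p);
    }

    @Override
    public Vote prepare() {
        // collate local responses into the single vote seen by the root coordinator
        for (Participant p : localParticipants) {
            if (p.prepare() == Vote.ROLLBACK) {
                return Vote.ROLLBACK;
            }
        }
        return Vote.COMMIT; // must be logged locally before returning, omitted here
    }

    @Override
    public void commit() {
        localParticipants.forEach(Participant::commit);
    }

    @Override
    public void rollback() {
        localParticipants.forEach(Participant::rollback);
    }
}

interface Participant {
    enum Vote { COMMIT, ROLLBACK }

    Vote prepare();
    void commit();
    void rollback();
}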

So the question then becomes when and why does interposition occur?

  • Performance: if a number of participants reside on the same node, or are located physically close to one another (e.g., reside in the same LAN domain) then it can improve performance for a remote coordinator to send a single message to a sub-coordinator that is co-located with those participants and for that sub-coordinator to disseminate the message locally, rather than for it to send each participant the same message.
  • Security and trust: a coordinator may not trust indirect participants and neither may indirect participants trust a remote coordinator. This makes direct registration impossible. Concentrating security and trust at coordinators can make it easier to reason about such issues in a large scale, loosely coupled environment.
  • Connectivity: some participants may not have direct connectivity with a specific coordinator, requiring a level of indirection.
  • Separation of concerns: many domains and services may simply not want to export (possibly sensitive) information about their implementations to the outside world.
You'll find interposition used in a number of transaction systems. Within JBossTS it's used by the JTS and XTS components.

When is a transaction not a transaction?

Most of the time when we talk about transactions we really mean database transactions (local transactions), or ACID transactions. These may also share the common name of Top-level (Flat) Transaction, or maybe Traditional Transactions. Several enhancements to the traditional flat-transaction model have been proposed and in this article I want to give an overview of some of them:

  • Of course the first has to be nested transactions. But I've mentioned these before several times, so won't go over them again here. But I will remind everyone that you can use them in JBossTS!
  • Next up are Independent Top-Level transactions which can be used to relax strict serializability. With this mechanism it is possible to invoke a top-level transaction from within another transaction. An independent top-level transaction can be executed from anywhere within another transaction and behaves exactly like a normal top-level transaction, that is, its results are made permanent when it commits and will not be undone if any of the transactions within which it was originally nested roll back. If the invoking transaction rolls back, this does not lead to the automatic rollback of the invoked transaction, which can commit or rollback independently of its invoker, and hence release resources it acquires. Such transactions could be invoked either synchronously or asynchronously. In the event that the invoking transaction rolls back compensation may be required. Guess what? Yup, we support these too!
  • Now we move on to Concurrent Transactions: just as application programs can execute concurrently, so too can transactions (top-level or nested), i.e., they need not execute sequentially. So, a given application may be running many different transactions concurrently, some of which may be related by parent transactions. Whether transactions are executed concurrently or serially does not affect the isolation rules: the overall effect of executing concurrent transactions must be the same as executing them in some serial order.
  • Glued Transactions are next in our line up: top-level transactions can be structured as many independent, short-duration top-level transactions, to form a "logical" long-running transaction; the duration between the end of one transaction and the beginning of another is not perceivable and selective resources (e.g., locks on database tables) can be atomically passed from one transaction to the next. This structuring allows an activity to acquire and use resources for only the required duration of this long-running transactional activity. In the event of failures, obtaining transactional semantics for the entire long-running transaction may require compensation transactions that can perform forward or backward recovery. As you might imagine, implementing and supporting glued transactions is not straightforward. In previous work we did around the CORBA Activity Service, we used JBossTS as the core coordinator and supported glued transactions. Although some of that code still exists today, it would definitely need a dusting off before being used again.
  • Last but by no means least ... distributed transactions. Need I say more?
So there's an outline of some, but not all, of the more exotic extended transaction models out there. The interesting thing is that with the basic transaction engine in JBossTS we can support many of them. Some of the ones I haven't outlined can also be implemented using a technique that another of the original Arjuna developers came up with: Multi-coloured Actions. Well worth a read, and perhaps something we may look at again someday.

Monday, May 2, 2011

Done (for now).

JUDCon Boston 2011 is well under way. After many interesting technical sessions, including presentations on Byteman and on Transactions in the Cloud, we're onto the beer and pizza part of the day. If you're in Boston get yourself down here tomorrow for the second day, it's a great event.

Unfortunately I'm going to miss it as I'm off on vacation at oh-my-god-early in the morning. Please try not to break the shiny new JBossTS release in my absence. Normal service will resume, from the not so normal and sadly (for me) temporary location of Brisbane, Australia, in a couple of weeks' time.

Tuesday, April 26, 2011

almost done

JBossTS 4.15.0.Final is done. It's the fastest, most robust release yet. It's also the last feature release of its line.

As well as working standalone, 4.15 is the feature release for integration with JBossAS 7 and thence JBossEAP 6. As AS 7 is still a moving target we've got a bit of tweaking to do for the integration over the next few weeks, particularly on JTS and XTS. After that it's maintenance only on the 4.15.x branch and the community focus moves on to the next major version.

As well as new features, the next major release cycle will have a new project name (complete with snazzy new logo, naturally) and a new project lead.

That change should hopefully leave me with some time to do new things too, particularly looking at the implications of the cloud computing model and NoSQL storage solutions on transaction systems.

I'm starting out on that road with a presentation next week at JUDCon so if you're in the Boston area come along and join in. Don't worry, I'll have written the slides by then, I promise. I'm just not quite done yet :-)

Wednesday, April 20, 2011

Messaging/Database race conditions

Programmers are divided between those who only write single threaded code and those who will, at the slightest hint of provocation, tell interminably long war stories about debugging race conditions. Although I'm firmly in the second category, this is not such a story. It is however a lesson in what happens when spec authors spend insufficient time listening to them.

A common pattern in enterprise apps involves a business logic method writing a database update, then sending a message containing a key to that data via a queue to a second business method, which then reads the database and does further processing. Naturally this involves transactions, as it is necessary to ensure the database update(s) and message handling occur atomically.

methodA {
    beginTransaction();
    updateDatabase(key, values);
    sendMessage(queueName, key);
    commitTransaction();
}

methodB {
    beginTransaction();
    key = receiveMessage(queueName);
    values = readDatabase(key);
    doMoreStuff(values);
    commitTransaction();
}

So, like all good race condition war stories, this one starts out with a seemingly innocuous chunk of code. The problem is waiting to ambush us from deep within the transaction commit called by methodA.

Inside the transaction are two XAResources, one for the database and one for the messaging system. The transaction manager calls prepare on both, gets affirmative responses and writes its log. Then it calls commit on both, at which point the resource managers can end the transaction isolation and make the data/message visible to other processes.

Spot the problem yet?

Nothing in the XA or JTA specifications defines the order in which XAResources are processed. The commit calls to the database and message queue may be issued in either order, or even in parallel.

Therefore it is inevitable that every once in a while, probably on the least convenient occasions, the database read in methodB will fail as the data has not yet been released from the previous transaction.

Oops.

So, having been let down by shortcomings in the specifications, what can the intrepid business logic programmer do to save the day?

Polling: Throw away the messaging piece and construct a loop that will look for new data in the db periodically. yuck.

Messaging from afterCompletion: The transaction lifecycle provides an afterCompletion notification that is guaranteed to be ordered after all the XAResource commits complete, so we can send the message from there to avoid the race. Also icky, since it is then no longer atomic with the db write in the case of failures, which means we need polling as a backup anyhow.
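For the curious, here's roughly what that option might look like as a hedged sketch: a JTA Synchronization registered with the current transaction that sends the message only after a successful commit. The TransactionManager, connection factory and queue are assumed to come from the environment; the class and method names are mine, not part of any spec or JBoss API.

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.Queue;
import javax.jms.Session;
import javax.transaction.Status;
import javax.transaction.Synchronization;
import javax.transaction.TransactionManager;

// Sketch only: send the JMS message from afterCompletion, once the db commit is visible.
public class SendAfterCommit implements Synchronization {
    private final ConnectionFactory connectionFactory; // deliberately non-XA: this send is outside the tx
    private final Queue queue;
    private final String key;

    SendAfterCommit(ConnectionFactory connectionFactory, Queue queue, String key) {
        this.connectionFactory = connectionFactory;
        this.queue = queue;
        this.key = key;
    }

    public void beforeCompletion() {
        // nothing to do before the resources commit
    }

    public void afterCompletion(int status) {
        if (status != Status.STATUS_COMMITTED) {
            return; // the transaction rolled back, so don't advertise the data
        }
        try {
            Connection connection = connectionFactory.createConnection();
            try {
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                session.createProducer(queue).send(session.createTextMessage(key));
            } finally {
                connection.close();
            }
        } catch (Exception e) {
            // no longer atomic with the db write: a failure here means the consumer
            // never hears about the data, hence the polling backstop mentioned above
        }
    }

    // registered from methodA, after updateDatabase(key, values):
    static void register(TransactionManager tm, ConnectionFactory cf, Queue q, String key) throws Exception {
        tm.getTransaction().registerSynchronization(new SendAfterCommit(cf, q, key));
    }
}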

Let it fail: An uncaught exception in the second tx, e.g. an NPE from the database read, should cause the container to fail the transaction and the messaging system to do message redelivery. Hopefully by that time the db is ready. Inefficient and prone to writing a lot of failure messages into your system logs, which is to say: yuck.

Force resource ordering: Many transaction systems offer some non-spec compliant mechanism for XAResource ordering. Due to the levels of abstraction the container places between the transaction manager and the programmer this is not always easy to use from Java EE. It also ties your code to a specific vendor's container. You can do this with JBoss if you really want to, but it's not recommended.

Delay the message delivery: Similar to the above, some messaging systems go above and beyond the JMS spec to offer the ability to mark a queue or specific message as having a built-in delay. The messaging system will hold the message for a specified interval before making it available to consumers. Again you can do this with JBoss (see 'Scheduled Messages' in HornetQ) but whilst it's easy to code it's not a runtime efficient solution, as tuning the delay interval is tricky.

Backoff/Retry in the consumer: Instead of letting the consuming transaction fail and be retried by message redelivery, catch the exception and handle it with a backoff/retry loop:

methodB {
    beginTransaction();
    key = receiveMessage(queueName);

    do {
        values = readDatabase(key);
        if(values == null) {
            sleep(x);
        }
    } while(values == null);

    doMoreStuff(values);
    commitTransaction();
}

This has the benefit of being portable across containers and, with a bit of added complexity, pretty robust. Yes it is a horrible kludge, but it works. Sometimes when you are stuck working with an environment built on flawed specifications that's really the best you can hope for.
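To give a flavour of that "bit of added complexity", here's a hedged sketch of the consumer's read with a bounded, exponential backoff, written against plain JDBC. The SQL, column names and retry limits are made up for illustration; if the data still isn't visible after the last attempt we give up and fall back on container-driven redelivery.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Sketch: bounded backoff/retry around the db read in the consuming transaction.
public class RetryingReader {
    public String readWithBackoff(Connection db, String key) throws Exception {
        final int maxAttempts = 5;
        long delayMillis = 50;

        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            PreparedStatement ps = db.prepareStatement("select payload from mytable where id = ?"); // illustrative SQL
            try {
                ps.setString(1, key);
                ResultSet rs = ps.executeQuery();
                if (rs.next()) {
                    return rs.getString(1); // the producing tx has committed, carry on
                }
            } finally {
                ps.close();
            }
            Thread.sleep(delayMillis); // data not visible yet, wait a little longer each time
            delayMillis *= 2;
        }
        // still nothing: let the exception roll the tx back and redelivery have another go
        throw new IllegalStateException("data for key " + key + " not visible after " + maxAttempts + " attempts");
    }
}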

Monday, April 4, 2011

World domination

It's nearing that time of year again... The transactions team are packing their bags and preparing to spend a couple of days holed up in a top secret underground lair, plotting the next steps towards world domination.

I normally spend a few days beforehand checking up on the state of the world, particularly new developments in transactions, the list of open issues in our bug tracker, the last few months' worth of support cases and other such sources of interesting information. I'm starting to think the reason that no Evil Genius has succeeded in taking over the world yet has very little to do with being repeatedly thwarted by James Bond and an awful lot to do with the sheer volume of paperwork involved. Perhaps if they spent less on underground lairs and more on secretarial support...

Anyhow, as per usual we have way too many ideas on the list, so some prioritization is in order. One of the key factors is the number of users a given chunk of work may benefit. Improvements to the crash recovery system benefit pretty much everyone. I'd like to say the same about documentation, but it's apparent only a tiny proportion of users actually read it. Somewhere in between lie improvements to optional modules or functionality that only some users take advantage of. The tricky part is gauging how many there are.

Users occasionally get in touch and tell us how they are using the code. Unfortunately this usually happens only when they need help. The rest of the time we mostly just make educated guesses about the use cases. I was mulling over this problem today when a tweet caught my eye:

"Today's royalty check for "Advanced CORBA Programming": $24.51. Last year each check was ~$400. I guess CORBA is officially dead now :-)" - Steve Vinoski

This got me thinking about the future of JTS. In Java EE environments, JTS exists pretty much only to allow transactional calls between EJBs using IIOP. Now that we have @WebService on EJBs, how much traffic still uses IIOP anyhow? What are the odds of RMI/IIOP (and hence JTS) being deprecated out of Java EE in the next revision or two? How much time should we be sinking into improvements of JTS vs. web services transactions?

That's just one of many topics on the agenda for our team meeting. I suppose we could also debate the relative merits of man-portable freeze rays vs. orbital laser cannons, but I don't think the corporate health and safety folks will let us have either. Spoilsports. Guess we'll just have to continue progressing towards world domination the hard way.

Thursday, March 31, 2011

The Competitors

I don't normally spend much time worrying about our competitors. I figure keeping the customers happy is a better use of my time. But no set of blog posts about performance would be complete without a final showdown...

There aren't all that many standalone JTA transaction managers out there. Most function only as part of an application server and are not a viable option for embedding in other environments. Others are incomplete or largely abandoned. In my opinion there are currently only two serious challengers to JBossTS for use outside a full Java EE app server. Let's see how they do.

results in transactions per second, using quad core i7, jdk6, in-memory logging i.e. we're testing the CPU not the disk.


threads  JBossTS  Competitor A
1        93192    9033
2        164744   20242
3        224114   22649
4        249781   21148
100      239865   19978


umm, guys, "synchronized ( Foo.class )" is really not a good idea, ok? Come on, put up a bit more of a fight please, this is no fun.

Next contender, please....


threads  JBossTS  Competitor B
1        93192    102270
2        164744   145011
3        224114   185528
4        249781   193199
100      239865   29528


Ahh, that's more like it. Good to have a worthy opponent to keep us on our toes. Some kind of scaling problem there though. Looks like contention on the data structures used to queue transaction timeouts. Been there, done that. Feel free to browse the source code for inspiration guys :-)

Since nobody in their right mind actually runs a transaction manager with logging turned off, this is really all just a bit of fun. Well, for some of us anyhow. So where is the competitive comparison for the real world scenario?

Well, I do have the comparison numbers for all three systems with logging enabled, but I'd hesitate to describe them as competitive. You'll have to try it for yourself or you just won't believe the difference. Suffice it to say there is only one Clebert, and we've got him. Sorry guys.

Now if this had been an ease of use contest I'd be a bit more worried. Sure we have a ton of detailed and comprehensive reference documentation, but the learning curve is near vertical. Time to work on some intro level material: quick starts, config examples etc. Not as much fun as performance tuning, but probably more useful at this point.

Monday, March 28, 2011

fast enough?

Remember the performance tuning we've been doing? Well, the powers that be sat up and took notice too. I think the phrase was 'more than enough to satisfy requirements' with a none-too-subtle subtext of 'so now put down that profiler and do some real work'. Fortunately I'm not too good at taking hints...

After some more tweaking today (finalizers are evil and must die), I thought I'd see how we were doing. Here is trunk, soon to become JBossTS 4.15 and be used in EAP6, against JBossTS 4.6.1.CP11, used in the current EAP5.1 platform release. 4.6.1 originally dates from September 2009, but we backported some performance improvements into it in mid 2010.

results in transactions per second, using quad core i7, jdk6, in-memory objectstore i.e. we're testing the CPU not the disk.


threads  4.6.1.CP12  4.15.SNAPSHOT  improvement
1        37285       93192          2.50x
2        55309       164744         2.97x
3        65070       224114         3.44x
4        66172       249781         3.77x
100      29027       239865         8.26x


hmm, ok, so I guess maybe we did exceed the requirements a bit.

Saturday, March 19, 2011

Concurrency control

We managed to gloss over a few important points during the discussion on Isolation. Let's dig into them here.

How can you ensure isolation? Well typically this is covered by what's referred to as concurrency control for the resources within the transaction. A very simple and widely used approach is to regard all operations on resources (objects) to be of type ‘read’ or ‘write’, which follow the synchronization rule permitting ‘concurrent reads' but exclusive ‘writes’. This rule is imposed by requiring that any computation intending to perform an operation that is of type read (write) on an object, first acquire a ‘read lock’ (‘write lock’) associated with that object. A read lock on an object can be held concurrently by many computations provided no computation is holding a write lock on that object. A write lock on an object, on the other hand, can only be held by a computation provided no other computation is holding a read or a write lock. (Note, although we'll talk in terms of locks, this should not be used to infer a specific implementation. Timestamp-based concurrency control could just as easily be used, for example.)

In order to ensure the atomicity property, all computations must follow a ‘two–phase’ locking policy. During the first phase, termed the growing phase, a computation can acquire locks, but not release them. The tail end of the computation constitutes the shrinking phase, during which time held locks can be released but no locks can be acquired. Now suppose that a computation in its shrinking phase is to be rolled back, and that some objects with write locks have already been released. If some of these objects have been locked by other computations, then abortion of the computation will require these computations to be aborted as well. To avoid this cascade roll back problem, it is necessary to make the shrinking phase ‘instantaneous’.
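To make the locking rules and the two-phase discipline a little more concrete, here's a toy sketch (this is emphatically not the JBossTS lock manager, just an illustration): read locks are shared, write locks are exclusive, locks are only acquired while the transaction is active, and everything is released together once the outcome is known.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Strict two-phase locking sketch: the growing phase acquires, the shrinking phase
// releases everything at once so no other computation sees locks dribble away early.
public class TwoPhaseLocking {
    private final Deque<Lock> held = new ArrayDeque<Lock>();

    // many readers may hold an object's read lock concurrently...
    public void readLock(ReadWriteLock object) {
        object.readLock().lock();
        held.push(object.readLock());
    }

    // ...but a writer needs exclusive access
    public void writeLock(ReadWriteLock object) {
        object.writeLock().lock();
        held.push(object.writeLock());
    }

    // shrinking phase: release all held locks together, after commit or abort
    public void finish() {
        while (!held.isEmpty()) {
            held.pop().unlock();
        }
    }

    public static void main(String[] args) {
        ReadWriteLock accountA = new ReentrantReadWriteLock();
        ReadWriteLock accountB = new ReentrantReadWriteLock();

        TwoPhaseLocking tx = new TwoPhaseLocking();
        tx.readLock(accountA);   // growing phase
        tx.writeLock(accountB);  // growing phase
        tx.finish();             // shrinking phase
    }
}

(One caveat on the sketch: java.util.concurrent's ReentrantReadWriteLock does not allow upgrading a read lock to a write lock, which is why the example touches two different objects.)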

Most transaction systems utilize what is commonly referred to as pessimistic concurrency control mechanisms: in essence, whenever a data structure or other transactional resource is accessed, a lock is obtained on it as described earlier. This lock will remain held on that resource for the duration of the transaction and the benefit of this is that other users will not be able to modify (and possibly not even observe) the resource until the holding transaction has terminated. There are a number of disadvantages of this style: (i) the overhead of acquiring and maintaining concurrency control information in an environment where conflict or data sharing is not high, (ii) deadlocks may occur, where one user waits for another to release a lock not realizing that that user is waiting for the release of a lock held by the first.

Therefore, optimistic concurrency control assumes that conflicts are not high and tries to ensure locks are held only for brief periods of time: essentially locks are only acquired at the end of the transaction when it is about to terminate. This kind of concurrency control requires a means to detect if an update to a resource does conflict with any updates that may have occurred in the interim and how to recover from such conflicts. Typically detection will happen using timestamps, whereby the system takes a snapshot of the timestamps associated with resources it is about to use or modify and compares them with the timestamps available when the transaction commits.
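Here's a minimal sketch of the optimistic style, assuming a simple version counter stands in for the timestamp: nothing is locked during the transaction and the check happens at commit time. The class and method names are mine, purely for illustration.

import java.util.concurrent.atomic.AtomicLong;

// Optimistic concurrency control sketch: validate at commit by comparing the
// version observed when the resource was first read with the current version.
public class VersionedResource {
    private volatile String state = "initial";
    private final AtomicLong version = new AtomicLong(0);

    public long currentVersion() {
        return version.get();
    }

    public String read() {
        return state;
    }

    // returns false if validation fails, in which case the caller rolls back and retries
    public synchronized boolean commitWrite(long versionSeenAtStart, String newState) {
        if (version.get() != versionSeenAtStart) {
            return false; // someone else updated the resource in the interim
        }
        state = newState;
        version.incrementAndGet();
        return true;
    }
}

A transaction remembers currentVersion() when it first touches the resource, does its work without holding any locks, and calls commitWrite() at the end; a false return is the cue to roll back and retry with fresh data, with all the potential for wasted work described above.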

Resolution of conflicts is a different problem entirely, since in order to do so requires semantic information about the resources concerned. Therefore, most transaction systems that offer optimistic schemes will typically cause the detecting transaction to roll back and the application must retry, this time with new data. Obviously this may result in a lot of work being lost, especially if the transaction that rolls back has been running for some time.

Assuming both optimistic and pessimistic concurrency control are available to you (and they may not be), then which one to use is up to you. A close examination of the environment in which the application and transactional resources reside is necessary to determine whether a) shared access to resources occurs and b) the relative probability that sharing will cause a transaction to roll back. This might very well not be a black or white choice and may change over the lifetime of your objects or application. Certainly the use of different concurrency control schemes can be important when trying to improve the throughput of user requests and committed transactions, so it’s well worth considering and understanding the issues involved.
Type specific concurrency control

Another possible enhancement is to introduce type specific concurrency control, which is a particularly attractive means of increasing the concurrency in a system (and yes, it's supported in JBossTS). Concurrent read/write or write/write operations are permitted on an object from different transactions provided these operations can be shown to be non-interfering (for example, for a directory object, reading and deleting different entries can be permitted to take place simultaneously). Object-oriented systems are well suited to this approach, since semantic knowledge about the operations of objects can be exploited to control permissible concurrency within objects. Additional work may be needed when working with procedural systems.
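As a rough illustration of the idea (not how JBossTS implements it), consider a directory object that locks per entry rather than locking the whole directory, so that reading one entry and deleting a different one don't interfere. A real transactional object would hold these locks to the end of the transaction as per the two-phase rule; this sketch just shows the granularity point.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

// Type-specific concurrency control sketch: per-entry locks in a directory-like object.
public class Directory {
    private final ConcurrentMap<String, String> entries = new ConcurrentHashMap<String, String>();
    private final ConcurrentMap<String, ReentrantLock> entryLocks = new ConcurrentHashMap<String, ReentrantLock>();

    private ReentrantLock lockFor(String name) {
        ReentrantLock fresh = new ReentrantLock();
        ReentrantLock existing = entryLocks.putIfAbsent(name, fresh);
        return existing != null ? existing : fresh;
    }

    // reading entry "a" and deleting entry "b" take different locks, so they can run in parallel
    public String read(String name) {
        ReentrantLock lock = lockFor(name);
        lock.lock();
        try {
            return entries.get(name);
        } finally {
            lock.unlock();
        }
    }

    public void delete(String name) {
        ReentrantLock lock = lockFor(name);
        lock.lock();
        try {
            entries.remove(name);
        } finally {
            lock.unlock();
        }
    }
}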

Finally, what about deadlocks? When multiple transactions compete for the same resources in conflicting modes (locks), it is likely that some of them will fail to acquire those resources. If a transaction that cannot acquire a lock on a resource waits for it to be released, then that transaction is blocked – no forward progress can be made until the lock has been acquired. In some environments, it is possible for some transactions to be waiting for each other, where each of them is blocked and is also blocking another transaction. In this situation, none of the transactions can proceed and the system is deadlocked.

For example, let’s consider two transactions T1 and T2 that operate on two resources X and Y. Let’s assume that the execution of the operations involved in these transactions is:

T1: read(X); write(Y)
T2: read(Y); write(X)

If the execution of these transactions were to interleave as follows:

readT1(X); readT2(Y); writeT2(X); writeT1(Y)

Note, readT1 means the read operation performed by T1 etc.

Assume that T1 obtained a read lock on X and then T2 gets a read lock on Y – possible because these operations aren’t conflicting and can thus occur in parallel. However, when T2 comes to write to X its attempt to get a write lock on X will block because T1 still holds its read lock. Likewise, T1’s attempt to get a write lock on Y will block because of the read lock that T2 holds. Each transaction is blocked waiting for the release of the other’s read lock before it can progress: they are deadlocked.

The only way for the deadlock to be resolved is for at least one of the transactions to release its locks that are blocking another transaction. Obviously such a transaction cannot commit (it has not been able to perform all of its work since it was blocked); therefore, it must roll back.

Deadlock detection and prevention is complicated enough in a non-distributed environment without then including the extra complexity of distribution. In general, most transaction systems allow deadlocks to occur, simply because to do otherwise can be too restrictive for applications. There are several techniques for deadlock detection, but the two most popular are:

  • Timeout-based: if a transaction has been waiting longer than a specified period of time, the transaction system will automatically roll back the transaction on the assumption it is deadlocked. The main advantage of this approach is that it is easy to implement in a distributed environment; the main disadvantage is that some transactions may execute for longer than expected and be rolled back when they are not in fact deadlocked.

  • Graph-based: this explicitly tracks waiting transaction dependencies by constructing a waits-for graph: nodes are waiting transactions and edges are waiting situations. The main advantage of this approach is that it is guaranteed to detect all deadlocks, whereas the main disadvantage is that in a distributed environment it can be costly to execute.


A slight variation on the timeout-based approach exists in some transaction systems, where timeouts can be associated with lock acquisition, such that the system will only block for the specified period of time. If the lock has not been acquired by the time this period elapses, it returns control to the application indicating that the lock has not been obtained. It is then up to the application to determine what to do; for example, it may be possible to acquire the required data elsewhere or to ask for a lock in a different mode. The advantage of this approach is that a transaction is not automatically rolled back if it cannot acquire a lock, possibly saving the application lots of valuable time; the disadvantage is that it requires additional effort on behalf of the application to resolve lock acquisition failures. However, for many objects and applications, this is precisely the best place to resolve lock conflicts because this is the only place where the semantic information exists to know what (if anything) can be done to resolve such conflicts.
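In java.util.concurrent terms, that variation looks roughly like the following sketch: try to acquire the lock for a bounded period and hand the decision back to the application if it can't be obtained (the timeout value and method names here are mine).

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Lock acquisition with a timeout: the caller gets control back instead of
// blocking indefinitely and relying on the transaction timeout to break a deadlock.
public class TimedAcquire {
    public boolean updateIfAvailable(ReentrantLock resourceLock, Runnable update) throws InterruptedException {
        if (resourceLock.tryLock(500, TimeUnit.MILLISECONDS)) {
            try {
                update.run();
                return true;
            } finally {
                resourceLock.unlock();
            }
        }
        // lock not obtained within the window: the application decides whether to
        // fetch the data elsewhere, ask for a different lock mode, or give up
        return false;
    }
}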

Durability

D is for Dog, Design and Durability, i.e., the effects of a committed transaction are never lost (except by a catastrophic failure).

The durability (or persistence) property means that any state changes that occur during the transaction must be saved in a manner such that a subsequent failure will not cause them to be lost. How these state changes are made persistent is typically dependent on the implementation of the transaction system and the resources that are ultimately used to commit the work done by the transactional objects. For example, the database will typically maintain its state on hard disk in order to ensure that a machine failure (e.g., loss of power) does not result in loss of data.

Although most users of transactions will see durability from their application’s point-of-view, there is also an aspect within the transaction system implementation itself. In order to guarantee atomicity in the presence of failures (both transaction coordinator and participant), it is necessary for the transaction service itself to maintain state. For example, in some implementations the coordinator must remember the point in the protocol it has reached (i.e., whether it is committing or aborting), the identity of all participants that are registered with the transaction and where they have reached in the protocol (e.g., whether they have received the prepare message). This is typically referred to as the transaction log, though this should not be interpreted as implying a specific implementation. Some implementations may maintain a separate log (file) per transaction, with this information recorded within it and removed when it is no longer needed. Another possible implementation has a single log for all transactions and the transaction information is appended to the end of the log and pruned from the log when the respective transaction completes.

Let’s look at what happens at the participant in terms of durability, when it’s driven through the two-phase commit protocol. When the participant receives a prepare message from the coordinator it must decide whether it can commit or roll back. If it decides to roll back then it must undo any state changes that it may control and inform the coordinator; there is no requirement for durable access at this point. If the participant can commit, it must write its intentions to a durable store (participant log) along with sufficient information to either commit or roll back the state changes it controls. The format of this information is typically dependent on the type of participant, but may include the entire original and new states.

Once successfully recorded, the participant informs the coordinator that it can commit and awaits the coordinator’s decision. When this second phase message arrives, the participant will either cancel all state changes (the coordinator wants to roll back), or if the coordinator wants to commit, make those state changes the current state (e.g., overwrite the original state with the new state). It can then delete the participant log and inform the coordinator.
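Here's a deliberately simplified sketch of that participant behaviour, with a serialized intentions record standing in for the participant log. Real participants (databases, message brokers) do something far more sophisticated, and proper failure handling and recovery are omitted.

import java.io.File;
import java.io.FileOutputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Participant durability sketch: force the intentions record to disk at prepare,
// make the new state current at commit, discard everything at rollback.
public class Participant {
    private Serializable currentState;
    private Serializable preparedState;
    private final File log;

    public Participant(Serializable initialState, File log) {
        this.currentState = initialState;
        this.log = log;
    }

    public boolean prepare(Serializable newState) {
        try {
            FileOutputStream fos = new FileOutputStream(log);
            ObjectOutputStream out = new ObjectOutputStream(fos);
            out.writeObject(newState); // enough information to redo (or undo) the change
            out.flush();
            fos.getFD().sync();        // must hit the disk before we vote commit
            out.close();
            preparedState = newState;
            return true;               // vote commit
        } catch (Exception e) {
            return false;              // couldn't make the intention durable, so vote rollback
        }
    }

    public void commit() {
        currentState = preparedState;  // the new state becomes the current state
        log.delete();                  // the intentions record is no longer needed
    }

    public void rollback() {
        preparedState = null;          // discard the provisional state; currentState is untouched
        log.delete();
    }
}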

Isolation

I is for Iguana, Igloo and Isolation, i.e., intermediate states produced while a transaction is executing are not visible to others. Furthermore transactions appear to execute serially, even if they are actually executed concurrently.

This property is often referred to as serializability. If we assume that objects and services can be shared between various programs then it is necessary to ensure that concurrent executions of programs are free from interference, i.e., concurrent executions should be equivalent to some serial order of execution. We have already seen a fairly trivial example of this through the online bank account, but it is worth formalizing this requirement. Consider the following two programs (where w, x, y and z are distinct state variables):

P1 : z := 10; x := x+1; y := y+1
P2 : w := 7; x := x * 2; y := y * 2

Assume that x=y=2 initially. Then, a serial execution order ‘P1;P2’ will produce the result z=10, w=7, x=y=6, and execution ‘P2;P1’ will produce the results z=10, w=7, x=y=5. The partly concurrent execution order given below will be termed interference free or serializable, since it is equivalent to the serial order ‘P1;P2’:

(z := 10 || w := 7); x := x+1; y := y+1; x := x * 2; y := y * 2

However, an execution order such as the one below is not free from interference since it cannot be shown to be equivalent to any serial order.

(z := 10 || w := 7); x := x+1; x := x * 2; y := y * 2; y := y+1

Programs that possess the above-mentioned serializable property are said to be atomic with respect to concurrency. The serializability property is extremely important, especially in an environment where multiple concurrent users may be attempting to use the same resources consistently.

Consistency

C is for Crayon, Cup and Consistency, i.e., transactions produce consistent results and preserve application specific invariants.

A transactional application should maintain the consistency of the resources (e.g., databases, file-systems, etc.) that it uses. In essence, transactional applications should move from one consistent state to another. However, unlike the other transactional properties (A, I and D) this is something that the transaction system cannot achieve by itself since it does not possess any semantic information about the resources it manipulates: it would be impossible for a transaction processing system to assert that the resources are moving to (or from) consistent states. All a transaction system can ensure is that any state changes that do occur are performed in a manner that is guaranteed despite failures. It is the application programmers’ responsibility to ensure consistency (in whatever way makes sense for the resources concerned.)

Atomicity

We've already covered some of this in previous postings, but in the next four entries let's explicitly look at all ACID properties. And of course that means we start here with ... A is for Apple, Aardvark and Atomic, i.e., the transaction completes successfully (commits) or if it fails (aborts) all of its effects are undone (rolled back).

In order to ensure that a transaction has an atomic outcome, the two-phase commit protocol that we discussed earlier is typically used. This protocol is used to guarantee consensus between participating members of the transaction. When each participant receives the coordinator’s phase 1 message, they record sufficient information on stable storage to either commit or abort changes made during the action. After returning the phase 1 response, each participant that returned a commit response must remain blocked until it has received the coordinator’s phase 2 message. Until they receive this message, these resources are unavailable for use by other actions. If the coordinator fails before delivery of this message, these resources remain blocked. However, if failed machines eventually recover, crash recovery mechanisms can be employed to unblock the protocol and terminate the transaction.
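And for completeness, a coordinator-side sketch of the same protocol, with all the interesting parts (timeouts, failure handling, recovery after a crash) waved away; the interfaces are invented for illustration and are not the JTA or JBossTS APIs.

import java.util.List;

// Two-phase commit coordinator sketch: collect votes, make the decision durable,
// then deliver the decision to everybody.
public class Coordinator {
    interface TwoPhaseParticipant {
        boolean prepare();
        void commit();
        void rollback();
    }

    interface DecisionLog {
        void recordCommitDecision(List<TwoPhaseParticipant> participants);
        void forget();
    }

    public void complete(List<TwoPhaseParticipant> participants, DecisionLog log) {
        // phase 1: ask everyone to prepare
        for (TwoPhaseParticipant participant : participants) {
            if (!participant.prepare()) {
                // a single rollback vote aborts everything; no durable decision record is needed
                for (TwoPhaseParticipant p : participants) {
                    p.rollback();
                }
                return;
            }
        }

        // the decision point: once this record is durable the outcome is commit,
        // even if we crash immediately afterwards (recovery replays phase 2)
        log.recordCommitDecision(participants);

        // phase 2: deliver the decision; until it arrives, prepared participants stay blocked
        for (TwoPhaseParticipant participant : participants) {
            participant.commit();
        }
        log.forget();
    }
}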

Tuesday, March 15, 2011

fiddling with knobs

Get your mind out the gutter you. I'm talking about control knobs. Also buttons, switches, levers and anything else that allows for interesting behavioural tweaks.

Messing around with settings on software is a great way to pass the time. For those who enjoy such things there are over 100 configuration properties in JBossTS. Which admittedly may be a little too much of a good thing. Fortunately they mostly have sensible default values.

I recently discussed the performance benefits of using one of those properties to swap in a new ObjectStore implementation for transaction logging in JBossTS. In the last few days I've been adding some more configuration options to allow overriding of the settings in the underlying HornetQ Journal. It turns out that there are also a bunch of interesting knobs to fiddle with in that code. By judicious tweaking of the settings on the Journal and the file system I've got a nice speedup over the last iteration:

60119 tx/second with in-memory logging, no disk I/O

43661 tx/second with the new Journal on SSD

43k is a big step up from the 30-36k or so we were seeing previously, but still not saturating the CPU. However, a new test case shows the disk is not to blame:

89221 log writes / second.

That's taking most of the transaction manager code out of the picture and writing an equivalent amount of data direct to the Journal based ObjectStore. So, plenty of disk write capacity to spare now.

So why can't we use it all?

The log write done by an individual transaction is very small. To get an efficient batch size you need to aggregate the writes for a significant number of transactions. Whilst waiting for that batch to be written, the transactions are blocked. So, to keep the CPU busy you need to throw more threads / concurrent transactions at it. In fact the disk is so quick you need a LOT more threads. At which point lock contention elsewhere in the transaction manager code becomes a problem, despite the significant improvements we'd already made there for the pure in-memory transaction cases.

Naturally it's possible we can get some additional benefit from another iteration of lock refactoring. We've already picked off most of the reasonably tractable cases though and what's left is going to be complex to change.

Another possibility is to actually run the Journal out of process and use IPC so several JVMs can share it. The additional communication overhead may just be worthwhile. It's a similar model to what we do with the SQL db based objectstore, but without the overhead of SQL. Although naturally this multi process architecture is attractive only where the application workload is amenable to partitioning between JVMs.

It may have to wait a while though. We're entering the final weeks of 4.15 development and it's time to start cleaning up all the loose ends and getting things in shape for use by JBossAS 7 GA. Onward.

Sunday, March 13, 2011

Slightly alkaline transactions if you please ...

Given that the traditional ACID transaction model is not appropriate for long running/loosely coupled interactions, let’s pose the question, “what type of model or protocol is appropriate?” The answer to that question is that no one specific protocol is likely to be sufficient, given the wide range of situations that transactions are likely to be deployed within. In fact trying to shoe-horn ACID transactions into a wide range of situations for which they were never designed is one of the reasons they've gotten such a bad reputation over the years.

There are a number of different extensions to the standard transaction model that have been proposed to address specific application needs that may not be easily or efficiently addressed through the use of traditional transactions:

Nested transactions: permits a finer control over recovery and concurrency. The outermost transaction of such a hierarchy is referred to as the top-level transaction. The permanence of effect property is only possessed by the top-level transaction, whereas the commits of nested transactions (subtransactions) are provisional upon the commit/abort of an enclosing transaction. And yes, JBossTS supports nested transactions!
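If the "provisional commit" idea sounds abstract, here's a toy illustration of the semantics (this is not the JBossTS API, just a sketch): a subtransaction's commit merely hands its work up to its parent, and nothing becomes permanent until the top-level transaction commits.

import java.util.ArrayList;
import java.util.List;

// Nested transaction semantics sketch: provisional commit of subtransactions.
public class NestedTx {
    private final NestedTx parent;
    private final List<Runnable> provisionalWork = new ArrayList<Runnable>();

    public NestedTx() { this(null); }                   // a top-level transaction
    private NestedTx(NestedTx parent) { this.parent = parent; }

    public NestedTx beginNested() { return new NestedTx(this); }

    public void record(Runnable work) { provisionalWork.add(work); }

    public void commit() {
        if (parent != null) {
            parent.provisionalWork.addAll(provisionalWork); // provisional: the parent now owns the work
        } else {
            for (Runnable work : provisionalWork) {
                work.run();                                 // only a top-level commit makes effects permanent
            }
        }
        provisionalWork.clear();
    }

    public void abort() {
        provisionalWork.clear(); // undoing a subtransaction doesn't affect the rest of the parent's work
    }
}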

Type specific concurrency control: concurrent read/write or write/write operations are permitted on an object from different transactions provided these operations can be shown to be non-interfering. Oh and yes, we support this too!

Independent top-level transactions: with this model it is possible to invoke a top-level transaction from within another (possibly deeply nested) transaction. If the logically enclosing transaction rolls back, this does not lead to the rollback of the independent top-level transaction, which can commit or rollback independently. In the event that the enclosing transaction rolls back, compensation may be required, but this is typically left up to the application. Yes, we've got this one covered as well.

Structured top-level transactions: long-running top-level transactions can be structured as many independent, short-duration top-level transactions. This allows an activity to acquire and use resources for only the required duration. In the event of failures, obtaining transactional semantics for the entire duration may require compensations for forward or backward recovery. Now although we don't support this directly within JBossTS since it is really workflow, JBossTS has been used to implement workflow systems for many years.

What this range of extended transaction models illustrates is that a single model is not sufficient for all applications. Therefore, is it possible to develop a framework within which all of these models can be supported, and also facilitate the development of other models? This was the question asked by the Object Management Group when it began its work on attempting to standardise extended transaction models. Here I'll give an overview of the results of the work we performed with IBM, Iona and others in producing the final Activity Service OMG specification that attempts to answer that question. This also became a Java specification with JSR 95.

Now I'm not going to go into this standard in any detail, since that could be the subject of several other blog posts, but I will summarise it as this: have a generic coordination infrastructure that allows the intelligence behind the protocol (e.g., two-phase or three-phase) as well as the trigger points for the protocol (e.g., at the end of the transaction or during the transaction) to be defined at runtime and in a pluggable manner. But why did I mention all of this? Because this standard formed the basis of the Web Services transactions standard (and the various competing specifications that came before it), with WS-Coordination as the pluggable infrastructure.

At this point in time there are only two types of transaction protocol supported by the Web Services standard and the REST-based approach that we're working on:

• ACID transactions: Web services are for interoperability in closely coupled environments such as corporate intranets as much as they are for the Web. Interoperability between heterogeneous transaction service implementations is a requirement and yet has been difficult to achieve in practice. This transaction model is designed to support interoperability of existing transaction processing systems via Web services, given such systems already form the backbone of enterprise class applications. Although ACID transactions may not be suitable for all Web services, they are most definitely suitable for some, and particularly high-value interactions such as those involved in the finance sector. For example, in a J2EE environment, JTA-to-JTA interoperability is supported through the JTS specification, but this is neither mandated nor universally implemented.

• Forward compensation based transactions: this model is designed for those business interactions that are long in duration, where traditional ACID transactions are inappropriate. With this model, all work performed within the scope of an application should be able to be compensated such that an application’s work is either performed successfully or undone. However, how individual Web services perform their work and ensure it can be undone if compensation is required, is an implementation choice. The model simply defines the triggers for compensation actions and the conditions under which those triggers are executed.

And the nice thing about our implementations of both of these is that they're based on the exact same transaction engine: ArjunaCore. Because it was designed with extensibility in mind and also with the requirement to be able to relax the various ACID properties, over the last decade we've been able to use it relatively untouched as the basis of pretty much every transaction protocol and implementation that's been out there. So if we do decide that we need to add another extended transaction protocol for other use cases, I'm pretty confident that it won't require us to start from scratch!