Wednesday, June 14, 2017

Sagas and how they differ from two-phase commit

With all the talk about microservices and distributed transactions in the community these days - and in particular the use of the term Saga - it seems like a good time for a refresher on Sagas and how they differ from a 2PC transaction. As we know, transaction management is needed in a microservices architecture, and Sagas can be a good fit for addressing that need. There are several ways[9] to handle and implement a Saga in such an environment, but explaining them is not the goal of this post.

This post instead introduces the concept of Saga transactions and compares it to ACID two-phase commit transactions. The two approaches share the same goal - coordinating resources so that the operations over them form one logical unit of work - but they take quite different paths to ensuring that goal is met.

ACID

First, let’s refresh on the concept of an ACID transaction. ACID stands for atomicity, consistency, isolation and durability. These properties are provided by a system to guarantee that operations over multiple resources can be treated as a single logical operation (unit of work).
Let’s take it briefly one by one.
  • Atomicity signifies all or nothing. If one operation in the set fails then all the others have to fail too. Looked at from a different angle, we can call this property abortability[1].
  • Consistency states that the system is consistent before the transaction starts and after it finishes. What consistency means varies from resource to resource. In database systems, we can consider it as not breaking any constraints defined by primary keys, foreign keys, uniqueness and so on. Consistency in the broader sense - keeping the application in a consistent state - is not maintained by the transactional system itself but by the application programmer; the transaction, of course, helps to defend it.
  • Isolation refers to the behaviour of concurrent operations. The ACID definition requires transactions to be serializable - the system behaves as if all transactions were processed one by one in a single thread. Such an implementation is not required, and real implementations differ widely[2]. Moreover, such behaviour (even if only simulated) has performance implications, so in practice the isolation property is usually relaxed to one of the isolation levels with a lower guarantee than serializability[3].
  • Durability means that once the transaction commits, all data is persisted and, even if the system crashes, it will still be accessible after a restart.
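To make atomicity (abortability) concrete, here is a minimal sketch of a local transaction over a single resource using plain JDBC. The connection URL and the account table are illustrative assumptions (an H2 driver on the classpath, an existing account table), not part of any particular product.

// Minimal sketch of a local ACID transaction with plain JDBC.
// The connection URL and table are illustrative assumptions.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class LocalTransactionExample {
    public void transfer(long fromId, long toId, int amount) throws SQLException {
        try (Connection con = DriverManager.getConnection("jdbc:h2:mem:test")) {
            con.setAutoCommit(false); // start a unit of work
            try (PreparedStatement debit = con.prepareStatement(
                         "UPDATE account SET balance = balance - ? WHERE id = ?");
                 PreparedStatement credit = con.prepareStatement(
                         "UPDATE account SET balance = balance + ? WHERE id = ?")) {
                debit.setInt(1, amount); debit.setLong(2, fromId); debit.executeUpdate();
                credit.setInt(1, amount); credit.setLong(2, toId); credit.executeUpdate();
                con.commit();   // both updates become visible atomically
            } catch (SQLException e) {
                con.rollback(); // abortability: neither update survives
                throw e;
            }
        }
    }
}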

With that definition of ACID transactions in mind, let’s now look at Saga and two-phase commit.

Two-phase commit (2PC)

A well-known algorithm for achieving ACID transactional outcomes is the two-phase commit protocol. 2PC (part of a family of consensus protocols) coordinates the commit of a distributed transaction, i.e. one that updates multiple resources. Those of you already familiar with Narayana will be well acquainted with a popular manifestation of this philosophy: JTA[11].
As its name suggests, the protocol works in two phases. The first phase is named ‘prepare’: the coordinator asks the participants whether they are ready to finish with a commit. The second phase is named ‘commit’: the coordinator commands the participants to commit and make their changes visible to the outside world. The coordinator commands a commit only if all participants voted for it. If any participant votes ‘abort’ then the whole transaction, and with it every participant, is rolled back - any change made to a participant during the transaction is undone.

For a better understanding of the process see details at our wiki at https://developer.jboss.org/wiki/TwoPhaseCommit2PC.
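As a rough illustration of what this means for the application programmer, the sketch below enlists a database and a JMS queue in one JTA transaction. It assumes a container (such as WildFly) with an XA datasource and an XA-capable connection factory; the JNDI names, table and queue here are assumptions, not fixed values.

// Sketch of a 2PC (JTA) transaction spanning a database and a JMS queue.
// The JNDI names below are assumptions - adjust them to your own setup.
import javax.jms.ConnectionFactory;
import javax.jms.JMSContext;
import javax.naming.InitialContext;
import javax.sql.DataSource;
import javax.transaction.Status;
import javax.transaction.UserTransaction;
import java.sql.Connection;
import java.sql.PreparedStatement;

public class TwoPhaseCommitExample {
    public void placeOrder(String order) throws Exception {
        InitialContext ctx = new InitialContext();
        UserTransaction utx = (UserTransaction) ctx.lookup("java:comp/UserTransaction");
        DataSource ds = (DataSource) ctx.lookup("java:jboss/datasources/OrdersXADS"); // assumed name
        ConnectionFactory cf = (ConnectionFactory) ctx.lookup("java:/JmsXA");          // assumed name

        utx.begin();
        try (Connection con = ds.getConnection();
             JMSContext jms = cf.createContext()) {
            try (PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO orders (description) VALUES (?)")) {
                ps.setString(1, order);
                ps.executeUpdate();
            }
            jms.createProducer().send(jms.createQueue("orders"), order);
            utx.commit();   // prepare goes to both resources, then commit - or neither change survives
        } catch (Exception e) {
            if (utx.getStatus() != Status.STATUS_NO_TRANSACTION) {
                utx.rollback();
            }
            throw e;
        }
    }
}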

Saga

The Saga[4] pattern, on the other hand, works with units of work that can be undone. There is no commitment protocol involved.
The original paper discusses updates to a single-node database where such work is processed, but the notion can be applied to distributed transactions too.
A Saga consists of a sequence of operations, each of which may work with a different resource. Changes made by an operation on a particular resource are visible to the outside world immediately. We can see a Saga as just a group of operations (a.k.a. local transactions) which are executed one by one and grouped together by the Saga.
A Saga guarantees that either all operations succeed or all the work is undone by compensating actions. The compensating actions are not provided generically by a coordinator framework; instead, they are undo actions defined in business logic by the application programmer.
The Saga paper talks about two basic ways of handling a failure, plus a combination of the two:
  • Backward recovery - when a running operation fails, compensating actions are executed for every previously finished operation. The compensating actions are executed in the reverse order to that in which the operations ran.
  • Forward recovery - when the running operation fails it is aborted and then, in the following step, replayed so that it finishes successfully. The benefit is that forward progress is ensured and work is not wasted even when a failure occurs - the particular operation is aborted, replayed, and the Saga continues from that operation onward.
    You don’t need to specify any compensation logic, but you can be trapped in an infinite retry loop. This recovery is defined as pure forward recovery in the paper[4].
  • Forward/backward recovery - for the sake of completeness, the paper[4] also defines a way of combining forward and backward recovery. The application logic or the saga manager defines save-points. A save-point declares a place in the Saga processing from which operations can be safely retried. When a running operation fails, compensating actions are executed for the previously finished operations back to the nearest defined save-point. From there, operations are replayed in an attempt to finish the Saga successfully.
    We can say that pure forward recovery defines a save-point after each successfully finished operation.
The undo actions used for compensation and the replay actions of forward recovery are expected to be idempotent so that it is possible to retry them multiple times. Unless there is special handling in the underlying transport, such as retaining results, it is up to the application developer to code the handler in an idempotent way. The saga manager then needs to be able to cope with a Saga participant responding that the operation was already done.
The forward recovery approach can be very handy[8] (the talk references this approach at 28:20), but for the rest of this post we focus on compensations (backward recovery).
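To make backward recovery concrete, here is a deliberately simplified, hypothetical saga executor - it is not the API of any particular framework. Operations run one by one and, on a failure, the compensations collected so far run in reverse order; because the compensations are idempotent, re-running this loop after a crash is safe.

// A simplified, hypothetical saga executor illustrating backward recovery.
// This is an illustration only, not the API of any particular framework.
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class SimpleSaga {
    /** One saga step: a local transaction plus the compensation that undoes it. */
    public interface Step {
        void execute() throws Exception;
        void compensate(); // must be idempotent so it can safely be retried
    }

    public void run(List<Step> steps) throws Exception {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.execute();         // committed and visible immediately
                completed.push(step);   // remember it in case we need to compensate
            } catch (Exception failure) {
                // backward recovery: undo the completed work in reverse order
                while (!completed.isEmpty()) {
                    completed.pop().compensate();
                }
                throw failure;
            }
        }
    }
}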

To outline the process better, let’s take an example. Say we have a message queue we want to send a message to and a database we want to insert a record into. If we group those operations, we know each of them is represented by a local transaction. For the purposes of this text we define a resource-located transaction as the counterpart of the local transaction: a transaction managed by the resource itself (i.e. a database transaction or a message broker transaction). The Saga groups the business operations together and defines a compensating undo action for each of them. When the business logic sends the message, the local transaction is committed and the message is immediately available in the queue. When the data is inserted into the database table, the local transaction is committed and the value can be read. When both business operations succeed, the Saga finishes.

Let’s elaborate a little:
  1. First, the saga manager starts the Saga.
  2. The business logic sends a message to the JMS queue. This business operation is attached to the Saga, which brings an obligation to define a compensation handler (a chunk of code which is executed in case the Saga needs to be aborted).
    The sending of the message is wrapped in a short local transaction which ends when the JMS message is acknowledged by the receiver.
  3. The business logic inserts a row into the database table - again, the operation is attached to the Saga, the SQL insert command is covered by a short local transaction, and a compensation handler has to be defined to undo the insertion in case the Saga is aborted.
  4. If everything goes fine, this is the end of the Saga and the business logic can continue with another Saga.
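Wired up against the hypothetical SimpleSaga executor sketched earlier, the happy path could look roughly like the following; the OrderMessaging and OrderDao helpers are placeholders for your real JMS and database access code.

// The two business operations above expressed as steps of the hypothetical SimpleSaga.
// OrderMessaging and OrderDao are placeholder helpers, not real APIs.
public class PlaceOrderSaga {
    /** Placeholder helper around the JMS queue. */
    public interface OrderMessaging { void sendOrder(String orderId); void sendCancel(String orderId); }
    /** Placeholder helper around the database table. */
    public interface OrderDao { void insertOrder(String orderId); void deleteOrder(String orderId); }

    private final OrderMessaging messaging;
    private final OrderDao dao;

    public PlaceOrderSaga(OrderMessaging messaging, OrderDao dao) {
        this.messaging = messaging;
        this.dao = dao;
    }

    public void placeOrder(String orderId) throws Exception {            // step 1: the Saga starts
        SimpleSaga.Step sendMessage = new SimpleSaga.Step() {
            public void execute() { messaging.sendOrder(orderId); }      // step 2: send the JMS message
            public void compensate() { messaging.sendCancel(orderId); }  // undo: send a cancel command
        };
        SimpleSaga.Step insertRow = new SimpleSaga.Step() {
            public void execute() { dao.insertOrder(orderId); }          // step 3: insert the row
            public void compensate() { dao.deleteOrder(orderId); }       // undo: remove the row
        };
        new SimpleSaga().run(java.util.List.of(sendMessage, insertRow)); // step 4: Saga ends when both succeed
    }
}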
     

Now let’s say a failure occurs during the insertion of the record into the database, and we apply backward recovery.

  • As the database insertion fails, the ACID resource-located transaction reaches an error state and has to be rolled back by the resource. The error is returned to the Saga.
  • The saga manager is responsible for preserving data integrity, which it ensures by aborting the whole Saga - that is, by executing compensating actions for the previously completed business operations. A compensating action for the database insertion is not needed, as that local transaction has already been rolled back. But the compensating action for the sent JMS message is waiting to be executed.
  • The compensating action is inevitably bound to the business context, so we have to define at least a simple one.
    Let’s assume that sending the JMS message meant placing an order in a warehouse system. The compensating action is now expected to undo that. Probably the easiest way to model such a cancellation is to send a different message, carrying a cancel command, to the JMS broker queue. We expect there is a system listening for messages which will process the actual order cancellation later on.
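Such a compensating action can be as small as the sketch below. The queue name and message format are assumptions, and we assume the warehouse system treats a repeated cancel as a no-op, which keeps the compensation idempotent. The Narayana compensating transactions framework mentioned later in this post[5][6] gives you a dedicated place (a compensation handler) to put exactly this kind of logic.

// Sketch of the compensating action for the "order placed" JMS message:
// it sends a cancel command to an (assumed) warehouse queue.
import javax.jms.ConnectionFactory;
import javax.jms.JMSContext;

public class OrderCompensation {
    private final ConnectionFactory connectionFactory;

    public OrderCompensation(ConnectionFactory connectionFactory) {
        this.connectionFactory = connectionFactory;
    }

    /** Idempotent undo: we assume the warehouse treats a repeated cancel as a no-op. */
    public void cancelOrder(String orderId) {
        try (JMSContext jms = connectionFactory.createContext()) {
            jms.createProducer().send(jms.createQueue("warehouseOrders"), "CANCEL:" + orderId);
        }
    }
}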

Sagas and ACID

We consider two-phase commit to be compliant with ACID. It provides atomicity - all actions are committed or all are rolled back; consistency - the system is in a consistent state before the transaction begins and after it ends; isolation - the isolation provided by the resource-located transactions (i.e. locking in the database) prevents concurrent transactions from observing inconsistent data, and we can say that, except for heuristic outcomes, the data behaves as if serialized access were maintained. And of course, durability is ensured with transaction logs.

On the other hand, we consider Sagas to relax the isolation property. The outcome of each operation - each single resource-located transaction - is visible as soon as it ends. The other ACID properties are preserved: atomicity - the whole Saga either ends successfully or all the work is compensated; durability - all data is persisted at the end of the Saga; consistency - consistency depends mostly on the application programmer, but at the end of the Saga the system state should be consistent. In fact, this last point is the same as for 2PC.

In the case of two-phase commit, the resource-located transaction spans almost the whole lifetime of the global transaction. The impact on concurrency depends on the implementation of the underlying system but, as an example, in relational databases we can expect locks which restrict concurrent work over the same data (locks are acquired for the records being worked with and, depending on the isolation level, a lock could even be acquired for the whole table where a record resides - although this is simplified, as many databases use optimistic concurrency where locks are only taken at prepare time).
On the other hand, a Saga commits each resource-located transaction immediately after the corresponding step of the business process ends, so locks are held only for that short time. The business logic inserts data into the database table, immediately commits the resource-located transaction, and the locks are released. From this point of view, the Saga more readily facilitates concurrent processing.

Blending Sagas and ACID

One advantage of the two-phase commit approach is its ease of use for the application programmer. It is up to the transaction manager to handle all the transaction troubleshooting; the programmer cares only about the business logic - sending a message to the queue, inserting data into the database. Sagas, on the other hand, require that a compensating action be created, and such an action has to be defined specifically for each Saga.

A Saga is a good fit for long-lived transactions (LLTs)[7], the well-known example being booking a flight, a hotel and a taxi to the hotel in one transaction. We expect communication with third-party systems over WebService or REST calls. All such communication is time-consuming, and with two-phase commit, locking resources for the whole lifetime of the transaction is not a good solution. Here the Saga shines, as it provides a more fine-grained way of working with the transaction participants.

Some people have the impression that 2PC and Saga stand against each other. That is truly not so. Each of them plays its role well and helps to solve particular use cases. Furthermore, 2PC transactions and Sagas can be deployed together in such a manner as to benefit from the advantages of both.

The Saga (long running action) can define the top-level unit of work, and ACID transactions are a good fit for the steps (operations) of such a unit of work.
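For instance, the ‘place order’ step from the earlier walkthrough could itself be an ACID (JTA) transaction spanning the database insert and the JMS send, while the surrounding Saga only knows how to compensate the step as a whole. A rough sketch, reusing the hypothetical classes from the earlier examples:

// Blending the two models: each saga step is internally an ACID (JTA/2PC) transaction,
// while the saga provides the compensation for the step as a whole.
// Reuses the hypothetical SimpleSaga, TwoPhaseCommitExample and OrderCompensation sketches above.
import java.util.List;

public class BlendedSaga {
    public void placeOrder(TwoPhaseCommitExample acidStep,
                           OrderCompensation compensation,
                           String orderId) throws Exception {
        SimpleSaga.Step placeOrderStep = new SimpleSaga.Step() {
            public void execute() throws Exception {
                acidStep.placeOrder(orderId);      // DB insert + JMS send commit atomically (2PC)
            }
            public void compensate() {
                compensation.cancelOrder(orderId); // undo the whole step if the Saga aborts
            }
        };
        new SimpleSaga().run(List.of(placeOrderStep));
    }
}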


Where does Narayana fit in

As you might know, Narayana - the premier open source transaction manager - provides implementations of both of these transaction models, suitable for a wide variety of deployment environments.

Many of you are likely most familiar with our JTA 2PC implementation as used in JBoss EAP and the open source WildFly application server. That implementation is based on the X/Open specification[12]. For more information, you can refer to the Narayana documentation[10].
In terms of compensations: if you want to stick to standards, you can already use Narayana to achieve Saga-based transactions using the Narayana WS-BA implementation. However, Narayana also provides a modern compensating transactions framework based on CDI. For more information, please refer to Paul's great article about the compensating framework[5] or visit our Narayana quickstarts[6].


References
[1] https://www.youtube.com/watch?v=5ZjhNTM8XU8 (Transactions: myths, surprises and opportunities, Martin Kleppmann)
[2] https://wiki.postgresql.org/wiki/SSI (Serializable Snapshot Isolation (SSI) in PostgreSQL)
[3] https://en.wikipedia.org/wiki/Isolation_(database_systems) (Wikipedia, Isolation - database systems)
[5] https://developer.jboss.org/wiki/CompensatingTransactionsWhenACIDIsTooMuch (Narayana: Compensating Transactions: When ACID is too much)
[6] https://github.com/jbosstm/quickstart/tree/master/compensating-transactions (Narayana quickstart to compensating transactions)
[8] https://www.youtube.com/watch?v=xDuwrtwYHu8 (GOTO 2015, Applying the Saga Pattern, Caitie McCaffrey)
[10] http://narayana.io/documentation/index.html (Narayana documentation)
[11] https://www.progress.com/tutorials/jdbc/understanding-jta (Understanding JTA - The Java Transaction API)
[12] http://pubs.opengroup.org/onlinepubs/009680699/toc.pdf (The XA Specification - The Open Group Publications Catalog)

Wednesday, May 24, 2017

Narayana 5.6.0.Final released

The team are pleased to announce our latest release of Narayana - the premier open source transaction manager. As normal, the release may be downloaded from our website over here:
http://narayana.io/

The release notes for this version may be found over here:

This release brings with it a collection of improvements including the latest patches and feature work to improve our integration into Tomcat. The best way to get started with that is to take a look at our new quickstarts:

We also added an interesting Software Transactional Memory and vert.x example over here:

We have also managed to get CORBA JTS interop propagation working with Glassfish. You can read more about that in https://issues.jboss.org/browse/JBTM-2623. However, validating that recovery completes correctly is still in progress, so stay tuned for further details in the coming releases...

Monday, May 22, 2017

Transactions and Microservices on the rise!

Our team spoke about how transactions and microservices can be used together over 3 years ago. Since then there have been a few people continuing to suggest there's no role for transactions. However, we're also starting to see more people talking about the same things such as at DevoxxUK and elsewhere. That's great! We'll keep pushing Narayana in this space and if you're interested then get involved with us too!

Wednesday, January 25, 2017

New Narayana release and Spring Boot quickstart

Last week we released Narayana 5.5.1.Final, which is important if you’d like to use Narayana with Spring Boot. More specifically, the new release contains a bug fix which resolves issues when making multiple database calls in the same transaction.

Subsequent to this release, we’ve added a new Narayana Spring Boot quickstart to our repository. It is a much simplified stock market application where users can buy and sell shares. Every transaction executes a couple of database updates and sends a few notifications to a message queue. I’m not going to go through the code in this quickstart in depth, because most of it is pretty straightforward. However, there are a few bits which need an explanation.

Making sure we are using Narayana


To begin with let’s go through the setup. Most of the necessary dependencies can be added by Spring Initializr:


Notice how “Narayana (JTA)” has been added to the “Selected Dependencies”.

Making sure we are using the right version of Narayana


Now we need to make sure we’re running the latest version of Narayana. There are a number of options to do that, as explained in a post on the Spring blog. But in this case, overriding the version properties is the easiest option:
<narayana.version>5.5.1.Final</narayana.version> 
<jboss-transaction-spi.version>7.5.0.Final</jboss-transaction-spi.version>
You can see this in the application’s pom over here.

Observing the transaction demarcation


In this application, both the buying and selling operations are transactional. In outline, the buy and sell methods look like this:
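(This is an illustrative sketch with placeholder names rather than the quickstart’s exact code; it assumes Spring’s @Transactional, a repository-style data access helper and JmsTemplate. See the quickstart source for the real methods.)

// Illustrative sketch only - placeholder names, not the quickstart's exact code.
// @Transactional wraps the database update and the JMS notification in one JTA transaction.
import org.springframework.jms.core.JmsTemplate;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class ShareService {
    /** Placeholder for the quickstart's real data access code. */
    public interface ShareRepository {
        void addShares(String userId, String symbol, int amount);
        void removeShares(String userId, String symbol, int amount);
    }

    private final ShareRepository repository;
    private final JmsTemplate jmsTemplate;

    public ShareService(ShareRepository repository, JmsTemplate jmsTemplate) {
        this.repository = repository;
        this.jmsTemplate = jmsTemplate;
    }

    @Transactional
    public void buy(String userId, String symbol, int amount) {
        repository.addShares(userId, symbol, amount);             // database update
        jmsTemplate.convertAndSend("notifications",
                userId + " bought " + amount + " of " + symbol);  // JMS notification
    }

    @Transactional
    public void sell(String userId, String symbol, int amount) {
        repository.removeShares(userId, symbol, amount);
        jmsTemplate.convertAndSend("notifications",
                userId + " sold " + amount + " of " + symbol);
    }
}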

Note on Artemis


The Artemis broker is not available on Spring Initializr, and to make our application self-contained we would like to use an embedded broker. To do this we add the following dependency to the application’s pom:
<dependency>
    <groupId>org.apache.activemq</groupId>
    <artifactId>artemis-jms-server</artifactId>
</dependency>
Everything else is quite self-explanatory. Go ahead and try it out over here.
If you have comments or feedback please do contact us over on our forum.

Monday, October 17, 2016

Achieving Consistency in a Microservices Architecture

Microservices are loosely coupled independently deployable services. Although a well designed service will not directly operate on shared data it may still need to ensure that that data will ultimately remain consistent. For example, the requirement to debit an account to pay for an on-line purchase creates a dependency between the customer and supplier account balances and the stock database. Historically a distributed transaction has been used to maintain this consistency which in turn will employ some flavour of distributed locking during the data update phase. This dependency introduces tight coupling, higher latencies and greater lock contention, especially when failures occur where the locks cannot be released until all services involved in the transaction become available again. Whilst the user of the system may be satisfied with this state of affairs it should not be the only possible interaction pattern. A more common approach will employ the notion of eventual consistency where the data may sometimes be in an inconsistent state but will eventually come back into the desired state: in our example the stock level will be reduced, the payment has been processed and the item delivered.

I have, from time to time, seen blogs and articles that recognise this problem and suggest solutions but they seem to mandate that either service calls naturally map on to a single data update or that the service writer picks one of the services to do the coordination taking on the responsibility of ensuring that all services involved in the interaction will eventually reach their target consistent state (see for example a quote from the article Distributed Transactions: The Icebergs of Microservices: "you have to pick one of the services to be the primary handler for the event. It will handle the original event with a single commit, and then take responsibility for asynchronously communicating the secondary effects to other services"). This sounds feasible but now you have to start thinking about how to provide the reliability guarantees in the presence of failures, how to orchestrate services, storing extra state with every persistent update so that the activity coordinator can continue the interaction after failures have been resolved. In other words, whilst this is a workable approach it hides much of the complexity involved in reliably recovering from system and network failures which at scale will surely happen. A more robust design for microservice architectures is to delegate the coordination component of the workflow to a specialised service explicitly designed for this kind of task.

We have been working in this area for many years and one set of ideas and protocols that we believe are particularly suited to microservices architectures is the use of compensatable units of work to achieve eventual consistency guarantees in this kind of loosely coupled service based environment. I produced a write up of the approach and accompanying protocol for use in REST based systems back in 2009 (Compensating RESTful Transactions) based on earlier work done by Mark Little et al. Mark also wrote some interesting blogs in 2011 (When ACID is too strong and Slightly alkaline transactions if you please ...) about alternatives to ACID when various constraints are loosened and his summary is relevant to the problems facing microservice architects.

The use of compensations, coordinated by a dedicated service, will give all the benefits suggested in Graham Lea's article referred to earlier, but with the additional guarantees of consistency, reliability, manageability, reduced complexity etc in the presence of failures. The essence of the idea is that the prepare step is skipped and instead the services involved in the interaction register compensation actions with a dedicated coordinator:

  1. The client creates a coordination resource (identified via a resource url)
  2. The client makes service invocations passing the coordinator url by some (unspecified) mechanism
  3. The service registers its compensate logic with the coordinator and performs the service request as normal
  4. When the client is done it tells the coordinator to complete or cancel the interaction
    • in the complete case the coordinator has nothing to do (except clean up actions)
    • in the cancel case the coordinator initiates the undo logic. Services are not allowed to fail this step. If they are not available or cannot compensate for the activity immediately the coordinator will keep on trying until all services have compensated (and only then will it clean up)
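A hypothetical client-side sketch of that flow is shown below; the endpoint paths, header and payloads are invented for illustration and are not the actual RTS/JDI protocol messages.

// Hypothetical illustration of the coordination flow above. The endpoint paths,
// header and payloads are invented for this sketch, not the real RTS/JDI protocol.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CompensationFlowSketch {
    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();

        // 1. create a coordination resource; the coordinator returns its url
        HttpResponse<String> created = http.send(
                HttpRequest.newBuilder(URI.create("http://coordinator.example/coordinations"))
                        .POST(HttpRequest.BodyPublishers.noBody()).build(),
                HttpResponse.BodyHandlers.ofString());
        String coordinationUrl = created.headers().firstValue("Location").orElseThrow();

        // 2. invoke a service, passing the coordination url, so that
        // 3. the service can register its compensate logic with the coordinator
        http.send(HttpRequest.newBuilder(URI.create("http://stock.example/reserve"))
                        .header("X-Coordination", coordinationUrl)
                        .POST(HttpRequest.BodyPublishers.ofString("item=42")).build(),
                HttpResponse.BodyHandlers.ofString());

        // 4. tell the coordinator to complete (or cancel, which triggers the undo logic)
        http.send(HttpRequest.newBuilder(URI.create(coordinationUrl + "/complete"))
                        .PUT(HttpRequest.BodyPublishers.noBody()).build(),
                HttpResponse.BodyHandlers.ofString());
    }
}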

We do not have an implementation of this (JDI) protocol but we do have an implementation of an ACID variant of it (called RTS) which has had extensive exposure in the field (and this can/will serve as the basis for the implementation of the JDI protocol). The documentation for RTS is available at our project web site. The nice thing about this work is that it can integrate seamlessly into Java EE environments and additionally is available as a WildFly subsystem. This latter feature means that it can be packaged as a WildFly Swarm microservice using the WildFly Swarm Project Generator. In this way if your microservices are using REST for API calls then they can make immediate use of this feature.

We also have a working prototype framework for how to do compensations in a Java SE environment. The API is available at github where we also provide a number of quickstarts showing how to use it.

Finally, we have a solution where we allow the compensation data to be stored at the same time as the data updates in a single (one-phase) transaction, thus ensuring that the coordinator will have access to the compensation data. This technique works particularly well with document-oriented databases such as MongoDB.

Monday, June 13, 2016

Karaf Integration

Narayana was introduced into Karaf 4.1.0-SNAPSHOT with Narayana 5.3.2.Final. You need to build Karaf from https://github.com/apache/karaf

Configuration

The Narayana configuration file can be found in <karaf-4.1.0-SNAPSHOT>/etc/org.jboss.narayana.cfg

Quickstart

First you need to install the Narayana transaction manager feature and the other related features.
 karaf@root()> repo-add mvn:org.ops4j.pax.jdbc/pax-jdbc-features/0.8.0/xml/features
 karaf@root()> feature:install pax-jdbc-pool-narayana jdbc pax-jdbc-h2 transaction-manager-narayana jndi
 karaf@root()> jdbc:ds-create --driverName H2-pool-xa -dbName test test
 karaf@root()> bundle:install -s mvn:org.jboss.narayana.quickstarts.osgi/osgi-jta-example/5.3.2.Final

Run the commit example

karaf@root()> narayana-quickstart:testCommit

Run the recovery example

karaf@root()> narayana-quickstart:testRecovery -f
This crashes Karaf and generates a record to recover. You need to restart Karaf and run the testRecovery command again.
bin/karaf
karaf@root()> narayana-quickstart:testRecovery

Admin tools

We are working on JBTM-2624 [1] and support the following commands:
narayana:refresh                          Refresh the view of the object store
narayana:types                            List record types
narayana:select type                   Select a particular transaction type
narayana:ls [type]                        List the transactions
narayana:attach id                       Attach to a transaction log
narayana:detach id                      Detach from the transaction log
narayana:forget idx                      Move the specified heuristic participant back to the prepared list
narayana:delete idx                     Delete the specified heuristic participant

[1] https://issues.jboss.org/browse/JBTM-2624

Friday, June 3, 2016

Narayana in Spring Boot

It’s been available for over a month now, so some of you might have used it already. But I’m writing this post in order to give a better explanation of how to use the Narayana transaction manager in your Spring Boot application.

First of all, Narayana integration was introduced in Spring Boot 1.4.0.M2, so make sure you’re up to date. At the moment of writing, the most recent available version is 1.4.0.M3.

Once you have the versions sorted out, it’s a good idea to try it out. In the rest of this post I’ll explain the quickstart application and what it does. After that you should be good to go with incorporating it into your own code. The source code of this quickstart can be found in our GitHub repository [1].

Enabling Narayana

To enable Narayana transaction manager add its starter dependency to your pom.xml:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-jta-narayana</artifactId>
</dependency>
After that, Narayana becomes the default transaction manager in your Spring Boot application. From then on, simply use JTA or Spring annotations to manage your transactions.

Narayana configuration

A subset of the Narayana configuration options is available via Spring’s application.properties file. This is the most convenient way to configure Narayana if you don’t need to change many of its settings. For the list of possible options, see the properties prefixed with spring.jta.narayana in [2].
In addition, all the traditional Narayana configuration options are available too. You can place jbossts-properties.xml in your application’s jar, or use our configuration beans.

Quickstart explanation

Our Spring Boot quickstart [1] is a simple Spring Boot application. By exploring its code you can see how to set up Narayana for Spring Boot as well as how to configure it with the application.properties file.
We have implemented three scenarios to demonstrate: commit, rollback, and crash recovery. They can be executed using the Spring Boot Maven plugin. Please see the README.md for the exact steps to execute each example.

The commit and rollback examples are very straightforward and almost identical. They both start the transaction, save an entry with the string you passed to the database, send a JMS message, and then commit or roll back the transaction.
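In outline, both scenarios follow the shape of the sketch below; the names are placeholders rather than the quickstart’s exact code.

// Illustrative sketch of the commit and rollback scenarios - placeholder names,
// not the quickstart's exact code.
import javax.transaction.UserTransaction;

public class ScenarioSketch {
    /** Placeholder for the quickstart's database access (e.g. a JPA repository). */
    public interface EntryRepository { void save(String value); }
    /** Placeholder for the quickstart's JMS sending code. */
    public interface MessageSender { void send(String message); }

    private final UserTransaction transaction; // supplied by the Narayana starter
    private final EntryRepository entries;
    private final MessageSender sender;

    public ScenarioSketch(UserTransaction transaction, EntryRepository entries, MessageSender sender) {
        this.transaction = transaction;
        this.entries = entries;
        this.sender = sender;
    }

    public void createEntry(String value, boolean rollback) throws Exception {
        transaction.begin();
        entries.save(value);                            // database write
        sender.send("Created entry '" + value + "'");   // JMS notification
        if (rollback) {
            transaction.rollback();  // rollback scenario: neither effect survives
        } else {
            transaction.commit();    // commit scenario: both effects become durable
        }
    }
}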
The commit example outcome should look like this:
Entries at the start: []
Creating entry 'Test Value'
Message received: Created entry 'Test Value'
Entries at the end: [Entry{id=1, value='Test Value'}]
And the rollback example outcome should look like this:
Entries at the start: []
Creating entry 'Test Value'
Entries at the end: []
The crash recovery scenario starts off the same as the other two, but then crashes the application between the prepare and commit stages. Later, once you restart the application, the unfinished transaction is recovered. I should note that in this example we’ve added a DummyXAResource in order to let us crash the application at the right time. Feel free to ignore it, because it is there only for the purpose of this example.
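To give an idea of how such a crash can be provoked, here is a heavily trimmed sketch of what a dummy resource of this kind typically does; it is illustrative only, not the quickstart’s actual DummyXAResource. It votes OK during prepare and halts the JVM when commit arrives, leaving a pending transaction in the log for the recovery manager to finish.

// Heavily trimmed, illustrative sketch of a "crash between prepare and commit" XAResource.
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

public class CrashingXAResource implements XAResource {

    @Override
    public int prepare(Xid xid) {
        System.out.println("Preparing DummyXAResource");
        return XA_OK; // vote to commit; a real resource would persist enough state to recover
    }

    @Override
    public void commit(Xid xid, boolean onePhase) {
        System.out.println("Committing DummyXAResource");
        System.out.println("Crashing the system");
        Runtime.getRuntime().halt(1); // die before the commit completes, forcing recovery later
    }

    // the remaining XAResource methods are no-ops for the purposes of this sketch
    @Override public void rollback(Xid xid) { }
    @Override public void start(Xid xid, int flags) { }
    @Override public void end(Xid xid, int flags) { }
    @Override public void forget(Xid xid) { }
    @Override public Xid[] recover(int flag) { return new Xid[0]; }
    @Override public boolean isSameRM(XAResource other) { return other == this; }
    @Override public int getTransactionTimeout() { return 0; }
    @Override public boolean setTransactionTimeout(int seconds) { return false; }
}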
After the application has crashed, your console output should look like this:
Entries at the start: []
Creating entry 'Test Value'
Preparing DummyXAResource
Committing DummyXAResource
Crashing the system
And after it has recovered, the following should be printed:
Entries at the start: []
DummyXAResourceRecovery returning list of resources: [org.jboss.narayana.quickstart.spring.DummyXAResource@5bc98bd2]
Committing DummyXAResource
Message received: Created entry 'Test Value'
DummyXAResourceRecovery returning list of resources: []
Recovery completed successfully
Entries at the end: [Entry{id=1, value='Test Value'}]
Hope you'll enjoy using our transaction manager with Spring. And as always, if you have any insights or requests, feel free to post them on our forum.