Posted to user@cassandra.apache.org by Ritesh Tijoriwala <ti...@gmail.com> on 2011/02/16 01:45:43 UTC

Patterns for writing enterprise applications on cassandra

Hi,
I have some general questions about writing enterprise applications on
Cassandra. I come from a background of writing enterprise applications on a
traditional DBMS.

What general patterns do people in the Cassandra world follow when
migrating code that runs within transaction boundaries in a traditional DBMS
application? For example, transfer $5 from account A to account B. The code
would normally look like:

        beginXT
        try {
                  A = A - $5;
                  B = B + $5;
                  commitXT;
        } catch (....) {
                  rollbackXT;
        }

The effect is that either both statements execute or neither does, so the
sum of the account balances remains constant. How does one deal with this
type of code when writing on top of Cassandra? I understand that consistency
will be eventual, and it's fine if the sum of both account balances is only
eventually constant, but how do I detect that a transaction failed after only
the step "A = A - $5" executed and the later step did not?

Are there any sample applications out there where I can browse the code and
see how it is written? For example, a customer purchase order application
that at least involves some concept of a transaction and has code to keep
things consistent.

Thanks,
Ritesh

Re: Patterns for writing enterprise applications on cassandra

Posted by "tijoriwala.ritesh" <ti...@gmail.com>.

Adding to my message above: bulk atomic writes (or transaction
blocks) tend to be a common pattern in rich enterprise applications where
business logic requires "all or no writes" on a set of entities. There may
not be a need for all of the "ACID" properties, but at least atomicity and
durability are a must.
-- 
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Patterns-for-writing-enterprise-applications-on-cassandra-tp6030077p6033178.html
Sent from the cassandra-user@incubator.apache.org mailing list archive at Nabble.com.

Re: Patterns for writing enterprise applications on cassandra

Posted by "tijoriwala.ritesh" <ti...@gmail.com>.

Thanks Dave. So the general-purpose approach would be to use an RDBMS for
data that requires locking semantics, or use something like "Cages" on top of
Cassandra, and then use Cassandra for data mining, high-throughput read
queries, and writable data that does not require transactions?

Are there any sample applications or open source projects that use Cassandra
and involve some application of transactions? I cannot imagine a fully
functional enterprise application not involving transactions...

Sorry about the long messages, but I am trying to learn how to decide on a
design.

Thanks,
Ritesh

Re: Patterns for writing enterprise applications on cassandra

Posted by Anthony John <ch...@gmail.com>.

Dave,

I agree with you, mostly ;) !!

While the reference to 2PC is a tad misplaced here - the idea is that the
paradigm of transactions might have to get redefined or - better still -
broadened to include protocols that provide similar guarantees in an
eventually consistent dispensation.

Bottom line - just like eventual consistency takes getting used to and
absorbing - we will figure out ways to write applications that require
transactions (per the current definition) against a datastore that does not
strictly support them. Again, remember the current notion of a transaction
(historically speaking) came after the practice of transaction processing.

One can think of many ways of doing this, but there is no point in
bloviating without having something to show.

HTH,

-JA

On Wed, Feb 16, 2011 at 5:34 PM, Dave Revell <da...@meebo-inc.com> wrote:

> Re Anthony's statement:
>
> > So it can be done and frameworks like CAGES are showing a way forward. At
> > the heart of it, there will need to be a Two-Phase commit type protocol
> > coordinator that sits in front of Cassandra. Of which - one can be sure -
> > there will be many implementations / best practices in the coming months.
>
> I disagree. I think anyone who wants transactions should pick a database
> that supports them. Bolting a transactional system on top could perhaps be
> made to work at great cost if you always used CL ALL for every operation. I
> personally don't think it's possible, but I can't actually prove it.
>
> Consider how to enforce:
> 1) atomicity: you need some kind of undo/redo logging system with crash
> recovery to handle partially-executed transactions. This is a lot of tricky
> Cassandra-specific code. A locking system isn't good enough.
> 2) isolation: lock managers are f*&^ing hard, especially handling the
> failure cases. Performant deadlock detection is difficult. Getting
> sufficiently fine-grained locks would require Cassandra-specific code.
>
> I'm trying to argue that these features belong inside the database, and not
> bolted on top, so you should use a database that includes them.
>
> Plainly: don't use Cassandra for applications that require
> transactions. However, if you can express your app without the need for
> transactions, that's where Cassandra really shines.
>
> +1 on Nate's recommendation to read the Helland paper.
>
> Contentiously,
> Dave

Re: Patterns for writing enterprise applications on cassandra

Posted by Dave Revell <da...@meebo-inc.com>.

Re Anthony's statement:

> So it can be done and frameworks like CAGES are showing a way forward. At
> the heart of it, there will need to be a Two-Phase commit type protocol
> coordinator that sits in front of Cassandra. Of which - one can be sure -
> there will be many implementations / best practices in the coming months.

I disagree. I think anyone who wants transactions should pick a database
that supports them. Bolting a transactional system on top could perhaps be
made to work at great cost if you always used CL ALL for every operation. I
personally don't think it's possible, but I can't actually prove it.

Consider how to enforce:
1) atomicity: you need some kind of undo/redo logging system with crash
recovery to handle partially-executed transactions. This is a lot of tricky
Cassandra-specific code. A locking system isn't good enough.
2) isolation: lock managers are f*&^ing hard, especially handling the
failure cases. Performant deadlock detection is difficult. Getting
sufficiently fine-grained locks would require Cassandra-specific code.
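
To make the atomicity point concrete, here is a minimal sketch of a redo log
for the $5 transfer from the original question. It is an illustration only:
the names TransferLog, apply_step and recover are hypothetical, plain Python
dicts stand in for the datastore, and nothing here is a Cassandra API. It
also assumes each step and its "applied" mark are recorded together, which is
exactly the tricky part on a real cluster.

```python
import uuid


class TransferLog:
    """In-memory stand-in for a durable intent log (hypothetical, not a Cassandra API)."""

    def __init__(self):
        self.entries = {}  # transfer id -> record describing the intended transfer

    def begin(self, src, dst, amount):
        # Record the whole intent before touching any balance.
        tid = str(uuid.uuid4())
        self.entries[tid] = {
            "src": src, "dst": dst, "amount": amount,
            "applied": set(), "done": False,
        }
        return tid


def apply_step(balances, log, tid, step):
    """Apply 'debit' or 'credit' at most once, so that redo is idempotent."""
    rec = log.entries[tid]
    if step in rec["applied"]:
        return
    if step == "debit":
        balances[rec["src"]] -= rec["amount"]
    else:
        balances[rec["dst"]] += rec["amount"]
    rec["applied"].add(step)


def recover(balances, log):
    """Crash recovery: redo every logged transfer that did not finish."""
    for tid, rec in log.entries.items():
        if not rec["done"]:
            apply_step(balances, log, tid, "debit")
            apply_step(balances, log, tid, "credit")
            rec["done"] = True


# Simulate a crash after the debit: the credit never runs until recovery.
balances = {"A": 100, "B": 50}
log = TransferLog()
tid = log.begin("A", "B", 5)
apply_step(balances, log, tid, "debit")
recover(balances, log)
```

Even in this toy form, the log has to be written before the first step and
every step has to be safe to repeat. Doing the same against replicas that can
fail independently is where the Cassandra-specific code would come in.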

I'm trying to argue that these features belong inside the database, and not
bolted on top, so you should use a database that includes them.

Plainly: don't use Cassandra for applications that require
transactions. However, if you can express your app without the need for
transactions, that's where Cassandra really shines.

+1 on Nate's recommendation to read the Helland paper.

Contentiously,
Dave

On Wed, Feb 16, 2011 at 2:20 PM, Nate McCall <na...@datastax.com> wrote:

> I found the following paper (PDF) very helpful in shaping my thoughts
> about what it means to build systems without transactions.
>
> http://www.ics.uci.edu/~cs223/papers/cidr07p15.pdf
>
> "Life Beyond Distributed Transactions: an Apostate's Opinion" by Pat
> Helland

Re: Patterns for writing enterprise applications on cassandra

Posted by Nate McCall <na...@datastax.com>.

I found the following paper (PDF) very helpful in shaping my thoughts
about what it means to build systems without transactions.

http://www.ics.uci.edu/~cs223/papers/cidr07p15.pdf

"Life Beyond Distributed Transactions: an Apostate's Opinion" by Pat Helland

On Wed, Feb 16, 2011 at 2:00 PM, tijoriwala.ritesh
<ti...@gmail.com> wrote:
>
> Thanks a lot Anthony. That does help me think on possible options...
>

Re: Patterns for writing enterprise applications on cassandra

Posted by "tijoriwala.ritesh" <ti...@gmail.com>.

Thanks a lot Anthony. That does help me think on possible options...

Re: Patterns for writing enterprise applications on cassandra

Posted by Anthony John <ch...@gmail.com>.

Ritesh,

The gist of Dave's contention is that Cassandra adds value in spite of the
lack of transactions. However, that need not mean that it cannot be used for
enterprise applications. Transaction semantics need to be re-imagined
within the capabilities of this new kind of database infrastructure, which
addresses some key challenges in scaling.

In some sense, RDBMSs have spoiled us into expecting ACID transactions in the
database. Remember, transaction processing is older than RDBMSs - it was
done on COBOL/mainframes, and the bulk of banking transactions - even today -
do not go through ACID-supporting database platforms. They still live on the
mainframe.

So it can be done and frameworks like CAGES are showing a way forward. At
the heart of it, there will need to be a Two-Phase commit type protocol
coordinator that sits in front of Cassandra. Of which - one can be sure -
there will be many implementations / best practices in the coming months.
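
As a rough illustration of what such a coordinator does, here is a toy
two-phase commit over in-memory participants. The names Participant and
two_phase_commit are hypothetical and this is not how Cages or Cassandra
work internally. It also ignores coordinator crashes and timeouts, which a
real protocol would have to handle.

```python
class Participant:
    """Hypothetical resource manager holding one account balance."""

    def __init__(self, balance):
        self.balance = balance
        self.pending = None  # change staged by prepare(), not yet visible

    def prepare(self, delta):
        # Phase 1: vote "no" if busy or if the change would overdraw the account.
        if self.pending is not None or self.balance + delta < 0:
            return False
        self.pending = delta
        return True

    def commit(self):
        # Phase 2: make the staged change visible.
        self.balance += self.pending
        self.pending = None

    def abort(self):
        # Phase 2 (failure path): drop the staged change.
        self.pending = None


def two_phase_commit(changes):
    """changes is a list of (participant, delta); returns True if committed."""
    prepared = []
    for participant, delta in changes:
        if participant.prepare(delta):
            prepared.append(participant)
        else:
            # Any "no" vote aborts everyone who already voted "yes".
            for p in prepared:
                p.abort()
            return False
    for participant, _ in changes:
        participant.commit()
    return True


a, b = Participant(100), Participant(50)
ok = two_phase_commit([(a, -5), (b, 5)])            # both vote yes
rejected = two_phase_commit([(a, -500), (b, 500)])  # a votes no, nothing changes
```

The coordinator is a single point that every transfer passes through, so the
sketch also shows where a bottleneck would sit in a real deployment.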

HTH,

-JA

On Wed, Feb 16, 2011 at 1:31 PM, Dave Revell <da...@meebo-inc.com> wrote:

> Ritesh,
>
> There don't seem to be any common best practices to do this. I think the
> reason is that by adding transaction semantics on top of Cassandra you're
> throwing away the most important properties of Cassandra. The effects of a
> transaction/locking layer:
>
> - A centralized performance bottleneck that won't scale linearly
> - Complex failure detection and recovery
> - Reduced availability/partition tolerance (CAP: C prevents simultaneous A
> and P)
> - High latency for geographically remote clients
> - Lower throughput due to enforced serial ordering of transactions
>
> There are probably other reasons that didn't occur to me. Cassandra's great
> at doing what it does, but it's certainly not a general purpose
> transactional database for all use cases.
>
> -Dave

Re: Patterns for writing enterprise applications on cassandra

Posted by Dave Revell <da...@meebo-inc.com>.

Ritesh,

There don't seem to be any common best practices to do this. I think the
reason is that by adding transaction semantics on top of Cassandra you're
throwing away the most important properties of Cassandra. The effects of a
transaction/locking layer:

- A centralized performance bottleneck that won't scale linearly
- Complex failure detection and recovery
- Reduced availability/partition tolerance (CAP: C prevents simultaneous A
and P)
- High latency for geographically remote clients
- Lower throughput due to enforced serial ordering of transactions

There are probably other reasons that didn't occur to me. Cassandra's great
at doing what it does, but it's certainly not a general purpose
transactional database for all use cases.

-Dave

On Wed, Feb 16, 2011 at 11:19 AM, tijoriwala.ritesh <
tijoriwala.ritesh@gmail.com> wrote:

>
> Hi Gaurav,
> Thanks for the reply... I did look at the Cages framework, and I see that it
> provides some functionality for locking and atomic writes for multiple keys.
> My question was whether people rely on these kinds of frameworks. If so, is
> Cages the only one, or are there others as well? And if not, what do they do
> to solve these kinds of problems?
>
> Thanks,
> Ritesh
>

Re: Patterns for writing enterprise applications on cassandra

Posted by "tijoriwala.ritesh" <ti...@gmail.com>.

Hi Gaurav,
Thanks for the reply... I did look at the Cages framework, and I see that it
provides some functionality for locking and atomic writes for multiple keys.
My question was whether people rely on these kinds of frameworks. If so, is
Cages the only one, or are there others as well? And if not, what do they do
to solve these kinds of problems?

Thanks,
Ritesh

Re: Patterns for writing enterprise applications on cassandra

Posted by buddhasystem <po...@bnl.gov>.

FWIW,

we'll keep RDBMS for transactional data, and Cassandra will be used for
referential data (browsing history and data mining). Horses for courses.


Re: Patterns for writing enterprise applications on cassandra

Posted by Zhongwei Sun <zh...@gmail.com>.

Is there any Python implementation for transactions?


2011/2/16 Gaurav Sharma <ga...@gmail.com>:
> Enterprise applications is a very broad topic. There's no one answer for every type.
>
> You specifically mention a transactional scenario. For that, I can recommend you look at Cages (http://code.google.com/p/cages) if you haven't already.

Re: Patterns for writing enterprise applications on cassandra

Posted by Gaurav Sharma <ga...@gmail.com>.

Enterprise applications is a very broad topic. There's no one answer for every type.

You specifically mention a transactional scenario. For that, I can recommend you look at Cages (http://code.google.com/p/cages) if you haven't already.
