Posted to user@cassandra.apache.org by Alex Araujo <ca...@alex.otherinbox.com> on 2011/04/08 22:26:38 UTC

Atomicity Strategies

Hi, I was wondering if there are any patterns/best practices for 
creating atomic units of work when dealing with several column families 
and their inverted indices.

For example, if I have Users and Groups column families and did 
something like:

Users.insert( user_id, columns )
UserGroupTimeline.insert( group_id, { timeuuid() : user_id } )
UserGroupStatus.insert( group_id + ":" + user_id, { "Active" : "True" } )
UserEvents.insert( timeuuid(), { "user_id" : user_id, "group_id" : group_id, "event_type" : "join" } )

Would I want the client to retry all subsequent operations that failed 
against other nodes after n succeeded,  maintain an "undo" queue of 
operations to run, batch the mutations and choose a strong consistency 
level, some combination of these/others, etc?

Thanks,
Alex

Re: Atomicity Strategies

Posted by AJ <aj...@dude.podzone.net>.

Thanks Aaron!

On 6/22/2011 5:25 PM, aaron morton wrote:
> Atomic on a single machine yes.
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 23 Jun 2011, at 09:42, AJ wrote:
>
>> On 4/9/2011 7:52 PM, aaron morton wrote:
>>> My understanding of what they did with locking (based on the examples) was to achieve a level of transaction isolation http://en.wikipedia.org/wiki/Isolation_(database_systems)
>>>
>>> I think the issue here is more about atomicity http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic
>>>
>>> We cannot guarantee that all or none of the mutations in your batch are completed. There is some work in this area though https://issues.apache.org/jira/browse/CASSANDRA-1684
>>>
>> Just to be clear, you are speaking in the general sense, right?  The batch_mutate link you provided says that when ALL the mutations in the batch are for the SAME key (row), the whole batch is atomic:
>>
>>     "As a special case, mutations against a single key are atomic but not isolated."
>>
>> So, is it true that if I update multiple columns for one key using a batch update, it will be an all-or-nothing update for the whole batch?  And if the batch mutate contains mutations for more than one key, then all the updates for one key will be atomic, all the updates for the next key will be atomic, and so on.  Correct?
>>
>> Thanks!
>>
>

Re: Atomicity Strategies

Posted by aaron morton <aa...@thelastpickle.com>.

Atomic on a single machine yes. 

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 23 Jun 2011, at 09:42, AJ wrote:

> On 4/9/2011 7:52 PM, aaron morton wrote:
>> My understanding of what they did with locking (based on the examples) was to achieve a level of transaction isolation http://en.wikipedia.org/wiki/Isolation_(database_systems)
>> 
>> I think the issue here is more about atomicity http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic
>> 
>> We cannot guarantee that all or none of the mutations in your batch are completed. There is some work in this area though https://issues.apache.org/jira/browse/CASSANDRA-1684
>> 
> 
> Just to be clear, you are speaking in the general sense, right?  The batch_mutate link you provided says that when ALL the mutations in the batch are for the SAME key (row), the whole batch is atomic:
> 
>    "As a special case, mutations against a single key are atomic but not isolated."
> 
> So, is it true that if I update multiple columns for one key using a batch update, it will be an all-or-nothing update for the whole batch?  And if the batch mutate contains mutations for more than one key, then all the updates for one key will be atomic, all the updates for the next key will be atomic, and so on.  Correct?
> 
> Thanks!
>

Re: Atomicity Strategies

Posted by AJ <aj...@dude.podzone.net>.

On 4/9/2011 7:52 PM, aaron morton wrote:
> My understanding of what they did with locking (based on the examples) 
> was to achieve a level of transaction isolation 
> http://en.wikipedia.org/wiki/Isolation_(database_systems)
>
> I think the issue here is more about atomicity 
> http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic
>
> We cannot guarantee that all or none of the mutations in your batch 
> are completed. There is some work in this area though 
> https://issues.apache.org/jira/browse/CASSANDRA-1684
>

Just to be clear, you are speaking in the general sense, right?  The
batch_mutate link you provided says that when ALL the mutations in the
batch are for the SAME key (row), the whole batch is atomic:

     "As a special case, mutations against a single key are atomic but
not isolated."

So, is it true that if I update multiple columns for one key using a
batch update, it will be an all-or-nothing update for the whole batch?
And if the batch mutate contains mutations for more than one key, then
all the updates for one key will be atomic, all the updates for the
next key will be atomic, and so on.  Correct?
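
To make the question concrete, here is a small Python sketch that groups a batch's mutations by row key. Each group is the unit the FAQ's special case would apply to; nothing ties separate groups together. The `(key, columns)` tuple layout and the `group_by_key` helper are hypothetical and only illustrate the idea; this is not a Cassandra client API.

```python
from collections import defaultdict


def group_by_key(mutations):
    """Group (key, columns) mutations for one column family by row key.

    Hypothetical helper for illustration; it does not talk to Cassandra.
    """
    groups = defaultdict(list)
    for key, columns in mutations:
        groups[key].append(columns)
    return dict(groups)


batch = [
    ("user-1", {"name": "alice"}),
    ("user-1", {"email": "alice@example.com"}),
    ("user-2", {"name": "bob"}),
]

units = group_by_key(batch)
# Two row keys, so two independent units: a failure could leave the
# mutations for one key applied and those for the other key not applied.
print(sorted(units))
```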

Thanks!

Re: Atomicity Strategies

Posted by Roland Gude <ro...@yoochoose.com>.

A strategy that should cover at least some use cases is roughly like this:

Given that CFs A and B should be kept in sync:
When writing 'a' to CF A, add another column 'Synchronisation_token' and write a TimeUUID 'T' (or a timestamp, or some other value that allows time-based ordering) as its value.
On the related write to CF B, write the same token as well.
When reading, check on the client side whether the tokens match, and re-read the data with the lower token until they do.
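
The read side of the steps above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the tokens are plain integers rather than TimeUUIDs, `read_in_sync` is a hypothetical helper, and the two "column families" are simulated in memory instead of being read from Cassandra.

```python
TOKEN = "Synchronisation_token"


def read_in_sync(read_a, read_b, max_rereads=5):
    """Read one row from each CF and re-read the staler side until tokens match.

    read_a and read_b are caller-supplied callables (hypothetical) that each
    return a dict of columns including TOKEN.
    """
    row_a, row_b = read_a(), read_b()
    for _ in range(max_rereads):
        if row_a[TOKEN] == row_b[TOKEN]:
            break
        # Only the row carrying the lower (older) token needs re-reading.
        if row_a[TOKEN] < row_b[TOKEN]:
            row_a = read_a()
        else:
            row_b = read_b()
    if row_a[TOKEN] != row_b[TOKEN]:
        raise RuntimeError("rows still out of sync after re-reading")
    return row_a, row_b


# Simulated state: the write to A (token 2) has landed, but the first read
# of B still returns the previous write (token 1).
cf_a = {"k": {"value": "a2", TOKEN: 2}}
b_reads = iter([
    {"value": "b1", TOKEN: 1},
    {"value": "b2", TOKEN: 2},
])

row_a, row_b = read_in_sync(lambda: cf_a["k"], lambda: next(b_reads))
print(row_a["value"], row_b["value"])
```

The bound on re-reads is a design choice of this sketch: if the second write never arrives, the reader gives up instead of looping forever.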

Roland

On 10.04.2011 at 03:53, "aaron morton" <aa...@thelastpickle.com> wrote:

My understanding of what they did with locking (based on the examples) was to achieve a level of transaction isolation http://en.wikipedia.org/wiki/Isolation_(database_systems)

I think the issue here is more about atomicity http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic

We cannot guarantee that all or none of the mutations in your batch are completed. There is some work in this area though https://issues.apache.org/jira/browse/CASSANDRA-1684

AFAIK the best approach now is to work at QUORUM and write your code to handle missing relations. Also, Cassandra does a lot of work up front, before the write starts, to ensure it will succeed; failures during a write will probably be due to a SW/HW failure or overload on a node that gossip has not picked up.

Retrying is the recommended approach when a request fails.

Hope that helps.
Aaron

On 9 Apr 2011, at 15:58, Dan Washusen wrote:

Here's a good writeup on how fightmymonster.com does it...

http://ria101.wordpress.com/category/nosql-databases/locking/

--
Dan Washusen
Make big files fly
visit digitalpigeon.com

On Saturday, 9 April 2011 at 11:53 AM, Alex Araujo wrote:

On 4/8/11 5:46 PM, Drew Kutcharian wrote:
I'm interested in this too, but I don't think this can be done with Cassandra alone. Cassandra doesn't support transactions. I think hector can retry operations, but I'm not sure about the atomicity of the whole thing.

On Apr 8, 2011, at 1:26 PM, Alex Araujo wrote:

Hi, I was wondering if there are any patterns/best practices for creating atomic units of work when dealing with several column families and their inverted indices.

For example, if I have Users and Groups column families and did something like:

Users.insert( user_id, columns )
UserGroupTimeline.insert( group_id, { timeuuid() : user_id } )
UserGroupStatus.insert( group_id + ":" + user_id, { "Active" : "True" } )
UserEvents.insert( timeuuid(), { "user_id" : user_id, "group_id" : group_id, "event_type" : "join" } )

Would I want the client to retry all subsequent operations that failed against other nodes after n succeeded, maintain an "undo" queue of operations to run, batch the mutations and choose a strong consistency level, some combination of these/others, etc?

Thanks,
Alex
Thanks Drew. I'm familiar with lack of transactions and have read about
people using ZK (possibly Cages as well?) to accomplish this, but since
it seems that inverted indices are commonplace, I'm interested in how
anyone is mitigating lack of atomicity to any extent without the use of
such tools. It appears that Hector and Pelops have retrying built in to
their APIs and I'm fairly confident that proper use of those
capabilities may help. Just trying to cover all bases. Hopefully
someone can share their approaches and/or experiences. Cheers, Alex.

Re: Atomicity Strategies

Posted by aaron morton <aa...@thelastpickle.com>.

My understanding of what they did with locking (based on the examples) was to achieve a level of transaction isolation http://en.wikipedia.org/wiki/Isolation_(database_systems)  

I think the issue here is more about atomicity http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic

We cannot guarantee that all or none of the mutations in your batch are completed. There is some work in this area though https://issues.apache.org/jira/browse/CASSANDRA-1684

AFAIK the best approach now is to work at QUORUM and write your code to handle missing relations. Also, Cassandra does a lot of work up front, before the write starts, to ensure it will succeed; failures during a write will probably be due to a SW/HW failure or overload on a node that gossip has not picked up.
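
As a rough illustration of "handle missing relations", a reader could tolerate index entries whose target row was never written. The Python sketch below uses in-memory dicts as stand-ins for the Users and UserGroupTimeline column families; `resolve_members` is a hypothetical helper, not part of any client library.

```python
def resolve_members(timeline, users):
    """Split timeline entries into those with a user row and those without.

    Entries pointing at a missing row are collected rather than treated as
    errors, so a caller could repair or retry them later.
    """
    found, dangling = [], []
    for user_id in timeline.values():
        row = users.get(user_id)
        if row is None:
            dangling.append(user_id)
        else:
            found.append((user_id, row))
    return found, dangling


users = {"u1": {"name": "alice"}}
# "u2" was added to the group timeline, but its Users write never landed.
group_timeline = {"t1": "u1", "t2": "u2"}

found, dangling = resolve_members(group_timeline, users)
print(found, dangling)
```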

Retrying is the recommended approach when a request fails. 
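
A minimal Python sketch of client-side retrying, assuming the write is idempotent (for instance, re-sending the same column names and values, with any timeuuid() generated once before the first attempt rather than on each retry). `WriteFailed` and `with_retries` are hypothetical names; clients such as Hector and Pelops have their own retry mechanisms, which this does not reproduce.

```python
import time


class WriteFailed(Exception):
    """Hypothetical stand-in for a client's timeout or unavailable error."""


def with_retries(operation, attempts=3, base_delay=0.0):
    """Run operation(), retrying on WriteFailed with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except WriteFailed:
            if attempt == attempts - 1:
                raise  # out of attempts; let the caller decide what to do
            time.sleep(base_delay * (2 ** attempt))


# Simulated write that fails twice before succeeding.
calls = {"n": 0}
store = {}


def flaky_insert():
    calls["n"] += 1
    if calls["n"] < 3:
        raise WriteFailed("simulated node overload")
    store["group-9:user-1"] = {"Active": "True"}
    return True


with_retries(flaky_insert)
print(calls["n"], store)
```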

Hope that helps. 
Aaron

On 9 Apr 2011, at 15:58, Dan Washusen wrote:

> Here's a good writeup on how fightmymonster.com does it...
> 
> http://ria101.wordpress.com/category/nosql-databases/locking/
> 
> -- 
> Dan Washusen
> Make big files fly
> visit digitalpigeon.com
> On Saturday, 9 April 2011 at 11:53 AM, Alex Araujo wrote:
> 
>> On 4/8/11 5:46 PM, Drew Kutcharian wrote:
>>> I'm interested in this too, but I don't think this can be done with Cassandra alone. Cassandra doesn't support transactions. I think hector can retry operations, but I'm not sure about the atomicity of the whole thing.
>>> 
>>> 
>>> 
>>> On Apr 8, 2011, at 1:26 PM, Alex Araujo wrote:
>>> 
>>>> Hi, I was wondering if there are any patterns/best practices for creating atomic units of work when dealing with several column families and their inverted indices.
>>>> 
>>>> For example, if I have Users and Groups column families and did something like:
>>>> 
>>>> Users.insert( user_id, columns )
>>>> UserGroupTimeline.insert( group_id, { timeuuid() : user_id } )
>>>> UserGroupStatus.insert( group_id + ":" + user_id, { "Active" : "True" } )
>>>> UserEvents.insert( timeuuid(), { "user_id" : user_id, "group_id" : group_id, "event_type" : "join" } )
>>>> 
>>>> Would I want the client to retry all subsequent operations that failed against other nodes after n succeeded, maintain an "undo" queue of operations to run, batch the mutations and choose a strong consistency level, some combination of these/others, etc?
>>>> 
>>>> Thanks,
>>>> Alex
>> Thanks Drew. I'm familiar with lack of transactions and have read about 
>> people using ZK (possibly Cages as well?) to accomplish this, but since 
>> it seems that inverted indices are commonplace, I'm interested in how 
>> anyone is mitigating lack of atomicity to any extent without the use of 
>> such tools. It appears that Hector and Pelops have retrying built in to 
>> their APIs and I'm fairly confident that proper use of those 
>> capabilities may help. Just trying to cover all bases. Hopefully 
>> someone can share their approaches and/or experiences. Cheers, Alex.
>

Re: Atomicity Strategies

Posted by Dan Washusen <da...@reactive.org>.

Here's a good writeup on how fightmymonster.com does it...

http://ria101.wordpress.com/category/nosql-databases/locking/

-- 
Dan Washusen
Make big files fly
visit digitalpigeon.com

On Saturday, 9 April 2011 at 11:53 AM, Alex Araujo wrote:
On 4/8/11 5:46 PM, Drew Kutcharian wrote:
> > I'm interested in this too, but I don't think this can be done with Cassandra alone. Cassandra doesn't support transactions. I think hector can retry operations, but I'm not sure about the atomicity of the whole thing.
> > 
> > 
> > 
> > On Apr 8, 2011, at 1:26 PM, Alex Araujo wrote:
> > 
> > > Hi, I was wondering if there are any patterns/best practices for creating atomic units of work when dealing with several column families and their inverted indices.
> > > 
> > > For example, if I have Users and Groups column families and did something like:
> > > 
> > > Users.insert( user_id, columns )
> > > UserGroupTimeline.insert( group_id, { timeuuid() : user_id } )
> > > UserGroupStatus.insert( group_id + ":" + user_id, { "Active" : "True" } )
> > > UserEvents.insert( timeuuid(), { "user_id" : user_id, "group_id" : group_id, "event_type" : "join" } )
> > > 
> > > Would I want the client to retry all subsequent operations that failed against other nodes after n succeeded, maintain an "undo" queue of operations to run, batch the mutations and choose a strong consistency level, some combination of these/others, etc?
> > > 
> > > Thanks,
> > > Alex
> Thanks Drew. I'm familiar with lack of transactions and have read about 
> people using ZK (possibly Cages as well?) to accomplish this, but since 
> it seems that inverted indices are commonplace, I'm interested in how 
> anyone is mitigating lack of atomicity to any extent without the use of 
> such tools. It appears that Hector and Pelops have retrying built in to 
> their APIs and I'm fairly confident that proper use of those 
> capabilities may help. Just trying to cover all bases. Hopefully 
> someone can share their approaches and/or experiences. Cheers, Alex.
>

Re: Atomicity Strategies

Posted by Alex Araujo <ca...@alex.otherinbox.com>.

On 4/8/11 5:46 PM, Drew Kutcharian wrote:
> I'm interested in this too, but I don't think this can be done with Cassandra alone. Cassandra doesn't support transactions. I think hector can retry operations, but I'm not sure about the atomicity of the whole thing.
>
>
>
> On Apr 8, 2011, at 1:26 PM, Alex Araujo wrote:
>
>> Hi, I was wondering if there are any patterns/best practices for creating atomic units of work when dealing with several column families and their inverted indices.
>>
>> For example, if I have Users and Groups column families and did something like:
>>
>> Users.insert( user_id, columns )
>> UserGroupTimeline.insert( group_id, { timeuuid() : user_id } )
>> UserGroupStatus.insert( group_id + ":" + user_id, { "Active" : "True" } )
>> UserEvents.insert( timeuuid(), { "user_id" : user_id, "group_id" : group_id, "event_type" : "join" } )
>>
>> Would I want the client to retry all subsequent operations that failed against other nodes after n succeeded,  maintain an "undo" queue of operations to run, batch the mutations and choose a strong consistency level, some combination of these/others, etc?
>>
>> Thanks,
>> Alex
>>
>>
Thanks Drew.  I'm familiar with lack of transactions and have read about 
people using ZK (possibly Cages as well?) to accomplish this, but since 
it seems that inverted indices are commonplace, I'm interested in how 
anyone is mitigating lack of atomicity to any extent without the use of 
such tools.  It appears that Hector and Pelops have retrying built in to 
their APIs and I'm fairly confident that proper use of those 
capabilities may help.  Just trying to cover all bases.  Hopefully 
someone can share their approaches and/or experiences.  Cheers, Alex.

Re: Atomicity Strategies

Posted by Drew Kutcharian <dr...@venarc.com>.

I'm interested in this too, but I don't think this can be done with Cassandra alone. Cassandra doesn't support transactions. I think hector can retry operations, but I'm not sure about the atomicity of the whole thing.



On Apr 8, 2011, at 1:26 PM, Alex Araujo wrote:

> Hi, I was wondering if there are any patterns/best practices for creating atomic units of work when dealing with several column families and their inverted indices.
> 
> For example, if I have Users and Groups column families and did something like:
> 
> Users.insert( user_id, columns )
> UserGroupTimeline.insert( group_id, { timeuuid() : user_id } )
> UserGroupStatus.insert( group_id + ":" + user_id, { "Active" : "True" } )
> UserEvents.insert( timeuuid(), { "user_id" : user_id, "group_id" : group_id, "event_type" : "join" } )
> 
> Would I want the client to retry all subsequent operations that failed against other nodes after n succeeded,  maintain an "undo" queue of operations to run, batch the mutations and choose a strong consistency level, some combination of these/others, etc?
> 
> Thanks,
> Alex
> 
>