Posted to user@cassandra.apache.org by Lucas Nodine <lu...@gmail.com> on 2010/09/26 20:04:06 UTC

Curious as to how Cassandra handles the following

I'm looking at a design where multiple clients will connect to Cassandra and
get/mutate resources, possibly concurrently.  After planning a bit, I ran
into the following scenarios, for which my research has not turned up a
sufficient answer.  I have seen others recommend ZooKeeper for such tasks,
but I want to determine whether there is a simpler solution before adding
another product to my design.

Assume the following for all of the situations below: there are multiple
clients, where a client is someone accessing Cassandra using Thrift, and
all reads and writes are performed at the QUORUM consistency level.

Situation 1:
Client A ("A") connects to Cassandra and requests a QUORUM consistency level
get of an entire row.  At the same time or very shortly thereafter (before
A's request completes), Client B ("B") connects to Cassandra and inserts (or
mutates) a column (or multiple columns) within the row.

Does A receive the new data saved by B or does A receive the data prior to
B's save?

Situation 2:
B connects and mutates multiple columns within a row.  A requests some data
therein while B is processing.

Result?

Situation 3:
B mutates multiple columns within multiple rows.  A requests some data
therein while B is processing.

Result?

Justification: At certain points I want to essentially lock a resource (row)
in Cassandra for exclusive write access (think of checking out a resource)
by setting a flag value in a column within that row.  I'm just considering
race conditions.


Thanks,

Lucas Nodine

Re: Curious as to how Cassandra handles the following

Posted by Benjamin Black <b...@b3k.us>.
On Sun, Sep 26, 2010 at 11:04 AM, Lucas Nodine <lu...@gmail.com> wrote:
> I'm looking at a design where multiple clients will connect to Cassandra and
> get/mutate resources, possibly concurrently.  After planning a bit, I ran
> into the following scenario for which I have not been able to research to
> find an answer sufficient for my needs.  I have found where others have
> recommended Zookeeper for such tasks, but I want to determine if there is a
> simple solution before including another product in my design.
>
> Make the following assumption for all following situations:
> Assuming multiple clients where a client is someone accessing Cassandra
> using thrift.  All reads and writes are performed using the QUORUM
> consistency level.
>
> Situation 1:
> Client A ("A") connects to Cassandra and requests a QUORUM consistency level
> get of an entire row.  At or very shortly thereafter (before A's request
> completes), Client B ("B") connects to Cassandra and inserts (or mutates) a
> column (or multiple columns) within the row.
>
> Does A receive the new data saved by B or does A receive the data prior to
> B's save?
>

Depends on the exact order of operations across several nodes.  Since
you can't know what that ordering will be (or what it was), you can't
predict whether you see the pre- or post-update version.
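
To make that concrete, here is a toy model only, not Cassandra code.  It
assumes three replicas, a reader that happens to contact replicas 0 and 1,
and newest-timestamp-wins resolution; the function names are made up for
illustration.  Which answer A gets depends purely on which replicas B's
write has already reached:

```python
REPLICAS = 3
QUORUM = REPLICAS // 2 + 1  # 2 of 3


def quorum_read(replicas, contacted):
    """Return the newest (timestamp, value) among the contacted replicas."""
    assert len(contacted) >= QUORUM
    return max(replicas[i] for i in contacted)


def simulate(applied_before_read):
    """Model A's read racing with B's write (hypothetical helper)."""
    # Every replica starts with the old value at timestamp 1.
    replicas = [(1, "old")] * REPLICAS
    # B's write (timestamp 2) has reached only these replicas so far.
    for i in applied_before_read:
        replicas[i] = (2, "new")
    # A's read contacts replicas 0 and 1, which is a quorum of 3.
    return quorum_read(replicas, contacted=(0, 1))[1]


print(simulate(applied_before_read=[]))   # old
print(simulate(applied_before_read=[2]))  # old
print(simulate(applied_before_read=[0]))  # new
```

In the second and third runs B's write is equally "in progress", yet A sees
different results.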

> Situation 2:
> B connects and mutates multiple columns within a row.  A requests some data
> therein while B is processing.
>
> Result?
>

Which call was used to make the changes?

> Situation 3:
> B mutates multiple columns within multiple rows.  A requests some data
> therein while B is processing.
>
> Result?
>

Undefined, as in situation 1.

> Justification: At certain points I want to essentially lock a resource (row)
> in cassandra for exclusive write access (think checkout a resource) by
> setting a flag value of a column within that row.  I'm just considering race
> conditions.
>

If you really can't fix your design to avoid locks, then you need a
system to permit locking.  That usually means Zookeeper.
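
A rough sketch of why the flag-column approach races, under the assumption
that the column simply keeps the write with the highest timestamp.  The Row
class and the timestamps are hypothetical stand-ins, not a Cassandra client:

```python
class Row:
    """One row whose 'locked_by' column keeps the highest-timestamp write."""

    def __init__(self):
        self.locked_by = (0, None)  # (timestamp, owner)

    def read(self):
        return self.locked_by[1]

    def write(self, timestamp, owner):
        if timestamp > self.locked_by[0]:
            self.locked_by = (timestamp, owner)


def try_checkout_interleaved(row):
    """Both clients check the flag before either one has set it."""
    a_sees = row.read()
    b_sees = row.read()
    believed_owners = []
    # Each client saw no owner, so each writes its own name as the flag.
    if a_sees is None:
        row.write(timestamp=10, owner="A")
        believed_owners.append("A")
    if b_sees is None:
        row.write(timestamp=11, owner="B")
        believed_owners.append("B")
    return believed_owners


row = Row()
print(try_checkout_interleaved(row))  # ['A', 'B']
print(row.read())                     # B
```

Both clients believe they hold the lock, while the row records only B.
There is no step at which the check and the set happen as one operation,
which is the piece an external lock service supplies.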


b

Re: Curious as to how Cassandra handles the following

Posted by Benjamin Black <b...@b3k.us>.
On Sun, Sep 26, 2010 at 4:01 PM, Lucas Nodine <lu...@gmail.com> wrote:
> Ok, so based on everyone's input it seems that I need to put some sort of
> server in front of Cassandra to handle locking and exclusive access.
>
> I am planning on building a system (DMS) that will store resources
> (document, images, media, etc) using Cassandra for data.  As my target user
> is going to be someone without any understanding of a 'diff' I have elected
> for locking instead of conflict resolution in versions.

Good thing, as versioned conflict resolution is not available in Cassandra.


b

Re: Curious as to how Cassandra handles the following

Posted by Lucas Nodine <lu...@gmail.com>.
Ok, so based on everyone's input it seems that I need to put some sort of
server in front of Cassandra to handle locking and exclusive access.

I am planning on building a system (DMS) that will store resources
(documents, images, media, etc.) using Cassandra for data.  As my target user
is going to be someone without any understanding of a 'diff', I have elected
for locking instead of conflict resolution between versions.  This means I
require a method of exclusive access.  Suggestions so far have been Cages and
ZooKeeper.  I'm going to be writing this in C# for .NET and Mono.  Any other
suggestions besides just building a simple solution myself?

I currently have an application sitting between the end client and Cassandra
to handle business logic.  By the sound of it, I should just implement a
method of communication between however many "servers" I will have (it will
support distributed deployment) to apply a lock to the resource system-wide.

I understand the technical aspects of how that should work; however, before
I begin, does anyone have any suggestions or tips?

Thanks,

- Lucas Nodine




On Sun, Sep 26, 2010 at 4:34 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> Atomic (all of it will complete, or none), but not isolated (readers
> can see parts of a write before they see the whole thing).



-- 
Lucas J. Nodine
Assistant Labette County Attorney
201 S. Central, Suite B
Parsons, KS 67357
(620) 421-6370

Re: Curious as to how Cassandra handles the following

Posted by Jonathan Ellis <jb...@gmail.com>.
Atomic (all of it will complete, or none), but not isolated (readers
can see parts of a write before they see the whole thing).
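
As an illustration of "not isolated", here is a toy model in which a
two-column write is applied one column at a time and a reader arrives in
between.  This is only a sketch of the visible effect, not how Cassandra
applies mutations internally; the column names are made up:

```python
def apply_write(row, update):
    """Apply one column at a time, pausing so a reader can run in between."""
    for column, value in update.items():
        row[column] = value
        yield


row = {"title": "v1", "body": "v1"}
write = apply_write(row, {"title": "v2", "body": "v2"})

next(write)            # the first column of the write has been applied
mid_read = dict(row)   # a reader arriving now sees a mix of old and new
for _ in write:        # the write runs to completion
    pass
final_read = dict(row)

print(mid_read)    # {'title': 'v2', 'body': 'v1'}
print(final_read)  # {'title': 'v2', 'body': 'v2'}
```

The write does finish as a whole, but the reader in the middle still saw
half of it.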

On Sun, Sep 26, 2010 at 1:19 PM, Aaron Morton <aa...@thelastpickle.com> wrote:
> Mutations against a single key on a single machine are
> atomic http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic
>
> Aaron



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: Curious as to how Cassandra handles the following

Posted by Aaron Morton <aa...@thelastpickle.com>.
Mutations against a single key on a single machine are atomic:
http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic


Aaron

On 27 Sep, 2010,at 07:48 AM, Norman Maurer <no...@apache.org> wrote:

To be clearer (maybe I was not before): BatchMutate is not atomic.
It only batches up mutations to reduce overhead, so you may read
data written by it even if the whole operation has not completed
or never will.

bye,
Norman



Re: Curious as to how Cassandra handles the following

Posted by Norman Maurer <no...@apache.org>.
To be clearer (maybe I was not before): BatchMutate is not atomic.
It only batches up mutations to reduce overhead, so you may read
data written by it even if the whole operation has not completed
or never will.

bye,
Norman



Re: Curious as to how Cassandra handles the following

Posted by Norman Maurer <no...@apache.org>.
Comments inline.

2010/9/26 Lucas Nodine <lu...@gmail.com>:
> I'm looking at a design where multiple clients will connect to Cassandra and
> get/mutate resources, possibly concurrently.  After planning a bit, I ran
> into the following scenario for which I have not been able to research to
> find an answer sufficient for my needs.  I have found where others have
> recommended Zookeeper for such tasks, but I want to determine if there is a
> simple solution before including another product in my design.
>
> Make the following assumption for all following situations:
> Assuming multiple clients where a client is someone accessing Cassandra
> using thrift.  All reads and writes are performed using the QUORUM
> consistency level.
>
> Situation 1:
> Client A ("A") connects to Cassandra and requests a QUORUM consistency level
> get of an entire row.  At or very shortly thereafter (before A's request
> completes), Client B ("B") connects to Cassandra and inserts (or mutates) a
> column (or multiple columns) within the row.
>
> Does A receive the new data saved by B or does A receive the data prior to
> B's save?

Should receive A stuff.
>
> Situation 2:
> B connects and mutates multiple columns within a row.  A requests some data
> therein while B is processing.
>
> Result?

Depends. Is it done in BatchMutate or not?

>
> Situation 3:
> B mutates multiple columns within multiple rows.  A requests some data
> therein while B is processing.
>
> Result?

See above.

>
> Justification: At certain points I want to essentially lock a resource (row)
> in cassandra for exclusive write access (think checkout a resource) by
> setting a flag value of a column within that row.  I'm just considering race
> conditions.
>
You will need to use Cages or something like that.


> Thanks,
>
> Lucas Nodine

Bye,
Norman