Posted to user@cassandra.apache.org by Даниел Симеонов <ds...@gmail.com> on 2010/04/21 09:46:03 UTC

questions about consistency

Hello,
   I am pretty new to Cassandra and I have some questions; they may seem
trivial, but I am still new to the subject. First is about the lack of a
compareAndSet() operation: as I understand it, it is not currently
supported in Cassandra. Do you know of use cases that really require such
operations, and how do these use cases currently work around this?
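
To make the first question concrete: a compareAndSet stores a new value only if the current value still equals the one the client read earlier. The sketch below uses a hypothetical in-memory Register class (not a Cassandra API) to contrast a blind read-modify-write, where two concurrent increments collapse into one, with a compareAndSet retry loop that keeps both. It is a single-process illustration under that assumption, not a distributed implementation.

```python
import threading


class Register:
    """Hypothetical single-node register, used only to illustrate
    compareAndSet semantics; this is not a Cassandra API."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._value

    def set(self, value):
        # Blind write: whatever was stored before is simply overwritten.
        with self._lock:
            self._value = value

    def compare_and_set(self, expected, new):
        # Atomically store `new` only if the value still equals `expected`.
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True


def blind_increment_race(reg):
    # Two clients read the same value before either of them writes back.
    a = reg.get()
    b = reg.get()
    reg.set(a + 1)
    reg.set(b + 1)  # overwrites the first increment: a lost update


def cas_increment(reg):
    # Retry until no other client changed the value between read and write.
    while True:
        current = reg.get()
        if reg.compare_and_set(current, current + 1):
            return


def cas_increment_race(reg):
    # Same interleaving as above, but every write is a compareAndSet.
    a = reg.get()
    b = reg.get()
    first_ok = reg.compare_and_set(a, a + 1)   # succeeds
    second_ok = reg.compare_and_set(b, b + 1)  # fails: the value moved on
    if not second_ok:
        cas_increment(reg)  # re-read and retry instead of losing the update
    return first_ok, second_ok
```

Starting from 5, blind_increment_race leaves 6 (one increment is lost), while cas_increment_race leaves 7 because the failed compareAndSet forces a re-read.
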
Second topic I'd like to discuss a little more is read repair. As I
understand it, it is done using the timestamps supplied by the client
application servers. Since computer clocks diverge (keeping them in sync
requires synchronization algorithms running regularly), there should be a
time window during which the order of the client requests written to the
database is not guaranteed. Do you have real-world experience with this? Is
this similar to causal consistency
(http://en.wikipedia.org/wiki/Causal_consistency)? What happens if two
application servers try to update the same data and supply one and the same
timestamp (rare, but it could happen)? And if they update several columns
in a batch operation this way, is there a chance that the column values
could be intermixed between the two update requests?
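
In a simplified model of what is described here, each column carries a client-supplied timestamp and replicas keep the column with the highest one (last write wins). The sketch below models that rule in plain Python; the Column type, the merge_rows helper and the value-based tie-break on equal timestamps are assumptions for illustration, not Cassandra's code. It shows both effects from the question: clock skew letting an earlier write win, and two same-timestamp batches ending up intermixed column by column.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Column:
    value: str
    timestamp: int  # supplied by the client, e.g. microseconds since the epoch


def reconcile(a: Column, b: Column) -> Column:
    """Last-write-wins: the column with the higher client timestamp survives.

    The tie-break for equal timestamps (larger value wins) is an assumption
    made so the result is deterministic; it is not taken from Cassandra.
    """
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    return a if a.value >= b.value else b


def merge_rows(row_a: Dict[str, Column], row_b: Dict[str, Column]) -> Dict[str, Column]:
    # Every column is reconciled on its own; a batch has no row-level winner.
    merged = dict(row_a)
    for name, col in row_b.items():
        merged[name] = reconcile(merged[name], col) if name in merged else col
    return merged


# Clock skew: server A's clock runs 50 units fast, so its earlier write
# carries a larger timestamp and beats server B's later write.
skewed = merge_rows({"v": Column("from A", 1050)}, {"v": Column("from B", 1020)})

# Identical timestamps on two batches: each column is decided separately,
# so the result mixes values from both writers.
batch_a = {"x": Column("a", 100), "y": Column("z", 100)}
batch_b = {"x": Column("b", 100), "y": Column("c", 100)}
mixed = merge_rows(batch_a, batch_b)
```

Here skewed["v"] keeps "from A" even though B wrote later, and mixed takes x from batch B but y from batch A. Because the tie-break is deterministic, merging in either order gives the same result, which keeps replicas convergent but does not keep a batch together.
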
I have one last question about the consistency level ALL: do you know of
real use cases where it is required (instead of QUORUM), and why (for both
reads and writes)?
Thank you very much for helping me better understand Cassandra!
Best regards, Daniel.

Re: questions about consistency

Posted by Masood Mortazavi <ma...@gmail.com>.
Hi Daniel,

For a general theoretical understanding, try reading some of the papers on
eventual consistency by Werner Vogels.

Reading the Dynamo paper (SOSP '07) would also help with some of the
theoretical foundations and academic references.

To go even further, try reading Replication Techniques in Distributed
Systems by Abdelsalam Helal, Abdelsalam Heddaya and Bharat Bhargava
(http://www.amazon.com/Replication-Techniques-Distributed-Advances-Database/dp/0792398009/ref=sr_1_12?ie=UTF8&s=books&qid=1271891223&sr=8-12)

Regards,
m.


2010/4/21 Даниел Симеонов <ds...@gmail.com>

> Hi Paul,
>    about your last answer, I still need some clarification: as I
> understand it, if QUORUM is used, then reads don't get old values either?
> Or am I wrong?
> Thank you very much!
> Best regards, Daniel.
>
> 2010/4/21 Paul Prescod <pr...@gmail.com>
>
>> I'm not an expert, so take what I say with a grain of salt.
>>
>> 2010/4/21 Даниел Симеонов <ds...@gmail.com>:
>> > Hello,
>> >    I am pretty new to Cassandra and I have some questions; they may
>> > seem trivial, but I am still new to the subject. First is about the
>> > lack of a compareAndSet() operation: as I understand it, it is not
>> > currently supported in Cassandra. Do you know of use cases that really
>> > require such operations, and how do these use cases currently work
>> > around this?
>>
>> I think your question is paradoxical. If the use case really requires
>> the operation then there is no workaround by definition. The existence
>> of the workaround implies that the use case really did not require the
>> operation.
>>
>> Anyhow, vector clocks are probably relevant to this question and your next
>> one.
>>
>> > Second topic I'd like to discuss a little more is read repair. As I
>> > understand it, it is done using the timestamps supplied by the client
>> > application servers. Since computer clocks diverge (keeping them in
>> > sync requires synchronization algorithms running regularly), there
>> > should be a time window during which the order of the client requests
>> > written to the database is not guaranteed. Do you have real-world
>> > experience with this? Is this similar to causal consistency
>> > (http://en.wikipedia.org/wiki/Causal_consistency)? What happens if two
>> > application servers try to update the same data and supply one and the
>> > same timestamp (rare, but it could happen)? And if they update several
>> > columns in a batch operation this way, is there a chance that the
>> > column values could be intermixed between the two update requests?
>>
>> All of this is changing with vector clocks in Cassandra 0.7.
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-580
>>
>> > I have one last question about the consistency level ALL: do you know
>> > of real use cases where it is required (instead of QUORUM), and why
>> > (for both reads and writes)?
>>
>> It would be required when your business rules do not allow any client
>> to read the old value. For example, if it would be illegal to provide
>> an obsolete stock value.
>>
>> > Thank you very much for helping me better understand Cassandra!
>> > Best regards, Daniel.
>> >
>>
>
>

Re: questions about consistency

Posted by Даниел Симеонов <ds...@gmail.com>.
Hi Paul,
   about your last answer, I still need some clarification: as I
understand it, if QUORUM is used, then reads don't get old values either?
Or am I wrong?
Thank you very much!
Best regards, Daniel.

Re: questions about consistency

Posted by Paul Prescod <pa...@prescod.net>.
2010/4/22 Даниел Симеонов <ds...@gmail.com>:
> Hi Paul,
>     Thank you for your answer. About the first question, I wondered whether
> it is possible to work around this issue by relaxing some consistency. As
> I understand you, it should be possible to implement this compareAndSet
> operation once vector clocks are present, and then the client is going to
> reconcile the data.

I believe that the proposed implementation of vector clocks in
Cassandra allows the servers to do the reconciliation through
"plugins". So you'd write a "compareAndSet" "plugin", or Cassandra might
ship with one out of the box (there are several obvious ones that
should probably be right in the box).
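
A minimal sketch of what such server-side reconciliation plugins could look like, assuming the server hands each plugin the list of concurrent sibling values it was unable to order. The registry, the decorator and the names "max" and "union" are hypothetical illustrations, not part of CASSANDRA-580 or any released Cassandra version.

```python
from typing import Callable, Dict, List, Set

# Hypothetical plugin registry: reconciler name -> function that merges the
# concurrent sibling values the server could not order. Not a Cassandra API.
Reconciler = Callable[[List], object]
RECONCILERS: Dict[str, Reconciler] = {}


def reconciler(name: str):
    # Decorator that registers a merge function under `name`.
    def register(fn: Reconciler) -> Reconciler:
        RECONCILERS[name] = fn
        return fn
    return register


@reconciler("max")
def keep_max(siblings: List[int]) -> int:
    # Keep the largest value, e.g. a high-water mark.
    return max(siblings)


@reconciler("union")
def set_union(siblings: List[Set[str]]) -> Set[str]:
    # Keep every element any writer added, as in Dynamo's shopping cart.
    merged: Set[str] = set()
    for sibling in siblings:
        merged |= sibling
    return merged


def resolve(name: str, siblings: List) -> object:
    # With a single version there is nothing to reconcile.
    if len(siblings) == 1:
        return siblings[0]
    return RECONCILERS[name](siblings)
```

For example, resolve("union", [{"milk"}, {"eggs", "milk"}]) returns both items. The design choice is that the merge rule lives next to the data rather than in every client, which is the idea behind shipping a few obvious reconcilers "in the box".
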

> Regarding the second question, I understand that without vector clocks
> and client reconciliation, this causality problem currently exists in
> Cassandra.

In general, Cassandra 0.6 has little protection against overlapping
and conflicting writes.

> About the third question, isn't it the same if writes and reads both use
> QUORUM?

I think that you can use ConsistencyLevel.ALL on write and
ConsistencyLevel.ONE on read to optimize for read speed, and the opposite
to optimize for write speed.
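
The ALL/ONE suggestion and the QUORUM question rest on the same overlap rule described in the Dynamo paper: with N replicas, a read that waits for R of them will include at least one replica holding a write acknowledged by W of them whenever R + W > N. The sketch below checks that rule for ONE, QUORUM (N/2 + 1, rounded down) and ALL. It is a simplified model that ignores node failures, hinted handoff and clock issues, and the function names are hypothetical.

```python
def replicas_required(level: str, n: int) -> int:
    """Replicas that must respond for a consistency level, given n replicas."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return n // 2 + 1
    if level == "ALL":
        return n
    raise ValueError(f"unknown consistency level: {level}")


def read_sees_latest_write(write_level: str, read_level: str, n: int) -> bool:
    # Every read set intersects every write set exactly when R + W > N.
    w = replicas_required(write_level, n)
    r = replicas_required(read_level, n)
    return r + w > n
```

With N = 3, ALL/ONE, ONE/ALL and QUORUM/QUORUM all satisfy the rule, while ONE/ONE and QUORUM/ONE do not. So, under this model, QUORUM writes plus QUORUM reads also avoid stale reads; the choice between them and ALL/ONE is about which side pays the latency.
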

> What about implementing counters? Currently it seems they cannot be
> implemented in Cassandra; will vector clocks help here? Do you have
> experience with counters in Cassandra?

Counters are the "classic" example of why you need vector clocks.

The description for CASSANDRA-580 is "Allow a ColumnFamily to be
versioned via vector clocks, instead of long timestamps. Purpose:
enable incr/decr; flexible conflict resolution."
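
Counters show the problem well: under last-write-wins, two clients that both read 5 and write 6 leave 6, and one increment is lost. One well-known alternative, named plainly here because it is swapped in for illustration rather than taken from CASSANDRA-580, is a state-based PN-counter: each writer keeps its own increment and decrement tallies, and merging takes the per-writer maximum, so replicas converge without losing updates whatever order they exchange state in. Writer ids such as "node-a" are hypothetical.

```python
from collections import defaultdict


class PNCounter:
    """State-based PN-counter: one increment and one decrement tally per writer.

    Merging takes the per-writer maximum, so merges are idempotent and
    order-independent, and no writer's updates are overwritten.
    """

    def __init__(self):
        self.incs = defaultdict(int)
        self.decs = defaultdict(int)

    def incr(self, writer, amount=1):
        self.incs[writer] += amount

    def decr(self, writer, amount=1):
        self.decs[writer] += amount

    def value(self):
        return sum(self.incs.values()) - sum(self.decs.values())

    def merge(self, other):
        # Each writer only ever grows its own tallies, so max() is safe.
        for writer, n in other.incs.items():
            self.incs[writer] = max(self.incs[writer], n)
        for writer, n in other.decs.items():
            self.decs[writer] = max(self.decs[writer], n)
```

If replica a sees +2 from node-a while replica b sees +1 and -1 from node-b, both report 2 after exchanging state, and merging again changes nothing.
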

https://issues.apache.org/jira/browse/CASSANDRA-580

Paul Prescod

Re: questions about consistency

Posted by Даниел Симеонов <ds...@gmail.com>.
Hi Paul,
    Thank you for your answer. About the first question, I wondered whether
it is possible to work around this issue by relaxing some consistency. As I
understand you, it should be possible to implement this compareAndSet
operation once vector clocks are present, and then the client is going to
reconcile the data.
Regarding the second question, I understand that without vector clocks and
client reconciliation, this causality problem currently exists in Cassandra.
About the third question, isn't it the same if writes and reads both use
QUORUM?
What about implementing counters? Currently it seems they cannot be
implemented in Cassandra; will vector clocks help here? Do you have
experience with counters in Cassandra?

Best regards, Daniel.

Re: questions about consistency

Posted by Paul Prescod <pr...@gmail.com>.
I'm not an expert, so take what I say with a grain of salt.

2010/4/21 Даниел Симеонов <ds...@gmail.com>:
> Hello,
>    I am pretty new to Cassandra and I have some questions; they may seem
> trivial, but I am still new to the subject. First is about the lack of a
> compareAndSet() operation: as I understand it, it is not currently
> supported in Cassandra. Do you know of use cases that really require such
> operations, and how do these use cases currently work around this?

I think your question is paradoxical. If the use case really requires
the operation then there is no workaround by definition. The existence
of the workaround implies that the use case really did not require the
operation.

Anyhow, vector clocks are probably relevant to this question and your next one.

> Second topic I'd like to discuss a little more is read repair. As I
> understand it, it is done using the timestamps supplied by the client
> application servers. Since computer clocks diverge (keeping them in sync
> requires synchronization algorithms running regularly), there should be a
> time window during which the order of the client requests written to the
> database is not guaranteed. Do you have real-world experience with this? Is
> this similar to causal consistency
> (http://en.wikipedia.org/wiki/Causal_consistency)? What happens if two
> application servers try to update the same data and supply one and the same
> timestamp (rare, but it could happen)? And if they update several columns
> in a batch operation this way, is there a chance that the column values
> could be intermixed between the two update requests?

All of this is changing with vector clocks in Cassandra 0.7.

https://issues.apache.org/jira/browse/CASSANDRA-580
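
Unlike a single timestamp, a vector clock records how many updates each writer has made, so two versions can be recognised as concurrent (neither has seen the other) rather than being silently ordered by wall-clock time. A minimal sketch of the textbook technique from the Dynamo paper, assuming each version carries a map from a hypothetical writer id to a counter; it is not the code attached to CASSANDRA-580.

```python
from typing import Dict

VectorClock = Dict[str, int]  # writer id -> number of updates from that writer


def increment(clock: VectorClock, writer: str) -> VectorClock:
    # Return a copy with `writer`'s entry advanced, as done on each write.
    updated = dict(clock)
    updated[writer] = updated.get(writer, 0) + 1
    return updated


def compare(a: VectorClock, b: VectorClock) -> str:
    """Return 'before', 'after', 'equal' or 'concurrent' for version a vs b."""
    writers = set(a) | set(b)
    a_le_b = all(a.get(w, 0) <= b.get(w, 0) for w in writers)
    b_le_a = all(b.get(w, 0) <= a.get(w, 0) for w in writers)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"
```

If base = increment({}, "app1"), then a second write by app1 on top of base compares as "after" base, while two versions written independently by app1 and app2 on top of base compare as "concurrent" and would have to be reconciled instead of one silently overwriting the other.
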

> I have one last question about the consistency level ALL: do you know of
> real use cases where it is required (instead of QUORUM), and why (for both
> reads and writes)?

It would be required when your business rules do not allow any client
to read the old value. For example, if it would be illegal to provide
an obsolete stock value.

> Thank you very much for helping me better understand Cassandra!
> Best regards, Daniel.
>