You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2009/04/03 00:55:14 UTC

[jira] Created: (CASSANDRA-48) test-and-set

test-and-set
------------

                 Key: CASSANDRA-48
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-48
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Jonathan Ellis


Atomic test-and-set insert operation would be nice: "set value to X but only if the current value is still Y."  This allows a sort of optimistic consistency: perform a GET, then perform test-and-set with the value of that GET as Y.

I do not think that this requires strong consistency to be useful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Commented: (CASSANDRA-48) test-and-set

Posted by Sandeep Tata <sa...@gmail.com>.
PNUTS, of course, solves this by mastering the record at some node
(leader election with ZK?) and walking away from hinted-handoff.




On Fri, Apr 3, 2009 at 9:33 AM, Jonathan Ellis <jb...@gmail.com> wrote:
> On Fri, Apr 3, 2009 at 10:30 AM, Sandeep Tata <sa...@gmail.com> wrote:
>> Hmm, I don't remember -- when we do a strong read, do we have a way to
>> grab any hinted-handoff-ed values ?
>
> no.
>

Re: [jira] Commented: (CASSANDRA-48) test-and-set

Posted by Jonathan Ellis <jb...@gmail.com>.
On Fri, Apr 3, 2009 at 10:30 AM, Sandeep Tata <sa...@gmail.com> wrote:
> Hmm, I don't remember -- when we do a strong read, do we have a way to
> grab any hinted-handoff-ed values ?

no.

Re: [jira] Commented: (CASSANDRA-48) test-and-set

Posted by Sandeep Tata <sa...@gmail.com>.
This is very interesting: an eventually consistent system provides an
eventually consistent test-and-set :-)
If my semantics is that test-and-set succeeds if a majority return
success, the resolution rule here cannot be "latest timestamp wins".
It has to be a "majority wins".

A fails, B & C succeed = success. The value on A is inconsistent with
the "truth". Local reads on 'A' will return an invalid value. Strong
read to fix it.

The more interesting case is when one of the replicas is down. How can
you do a hinted handoff with the test-and-set ?
Hmm, I don't remember -- when we do a strong read, do we have a way to
grab any hinted-handoff-ed values ?

On Fri, Apr 3, 2009 at 9:18 AM, Jonathan Ellis <jb...@gmail.com> wrote:
> That makes sense.
>
> In that case you would have to do a read that waits for all replicas.
> Which will hurt availability but I don't see any alternative.
>
> Again, this isn't intended going to provide strong consistency.  You
> can still come up with failure scenarios where inconsistent data could
> be read temporarily.  I'm not sure if it's going to be useful given
> that.
>
> On Fri, Apr 3, 2009 at 10:10 AM, Sandeep Tata <sa...@gmail.com> wrote:
>> Well, the test and set failed could have failed because of a new value
>> replacing the old one since this client read it.
>> Jun is talking about re-reading to get this new value and retrying the
>> "transaction"
>>
>> On Fri, Apr 3, 2009 at 9:08 AM, Jonathan Ellis (JIRA) <ji...@apache.org> wrote:
>>>
>>>    [ https://issues.apache.org/jira/browse/CASSANDRA-48?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695470#action_12695470 ]
>>>
>>> Jonathan Ellis commented on CASSANDRA-48:
>>> -----------------------------------------
>>>
>>> The client should not do that.  It knows the old version (or it couldn't be doing test-and-set) and it knows the new version and it knows that not all replicas were written but some were.  Doing a read cannot possibly tell it anything it doesn't already know.
>>>
>>>> test-and-set
>>>> ------------
>>>>
>>>>                 Key: CASSANDRA-48
>>>>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-48
>>>>             Project: Cassandra
>>>>          Issue Type: Improvement
>>>>            Reporter: Jonathan Ellis
>>>>
>>>> Atomic test-and-set insert operation would be nice: "set value to X but only if the current value is still Y."  This allows a sort of optimistic consistency: perform a GET, then perform test-and-set with the value of that GET as Y.
>>>> I do not think that this requires strong consistency to be useful.
>>>
>>> --
>>> This message is automatically generated by JIRA.
>>> -
>>> You can reply to this email to add a comment to the issue online.
>>>
>>>
>>
>

Re: [jira] Commented: (CASSANDRA-48) test-and-set

Posted by Jonathan Ellis <jb...@gmail.com>.
That makes sense.

In that case you would have to do a read that waits for all replicas.
Which will hurt availability but I don't see any alternative.

Again, this isn't intended going to provide strong consistency.  You
can still come up with failure scenarios where inconsistent data could
be read temporarily.  I'm not sure if it's going to be useful given
that.

On Fri, Apr 3, 2009 at 10:10 AM, Sandeep Tata <sa...@gmail.com> wrote:
> Well, the test and set failed could have failed because of a new value
> replacing the old one since this client read it.
> Jun is talking about re-reading to get this new value and retrying the
> "transaction"
>
> On Fri, Apr 3, 2009 at 9:08 AM, Jonathan Ellis (JIRA) <ji...@apache.org> wrote:
>>
>>    [ https://issues.apache.org/jira/browse/CASSANDRA-48?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695470#action_12695470 ]
>>
>> Jonathan Ellis commented on CASSANDRA-48:
>> -----------------------------------------
>>
>> The client should not do that.  It knows the old version (or it couldn't be doing test-and-set) and it knows the new version and it knows that not all replicas were written but some were.  Doing a read cannot possibly tell it anything it doesn't already know.
>>
>>> test-and-set
>>> ------------
>>>
>>>                 Key: CASSANDRA-48
>>>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-48
>>>             Project: Cassandra
>>>          Issue Type: Improvement
>>>            Reporter: Jonathan Ellis
>>>
>>> Atomic test-and-set insert operation would be nice: "set value to X but only if the current value is still Y."  This allows a sort of optimistic consistency: perform a GET, then perform test-and-set with the value of that GET as Y.
>>> I do not think that this requires strong consistency to be useful.
>>
>> --
>> This message is automatically generated by JIRA.
>> -
>> You can reply to this email to add a comment to the issue online.
>>
>>
>

Re: [jira] Commented: (CASSANDRA-48) test-and-set

Posted by Sandeep Tata <sa...@gmail.com>.
Well, the test and set failed could have failed because of a new value
replacing the old one since this client read it.
Jun is talking about re-reading to get this new value and retrying the
"transaction"

On Fri, Apr 3, 2009 at 9:08 AM, Jonathan Ellis (JIRA) <ji...@apache.org> wrote:
>
>    [ https://issues.apache.org/jira/browse/CASSANDRA-48?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695470#action_12695470 ]
>
> Jonathan Ellis commented on CASSANDRA-48:
> -----------------------------------------
>
> The client should not do that.  It knows the old version (or it couldn't be doing test-and-set) and it knows the new version and it knows that not all replicas were written but some were.  Doing a read cannot possibly tell it anything it doesn't already know.
>
>> test-and-set
>> ------------
>>
>>                 Key: CASSANDRA-48
>>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-48
>>             Project: Cassandra
>>          Issue Type: Improvement
>>            Reporter: Jonathan Ellis
>>
>> Atomic test-and-set insert operation would be nice: "set value to X but only if the current value is still Y."  This allows a sort of optimistic consistency: perform a GET, then perform test-and-set with the value of that GET as Y.
>> I do not think that this requires strong consistency to be useful.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>

[jira] Commented: (CASSANDRA-48) test-and-set

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-48?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713237#action_12713237 ] 

Jonathan Ellis commented on CASSANDRA-48:
-----------------------------------------

I'd like to get the "dial up consistency" part in first rather than the other way around, since something like that (Zookeeper?) would be useful in other situations as well while T&S w/o consistency is not really useful as you point out.

> test-and-set
> ------------
>
>                 Key: CASSANDRA-48
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-48
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>
> Atomic test-and-set insert operation would be nice: "set value to X but only if the current value is still Y."  This allows a sort of optimistic consistency: perform a GET, then perform test-and-set with the value of that GET as Y.
> I do not think that this requires strong consistency to be useful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-48) test-and-set

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-48?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695229#action_12695229 ] 

Jun Rao commented on CASSANDRA-48:
----------------------------------

So you block until all N replicas reply and return success as long as 1 reply says the insert is successful?

> test-and-set
> ------------
>
>                 Key: CASSANDRA-48
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-48
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>
> Atomic test-and-set insert operation would be nice: "set value to X but only if the current value is still Y."  This allows a sort of optimistic consistency: perform a GET, then perform test-and-set with the value of that GET as Y.
> I do not think that this requires strong consistency to be useful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-48) test-and-set

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-48?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695476#action_12695476 ] 

Jonathan Ellis commented on CASSANDRA-48:
-----------------------------------------

Like I said on the ML, in that case you would have to do a read that waits for all replicas.  Which will hurt availability but I don't see any alternative.

Again, this isn't intended going to provide strong consistency.  You can still come up with failure scenarios where inconsistent data could be read temporarily.  I'm not sure if it's going to be useful given that.

> test-and-set
> ------------
>
>                 Key: CASSANDRA-48
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-48
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>
> Atomic test-and-set insert operation would be nice: "set value to X but only if the current value is still Y."  This allows a sort of optimistic consistency: perform a GET, then perform test-and-set with the value of that GET as Y.
> I do not think that this requires strong consistency to be useful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-48) test-and-set

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-48?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713242#action_12713242 ] 

Jonathan Ellis commented on CASSANDRA-48:
-----------------------------------------

... actually, thinking about it a little more, if you are using ZK locks for consistency I don't really see the point of  T&S at all.  Read/write locks are described under shared locks here: http://zookeeper.wiki.sourceforge.net/ZooKeeperRecipes.

> test-and-set
> ------------
>
>                 Key: CASSANDRA-48
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-48
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>
> Atomic test-and-set insert operation would be nice: "set value to X but only if the current value is still Y."  This allows a sort of optimistic consistency: perform a GET, then perform test-and-set with the value of that GET as Y.
> I do not think that this requires strong consistency to be useful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-48) test-and-set

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-48?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695799#action_12695799 ] 

Jun Rao commented on CASSANDRA-48:
----------------------------------

What are the version numbers set by the 2 updates? If the version numbers are the same, it's not clear if majority read helps since the conflict resolution logic is based on version number.

> test-and-set
> ------------
>
>                 Key: CASSANDRA-48
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-48
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>
> Atomic test-and-set insert operation would be nice: "set value to X but only if the current value is still Y."  This allows a sort of optimistic consistency: perform a GET, then perform test-and-set with the value of that GET as Y.
> I do not think that this requires strong consistency to be useful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-48) test-and-set

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-48?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695475#action_12695475 ] 

Jun Rao commented on CASSANDRA-48:
----------------------------------

Here is a contrived but simple example. Suppose that some shopping cart info is stored in a column and multiple clients are updating it. A client may want to update the column value only if the value hasn't changed since he last read it. If the column value has indeed changed, he wants to read the latest value back and MERGE it with the local value he has and then reinsert. This will be hard to do with the current test-and-set insert semantic.

> test-and-set
> ------------
>
>                 Key: CASSANDRA-48
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-48
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>
> Atomic test-and-set insert operation would be nice: "set value to X but only if the current value is still Y."  This allows a sort of optimistic consistency: perform a GET, then perform test-and-set with the value of that GET as Y.
> I do not think that this requires strong consistency to be useful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-48) test-and-set

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-48?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713253#action_12713253 ] 

Sandeep Tata commented on CASSANDRA-48:
---------------------------------------

Yup, I'm planning to submit a "dial-up the consistency patch" first (modulo some failure scenarios, it's not super hard). I'm testing it on a local cluster today .. I'll open a ticket and describe the design. Just wanted to give you guys a heads up about the status on this one and reasons for the consistency patch.

Using ZK to implement concurrency control is not a great option:

If you expect high level of conflicts, you're in trouble with a shared nothing architecture anyway. Talking to ZK for this is going to kill performance.

If you expect low level of conflicts, you're better off with optimistic concurrency control, and T&S is the right primitive for that.


> test-and-set
> ------------
>
>                 Key: CASSANDRA-48
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-48
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>
> Atomic test-and-set insert operation would be nice: "set value to X but only if the current value is still Y."  This allows a sort of optimistic consistency: perform a GET, then perform test-and-set with the value of that GET as Y.
> I do not think that this requires strong consistency to be useful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-48) test-and-set

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-48?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695481#action_12695481 ] 

Sandeep Tata commented on CASSANDRA-48:
---------------------------------------

Say we have 2 processes doing test-and-set.

Read X              Read X
T&S X +1           T&S X+2

The two T&S are routed to A , B, C:

@A T&S X + 1
                            @B T&S X+2
                            @C T&S X+2
                            @A T&S X+2 <------Fail, A already updated X 
@B T&S X+1 <---- fail
@C T&S X+2 <---- fail


X+1 fails, X+2 succeeds.
A contains a bad value that can be repaired by majority read. Any holes ?



> test-and-set
> ------------
>
>                 Key: CASSANDRA-48
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-48
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>
> Atomic test-and-set insert operation would be nice: "set value to X but only if the current value is still Y."  This allows a sort of optimistic consistency: perform a GET, then perform test-and-set with the value of that GET as Y.
> I do not think that this requires strong consistency to be useful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-48) test-and-set

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-48?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695461#action_12695461 ] 

Jonathan Ellis commented on CASSANDRA-48:
-----------------------------------------

No, that's up to the client.  It may wish to retry the write, or try to write back the old value.

> test-and-set
> ------------
>
>                 Key: CASSANDRA-48
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-48
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>
> Atomic test-and-set insert operation would be nice: "set value to X but only if the current value is still Y."  This allows a sort of optimistic consistency: perform a GET, then perform test-and-set with the value of that GET as Y.
> I do not think that this requires strong consistency to be useful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-48) test-and-set

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-48?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713234#action_12713234 ] 

Sandeep Tata commented on CASSANDRA-48:
---------------------------------------

Looks like it is much too tricky to guarantee convergence without somewhat stronger consistency if we admit a test-and-set instruction.

I've started work on a simple implementation without these additional repair mechanisms -- apps can use it safely if they either
a) manage concurrency in the app (bad assumption -- I don't expect apps to be able to do this, if they can, T&S doesn't really provide much value)
b) can expect to dial up the consistency if they want to use this part of the API (very useful if they can do this without losing performance for those parts of the app that don't need consistency)



> test-and-set
> ------------
>
>                 Key: CASSANDRA-48
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-48
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>
> Atomic test-and-set insert operation would be nice: "set value to X but only if the current value is still Y."  This allows a sort of optimistic consistency: perform a GET, then perform test-and-set with the value of that GET as Y.
> I do not think that this requires strong consistency to be useful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-48) test-and-set

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-48?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695177#action_12695177 ] 

Sandeep Tata commented on CASSANDRA-48:
---------------------------------------

This'd be great, except we probably want to condition on the timestamp. i.e., 

Get X
Put X+1 if latest timestamp hasn't changed

That makes sure Cassandra doesn't have to to a bit-by-bit compare on an arbitrarily large value, just compare timestamps.
This is the approach that PNUTS takes. See http://research.yahoo.com/files/pnuts.pdf

> test-and-set
> ------------
>
>                 Key: CASSANDRA-48
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-48
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>
> Atomic test-and-set insert operation would be nice: "set value to X but only if the current value is still Y."  This allows a sort of optimistic consistency: perform a GET, then perform test-and-set with the value of that GET as Y.
> I do not think that this requires strong consistency to be useful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-48) test-and-set

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-48?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695225#action_12695225 ] 

Jonathan Ellis commented on CASSANDRA-48:
-----------------------------------------

Sandeep: you're right, timestamp is better here.

Jun: I don't think we need to do anything special -- we already have code to handle saying "i want to block for N replicas before calling the write a success."  that could work here unchanged.

> test-and-set
> ------------
>
>                 Key: CASSANDRA-48
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-48
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>
> Atomic test-and-set insert operation would be nice: "set value to X but only if the current value is still Y."  This allows a sort of optimistic consistency: perform a GET, then perform test-and-set with the value of that GET as Y.
> I do not think that this requires strong consistency to be useful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-48) test-and-set

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-48?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695459#action_12695459 ] 

Jun Rao commented on CASSANDRA-48:
----------------------------------

That's when complexity comes in. Say N=3 and you insert to node A,B, and C. The insert succeeds at A, but failed at B and C. You want to fail the insert. However, the insert already succeeds at A. Now, you should roll back the insert made at A. Then, you need to remember the previous version at A. You will be essentially doing some sort of 2-phase commit.

> test-and-set
> ------------
>
>                 Key: CASSANDRA-48
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-48
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>
> Atomic test-and-set insert operation would be nice: "set value to X but only if the current value is still Y."  This allows a sort of optimistic consistency: perform a GET, then perform test-and-set with the value of that GET as Y.
> I do not think that this requires strong consistency to be useful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-48) test-and-set

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-48?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695232#action_12695232 ] 

Jonathan Ellis commented on CASSANDRA-48:
-----------------------------------------

No, you return failure unless N succeed.  (So obviously this can reduce availability if you are setting N = replication count.  Normally setting it to a simple majority is sufficient.)

> test-and-set
> ------------
>
>                 Key: CASSANDRA-48
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-48
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>
> Atomic test-and-set insert operation would be nice: "set value to X but only if the current value is still Y."  This allows a sort of optimistic consistency: perform a GET, then perform test-and-set with the value of that GET as Y.
> I do not think that this requires strong consistency to be useful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-48) test-and-set

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-48?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695467#action_12695467 ] 

Jun Rao commented on CASSANDRA-48:
----------------------------------

A common case for a client is that if a test-and-set insert fails, the client will re-read the latest version and then take some action accordingly. The client could read an incorrect version in this case.

> test-and-set
> ------------
>
>                 Key: CASSANDRA-48
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-48
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>
> Atomic test-and-set insert operation would be nice: "set value to X but only if the current value is still Y."  This allows a sort of optimistic consistency: perform a GET, then perform test-and-set with the value of that GET as Y.
> I do not think that this requires strong consistency to be useful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-48) test-and-set

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-48?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695470#action_12695470 ] 

Jonathan Ellis commented on CASSANDRA-48:
-----------------------------------------

The client should not do that.  It knows the old version (or it couldn't be doing test-and-set) and it knows the new version and it knows that not all replicas were written but some were.  Doing a read cannot possibly tell it anything it doesn't already know.

> test-and-set
> ------------
>
>                 Key: CASSANDRA-48
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-48
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>
> Atomic test-and-set insert operation would be nice: "set value to X but only if the current value is still Y."  This allows a sort of optimistic consistency: perform a GET, then perform test-and-set with the value of that GET as Y.
> I do not think that this requires strong consistency to be useful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-48) test-and-set

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-48?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713293#action_12713293 ] 

Jonathan Ellis commented on CASSANDRA-48:
-----------------------------------------

ISTM that we're not going to have a correct T&S w/o the kind of more brute-force locking that ZK gives us, but I will wait to see what you have come up with. :)

> test-and-set
> ------------
>
>                 Key: CASSANDRA-48
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-48
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>
> Atomic test-and-set insert operation would be nice: "set value to X but only if the current value is still Y."  This allows a sort of optimistic consistency: perform a GET, then perform test-and-set with the value of that GET as Y.
> I do not think that this requires strong consistency to be useful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-48) test-and-set

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-48?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697908#action_12697908 ] 

Sandeep Tata commented on CASSANDRA-48:
---------------------------------------

There's a problem with my previous example. Here it is again, slightly improved:

Process1:
Read X 
T&S X +1 , timestamp=5

Process2:
Read X 
T&S X +2, timestamp=4

The two T&S are routed to A , B, C: 

@A T&S X + 1 
                            @B T&S X+2 
                            @C T&S X+2 
                            @A T&S X+2 <------Fail, A already updated X 
@B T&S X+1 <---- fail 
@C T&S X+2 <---- fail 

The node that coordinated the T&S for Process1 reports a failure because 2 out of 3 T&S attempts failed.
Process2 reports success because 1 out of 3 failed, 2 succeeded.
The correct value at the node is now X+2 (from Process2)

Now, if node C dies, here's what we have on A & B:

@A: X+1, ts=5
@B: X+2, ts=4

We can't resolve this using majority (we don't do that currently anyway). The latest timestamp rule will give us the wrong answer, and we'll lose the update from Process2. Bad.

If the value at A was fixed before C went down, we wouldn't be in this trouble:

The node coordinating the T&S should always "cancel" the successful (minority) writes from a failed attempt. A simple way to implement cancelled writes would be to add a bit for that column indicating that the column is "cancelled". The cancelling write should also use T&S to avoid trampling on a newer value if the old invalid value was replaced in the meantime. If the cancelling T&S fails, we have nothing to worry about. When a node sees a cancelled column, it should replace that value with the value from a peer even if the peer node sends a value with an older timestamp. This would involve 1) a simple change to read repair logic, 2) change to the read path to make sure cancelled columns are not read (if a read results in a cancelled column, then you trigger quorum read), and 3) modifying ColumnFamily to understand cancelled writes.

The cancel-writes approach is cleaner than a forced rollback to an older value by removing this bad value from the memtable. At first glance it seems simpler, but what if the memtable got flushed in the meantime? You can't really remove this bad value from the SSTable.

What happens with Hinted-handoff:
If you don't hear back from any of the intended peers, the T&S is assumed failed. Handing off T&S doesn't make sense because it requires a success quorum. The failed node can come back up and get the correct values using read repair -- business as usual.

Of course, this still leaves us with a window of vulnerability. If C goes down (in the example above) *and* the node running Process1 dies before it has a chance to fix A, we still end up with an invalid update. But I'm guessing this is a good tradeoff to make in a system that relies on eventual consistency. This becomes a problem if C happens to be the node running Process1. The alternative would be a protocol based on tentative writes and confirmations -- that brings in a whole slew of other new semantics and other complications.

I'm going to scan the literature to see if there are alternative protocols in this consistency class  -- any thoughts on this approach? Any alternatives?

> test-and-set
> ------------
>
>                 Key: CASSANDRA-48
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-48
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>
> Atomic test-and-set insert operation would be nice: "set value to X but only if the current value is still Y."  This allows a sort of optimistic consistency: perform a GET, then perform test-and-set with the value of that GET as Y.
> I do not think that this requires strong consistency to be useful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (CASSANDRA-48) test-and-set

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-48?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-48.
-------------------------------------

    Resolution: Won't Fix

It seems clear that this is unworkable in cassandra core.

> test-and-set
> ------------
>
>                 Key: CASSANDRA-48
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-48
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>
> Atomic test-and-set insert operation would be nice: "set value to X but only if the current value is still Y."  This allows a sort of optimistic consistency: perform a GET, then perform test-and-set with the value of that GET as Y.
> I do not think that this requires strong consistency to be useful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-48) test-and-set

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-48?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695224#action_12695224 ] 

Jun Rao commented on CASSANDRA-48:
----------------------------------

We have to be careful about the semantic of test-and-set insert. What happens if the condition is met at some, but not all replicas? If we want to fail the insert in this case, we need to be able to roll back the insert at some replicas.

> test-and-set
> ------------
>
>                 Key: CASSANDRA-48
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-48
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>
> Atomic test-and-set insert operation would be nice: "set value to X but only if the current value is still Y."  This allows a sort of optimistic consistency: perform a GET, then perform test-and-set with the value of that GET as Y.
> I do not think that this requires strong consistency to be useful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.