
Posted to commits@cassandra.apache.org by "Peter Schuller (JIRA)" <ji...@apache.org> on 2011/01/20 02:19:46 UTC

[jira] Created: (CASSANDRA-2013) Add CL.TWO, CL.THREE; tweak CL documentation

Add CL.TWO, CL.THREE; tweak CL documentation
--------------------------------------------

Key: CASSANDRA-2013
URL: https://issues.apache.org/jira/browse/CASSANDRA-2013
Project: Cassandra
Issue Type: Improvement
Reporter: Peter Schuller
Assignee: Peter Schuller
Priority: Minor

Attaching draft patch to add CL.TWO and CL.THREE.

Motivation: having to choose between only ONE and QUORUM is too narrow for clusters with RF > 3. In such a cluster it makes particular sense to write at e.g. CL.TWO for durability purposes, even when you are not looking for the strong consistency that QUORUM gives. The same argument applies to CL.THREE. TWO and THREE felt reasonable; there is no objective reason why THREE is the obvious place to stop.

Technically one would want to specify an arbitrary number, but that is a much more significant change.
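
To make the gap between ONE and QUORUM concrete, here is a minimal sketch of how many replica responses each level would wait for at a given RF. The enum and its blockFor method are illustrative names, not Cassandra's actual classes, and QUORUM is assumed to be RF / 2 + 1 with integer division.

```java
// Illustrative only: a stand-in enum, not Cassandra's ConsistencyLevel.
enum SketchLevel {
    ONE, TWO, THREE, QUORUM, ALL;

    // Number of replica responses to wait for at the given replication factor.
    int blockFor(int rf) {
        switch (this) {
            case ONE:    return 1;
            case TWO:    return 2;
            case THREE:  return 3;
            case QUORUM: return rf / 2 + 1;
            default:     return rf; // ALL
        }
    }
}

final class BlockForDemo {
    public static void main(String[] args) {
        for (int rf : new int[] {3, 5, 7}) {
            StringBuilder row = new StringBuilder("RF=" + rf + ":");
            for (SketchLevel cl : SketchLevel.values()) {
                row.append(' ').append(cl).append('=').append(cl.blockFor(rf));
            }
            System.out.println(row);
        }
    }
}
```

Under these assumptions TWO coincides with QUORUM at RF=3, sits strictly between ONE and QUORUM at RF=5, and at RF=7 both TWO and THREE fall below QUORUM (which is 4).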

Two open questions:

(1) I adjusted the documentation of ConsistencyLevel to be more consistent and also to reflect what I believe to be reality (for example, as far as I can tell QUORUM doesn't send requests to all nodes as claimed in the .thrift file). I'm not terribly confident that I have not missed something though.

(2) There is at least one unresolved issue, which is this assertion check in WriteResponseHandler:

    assert 1 <= blockFor && blockFor <= 2 * Table.open(table).getReplicationStrategy().getReplicationFactor()
        : String.format("invalid response count %d for replication factor %d",
                        blockFor, Table.open(table).getReplicationStrategy().getReplicationFactor());

At THREE, this causes an assertion failure on a keyspace with RF=1. I would, as a user, expect UnavailableException. However, I am uncertain as to what to do about this assertion. I think this highlights one way in which TWO/THREE differ from the previously existing CLs: they essentially hard-code replica counts rather than expressing them in terms that can by definition be served by the cluster at any RF.
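
A small sketch of why the assertion trips: it evaluates the same bound, 1 <= blockFor <= 2 * RF, for the fixed counts two and three. The class and method names are made up for illustration; only the inequality is taken from the assertion quoted above.

```java
// Illustrative only: reproduces the inequality from the assertion, nothing else.
final class AssertBoundDemo {
    // True when blockFor satisfies 1 <= blockFor <= 2 * rf.
    static boolean withinBound(int blockFor, int rf) {
        return 1 <= blockFor && blockFor <= 2 * rf;
    }

    public static void main(String[] args) {
        for (int rf = 1; rf <= 3; rf++) {
            System.out.println("RF=" + rf
                    + " TWO ok=" + withinBound(2, rf)
                    + " THREE ok=" + withinBound(3, rf));
        }
    }
}
```

Of these combinations only THREE at RF=1 violates the bound. TWO at RF=1 and THREE at RF=2 pass it even though blockFor exceeds RF, because the bound is 2 * RF rather than RF.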

Given that THREE (and not TWO, but that is only due to the implementation detail that bootstrapping is involved) implies a replica count that is independent of the replication factor, there is essentially a new failure mode. It is suddenly possible for a consistency level to be fundamentally incompatible with the RF. My gut reaction is to still want UnavailableException, and to think that the assertion check can essentially be removed (other than the 1 <= blockFor part).

If a different failure mode is desired, presumably it would not be an assertion failure (which should indicate a Cassandra bug). Maybe UnsatisfiableConsistencyLevel? I propose just adjusting the assertion (which has no equivalent in ReadCallback, by the way); giving a friendlier error message in case of a CL/RF mismatch would be good, but doesn't feel worth the extra complexity needed to deal with it.

'ant test' passes. I have tested w/ py_stress with a three-node cluster and an RF=3 keyspace and with 1 and 2 nodes down, and get expected behavior (available or unavailable as a function of nodes that are up).
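
The expected outcome of that test can be sketched as a simple rule: a request is available when the number of live replicas is at least the count the level blocks for. The code below is a simplified model of that rule for RF=3, with hypothetical names; it ignores bootstrapping, hinted handoff and timeouts, and it is not the py_stress test itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative model only; names are hypothetical, not Cassandra's.
final class AvailabilityDemo {
    static final int RF = 3;

    // Responses each level waits for at RF=3 (QUORUM assumed to be RF / 2 + 1).
    static Map<String, Integer> blockFor() {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("ONE", 1);
        m.put("TWO", 2);
        m.put("QUORUM", RF / 2 + 1);
        m.put("THREE", 3);
        m.put("ALL", RF);
        return m;
    }

    // A request is available when enough replicas are alive to answer it.
    static boolean available(int blockFor, int liveReplicas) {
        return liveReplicas >= blockFor;
    }

    public static void main(String[] args) {
        for (int down = 0; down <= 2; down++) {
            int live = RF - down;
            StringBuilder row = new StringBuilder(down + " down:");
            for (Map.Entry<String, Integer> e : blockFor().entrySet()) {
                row.append(' ').append(e.getKey()).append('=')
                   .append(available(e.getValue(), live) ? "ok" : "unavailable");
            }
            System.out.println(row);
        }
    }
}
```

In this model, with one node down only THREE and ALL become unavailable; with two nodes down only ONE remains available.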

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (CASSANDRA-2013) Add CL.TWO, CL.THREE; tweak CL documentation

Posted by "Peter Schuller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004016#comment-13004016 ] 

Peter Schuller commented on CASSANDRA-2013:
-------------------------------------------

For the record, the patch as committed still retains the "regression" of sorts that you can get an AssertionError instead of a cleaner error when attempting to insert at CL.THREE in a cluster with RF < 3. That may be okay; I just wanted to say that clearly outside of my wall of text above.

(Sorry for the delay, caught up a bit on JIRA traffic today.)

> Add CL.TWO, CL.THREE; tweak CL documentation
> --------------------------------------------
>
>                 Key: CASSANDRA-2013
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2013
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>            Priority: Minor
>             Fix For: 0.7.4
>
>         Attachments: 2013.txt

[jira] Updated: (CASSANDRA-2013) Add CL.TWO, CL.THREE; tweak CL documentation

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2013:
--------------------------------------

    Attachment: 2013-assert.txt

Looks like the easiest fix is to just remove the assert, which is somewhat obsolete anyway. Then assureSufficientLiveNodes will throw UnavailableException, if necessary.
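
As a rough sketch of the behavior described here: with the assert gone, a liveness check is what rejects a request whose consistency level cannot be met. The class, method and exception below are stand-ins written for illustration; they are not the real assureSufficientLiveNodes or Cassandra's UnavailableException.

```java
// Illustrative stand-ins; not Cassandra's actual classes or signatures.
final class LiveNodeCheckDemo {
    // Hypothetical checked exception playing the role of UnavailableException.
    static final class SketchUnavailableException extends Exception {
        SketchUnavailableException(String message) {
            super(message);
        }
    }

    // Rejects the request when fewer replicas are alive than the level requires.
    static void checkSufficientLiveNodes(int blockFor, int liveReplicas)
            throws SketchUnavailableException {
        if (liveReplicas < blockFor) {
            throw new SketchUnavailableException(
                    "need " + blockFor + " replicas, only " + liveReplicas + " alive");
        }
    }

    public static void main(String[] args) {
        try {
            // CL.THREE against a keyspace with RF=1: at most one live replica.
            checkSufficientLiveNodes(3, 1);
            System.out.println("accepted");
        } catch (SketchUnavailableException e) {
            System.out.println("unavailable: " + e.getMessage());
        }
    }
}
```

The point of the sketch is that a CL/RF mismatch then surfaces as an ordinary, catchable error for the client rather than as an assertion failure, which should be reserved for bugs.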

> Add CL.TWO, CL.THREE; tweak CL documentation
> --------------------------------------------
>
>                 Key: CASSANDRA-2013
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2013
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>            Priority: Minor
>             Fix For: 0.7.4
>
>         Attachments: 2013-assert.txt, 2013.txt

[jira] Updated: (CASSANDRA-2013) Add CL.TWO, CL.THREE; tweak CL documentation

Posted by "Peter Schuller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Schuller updated CASSANDRA-2013:
--------------------------------------

    Attachment: 2013.txt

> Add CL.TWO, CL.THREE; tweak CL documentation
> --------------------------------------------
>
>                 Key: CASSANDRA-2013
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2013
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>            Priority: Minor
>         Attachments: 2013.txt

[jira] Commented: (CASSANDRA-2013) Add CL.TWO, CL.THREE; tweak CL documentation

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002869#comment-13002869 ] 

Hudson commented on CASSANDRA-2013:
-----------------------------------

Integrated in Cassandra-0.7 #349 (See [https://hudson.apache.org/hudson/job/Cassandra-0.7/349/])
    add CL.TWO, THREE
patch by Peter Schuller; reviewed by tjake for CASSANDRA-2013



[jira] [Commented] (CASSANDRA-2013) Add CL.TWO, CL.THREE; tweak CL documentation

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018404#comment-13018404 ] 

Jonathan Ellis commented on CASSANDRA-2013:
-------------------------------------------

Planning to supersede these in CASSANDRA-2338.


[jira] Commented: (CASSANDRA-2013) Add CL.TWO, CL.THREE; tweak CL documentation

Posted by "T Jake Luciani (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986675#action_12986675 ] 

T Jake Luciani commented on CASSANDRA-2013:
-------------------------------------------

I can see how someone would want this with RF > 6, but who does that?

> Add CL.TWO, CL.THREE; tweak CL documentation
> --------------------------------------------
>
>                 Key: CASSANDRA-2013
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2013
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>            Priority: Minor
>             Fix For: 0.8
>
>         Attachments: 2013.txt

[jira] Commented: (CASSANDRA-2013) Add CL.TWO, CL.THREE; tweak CL documentation

Posted by "T Jake Luciani (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002663#comment-13002663 ] 

T Jake Luciani commented on CASSANDRA-2013:
-------------------------------------------

Narendra, good point.  I think the case that you want more than ONE without requiring a QUORUM would give you a bit of redundancy without the strict consistency requirements.

My only issue here is that this is a "power user" feature, and we are throwing it in so everyone will now start using it without understanding the implications...


[jira] Commented: (CASSANDRA-2013) Add CL.TWO, CL.THREE; tweak CL documentation

Posted by "Peter Schuller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007119#comment-13007119 ] 

Peter Schuller commented on CASSANDRA-2013:
-------------------------------------------

Sounds good to me.

> Add CL.TWO, CL.THREE; tweak CL documentation
> --------------------------------------------
>
>                 Key: CASSANDRA-2013
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2013
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>            Priority: Minor
>             Fix For: 0.7.4
>
>         Attachments: 2013-assert.txt, 2013.txt
>
>

[jira] Commented: (CASSANDRA-2013) Add CL.TWO, CL.THREE; tweak CL documentation

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005470#comment-13005470 ] 

Jonathan Ellis commented on CASSANDRA-2013:
-------------------------------------------

(patch attached to remove assert)

> Add CL.TWO, CL.THREE; tweak CL documentation
> --------------------------------------------
>
>                 Key: CASSANDRA-2013
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2013
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>            Priority: Minor
>             Fix For: 0.7.4
>
>         Attachments: 2013-assert.txt, 2013.txt
>
>

[jira] Updated: (CASSANDRA-2013) Add CL.TWO, CL.THREE; tweak CL documentation

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2013:
--------------------------------------

      Component/s: Core
    Fix Version/s: 0.8

> Add CL.TWO, CL.THREE; tweak CL documentation
> --------------------------------------------
>
>                 Key: CASSANDRA-2013
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2013
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>            Priority: Minor
>             Fix For: 0.8
>
>         Attachments: 2013.txt
>
>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (CASSANDRA-2013) Add CL.TWO, CL.THREE; tweak CL documentation

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007135#comment-13007135 ] 

Jonathan Ellis commented on CASSANDRA-2013:
-------------------------------------------

committed

> Add CL.TWO, CL.THREE; tweak CL documentation
> --------------------------------------------
>
>                 Key: CASSANDRA-2013
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2013
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>            Priority: Minor
>             Fix For: 0.7.4
>
>         Attachments: 2013-assert.txt, 2013.txt
>
>

[jira] Issue Comment Edited: (CASSANDRA-2013) Add CL.TWO, CL.THREE; tweak CL documentation

Posted by "T Jake Luciani (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002663#comment-13002663 ] 

T Jake Luciani edited comment on CASSANDRA-2013 at 3/4/11 3:28 PM:
-------------------------------------------------------------------

Narendra, good point. Asking for more than ONE without requiring a QUORUM gives you a bit of redundancy without the strict consistency requirements.

My only issue here is that this is a "power user" feature, and we are throwing it in where everyone will start using it without understanding the implications...

      was (Author: tjake):
    Narendra, good point.  I think the case that you want more than ONE without requiring a QUORUM would give you a bit of redundancy without the strict consistency requirements.

My only issue here is this is a "power user" feature and we are throwing it in so everyone will not start using it and not understand the implications...
  
> Add CL.TWO, CL.THREE; tweak CL documentation
> --------------------------------------------
>
>                 Key: CASSANDRA-2013
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2013
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>            Priority: Minor
>             Fix For: 0.8
>
>         Attachments: 2013.txt
>
>

[jira] Commented: (CASSANDRA-2013) Add CL.TWO, CL.THREE; tweak CL documentation

Posted by "Peter Schuller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986885#action_12986885 ] 

Peter Schuller commented on CASSANDRA-2013:
-------------------------------------------

It is relevant already at RF=4 (which seems very reasonable; e.g. two copies per data center).

You may not care about quorum consistency yet want to require at least two copies of writes for durability purposes.
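
To make the RF=4 point concrete, here is a small sketch of how many replica acknowledgements each level waits for. The enum and method names are hypothetical, not the ones used in the patch; the only thing assumed about Cassandra itself is the usual quorum size of RF / 2 + 1.

```java
// Hypothetical sketch: names are illustrative, not taken from the patch.
class BlockForSketch {

    enum Level { ONE, TWO, THREE, QUORUM, ALL }

    /** Number of replica responses a request at the given level waits for. */
    static int blockFor(Level level, int replicationFactor) {
        switch (level) {
            case ONE:
                return 1;
            case TWO:
                return 2;
            case THREE:
                return 3;
            case QUORUM:
                // Integer division: RF=3 -> 2, RF=4 -> 3, RF=5 -> 3.
                return replicationFactor / 2 + 1;
            case ALL:
                return replicationFactor;
            default:
                throw new IllegalArgumentException("unknown level " + level);
        }
    }

    public static void main(String[] args) {
        int rf = 4;
        for (Level level : Level.values()) {
            System.out.println(level + " at RF=" + rf + " blocks for " + blockFor(level, rf));
        }
    }
}
```

At RF=4, QUORUM already waits for three replicas, so TWO is the only level that asks for more than one copy without paying the quorum cost.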

> Add CL.TWO, CL.THREE; tweak CL documentation
> --------------------------------------------
>
>                 Key: CASSANDRA-2013
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2013
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>            Priority: Minor
>             Fix For: 0.8
>
>         Attachments: 2013.txt
>
>

[jira] Commented: (CASSANDRA-2013) Add CL.TWO, CL.THREE; tweak CL documentation

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007158#comment-13007158 ] 

Hudson commented on CASSANDRA-2013:
-----------------------------------

Integrated in Cassandra-0.7 #386 (See [https://hudson.apache.org/hudson/job/Cassandra-0.7/386/])
    r/m obsolete assert
patch by jbellis; reviewed by Peter Schuller for CASSANDRA-2013


> Add CL.TWO, CL.THREE; tweak CL documentation
> --------------------------------------------
>
>                 Key: CASSANDRA-2013
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2013
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>            Priority: Minor
>             Fix For: 0.7.4
>
>         Attachments: 2013-assert.txt, 2013.txt
>
>

[jira] Commented: (CASSANDRA-2013) Add CL.TWO, CL.THREE; tweak CL documentation

Posted by "Narendra Sharma (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994223#comment-12994223 ] 

Narendra Sharma commented on CASSANDRA-2013:
--------------------------------------------

We are using Cassandra as the storage layer for a message store. A few customers are asking for two copies of the data to remain even when two replicas are down. These are big customers with a large number of subscribers and a large data volume. The most obvious solution, and the one we recommended, is RF = 5 with R/W CL = QUORUM. However, this increases the TCO: keeping 5 copies of the data makes the system less attractive. Based on the reasoning that you cannot have two working copies unless you have two working replicas, they agreed to RF = 4.

To meet this requirement with RF = 4, we start with R/W CL = QUORUM. If one replica goes down, we continue with CL = QUORUM (we could downgrade to TWO as well). If a second replica goes down, we reduce the CL to TWO. This way we satisfy R + W > N while two replicas are down, and also meet the requirement of maintaining two working copies even after two replicas are down. As the replicas come back up, we upgrade the CL to QUORUM again. We are aware that this strategy has some loose ends, where we may have partitioned reads and writes with CL.TWO.

We believe this is a common use case and that other Cassandra users will also find it useful. Hence, at least CL.TWO makes a lot of sense in practical cases.
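
The downgrade strategy described above can be sketched as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, and reading "R + W > N while two replicas are down" as counting only the live replicas is an interpretation of the comment, not something the comment spells out.

```java
// Hypothetical sketch of the client-side policy; names are illustrative.
class DowngradePolicySketch {

    /** Usual quorum size for a given replication factor. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** True when any r-replica read must intersect any w-replica write among n replicas. */
    static boolean overlaps(int r, int w, int n) {
        return r + w > n;
    }

    /**
     * Replicas to block for on both reads and writes: QUORUM while enough
     * replicas are alive, otherwise TWO. Returns -1 when fewer than two
     * replicas are alive, since two working copies are then impossible.
     */
    static int chooseBlockFor(int replicationFactor, int liveReplicas) {
        int q = quorum(replicationFactor);
        if (liveReplicas >= q) {
            return q;
        }
        if (liveReplicas >= 2) {
            return 2;
        }
        return -1;
    }

    public static void main(String[] args) {
        int rf = 4;
        for (int live = rf; live >= 2; live--) {
            int b = chooseBlockFor(rf, live);
            System.out.println("live=" + live
                    + " blockFor=" + b
                    + " overlapAgainstRF=" + overlaps(b, b, rf)
                    + " overlapAgainstLive=" + overlaps(b, b, live));
        }
    }
}
```

With two replicas down, TWO overlaps only when N is taken to be the two live replicas; measured against the full RF of 4, R + W = 4 is not greater than N. That gap is one way to see the "loose ends" mentioned above, for example when different replicas are unreachable at write time and at read time.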

> Add CL.TWO, CL.THREE; tweak CL documentation
> --------------------------------------------
>
>                 Key: CASSANDRA-2013
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2013
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>            Priority: Minor
>             Fix For: 0.8
>
>         Attachments: 2013.txt
>
>