Posted to commits@cassandra.apache.org by "Peter Schuller (Created) (JIRA)" <ji...@apache.org> on 2012/01/29 06:30:10 UTC

[jira] [Created] (CASSANDRA-3810) reconsider rack awareness

reconsider rack awareness
-------------------------

                 Key: CASSANDRA-3810
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3810
             Project: Cassandra
          Issue Type: Task
            Reporter: Peter Schuller
            Assignee: Peter Schuller
            Priority: Minor


We believed we wanted to be rack aware because we want to ensure that losing a rack affects only a single replica of any given row key.

When using rack awareness, the first problem you run into if you aren't careful is that you induce hotspots as a result of rack-aware replica selection. Using the format {{rackname-nodename}}, consider a part of the ring that looks like this:

{code}
...
r1-n1
r1-n2
r1-n3
r2-n1
r3-n1
r4-n1
...
{code}

Due to the rack awareness, {{r2-n1}} will be the second replica for all data whose primary replica is on {{r1-n1}}, {{r1-n2}} or {{r1-n3}}, since replica selection is forced to skip over any subsequent nodes in a rack that is already represented.
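To make the skipping behavior concrete, here is a toy sketch (illustrative only, not the actual replication strategy code) that walks the example ring and only accepts nodes on racks not yet represented; with {{rf=2}}, all three {{r1}} primaries end up with {{r2-n1}} as their second replica:

{code}
# Toy model of rack-aware replica selection over the example ring above.
# This is an illustration, not Cassandra's actual strategy implementation.
ring = [
    ("r1-n1", "r1"),
    ("r1-n2", "r1"),
    ("r1-n3", "r1"),
    ("r2-n1", "r2"),
    ("r3-n1", "r3"),
    ("r4-n1", "r4"),
]

def replicas(primary_index, rf):
    # Walk clockwise from the primary, accepting a node only if its rack
    # is not already represented among the chosen replicas.
    chosen = [ring[primary_index][0]]
    seen_racks = {ring[primary_index][1]}
    i, steps = primary_index, 0
    while len(chosen) < rf and steps < len(ring):
        i = (i + 1) % len(ring)
        steps += 1
        node, rack = ring[i]
        if rack not in seen_racks:
            chosen.append(node)
            seen_racks.add(rack)
    return chosen

for idx in range(3):  # primaries r1-n1, r1-n2, r1-n3
    print(ring[idx][0], "->", replicas(idx, rf=2))
# r1-n1 -> ['r1-n1', 'r2-n1']
# r1-n2 -> ['r1-n2', 'r2-n1']
# r1-n3 -> ['r1-n3', 'r2-n1']
{code}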

The way we end up allocating nodes in a cluster is to satisfy this criterion:

* Any node in rack {{r}}, in a cluster with a replication factor of {{rf}}, must not have another node in {{r}} within {{rf-1}} steps in the ring in either direction.

Any violation of this criterion induces hotspots due to rack awareness.
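For illustration, checking the criterion over a ring given as {{(node, rack)}} pairs in token order could look like this rough sketch (not anything that exists in Cassandra itself):

{code}
def rack_spacing_violations(ring, rf):
    # ring: list of (node, rack) in token order; the ring wraps around.
    # Flags any pair of same-rack nodes within rf-1 positions of each other.
    n = len(ring)
    violations = []
    for i, (node, rack) in enumerate(ring):
        for step in range(1, rf):  # the rf-1 positions following this node
            other_node, other_rack = ring[(i + step) % n]
            if other_rack == rack:
                violations.append((node, other_node))
    return violations

ring = [("r1-n1", "r1"), ("r1-n2", "r1"), ("r1-n3", "r1"),
        ("r2-n1", "r2"), ("r3-n1", "r3"), ("r4-n1", "r4")]
print(rack_spacing_violations(ring, rf=2))
# [('r1-n1', 'r1-n2'), ('r1-n2', 'r1-n3')] -- the example ring breaks the rule
{code}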

The realization I had a few days ago, however, is that *the rack awareness is not actually changing replica placement* when using this ring topology. In other words, *the way you have to use* rack awareness is to construct the ring such that *the rack awareness is a NOOP*.

So, questions:

* Is there any non-hotspot-inducing use case where rack awareness can be used ("used" in the sense that it actually changes the placement relative to non-awareness) effectively without satisfying the criterion above?
* Is it misleading and counter-productive to teach people (via documentation for example) to rely on rack awareness in their rings instead of just giving them the rule above for ring topology?
* Would it be a better service to the user to provide an easy way to *ensure* that the ring topology adheres to this criterion (such as refusing to bootstrap a new node if rack awareness is requested, and taking it into consideration on automatic token selection (does anyone use that?)), than to "silently" generate hotspots by altering the replication strategy? (The "silence" problem is magnified by the fact that {{nodetool ring}} doesn't reflect this; so the user must take into account both the RF *and* the racks when interpreting {{nodetool ring}} output.)

FWIW, internally we just go with the criterion outlined above, and we have a separate tool which prints the *actual* ownership percentage of each node in the ring (based on the thrift {{describe_ring}} call). Any ring whose node placement violates the criterion is effectively a bug/misconfigured ring, so only in the event of mistakes are we "using" the rack awareness (using the definition of "use" above).
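The ownership computation is roughly the following (a simplified sketch of the idea, not our internal tool; it assumes RandomPartitioner's {{2^127}} token space and ranges in the shape {{describe_ring}} reports them, i.e. start token, end token and replica endpoints per range):

{code}
RING_SIZE = 2 ** 127  # RandomPartitioner token space (assumption for this sketch)

def range_width(start, end):
    # Token ranges are (start, end] and may wrap around zero.
    return (end - start) % RING_SIZE or RING_SIZE

def ownership_percent(ranges):
    # ranges: list of (start_token, end_token, [endpoints]). With rf > 1 the
    # percentages sum to rf * 100, since every range is counted once per replica.
    owned = {}
    for start, end, endpoints in ranges:
        width = range_width(start, end)
        for ep in endpoints:
            owned[ep] = owned.get(ep, 0) + width
    return {ep: 100.0 * w / RING_SIZE for ep, w in owned.items()}

# Two-node toy ring with rf=1: node A owns 3/4 of the token space, node B 1/4.
print(ownership_percent([
    (0, 3 * RING_SIZE // 4, ["A"]),
    (3 * RING_SIZE // 4, 0, ["B"]),
]))
# {'A': 75.0, 'B': 25.0}
{code}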

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3810) reconsider rack awareness

Posted by "Nick Bailey (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196238#comment-13196238 ] 

Nick Bailey commented on CASSANDRA-3810:
----------------------------------------

I would be in favor of removing the current concept of rack awareness and better documenting the way to achieve distribution among racks. I don't know of any situations where the current implementation is useful.

As far as preventing hotspots, I think ultimately the best solution would be solving the problem in a way that users don't have to care about ordering racks in the ring. That solution would probably involve some pretty big changes though, and I'm not sure how feasible it would be to actually implement.

I'm not sure about the approach of refusing to bootstrap or select tokens that introduce a rack imbalance. I'd rather see nodetool fixed so that imbalances can easily be seen, and the documentation around racks changed so that the solution is apparent.
                

[jira] [Commented] (CASSANDRA-3810) reconsider rack awareness

Posted by "Tyler Hobbs (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228120#comment-13228120 ] 

Tyler Hobbs commented on CASSANDRA-3810:
----------------------------------------

bq. I would be in favor of removing the current concept of rack awareness and better documenting the way to achieve distribution among racks...

I agree, although we'd presumably want to deprecate NTS (similar to what was done for ONTS) and make a new MultiDCStrategy.

bq. I'd rather see nodetool get fixed so that imbalances can easily be seen...

I think the correct way to do that would be to show the percentage of the total ring that the node owns for each of the defined keyspaces.  You could add a column to the output for each keyspace, but since the current output is already fairly wide, perhaps a separate table below the current output or a separate command entirely would be appropriate.
                

[jira] [Commented] (CASSANDRA-3810) reconsider rack awareness

Posted by "Peter Schuller (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195692#comment-13195692 ] 

Peter Schuller commented on CASSANDRA-3810:
-------------------------------------------

FWIW, a similar problem does *not* exist for multiple DCs, because they are effectively completely separate rings even though formally they are not (but the problem of {{nodetool ring}} not being aware of this remains for the user).
                

[jira] [Commented] (CASSANDRA-3810) reconsider rack awareness

Posted by "Peter Schuller (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195694#comment-13195694 ] 

Peter Schuller commented on CASSANDRA-3810:
-------------------------------------------

I am using the term "hotspot" incorrectly here. What I really mean is "imbalance". Let's reserve the word hotspot for actual hotspots in the token space :)
                