Posted to commits@cassandra.apache.org by "Brett Eisenberg (JIRA)" <ji...@apache.org> on 2009/04/02 18:03:12 UTC

[jira] Created: (CASSANDRA-45) Integrate with ZooKeeper to enhance fault tolerance and coordination

Integrate with ZooKeeper to enhance fault tolerance and coordination
--------------------------------------------------------------------

                 Key: CASSANDRA-45
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-45
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Brett Eisenberg


Per Avinash:

(1) Store all configuration-specific information in ZK for availability. For example, today we store a node's token info on local disk; in production we lost that disk and the token info with it.
(2) Storage load balancing, which does not exist today. I would like to have the notion of a leader that orchestrates the load-balancing strategy.
(3) Distributed locks - suppose one of the replicas wants to trigger a compaction. We could then prevent the other replicas from also initiating one, so that we get better read performance.
(4) There is operation stuff that needs to be set up when a new node bootstraps. This intermediate state can be placed in ZK and deleted on bootstrap completion; that way, if the node handing off the data dies partway through, it can continue from where it left off on restart.

Additionally, configuration state, cluster membership, and node visibility could be enhanced using ZK as well.

Opening ticket for discussion.



[jira] Commented: (CASSANDRA-45) Integrate with ZooKeeper to enhance fault tolerance and coordination

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695141#action_12695141 ] 

Sandeep Tata commented on CASSANDRA-45:
---------------------------------------

Oh, and we should probably work towards an initial release and mark these more "basic" issues as affecting the next release?




[jira] Commented: (CASSANDRA-45) Integrate with ZooKeeper to enhance fault tolerance and coordination

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695071#action_12695071 ] 

Sandeep Tata commented on CASSANDRA-45:
---------------------------------------

I agree with a majority of the goals. 

> (1) Store all configuration-specific information in ZK for availability. For example, today we store a node's token info on local disk; in production we lost that disk and the token info with it.
0 : I don't see big problems, except we'll need to make sure this doesn't adversely affect the gossip protocol. If we're just using ZK to cache each node's token info separately, we're simply using ZK as a "backup", so it won't affect the protocol at all. I'm not sure how useful this is.
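
For concreteness, a minimal sketch of the "backup" usage with the stock ZooKeeper Java client. The znode layout (/cassandra/tokens/<nodeId>), the class, and its method names are illustrative assumptions, not an agreed design; parent znodes are assumed to be created at startup.

    // Hypothetical sketch: keep a copy of each node's token in ZooKeeper so a
    // lost local disk does not also lose the token. Paths are illustrative.
    import java.nio.charset.StandardCharsets;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class TokenBackup {
        private final ZooKeeper zk;

        public TokenBackup(ZooKeeper zk) {
            this.zk = zk;
        }

        /** Write (or overwrite) this node's token under a per-node znode. */
        public void saveToken(String nodeId, String token)
                throws KeeperException, InterruptedException {
            String path = "/cassandra/tokens/" + nodeId;      // assumed layout
            byte[] data = token.getBytes(StandardCharsets.UTF_8);
            try {
                zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            } catch (KeeperException.NodeExistsException e) {
                zk.setData(path, data, -1);                    // -1 = any version
            }
        }

        /** Recover the token if the local copy was lost with the disk. */
        public String loadToken(String nodeId)
                throws KeeperException, InterruptedException {
            byte[] data = zk.getData("/cassandra/tokens/" + nodeId, false, null);
            return new String(data, StandardCharsets.UTF_8);
        }
    }

Since this only mirrors state the node already gossips, it stays out of the gossip path entirely; the node would read from ZK only when its local copy is missing.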

> (2) Storage load balancing, which does not exist today. I would like to have the notion of a leader that orchestrates the load-balancing strategy.

+1 : Makes sense. Leader election for coordinating load balancing -- exactly what ZK was built for.
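
As a rough sketch, this is the standard ZooKeeper leader-election recipe (ephemeral sequential znodes, lowest sequence number wins) applied to picking a single load-balance coordinator. The election path and class names are assumptions for illustration only:

    // Each node volunteers with an ephemeral sequential znode; the candidate
    // with the lowest sequence number acts as the load-balance coordinator.
    import java.util.Collections;
    import java.util.List;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class LoadBalanceLeaderElection {
        private static final String ELECTION_PATH = "/cassandra/lb-election"; // assumed to exist
        private final ZooKeeper zk;
        private String myZnode;   // e.g. "candidate_0000000042"

        public LoadBalanceLeaderElection(ZooKeeper zk) {
            this.zk = zk;
        }

        /** Join the election; returns true if this node is currently the leader. */
        public boolean volunteer(String nodeId)
                throws KeeperException, InterruptedException {
            // Ephemeral: the candidacy disappears automatically if this node dies.
            String path = zk.create(ELECTION_PATH + "/candidate_", nodeId.getBytes(),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
            myZnode = path.substring(path.lastIndexOf('/') + 1);
            return isLeader();
        }

        /** The candidate with the lowest sequence number coordinates load balancing. */
        public boolean isLeader()
                throws KeeperException, InterruptedException {
            List<String> candidates = zk.getChildren(ELECTION_PATH, false);
            Collections.sort(candidates);
            return candidates.get(0).equals(myZnode);
        }
    }

Because the candidacy znode is ephemeral, a crashed coordinator drops out of the election automatically and the next-lowest candidate takes over.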

> (3) Distributed locks - suppose one of the replicas wants to trigger a compaction. We could then prevent the other replicas from also initiating one, so that we get better read performance.
+1 : This is going to add some complexity, but I'm guessing this becomes critical when you want to guarantee read performance in a production setting.
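
A minimal sketch of what such a lock could look like: a non-blocking "try lock" built on an ephemeral znode, so it is released automatically if the holder dies. The path layout is an assumption, and a production version would likely use the full ephemeral-sequential lock recipe rather than this simplified form:

    // Replicas race to create one ephemeral znode per range; only the winner
    // runs the compaction, the others skip it for now.
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class CompactionLock {
        private final ZooKeeper zk;

        public CompactionLock(ZooKeeper zk) {
            this.zk = zk;
        }

        /** Returns true if this replica won the right to compact the given range. */
        public boolean tryAcquire(String range, String nodeId)
                throws KeeperException, InterruptedException {
            try {
                // Ephemeral: the lock vanishes if the holder's session dies mid-compaction.
                zk.create("/cassandra/locks/compaction-" + range, nodeId.getBytes(),
                        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
                return true;
            } catch (KeeperException.NodeExistsException e) {
                return false;   // another replica is already compacting this range
            }
        }

        /** Release the lock explicitly once compaction finishes. */
        public void release(String range)
                throws KeeperException, InterruptedException {
            zk.delete("/cassandra/locks/compaction-" + range, -1);
        }
    }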

> (4) There is operation stuff that needs to be set up when a new node bootstraps. This intermediate state can be placed in ZK and deleted on bootstrap completion; that way, if the node handing off the data dies partway through, it can continue from where it left off on restart.
0 : Not sure I know what "operation stuff" is yet. Not sure if there's any value to using ZK as a backup log.
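
If it does turn out to be worth doing, one reading of (4) is a persistent znode acting as a bootstrap checkpoint: the node handing off data updates it as it goes and deletes it on completion, so a restarted sender can resume. A hedged sketch, with every path and name invented for illustration:

    // Persistent (not ephemeral) znode: the checkpoint must survive the
    // sender's crash so the hand-off can resume after a restart.
    import java.nio.charset.StandardCharsets;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class BootstrapCheckpoint {
        private final ZooKeeper zk;
        private final String path;

        public BootstrapCheckpoint(ZooKeeper zk, String newNodeId) {
            this.zk = zk;
            this.path = "/cassandra/bootstrap/" + newNodeId;   // assumed layout
        }

        /** Record (or update) how far the data hand-off has progressed. */
        public void checkpoint(String progress)
                throws KeeperException, InterruptedException {
            byte[] data = progress.getBytes(StandardCharsets.UTF_8);
            try {
                zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            } catch (KeeperException.NodeExistsException e) {
                zk.setData(path, data, -1);
            }
        }

        /** On restart, resume from the recorded checkpoint if one exists. */
        public String resumePoint()
                throws KeeperException, InterruptedException {
            if (zk.exists(path, false) == null) {
                return null;   // no interrupted bootstrap to resume
            }
            return new String(zk.getData(path, false, null), StandardCharsets.UTF_8);
        }

        /** Delete the checkpoint once bootstrap completes. */
        public void complete() throws KeeperException, InterruptedException {
            zk.delete(path, -1);
        }
    }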

I'm fine with people layering stronger consistency on top of Cassandra where this layer sacrifices some availability. If you want multi-row puts, you will give up availability under a bunch of scenarios. That is fine so long as the implementation does not affect the performance of the underlying eventually consistent system in any way. 




[jira] Resolved: (CASSANDRA-45) Integrate with ZooKeeper to enhance fault tolerance and coordination

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-45.
-------------------------------------

    Resolution: Duplicate

Superseded by CASSANDRA-44 et al.




[jira] Commented: (CASSANDRA-45) Integrate with ZooKeeper to enhance fault tolerance and coordination

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695059#action_12695059 ] 

Jonathan Ellis commented on CASSANDRA-45:
-----------------------------------------

-1 on locks. It's adding more complexity in an effort to make the system more consistent, which can only really be done if you are willing to give up availability, which we are not.

In my mind, the place for ZooKeeper is rare system-level events, e.g. load balancing or bootstrap, not attempting to layer strong consistency onto the client-facing API.

Avinash: "Options are either [eventual consistency or] strong consistency which is hard to get right in a distributed setting. If you do get it right then there is [an] availability problem. All tools like read-repair etc. help in achieving eventual consistency. So I guess it boils down to what you want from your app, C or A."

Of course I don't want to put words into his mouth as to what he thinks about this specific proposal.




[jira] Updated: (CASSANDRA-45) Integrate with ZooKeeper to enhance fault tolerance and coordination

Posted by "Brett Eisenberg (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brett Eisenberg updated CASSANDRA-45:
-------------------------------------

    Description: 
Per Avinash:

(1) Store all configuration-specific information in ZK for availability. For example, today we store a node's token info on local disk; in production we lost that disk and the token info with it.
(2) Storage load balancing, which does not exist today. I would like to have the notion of a leader that orchestrates the load-balancing strategy.
(3) Distributed locks - suppose one of the replicas wants to trigger a compaction. We could then prevent the other replicas from also initiating one, so that we get better read performance.
(4) There is operation stuff that needs to be set up when a new node bootstraps. This intermediate state can be placed in ZK and deleted on bootstrap completion; that way, if the node handing off the data dies partway through, it can continue from where it left off on restart.

Additionally, configuration state, cluster membership, and node visibility could be enhanced using ZK as well.

Opening ticket for discussion.

Per Neophytos: distributed locks for multi-row puts

  was:
Per Avinash:

(1) Store all configuration-specific information in ZK for availability. For example, today we store a node's token info on local disk; in production we lost that disk and the token info with it.
(2) Storage load balancing, which does not exist today. I would like to have the notion of a leader that orchestrates the load-balancing strategy.
(3) Distributed locks - suppose one of the replicas wants to trigger a compaction. We could then prevent the other replicas from also initiating one, so that we get better read performance.
(4) There is operation stuff that needs to be set up when a new node bootstraps. This intermediate state can be placed in ZK and deleted on bootstrap completion; that way, if the node handing off the data dies partway through, it can continue from where it left off on restart.

Additionally, configuration state, cluster membership, and node visibility could be enhanced using ZK as well.

Opening ticket for discussion.





[jira] Updated: (CASSANDRA-45) Integrate with ZooKeeper to enhance fault tolerance and coordination

Posted by "Brett Eisenberg (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brett Eisenberg updated CASSANDRA-45:
-------------------------------------

    Description: 
Per Avinash:

(1) Store all configuration-specific information in ZK for availability. For example, today we store a node's token info on local disk; in production we lost that disk and the token info with it.
(2) Storage load balancing, which does not exist today. I would like to have the notion of a leader that orchestrates the load-balancing strategy.
(3) Distributed locks - suppose one of the replicas wants to trigger a compaction. We could then prevent the other replicas from also initiating one, so that we get better read performance.
(4) There is operation stuff that needs to be set up when a new node bootstraps. This intermediate state can be placed in ZK and deleted on bootstrap completion; that way, if the node handing off the data dies partway through, it can continue from where it left off on restart.

Additionally, configuration state, cluster membership, and node visibility could be enhanced using ZK as well.

Per Neophytos: distributed locks for multi-row puts

  was:
Per Avinash:

(1) Store all configuration-specific information in ZK for availability. For example, today we store a node's token info on local disk; in production we lost that disk and the token info with it.
(2) Storage load balancing, which does not exist today. I would like to have the notion of a leader that orchestrates the load-balancing strategy.
(3) Distributed locks - suppose one of the replicas wants to trigger a compaction. We could then prevent the other replicas from also initiating one, so that we get better read performance.
(4) There is operation stuff that needs to be set up when a new node bootstraps. This intermediate state can be placed in ZK and deleted on bootstrap completion; that way, if the node handing off the data dies partway through, it can continue from where it left off on restart.

Additionally, configuration state, cluster membership, and node visibility could be enhanced using ZK as well.

Opening ticket for discussion.

Per Neophytos: distributed locks for multi-row puts


