Posted to commits@cassandra.apache.org by "Chris Goffinet (Created) (JIRA)" <ji...@apache.org> on 2011/11/22 07:02:40 UTC

[jira] [Created] (CASSANDRA-3516) Make bootstrapping smarter instead of using 120 seconds to stabilize the ring

Make bootstrapping smarter instead of using 120 seconds to stabilize the ring
-----------------------------------------------------------------------------

                 Key: CASSANDRA-3516
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3516
             Project: Cassandra
          Issue Type: Bug
    Affects Versions: 1.0.2
            Reporter: Chris Goffinet


We run a very large cluster, and the fixed 120s wait doesn't really make sense. We see gossip take anywhere from 30 to 60 seconds for the ring to actually mark nodes up or down (stabilize). We need a better approach than picking an arbitrary number of seconds to wait. Our clusters are growing so large that 120s won't be enough time.
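
One possible reading of "smarter" is to wait until the observed ring view stops changing for a quiet period, with an upper bound as a safety net, rather than sleeping a fixed interval. The following is a minimal Python sketch of that idea only, not Cassandra code. The names `wait_for_stable_ring` and `get_ring_view`, and the `quiet_period`, `max_wait`, and `poll_interval` values, are all hypothetical.

```python
import time
from typing import Callable, FrozenSet, Tuple

# A ring view is modelled as a set of (endpoint, state) pairs.
# This representation is an assumption made for the sketch.
RingView = FrozenSet[Tuple[str, str]]


def wait_for_stable_ring(
    get_ring_view: Callable[[], RingView],
    quiet_period: float = 5.0,
    max_wait: float = 600.0,
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once the view has not changed for quiet_period seconds.

    Return False if max_wait seconds pass without such a quiet period.
    """
    start = clock()
    last_view = get_ring_view()
    last_change = start
    while True:
        now = clock()
        if now - last_change >= quiet_period:
            return True   # no changes seen for a full quiet period
        if now - start >= max_wait:
            return False  # gave up: the ring never settled
        sleep(poll_interval)
        view = get_ring_view()
        if view != last_view:
            # Any membership or state change restarts the quiet period.
            last_view = view
            last_change = clock()
```

With this shape, a small cluster that settles quickly stops waiting early, while a large one keeps waiting as long as changes keep arriving, up to `max_wait`. The `clock` and `sleep` parameters exist only so the loop can be exercised without real delays.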

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-3516) Make bootstrapping smarter instead of using 120 seconds to stabilize the ring

Posted by "Peter Schuller (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208262#comment-13208262 ] 

Peter Schuller commented on CASSANDRA-3516:
-------------------------------------------

Chris, I assume these timings are mostly waiting for things to go Down?

The time it takes for a node to go down is actually much more about phi conviction than anything else, if things work okay otherwise.
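
To illustrate why conviction can dominate the time to mark a node down, here is a minimal Python sketch of a phi accrual failure detector. It assumes heartbeat inter-arrival times are modelled as exponentially distributed, so phi = elapsed / (mean * ln 10). The class and function names are hypothetical and this is not Cassandra's implementation; the threshold of 8 is used on the assumption that it matches the default `phi_convict_threshold`.

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Toy phi accrual detector for a single remote endpoint."""

    def __init__(self, threshold=8.0, window=1000):
        self.threshold = threshold
        self.intervals = deque(maxlen=window)  # recent inter-arrival times
        self.last_heartbeat = None

    def heartbeat(self, now):
        """Record a heartbeat observed at time `now` (seconds)."""
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now):
        """Suspicion level: -log10 of P(silence this long | endpoint alive)."""
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        elapsed = now - self.last_heartbeat
        # Exponential model: P(gap > t) = exp(-t / mean),
        # so -log10(P) = t / (mean * ln 10).
        return elapsed / (mean * math.log(10))

    def is_down(self, now):
        return self.phi(now) > self.threshold


def time_to_convict(mean_interval, threshold=8.0):
    """Seconds of silence needed before phi reaches the threshold."""
    return threshold * mean_interval * math.log(10)
```

Under these assumptions, a mean heartbeat interval of 1 second and a threshold of 8 give `time_to_convict(1.0)` of roughly 18.4 seconds of silence before one observer convicts the node. That delay is independent of cluster size and of how fast gossip propagates.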

Radim - same question to you. Are you really measuring propagation delay or are you measuring how long it takes for a node to realize another node is down?

CASSANDRA-3830 and CASSANDRA-3829 have related discussion. In the test I mention in the former, the average propagation delay at ~180 nodes was about 1.5 seconds. And as per the discussion in the latter, as long as seeds are up, something would have to be very broken for propagation to be severely delayed by a large cluster.

                


[jira] [Commented] (CASSANDRA-3516) Make bootstrapping smarter instead of using 120 seconds to stabilize the ring

Posted by "Radim Kolar (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154912#comment-13154912 ] 

Radim Kolar commented on CASSANDRA-3516:
----------------------------------------

How many nodes do you have? Here we need about 40 seconds to stabilize the ring.
                


[jira] [Commented] (CASSANDRA-3516) Make bootstrapping smarter instead of using 120 seconds to stabilize the ring

Posted by "Chris Goffinet (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154919#comment-13154919 ] 

Chris Goffinet commented on CASSANDRA-3516:
-------------------------------------------

168 nodes; the ring takes 30-60 seconds to stabilize.
                
