Posted to commits@cassandra.apache.org by "Jay Zhuang (JIRA)" <ji...@apache.org> on 2019/05/24 17:33:00 UTC

[jira] [Updated] (CASSANDRA-15141) RemoveNode takes long time and blocks gossip stage

     [ https://issues.apache.org/jira/browse/CASSANDRA-15141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Zhuang updated CASSANDRA-15141:
-----------------------------------
         Complexity: Challenging
    Change Category: Performance
             Status: Open  (was: Triage Needed)

> RemoveNode takes long time and blocks gossip stage
> --------------------------------------------------
>
>                 Key: CASSANDRA-15141
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15141
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Cluster/Gossip, Cluster/Membership
>            Reporter: Jay Zhuang
>            Assignee: Jay Zhuang
>            Priority: Normal
>
> The function [{{getAddressReplicas()}}|https://github.com/apache/cassandra/blob/7df67eff2d66dba4bed2b4f6aeabf05144d9b057/src/java/org/apache/cassandra/service/StorageService.java#L3002], called during removenode and decommission, is slow for large vnode clusters that use NetworkTopologyStrategy, because it needs to build the whole replication map for every token range.
> In one of our clusters (> 1k nodes), it takes about 20 seconds for each NetworkTopologyStrategy keyspace, so processing a single removenode message takes at least 80 seconds (20 * 4: 3 system keyspaces, 1 user keyspace). This blocks heartbeat propagation and causes nodes to be falsely marked down.
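
To illustrate why this kind of per-range computation gets expensive, here is a minimal toy sketch in Java. It is not Cassandra's code: the class RemoveNodeCostSketch, the methods buildRing, replicasFor, countRangesReplicatedOn and totalSeconds, and the simplified ring walk (no datacenters or racks) are all assumptions made for illustration only. It shows that finding the ranges replicated on one node requires computing the replica set for every token in the ring, and it reproduces the 20 * 4 arithmetic from the description.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

public class RemoveNodeCostSketch {
    // Toy ring: token -> owning node name (hypothetical, not Cassandra's TokenMetadata).
    static TreeMap<Long, String> buildRing(int nodes, int vnodesPerNode) {
        TreeMap<Long, String> ring = new TreeMap<>();
        long token = 0;
        for (int v = 0; v < vnodesPerNode; v++) {
            for (int n = 0; n < nodes; n++) {
                ring.put(token++, "node" + n);
            }
        }
        return ring;
    }

    // Walk the ring clockwise from the token until rf distinct nodes are found.
    // A topology-aware strategy would also check datacenter and rack here.
    static List<String> replicasFor(TreeMap<Long, String> ring, long token, int rf) {
        Set<String> replicas = new LinkedHashSet<>();
        for (String node : ring.tailMap(token, true).values()) {
            if (replicas.size() == rf) break;
            replicas.add(node);
        }
        // Wrap around to the start of the ring if needed.
        for (String node : ring.headMap(token, false).values()) {
            if (replicas.size() == rf) break;
            replicas.add(node);
        }
        return new ArrayList<>(replicas);
    }

    // Finding the ranges replicated on one node visits every token in the ring
    // and recomputes its replica set: cost grows with (number of tokens) x (walk length).
    static int countRangesReplicatedOn(TreeMap<Long, String> ring, String node, int rf) {
        int count = 0;
        for (long token : ring.keySet()) {
            if (replicasFor(ring, token, rf).contains(node)) {
                count++;
            }
        }
        return count;
    }

    // The arithmetic from the description: seconds per keyspace times keyspace count.
    static int totalSeconds(int secondsPerKeyspace, int keyspaces) {
        return secondsPerKeyspace * keyspaces;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> ring = buildRing(4, 2);
        System.out.println("tokens in ring: " + ring.size());
        System.out.println("ranges replicated on node0: " + countRangesReplicatedOn(ring, "node0", 3));
        System.out.println("total seconds for 4 keyspaces: " + totalSeconds(20, 4));
    }
}
```

In this toy model the work is repeated for every keyspace, which is why the per-keyspace cost multiplies; with more than 1k nodes and many vnodes per node the ring has far more tokens than the 8 used here.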



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
