Posted to commits@cassandra.apache.org by "ZhaoYang (JIRA)" <ji...@apache.org> on 2018/12/11 10:56:00 UTC
[jira] [Updated] (CASSANDRA-14930) decommission may cause timeout because messaging backlog is cleared

     [ https://issues.apache.org/jira/browse/CASSANDRA-14930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZhaoYang updated CASSANDRA-14930:
---------------------------------
    Description: 
On a 3-node cluster with RF=2, decommissioning a node may cause quorum writes to time out, because the messaging backlog to the decommissioned node is cleared via {{Gossiper#removeEndpoint() -> OutboundTcpConnection#closeSocket()}}.
(The timeout is less likely with RF=3, because the write can tolerate one missing response.)

{code:java}
What happened:
1. [WriteStage] before the leaving node is removed from tokenmetadata, the write endpoints are generated (the leaving endpoint is included)
2. [GossipStage] the leaving node is removed from tokenmetadata; future write handlers no longer include the leaving endpoint
3. [WriteStage] the write handler sends its messages to the messaging-service backlog
4. [GossipStage] the messaging-service backlog is cleared; the messages are never sent and the connection is closed
5. [WriteStage] the write times out
{code}
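
The five steps above can be replayed in a small single-threaded model, which also shows why RF=3 is more forgiving. This is only a sketch under stated assumptions: the class {{DecommissionRaceSketch}}, its methods, and the node names are hypothetical and not part of Cassandra; pending endpoints are ignored; and every message that is still queued is assumed to be delivered and acknowledged.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical single-threaded model of the interleaving; none of this is Cassandra code.
class DecommissionRaceSketch {

    // Quorum size for a replication factor: floor(RF / 2) + 1.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Replays steps 1-5 and reports whether the write collects a quorum of acks.
    // Assumption: pending endpoints are ignored and every delivered message is acked.
    static boolean writeSucceeds(int replicationFactor, boolean clearBacklog) {
        String leaving = "node1";

        // Step 1 [WriteStage]: endpoints are chosen while the leaving node is still a replica.
        Map<String, Deque<String>> backlog = new LinkedHashMap<>();
        for (int i = 1; i <= replicationFactor; i++) {
            backlog.put("node" + i, new ArrayDeque<>());
        }

        // Step 2 [GossipStage]: the leaving node is dropped from token metadata.
        // Only future writes are affected, so the endpoints chosen above are unchanged.

        // Step 3 [WriteStage]: one mutation is queued per endpoint.
        for (Deque<String> queue : backlog.values()) {
            queue.add("mutation");
        }

        // Step 4 [GossipStage]: the backlog to the leaving node is discarded.
        if (clearBacklog) {
            backlog.get(leaving).clear();
        }

        // Delivery: each message still queued produces one ack.
        int acks = 0;
        for (Deque<String> queue : backlog.values()) {
            acks += queue.size();
        }

        // Step 5 [WriteStage]: too few acks means the write times out.
        return acks >= quorum(replicationFactor);
    }

    public static void main(String[] args) {
        System.out.println("RF=2, backlog cleared: " + writeSucceeds(2, true));
        System.out.println("RF=3, backlog cleared: " + writeSucceeds(3, true));
        System.out.println("RF=2, backlog kept:    " + writeSucceeds(2, false));
    }
}
```

In this model a quorum for RF=2 is 2, so losing the single message to the leaving node is enough to fail the write, while for RF=3 the quorum is still 2 and the two remaining acks suffice.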


| patch |
| [3.0|https://github.com/jasonstack/cassandra/commits/decommission_timeout_3.0]  |
| [3.11|https://github.com/jasonstack/cassandra/commits/decommission_timeout_3.11]  |

We can avoid this by delaying the destruction of the messaging connection so that queued messages are sent and their responses are received. This patch also avoids reopening an already closed connection in {{MessagingService#convict()}}.
The new messaging framework rewrite in {{Trunk}} avoids the issue by not clearing the messaging backlog.
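
A minimal sketch of the two ideas in the paragraph above (postponing connection teardown, and not reopening a closed connection on conviction) might look like the following. It is not the actual patch: {{DelayedCloseSketch}} and all of its methods are hypothetical, and the delay is modelled by an explicit {{runDeferredDestroy()}} call rather than a real timer so that the behaviour stays deterministic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the mitigation; class and method names are illustrative, not Cassandra's.
class DelayedCloseSketch {
    // Per-endpoint backlog of messages that have not been sent yet.
    private final Map<String, Deque<String>> connections = new HashMap<>();
    // Endpoints whose connection teardown has been postponed.
    private final List<String> pendingDestroy = new ArrayList<>();
    // Messages that actually went out, in order.
    final List<String> sent = new ArrayList<>();
    // Number of connections opened; used to show that convict() does not reopen one.
    int opened = 0;

    // Queues a message, opening a connection on first use.
    void enqueue(String endpoint, String message) {
        connections.computeIfAbsent(endpoint, e -> { opened++; return new ArrayDeque<>(); })
                   .add(message);
    }

    // Instead of clearing the backlog right away, remember to tear it down later.
    void removeEndpoint(String endpoint) {
        if (connections.containsKey(endpoint)) pendingDestroy.add(endpoint);
    }

    // Stands in for the connection's writer draining its queue.
    void flush(String endpoint) {
        Deque<String> queue = connections.get(endpoint);
        if (queue == null) return;
        while (!queue.isEmpty()) sent.add(endpoint + ":" + queue.poll());
    }

    // Stands in for the delay elapsing: only now is the connection destroyed.
    void runDeferredDestroy() {
        for (String endpoint : pendingDestroy) connections.remove(endpoint);
        pendingDestroy.clear();
    }

    // Resets an existing connection only; looking it up with get() never creates a new one.
    void convict(String endpoint) {
        Deque<String> queue = connections.get(endpoint);
        if (queue != null) queue.clear();
    }

    boolean isOpen(String endpoint) {
        return connections.containsKey(endpoint);
    }

    public static void main(String[] args) {
        DelayedCloseSketch ms = new DelayedCloseSketch();
        ms.enqueue("node1", "mutation");   // queued before the node is removed
        ms.removeEndpoint("node1");        // teardown is postponed, the backlog survives
        ms.flush("node1");                 // the queued mutation still goes out
        ms.runDeferredDestroy();           // delay elapsed: the connection is destroyed
        ms.convict("node1");               // must not reopen the connection
        System.out.println(ms.sent + " open=" + ms.isOpen("node1") + " opened=" + ms.opened);
    }
}
```

The design choice being illustrated is that {{convict()}} uses a plain lookup instead of a create-if-missing accessor, so a late conviction of an already removed endpoint leaves no new connection behind.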


  was:
On a 3-node cluster with RF=2, decommissioning a node may cause quorum write timeout because messaging backlog to decommissioned node is cleared via {{Gossiper#removeEndpoint() -> OutboundTcpConnection#closeSocket()}}.
(Timeout is less likely to happen with RF=3, because we can afford one less response)

{code:java}
What happened:
1. [WriteStage] before the leaving node is removed from tokenmetadata, the write endpoints are generated ( leaving endpoint is included )
2. [GossipStage] the leaving node is removed from tokenmetadata, no more future write handler will include leaving endpoints
3. [WriteStage] write handlers sends messages to messaging-service backlog
4. [GossipStage] messaging-service backlog is cleared, messages are not sent and connection closed
5. [WriteStage] write time out
 {code}


| patch |
| [3.0|https://github.com/jasonstack/cassandra/commits/decommission_timeout_3.0]  |
| [3.11|https://github.com/jasonstack/cassandra/commits/decommission_timeout_3.11]  |

We can avoid it by delaying to destroy messaging connection so that messages are sent and responded. New messaging framework rewrite in {{Trunk}} avoids the issues by not clearing messaging backlog.



> decommission may cause timeout because messaging backlog is cleared 
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-14930
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14930
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Coordination, Core
>            Reporter: ZhaoYang
>            Assignee: ZhaoYang
>            Priority: Major
>             Fix For: 3.0.x, 3.11.x
>


