Posted to commits@cassandra.apache.org by "Michael Shuler (JIRA)" <ji...@apache.org> on 2018/02/12 21:14:00 UTC

[jira] [Updated] (CASSANDRA-13144) Decommissioned nodes show as DOWN in Cassandra versions 2.1.12 - 2.1.16

     [ https://issues.apache.org/jira/browse/CASSANDRA-13144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Shuler updated CASSANDRA-13144:
---------------------------------------
    Fix Version/s:     (was: 2.1.2)
                   2.1.x

> Decommissioned nodes show as DOWN in Cassandra versions 2.1.12 - 2.1.16
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-13144
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13144
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Distributed Metadata
>         Environment: Centos 6
> Java 8
>            Reporter: sai k potturi
>            Priority: Major
>             Fix For: 2.1.x
>
>
> In Cassandra versions 2.1.12 - 2.1.16, after we decommission a node or datacenter, the decommissioned nodes are reported as DOWN in the output of "nodetool describecluster". The nodes do not, however, show up in the output of "nodetool status".
> The decommissioned nodes also do not show up in the "system.peers" table on the nodes.
> The workaround we follow is a rolling restart of the cluster, which clears the decommissioned nodes from the "UNREACHABLE" state and shows the actual state of the cluster. This workaround is tedious for large clusters.
> We also reproduced the decommission process with the CCM tool and observed the same issue for clusters running versions 2.1.12 through 2.1.16. The issue was not observed in versions before or after that range.
> Below are the observed logs from a version without the bug and from a version with the bug.
> Cassandra 2.1.1 logs showing the decommissioned node:
> 2017-01-19 20:18:56,415 [GossipStage:1] DEBUG ArrivalWindow Ignoring interval time of 2049943233 for /X.X.X.X
> 2017-01-19 20:18:56,416 [GossipStage:1] DEBUG StorageService Node /X.X.X.X state left, tokens [ 59353109817657926242901533144729725259, 60254520910109313597677907197875221475, 75698727618038614819889933974570742305, 84508739091270910297310401957975430578]
> 2017-01-19 20:18:56,416 [GossipStage:1] DEBUG Gossiper adding expire time for endpoint : /X.X.X.X (1485116334088)
> 2017-01-19 20:18:56,417 [GossipStage:1] INFO StorageService Removing tokens [100434964734820719895982857900842892337, 114144647582686041354301802358217767299, 132090888860517964702932350041942412177, 138409460913927199437556572481804704749] for /X.X.X.X
> 2017-01-19 20:18:56,418 [HintedHandoff:3] INFO HintedHandOffManager Deleting any stored hints for /X.X.X.X
> 2017-01-19 20:18:56,424 [GossipStage:1] DEBUG MessagingService Resetting version for /X.X.X.X
> 2017-01-19 20:18:56,424 [GossipStage:1] DEBUG Gossiper removing endpoint /X.X.X.X
> 2017-01-19 20:18:56,437 [GossipStage:1] DEBUG StorageService Ignoring state change for dead or unknown endpoint: /X.X.X.X
> 2017-01-19 20:19:02,022 [WRITE-/X.X.X.X] DEBUG OutboundTcpConnection attempting to connect to /X.X.X.X
> 2017-01-19 20:19:02,023 [HANDSHAKE-/X.X.X.X] INFO OutboundTcpConnection Handshaking version with /X.X.X.X
> 2017-01-19 20:19:02,023 [WRITE-/X.X.X.X] DEBUG MessagingService Setting version 7 for /X.X.X.X
> 2017-01-19 20:19:08,096 [GossipStage:1] DEBUG ArrivalWindow Ignoring interval time of 2074454222 for /X.X.X.X
> 2017-01-19 20:19:54,407 [GossipStage:1] DEBUG ArrivalWindow Ignoring interval time of 4302985797 for /X.X.X.X
> 2017-01-19 20:19:57,405 [GossipTasks:1] DEBUG Gossiper 60000 elapsed, /X.X.X.X gossip quarantine over
> 2017-01-19 20:19:57,455 [GossipStage:1] DEBUG ArrivalWindow Ignoring interval time of 3047826501 for /X.X.X.X
> 2017-01-19 20:19:57,455 [GossipStage:1] DEBUG StorageService Ignoring state change for dead or unknown endpoint: /X.X.X.X
> Cassandra 2.1.16 logs showing the decommissioned node (the 2.1.16 logs match the 2.1.1 logs up to "DEBUG Gossiper 60000 elapsed, /X.X.X.X gossip quarantine over", which is then followed by "NODE is now DOWN"):
> 2017-01-19 19:52:23,687 [GossipStage:1] DEBUG  StorageService.java:1883 - Node /X.X.X.X state left, tokens [-1112888759032625467, -228773855963737699, -3114550423754381391, -4848625944949064281, -6920961603460018610, -8566729719076824066, 1611098831406674636, 7278843689020594771, 7565410054791352413, 9166885764, 8654747784805453046]
> 2017-01-19 19:52:23,688 [GossipStage:1] DEBUG  Gossiper.java:1520 - adding expire time for endpoint : /X.X.X.X (1485114743567)
> 2017-01-19 19:52:23,688 [GossipStage:1] INFO   StorageService.java:1965 - Removing tokens [-1112888759032625467, -228773855963737699, -3114550423754381391, -4848625944949064281, -6920961603460018610, 5690722015779071557, 6202373691525063547, 7191120402564284381, 7278843689020594771, 7565410054791352413, 8524200089166885764, 8654747784805453046] for /X.X.X.X
> 2017-01-19 19:52:23,689 [HintedHandoffManager:1] INFO   HintedHandOffManager.java:230 - Deleting any stored hints for /X.X.X.X
> 2017-01-19 19:52:23,689 [GossipStage:1] DEBUG  MessagingService.java:840 - Resetting version for /X.X.X.X
> 2017-01-19 19:52:23,690 [GossipStage:1] DEBUG  Gossiper.java:417 - removing endpoint /X.X.X.X
> 2017-01-19 19:52:23,691 [GossipStage:1] DEBUG  StorageService.java:1552 - Ignoring state change for dead or unknown endpoint: /X.X.X.X
> 2017-01-19 19:52:31,617 [MessagingService-Outgoing-/X.X.X.X] DEBUG  OutboundTcpConnection.java:372 - attempting to connect to /X.X.X.X
> 2017-01-19 19:52:31,618 [HANDSHAKE-/X.X.X.X] INFO   OutboundTcpConnection.java:488 - Handshaking version with /X.X.X.X
> 2017-01-19 19:52:31,619 [MessagingService-Outgoing-/X.X.X.X] DEBUG  MessagingService.java:826 - Setting version 8 for /X.X.X.X
> 2017-01-19 19:53:09,699 [GossipStage:1] DEBUG  FailureDetector.java:423 - Ignoring interval time of 4001002966 for /X.X.X.X
> 2017-01-19 19:53:13,910 [GossipStage:1] DEBUG  FailureDetector.java:423 - Ignoring interval time of 4210611081 for /X.X.X.X
> 2017-01-19 19:53:19,914 [GossipStage:1] DEBUG  FailureDetector.java:423 - Ignoring interval time of 6004119075 for /X.X.X.X
> 2017-01-19 19:53:23,702 [GossipTasks:1] DEBUG  Gossiper.java:795 - 60000 elapsed, /X.X.X.X gossip quarantine over
> 2017-01-19 19:53:23,985 [GossipStage:1] DEBUG  StorageService.java:1552 - Ignoring state change for dead or unknown endpoint: /X.X.X.X
> 2017-01-19 19:53:26,223 [GossipStage:1] DEBUG  FailureDetector.java:423 - Ignoring interval time of 6309159352 for /X.X.X.X
> 2017-01-19 19:53:50,709 [GossipTasks:1] DEBUG  Gossiper.java:336 - Convicting /X.X.X.X with status LEFT - alive true
> 2017-01-19 19:53:50,709 [GossipTasks:1] INFO   Gossiper.java:1008 - InetAddress /X.X.X.X is now DOWN
> 2017-01-19 19:53:50,709 [GossipTasks:1] DEBUG  MessagingService.java:429 - Resetting pool for /X.X.X.X
> 2017-01-19 19:53:51,710 [GossipTasks:1] DEBUG  Gossiper.java:336 - Convicting /X.X.X.X with status LEFT - alive false
> 2017-01-19 19:53:52,710 [GossipTasks:1] DEBUG  Gossiper.java:336 - Convicting /X.X.X.X with status LEFT - alive false
> 2017-01-19 19:53:53,711 [MessagingService-Outgoing-/X.X.X.X] DEBUG  OutboundTcpConnection.java:372 - attempting to connect to /X.X.X.X
> 2017-01-19 19:53:53,711 [GossipTasks:1] DEBUG  Gossiper.java:336 - Convicting /X.X.X.X with status LEFT - alive false
> 2017-01-19 19:53:54,711 [GossipTasks:1] DEBUG  Gossiper.java:336 - Convicting /X.X.X.X with status LEFT - alive false
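
The 2.1.16 log sequence quoted above ("Convicting ... with status LEFT" followed by "is now DOWN") can be searched for mechanically instead of by eye. The following is a minimal sketch, not part of the original report: the function name and the two regular expressions are illustrative assumptions modelled only on the log lines shown in this ticket, and other Cassandra versions may word these messages differently.

```python
import re

# Patterns modelled on the 2.1.16 log lines quoted in this ticket (an assumption;
# other versions may format these messages differently).
CONVICT_RE = re.compile(r"Convicting /(\S+) with status LEFT - alive (?:true|false)")
DOWN_RE = re.compile(r"InetAddress /(\S+) is now DOWN")


def find_left_nodes_marked_down(lines):
    """Return endpoints that were convicted while in status LEFT and were
    afterwards reported as DOWN, in the order the DOWN messages appear."""
    convicted = set()
    marked_down = []
    for line in lines:
        match = CONVICT_RE.search(line)
        if match:
            convicted.add(match.group(1))
            continue
        match = DOWN_RE.search(line)
        if match:
            endpoint = match.group(1)
            # Only report nodes that were already convicted with status LEFT.
            if endpoint in convicted and endpoint not in marked_down:
                marked_down.append(endpoint)
    return marked_down
```

To use it, pass any iterable of log lines, for example an open file handle on a node's debug log. An empty result suggests the node did not hit this sequence, although it does not prove the bug is absent.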


