Posted to commits@cassandra.apache.org by "Jason Harvey (JIRA)" <ji...@apache.org> on 2011/09/22 09:21:26 UTC

[jira] [Created] (CASSANDRA-3243) Node which was decommissioned and shut-down reappears on one node on a single node

Node which was decommissioned and shut-down reappears on one node on a single node
----------------------------------------------------------------------------------

                 Key: CASSANDRA-3243
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3243
             Project: Cassandra
          Issue Type: Bug
    Affects Versions: 0.8.5
            Reporter: Jason Harvey
            Priority: Minor


I decommissioned a node several days ago. It was no longer in the ring list on any node in the ring. However, it was in the dead gossip list.

In an attempt to clean it out of the dead gossip list so I could truncate, I shut down every node in the ring and brought them all back up. Once they all came back up, one node showed the decommissioned node as still in the ring in a state of 'Down'. No other node in the ring shows this info.

I successfully ran removetoken on the node to get that phantom node out. However, it is back in the dead gossip list, preventing me from truncating.

Where might the info on this decommissioned node be stored? Is HH possibly trying to deliver to the removed node, thus putting it back in the ring on one node?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3243) Node which was decommissioned and shut-down reappears on a single node

Posted by "Jason Harvey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112852#comment-13112852 ] 

Jason Harvey commented on CASSANDRA-3243:
-----------------------------------------

bq. Can you explain what you mean by "dead gossip list" and how this prevents truncate?

The decommissioned node is showing up in the 'UNREACHABLE' list when calling 'describe cluster'. When I attempt to run truncate, the command returns that truncate cannot occur due to a node being down.
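
As an editorial aside, here is a minimal, hedged sketch of the behaviour described above (plain Java, not Cassandra's actual truncate path; the class, method, and state names, and the healthy node's address, are invented for illustration): truncate is refused while any known endpoint is reported down, so a single phantom 'UNREACHABLE' entry is enough to block the command.

{code:java}
import java.util.Map;

// Illustrative only: a truncate-style operation that checks ring liveness first.
public class TruncatePrecheckSketch {
    enum State { UP, DOWN }

    static void truncate(String columnFamily, Map<String, State> ringView) {
        for (Map.Entry<String, State> e : ringView.entrySet()) {
            if (e.getValue() == State.DOWN)
                throw new IllegalStateException("Cannot truncate " + columnFamily
                        + ": endpoint " + e.getKey() + " is down");
        }
        System.out.println("truncate " + columnFamily + " would proceed");
    }

    public static void main(String[] args) {
        // One healthy node (hypothetical address) plus the phantom, decommissioned endpoint.
        Map<String, State> ring = Map.of("10.34.22.200", State.UP,
                                         "10.34.22.201", State.DOWN);
        try {
            truncate("MyColumnFamily", ring);
        } catch (IllegalStateException ex) {
            System.out.println(ex.getMessage()); // the same "node is down" style of refusal
        }
    }
}
{code}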

bq. After CASSANDRA-2496, we store dead gossip states for 3 days so that any other nodes that were down at the time of removal can later know not to repopulate the ring with the removed node. That state isn't persisted anywhere, though, so since you did a full ring restart, the only remaining candidate is the persisted endpoints; all nodes should have removed the entry from there after the decommission/removetoken.

Is there a way I can get a list of endpoints to see how this node showed back up?


Also, any thoughts on why this node only re-appeared on a single node?

Thanks!
Jason

> Node which was decommissioned and shut-down reappears on a single node
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-3243
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3243
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.5
>            Reporter: Jason Harvey
>            Assignee: Brandon Williams
>            Priority: Minor
>
> I decommissioned a node several days ago. It was no longer in the ring list on any node in the ring. However, it was in the dead gossip list.
> In an attempt to clean it out of the dead gossip list so I could truncate, I shut down the entire ring and brought it back up. Once the ring came back up, one node showed the decommissioned node as still in the ring in a state of 'Down'. No other node in the ring shows this info.
> I successfully ran removetoken on the node to get that phantom node out. However, it is back in the dead gossip list, preventing me from truncating.
> Where might the info on this decommissioned node be stored? Is HH possibly trying to deliver to the removed node, thus putting it back in the ring on one node?
> I find it extremely curious that none of the other nodes in the ring showed the phantom node. Shouldn't gossip have propagated the node everywhere, even if it was down?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (CASSANDRA-3243) Node which was decommissioned and shut-down reappears on a single node

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams reassigned CASSANDRA-3243:
-------------------------------------------

    Assignee: Brandon Williams

> Node which was decommissioned and shut-down reappears on a single node
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-3243
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3243
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.5
>            Reporter: Jason Harvey
>            Assignee: Brandon Williams
>            Priority: Minor
>
> I decommissioned a node several days ago. It was no longer in the ring list on any node in the ring. However, it was in the dead gossip list.
> In an attempt to clean it out of the dead gossip list so I could truncate, I shut down the entire ring and brought it back up. Once the ring came back up, one node showed the decommissioned node as still in the ring in a state of 'Down'. No other node in the ring shows this info.
> I successfully ran removetoken on the node to get that phantom node out. However, it is back in the dead gossip list, preventing me from truncating.
> Where might the info on this decommissioned node be stored? Is HH possibly trying to deliver to the removed node, thus putting it back in the ring on one node?
> I find it extremely curious that none of the other nodes in the ring showed the phantom node. Shouldn't gossip have propagated the node everywhere, even if it was down?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3243) Node which was decommissioned and shut-down reappears on a single node

Posted by "Benjamin Coverston (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118360#comment-13118360 ] 

Benjamin Coverston commented on CASSANDRA-3243:
-----------------------------------------------

+1 on the patch
                
> Node which was decommissioned and shut-down reappears on a single node
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-3243
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3243
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.5
>            Reporter: Jason Harvey
>            Assignee: Brandon Williams
>            Priority: Minor
>             Fix For: 0.8.7
>
>         Attachments: 3243.txt, locationinfo_0919.tgz, locationinfo_0922.tgz
>
>
> I decommissioned a node several days ago. It was no longer in the ring list on any node in the ring. However, it was in the dead gossip list.
> In an attempt to clean it out of the dead gossip list so I could truncate, I shut down the entire ring and brought it back up. Once the ring came back up, one node showed the decommissioned node as still in the ring in a state of 'Down'. No other node in the ring shows this info.
> I successfully ran removetoken on the node to get that phantom node out. However, it is back in the dead gossip list, preventing me from truncating.
> Where might the info on this decommissioned node be stored? Is HH possibly trying to deliver to the removed node, thus putting it back in the ring on one node?
> I find it extremely curious that none of the other nodes in the ring showed the phantom node. Shouldn't gossip have propagated the node everywhere, even if it was down?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-3243) Node which was decommissioned and shut-down reappears on a single node

Posted by "Jason Harvey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Harvey updated CASSANDRA-3243:
------------------------------------

    Description: 
I decommissioned a node several days ago. It was no longer in the ring list on any node in the ring. However, it was in the dead gossip list.

In an attempt to clean it out of the dead gossip list so I could truncate, I shut down the entire ring and brought it back up. Once the ring came back up, one node showed the decommissioned node as still in the ring in a state of 'Down'. No other node in the ring shows this info.

I successfully ran removetoken on the node to get that phantom node out. However, it is back in the dead gossip list, preventing me from truncating.

Where might the info on this decommissioned node be stored? Is HH possibly trying to deliver to the removed node, thus putting it back in the ring on one node?

I find it extremely curious that none of the other nodes in the ring showed the phantom node. Shouldn't gossip have propagated the node everywhere, even if it was down?

  was:
I decommissioned a node several days ago. It was no longer in the ring list on any node in the ring. However, it was in the dead gossip list.

In an attempt to clean it out of the dead gossip list so I could truncate, I shut down the entire ring and brought it back up. Once the ring came back up, one node showed the decommissioned node as still in the ring in a state of 'Down'. No other node in the ring shows this info.

I successfully ran removetoken on the node to get that phantom node out. However, it is back in the dead gossip list, preventing me from truncating.

Where might the info on this decommissioned node be stored? Is HH possibly trying to deliver to the removed node, thus putting it back in the ring on one node?


> Node which was decommissioned and shut-down reappears on a single node
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-3243
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3243
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.5
>            Reporter: Jason Harvey
>            Priority: Minor
>
> I decommissioned a node several days ago. It was no longer in the ring list on any node in the ring. However, it was in the dead gossip list.
> In an attempt to clean it out of the dead gossip list so I could truncate, I shut down the entire ring and brought it back up. Once the ring came back up, one node showed the decommissioned node as still in the ring in a state of 'Down'. No other node in the ring shows this info.
> I successfully ran removetoken on the node to get that phantom node out. However, it is back in the dead gossip list, preventing me from truncating.
> Where might the info on this decommissioned node be stored? Is HH possibly trying to deliver to the removed node, thus putting it back in the ring on one node?
> I find it extremely curious that none of the other nodes in the ring showed the phantom node. Shouldn't gossip have propagated the node everywhere, even if it was down?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-3243) Node which was decommissioned and shut-down reappears on a single node

Posted by "Jason Harvey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Harvey updated CASSANDRA-3243:
------------------------------------

    Description: 
I decommissioned a node several days ago. It was no longer in the ring list on any node in the ring. However, it was in the dead gossip list.

In an attempt to clean it out of the dead gossip list so I could truncate, I shut down the entire ring and brought it back up. Once the ring came back up, one node showed the decommissioned node as still in the ring in a state of 'Down'. No other node in the ring shows this info.

I successfully ran removetoken on the node to get that phantom node out. However, it is back in the dead gossip list, preventing me from truncating.

Where might the info on this decommissioned node be stored? Is HH possibly trying to deliver to the removed node, thus putting it back in the ring on one node?

  was:
I decommissioned a node several days ago. It was no longer in the ring list on any node in the ring. However, it was in the dead gossip list.

In an attempt to clean it out of the dead gossip list so I could truncate, I shut down every node in the ring and brought them all back up. Once they all came back up, one node showed the decommissioned node as still in the ring in a state of 'Down'. No other node in the ring shows this info.

I successfully ran removetoken on the node to get that phantom node out. However, it is back in the dead gossip list, preventing me from truncating.

Where might the info on this decommissioned node be stored? Is HH possibly trying to deliver to the removed node, thus putting it back in the ring on one node?


> Node which was decommissioned and shut-down reappears on a single node
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-3243
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3243
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.5
>            Reporter: Jason Harvey
>            Priority: Minor
>
> I decommissioned a node several days ago. It was no longer in the ring list on any node in the ring. However, it was in the dead gossip list.
> In an attempt to clean it out of the dead gossip list so I could truncate, I shut down the entire ring and brought it back up. Once the ring came back up, one node showed the decommissioned node as still in the ring in a state of 'Down'. No other node in the ring shows this info.
> I successfully ran removetoken on the node to get that phantom node out. However, it is back in the dead gossip list, preventing me from truncating.
> Where might the info on this decommissioned node be stored? Is HH possibly trying to deliver to the removed node, thus putting it back in the ring on one node?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3243) Node which was decommissioned and shut-down reappears on a single node

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113713#comment-13113713 ] 

Brandon Williams commented on CASSANDRA-3243:
---------------------------------------------

0919 is missing the LOCATION_KEY (the node's own token), which is odd, because cassandra will refuse to start up with this table since it should not exist without this key. It does show itself in the saved endpoints, but no other nodes.

0922 is complete in that it contains LOCATION_KEY, and cassandra starts right up with it, and I can see the removed token in the saved endpoints with an IP address of 10.34.22.201. However, the strange thing is that the timestamp on that column is approximately 2 days _older_ than the one for the local node itself, which should be impossible. Is there any chance this node's clock was way off or changed?
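
(Editorial note: a tiny, hedged sketch of the startup invariant mentioned above; the row keys and data layout below are stand-ins, not the real system table structure.)

{code:java}
import java.util.Map;

// Illustrative only: refuse to start when saved endpoints exist but the node's own
// location key is missing, which is the shape the 0919 snapshot appears to have.
public class LocationInfoSanitySketch {
    static final String LOCAL_TOKEN_KEY = "LocalToken"; // stand-in for LOCATION_KEY
    static final String RING_KEY = "Ring";              // stand-in for the saved-endpoints row

    static void checkOnStartup(Map<String, Map<String, String>> locationInfo) {
        boolean hasSavedEndpoints = locationInfo.containsKey(RING_KEY)
                && !locationInfo.get(RING_KEY).isEmpty();
        if (hasSavedEndpoints && !locationInfo.containsKey(LOCAL_TOKEN_KEY))
            throw new IllegalStateException(
                    "Saved endpoints present but no local token; refusing to start");
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> snapshot0919 = Map.of(
                RING_KEY, Map.of("10.34.22.201", "120000000000000000000000000000000000000"));
        try {
            checkOnStartup(snapshot0919);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
{code}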

> Node which was decommissioned and shut-down reappears on a single node
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-3243
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3243
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.5
>            Reporter: Jason Harvey
>            Assignee: Brandon Williams
>            Priority: Minor
>         Attachments: locationinfo_0919.tgz, locationinfo_0922.tgz
>
>
> I decommissioned a node several days ago. It was no longer in the ring list on any node in the ring. However, it was in the dead gossip list.
> In an attempt to clean it out of the dead gossip list so I could truncate, I shut down the entire ring and brought it back up. Once the ring came back up, one node showed the decommissioned node as still in the ring in a state of 'Down'. No other node in the ring shows this info.
> I successfully ran removetoken on the node to get that phantom node out. However, it is back in the dead gossip list, preventing me from truncating.
> Where might the info on this decommissioned node be stored? Is HH possibly trying to deliver to the removed node, thus putting it back in the ring on one node?
> I find it extremely curious that none of the other nodes in the ring showed the phantom node. Shouldn't gossip have propagated the node everywhere, even if it was down?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3243) Node which was decommissioned and shut-down reappears on a single node

Posted by "Jason Harvey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113819#comment-13113819 ] 

Jason Harvey commented on CASSANDRA-3243:
-----------------------------------------

For outside reference:

Brandon recommended I delete the 'Ring' key from the LocationInfo on this node and then restart to resolve the weirdness.

> Node which was decommissioned and shut-down reappears on a single node
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-3243
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3243
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.5
>            Reporter: Jason Harvey
>            Assignee: Brandon Williams
>            Priority: Minor
>         Attachments: locationinfo_0919.tgz, locationinfo_0922.tgz
>
>
> I decommissioned a node several days ago. It was no longer in the ring list on any node in the ring. However, it was in the dead gossip list.
> In an attempt to clean it out of the dead gossip list so I could truncate, I shut down the entire ring and brought it back up. Once the ring came back up, one node showed the decommissioned node as still in the ring in a state of 'Down'. No other node in the ring shows this info.
> I successfully ran removetoken on the node to get that phantom node out. However, it is back in the dead gossip list, preventing me from truncating.
> Where might the info on this decommissioned node be stored? Is HH possibly trying to deliver to the removed node, thus putting it back in the ring on one node?
> I find it extremely curious that none of the other nodes in the ring showed the phantom node. Shouldn't gossip have propagated the node everywhere, even if it was down?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3243) Node which was decommissioned and shut-down reappears on a single node

Posted by "Jason Harvey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112382#comment-13112382 ] 

Jason Harvey commented on CASSANDRA-3243:
-----------------------------------------

Looking through the logs, the node which saw the decommissioned node didn't print anything about discovering it via gossip. The very first log line I have regarding the phantom node is when I forced a removetoken.

> Node which was decommissioned and shut-down reappears on a single node
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-3243
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3243
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.5
>            Reporter: Jason Harvey
>            Priority: Minor
>
> I decommissioned a node several days ago. It was no longer in the ring list on any node in the ring. However, it was in the dead gossip list.
> In an attempt to clean it out of the dead gossip list so I could truncate, I shut down the entire ring and brought it back up. Once the ring came back up, one node showed the decommissioned node as still in the ring in a state of 'Down'. No other node in the ring shows this info.
> I successfully ran removetoken on the node to get that phantom node out. However, it is back in the dead gossip list, preventing me from truncating.
> Where might the info on this decommissioned node be stored? Is HH possibly trying to deliver to the removed node, thus putting it back in the ring on one node?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3243) Node which was decommissioned and shut-down reappears on a single node

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118490#comment-13118490 ] 

Hudson commented on CASSANDRA-3243:
-----------------------------------

Integrated in Cassandra-0.8 #351 (See [https://builds.apache.org/job/Cassandra-0.8/351/])
    Flush system table after updating or removing tokens.
Patch by brandonwilliams, reviewed by Ben Coverston for CASSANDRA-3243

brandonwilliams : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1177810
Files : 
* /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/SystemTable.java

                
> Node which was decommissioned and shut-down reappears on a single node
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-3243
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3243
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.5
>            Reporter: Jason Harvey
>            Assignee: Brandon Williams
>            Priority: Minor
>             Fix For: 0.8.7
>
>         Attachments: 3243.txt, locationinfo_0919.tgz, locationinfo_0922.tgz
>
>
> I decommissioned a node several days ago. It was no longer in the ring list on any node in the ring. However, it was in the dead gossip list.
> In an attempt to clean it out of the dead gossip list so I could truncate, I shut down the entire ring and brought it back up. Once the ring came back up, one node showed the decommissioned node as still in the ring in a state of 'Down'. No other node in the ring shows this info.
> I successfully ran removetoken on the node to get that phantom node out. However, it is back in the dead gossip list, preventing me from truncating.
> Where might the info on this decommissioned node be stored? Is HH possibly trying to deliver to the removed node, thus putting it back in the ring on one node?
> I find it extremely curious that none of the other nodes in the ring showed the phantom node. Shouldn't gossip have propagated the node everywhere, even if it was down?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3243) Node which was decommissioned and shut-down reappears on a single node

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112970#comment-13112970 ] 

Brandon Williams commented on CASSANDRA-3243:
---------------------------------------------

bq. Is there a way I can get a list of endpoints to see how this node showed back up?

It must be a saved endpoint. If you can attach the latest LocationInfo sstable from that machine (and tell me the IP and/or token), I can take a look.

bq. Also, any thoughts on why this node only re-appeared on a single node?

I'm not sure; let's see whether it was still persisted first.

> Node which was decommissioned and shut-down reappears on a single node
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-3243
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3243
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.5
>            Reporter: Jason Harvey
>            Assignee: Brandon Williams
>            Priority: Minor
>
> I decommissioned a node several days ago. It was no longer in the ring list on any node in the ring. However, it was in the dead gossip list.
> In an attempt to clean it out of the dead gossip list so I could truncate, I shut down the entire ring and brought it back up. Once the ring came back up, one node showed the decommissioned node as still in the ring in a state of 'Down'. No other node in the ring shows this info.
> I successfully ran removetoken on the node to get that phantom node out. However, it is back in the dead gossip list, preventing me from truncating.
> Where might the info on this decommissioned node be stored? Is HH possibly trying to deliver to the removed node, thus putting it back in the ring on one node?
> I find it extremely curious that none of the other nodes in the ring showed the phantom node. Shouldn't gossip have propagated the node everywhere, even if it was down?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-3243) Node which was decommissioned and shut-down reappears on a single node

Posted by "Jason Harvey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Harvey updated CASSANDRA-3243:
------------------------------------

    Attachment: locationinfo_0919.tgz

LocationInfo from the node which re-added the dead node to the ring.

This LocationInfo is from *after* the decommission of the phantom node, but before the restart which resulted in the node re-adding the phantom node.

The phantom node's token was 120000000000000000000000000000000000000.

> Node which was decommissioned and shut-down reappears on a single node
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-3243
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3243
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.5
>            Reporter: Jason Harvey
>            Assignee: Brandon Williams
>            Priority: Minor
>         Attachments: locationinfo_0919.tgz
>
>
> I decommissioned a node several days ago. It was no longer in the ring list on any node in the ring. However, it was in the dead gossip list.
> In an attempt to clean it out of the dead gossip list so I could truncate, I shut down the entire ring and brought it back up. Once the ring came back up, one node showed the decommissioned node as still in the ring in a state of 'Down'. No other node in the ring shows this info.
> I successfully ran removetoken on the node to get that phantom node out. However, it is back in the dead gossip list, preventing me from truncating.
> Where might the info on this decommissioned node be stored? Is HH possibly trying to deliver to the removed node, thus putting it back in the ring on one node?
> I find it extremely curious that none of the other nodes in the ring showed the phantom node. Shouldn't gossip have propagated the node everywhere, even if it was down?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Issue Comment Edited] (CASSANDRA-3243) Node which was decommissioned and shut-down reappears on a single node

Posted by "Jason Harvey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113812#comment-13113812 ] 

Jason Harvey edited comment on CASSANDRA-3243 at 9/23/11 10:30 PM:
-------------------------------------------------------------------

Clock on these boxes seems fine. We keep ntpd running at all times. I've also verified via logging that it has been consistent.

How do I go about getting that endpoint *out* of the LocationInfo?

      was (Author: alienth):
    Clock on these boxes seems fine. We keep ntpd running at all times. I've also verified via logging that it has been consistent.
  
> Node which was decommissioned and shut-down reappears on a single node
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-3243
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3243
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.5
>            Reporter: Jason Harvey
>            Assignee: Brandon Williams
>            Priority: Minor
>         Attachments: locationinfo_0919.tgz, locationinfo_0922.tgz
>
>
> I decommissioned a node several days ago. It was no longer in the ring list on any node in the ring. However, it was in the dead gossip list.
> In an attempt to clean it out of the dead gossip list so I could truncate, I shut down the entire ring and brought it back up. Once the ring came back up, one node showed the decommissioned node as still in the ring in a state of 'Down'. No other node in the ring shows this info.
> I successfully ran removetoken on the node to get that phantom node out. However, it is back in the dead gossip list, preventing me from truncating.
> Where might the info on this decommissioned node be stored? Is HH possibly trying to deliver to the removed node, thus putting it back in the ring on one node?
> I find it extremely curious that none of the other nodes in the ring showed the phantom node. Shouldn't gossip have propagated the node everywhere, even if it was down?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-3243) Node which was decommissioned and shut-down reappears on a single node

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-3243:
----------------------------------------

    Fix Version/s: 0.8.7

> Node which was decommissioned and shut-down reappears on a single node
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-3243
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3243
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.5
>            Reporter: Jason Harvey
>            Assignee: Brandon Williams
>            Priority: Minor
>             Fix For: 0.8.7
>
>         Attachments: 3243.txt, locationinfo_0919.tgz, locationinfo_0922.tgz
>
>
> I decommissioned a node several days ago. It was no longer in the ring list on any node in the ring. However, it was in the dead gossip list.
> In an attempt to clean it out of the dead gossip list so I could truncate, I shut down the entire ring and brought it back up. Once the ring came back up, one node showed the decommissioned node as still in the ring in a state of 'Down'. No other node in the ring shows this info.
> I successfully ran removetoken on the node to get that phantom node out. However, it is back in the dead gossip list, preventing me from truncating.
> Where might the info on this decommissioned node be stored? Is HH possibly trying to deliver to the removed node, thus putting it back in the ring on one node?
> I find it extremely curious that none of the other nodes in the ring showed the phantom node. Shouldn't gossip have propagated the node everywhere, even if it was down?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-3243) Node which was decommissioned and shut-down reappears on a single node

Posted by "Jason Harvey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Harvey updated CASSANDRA-3243:
------------------------------------

    Summary: Node which was decommissioned and shut-down reappears on a single node  (was: Node which was decommissioned and shut-down reappears on one node on a single node)

> Node which was decommissioned and shut-down reappears on a single node
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-3243
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3243
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.5
>            Reporter: Jason Harvey
>            Priority: Minor
>
> I decommissioned a node several days ago. It was no longer in the ring list on any node in the ring. However, it was in the dead gossip list.
> In an attempt to clean it out of the dead gossip list so I could truncate, I shut down every node in the ring and brought them all back up. Once they all came back up, one node showed the decommissioned node as still in the ring in a state of 'Down'. No other node in the ring shows this info.
> I successfully ran removetoken on the node to get that phantom node out. However, it is back in the dead gossip list, preventing me from truncating.
> Where might the info on this decommissioned node be stored? Is HH possibly trying to deliver to the removed node, thus putting it back in the ring on one node?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-3243) Node which was decommissioned and shut-down reappears on a single node

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-3243:
----------------------------------------

    Attachment: 3243.txt

The best explanation I have for how you could get here is that SystemTable only forces a flush on the updateToken(Token token) signature.  removeToken and updateToken(InetAddress ep, Token token) do not, so if the machine is restarted before the commitlog is synced, the update/removal can be lost.  Patch to address this.
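
To make that failure mode concrete, here is a self-contained, hedged sketch (plain Java; it is not the SystemTable code or the attached 3243.txt patch, and the flush mechanics are simplified stand-ins): if the removal only reaches the memtable and commitlog and the machine restarts before a sync, the previously flushed copy still lists the endpoint, whereas flushing after the removal makes it stick.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Illustrative only: contrast removing a token with and without a forced flush.
public class TokenFlushSketch {
    private final Map<String, String> memtable = new HashMap<>(); // endpoint -> token
    private final Map<String, String> sstable  = new HashMap<>(); // what survives a hard restart
    private final boolean flushAfterRemove;

    TokenFlushSketch(boolean flushAfterRemove) { this.flushAfterRemove = flushAfterRemove; }

    void updateToken(String endpoint, String token) {
        memtable.put(endpoint, token);
        forceFlush();
    }

    void removeToken(String endpoint) {
        memtable.remove(endpoint);
        if (flushAfterRemove)
            forceFlush(); // the gist of the fix: don't leave the removal only in the commitlog
    }

    Map<String, String> afterHardRestart() { return sstable; } // only the flushed copy is read back

    private void forceFlush() {
        sstable.clear();
        sstable.putAll(memtable);
    }

    public static void main(String[] args) {
        String ep = "10.34.22.201";
        String token = "120000000000000000000000000000000000000";

        TokenFlushSketch withoutFlush = new TokenFlushSketch(false);
        withoutFlush.updateToken(ep, token);
        withoutFlush.removeToken(ep);
        System.out.println("no flush on remove: " + withoutFlush.afterHardRestart()); // endpoint resurrected

        TokenFlushSketch withFlush = new TokenFlushSketch(true);
        withFlush.updateToken(ep, token);
        withFlush.removeToken(ep);
        System.out.println("flush on remove:    " + withFlush.afterHardRestart()); // {}
    }
}
{code}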

> Node which was decommissioned and shut-down reappears on a single node
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-3243
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3243
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.5
>            Reporter: Jason Harvey
>            Assignee: Brandon Williams
>            Priority: Minor
>         Attachments: 3243.txt, locationinfo_0919.tgz, locationinfo_0922.tgz
>
>
> I decommissioned a node several days ago. It was no longer in the ring list on any node in the ring. However, it was in the dead gossip list.
> In an attempt to clean it out of the dead gossip list so I could truncate, I shut down the entire ring and brought it back up. Once the ring came back up, one node showed the decommissioned node as still in the ring in a state of 'Down'. No other node in the ring shows this info.
> I successfully ran removetoken on the node to get that phantom node out. However, it is back in the dead gossip list, preventing me from truncating.
> Where might the info on this decommissioned node be stored? Is HH possibly trying to deliver to the removed node, thus putting it back in the ring on one node?
> I find it extremely curious that none of the other nodes in the ring showed the phantom node. Shouldn't gossip have propagated the node everywhere, even if it was down?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3243) Node which was decommissioned and shut-down reappears on a single node

Posted by "Jason Harvey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113812#comment-13113812 ] 

Jason Harvey commented on CASSANDRA-3243:
-----------------------------------------

Clock on these boxes seems fine. We keep ntpd running at all times. I've also verified via logging that it has been consistent.

> Node which was decommissioned and shut-down reappears on a single node
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-3243
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3243
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.5
>            Reporter: Jason Harvey
>            Assignee: Brandon Williams
>            Priority: Minor
>         Attachments: locationinfo_0919.tgz, locationinfo_0922.tgz
>
>
> I decommissioned a node several days ago. It was no longer in the ring list on any node in the ring. However, it was in the dead gossip list.
> In an attempt to clean it out of the dead gossip list so I could truncate, I shut down the entire ring and brought it back up. Once the ring came back up, one node showed the decommissioned node as still in the ring in a state of 'Down'. No other node in the ring shows this info.
> I successfully ran removetoken on the node to get that phantom node out. However, it is back in the dead gossip list, preventing me from truncating.
> Where might the info on this decommissioned node be stored? Is HH possibly trying to deliver to the removed node, thus putting it back in the ring on one node?
> I find it extremely curious that none of the other nodes in the ring showed the phantom node. Shouldn't gossip have propagated the node everywhere, even if it was down?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-3243) Node which was decommissioned and shut-down reappears on a single node

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3243:
--------------------------------------

    Reviewer: bcoverston
    
> Node which was decommissioned and shut-down reappears on a single node
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-3243
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3243
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.5
>            Reporter: Jason Harvey
>            Assignee: Brandon Williams
>            Priority: Minor
>             Fix For: 0.8.7
>
>         Attachments: 3243.txt, locationinfo_0919.tgz, locationinfo_0922.tgz
>
>
> I decommissioned a node several days ago. It was no longer in the ring list on any node in the ring. However, it was in the dead gossip list.
> In an attempt to clean it out of the dead gossip list so I could truncate, I shut down the entire ring and brought it back up. Once the ring came back up, one node showed the decommissioned node as still in the ring in a state of 'Down'. No other node in the ring shows this info.
> I successfully ran removetoken on the node to get that phantom node out. However, it is back in the dead gossip list, preventing me from truncating.
> Where might the info on this decommissioned node be stored? Is HH possibly trying to deliver to the removed node, thus putting it back in the ring on one node?
> I find it extremely curious that none of the other nodes in the ring showed the phantom node. Shouldn't gossip have propagated the node everywhere, even if it was down?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3243) Node which was decommissioned and shut-down reappears on a single node

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112460#comment-13112460 ] 

Brandon Williams commented on CASSANDRA-3243:
---------------------------------------------

Can you explain what you mean by "dead gossip list" and how this prevents truncate?

bq. Where might the info on this decommissioned node be being stored?

After CASSANDRA-2496, we store dead gossip states for 3 days so that any other nodes that were down at the time of removal can later know not to repopulate the ring with the removed node. That state isn't persisted anywhere, though, so since you did a full ring restart, the only remaining candidate is the persisted endpoints; all nodes should have removed the entry from there after the decommission/removetoken.

bq. Is HH possibly trying to deliver to the removed node, thus putting it back in the ring on one node?

No, HH will only attempt delivery on an onAlive event, and it doesn't inject any gossip states.
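
(Editorial aside: a hedged sketch of the persisted-endpoints explanation above. The node names, data layout, and loading step are invented stand-ins, not Cassandra's startup code; the point is only that a node whose removal never hit disk would be the one node that shows the phantom as 'Down' after a full-ring restart.)

{code:java}
import java.util.Map;

// Illustrative only: each node seeds its ring view from the endpoints it last persisted.
public class SavedEndpointsSketch {
    public static void main(String[] args) {
        String phantom = "10.34.22.201";

        // Hypothetical persisted LocationInfo per node: only node-B still has the phantom.
        Map<String, Map<String, String>> persistedPerNode = Map.of(
                "node-A", Map.of("10.34.22.200", "0"),
                "node-B", Map.of("10.34.22.200", "0",
                                 phantom, "120000000000000000000000000000000000000"));

        for (Map.Entry<String, Map<String, String>> node : persistedPerNode.entrySet()) {
            boolean showsPhantom = node.getValue().containsKey(phantom);
            System.out.println(node.getKey() + " seeds its ring view with " + node.getValue().keySet()
                    + (showsPhantom ? " -> will list the phantom as Down" : ""));
        }
    }
}
{code}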

> Node which was decommissioned and shut-down reappears on a single node
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-3243
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3243
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.5
>            Reporter: Jason Harvey
>            Assignee: Brandon Williams
>            Priority: Minor
>
> I decommissioned a node several days ago. It was no longer in the ring list on any node in the ring. However, it was in the dead gossip list.
> In an attempt to clean it out of the dead gossip list so I could truncate, I shut down the entire ring and brought it back up. Once the ring came back up, one node showed the decommissioned node as still in the ring in a state of 'Down'. No other node in the ring shows this info.
> I successfully ran removetoken on the node to get that phantom node out. However, it is back in the dead gossip list, preventing me from truncating.
> Where might the info on this decommissioned node be stored? Is HH possibly trying to deliver to the removed node, thus putting it back in the ring on one node?
> I find it extremely curious that none of the other nodes in the ring showed the phantom node. Shouldn't gossip have propagated the node everywhere, even if it was down?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-3243) Node which was decommissioned and shut-down reappears on a single node

Posted by "Jason Harvey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Harvey updated CASSANDRA-3243:
------------------------------------

    Attachment: locationinfo_0922.tgz

LocationInfo from the node which re-added the dead node to the ring.

This LocationInfo is from *after* the restart which resulted in the phantom node re-appearing. It is also after the forced token removal of the phantom node.

The phantom node's token was 120000000000000000000000000000000000000.

> Node which was decommissioned and shut-down reappears on a single node
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-3243
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3243
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.5
>            Reporter: Jason Harvey
>            Assignee: Brandon Williams
>            Priority: Minor
>         Attachments: locationinfo_0919.tgz, locationinfo_0922.tgz
>
>
> I decommissioned a node several days ago. It was no longer in the ring list on any node in the ring. However, it was in the dead gossip list.
> In an attempt to clean it out of the dead gossip list so I could truncate, I shut down the entire ring and brought it back up. Once the ring came back up, one node showed the decommissioned node as still in the ring in a state of 'Down'. No other node in the ring shows this info.
> I successfully ran removetoken on the node to get that phantom node out. However, it is back in the dead gossip list, preventing me from truncating.
> Where might the info on this decommissioned node be stored? Is HH possibly trying to deliver to the removed node, thus putting it back in the ring on one node?
> I find it extremely curious that none of the other nodes in the ring showed the phantom node. Shouldn't gossip have propagated the node everywhere, even if it was down?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira