Posted to commits@cassandra.apache.org by "Brandon Williams (JIRA)" <ji...@apache.org> on 2013/10/08 17:25:43 UTC

[jira] [Updated] (CASSANDRA-5916) gossip and tokenMetadata get hostId out of sync on failed replace_node with the same IP address

     [ https://issues.apache.org/jira/browse/CASSANDRA-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-5916:
----------------------------------------

    Attachment: 5916-v2.txt

It's not true for replacing, not only because we're down but also because we don't do any pending range announcement since there's no point.

I'd be fine with telling people they need to have a large enough hint window to complete the replace to avoid needing to repair, but we have to spin up 'real' gossip to get the schema anyway, so staying in shadow mode the entire time won't work.

However, there is a relatively simple way to have our cake (automatically extended hint window) and eat it too (be able to retry on failure and not have to specify anything new). As soon as we receive the tokens via shadow gossip, we can set them ourselves along with the hibernate state. When we spin up full gossip to get the schema, we'll be using the same HOST_ID and TOKENS that we grabbed, so if anything goes wrong at that point we can just grab them again on the next attempt.
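
To make the retry argument concrete, here is a rough toy model in Python. It is not the patch and not Cassandra's code: Cluster, EndpointState, shadow_round, announce and attempt_replace are all hypothetical names made up for this sketch. It only tries to show that if the replacing node re-publishes the HOST_ID and TOKENS it read via shadow gossip, a failed attempt leaves the same values in place for the next one.

```python
from dataclasses import dataclass, field


@dataclass
class EndpointState:
    # Hypothetical stand-in for one endpoint's gossip state.
    host_id: str
    tokens: tuple
    status: str = "NORMAL"


@dataclass
class Cluster:
    # address -> EndpointState, as seen by the live nodes (toy model).
    gossip: dict = field(default_factory=dict)

    def shadow_round(self, address):
        """Read-only peek at cluster state; changes nothing."""
        state = self.gossip[address]
        return state.host_id, state.tokens

    def announce(self, address, host_id, tokens, status):
        """Full gossip: the replacing node publishes its own state."""
        self.gossip[address] = EndpointState(host_id, tokens, status)


def attempt_replace(cluster, address, fail_after_announce=False):
    # 1. Grab the old node's identity via shadow gossip.
    host_id, tokens = cluster.shadow_round(address)
    # 2. Publish that same identity, in hibernate state, rather than
    #    generating a fresh host ID.
    cluster.announce(address, host_id, tokens, status="hibernate")
    # 3. Simulate a failure later on, e.g. while fetching the schema.
    if fail_after_announce:
        raise RuntimeError("replace failed after starting full gossip")
    return host_id, tokens
```

In this model a failed attempt followed by a retry sees the same host ID and tokens both times, because step 2 never overwrites them with anything new.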

This just leaves the issue of checking that the host is really dead, but this doesn't make any sense when replacing with the same IP anyway, so we can skip it when the addresses match.
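
As a sketch of that rule only (again Python, with hypothetical names; check_replace_allowed and the is_alive callback are assumptions for illustration, not functions from the patch), the liveness check is applied only when the replacing node's address differs from the address being replaced:

```python
def must_verify_dead(own_address, replace_address):
    # The "is the old host really dead?" check only applies when we are
    # replacing a node at a different address than our own.
    return own_address != replace_address


def check_replace_allowed(own_address, replace_address, is_alive):
    """Raise if the target looks alive; is_alive is a caller-supplied callable."""
    if must_verify_dead(own_address, replace_address) and is_alive(replace_address):
        raise RuntimeError(
            "cannot replace a live node: %s" % replace_address
        )
    return True
```

With matching addresses, check_replace_allowed returns True no matter what is_alive reports; with different addresses it raises when the target is reported alive.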

v2 does all of this and includes a few other minor cleanups.

> gossip and tokenMetadata get hostId out of sync on failed replace_node with the same IP address
> -----------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-5916
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5916
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>             Fix For: 1.2.11
>
>         Attachments: 5916.txt, 5916-v2.txt
>
>
> If you try to replace_node an existing, live hostId, it will error out.  However, if you're using an existing IP to do this (as in, you chose the wrong uuid to replace by accident), then the newly generated hostId wipes out the old one in TMD, and when you do try to replace it, replace_node will complain that it does not exist.  Examination of gossipinfo still shows the old hostId, but now you can't replace it either.


