Posted to commits@cassandra.apache.org by "Matthias Keller (JIRA)" <ji...@apache.org> on 2011/02/22 12:06:38 UTC

[jira] Created: (CASSANDRA-2214) Bootstrap Token collision after nodetool loadbalance

Bootstrap Token collision after nodetool loadbalance
----------------------------------------------------

                 Key: CASSANDRA-2214
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2214
             Project: Cassandra
          Issue Type: Bug
    Affects Versions: 0.7.2
            Reporter: Matthias Keller


I had two nodes for testing. Each owned around 50% of the data; my test CF has RF=2.
Then I added a third node (bootstrapped it):
{noformat}
Address         Status State   Load            Owns    Token
                                                       101483442157567999664061592210059906302
10.0.0.2        Up     Normal  320.42 KB       50.00%  16412850427333383798217940352117853438
10.0.0.3        Up     Normal  341.53 KB       26.25%  61078635599166706937511052402724559481
10.0.0.1        Up     Normal  321.3 KB        23.75%  101483442157567999664061592210059906302
{noformat}
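
The Owns column in the ring output above can be reproduced from the tokens alone. The following is a minimal sketch, assuming RandomPartitioner (token space 0 to 2**127) and that each node owns the range from the previous token up to its own. The function name `ownership` is illustrative and is not part of nodetool.

```python
from fractions import Fraction

# Assumption: RandomPartitioner, whose tokens lie in [0, 2**127).
RING_SIZE = 2 ** 127

# Tokens copied from the ring output above.
RING = {
    "10.0.0.2": 16412850427333383798217940352117853438,
    "10.0.0.3": 61078635599166706937511052402724559481,
    "10.0.0.1": 101483442157567999664061592210059906302,
}


def ownership(ring):
    """Return {address: percent of the token space owned}.

    Each node owns the range (previous token, own token]. The node with
    the smallest token also owns the wrap-around part of the ring.
    Assumes at least two nodes with distinct tokens.
    """
    ordered = sorted(ring.items(), key=lambda item: item[1])
    result = {}
    for i, (address, token) in enumerate(ordered):
        previous = ordered[i - 1][1]  # i == 0 wraps to the largest token
        width = (token - previous) % RING_SIZE
        result[address] = float(Fraction(width * 100, RING_SIZE))
    return result


if __name__ == "__main__":
    for address, percent in ownership(RING).items():
        print(f"{address:<12} {percent:6.2f}%")
```

Under these assumptions the three nodes come out at 50.00%, 26.25% and 23.75%, which matches the Owns column: 10.0.0.2 still covers half of the ring because the new node split 10.0.0.1's range, not its own.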

Then I wanted to re-balance node 2 (10.0.0.2), so I issued the loadbalance command on it. It took quite a while, but after leaving the ring and coming back, it seems to have been assigned the same token as node 3 (10.0.0.3). I get this on node 3:
{quote}
ERROR 11:47:09,719 Fatal exception in thread Thread[HintedHandoff:1,1,main]
java.lang.RuntimeException: java.lang.NullPointerException
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.*NullPointerException*
        at org.apache.cassandra.db.HintedHandOffManager.waitForSchemaAgreement(HintedHandOffManager.java:250)
        at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:267)
        at org.apache.cassandra.db.HintedHandOffManager.access$100(HintedHandOffManager.java:88)
        at org.apache.cassandra.db.HintedHandOffManager$2.runMayThrow(HintedHandOffManager.java:391)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        ... 3 more
{quote}

And a second later, this on node 1 (10.0.0.1):
{quote}
ERROR 11:47:10,719 Fatal exception in thread Thread[GossipStage:2,5,main]
java.lang.RuntimeException: Bootstrap Token collision between /10.0.0.3 and /10.0.0.2 (token 61078635599166706937511052402724559481
        at org.apache.cassandra.locator.TokenMetadata.addBootstrapToken(TokenMetadata.java:143)
        at org.apache.cassandra.service.StorageService.handleStateBootstrap(StorageService.java:696)
        at org.apache.cassandra.service.StorageService.onChange(StorageService.java:638)
        at org.apache.cassandra.service.StorageService.onJoin(StorageService.java:1114)
        at org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:639)
        at org.apache.cassandra.gms.Gossiper.handleNewJoin(Gossiper.java:614)
        at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:686)
        at org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:60)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
{quote}
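
Judging from the message alone, the gossip handler rejects a bootstrap token that a different endpoint already holds. The sketch below models that kind of check in a simplified way. The class and method names (`RingTokens`, `add_normal`, `add_bootstrap`) are hypothetical, and this is not Cassandra's TokenMetadata implementation.

```python
class TokenCollisionError(RuntimeError):
    """Raised when two different endpoints claim the same token."""


class RingTokens:
    """Hypothetical, simplified registry of tokens seen via gossip."""

    def __init__(self):
        self._normal = {}     # token -> endpoint already in the ring
        self._bootstrap = {}  # token -> endpoint currently bootstrapping

    def add_normal(self, token, endpoint):
        """Record a token held by a node in Normal state."""
        self._normal[token] = endpoint

    def add_bootstrap(self, token, endpoint):
        """Record a pending bootstrap token, refusing duplicates."""
        for known in (self._bootstrap, self._normal):
            owner = known.get(token)
            if owner is not None and owner != endpoint:
                raise TokenCollisionError(
                    f"Bootstrap Token collision between {owner} "
                    f"and {endpoint} (token {token})"
                )
        self._bootstrap[token] = endpoint


if __name__ == "__main__":
    ring = RingTokens()
    token = 61078635599166706937511052402724559481
    ring.add_normal(token, "/10.0.0.3")
    try:
        ring.add_bootstrap(token, "/10.0.0.2")
    except TokenCollisionError as exc:
        print(exc)
```

Re-announcing the same token for the same endpoint is accepted in this model; only a claim by a second endpoint is treated as a collision.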

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (CASSANDRA-2214) Bootstrap Token collision after nodetool loadbalance

Posted by "Jaakko Laine (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005463#comment-13005463 ] 

Jaakko Laine commented on CASSANDRA-2214:
-----------------------------------------

Do you have the 10.0.0.2 logs from the loadbalance period (especially the part where it chose the new token)?



[jira] Assigned: (CASSANDRA-2214) Bootstrap Token collision after nodetool loadbalance

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis reassigned CASSANDRA-2214:
-----------------------------------------

    Assignee: Nick Bailey


[jira] Updated: (CASSANDRA-2214) Bootstrap Token collision after nodetool loadbalance

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2214:
--------------------------------------

    Priority: Minor  (was: Major)
    Assignee:     (was: Nick Bailey)

Not sure how this could happen -- StorageService.getBootstrapToken only samples keys from a node's own partitioner range, so even if you haven't run cleanup between node movements, you should never be able to get a token belonging to another node out of it.
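
The range argument can be illustrated with a small containment check. This is a sketch only, assuming ranges are left-exclusive and right-inclusive, (previous token, own token], and using the tokens from the reported ring. The helpers `in_range`, `primary_ranges` and `owners_of` are hypothetical names and are not Cassandra APIs.

```python
# Tokens copied from the ring output in the report.
RING = {
    "10.0.0.2": 16412850427333383798217940352117853438,
    "10.0.0.3": 61078635599166706937511052402724559481,
    "10.0.0.1": 101483442157567999664061592210059906302,
}


def in_range(token, left, right):
    """True if token lies in the ring range (left, right]."""
    if left < right:
        return left < token <= right
    # The range wraps around the end of the token space.
    return token > left or token <= right


def primary_ranges(ring):
    """Map each address to its (previous token, own token] range."""
    ordered = sorted(ring.items(), key=lambda item: item[1])
    return {
        address: (ordered[i - 1][1], token)
        for i, (address, token) in enumerate(ordered)
    }


def owners_of(token, ring):
    """Addresses whose primary range contains the token."""
    return [
        address
        for address, (left, right) in primary_ranges(ring).items()
        if in_range(token, left, right)
    ]


if __name__ == "__main__":
    for address, token in RING.items():
        print(address, "token falls in range of", owners_of(token, RING))
```

Because the ranges are disjoint, every token falls in exactly one node's range. The colliding token 61078635599166706937511052402724559481 lies only in 10.0.0.3's own range, so a token sampled from inside any other node's range could not equal it.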

Are you able to reproduce?
