Posted to commits@cassandra.apache.org by "Matthias Keller (JIRA)" <ji...@apache.org> on 2011/02/22 12:06:38 UTC
[jira] Created: (CASSANDRA-2214) Bootstrap Token collision after nodetool loadbalance
Bootstrap Token collision after nodetool loadbalance
----------------------------------------------------
Key: CASSANDRA-2214
URL: https://issues.apache.org/jira/browse/CASSANDRA-2214
Project: Cassandra
Issue Type: Bug
Affects Versions: 0.7.2
Reporter: Matthias Keller
I had two nodes for testing. Each owned around 50% of all data, and my test CF has RF=2.
Then I added a third node (bootstrapped it):
{noformat}
Address         Status State   Load            Owns    Token
                                                       101483442157567999664061592210059906302
10.0.0.2        Up     Normal  320.42 KB       50.00%  16412850427333383798217940352117853438
10.0.0.3        Up     Normal  341.53 KB       26.25%  61078635599166706937511052402724559481
10.0.0.1        Up     Normal  321.3 KB        23.75%  101483442157567999664061592210059906302
{noformat}
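The Owns column follows directly from the tokens. The sketch below recomputes it, assuming RandomPartitioner (token space [0, 2**127)) and that each node owns the range from the previous node's token (exclusive) up to its own token (inclusive), wrapping around zero. The `ring` dictionary and `ownership` function are illustrative names, not Cassandra APIs.

```python
# Sketch: recompute the "Owns" column of the ring output from the tokens alone.
# Assumption: RandomPartitioner, token space [0, 2**127); each node owns
# (previous_token, own_token], wrapping around zero.
RING_SIZE = 2 ** 127

ring = {
    "10.0.0.2": 16412850427333383798217940352117853438,
    "10.0.0.3": 61078635599166706937511052402724559481,
    "10.0.0.1": 101483442157567999664061592210059906302,
}


def ownership(ring):
    """Return {endpoint: percentage of the token space it owns}."""
    ordered = sorted(ring.items(), key=lambda item: item[1])
    result = {}
    for i, (endpoint, token) in enumerate(ordered):
        prev_token = ordered[i - 1][1]  # i == 0 wraps to the highest token
        span = (token - prev_token) % RING_SIZE
        result[endpoint] = 100.0 * span / RING_SIZE
    return result


for endpoint, pct in ownership(ring).items():
    print(f"{endpoint}  {pct:.2f}%")
```

It prints 50.00%, 26.25% and 23.75%, matching the ring output. The difference between 10.0.0.1's token and 10.0.0.2's token is exactly 2**126, which is why 10.0.0.2 owns exactly half of the ring.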
Then I wanted to rebalance node 2 (10.0.0.2), so I issued the loadbalance command on it. It took quite a while, but after leaving the ring and rejoining, it appears to have been assigned the same token as node 3 (10.0.0.3). I get this on node 3:
{quote}
ERROR 11:47:09,719 Fatal exception in thread Thread[HintedHandoff:1,1,main]
java.lang.RuntimeException: java.lang.NullPointerException
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.*NullPointerException*
at org.apache.cassandra.db.HintedHandOffManager.waitForSchemaAgreement(HintedHandOffManager.java:250)
at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:267)
at org.apache.cassandra.db.HintedHandOffManager.access$100(HintedHandOffManager.java:88)
at org.apache.cassandra.db.HintedHandOffManager$2.runMayThrow(HintedHandOffManager.java:391)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
... 3 more
{quote}
And a second later, this on node 1 (10.0.0.1):
{quote}
ERROR 11:47:10,719 Fatal exception in thread Thread[GossipStage:2,5,main]
java.lang.RuntimeException: Bootstrap Token collision between /10.0.0.3 and /10.0.0.2 (token 61078635599166706937511052402724559481
at org.apache.cassandra.locator.TokenMetadata.addBootstrapToken(TokenMetadata.java:143)
at org.apache.cassandra.service.StorageService.handleStateBootstrap(StorageService.java:696)
at org.apache.cassandra.service.StorageService.onChange(StorageService.java:638)
at org.apache.cassandra.service.StorageService.onJoin(StorageService.java:1114)
at org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:639)
at org.apache.cassandra.gms.Gossiper.handleNewJoin(Gossiper.java:614)
at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:686)
at org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:60)
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
{quote}
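The error on node 1 is raised while handling gossip about a node in the bootstrap state (StorageService.handleStateBootstrap calling TokenMetadata.addBootstrapToken). A minimal Python model of that kind of check, assuming it rejects a bootstrap token that a different endpoint already holds (whether that endpoint is bootstrapping or Normal), might look like this. `TokenRing` and `TokenCollision` are hypothetical names, not the classes in TokenMetadata.java.

```python
# Hypothetical, simplified model of a bootstrap-token collision check.
# Not Cassandra's TokenMetadata; it only illustrates the idea.


class TokenCollision(RuntimeError):
    pass


class TokenRing:
    """A node's view of ring tokens.

    normal: token -> endpoint for nodes in the Normal state.
    bootstrap: token -> endpoint for nodes that are currently bootstrapping.
    """

    def __init__(self):
        self.normal = {}
        self.bootstrap = {}

    def add_normal_token(self, token, endpoint):
        self.normal[token] = endpoint

    def add_bootstrap_token(self, token, endpoint):
        # Reject a token already held by another endpoint in either table.
        for table in (self.bootstrap, self.normal):
            owner = table.get(token)
            if owner is not None and owner != endpoint:
                raise TokenCollision(
                    f"Bootstrap Token collision between /{owner} and "
                    f"/{endpoint} (token {token})"
                )
        self.bootstrap[token] = endpoint


# Node 1's view after 10.0.0.2 left the ring and announced 10.0.0.3's token.
ring = TokenRing()
ring.add_normal_token(61078635599166706937511052402724559481, "10.0.0.3")
ring.add_normal_token(101483442157567999664061592210059906302, "10.0.0.1")
try:
    ring.add_bootstrap_token(61078635599166706937511052402724559481, "10.0.0.2")
except TokenCollision as err:
    print(err)
```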
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (CASSANDRA-2214) Bootstrap Token collision after nodetool loadbalance
Posted by "Jaakko Laine (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005463#comment-13005463 ]
Jaakko Laine commented on CASSANDRA-2214:
-----------------------------------------
Do you have the 10.0.0.2 logs from the loadbalance period (especially the part where it chose the new token)?
[jira] Assigned: (CASSANDRA-2214) Bootstrap Token collision after nodetool loadbalance
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis reassigned CASSANDRA-2214:
-----------------------------------------
Assignee: Nick Bailey
[jira] Updated: (CASSANDRA-2214) Bootstrap Token collision after nodetool loadbalance
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis updated CASSANDRA-2214:
--------------------------------------
Priority: Minor (was: Major)
Assignee: (was: Nick Bailey)
Not sure how this could happen: StorageService.getBootstrapToken only samples keys from a node's own partitioner range, so even if you haven't run cleanup after moving nodes, you should never be able to get a token belonging to another node out of it.
Are you able to reproduce?
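The argument that sampling only the local range cannot yield another node's token can be illustrated with a small model. This is a hypothetical simplification, not Cassandra's StorageService.getBootstrapToken: it assumes the node samples key tokens, keeps only those strictly inside its own range (previous token, own token), and returns the median in ring order. Under those assumptions, stale keys left behind by a skipped cleanup are filtered out, and the result can never equal another node's token as long as the node's view of its own range is current.

```python
# Hypothetical, simplified model of choosing a bootstrap token by sampling keys
# from the answering node's own range. Not Cassandra's actual implementation.
RING_SIZE = 2 ** 127  # assumed RandomPartitioner token space [0, 2**127)


def strictly_inside(token, left, right):
    """True if token lies strictly between left and right, walking clockwise."""
    width = (right - left) % RING_SIZE or RING_SIZE  # left == right: whole ring
    offset = (token - left) % RING_SIZE
    return 0 < offset < width


def pick_bootstrap_token(sampled_tokens, left, right):
    """Median, in ring order, of the sampled key tokens inside (left, right)."""
    inside = [t for t in sampled_tokens if strictly_inside(t, left, right)]
    if not inside:
        raise ValueError("no sampled keys fall inside the local range")
    inside.sort(key=lambda t: (t - left) % RING_SIZE)  # clockwise from left
    return inside[len(inside) // 2]


# Tokens from the ring output in the report.
T2 = 16412850427333383798217940352117853438  # 10.0.0.2
T3 = 61078635599166706937511052402724559481  # 10.0.0.3
T1 = 101483442157567999664061592210059906302  # 10.0.0.1

# Node 10.0.0.1 owns (T3, T1]; sampled keys outside that range (for example
# ones not yet removed by cleanup) are discarded before taking the median.
samples = [T2 - 5, T3 + 10, T3 + 20, T3 + 30, T1 + 1]
token = pick_bootstrap_token(samples, T3, T1)
assert token not in (T1, T2, T3)
print(token - T3)  # prints 20: offset of the chosen token past 10.0.0.3's token
```

Sorting by clockwise offset from `left` rather than by raw value keeps the median meaningful for a range that wraps past zero, such as 10.0.0.2's range (T1, T2].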