Posted to commits@cassandra.apache.org by "Matthew F. Dennis (JIRA)" <ji...@apache.org> on 2010/10/28 00:28:21 UTC
[jira] Created: (CASSANDRA-1670) cannot decom a node then bring it back to the cluster
cannot decom a node then bring it back to the cluster
-----------------------------------------------------
Key: CASSANDRA-1670
URL: https://issues.apache.org/jira/browse/CASSANDRA-1670
Project: Cassandra
Issue Type: Bug
Affects Versions: 0.7 beta 2
Environment: RAX
Reporter: Matthew F. Dennis
two node cluster (node0, node1). node0 is listed as the only seed on both nodes. Listen addresses explicitly set to an IP on both nodes. No initial token, no autobootstrap (but see below). Bring up the ring. Everything is fine on both nodes.
decom node1. verify decom completed correctly by reading the logs on both nodes. rm all data/logs on node1. bring node1 up again.
One of two things happens:
* node0 thinks it is in a ring by itself, node1 thinks both nodes are in the ring.
* both node0 and node1 think they are in rings by themselves
If you restart node0 after decom, it appears to work normally.
Similar issues seem to occur if you kill node1 (either while it is autobootstrapping, before it completes, or after it is in the ring) and then run removetoken.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (CASSANDRA-1670) cannot move a node
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927025#action_12927025 ]
Jonathan Ellis commented on CASSANDRA-1670:
-------------------------------------------
how does moving the removal out of the for loop fix the state-attached-to-it problem?
> cannot move a node
> ------------------
>
> Key: CASSANDRA-1670
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1670
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.6.6
> Environment: RAX
> Reporter: Matthew F. Dennis
> Assignee: Gary Dusbabek
> Fix For: 0.6.7, 0.7.0
>
> Attachments: 1670-0.6.txt
>
>
> two node cluster (node0, node1). node0 is listed as the only seed on both nodes. Listen addresses explicitly set to an IP on both nodes. No initial token, no autobootstrap (but see below). Bring up the ring. Everything is fine on both nodes.
> decom node1. verify decom completed correctly by reading the logs on both nodes. rm all data/logs on node1. bring node1 up again.
> One of two things happen:
> * node0 thinks it is in a ring by itself, node1 thinks both nodes are in the ring.
> * both node0 and node1 think they are in rings by themselves
> If you restart node0 after decom, it appears to work normally.
> Similar issues seem to present if you kill node1 (either when autobootstrapping before it completes or after it is in the ring) and removetoken.
[jira] Commented: (CASSANDRA-1670) cannot decom a node then bring it back to the cluster
Posted by "Mike Bulman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926469#action_12926469 ]
Mike Bulman commented on CASSANDRA-1670:
----------------------------------------
Because move is decommission+bootstrap, the same behavior occurs when moving node1 as well.
> cannot decom a node then bring it back to the cluster
> -----------------------------------------------------
>
> Key: CASSANDRA-1670
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1670
> Project: Cassandra
> Issue Type: Bug
> Environment: RAX
> Reporter: Matthew F. Dennis
> Priority: Minor
> Fix For: 0.7.1
>
>
> two node cluster (node0, node1). node0 is listed as the only seed on both nodes. Listen addresses explicitly set to an IP on both nodes. No initial token, no autobootstrap (but see below). Bring up the ring. Everything is fine on both nodes.
> decom node1. verify decom completed correctly by reading the logs on both nodes. rm all data/logs on node1. bring node1 up again.
> One of two things happen:
> * node0 thinks it is in a ring by itself, node1 thinks both nodes are in the ring.
> * both node0 and node1 think they are in rings by themselves
> If you restart node0 after decom, it appears to work normally.
> Similar issues seem to present if you kill node1 (either when autobootstrapping before it completes or after it is in the ring) and removetoken.
[jira] Commented: (CASSANDRA-1670) cannot move a node
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927661#action_12927661 ]
Jonathan Ellis commented on CASSANDRA-1670:
-------------------------------------------
bq. The old code will only remove a node from justRemovedEndpoints_ if it currently exists in endpointStateMap_
Isn't it "remove from jRE if _any_ [other] node exists in eSM?" Which means this is only a bug in 2-node clusters?
+1 if so, just trying to understand.
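The reading above can be illustrated with a small model. This is only a sketch under stated assumptions: the class, method names, and the RING_DELAY_MS value are hypothetical and are not taken from the Gossiper source. It shows why a tidy-up step nested inside a loop over the other endpoints' state never runs when that collection is empty, which would be the case on the surviving node of a 2-node cluster.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Illustrative model only; names and structure are assumptions, not Cassandra source.
final class NestedTidySketch {
    static final long RING_DELAY_MS = 30_000L; // placeholder value for the sketch

    // Tidy-up nested inside the per-endpoint loop: it runs zero times
    // when no other endpoint has state.
    static void tidyInsideLoop(Set<String> endpointsWithState,
                               Map<String, Long> justRemoved, long nowMs) {
        for (String endpoint : endpointsWithState) {
            // (per-endpoint status checks would go here)
            expire(justRemoved, nowMs);
        }
    }

    // Tidy-up moved out of the loop: it runs on every pass.
    static void tidyOutsideLoop(Set<String> endpointsWithState,
                                Map<String, Long> justRemoved, long nowMs) {
        for (String endpoint : endpointsWithState) {
            // (per-endpoint status checks would go here)
        }
        expire(justRemoved, nowMs);
    }

    // Drop entries whose removal time is older than RING_DELAY_MS.
    private static void expire(Map<String, Long> justRemoved, long nowMs) {
        justRemoved.entrySet().removeIf(e -> nowMs - e.getValue() > RING_DELAY_MS);
    }

    // Helper for building a one-entry "just removed" map in examples.
    static Map<String, Long> removedAt(String endpoint, long whenMs) {
        Map<String, Long> m = new HashMap<>();
        m.put(endpoint, whenMs);
        return m;
    }
}
```

With an empty set of other endpoints (the 2-node case), tidyInsideLoop leaves the decommissioned node in the map indefinitely; with at least one other endpoint, or with tidyOutsideLoop, the entry expires once the delay has passed.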
> cannot move a node
> ------------------
>
> Key: CASSANDRA-1670
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1670
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.6.6
> Environment: RAX
> Reporter: Matthew F. Dennis
> Assignee: Gary Dusbabek
> Fix For: 0.6.7, 0.7.0
>
> Attachments: 1670-0.6.txt, v1-0001-code-that-tidied-Gossiper.justRemovedEndpoints_-was-no.txt
>
>
> two node cluster (node0, node1). node0 is listed as the only seed on both nodes. Listen addresses explicitly set to an IP on both nodes. No initial token, no autobootstrap (but see below). Bring up the ring. Everything is fine on both nodes.
> decom node1. verify decom completed correctly by reading the logs on both nodes. rm all data/logs on node1. bring node1 up again.
> One of two things happen:
> * node0 thinks it is in a ring by itself, node1 thinks both nodes are in the ring.
> * both node0 and node1 think they are in rings by themselves
> If you restart node0 after decom, it appears to work normally.
> Similar issues seem to present if you kill node1 (either when autobootstrapping before it completes or after it is in the ring) and removetoken.
[jira] Commented: (CASSANDRA-1670) cannot move a node
Posted by "Mike Bulman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927652#action_12927652 ]
Mike Bulman commented on CASSANDRA-1670:
----------------------------------------
Ok got it working properly. Patch fixes the issue described.
As a note, that patch doesn't work in the 0.7 branch because justRemovedEndPoints_ (0.6) is now justRemovedEndpoints (0.7). Not sure how you guys handle that, but the change is simple enough.
[jira] Updated: (CASSANDRA-1670) cannot move a node
Posted by "Gary Dusbabek (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gary Dusbabek updated CASSANDRA-1670:
-------------------------------------
Fix Version/s: 0.7.0
[jira] Updated: (CASSANDRA-1670) cannot move a node
Posted by "Gary Dusbabek (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gary Dusbabek updated CASSANDRA-1670:
-------------------------------------
Attachment: 1670-0.6.txt
The code that removes endpoints from Gossiper.justRemovedEndpoints after RING_DELAY was only getting called if the endpoint had a state attached to it. Since state is removed for decommissioned nodes, the code was never getting called.
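A literal reading of the description above can be modeled as follows. This is a sketch only, not the attached patch: the class and method names and the RING_DELAY_MS value are assumptions made for illustration. It contrasts an expiry step that is guarded by the presence of endpoint state with one that runs unconditionally.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model only; names and values are assumptions, not Cassandra source.
final class RemovedEndpointExpirySketch {
    static final long RING_DELAY_MS = 30_000L; // placeholder value for the sketch

    // endpoint -> opaque gossip state; absent once the node is decommissioned
    final Map<String, String> endpointState = new HashMap<>();
    // endpoint -> time (ms) at which it was removed
    final Map<String, Long> justRemoved = new HashMap<>();

    // Decommissioning drops the state and records the removal time.
    void decommission(String endpoint, long nowMs) {
        endpointState.remove(endpoint);
        justRemoved.put(endpoint, nowMs);
    }

    // Old behavior as described: expiry only applies to endpoints that still have state,
    // so a decommissioned endpoint is never expired.
    void expireGuardedByState(long nowMs) {
        justRemoved.entrySet().removeIf(e ->
                endpointState.containsKey(e.getKey())
                        && nowMs - e.getValue() > RING_DELAY_MS);
    }

    // Expiry that does not depend on state being present.
    void expireUnconditionally(long nowMs) {
        justRemoved.entrySet().removeIf(e -> nowMs - e.getValue() > RING_DELAY_MS);
    }
}
```

In this model a decommissioned endpoint stays in justRemoved for as long as only the guarded variant is called, which matches the symptom that restarting node0 (and so clearing its in-memory set) makes the problem go away.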
> cannot move a node
> ------------------
>
> Key: CASSANDRA-1670
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1670
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.6.6
> Environment: RAX
> Reporter: Matthew F. Dennis
> Assignee: Gary Dusbabek
> Fix For: 0.6.7
>
> Attachments: 1670-0.6.txt
>
>
> two node cluster (node0, node1). node0 is listed as the only seed on both nodes. Listen addresses explicitly set to an IP on both nodes. No initial token, no autobootstrap (but see below). Bring up the ring. Everything is fine on both nodes.
> decom node1. verify decom completed correctly by reading the logs on both nodes. rm all data/logs on node1. bring node1 up again.
> One of two things happen:
> * node0 thinks it is in a ring by itself, node1 thinks both nodes are in the ring.
> * both node0 and node1 think they are in rings by themselves
> If you restart node0 after decom, it appears to work normally.
> Similar issues seem to present if you kill node1 (either when autobootstrapping before it completes or after it is in the ring) and removetoken.
[jira] Commented: (CASSANDRA-1670) cannot move a node
Posted by "Gary Dusbabek (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927034#action_12927034 ]
Gary Dusbabek commented on CASSANDRA-1670:
------------------------------------------
When a node is decommissioned, it gets added to justRemovedEndpoints_, but removed from endpointStateMap_. The old code will only remove a node from justRemovedEndpoints_ if it currently exists in endpointStateMap_. If the node stays in justRemovedEndpoints_ (which it will currently), it can never be recognized as part of the ring because of the check in Gossiper.handleNewJoin().
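The effect of that check can be sketched with a minimal model. The names below (JoinGateSketch, its fields, and the boolean return value) are hypothetical and are not the real Gossiper.handleNewJoin() signature; the sketch only assumes what the comment states, namely that a join from an endpoint still listed as just removed is ignored.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative model only; names and structure are assumptions, not Cassandra source.
final class JoinGateSketch {
    final Set<String> justRemoved = new HashSet<>();
    final Set<String> ring = new HashSet<>();

    // A decommissioned endpoint leaves the ring and is remembered as just removed.
    void decommission(String endpoint) {
        ring.remove(endpoint);
        justRemoved.add(endpoint);
    }

    // Returns true if the endpoint was admitted, false if the join was ignored.
    boolean handleNewJoin(String endpoint) {
        if (justRemoved.contains(endpoint)) {
            return false; // still marked as just removed: ignore the join
        }
        ring.add(endpoint);
        return true;
    }
}
```

As long as nothing clears the endpoint from justRemoved, every later join attempt from it is ignored, so the surviving node keeps seeing a ring of one.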
[jira] Commented: (CASSANDRA-1670) cannot move a node
Posted by "Mike Bulman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927198#action_12927198 ]
Mike Bulman commented on CASSANDRA-1670:
----------------------------------------
Running from nodetool as well as ripcord decommission code (direct call to StorageService) gets:
Exception in thread "main" java.lang.AssertionError
at org.apache.cassandra.service.StorageService.getLocalToken(StorageService.java:1128)
at org.apache.cassandra.service.StorageService.startLeaving(StorageService.java:1527)
at org.apache.cassandra.service.StorageService.decommission(StorageService.java:1546)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:111)
at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:45)
at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:226)
at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:251)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:857)
at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:795)
at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1450)
at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:90)
at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1285)
at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1383)
at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:807)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
at sun.rmi.transport.Transport$1.run(Transport.java:177)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
[jira] Commented: (CASSANDRA-1670) cannot move a node
Posted by "Gary Dusbabek (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927840#action_12927840 ]
Gary Dusbabek commented on CASSANDRA-1670:
------------------------------------------
bq. Isn't it "remove from jRE if any [other] node exists in eSM?" Which means this is only a bug in 2-node clusters?
Yes, good observation. I'll only bother committing this to 0.7/trunk then.
[jira] Commented: (CASSANDRA-1670) cannot move a node
Posted by "Mike Bulman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927463#action_12927463 ]
Mike Bulman commented on CASSANDRA-1670:
----------------------------------------
That's all I get. The only other thing I can add is that the node being decommissioned logs "DECOMMISSIONING" when the log level is set to DEBUG, and that the exception comes back almost immediately. FWIW, I'm running this on r1029870 of the 0.7 branch.
[jira] Assigned: (CASSANDRA-1670) cannot move a node
Posted by "Gary Dusbabek (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gary Dusbabek reassigned CASSANDRA-1670:
----------------------------------------
Assignee: Gary Dusbabek
> cannot move a node
> ------------------
>
> Key: CASSANDRA-1670
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1670
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: RAX
> Reporter: Matthew F. Dennis
> Assignee: Gary Dusbabek
> Fix For: 0.7.0
>
>
> two node cluster (node0, node1). node0 is listed as the only seed on both nodes. Listen addresses explicitly set to an IP on both nodes. No initial token, no autobootstrap (but see below). Bring up the ring. Everything is fine on both nodes.
> decom node1. verify decom completed correctly by reading the logs on both nodes. rm all data/logs on node1. bring node1 up again.
> One of two things happen:
> * node0 thinks it is in a ring by itself, node1 thinks both nodes are in the ring.
> * both node0 and node1 think they are in rings by themselves
> If you restart node0 after decom, it appears to work normally.
> Similar issues seem to present if you kill node1 (either when autobootstrapping before it completes or after it is in the ring) and removetoken.
[jira] Commented: (CASSANDRA-1670) cannot move a node
Posted by "Gary Dusbabek (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927471#action_12927471 ]
Gary Dusbabek commented on CASSANDRA-1670:
------------------------------------------
There should be more. What I'm looking for is some indication that the node is not in the middle of a bootstrap operation, since being mid-bootstrap would trigger this exception.
[jira] Commented: (CASSANDRA-1670) cannot move a node
Posted by "Mike Bulman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927484#action_12927484 ]
Mike Bulman commented on CASSANDRA-1670:
----------------------------------------
INFO 15:48:22,176 Joining: getting load information
INFO 15:48:22,177 Sleeping 90000 ms to wait for load information...
DEBUG 15:48:23,053 GC for ParNew: 12 ms, 15822112 reclaimed leaving 15102720 used; max is 1268449280
DEBUG 15:48:23,166 attempting to connect to /184.106.231.n0
INFO 15:48:23,553 Node /184.106.231.n0 is now part of the cluster
DEBUG 15:48:23,554 Resetting pool for /184.106.231.n0
DEBUG 15:48:23,559 Node /184.106.231.n0 state normal, token 104110673354167092736227093944218730763
DEBUG 15:48:23,559 New node /184.106.231.n0 at token 104110673354167092736227093944218730763
DEBUG 15:48:23,559 clearing cached endpoints
DEBUG 15:48:24,167 attempting to connect to /184.106.231.n0
DEBUG 15:48:24,169 Disseminating load info ...
INFO 15:48:24,554 InetAddress /184.106.231.n0 is now UP
INFO 15:48:24,554 Started hinted handoff for endpoint /184.106.231.n0
INFO 15:48:24,557 Finished hinted handoff of 0 rows to endpoint /184.106.231.n0
DEBUG 15:49:24,169 Disseminating load info ...
DEBUG 15:49:39,106 GC for ParNew: 16 ms, 16111512 reclaimed leaving 85537648 used; max is 1268449280
DEBUG 15:49:52,177 ... got load info
INFO 15:49:52,177 Joining: getting bootstrap token
DEBUG 15:49:52,183 attempting to connect to /184.106.231.n0
DEBUG 15:49:52,191 Processing response on a callback from 270@/184.106.231.n0
INFO 15:49:52,192 New token will be 19040081623932476870383442086276677899 to assume load from /184.106.231.n0
DEBUG 15:49:52,193 clearing cached endpoints
DEBUG 15:49:52,194 Will try to load mx4j now, if it's in the classpath
INFO 15:49:52,194 Will not load MX4J, mx4j-tools.jar is not in the classpath
INFO 15:49:52,220 Binding thrift service to /0.0.0.0:9160
INFO 15:49:52,222 Using TFramedTransport with a max frame size of 15728640 bytes.
INFO 15:49:52,226 Listening for thrift clients...
DEBUG 15:50:24,170 Disseminating load info ...
DEBUG 15:50:52,247 DECOMMISSIONING
DEBUG 15:51:24,170 Disseminating load info ...
DEBUG 15:52:24,171 Disseminating load info ...
From n0:
root@ripcord:/usr/src/cassandra/branches/cassandra-0.7# bin/nodetool -h 184.106.231.n0 ring
Address         Status  State   Load      Token
                                          104110673354167092736227093944218730763
184.106.228.n1  Up      Normal  5.3 KB    19040081623932476870383442086276677899
184.106.231.n0  Up      Normal  10.27 KB  104110673354167092736227093944218730763
root@ripcord:/usr/src/cassandra/branches/cassandra-0.7# bin/nodetool -h 184.106.228.n1 decommission
<TRACE FROM MY PREVIOUS COMMENT>
root@ripcord:/usr/src/cassandra/branches/cassandra-0.7# bin/nodetool -h 184.106.231.n0 ring
Address         Status  State   Load      Token
                                          104110673354167092736227093944218730763
184.106.228.n1  Up      Normal  5.3 KB    19040081623932476870383442086276677899
184.106.231.n0  Up      Normal  10.27 KB  104110673354167092736227093944218730763
[jira] Commented: (CASSANDRA-1670) cannot move a node
Posted by "Gary Dusbabek (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927422#action_12927422 ]
Gary Dusbabek commented on CASSANDRA-1670:
------------------------------------------
Mike, can you provide a more complete log? That trace is unrelated to the patch and likely indicates a different problem.
[jira] Updated: (CASSANDRA-1670) cannot move a node
Posted by "Gary Dusbabek (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gary Dusbabek updated CASSANDRA-1670:
-------------------------------------
Attachment: v1-0001-code-that-tidied-Gossiper.justRemovedEndpoints_-was-no.txt
[jira] Updated: (CASSANDRA-1670) cannot decom a node then bring it back to the cluster
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis updated CASSANDRA-1670:
--------------------------------------
Priority: Minor (was: Major)
Affects Version/s: (was: 0.7 beta 2)
Fix Version/s: 0.7.1
> cannot decom a node then bring it back to the cluster
> -----------------------------------------------------
>
> Key: CASSANDRA-1670
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1670
> Project: Cassandra
> Issue Type: Bug
> Environment: RAX
> Reporter: Matthew F. Dennis
> Priority: Minor
> Fix For: 0.7.1
>
>
[jira] Updated: (CASSANDRA-1670) cannot move a node
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis updated CASSANDRA-1670:
--------------------------------------
Component/s: Core
Priority: Major (was: Minor)
Fix Version/s: 0.7.0 (was: 0.7.1)
Summary: cannot move a node (was: cannot decom a node then bring it back to the cluster)
> cannot move a node
> ------------------
>
> Key: CASSANDRA-1670
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1670
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: RAX
> Reporter: Matthew F. Dennis
> Fix For: 0.7.0
>
>
[jira] Updated: (CASSANDRA-1670) cannot move a node
Posted by "Gary Dusbabek (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gary Dusbabek updated CASSANDRA-1670:
-------------------------------------
Affects Version/s: 0.6.6
Fix Version/s: 0.6.7 (was: 0.7.0)
Looks like this bug goes all the way back to 0.6.
> cannot move a node
> ------------------
>
> Key: CASSANDRA-1670
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1670
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.6.6
> Environment: RAX
> Reporter: Matthew F. Dennis
> Assignee: Gary Dusbabek
> Fix For: 0.6.7
>
>