Posted to commits@cassandra.apache.org by "Ryan Daum (JIRA)" <ji...@apache.org> on 2010/01/18 22:10:54 UTC

[jira] Created: (CASSANDRA-713) Stacktrace when node taken offline

Stacktrace when node taken offline
----------------------------------

                 Key: CASSANDRA-713
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-713
             Project: Cassandra
          Issue Type: Bug
    Affects Versions: 0.5
            Reporter: Ryan Daum


I took a node offline last week and then attempted to re-bootstrap its token range with a new Cassandra install on the same IP. I made gossip forget about the node by restarting all other instances, then brought up the new node. It said it was bootstrapping, but it had still not finished after several days. The node never showed up in the ring, but when I take it offline, I get the following exception continually from all other nodes in the cluster:

ERROR [pool-1-thread-8] 2010-01-18 21:01:32,405 Cassandra.java (line 1096) Internal error processing batch_insert
java.lang.NullPointerException
        at org.apache.cassandra.dht.BigIntegerToken.compareTo(BigIntegerToken.java:38)
        at org.apache.cassandra.dht.BigIntegerToken.compareTo(BigIntegerToken.java:23)
        at java.util.Collections.indexedBinarySearch(Collections.java:215)
        at java.util.Collections.binarySearch(Collections.java:201)
        at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedMapForEndpoints(AbstractReplicationStrategy.java:130)
        at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:76)
        at org.apache.cassandra.service.StorageService.getHintedEndpointMap(StorageService.java:1183)
        at org.apache.cassandra.service.StorageProxy.insertBlocking(StorageProxy.java:169)
        at org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:466)
        at org.apache.cassandra.service.CassandraServer.batch_insert(CassandraServer.java:445)
        at org.apache.cassandra.service.Cassandra$Processor$batch_insert.process(Cassandra.java:1088)
        at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:817)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

In addition, I get frequent UnavailableExceptions on the other nodes.

I cannot remove the token range for this node because it never officially joined the ring.
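
The trace above shows the NullPointerException surfacing inside a token comparison that Collections.binarySearch performs over the ring. As a minimal sketch of that failure shape, and not Cassandra's actual code, the following assumes a null token reaches the search as the key; the Token class and the ring values here are hypothetical stand-ins, and the report itself does not establish where the null came from.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class NullTokenSearch {
    // Hypothetical stand-in for a BigInteger-backed token; not Cassandra's class.
    static final class Token implements Comparable<Token> {
        final BigInteger value;

        Token(BigInteger value) {
            this.value = value;
        }

        @Override
        public int compareTo(Token other) {
            // Dereferences 'other' without a null check, so a null key fails here.
            return value.compareTo(other.value);
        }
    }

    // Returns true if searching the sorted ring for 'key' throws a NullPointerException.
    static boolean searchThrowsNpe(List<Token> sortedRing, Token key) {
        try {
            Collections.binarySearch(sortedRing, key);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Made-up token values, already in sorted order.
        List<Token> ring = new ArrayList<>();
        for (long t : new long[] {10, 20, 30, 40, 50}) {
            ring.add(new Token(BigInteger.valueOf(t)));
        }
        Token known = new Token(BigInteger.valueOf(30));
        System.out.println("valid key throws NPE: " + searchThrowsNpe(ring, known));
        System.out.println("null key throws NPE:  " + searchThrowsNpe(ring, null));
    }
}
```

A null element stored in the list, or a token whose wrapped value is null, would fail in the same comparison, so the sketch narrows the symptom rather than the root cause.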

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-713) Stacktrace when node taken offline

Posted by "Ryan Daum (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806374#action_12806374 ] 

Ryan Daum commented on CASSANDRA-713:
-------------------------------------

I think this is a bug, and a critical one, for two reasons:

   * With an RF of 3 and a cluster size of 6, I would not expect client insert operations to fail with the loss of a single node.
   * Further, the other nodes should notice via gossip, within a few seconds, that the 6th node has gone offline, and stop trying to send traffic to it.

What I have now is a node that is not really a member of the cluster proper, but that I can't take offline without making the entire cluster useless.
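
The expectation in the first point can be checked with simple arithmetic. This is a sketch under an assumption the comment does not state: that writes are issued at quorum consistency, where a quorum is a strict majority of the replicas (RF / 2 + 1). The class and method names are illustrative only.

```java
public class QuorumCheck {
    // Quorum size for a given replication factor: a strict majority of replicas.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // True if a quorum write can still be satisfied when 'downReplicas' of the
    // replicas for a key are unavailable.
    static boolean quorumWriteSurvives(int replicationFactor, int downReplicas) {
        int live = replicationFactor - downReplicas;
        return live >= quorum(replicationFactor);
    }

    public static void main(String[] args) {
        int rf = 3; // replication factor from the report
        System.out.println("quorum for RF " + rf + " = " + quorum(rf));
        System.out.println("survives 1 down replica:  " + quorumWriteSurvives(rf, 1));
        System.out.println("survives 2 down replicas: " + quorumWriteSurvives(rf, 2));
    }
}
```

With RF 3 a quorum is 2, so one unavailable replica should still leave enough live replicas. That is consistent with the point that the observed failures come from the internal error, not from a genuine lack of replicas.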



[jira] Commented: (CASSANDRA-713) Stacktrace when node taken offline

Posted by "Ryan Daum (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806371#action_12806371 ] 

Ryan Daum commented on CASSANDRA-713:
-------------------------------------

I am getting the same exception on the same configuration (RF 3, RandomPartitioner, 6 nodes, about 5-10 GB of data per node) again with 0.5 final from the Debian packages when I do the following:

1. run 'loadbalance' on 6th node
2. see, after a reasonable amount of time: 

 INFO [STREAM-STAGE:1] 2010-01-28 18:32:26,624 BootStrapper.java (line 119) New token will be 62831091626283968915592956651596253668 to assume load from /10.252.90.224
 INFO [STREAM-STAGE:1] 2010-01-28 18:32:26,624 StorageService.java (line 1392) re-bootstrapping to new token 62831091626283968915592956651596253668
 INFO [STREAM-STAGE:1] 2010-01-28 18:32:26,625 StorageService.java (line 342) bootstrap sleeping 30000

3. wait what seems a very unreasonable number of hours (in this case 24 hours; the local node had 5 gigs of data in it). node has not rejoined the ring with a new token range. get very concerned that nothing is happening.
4. try taking the node offline; all other nodes in the cluster now complain incessantly with the same stack trace reported originally.
5. notice that all writing clients now get exceptions on _all_ writes:

Caused by: org.apache.thrift.TApplicationException: Internal error processing insert
        at org.apache.thrift.TApplicationException.read(TApplicationException.java:107)
        at org.apache.cassandra.service.Cassandra$Client.recv_insert(Cassandra.java:569)
        at org.apache.cassandra.service.Cassandra$Client.insert(Cassandra.java:547)

6. restarting the node makes the exceptions go away and reports:

 INFO [STREAM-STAGE:1] 2010-01-28 18:32:26,624 BootStrapper.java (line 119) New token will be 62831091626283968915592956651596253668 to assume load from /10.252.90.224
 INFO [STREAM-STAGE:1] 2010-01-28 18:32:26,624 StorageService.java (line 1392) re-bootstrapping to new token 62831091626283968915592956651596253668
 INFO [STREAM-STAGE:1] 2010-01-28 18:32:26,625 StorageService.java (line 342) bootstrap sleeping 30000

but if past experience is any judge, this will never lead to this node rejoining the ring.



[jira] Commented: (CASSANDRA-713) Stacktrace when node taken offline

Posted by "Ryan Daum (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803120#action_12803120 ] 

Ryan Daum commented on CASSANDRA-713:
-------------------------------------

Replication strategy is RackUnaware, replication factor 3, RandomPartitioner, 6 nodes.

I did restart the nodes at least partially one by one, so that may be part of the issue.

Today I needed to remove a bunch of data we are no longer using, so I used that opportunity to bring all nodes down, delete the largest keyspace and all commitlogs (didn't need the data in them), and bring them back up in an orderly fashion. This time the errant node bootstrapped correctly, so I assume this is related to gossip holding onto a memory of the node.

Still, this seems like a bug -- I just wish I could give you a better recipe on how to reproduce it.
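
For readers unfamiliar with the configuration described in this comment (rack-unaware placement, replication factor 3, 6 nodes), here is a simplified model of how replicas are chosen on a token ring: find the first node token at or after the key's token, then take the following nodes clockwise. It is a sketch only, not Cassandra's implementation, and the small token values are made up for readability.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class RingPlacementSketch {
    // Returns the indexes (into sortedTokens) of the nodes that would hold replicas
    // for keyToken: the first node at or after the key, then the next nodes clockwise.
    static List<Integer> replicaIndexes(List<BigInteger> sortedTokens,
                                        BigInteger keyToken,
                                        int replicationFactor) {
        int pos = Collections.binarySearch(sortedTokens, keyToken);
        if (pos < 0) {
            pos = -pos - 1; // insertion point: index of the first token greater than the key
        }
        if (pos == sortedTokens.size()) {
            pos = 0; // past the last token: wrap around to the start of the ring
        }
        List<Integer> replicas = new ArrayList<>();
        int count = Math.min(replicationFactor, sortedTokens.size());
        for (int i = 0; i < count; i++) {
            replicas.add((pos + i) % sortedTokens.size());
        }
        return replicas;
    }

    public static void main(String[] args) {
        // Six nodes with illustrative tokens, kept in sorted order.
        List<BigInteger> ring = new ArrayList<>();
        for (long t : new long[] {10, 20, 30, 40, 50, 60}) {
            ring.add(BigInteger.valueOf(t));
        }
        System.out.println(replicaIndexes(ring, BigInteger.valueOf(25), 3));
        System.out.println(replicaIndexes(ring, BigInteger.valueOf(65), 3));
    }
}
```

In this model every write has to search the sorted token list, which is the same kind of lookup that appears in the reported stack trace.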



[jira] Commented: (CASSANDRA-713) Stacktrace when node taken offline

Posted by "Ryan Daum (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832092#action_12832092 ] 

Ryan Daum commented on CASSANDRA-713:
-------------------------------------

I was able to rescue myself from this situation by restarting the node, so that does sound like a plausible explanation.

If I run into this again I will get you a thread stack dump from the running process.



[jira] Commented: (CASSANDRA-713) Stacktrace when node taken offline

Posted by "Jaakko Laine (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802083#action_12802083 ] 

Jaakko Laine commented on CASSANDRA-713:
----------------------------------------

Let me try to meditate after two cups of coffee. That usually helps :)

In the meantime a question for Ryan: Did you restart nodes one by one or did you stop the whole cluster and only then restart nodes?




[jira] Issue Comment Edited: (CASSANDRA-713) Stacktrace when node taken offline

Posted by "Jaakko Laine (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802083#action_12802083 ] 

Jaakko Laine edited comment on CASSANDRA-713 at 1/19/10 4:34 AM:
-----------------------------------------------------------------

Let me try to meditate after two cups of coffee. That usually helps :)

In the meantime a question for Ryan: Did you restart nodes one by one or did you stop the whole cluster and only then restart nodes?

Edit: Also, which replication strategy are you using?




[jira] Commented: (CASSANDRA-713) Stacktrace when node taken offline

Posted by "Gary Dusbabek (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832090#action_12832090 ] 

Gary Dusbabek commented on CASSANDRA-713:
-----------------------------------------

Ryan,  if you're still running into this, can you send the stuck process a SIGQUIT (kill -3) to get a nice dump of all the threads?  I suspect a few threads are deadlocked.
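
The kill -3 route is the one to use against a running server from outside. As a complementary sketch, the JVM also exposes the same kind of information programmatically through the standard java.lang.management API; the code below only inspects the JVM it runs in, and the class and method names are illustrative. It assumes a JVM that supports synchronizer monitoring, which common HotSpot-based JVMs do.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadlockProbe {
    // Returns the number of threads currently deadlocked in this JVM (0 if none).
    static int deadlockedThreadCount() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null when no deadlock is found
        return ids == null ? 0 : ids.length;
    }

    // Builds a text dump of all live threads in this JVM.
    // ThreadInfo.toString() abbreviates deep stacks, so this is less complete
    // than the dump a SIGQUIT produces.
    static String dumpAllThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            out.append(info.toString());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + deadlockedThreadCount());
        System.out.print(dumpAllThreads());
    }
}
```

A non-zero count would support the deadlock suspicion; a zero count would not rule out threads that are merely blocked or waiting on I/O.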



[jira] Updated: (CASSANDRA-713) Stacktrace when node taken offline

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-713:
-------------------------------------

    Fix Version/s: 0.5
         Assignee: Jaakko Laine

Any psychic insight, Jaakko? :)

> Stacktrace when node taken offline
> ----------------------------------
>
>                 Key: CASSANDRA-713
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-713
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.5
>            Reporter: Ryan Daum
>            Assignee: Jaakko Laine
>             Fix For: 0.5
>
>
> I took a node offline last week and then attempted to re-bootstrap its token range with a new cassandra install on the same IP. I made gossip forget about the node by restarting all other instances, then brought up the new node. It said was bootstrapping, but it never finished bootstrapping after several days. The node never showed up in the ring, but when I take it offline, I get the following exception continually from all other nodes in the cluster:
> ERROR [pool-1-thread-8] 2010-01-18 21:01:32,405 Cassandra.java (line 1096) Internal error processing batch_insert
> java.lang.NullPointerException
>         at org.apache.cassandra.dht.BigIntegerToken.compareTo(BigIntegerToken.java:38)
>         at org.apache.cassandra.dht.BigIntegerToken.compareTo(BigIntegerToken.java:23)
>         at java.util.Collections.indexedBinarySearch(Collections.java:215)
>         at java.util.Collections.binarySearch(Collections.java:201)
>         at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedMapForEndpoints(AbstractReplicationStrategy.java:130)
>         at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:76)
>         at org.apache.cassandra.service.StorageService.getHintedEndpointMap(StorageService.java:1183)
>         at org.apache.cassandra.service.StorageProxy.insertBlocking(StorageProxy.java:169)
>         at org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:466)
>         at org.apache.cassandra.service.CassandraServer.batch_insert(CassandraServer.java:445)
>         at org.apache.cassandra.service.Cassandra$Processor$batch_insert.process(Cassandra.java:1088)
>         at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:817)
>         at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619)
> In addition, I get frequent UnavailableExceptions on the other nodes.
> I cannot remove the token range for this node because it never officially joined the ring.
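
Note on the trace above: the NullPointerException is raised inside compareTo, which is itself called from Collections.binarySearch. One plausible reading (an assumption, not confirmed anywhere in this thread) is that the token being searched for was null. The sketch below is not Cassandra code; NullTokenSearchSketch, SketchToken and searchThrowsNpe are hypothetical names used only to reproduce that failure shape with the standard library.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class NullTokenSearchSketch {

    // Hypothetical stand-in for a ring token; NOT org.apache.cassandra.dht.BigIntegerToken.
    static final class SketchToken implements Comparable<SketchToken> {
        final BigInteger value;

        SketchToken(long value) {
            this.value = BigInteger.valueOf(value);
        }

        @Override
        public int compareTo(SketchToken other) {
            // Dereferences the argument, so a null search key fails on this line.
            return value.compareTo(other.value);
        }
    }

    // Returns true if searching the sorted ring for the key throws NullPointerException.
    static boolean searchThrowsNpe(List<SketchToken> sortedRing, SketchToken key) {
        try {
            // binarySearch calls element.compareTo(key) on the elements it probes.
            Collections.binarySearch(sortedRing, key);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<SketchToken> ring = new ArrayList<>();
        ring.add(new SketchToken(10));
        ring.add(new SketchToken(20));
        ring.add(new SketchToken(30));

        System.out.println("known token throws NPE: " + searchThrowsNpe(ring, new SketchToken(20)));
        System.out.println("null token throws NPE:  " + searchThrowsNpe(ring, null));
    }
}
```

If this reading is right, the question for the real code would be how an endpoint without a token reached the lookup. A null check at the search site would only hide the symptom.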

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (CASSANDRA-713) Stacktrace when node taken offline

Posted by "Gary Dusbabek (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Dusbabek resolved CASSANDRA-713.
-------------------------------------

    Resolution: Cannot Reproduce

Closing for now since we cannot duplicate the problem.



[jira] Commented: (CASSANDRA-713) Stacktrace when node taken offline

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803728#action_12803728 ] 

Jonathan Ellis commented on CASSANDRA-713:
------------------------------------------

+1

are you comfortable committing this to 0.5 branch?



[jira] Commented: (CASSANDRA-713) Stacktrace when node taken offline

Posted by "Jaakko Laine (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802099#action_12802099 ] 

Jaakko Laine commented on CASSANDRA-713:
----------------------------------------

Two additional things:

How many nodes do you have, and what replication factor are you using? If the replication factor is smaller than the total number of nodes, could you please verify that this happens on all nodes?




[jira] Commented: (CASSANDRA-713) Stacktrace when node taken offline

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832207#action_12832207 ] 

Jonathan Ellis commented on CASSANDRA-713:
------------------------------------------

(This may have been caused by CASSANDRA-778, in which case, it is also fixed now.)



[jira] Updated: (CASSANDRA-713) Stacktrace when node taken offline

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-713:
-------------------------------------

    Comment: was deleted

(was: +1

are you comfortable committing this to 0.5 branch?)
