Posted to commits@cassandra.apache.org by "Jeff Lerman (JIRA)" <ji...@apache.org> on 2010/06/24 01:56:50 UTC
[jira] Commented: (CASSANDRA-713) Stacktrace when node taken offline

    [ https://issues.apache.org/jira/browse/CASSANDRA-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881966#action_12881966 ] 

Jeff Lerman commented on CASSANDRA-713:
---------------------------------------

Hi all,

I just had this happen in Cassandra 0.6.1. We're only running two nodes for now, and the second one was barely accepting any requests; for the most part it was only being replicated to. The load went up to 9 and stayed there, so we investigated and noticed that its "Load" in nodetool was 2x as large as our other instance's. I cleared out the data and commitlogs, set autobootstrap to true, and put it back in.

This is where our case gets funky... We noticed the other instance's load going up a lot and saw that the one I had just re-added was not doing much. After a while of contemplating, I took the second one down again. Minutes later I found an open case about anticompaction happening before full bootstrapping occurs. I found the data/stream dir on the working instance and saw that it was complete... but I had already taken down the second one! So I deleted the stream dir to save space and figured I'd start the process again tomorrow.

A few hours later I am getting these Internal errors on writes:


ERROR [pool-1-thread-287117] 2010-06-23 19:16:51,754 Cassandra.java (line 1492) Internal error processing insert
java.lang.NullPointerException

Cassandra is still running, so I could send it a SIGQUIT to get a thread dump if anyone is interested in this mystery.

Thanks,

Jeff

> Stacktrace when node taken offline
> ----------------------------------
>
>                 Key: CASSANDRA-713
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-713
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.5
>            Reporter: Ryan Daum
>            Assignee: Jaakko Laine
>             Fix For: 0.5
>
>
> I took a node offline last week and then attempted to re-bootstrap its token range with a new cassandra install on the same IP. I made gossip forget about the node by restarting all other instances, then brought up the new node. It said it was bootstrapping, but it never finished bootstrapping, even after several days. The node never showed up in the ring, but when I take it offline, I get the following exception continually from all other nodes in the cluster:
> ERROR [pool-1-thread-8] 2010-01-18 21:01:32,405 Cassandra.java (line 1096) Internal error processing batch_insert
> java.lang.NullPointerException
>         at org.apache.cassandra.dht.BigIntegerToken.compareTo(BigIntegerToken.java:38)
>         at org.apache.cassandra.dht.BigIntegerToken.compareTo(BigIntegerToken.java:23)
>         at java.util.Collections.indexedBinarySearch(Collections.java:215)
>         at java.util.Collections.binarySearch(Collections.java:201)
>         at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedMapForEndpoints(AbstractReplicationStrategy.java:130)
>         at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:76)
>         at org.apache.cassandra.service.StorageService.getHintedEndpointMap(StorageService.java:1183)
>         at org.apache.cassandra.service.StorageProxy.insertBlocking(StorageProxy.java:169)
>         at org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:466)
>         at org.apache.cassandra.service.CassandraServer.batch_insert(CassandraServer.java:445)
>         at org.apache.cassandra.service.Cassandra$Processor$batch_insert.process(Cassandra.java:1088)
>         at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:817)
>         at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619)
> In addition, I get frequent UnavailableExceptions on the other nodes.
> I cannot remove the token range for this node because it never officially joined the ring.
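
The trace above shows the NullPointerException being thrown inside a compareTo call made by Collections.binarySearch. As a minimal sketch of one way that shape of trace can arise (this is not Cassandra's code, and it does not claim to identify the actual cause), the example below uses a hypothetical SketchToken class whose compareTo dereferences its argument without a null check, then searches a sorted list for a null key. All class and method names here are assumptions made for illustration.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class NullTokenSearchSketch {
    // Hypothetical stand-in for a ring token; NOT Cassandra's BigIntegerToken.
    static final class SketchToken implements Comparable<SketchToken> {
        final BigInteger value;

        SketchToken(BigInteger value) {
            this.value = value;
        }

        @Override
        public int compareTo(SketchToken other) {
            // No null check: a null `other` fails here, inside compareTo.
            return value.compareTo(other.value);
        }
    }

    // Returns true if binary-searching the sorted list for `key` throws an NPE.
    static boolean searchThrowsNpe(List<SketchToken> sortedRing, SketchToken key) {
        try {
            Collections.binarySearch(sortedRing, key);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<SketchToken> ring = new ArrayList<>();
        ring.add(new SketchToken(BigInteger.valueOf(10)));
        ring.add(new SketchToken(BigInteger.valueOf(20)));
        ring.add(new SketchToken(BigInteger.valueOf(30)));

        // A real key is found without error.
        System.out.println(searchThrowsNpe(ring, new SketchToken(BigInteger.valueOf(20))));
        // A null key reaches compareTo and triggers the NPE.
        System.out.println(searchThrowsNpe(ring, null));
    }
}
```

In this sketch the list itself is well formed; the failure comes from the key being searched for. A token whose wrapped value is null would fail in the same place.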
