Posted to commits@cassandra.apache.org by "Rick Branson (JIRA)" <ji...@apache.org> on 2014/08/13 21:18:14 UTC

[jira] [Commented] (CASSANDRA-7246) Gossip Null Pointer Exception when a cassandra instance in ring is restarted

    [ https://issues.apache.org/jira/browse/CASSANDRA-7246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095965#comment-14095965 ] 

Rick Branson commented on CASSANDRA-7246:
-----------------------------------------

Got some similar NPEs on a node after a restart. This left that node with a broken in-memory gossip table (ugh) that didn't get fixed until the node was restarted. This host was running 1.2.18. Presumably this is getEndpointStateForEndpoint returning null inside of getHostId.

{code}
java.lang.NullPointerException
        at org.apache.cassandra.gms.Gossiper.getHostId(Gossiper.java:698)
        at org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:1521)
        at org.apache.cassandra.service.StorageService.onChange(StorageService.java:1341)
        at org.apache.cassandra.gms.Gossiper.doNotifications(Gossiper.java:975)
        at org.apache.cassandra.gms.Gossiper.applyNewStates(Gossiper.java:966)
        at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:924)
        at org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:50)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
{code}

{code}
java.lang.NullPointerException
        at org.apache.cassandra.gms.Gossiper.getHostId(Gossiper.java:698)
        at org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:1521)
        at org.apache.cassandra.service.StorageService.onChange(StorageService.java:1341)
        at org.apache.cassandra.service.StorageService.onJoin(StorageService.java:2033)
        at org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:863)
        at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:914)
        at org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:50)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
{code}
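
For anyone following along, here is a minimal sketch of the failure mode guessed at above: a state lookup that returns null for an unknown endpoint, and a caller that dereferences the result without checking. This is not the Cassandra source. The class GossipSketch, its simplified EndpointState, the String endpoint keys, and the getHostIdUnguarded/getHostIdGuarded names are all hypothetical; only the method names getEndpointStateForEndpoint and getHostId come from the traces and the comment above.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class GossipSketch {
    // Hypothetical stand-in for per-endpoint gossip state; not Cassandra's EndpointState.
    static final class EndpointState {
        final UUID hostId;
        EndpointState(UUID hostId) { this.hostId = hostId; }
    }

    // Assumption: endpoints are keyed by a plain string address for illustration.
    private final Map<String, EndpointState> endpointStateMap = new ConcurrentHashMap<>();

    void put(String endpoint, UUID hostId) {
        endpointStateMap.put(endpoint, new EndpointState(hostId));
    }

    void remove(String endpoint) {
        endpointStateMap.remove(endpoint);
    }

    // Returns null when the endpoint has no entry in the map.
    EndpointState getEndpointStateForEndpoint(String endpoint) {
        return endpointStateMap.get(endpoint);
    }

    // Unguarded: throws NullPointerException if the state is missing.
    UUID getHostIdUnguarded(String endpoint) {
        return getEndpointStateForEndpoint(endpoint).hostId;
    }

    // Guarded: reports "unknown" instead of throwing.
    Optional<UUID> getHostIdGuarded(String endpoint) {
        EndpointState state = getEndpointStateForEndpoint(endpoint);
        return state == null ? Optional.empty() : Optional.of(state.hostId);
    }

    public static void main(String[] args) {
        GossipSketch gossiper = new GossipSketch();
        gossiper.put("10.0.0.1", UUID.randomUUID());
        gossiper.remove("10.0.0.1"); // state disappears before the notification is handled

        System.out.println("guarded lookup present: "
                + gossiper.getHostIdGuarded("10.0.0.1").isPresent());
        try {
            gossiper.getHostIdUnguarded("10.0.0.1");
        } catch (NullPointerException e) {
            System.out.println("unguarded lookup threw NullPointerException");
        }
    }
}
```

Whether a null check is the right fix here, or whether the state should never be missing at that point, is a separate question; the sketch only shows why a missing entry would surface as an NPE at the getHostId frame.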

> Gossip Null Pointer Exception when a cassandra instance in ring is restarted
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7246
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7246
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: 12 node ring of 1.2.x.
> 11 of 12 are 1.2.15.
> 1 is 1.2.16.
>            Reporter: Peter Haggerty
>            Assignee: Brandon Williams
>            Priority: Minor
>              Labels: gossip, nullpointerexception
>         Attachments: 7246.txt
>
>
> 12 Cassandra instances, one per node.
> 11 of the Cassandra instances are 1.2.15.
> 1 of the Cassandra instances is 1.2.16.
> One of the eleven 1.2.15 Cassandra instances is restarted (disable thrift, gossip, then flush, drain, stop, start).
> The 1.2.16 Cassandra instance noted this by throwing a Null Pointer Exception. None of the 1.2.15 instances threw an exception and this is new behavior that hasn't been observed before.
> ERROR 02:18:06,009 Exception in thread Thread[GossipStage:1,5,main]
> java.lang.NullPointerException
>         at org.apache.cassandra.gms.Gossiper.convict(Gossiper.java:264)
>         at org.apache.cassandra.gms.FailureDetector.forceConviction(FailureDetector.java:246)
>         at org.apache.cassandra.gms.GossipShutdownVerbHandler.doVerb(GossipShutdownVerbHandler.java:37)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:724)
>  INFO 02:18:23,402 Node /10.x.y.x is now part of the cluster
>  INFO 02:18:23,403 InetAddress /10.x.y.z is now UP
>  INFO 02:18:53,494 FatClient /10.x.y.z has been silent for 30000ms, removing from gossip
>  INFO 02:19:00,031 Handshaking version with /10.x.y.z


