Posted to commits@cassandra.apache.org by "Chris Burroughs (JIRA)" <ji...@apache.org> on 2013/10/07 17:38:43 UTC
[jira] [Commented] (CASSANDRA-5815) NPE from migration manager

    [ https://issues.apache.org/jira/browse/CASSANDRA-5815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13788228#comment-13788228 ] 

Chris Burroughs commented on CASSANDRA-5815:
--------------------------------------------

I'm seeing an NPE in MigrationManager on 1.2.9, at what I think is the same spot (the line numbers have shifted slightly since July). It occurs on at least one node every time I try to bootstrap (about 10 attempts so far) into a 2-DC production cluster using GPFS (GossipingPropertyFileSnitch) with reconnecting.

{noformat}
ERROR [OptionalTasks:1] 2013-10-07 08:06:05,658 CassandraDaemon.java (line 194) Exception in thread Thread[OptionalTasks:1,5,main]
java.lang.NullPointerException
        at org.apache.cassandra.service.MigrationManager$1.run(MigrationManager.java:130)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
{noformat}

I added a log message to confirm that Gossiper really does think the endpoint is not there (built off the 1.2.10 tag, if that matters). I suspect a timing problem in the reconnect dance, but I'm not sure how to prove or disprove that.

{noformat}
                    logger.warn("[csb] Trying to get endpoint state for {} ; exists {}", new Object[] {endpoint, Gossiper.instance.isKnownEndpoint(endpoint)});

 INFO [GossipTasks:1] 2013-10-07 11:19:10,565 Gossiper.java (line 803) InetAddress /208.49.103.36 is now DOWN
 INFO [GossipTasks:1] 2013-10-07 11:19:13,572 Gossiper.java (line 608) FatClient /208.49.103.36 has been silent for 30000ms, removing from gossip
 INFO [HANDSHAKE-/208.49.103.36] 2013-10-07 11:19:13,863 OutboundTcpConnection.java (line 399) Handshaking version with /208.49.103.36
 INFO [HANDSHAKE-/208.49.103.36] 2013-10-07 11:19:15,275 OutboundTcpConnection.java (line 399) Handshaking version with /208.49.103.36
 WARN [OptionalTasks:1] 2013-10-07 11:19:36,696 MigrationManager.java (line 130) [csb] Trying to get endpoint state for /208.49.103.36 ; exists false
ERROR [OptionalTasks:1] 2013-10-07 11:19:36,696 CassandraDaemon.java (line 193) Exception in thread Thread[OptionalTasks:1,5,main]
java.lang.NullPointerException
        at org.apache.cassandra.service.MigrationManager$1.run(MigrationManager.java:131)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
{noformat}
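The log above suggests one possible ordering: the endpoint is removed from gossip as a FatClient, and a migration task (presumably scheduled before the removal) runs about 23 seconds later and finds no state for it. A minimal sketch of that race, and of re-checking for missing state when the delayed task actually runs, assuming a simple endpoint-to-schema-version map. `EndpointStates`, `DelayedSchemaPull` and their methods are hypothetical names for illustration, not Cassandra's API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the gossip endpoint-state map (endpoint -> schema version).
// Not Cassandra's Gossiper API; it only models "a lookup may return null".
final class EndpointStates {
    private final Map<String, String> schemaByEndpoint = new ConcurrentHashMap<>();

    void put(String endpoint, String schemaVersion) { schemaByEndpoint.put(endpoint, schemaVersion); }

    // Models a fat client being removed from gossip after going silent.
    void evict(String endpoint) { schemaByEndpoint.remove(endpoint); }

    // Returns null when the endpoint is no longer known.
    String schemaFor(String endpoint) { return schemaByEndpoint.get(endpoint); }
}

final class DelayedSchemaPull {
    // Called when the delayed task fires; it re-reads state at that moment,
    // because the endpoint may have vanished after the task was scheduled.
    static String decide(EndpointStates states, String endpoint, String localVersion) {
        String remoteVersion = states.schemaFor(endpoint);
        if (remoteVersion == null)
            return "skip: no gossip state for " + endpoint; // guard instead of dereferencing null
        if (remoteVersion.equals(localVersion))
            return "skip: schema already matches";
        return "pull schema from " + endpoint;
    }

    // Schedules the decision after a delay on the given executor.
    static ScheduledFuture<String> schedule(ScheduledExecutorService executor, EndpointStates states,
                                            String endpoint, String localVersion, long delayMs) {
        return executor.schedule(() -> decide(states, endpoint, localVersion), delayMs, TimeUnit.MILLISECONDS);
    }
}
```

In the log, the FatClient removal at 11:19:13 precedes the failing task at 11:19:36; in the sketch, that corresponds to `evict` being called before the scheduled task fires, which now yields a "skip" result rather than an NPE.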

> NPE from migration manager
> --------------------------
>
>                 Key: CASSANDRA-5815
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5815
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1.12
>            Reporter: Vishy Kasar
>            Assignee: Brandon Williams
>            Priority: Minor
>
> In one of our production clusters we see this error often. Looking through the source, Gossiper.instance.getEndpointStateForEndpoint(endpoint) is returning null for some endpoint. Do we need any config change on our end to resolve this? In any case, Cassandra should be updated to protect against this NPE.
> ERROR [OptionalTasks:1] 2013-07-24 13:40:38,972 AbstractCassandraDaemon.java (line 132) Exception in thread Thread[OptionalTasks:1,5,main] 
> java.lang.NullPointerException 
> at org.apache.cassandra.service.MigrationManager$1.run(MigrationManager.java:134) 
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) 
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) 
> at java.util.concurrent.FutureTask.run(FutureTask.java:138) 
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98) 
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206) 
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) 
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) 
> at java.lang.Thread.run(Thread.java:662)
> It turned out that the reason for the NPE was that we had bootstrapped a node with the same token as another node. Cassandra should not throw an NPE here but should log a meaningful error message.
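The original report attributes its NPE to bootstrapping a node with the same token as an existing node. A minimal sketch of an up-front check that would turn that mistake into a meaningful error, assuming tokens are compared as plain strings in a token-to-owner map. `TokenOwnership` and `checkBootstrapTokens` are hypothetical names for illustration, not Cassandra's API.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical token -> owning endpoint view; a simplified stand-in, not Cassandra's TokenMetadata.
final class TokenOwnership {
    private final Map<String, String> ownerByToken = new HashMap<>();

    void assign(String token, String endpoint) { ownerByToken.put(token, endpoint); }

    // Throws with a readable message if any requested token is already owned
    // by a different endpoint, instead of letting the collision surface later.
    void checkBootstrapTokens(String joiningEndpoint, Collection<String> tokens) {
        for (String token : tokens) {
            String owner = ownerByToken.get(token);
            if (owner != null && !owner.equals(joiningEndpoint)) {
                throw new IllegalStateException(String.format(
                    "Token %s is already owned by %s; refusing to bootstrap %s with it",
                    token, owner, joiningEndpoint));
            }
        }
    }
}
```

Failing before the node joins would keep the error next to its cause, rather than letting it show up later as a missing endpoint state in a background task.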



--
This message was sent by Atlassian JIRA
(v6.1#6144)