You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Chris Burroughs (JIRA)" <ji...@apache.org> on 2013/10/07 17:38:43 UTC
[jira] [Commented] (CASSANDRA-5815) NPE from migration manager
[ https://issues.apache.org/jira/browse/CASSANDRA-5815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13788228#comment-13788228 ]
Chris Burroughs commented on CASSANDRA-5815:
--------------------------------------------
I'm seeing an NPE in migration manager in 1.2.9 and what I think is the same spot (line numbers changed slightly since July). This occurs on at least one node every time (about 10 attempts) I try to bootstrap with a 2 dc production cluster using the GPFS w/ reconnecting.
{noformat}
ERROR [OptionalTasks:1] 2013-10-07 08:06:05,658 CassandraDaemon.java (line 194) Exception in thread Thread[OptionalTasks:1,5,main]
java.lang.NullPointerException
at org.apache.cassandra.service.MigrationManager$1.run(MigrationManager.java:130)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
{noformat}
I added a log message to confirm that Gossiper really really thinks it's not there (off of the 1.2.10 tag if that matters). I'm suspicious of this being a timing problem the reconnect dance, but I'm not sure how to prove or disprove that.
{noformat}
logger.warn("[csb] Trying to get endpoint state for {} ; exists {}", new Object[] {endpoint, Gossiper.instance.isKnownEndpoint(endpoint)});
INFO [GossipTasks:1] 2013-10-07 11:19:10,565 Gossiper.java (line 803) InetAddress /208.49.103.36 is now DOWN
INFO [GossipTasks:1] 2013-10-07 11:19:13,572 Gossiper.java (line 608) FatClient /208.49.103.36 has been silent for 30000ms, removing from gossip
INFO [HANDSHAKE-/208.49.103.36] 2013-10-07 11:19:13,863 OutboundTcpConnection.java (line 399) Handshaking version with /208.49.103.36
INFO [HANDSHAKE-/208.49.103.36] 2013-10-07 11:19:15,275 OutboundTcpConnection.java (line 399) Handshaking version with /208.49.103.36
WARN [OptionalTasks:1] 2013-10-07 11:19:36,696 MigrationManager.java (line 130) [csb] Trying to get endpoint state for /208.49.103.36 ; exists false
ERROR [OptionalTasks:1] 2013-10-07 11:19:36,696 CassandraDaemon.java (line 193) Exception in thread Thread[OptionalTasks:1,5,main]
java.lang.NullPointerException
at org.apache.cassandra.service.MigrationManager$1.run(MigrationManager.java:131)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
{noformat}
> NPE from migration manager
> --------------------------
>
> Key: CASSANDRA-5815
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5815
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.1.12
> Reporter: Vishy Kasar
> Assignee: Brandon Williams
> Priority: Minor
>
> In one of our production clusters we see this error often. Looking through the source, Gossiper.instance.getEndpointStateForEndpoint(endpoint) is returning null for some end point. De we need any config change on our end to resolve this? In any case, cassandra should be updated to protect against this NPE.
> ERROR [OptionalTasks:1] 2013-07-24 13:40:38,972 AbstractCassandraDaemon.java (line 132) Exception in thread Thread[OptionalTasks:1,5,main]
> java.lang.NullPointerException
> at org.apache.cassandra.service.MigrationManager$1.run(MigrationManager.java:134)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> It turned out that the reason for NPE was we bootstrapped a node with the same token as another node. Cassandra should not throw an NPE here but log a meaningful error message.
--
This message was sent by Atlassian JIRA
(v6.1#6144)