Posted to commits@cassandra.apache.org by "Russell Alexander Spitzer (JIRA)" <ji...@apache.org> on 2013/12/13 20:43:08 UTC

[jira] [Updated] (CASSANDRA-6485) NPE in calculateNaturalEndpoints

     [ https://issues.apache.org/jira/browse/CASSANDRA-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Russell Alexander Spitzer updated CASSANDRA-6485:
-------------------------------------------------

    Description: 
I was running a test where I added a new data center to an existing cluster. 

Test outline:
Start 25-node DC1
Set up keyspace with replication factor 3
Begin inserts against DC1 using stress
While the inserts are occurring
Start up 25-node DC2
Alter keyspace to include replication in the second DC
Run rebuild on DC2
Wait for stress to finish
Run repair on the cluster
... Some other operations

Although there are no issues with smaller clusters or clusters without vnodes, larger setups with vnodes seem to consistently hit the following exception in the logs, with one write operation failing for each exception. This usually happens between 1 and 8 times during an experiment.

The exceptions/failures occur when DC2 is brought online but *before* any alteration of the keyspace. All of the exceptions happen on DC1 nodes. One of the exceptions occurred on a seed node, though this doesn't seem to be the case most of the time.

While the test was running, nodetool was run every second to get cluster status. At no time did any nodes report themselves as down. 


{code}
ystem_logs-107.21.186.208/system.log-ERROR [Thrift:1] 2013-12-13 06:19:52,647 CustomTThreadPoolServer.java (line 217) Error occurred during processing of message.
system_logs-107.21.186.208/system.log:java.lang.NullPointerException
system_logs-107.21.186.208/system.log-	at org.apache.cassandra.locator.AbstractReplicationStrategy.getNaturalEndpoints(AbstractReplicationStrategy.java:128)
system_logs-107.21.186.208/system.log-	at org.apache.cassandra.service.StorageService.getNaturalEndpoints(StorageService.java:2624)
system_logs-107.21.186.208/system.log-	at org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:375)
system_logs-107.21.186.208/system.log-	at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:190)
system_logs-107.21.186.208/system.log-	at org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:866)
system_logs-107.21.186.208/system.log-	at org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:849)
system_logs-107.21.186.208/system.log-	at org.apache.cassandra.thrift.CassandraServer.batch_mutate(CassandraServer.java:749)
system_logs-107.21.186.208/system.log-	at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.getResult(Cassandra.java:3690)
system_logs-107.21.186.208/system.log-	at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.getResult(Cassandra.java:3678)
system_logs-107.21.186.208/system.log-	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
system_logs-107.21.186.208/system.log-	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
system_logs-107.21.186.208/system.log-	at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199)
system_logs-107.21.186.208/system.log-	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
system_logs-107.21.186.208/system.log-	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
system_logs-107.21.186.208/system.log-	at java.lang.Thread.run(Thread.java:724)
{code}

  was:
I was running a test where I added a new data center to an existing cluster. 

Test outline:
Start 25-node DC1
Set up keyspace with replication factor 3
Begin inserts against DC1 using stress
While the inserts are occurring
Start up 25-node DC2
Alter keyspace to include replication in the second DC
Run rebuild on DC2
Wait for stress to finish
Run repair on the cluster
... Some other operations

Although there are no issues with smaller clusters or clusters without vnodes, larger setups with vnodes seem to consistently hit the following exception in the logs, with one write operation failing for each exception.

The exceptions/failures occur when DC2 is brought online but *before* any alteration of the keyspace. All of the exceptions happen on DC1 nodes. One of the exceptions occurred on a seed node, though this doesn't seem to be the case most of the time.

While the test was running, nodetool was run every second to get cluster status. At no time did any nodes report themselves as down. 


{code}
ystem_logs-107.21.186.208/system.log-ERROR [Thrift:1] 2013-12-13 06:19:52,647 CustomTThreadPoolServer.java (line 217) Error occurred during processing of message.
system_logs-107.21.186.208/system.log:java.lang.NullPointerException
system_logs-107.21.186.208/system.log-	at org.apache.cassandra.locator.AbstractReplicationStrategy.getNaturalEndpoints(AbstractReplicationStrategy.java:128)
system_logs-107.21.186.208/system.log-	at org.apache.cassandra.service.StorageService.getNaturalEndpoints(StorageService.java:2624)
system_logs-107.21.186.208/system.log-	at org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:375)
system_logs-107.21.186.208/system.log-	at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:190)
system_logs-107.21.186.208/system.log-	at org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:866)
system_logs-107.21.186.208/system.log-	at org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:849)
system_logs-107.21.186.208/system.log-	at org.apache.cassandra.thrift.CassandraServer.batch_mutate(CassandraServer.java:749)
system_logs-107.21.186.208/system.log-	at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.getResult(Cassandra.java:3690)
system_logs-107.21.186.208/system.log-	at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.getResult(Cassandra.java:3678)
system_logs-107.21.186.208/system.log-	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
system_logs-107.21.186.208/system.log-	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
system_logs-107.21.186.208/system.log-	at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199)
system_logs-107.21.186.208/system.log-	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
system_logs-107.21.186.208/system.log-	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
system_logs-107.21.186.208/system.log-	at java.lang.Thread.run(Thread.java:724)
{code}


> NPE in calculateNaturalEndpoints
> --------------------------------
>
>                 Key: CASSANDRA-6485
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6485
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Russell Alexander Spitzer
>
> I was running a test where I added a new data center to an existing cluster. 
> Test outline:
> Start 25-node DC1
> Set up keyspace with replication factor 3
> Begin inserts against DC1 using stress
> While the inserts are occurring
> Start up 25-node DC2
> Alter keyspace to include replication in the second DC
> Run rebuild on DC2
> Wait for stress to finish
> Run repair on the cluster
> ... Some other operations
> Although there are no issues with smaller clusters or clusters without vnodes, larger setups with vnodes seem to consistently hit the following exception in the logs, with one write operation failing for each exception. This usually happens between 1 and 8 times during an experiment.
> The exceptions/failures occur when DC2 is brought online but *before* any alteration of the keyspace. All of the exceptions happen on DC1 nodes. One of the exceptions occurred on a seed node, though this doesn't seem to be the case most of the time.
> While the test was running, nodetool was run every second to get cluster status. At no time did any nodes report themselves as down. 
> {code}
> ystem_logs-107.21.186.208/system.log-ERROR [Thrift:1] 2013-12-13 06:19:52,647 CustomTThreadPoolServer.java (line 217) Error occurred during processing of message.
> system_logs-107.21.186.208/system.log:java.lang.NullPointerException
> system_logs-107.21.186.208/system.log-	at org.apache.cassandra.locator.AbstractReplicationStrategy.getNaturalEndpoints(AbstractReplicationStrategy.java:128)
> system_logs-107.21.186.208/system.log-	at org.apache.cassandra.service.StorageService.getNaturalEndpoints(StorageService.java:2624)
> system_logs-107.21.186.208/system.log-	at org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:375)
> system_logs-107.21.186.208/system.log-	at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:190)
> system_logs-107.21.186.208/system.log-	at org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:866)
> system_logs-107.21.186.208/system.log-	at org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:849)
> system_logs-107.21.186.208/system.log-	at org.apache.cassandra.thrift.CassandraServer.batch_mutate(CassandraServer.java:749)
> system_logs-107.21.186.208/system.log-	at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.getResult(Cassandra.java:3690)
> system_logs-107.21.186.208/system.log-	at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.getResult(Cassandra.java:3678)
> system_logs-107.21.186.208/system.log-	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
> system_logs-107.21.186.208/system.log-	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
> system_logs-107.21.186.208/system.log-	at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199)
> system_logs-107.21.186.208/system.log-	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> system_logs-107.21.186.208/system.log-	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> system_logs-107.21.186.208/system.log-	at java.lang.Thread.run(Thread.java:724)
> {code}
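
For anyone trying to reproduce this, a rough way to tally how often the exception shows up per node is to count the matching lines in the collected logs. The sketch below is only an illustration and not part of the original test harness. It assumes the excerpt above is grep output with context lines, where grep marks a direct match as `<file>:<text>` and a context line as `<file>-<text>`, and that logs are collected into `system_logs-<ip>/system.log` as the excerpt suggests. The names `count_npes` and `MATCH_LINE` are hypothetical.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt: system_logs-<ip>/system.log
# Only lines with a ':' after the file name are direct grep matches;
# context lines use '-' instead and are skipped by this pattern.
MATCH_LINE = re.compile(r"^system_logs-(?P<host>[\d.]+)/system\.log:(?P<text>.*)$")


def count_npes(grep_output: str) -> Counter:
    """Count NullPointerException match lines per host IP."""
    counts = Counter()
    for line in grep_output.splitlines():
        m = MATCH_LINE.match(line.strip())
        if m and "java.lang.NullPointerException" in m.group("text"):
            counts[m.group("host")] += 1
    return counts


# Three lines copied from the trace in this report.
sample = """\
system_logs-107.21.186.208/system.log-ERROR [Thrift:1] 2013-12-13 06:19:52,647 CustomTThreadPoolServer.java (line 217) Error occurred during processing of message.
system_logs-107.21.186.208/system.log:java.lang.NullPointerException
system_logs-107.21.186.208/system.log-\tat org.apache.cassandra.locator.AbstractReplicationStrategy.getNaturalEndpoints(AbstractReplicationStrategy.java:128)
"""

if __name__ == "__main__":
    for host, n in count_npes(sample).items():
        print(host, n)
```

Comparing the per-host counts against the number of failed writes reported by stress would be one way to check the one-failure-per-exception observation.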



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)