You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Shalom (Jira)" <ji...@apache.org> on 2019/09/23 20:24:00 UTC

[jira] [Commented] (CASSANDRA-15172) LegacyLayout RangeTombstoneList throws IndexOutOfBoundsException

    [ https://issues.apache.org/jira/browse/CASSANDRA-15172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936164#comment-16936164 ] 

Shalom commented on CASSANDRA-15172:
------------------------------------

Hi [~benedict]

I tried the patch. It didn't do the trick, however, I was able to fully reproduce the bug.

tl;dr

The bug does occur when running queries on range tombstones, but on top of that, those queries have to specifically be range queries.

 

*+Steps to Reproduce:
+* 

CREATE KEYSPACE ks1 WITH replication = \{'class': 'NetworkTopologyStrategy', 'DC1': '3'} AND durable_writes = true;

+*TABLE:*+ 
CREATE TABLE ks1.table1 (
 col1 text,
 col2 text,
 col3 text,
 col4 text,
 col5 text,
 col6 timestamp,
 data text,
 PRIMARY KEY ((col1, col2, col3), col4, col5, col6)
);

 

Inserted ~4 million rows and created range tombstones by deleting ~1 million rows.

 

+*Create Data*+

_insert into ks1.table1 (col1, col2 , col3 , col4 , col5 , col6 , data ) VALUES ( '1', '11', '21', '1', 'a', 12312312300000, 'data');_
_insert into ks1.table1 (col1, col2 , col3 , col4 , col5 , col6 , data ) VALUES ( '1', '11', '21', '2', 'a', 12312312300000, 'data');_
_insert into ks1.table1 (col1, col2 , col3 , col4 , col5 , col6 , data ) VALUES ( '1', '11', '21', '3', 'a', 12312312300000, 'data');_
_insert into ks1.table1 (col1, col2 , col3 , col4 , col5 , col6 , data ) VALUES ( '1', '11', '21', '4', 'a', 12312312300000, 'data');_
_insert into ks1.table1 (col1, col2 , col3 , col4 , col5 , col6 , data ) VALUES ( '1', '11', '21', '5', 'a', 12312312300000, 'data');_

 

+*Create Range Tombstones*+

delete from ks1.table1 where col1='1' and col2='11' and col3='21' and col4='1';

 

+*Query Live Rows (no tombstones)*+

_select * from ks1.table1 where col1='1' and col2='201' and col3='21' and col4='1' and col5='a' and *col6>12312312300000*;_

No issues found, everything is running properly.

 

+*Query Range Tombstones*+

_select * from ks1.table1 where col1='1' and col2='11' and col3='21' and col4='1' and col5='a' and *col6=12312312300000*;_

No issues found, everything is running properly.

 

+BUT when running range queries:+

_select * from ks1.table1 where col1='1' and col2='11' and col3='21' and col4='1' and col5='a' and *col6>12312312200000;*_

WARN [ReadStage-1] 2019-09-23 14:17:10,281 AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread Thread[ReadStage-1,5,main]: {}
java.lang.ArrayIndexOutOfBoundsException: 2
 at org.apache.cassandra.db.AbstractBufferClusteringPrefix.get(AbstractBufferClusteringPrefix.java:55)
 at org.apache.cassandra.db.LegacyLayout$LegacyRangeTombstoneList.serializedSizeCompound(LegacyLayout.java:2545)
 at org.apache.cassandra.db.LegacyLayout$LegacyRangeTombstoneList.serializedSize(LegacyLayout.java:2522)
 at org.apache.cassandra.db.LegacyLayout.serializedSizeAsLegacyPartition(LegacyLayout.java:565)
 at org.apache.cassandra.db.ReadResponse$Serializer.serializedSize(ReadResponse.java:446)
 at org.apache.cassandra.db.ReadResponse$Serializer.serializedSize(ReadResponse.java:352)
 at org.apache.cassandra.net.MessageOut.payloadSize(MessageOut.java:171)
 at org.apache.cassandra.net.OutboundTcpConnectionPool.getConnection(OutboundTcpConnectionPool.java:77)
 at org.apache.cassandra.net.MessagingService.getConnection(MessagingService.java:802)
 at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:953)
 at org.apache.cassandra.net.MessagingService.sendReply(MessagingService.java:929)
 at org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:62)
 at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
 at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
 at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:114)
 at java.lang.Thread.run(Thread.java:745)

 

This WARN is constantly generated until I stop the range queries script.

Hope this helps..

Thanks!

 

> LegacyLayout RangeTombstoneList throws IndexOutOfBoundsException
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-15172
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15172
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Other
>            Reporter: Shalom
>            Assignee: Benedict
>            Priority: Normal
>             Fix For: 3.0.19, 3.11.5
>
>
> Hi All,
> This is the first time I open an issue, so apologies if I'm not following the rules properly.
>  
> After upgrading a node from version 2.1.21 to 3.11.4, we've started seeing a lot of AbstractLocalAwareExecutorService exceptions. This happened right after the node successfully started up with the new 3.11.4 binaries. 
> {noformat}
> INFO  [main] 2019-06-05 04:41:37,730 Gossiper.java:1715 - No gossip backlog; proceeding
> INFO  [main] 2019-06-05 04:41:38,036 NativeTransportService.java:70 - Netty using native Epoll event loop
> INFO  [main] 2019-06-05 04:41:38,117 Server.java:155 - Using Netty Version: [netty-buffer=netty-buffer-4.0.44.Final.452812a, netty-codec=netty-codec-4.0.44.Final.452812a, netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a, netty-codec-http=netty-codec-http-4.0.44.Final.452812a, netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a, netty-common=netty-common-4.0.44.Final.452812a, netty-handler=netty-handler-4.0.44.Final.452812a, netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb, netty-transport=netty-transport-4.0.44.Final.452812a, netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a, netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a, netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a, netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a]
> INFO  [main] 2019-06-05 04:41:38,118 Server.java:156 - Starting listening for CQL clients on /0.0.0.0:9042 (unencrypted)...
> INFO  [main] 2019-06-05 04:41:38,179 CassandraDaemon.java:556 - Not starting RPC server as requested. Use JMX (StorageService->startRPCServer()) or nodetool (enablethrift) to start it
> INFO  [Native-Transport-Requests-21] 2019-06-05 04:41:39,145 AuthCache.java:161 - (Re)initializing PermissionsCache (validity period/update interval/max entries) (2000/2000/1000)
> INFO  [OptionalTasks:1] 2019-06-05 04:41:39,729 CassandraAuthorizer.java:409 - Converting legacy permissions data
> INFO  [HANDSHAKE-/10.10.10.8] 2019-06-05 04:41:39,808 OutboundTcpConnection.java:561 - Handshaking version with /10.10.10.8
> INFO  [HANDSHAKE-/10.10.10.9] 2019-06-05 04:41:39,808 OutboundTcpConnection.java:561 - Handshaking version with /10.10.10.9
> INFO  [HANDSHAKE-dc1_02/10.10.10.6] 2019-06-05 04:41:39,809 OutboundTcpConnection.java:561 - Handshaking version with dc1_02/10.10.10.6
> WARN  [ReadStage-2] 2019-06-05 04:41:39,857 AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread Thread[ReadStage-2,5,main]: {}
> java.lang.ArrayIndexOutOfBoundsException: 1
>         at org.apache.cassandra.db.AbstractBufferClusteringPrefix.get(AbstractBufferClusteringPrefix.java:55)
>         at org.apache.cassandra.db.LegacyLayout$LegacyRangeTombstoneList.serializedSizeCompound(LegacyLayout.java:2545)
>         at org.apache.cassandra.db.LegacyLayout$LegacyRangeTombstoneList.serializedSize(LegacyLayout.java:2522)
>         at org.apache.cassandra.db.LegacyLayout.serializedSizeAsLegacyPartition(LegacyLayout.java:565)
>         at org.apache.cassandra.db.ReadResponse$Serializer.serializedSize(ReadResponse.java:446)
>         at org.apache.cassandra.db.ReadResponse$Serializer.serializedSize(ReadResponse.java:352)
>         at org.apache.cassandra.net.MessageOut.payloadSize(MessageOut.java:171)
>         at org.apache.cassandra.net.OutboundTcpConnectionPool.getConnection(OutboundTcpConnectionPool.java:77)
>         at org.apache.cassandra.net.MessagingService.getConnection(MessagingService.java:802)
>         at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:953)
>         at org.apache.cassandra.net.MessagingService.sendReply(MessagingService.java:929)
>         at org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:62)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
>         at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
>         at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:114)
>         at java.lang.Thread.run(Thread.java:745)
>  {noformat}
>  
> After several of the above warnings, the following warning appeared as well:
>  {noformat}
> WARN  [ReadStage-9] 2019-06-05 04:42:04,369 AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread Thread[ReadStage-9,5,main]: {}
> java.lang.ArrayIndexOutOfBoundsException: null
> WARN  [ReadStage-11] 2019-06-05 04:42:04,381 AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread Thread[ReadStage-11,5,main]: {}
> java.lang.ArrayIndexOutOfBoundsException: null
> WARN  [ReadStage-10] 2019-06-05 04:42:04,396 AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread Thread[ReadStage-10,5,main]: {}
> java.lang.ArrayIndexOutOfBoundsException: null
> WARN  [ReadStage-2] 2019-06-05 04:42:04,443 AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread Thread[ReadStage-2,5,main]: {}
> java.lang.ArrayIndexOutOfBoundsException: null
>  {noformat}
>  
> Then suddenly, Validation errors appeared although *no repair was running on any of the nodes*! Checked with {{ps -ef}} command and {{nodetool compactionstats}} on the entire cluster.
>  
>  {noformat}
> ERROR [ValidationExecutor:2] 2019-06-05 04:42:47,979 Validator.java:268 - Failed creating a merkle tree for [repair #e54b4090-876d-11e9-a3f4-c33d22c45471 on ks1/table1, []], /
> 10.10.10.6 (see log for details)
> ERROR [ValidationExecutor:2] 2019-06-05 04:42:47,979 CassandraDaemon.java:228 - Exception in thread Thread[ValidationExecutor:2,1,main]
> java.lang.NullPointerException: null
>         at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1363)
>         at org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:83)
>         at org.apache.cassandra.db.compaction.CompactionManager$13.call(CompactionManager.java:977)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
>         at java.lang.Thread.run(Thread.java:745)
>  {noformat}
>  
> Following those, client requests started to fail and NTR tasks started to pile up and get blocked and GC was impacted.
>  {noformat}
> INFO  [ScheduledTasks:1] 2019-06-05 04:43:11,660 StatusLogger.java:51 - Native-Transport-Requests       128       197         594810        65              2725
>  {noformat}
>  
> FWIW, these are the warnings I found during startup: 
>  {noformat}
> -WARN in net.logstash.logback.encoder.LogstashEncoder@140e5a13 - Logback version is prior to 1.2.0.  Enabling backwards compatible encoding.  Logback 1.2.1 or greater is recommended.
>  {noformat}
>  
>  {noformat}
> WARN  [main] 2019-06-05 08:44:18,568 NativeLibrary.java:187 - Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out, especially with mmapped I/O enabled. Increase RLIMIT_MEMLOCK or run Cassandra as root.
> WARN  [main] 2019-06-05 08:44:18,569 StartupChecks.java:136 - jemalloc shared library could not be preloaded to speed up memory allocations
>  
> WARN  [main] 2019-06-05 08:44:20,225 Optional.java:159 - Legacy auth tables credentials, users, permissions in keyspace system_auth still exist and have not been properly migrated.
> WARN  [MessagingService-Outgoing-dc1_03/10.10.10.4-Gossip] 2019-06-05 08:44:49,582 OutboundTcpConnection.java:486 - Seed gossip version is 8; will not connect with that version
> WARN  [MessagingService-Outgoing-dc2_02/10.20.20.4-Gossip] 2019-06-05 08:44:49,620 OutboundTcpConnection.java:486 - Seed gossip version is 8; will not connect with that version
> WARN  [MessagingService-Outgoing-dc2_01/10.20.20.1-Gossip] 2019-06-05 08:44:49,621 OutboundTcpConnection.java:486 - Seed gossip version is 8; will not connect with that version
> WARN  [MessagingService-Outgoing-dc2_03/10.20.20.5-Gossip] 2019-06-05 08:44:49,621 OutboundTcpConnection.java:486 - Seed gossip version is 8; will not connect with that version
> WARN  [GossipTasks:1] 2019-06-05 08:44:51,631 FailureDetector.java:278 - Not marking nodes down due to local pause of 30943606906 > 5000000000
>  {noformat}
>  
> We've naturally stopped the upgrade but we still wish to upgrade from 2.1.21 and hopefully find the root cause of this matter. 
> I'll be happy to provide additional details if needs be.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org