You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cassandra.apache.org by Xihui He <xi...@gmail.com> on 2015/12/04 17:43:59 UTC

Infinite loop in SliceQueryFilter

Dear All,

Recently one of node in our cluster has high cpu load ~100%. It seems to me there is a infinite loop in SliceQueryFilter.

The below log is repeated in 5000ms (range_request_timeout_in_ms).
TRACE [SharedPool-Worker-11] 2015-12-04 19:25:33,418 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:b608719e728d11e5812b57f4c5416142:false:62@1444838867382000

Our version is 2.19. Here is the bt:
org.apache.cassandra.db.composites.AbstractComposite.isEmpty(AbstractComposite.java:30)
org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:76)
org.apache.cassandra.db.AbstractCell$1.computeNext(AbstractCell.java:52)
org.apache.cassandra.db.AbstractCell$1.computeNext(AbstractCell.java:46)
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:83)
org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:37)
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:82)
org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:173)
org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:156)
org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:146)
org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:125)
org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:99)
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:264)
org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:108)
org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:82)
org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:69)
org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:314)
org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65)
org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:2033)
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1876)
org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:357)
org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:85)
org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:47)
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164)
org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
java.lang.Thread.run(Thread.java:745)

Appreciate if anyone could help.

Thanks in advance,
Xihui

Re: Infinite loop in SliceQueryFilter

Posted by Xihui He <xi...@gmail.com>.
Hi Chris,

cfstats shows large partitions.

Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)
50%             5.00             60.00            770.00               924                17
75%             7.00             86.00           1109.00              1109                17
95%            14.00            103.00           4768.00              2759                17
98%            17.00            124.00           6866.00              5722                17
99%            17.00            149.00          20501.00              8239                17
Min             0.00              6.00             25.00                21                 0
Max            20.00          24601.00        5839588.00          30130992            126934

We use primary key in most our query, however I still don’t figure out the query patter hanging.
“Images" is a list of url in our tables. It seems to me that the cell has huge image urls, but I can’t find the key of that row.
Below is a piece of logs, which shows it repeatedly collected images with the same timestamp and increasing uuid in several seconds…

Many Thanks,
Xihui

TRACE [SharedPool-Worker-1] 2015-12-05 03:03:51,574 SliceQueryFilter.java:269 - collecting 0 of 2147483647: :false:0@1449255672527000!86400000
TRACE [SharedPool-Worker-1] 2015-12-05 03:03:51,575 SliceQueryFilter.java:269 - collecting 1 of 2147483647: author:false:31@1448270734894000
TRACE [SharedPool-Worker-1] 2015-12-05 03:03:51,576 SliceQueryFilter.java:269 - collecting 1 of 2147483647: content:true:4@1448270734894000
TRACE [SharedPool-Worker-1] 2015-12-05 03:03:51,576 SliceQueryFilter.java:269 - collecting 1 of 2147483647: created_at:false:8@1448270734894000
TRACE [SharedPool-Worker-1] 2015-12-05 03:03:51,577 SliceQueryFilter.java:269 - collecting 1 of 2147483647: excerpt:false:0@1448270734894000
TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,125 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:f676b9416e6b11e5989c57f4c5416142:false:39@1444384568020000
TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,125 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab84072dc11e589dd1f1c8f5df499:false:39@1444872636831000
TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,126 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab84172dc11e589dd1f1c8f5df499:false:39@1444872636831000
TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,127 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab84272dc11e589dd1f1c8f5df499:false:39@1444872636831000
TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,127 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab84372dc11e589dd1f1c8f5df499:false:39@1444872636831000
TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,128 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab84472dc11e589dd1f1c8f5df499:false:58@1444872636831000
TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,129 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab84572dc11e589dd1f1c8f5df499:false:58@1444872636831000
TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,129 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab84672dc11e589dd1f1c8f5df499:false:58@1444872636831000
TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,130 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab84772dc11e589dd1f1c8f5df499:false:58@1444872636831000
TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,131 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab84872dc11e589dd1f1c8f5df499:false:39@1444872636831000
TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,131 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab84972dc11e589dd1f1c8f5df499:false:39@1444872636831000
TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,132 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab84a72dc11e589dd1f1c8f5df499:false:39@1444872636831000
TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,132 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab84b72dc11e589dd1f1c8f5df499:false:39@1444872636831000
TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,133 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab84c72dc11e589dd1f1c8f5df499:false:39@1444872636831000
TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,133 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab84d72dc11e589dd1f1c8f5df499:false:39@1444872636831000
TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,134 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab84e72dc11e589dd1f1c8f5df499:false:39@1444872636831000
TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,134 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab84f72dc11e589dd1f1c8f5df499:false:39@1444872636831000
TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,135 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab85072dc11e589dd1f1c8f5df499:false:39@1444872636831000
TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,135 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab85172dc11e589dd1f1c8f5df499:false:39@1444872636831000
TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,136 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab85272dc11e589dd1f1c8f5df499:false:39@1444872636831000
TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,136 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab85372dc11e589dd1f1c8f5df499:false:39@1444872636831000
TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,137 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab85472dc11e589dd1f1c8f5df499:false:39@1444872636831000
TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,137 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab85572dc11e589dd1f1c8f5df499:false:39@1444872636831000
TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,138 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab85672dc11e589dd1f1c8f5df499:false:39@1444872636831000
TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,138 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab85772dc11e589dd1f1c8f5df499:false:39@1444872636831000
TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,139 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab85872dc11e589dd1f1c8f5df499:false:39@1444872636831000
TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,139 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab85972dc11e589dd1f1c8f5df499:false:39@1444872636831000
TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,140 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab85a72dc11e589dd1f1c8f5df499:false:39@1444872636831000
TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,140 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab85b72dc11e589dd1f1c8f5df499:false:39@1444872636831000


> On Dec 5, 2015, at 4:33 AM, Chris Lohfink <cl...@gmail.com> wrote:
> 
> May just be going over a lot of data. Does output of 'nodetool cfstats' show large partitions? (partition maximum bytes). "collecting 1 of 2147483647" is suspicious. Are your queries using ALLOW FILTERING or have very high limits? If trying to read 2 billion entries in 1 query you will have memory issues. May want to check with jvmtop/htop to make sure its not GCs using CPU as well. Is there a sane amount of sstables? Providing some more details can help (cfstats, cfhistograms, queries your making, schema)
> 
> Chris
> 
> On Fri, Dec 4, 2015 at 10:43 AM, Xihui He <xihuihe@gmail.com <ma...@gmail.com>> wrote:
> Dear All,
> 
> Recently one of node in our cluster has high cpu load ~100%. It seems to me there is a infinite loop in SliceQueryFilter.
> 
> The below log is repeated in 5000ms (range_request_timeout_in_ms).
> TRACE [SharedPool-Worker-11] 2015-12-04 19:25:33,418 SliceQueryFilter.java:269 - collecting 1 of 2147483647 <tel:2147483647>: images:b608719e728d11e5812b57f4c5416142:false:62@1444838867382000
> 
> Our version is 2.19. Here is the bt:
> org.apache.cassandra.db.composites.AbstractComposite.isEmpty(AbstractComposite.java:30)
> org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:76)
> org.apache.cassandra.db.AbstractCell$1.computeNext(AbstractCell.java:52)
> org.apache.cassandra.db.AbstractCell$1.computeNext(AbstractCell.java:46)
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
> org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:83)
> org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:37)
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
> org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:82)
> org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:173)
> org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:156)
> org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:146)
> org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:125)
> org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:99)
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
> org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:264)
> org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:108)
> org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:82)
> org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:69)
> org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:314)
> org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65)
> org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:2033)
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1876)
> org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:357)
> org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:85)
> org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:47)
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64)
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164)
> org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
> java.lang.Thread.run(Thread.java:745)
> 
> Appreciate if anyone could help.
> 
> Thanks in advance,
> Xihui
> 


Re: Infinite loop in SliceQueryFilter

Posted by Chris Lohfink <cl...@gmail.com>.
May just be going over a lot of data. Does output of 'nodetool cfstats'
show large partitions? (partition maximum bytes). "collecting 1 of 2147483647"
is suspicious. Are your queries using ALLOW FILTERING or have very high
limits? If trying to read 2 billion entries in 1 query you will have memory
issues. May want to check with jvmtop/htop to make sure its not GCs using
CPU as well. Is there a sane amount of sstables? Providing some more
details can help (cfstats, cfhistograms, queries your making, schema)

Chris

On Fri, Dec 4, 2015 at 10:43 AM, Xihui He <xi...@gmail.com> wrote:

> Dear All,
>
> Recently one of node in our cluster has high cpu load ~100%. It seems to
> me there is a infinite loop in SliceQueryFilter.
>
> The below log is repeated in 5000ms (range_request_timeout_in_ms).
> TRACE [SharedPool-Worker-11] 2015-12-04 19:25:33,418
> SliceQueryFilter.java:269 - collecting 1 of 2147483647:
> images:b608719e728d11e5812b57f4c5416142:false:62@1444838867382000
>
> Our version is 2.19. Here is the bt:
>
> org.apache.cassandra.db.composites.AbstractComposite.isEmpty(AbstractComposite.java:30)
>
> org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:76)
> org.apache.cassandra.db.AbstractCell$1.computeNext(AbstractCell.java:52)
> org.apache.cassandra.db.AbstractCell$1.computeNext(AbstractCell.java:46)
>
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
>
> org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:83)
>
> org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:37)
>
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
>
> org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:82)
> org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:173)
> org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:156)
>
> org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:146)
>
> org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:125)
>
> org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:99)
>
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
>
> org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:264)
>
> org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:108)
>
> org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:82)
>
> org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:69)
>
> org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:314)
>
> org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65)
>
> org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:2033)
>
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1876)
> org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:357)
>
> org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:85)
> org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:47)
>
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64)
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>
> org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164)
> org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
>
> java.lang.Thread.run(Thread.java:745)
>
> Appreciate if anyone could help.
>
> Thanks in advance,
> Xihui
>