Posted to commits@cassandra.apache.org by "Ariel Weisberg (JIRA)" <ji...@apache.org> on 2015/11/25 22:53:10 UTC

[jira] [Commented] (CASSANDRA-10688) Stack overflow from SSTableReader$InstanceTidier.runOnClose in Leak Detector

    [ https://issues.apache.org/jira/browse/CASSANDRA-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15027675#comment-15027675 ] 

Ariel Weisberg commented on CASSANDRA-10688:
--------------------------------------------

As near as I can tell, the stack overflow is being used as a bound for code that walks an object graph depth-first, looking for a path from an object's outgoing references back to the object itself. What gets logged isn't a stack trace; it's the path through the graph that the search walked (up until it overflowed). I suspect the overflow is due to the depth of the graph: since the search is depth-first, any moderately long linked list will exhaust the stack pretty quickly.
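
To make that concrete, here is a minimal sketch (illustrative only, not the actual {{Ref.java}} code; a reflection-based traversal is assumed) of a recursive depth-first search whose recursion depth tracks the depth of the graph, so a long linked list blows the stack:

{code:java}
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.IdentityHashMap;

final class StrongRefSearch
{
    private final IdentityHashMap<Object, Boolean> visited = new IdentityHashMap<>();
    // This path (the fields followed so far), not a stack trace, is what gets logged.
    private final Deque<Field> path = new ArrayDeque<>();

    boolean findPathTo(Object current, Object target) throws IllegalAccessException
    {
        if (current == null || visited.put(current, Boolean.TRUE) != null)
            return false;
        // Superclass fields and array elements omitted for brevity.
        for (Field f : current.getClass().getDeclaredFields())
        {
            if (f.getType().isPrimitive() || Modifier.isStatic(f.getModifiers()))
                continue;
            f.setAccessible(true);
            Object child = f.get(current);
            path.push(f);
            // Recursion depth equals graph depth: following a linked list's
            // 'next' field a million times means a million stack frames.
            if (child == target || findPathTo(child, target))
                return true;
            path.pop();
        }
        return false;
    }
}
{code}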

It's also using {{Stack}}, which extends {{Vector}} (so every operation is synchronized); we should probably replace it with {{ArrayDeque}}.
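
A sketch of that swap; {{ArrayDeque}} gives the same LIFO semantics without {{Vector}}'s per-call locking:

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

public class IterativeWalk
{
    public static void main(String[] args)
    {
        Object root = new Object();
        // Before: Stack<Object> stack = new Stack<>(); // every call synchronized via Vector
        Deque<Object> stack = new ArrayDeque<>();       // same LIFO semantics, no locking
        stack.push(root);
        while (!stack.isEmpty())
        {
            Object next = stack.pop();
            // ... visit next and push its unvisited children ...
        }
    }
}
{code}

If the walk itself were driven by an explicit deque like this instead of recursion, the stack-depth problem would disappear too, though that would be a larger change.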

This is debug code that only runs if {{-Dcassandra.debugrefcount=true}} is set, so this isn't an issue in production deployments. [~jjordan] any idea why that would be set in your experiment?
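
For reference, a guard along these lines (a sketch of how such a property is typically read, not necessarily the exact code) keeps the walk out of normal deployments:

{code:java}
// Sketch: the leak-detector graph walk only runs when the JVM was started
// with -Dcassandra.debugrefcount=true; it defaults to off.
static final boolean DEBUG_ENABLED =
    Boolean.parseBoolean(System.getProperty("cassandra.debugrefcount", "false"));
{code}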

For debug purposes the code works as designed: it recovers from the stack overflow and continues searching the graph, pruning the graph at the point where the stack overflowed. The only real issue is that the resulting error message is too noisy.
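
The recovery relies on the fact that {{StackOverflowError}} is catchable like any other {{Throwable}}. A sketch of the shape, reusing the hypothetical {{StrongRefSearch}} from the earlier sketch:

{code:java}
static void searchWithRecovery(StrongRefSearch search, Object child, Object target)
{
    try
    {
        search.findPathTo(child, target);
    }
    catch (StackOverflowError e)
    {
        // StackOverflowError is an Error but can still be caught; the frames
        // have already unwound by this point. Abandon (prune) the branch that
        // overflowed, log the partial path accumulated so far, and continue
        // with the next branch of the graph.
    }
    catch (IllegalAccessException e)
    {
        // Reflection failed on an inaccessible field; skip it.
    }
}
{code}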

I think we might want to rate limit the logging, using the first N entries in the graph path as a key. I'll put that together.
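
A hypothetical sketch of that rate limit (class, method, and constant names are illustrative, not existing Cassandra APIs): key each report by the first N entries of its path and suppress repeats seen within an interval:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

final class OverflowReportLimiter
{
    private static final int KEY_ENTRIES = 10; // the "first N entries"; N is a guess here
    private static final long INTERVAL_NANOS = TimeUnit.MINUTES.toNanos(1);
    private final Map<List<String>, Long> lastLogged = new ConcurrentHashMap<>();

    boolean shouldLog(List<String> path)
    {
        // Copy the prefix so the map key is not a live view of the mutable path.
        List<String> key = new ArrayList<>(path.subList(0, Math.min(KEY_ENTRIES, path.size())));
        long now = System.nanoTime();
        Long prev = lastLogged.get(key);
        if (prev != null && now - prev < INTERVAL_NANOS)
            return false;          // same graph prefix seen recently; suppress
        lastLogged.put(key, now);  // benign race: at worst one duplicate slips through
        return true;
    }
}
{code}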

> Stack overflow from SSTableReader$InstanceTidier.runOnClose in Leak Detector
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10688
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10688
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jeremiah Jordan
>            Assignee: Ariel Weisberg
>             Fix For: 3.0.1, 3.1
>
>
> Running some tests against cassandra-3.0 9fc957cf3097e54ccd72e51b2d0650dc3e83eae0
> The tests are just running cassandra-stress write and read while adding and removing nodes from the cluster.  After the test runs when I go back through logs I find the following Stackoverflow fairly often:
> ERROR [Strong-Reference-Leak-Detector:1] 2015-11-11 00:04:10,638  Ref.java:413 - Stackoverflow [private java.lang.Runnable org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier.runOnClose, final java.lang.Runnable org.apache.cassandra.io.sstable.format.SSTableReader$DropPageCache.andThen, final org.apache.cassandra.cache.InstrumentingCache org.apache.cassandra.io.sstable.SSTableRewriter$InvalidateKeys.cache, private final org.apache.cassandra.cache.ICache org.apache.cassandra.cache.InstrumentingCache.map, private final com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap org.apache.cassandra.cache.ConcurrentLinkedHashCache.map, final com.googlecode.concurrentlinkedhashmap.LinkedDeque com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap.evictionDeque, com.googlecode.concurrentlinkedhashmap.Linked com.googlecode.concurrentlinkedhashmap.LinkedDeque.first, com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next, com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next, com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next, com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next, com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next, com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next, com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next, com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next, com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next, com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node 
> ....... (repeated a whole bunch more) .... 
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next, com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next, com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next, com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next, com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next, final java.lang.Object com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.key, public final byte[] org.apache.cassandra.cache.KeyCacheKey.key


