Posted to commits@cassandra.apache.org by "Chris Goffinet (JIRA)" <ji...@apache.org> on 2009/09/01 06:15:32 UTC

[jira] Created: (CASSANDRA-405) Race condition with ConcurrentLinkedHashMap

Race condition with ConcurrentLinkedHashMap
-------------------------------------------

                 Key: CASSANDRA-405
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-405
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 0.4
            Reporter: Chris Goffinet


We are seeing a race condition with ConcurrentLinkedHashMap using appendToTail. We could remove the ConcurrentLinkedHashMap for now until that's resolved.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (CASSANDRA-405) Race condition with ConcurrentLinkedHashMap

Posted by "Ben Manes (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750257#action_12750257 ] 

Ben Manes commented on CASSANDRA-405:
-------------------------------------

When performing a postmortem on this issue, please review how the ConcurrentLinkedHashMap was added.  The project page stated:

> Note: The algorithm needs further testing and is not deemed production ready. It is functional under concurrent tests, but needs additional load testing to assert correctness.

That load testing, provided in the standard unit test runs, uncovered the issue, and thus it was not promoted to release status.  I haven't had time in the last few months to work on this project, but even the last check-in notes that it's leaving debug code in place to help resolve the issue later.  The project states on the front page and in the FAQ that the goal is more educational than formal usage, hence I avoided known algorithms (which would be the correct approach if it were work-related).

The ConcurrentLRUCache uses a watermark approach, which is valid but suffers from stampeding and is an offline algorithm.  It's still an excellent approach and one of many possibilities described in the FAQ.  I am personally a fan of soft-reference-based caching for global data, which is evicted in LRU order, because it allows the GC to manage what it does best (memory!) and promotes not overburdening the application server.
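
As a rough illustration of the soft-reference idea mentioned above, here is a minimal sketch in Java. It is not code from ConcurrentLinkedHashMap or Cassandra; the class name SoftValueCache and its methods are hypothetical. The map holds each value through a SoftReference, so the garbage collector may clear values under memory pressure, and cleared entries are purged lazily through a ReferenceQueue. The order in which soft references are cleared is JVM policy (HotSpot, for instance, takes recency of access into account), not something this sketch implements.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical sketch: values are softly held, so the GC decides what to evict. */
final class SoftValueCache<K, V> {
    /** A soft reference that remembers its key so the stale map entry can be removed. */
    private static final class Entry<K, V> extends SoftReference<V> {
        final K key;

        Entry(K key, V value, ReferenceQueue<V> queue) {
            super(value, queue);
            this.key = key;
        }
    }

    private final ConcurrentMap<K, Entry<K, V>> map = new ConcurrentHashMap<>();
    private final ReferenceQueue<V> queue = new ReferenceQueue<>();

    /** Returns the cached value, or null if absent or already collected. */
    public V get(K key) {
        purge();
        Entry<K, V> e = map.get(key);
        return e == null ? null : e.get();
    }

    public void put(K key, V value) {
        purge();
        map.put(key, new Entry<>(key, value, queue));
    }

    public int size() {
        purge();
        return map.size();
    }

    /** Drops entries whose values the GC has cleared. */
    @SuppressWarnings("unchecked")
    private void purge() {
        Entry<K, V> stale;
        while ((stale = (Entry<K, V>) queue.poll()) != null) {
            // Remove only if the key is still mapped to this exact stale reference.
            map.remove(stale.key, stale);
        }
    }
}
```

Callers must treat a null from get() as a miss and reload the value, since an entry can vanish at any garbage collection.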

Please treat this as an issue where the blame is shared: third-party, as I did not stress heavily enough that this should not be used in production, and internal, for not evaluating a third-party project enough to recognize that it warned about its production status.  I will update the project page to communicate this better and provide a performant, thread-safe modification for those who need a solution.  Please re-evaluate your own internal processes to determine why the bad call was made.

I am not trying to shift blame, but my pet peeve is firefighting in production where no one learns, because then it just happens again.  It's very frustrating, even more so if I actually work there! ;-)

Cheers!
Ben

> Race condition with ConcurrentLinkedHashMap
> -------------------------------------------
>
>                 Key: CASSANDRA-405
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-405
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.4
>            Reporter: Chris Goffinet
>            Assignee: Jonathan Ellis
>             Fix For: 0.4
>
>         Attachments: 405.patch, stack.log.gz
>
>
> We are seeing a race condition with ConcurrentLinkedHashMap using appendToTail. We could remove the ConcurrentLinkedHashMap for now until that's resolved.


[jira] Updated: (CASSANDRA-405) Race condition with ConcurrentLinkedHashMap

Posted by "Chris Goffinet (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Goffinet updated CASSANDRA-405:
-------------------------------------

    Attachment: stack.log.gz

Our stack trace from the running node in question. 

> Race condition with ConcurrentLinkedHashMap
> -------------------------------------------
>
>                 Key: CASSANDRA-405
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-405
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.4
>            Reporter: Chris Goffinet
>         Attachments: stack.log.gz
>
>
> We are seeing a race condition with ConcurrentLinkedHashMap using appendToTail. We could remove the ConcurrentLinkedHashMap for now until that's resolved.


[jira] Commented: (CASSANDRA-405) Race condition with ConcurrentLinkedHashMap

Posted by "Sammy Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749758#action_12749758 ] 

Sammy Yu commented on CASSANDRA-405:
------------------------------------

In this stack trace there are two sets of threads that are stuck in the iterator
pool-1-thread-{63,62,61,59,58,54,53,51,49,47} and ROW-READ-STAGE{8,7,5,4,3,2,1}:
"ROW-READ-STAGE:8" prio=10 tid=0x00007f1b78b52000 nid=0x1945 runnable [0x0000000046532000]
   java.lang.Thread.State: RUNNABLE
	at com.reardencommerce.kernel.collections.shared.evictable.ConcurrentLinkedHashMap$Node.appendToTail(ConcurrentLinkedHashMap.java:536)
	at com.reardencommerce.kernel.collections.shared.evictable.ConcurrentLinkedHashMap.putIfAbsent(ConcurrentLinkedHashMap.java:281)
	at com.reardencommerce.kernel.collections.shared.evictable.ConcurrentLinkedHashMap.put(ConcurrentLinkedHashMap.java:256)
	at org.apache.cassandra.io.SSTableReader.getPosition(SSTableReader.java:241)
	at org.apache.cassandra.db.filter.SSTableNamesIterator.<init>(SSTableNamesIterator.java:46)
	at org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:69)
	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1445)
	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1379)
	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1398)
	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1379)
	at org.apache.cassandra.db.Table.getRow(Table.java:589)
	at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:65)
	at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:78)
	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:44)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:619)

ROW-READ-STAGE:6 is a little different (it is at line 540 of appendToTail rather than line 536):
"ROW-READ-STAGE:6" prio=10 tid=0x00007f1b78b4e000 nid=0x1943 runnable [0x0000000046330000]
   java.lang.Thread.State: RUNNABLE
	at com.reardencommerce.kernel.collections.shared.evictable.ConcurrentLinkedHashMap$Node.appendToTail(ConcurrentLinkedHashMap.java:540)
	at com.reardencommerce.kernel.collections.shared.evictable.ConcurrentLinkedHashMap.putIfAbsent(ConcurrentLinkedHashMap.java:281)
	at com.reardencommerce.kernel.collections.shared.evictable.ConcurrentLinkedHashMap.put(ConcurrentLinkedHashMap.java:256)
	at org.apache.cassandra.io.SSTableReader.getPosition(SSTableReader.java:241)
	at org.apache.cassandra.db.filter.SSTableNamesIterator.<init>(SSTableNamesIterator.java:46)
	at org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:69)
	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1445)
	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1379)
	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1398)
	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1379)
	at org.apache.cassandra.db.Table.getRow(Table.java:589)
	at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:65)
	at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:78)
	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:44)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:619)
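
The traces above show threads that are RUNNABLE yet remain inside appendToTail. One way to check for this kind of non-progress from inside a JVM is to sample stack traces twice and keep the threads whose topmost frame is the same method both times. The following is a minimal sketch using only Thread.getAllStackTraces(); the class StuckThreadFinder, its methods, and the demo busy loop are hypothetical and are not part of Cassandra or the ConcurrentLinkedHashMap library. A thread that appears in both samples is only a suspect: it may simply have re-entered the method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: flags threads whose top frame stays in one method across two samples. */
final class StuckThreadFinder {
    /** Names of live threads whose topmost frame is className.methodName right now. */
    static List<String> threadsIn(String className, String methodName) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            StackTraceElement[] frames = e.getValue();
            if (frames.length > 0
                    && frames[0].getClassName().equals(className)
                    && frames[0].getMethodName().equals(methodName)) {
                names.add(e.getKey().getName());
            }
        }
        return names;
    }

    /** Threads seen in the method in both of two samples taken pauseMillis apart. */
    static List<String> stuckIn(String className, String methodName, long pauseMillis)
            throws InterruptedException {
        List<String> first = threadsIn(className, methodName);
        Thread.sleep(pauseMillis);
        first.retainAll(threadsIn(className, methodName));
        return first;
    }

    // Demo only: a busy loop standing in for a thread that never leaves a method.
    static volatile boolean stop;

    static void spin() {
        while (!stop) {
            // busy-wait until told to stop
        }
    }
}
```

For a production node, taking several jstack dumps a few seconds apart and comparing them by hand gives the same information without touching the running code.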


[jira] Commented: (CASSANDRA-405) Race condition with ConcurrentLinkedHashMap

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755670#action_12755670 ] 

Jonathan Ellis commented on CASSANDRA-405:
------------------------------------------

CASSANDRA-423 is the cache-for-0.5 ticket.


[jira] Commented: (CASSANDRA-405) Race condition with ConcurrentLinkedHashMap

Posted by "Ben Manes (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750535#action_12750535 ] 

Ben Manes commented on CASSANDRA-405:
-------------------------------------

No "pissy"-ness, and I agree with you.  I've just been in too many firefighting sessions where the result is moving on and hitting the same problem 3 months later, and then 3 months later again.  People are usually so charged up fixing the issue that I find, perhaps incorrectly, that you have to be blunt to get their attention.  So no worries on my side. :-)

But yep, definitely my fault for a good chunk of this.  Sorry for the inconvenience.

Cheers,
Ben


[jira] Commented: (CASSANDRA-405) Race condition with ConcurrentLinkedHashMap

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750999#action_12750999 ] 

Hudson commented on CASSANDRA-405:
----------------------------------

Integrated in Cassandra #186 (See [http://hudson.zones.apache.org/hudson/job/Cassandra/186/])
    remove buggy concurrentlinkedhashmap library and lru cache.  too late in 0.4 to try to debug the library -- will revisit for 0.5.  patch by jbellis; reviewed by Chris Goffinet for 



[jira] Commented: (CASSANDRA-405) Race condition with ConcurrentLinkedHashMap

Posted by "Chris Goffinet (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750242#action_12750242 ] 

Chris Goffinet commented on CASSANDRA-405:
------------------------------------------

+1 This patch looks good. Let's remove it from 0.4 and work on a better implementation for 0.5.


[jira] Commented: (CASSANDRA-405) Race condition with ConcurrentLinkedHashMap

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750461#action_12750461 ] 

Jonathan Ellis commented on CASSANDRA-405:
------------------------------------------

Actually, I did read your svn logs enough to note that the bug referenced on the front page found by the ehcache people was fixed.  Too bad I missed the other, but it's not the end of the world.

In general, I suggest not getting all pissy with the people actually performing the "additional load testing" you claim is needed.  Early adopters are an important asset. :P

(Note that this bug is against a beta version of Cassandra, so at the lowest level the process worked: the bug was uncovered before the final release.  Although if the front page of CLRU weren't so out of date, or there had actually been an issue in the tracker for a bug known for months, the pain could have been avoided.)


[jira] Commented: (CASSANDRA-405) Race condition with ConcurrentLinkedHashMap

Posted by "Sammy Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750249#action_12750249 ] 

Sammy Yu commented on CASSANDRA-405:
------------------------------------

Yes, we have this running fine in production now.  I mentioned to jbellis some other concurrent LRU cache implementations:
http://svn.apache.org/viewvc/lucene/solr/trunk/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java?view=log
http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/search/FastLRUCache.java?view=log
that we could use in 0.5
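
For comparison with the concurrent implementations linked above, the simplest thread-safe LRU available in the JDK itself is an access-ordered LinkedHashMap with removeEldestEntry overridden, wrapped in Collections.synchronizedMap. The sketch below is only an illustration of that baseline; the class name SimpleLruCache is hypothetical, and nothing in this thread says Cassandra adopted this approach. Because every call takes one lock, including reads (which reorder the list), it trades throughput for obvious correctness.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical baseline: a coarse-grained, thread-safe LRU built from JDK classes only. */
final class SimpleLruCache {
    /** Returns a synchronized map that evicts the least recently accessed entry beyond maxEntries. */
    static <K, V> Map<K, V> create(final int maxEntries) {
        // The third constructor argument 'true' selects access order instead of insertion order.
        LinkedHashMap<K, V> lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Called after each insertion; returning true drops the eldest entry.
                return size() > maxEntries;
            }
        };
        return Collections.synchronizedMap(lru);
    }
}
```

With a capacity of 2, inserting "a" and "b", reading "a", and then inserting "c" evicts "b", because the read moved "a" to the most recently used position.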


[jira] Updated: (CASSANDRA-405) Race condition with ConcurrentLinkedHashMap

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-405:
-------------------------------------

    Attachment: 405.patch

remove LRU key cache from 0.4; the ConcurrentLinkedHashMap library is buggy (see http://code.google.com/p/concurrentlinkedhashmap/issues/detail?id=9)

I will revisit some kind of key cache early in 0.5.

> Race condition with ConcurrentLinkedHashMap
> -------------------------------------------
>
>                 Key: CASSANDRA-405
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-405
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.4
>            Reporter: Chris Goffinet
>             Fix For: 0.4
>
>         Attachments: 405.patch, stack.log.gz
>
>
> We are seeing a race condition with ConcurrentLinkedHashMap using appendToTail. We could remove the ConcurrentLinkedHashMap for now until that's resolved.


[jira] Commented: (CASSANDRA-405) Race condition with ConcurrentLinkedHashMap

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749755#action_12749755 ] 

Jonathan Ellis commented on CASSANDRA-405:
------------------------------------------

Have you checked the other nodes for stuck threads like this?


[jira] Issue Comment Edited: (CASSANDRA-405) Race condition with ConcurrentLinkedHashMap

Posted by "Sammy Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749758#action_12749758 ] 

Sammy Yu edited comment on CASSANDRA-405 at 8/31/09 9:59 PM:
-------------------------------------------------------------

We took multiple stack dumps and they are all the same.  There is no progression.

In this stack dump there are two sets of stuck threads,
pool-1-thread-{63,62,61,59,58,54,53,51,49,47} and ROW-READ-STAGE{8,7,5,4,3,2,1}:
"ROW-READ-STAGE:8" prio=10 tid=0x00007f1b78b52000 nid=0x1945 runnable [0x0000000046532000]
   java.lang.Thread.State: RUNNABLE
	at com.reardencommerce.kernel.collections.shared.evictable.ConcurrentLinkedHashMap$Node.appendToTail(ConcurrentLinkedHashMap.java:536)
	at com.reardencommerce.kernel.collections.shared.evictable.ConcurrentLinkedHashMap.putIfAbsent(ConcurrentLinkedHashMap.java:281)
	at com.reardencommerce.kernel.collections.shared.evictable.ConcurrentLinkedHashMap.put(ConcurrentLinkedHashMap.java:256)
	at org.apache.cassandra.io.SSTableReader.getPosition(SSTableReader.java:241)
	at org.apache.cassandra.db.filter.SSTableNamesIterator.<init>(SSTableNamesIterator.java:46)
	at org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:69)
	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1445)
	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1379)
	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1398)
	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1379)
	at org.apache.cassandra.db.Table.getRow(Table.java:589)
	at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:65)
	at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:78)
	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:44)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:619)

ROW-READ-STAGE:6 is a little different (it is at line 540 of appendToTail rather than line 536):
"ROW-READ-STAGE:6" prio=10 tid=0x00007f1b78b4e000 nid=0x1943 runnable [0x0000000046330000]
   java.lang.Thread.State: RUNNABLE
	at com.reardencommerce.kernel.collections.shared.evictable.ConcurrentLinkedHashMap$Node.appendToTail(ConcurrentLinkedHashMap.java:540)
	at com.reardencommerce.kernel.collections.shared.evictable.ConcurrentLinkedHashMap.putIfAbsent(ConcurrentLinkedHashMap.java:281)
	at com.reardencommerce.kernel.collections.shared.evictable.ConcurrentLinkedHashMap.put(ConcurrentLinkedHashMap.java:256)
	at org.apache.cassandra.io.SSTableReader.getPosition(SSTableReader.java:241)
	at org.apache.cassandra.db.filter.SSTableNamesIterator.<init>(SSTableNamesIterator.java:46)
	at org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:69)
	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1445)
	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1379)
	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1398)
	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1379)
	at org.apache.cassandra.db.Table.getRow(Table.java:589)
	at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:65)
	at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:78)
	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:44)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:619)

All the other pool-1 threads are parked waiting for the read lock because of the write lock:
"pool-1-thread-12425" prio=10 tid=0x00007f1b7857e000 nid=0x3fe6 waiting on condition [0x00007f1892528000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00007f1b8534e848> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:877)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1197)
	at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:594)
	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1412)
	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1379)
	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1398)
	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1379)
	at org.apache.cassandra.db.Table.getRow(Table.java:589)
	at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:65)
	at org.apache.cassandra.service.StorageProxy.weakReadLocal(StorageProxy.java:609)
	at org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:320)
	at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:92)
	at org.apache.cassandra.service.CassandraServer.getSlice(CassandraServer.java:173)
	at org.apache.cassandra.service.CassandraServer.get_slice(CassandraServer.java:213)
	at org.apache.cassandra.service.Cassandra$Processor$get_slice.process(Cassandra.java:551)
	at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:539)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:252)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:619)
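
The last trace parks on a ReentrantReadWriteLock$FairSync. One plausible reading, offered here as an assumption rather than something the traces prove, is that the threads stuck in appendToTail still hold the read lock, a writer is queued behind them, and under the fair policy every later reader queues behind that writer. The sketch below reproduces just that lock behavior with JDK classes; the class FairLockPileUp and its method are hypothetical and do not come from Cassandra.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical demo: with a fair lock, a queued writer makes new readers wait. */
final class FairLockPileUp {
    /** Returns true if a late reader acquired the read lock while a writer was queued. */
    static boolean newReaderGetsIn() throws InterruptedException {
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true); // fair policy
        lock.readLock().lock(); // this thread plays a reader that is stuck and does not unlock

        Thread writer = new Thread(() -> {
            lock.writeLock().lock();   // queues behind the stuck reader
            lock.writeLock().unlock();
        }, "writer");
        writer.start();
        while (!lock.hasQueuedThread(writer)) {
            Thread.sleep(10); // wait until the writer is actually in the wait queue
        }

        final AtomicBoolean acquired = new AtomicBoolean();
        Thread reader = new Thread(() -> {
            try {
                // The timed tryLock honors fairness; the untimed tryLock() would barge.
                if (lock.readLock().tryLock(200, TimeUnit.MILLISECONDS)) {
                    acquired.set(true);
                    lock.readLock().unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "late-reader");
        reader.start();
        reader.join();

        lock.readLock().unlock(); // release the first reader so the writer can finish
        writer.join();
        return acquired.get();
    }
}
```

If the first reader never released the lock, as a thread spinning forever would not, both the writer and every subsequent reader would wait indefinitely, which matches the shape of the pile-up in the dump.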
