Posted to commits@cassandra.apache.org by "Gary Dusbabek (JIRA)" <ji...@apache.org> on 2010/02/08 19:49:32 UTC

[jira] Updated: (CASSANDRA-778) Gossiper thread deadlock

     [ https://issues.apache.org/jira/browse/CASSANDRA-778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Dusbabek updated CASSANDRA-778:
------------------------------------

    Attachment: 0001-fix-deadlock.patch

> Gossiper thread deadlock
> ------------------------
>
>                 Key: CASSANDRA-778
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-778
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6
>            Reporter: Gary Dusbabek
>            Assignee: Gary Dusbabek
>             Fix For: 0.6
>
>         Attachments: 0001-fix-deadlock.patch
>
>
> Found this while attempting to bootstrap a node with more than a trivial amount of data:
> Found one Java-level deadlock:
> =============================
> "GMFD:1":
>   waiting to lock monitor 0x0000000100861d60 (object 0x00000001066a7ed8, a org.apache.cassandra.service.StorageService),
>   which is held by "main"
> "main":
>   waiting to lock monitor 0x0000000100860710 (object 0x0000000106c7c968, a org.apache.cassandra.gms.Gossiper),
>   which is held by "GMFD:1"
> Java stack information for the threads listed above:
> ===================================================
> "GMFD:1":
> 	at org.apache.cassandra.service.StorageService.getReplicationStrategy(StorageService.java:226)
> 	- waiting to lock <0x00000001066a7ed8> (a org.apache.cassandra.service.StorageService)
> 	at org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:634)
> 	at org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:502)
> 	at org.apache.cassandra.service.StorageService.onChange(StorageService.java:445)
> 	at org.apache.cassandra.service.StorageService.onJoin(StorageService.java:812)
> 	at org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:607)
> 	at org.apache.cassandra.gms.Gossiper.handleNewJoin(Gossiper.java:582)
> 	at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:649)
> 	- locked <0x0000000106c7c968> (a org.apache.cassandra.gms.Gossiper)
> 	at org.apache.cassandra.gms.Gossiper$GossipDigestAck2VerbHandler.doVerb(Gossiper.java:1061)
> 	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:40)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> 	at java.lang.Thread.run(Thread.java:637)
> "main":
> 	at org.apache.cassandra.gms.Gossiper.addLocalApplicationState(Gossiper.java:861)
> 	- waiting to lock <0x0000000106c7c968> (a org.apache.cassandra.gms.Gossiper)
> 	at org.apache.cassandra.service.StorageService.startBootstrap(StorageService.java:347)
> 	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:318)
> 	- locked <0x00000001066a7ed8> (a org.apache.cassandra.service.StorageService)
> 	at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:99)
> 	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:174)
> Found 1 deadlock.
> The main thread acquires the StorageService (SS) lock in initServer and does not release it before attempting to acquire the Gossiper lock in addLocalApplicationState.  Meanwhile, the gossip stage (GMFD:1) acquires the Gossiper lock in applyStateLocally and then attempts to acquire the SS lock in getReplicationStrategy.
> The solution is either finer-grained locking on the affected resource in SS (the map of replication strategies) or moving that collection to a different class (DD maybe?).  This deadlock was introduced in CASSANDRA-620.
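
The lock ordering described above, and the first of the two proposed remedies, can be sketched roughly as follows. This is not the attached patch and not Cassandra code: the class LockOrderSketch, both lock objects, the String-valued strategy map, and the table name "Keyspace1" are hypothetical stand-ins chosen for illustration. The sketch assumes the strategy map is the only state the gossip path needs from StorageService, so a concurrent map can replace the StorageService monitor on that path.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

class LockOrderSketch {
    // Hypothetical stand-ins for the two monitors named in the thread dump.
    static final Object storageServiceLock = new Object();
    static final Object gossiperLock = new Object();

    // Finer-grained alternative: the map is thread-safe on its own,
    // so lookups never take the StorageService monitor.
    static final Map<String, String> replicationStrategies = new ConcurrentHashMap<>();

    static String getReplicationStrategy(String table) {
        // Strategy objects are modeled as plain strings (an assumption).
        return replicationStrategies.computeIfAbsent(table, t -> "strategy-for-" + t);
    }

    // Blocks until the latch opens; restores the interrupt flag if interrupted.
    static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Reproduces the interleaving from the report: each thread takes its first
    // lock, then both proceed. Returns true if both threads finish in time.
    static boolean runBothPaths() {
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread gossip = new Thread(() -> {
            synchronized (gossiperLock) {
                bothHoldFirstLock.countDown();
                await(bothHoldFirstLock);
                // With a synchronized StorageService method this call would
                // block on the SS monitor; the concurrent map avoids that.
                getReplicationStrategy("Keyspace1");
            }
        }, "GMFD-sketch");

        Thread bootstrap = new Thread(() -> {
            synchronized (storageServiceLock) {
                bothHoldFirstLock.countDown();
                await(bothHoldFirstLock);
                // Still takes the locks in SS -> Gossiper order, but the gossip
                // thread no longer needs SS, so it releases the Gossiper lock.
                synchronized (gossiperLock) {
                    getReplicationStrategy("Keyspace1");
                }
            }
        }, "main-sketch");

        gossip.start();
        bootstrap.start();
        try {
            gossip.join(5000);
            bootstrap.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !gossip.isAlive() && !bootstrap.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("both threads finished: " + runBothPaths());
    }
}
```

The bootstrap thread still nests the two locks in the same order as in the report. The cycle is broken only because the gossip path no longer requests the StorageService monitor, which is the effect either proposed remedy would need to achieve.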
