Posted to issues@geode.apache.org by "Anilkumar Gingade (JIRA)" <ji...@apache.org> on 2019/03/26 18:49:00 UTC
[jira] [Resolved] (GEODE-6526) deadlock between tombstone gc and region destroy threads
[ https://issues.apache.org/jira/browse/GEODE-6526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anilkumar Gingade resolved GEODE-6526.
--------------------------------------
Resolution: Fixed
Fix Version/s: 1.10.0
> deadlock between tombstone gc and region destroy threads
> --------------------------------------------------------
>
> Key: GEODE-6526
> URL: https://issues.apache.org/jira/browse/GEODE-6526
> Project: Geode
> Issue Type: Bug
> Components: regions
> Reporter: Anilkumar Gingade
> Assignee: Anilkumar Gingade
> Priority: Major
> Labels: SmallFeature
> Fix For: 1.10.0
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> There is a potential for the tombstone GC thread to deadlock with the region.destroy thread, because the two take the RegionEntry lock and the RegionSize lock in opposite orders (see the stack dump below).
> Scenario:
> On Member1:
> Thread1 - Destroys key1 (say region version 3)
> Thread2 - Destroys key2 (say region version 4)
> Thread3 - Starts tombstone GC (this records the tombstone gc version for member1 as 4)
> All three of the above messages are sent/replicated to member2 concurrently.
> On Member2:
> Thread1 - Destroys key1 (Region version 1 for this member)
> On Member3:
> – The destroy of key2 finishes first; and
> – The destroy of key1 from member2 is applied; this creates a tombstone and adds it to the tombstone queue (for GC).
> – The destroy of key1 from member1, under the region-entry lock, updates the version of the entry (which is already in the tombstone queue), checks whether its region version is smaller than the recorded GC version, and calls tombstone removal, which tries to take the region-size lock (held by the tombstone GC thread).
> – Concurrently, the tombstoneGC message gets processed; this records the GC version as 4 for member2 and collects key1's region entry for removal. During removal it takes the region-size lock and then tries to take the region-entry lock (and waits for it).
> Together, these actions of the destroy and tombstone-GC threads result in a deadlock.
> The solution is to not remove the tombstone during region.destroy; it will be removed as part of the next tombstoneGC processing.
>
>
> {noformat}
> Found one Java-level deadlock:
> =============================
> "Pooled Message Processor 118":
> waiting to lock monitor 0x00007f02bdcfd0a8 (object 0x00007f0a569075d8, a java.lang.Object),
> which is held by "Pooled Message Processor 24"
> "Pooled Message Processor 24":
> waiting to lock monitor 0x00007f02bd8f99d8 (object 0x00007f099bb69270, a org.apache.geode.internal.cache.entries.VersionedThinDiskLRURegionEntryHeapObjectKey),
> which is held by "P2P message reader for 169.84.85.56(psin9p197_cache2:58016)<v3>:1026 shared ordered uid=9 port=32780"
> "P2P message reader for 169.84.85.56(psin9p197_cache2:58016)<v3>:1026 shared ordered uid=9 port=32780":
> waiting to lock monitor 0x00007f02bd8f9928 (object 0x00007f11f8f27d18, a java.lang.String),
> which is held by "Pooled Message Processor 24"
> Java stack information for the threads listed above:
> ===================================================
> "Pooled Message Processor 118":
> at org.apache.geode.internal.cache.TombstoneService.gcTombstones(TombstoneService.java:209)
> - waiting to lock <0x00007f0a569075d8> (a java.lang.Object)
> at org.apache.geode.internal.cache.LocalRegion.expireTombstones(LocalRegion.java:3293)
> at org.apache.geode.internal.cache.DistributedTombstoneOperation$TombstoneMessage.operateOnRegion(DistributedTombstoneOperation.java:169)
> at org.apache.geode.internal.cache.DistributedCacheOperation$CacheOperationMessage.basicProcess(DistributedCacheOperation.java:1191)
> at org.apache.geode.internal.cache.DistributedCacheOperation$CacheOperationMessage.process(DistributedCacheOperation.java:1091)
> at org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:378)
> at org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:444)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:1121)
> at org.apache.geode.distributed.internal.ClusterDistributionManager.access$000(ClusterDistributionManager.java:109)
> at org.apache.geode.distributed.internal.ClusterDistributionManager$4$1.run(ClusterDistributionManager.java:791)
> at java.lang.Thread.run(Thread.java:748)
> "Pooled Message Processor 24":
> at org.apache.geode.internal.cache.AbstractRegionMap.removeTombstone(AbstractRegionMap.java:3321)
> - waiting to lock <0x00007f099bb69270> (a org.apache.geode.internal.cache.entries.VersionedThinDiskLRURegionEntryHeapObjectKey)
> - locked <0x00007f11f8f27d18> (a java.lang.String)
> at org.apache.geode.internal.cache.TombstoneService.gcTombstones(TombstoneService.java:259)
> - locked <0x00007f0a569075d8> (a java.lang.Object)
> at org.apache.geode.internal.cache.LocalRegion.expireTombstones(LocalRegion.java:3293)
> at org.apache.geode.internal.cache.DistributedTombstoneOperation$TombstoneMessage.operateOnRegion(DistributedTombstoneOperation.java:169)
> at org.apache.geode.internal.cache.DistributedCacheOperation$CacheOperationMessage.basicProcess(DistributedCacheOperation.java:1191)
> at org.apache.geode.internal.cache.DistributedCacheOperation$CacheOperationMessage.process(DistributedCacheOperation.java:1091)
> at org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:378)
> at org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:444)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:1121)
> at org.apache.geode.distributed.internal.ClusterDistributionManager.access$000(ClusterDistributionManager.java:109)
> at org.apache.geode.distributed.internal.ClusterDistributionManager$4$1.run(ClusterDistributionManager.java:791)
> at java.lang.Thread.run(Thread.java:748)
> "P2P message reader for 169.84.85.56(psin9p197_cache2:58016)<v3>:1026 shared ordered uid=9 port=32780":
> at org.apache.geode.internal.cache.AbstractRegionMap.removeTombstone(AbstractRegionMap.java:3320)
> - waiting to lock <0x00007f11f8f27d18> (a java.lang.String)
> at org.apache.geode.internal.cache.entries.AbstractRegionEntry.makeTombstone(AbstractRegionEntry.java:273)
> at org.apache.geode.internal.cache.entries.AbstractRegionEntry.destroy(AbstractRegionEntry.java:904)
> at org.apache.geode.internal.cache.map.RegionMapDestroy.destroyEntry(RegionMapDestroy.java:723)
> at org.apache.geode.internal.cache.map.RegionMapDestroy.destroyExistingEntry(RegionMapDestroy.java:387)
> at org.apache.geode.internal.cache.map.RegionMapDestroy.handleExistingRegionEntry(RegionMapDestroy.java:238)
> - locked <0x00007f099bb69270> (a org.apache.geode.internal.cache.entries.VersionedThinDiskLRURegionEntryHeapObjectKey)
> at org.apache.geode.internal.cache.map.RegionMapDestroy.destroy(RegionMapDestroy.java:149)
> at org.apache.geode.internal.cache.AbstractRegionMap.destroy(AbstractRegionMap.java:1093)
> at org.apache.geode.internal.cache.LocalRegion.mapDestroy(LocalRegion.java:6504)
> at org.apache.geode.internal.cache.LocalRegion.mapDestroy(LocalRegion.java:6478)
> at org.apache.geode.internal.cache.LocalRegionDataView.destroyExistingEntry(LocalRegionDataView.java:56)
> at org.apache.geode.internal.cache.LocalRegion.basicDestroy(LocalRegion.java:6430)
> at org.apache.geode.internal.cache.DistributedRegion.basicDestroy(DistributedRegion.java:1599)
> at org.apache.geode.internal.cache.DestroyOperation$DestroyMessage.operateOnRegion(DestroyOperation.java:87)
> at org.apache.geode.internal.cache.DistributedCacheOperation$CacheOperationMessage.basicProcess(DistributedCacheOperation.java:1191)
> at org.apache.geode.internal.cache.DistributedCacheOperation$CacheOperationMessage.process(DistributedCacheOperation.java:1091)
> at org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:378)
> at org.apache.geode.distributed.internal.DistributionMessage.schedule(DistributionMessage.java:436)
> at org.apache.geode.distributed.internal.ClusterDistributionManager.scheduleIncomingMessage(ClusterDistributionManager.java:3250)
> at org.apache.geode.distributed.internal.ClusterDistributionManager.handleIncomingDMsg(ClusterDistributionManager.java:2912)
> at org.apache.geode.distributed.internal.ClusterDistributionManager.access$1500(ClusterDistributionManager.java:109)
> at org.apache.geode.distributed.internal.ClusterDistributionManager$DMListener.messageReceived(ClusterDistributionManager.java:4038)
> at org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.dispatchMessage(GMSMembershipManager.java:1120)
> at org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.handleOrDeferMessage(GMSMembershipManager.java:1039)
> at org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager$MyDCReceiver.messageReceived(GMSMembershipManager.java:402)
> at org.apache.geode.distributed.internal.direct.DirectChannel.receive(DirectChannel.java:731)
> at org.apache.geode.internal.tcp.TCPConduit.messageReceived(TCPConduit.java:868)
> at org.apache.geode.internal.tcp.Connection.dispatchMessage(Connection.java:3965)
> at org.apache.geode.internal.tcp.Connection.runOioReader(Connection.java:2112)
> at org.apache.geode.internal.tcp.Connection.run(Connection.java:1690)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
>
>
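The fix described above amounts to enforcing a single lock order: the GC path takes the region-size lock and then the region-entry lock, while the destroy path takes only the region-entry lock and leaves removal to the next GC pass. The following is a minimal sketch of that idea, not the actual Geode change. `TombstoneLockOrderSketch`, `Entry`, `sizeLock`, `destroy`, `gc`, and `tombstoneCount` are hypothetical names invented for illustration and do not correspond to Geode classes or methods.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

public class TombstoneLockOrderSketch {
    /** Hypothetical stand-in for a region entry; its monitor plays the region-entry lock. */
    static final class Entry {
        final String key;
        long regionVersion;
        boolean queued;   // already in the tombstone queue
        boolean removed;  // already collected by a GC pass
        Entry(String key) { this.key = key; }
    }

    private final Object sizeLock = new Object(); // plays the region-size lock
    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final Queue<Entry> tombstoneQueue = new ConcurrentLinkedQueue<>();
    private long gcVersion; // guarded by sizeLock

    /** Destroy path: takes only the entry lock, never the size lock. */
    public void destroy(String key, long regionVersion) {
        while (true) {
            Entry e = entries.computeIfAbsent(key, Entry::new);
            synchronized (e) {
                if (e.removed) {
                    continue; // a GC pass collected this entry concurrently; retry on a fresh one
                }
                e.regionVersion = regionVersion;
                if (!e.queued) {
                    e.queued = true;
                    tombstoneQueue.add(e);
                }
                // Even if regionVersion is already covered by the GC version,
                // the tombstone is NOT removed here; the next GC pass does it.
                return;
            }
        }
    }

    /** GC path: size lock first, then each entry lock. This is the only nested order. */
    public int gc(long newGcVersion) {
        int removedCount = 0;
        synchronized (sizeLock) {
            gcVersion = Math.max(gcVersion, newGcVersion);
            for (Iterator<Entry> it = tombstoneQueue.iterator(); it.hasNext();) {
                Entry e = it.next();
                synchronized (e) {
                    if (e.regionVersion <= gcVersion) {
                        e.removed = true;
                        entries.remove(e.key, e);
                        it.remove();
                        removedCount++;
                    }
                }
            }
        }
        return removedCount;
    }

    public int tombstoneCount() {
        return tombstoneQueue.size();
    }
}
```

Because no code path acquires the size lock while holding an entry lock, the cycle shown in the stack dump cannot form in this sketch. The trade-off is that a tombstone whose version is already covered by the GC version lingers until the next GC pass.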
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)