Posted to issues@geode.apache.org by "nabarun (JIRA)" <ji...@apache.org> on 2018/10/03 21:38:42 UTC

[jira] [Closed] (GEODE-5186) set operation in a client transaction could cause the transaction to hang

     [ https://issues.apache.org/jira/browse/GEODE-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nabarun closed GEODE-5186.
--------------------------

> set operation in a client transaction could cause the transaction to hang
> -------------------------------------------------------------------------
>
>                 Key: GEODE-5186
>                 URL: https://issues.apache.org/jira/browse/GEODE-5186
>             Project: Geode
>          Issue Type: Bug
>          Components: transactions
>    Affects Versions: 1.1.0, 1.1.1, 1.2.0, 1.3.0, 1.2.1, 1.4.0, 1.5.0, 1.6.0, 1.7.0
>            Reporter: Eric Shu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.7.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> During an entry operation in a client transaction, the server connection could be lost. In this case, the client will fail over to another server, try to resume the transaction, and retry the operation if the original transaction host node is found.
> If this operation happens to be a keySet operation (or another set operation) on a partitioned region, the transaction could hang due to a deadlock.
> The scenario is as follows: the original tx host node holds its transactional lock while sending a FetchKeysMessage to the other nodes hosting the partitioned region data. The node to which the client transaction failed over holds its own transactional lock while sending a FetchKeysMessage to the transaction host node.
> Neither of these two FetchKeysMessages can be processed, because processing a tx message requires holding the lock, and the locks are already held by the nodes handling the client messages of the transaction.
> {noformat}
> vm_6_bridge7_latvia_25133:PartitionedRegion Message Processor10 ID=0xe2(226) state=WAITING
>         waiting to lock <ja...@453d49bb>
>         at sun.misc.Unsafe.park(Native Method)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>         at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
>         at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>         at org.apache.geode.internal.cache.TXManagerImpl.getLock(TXManagerImpl.java:921)
>         at org.apache.geode.internal.cache.TXManagerImpl.masqueradeAs(TXManagerImpl.java:881)
>         at org.apache.geode.internal.cache.partitioned.PartitionMessage.process(PartitionMessage.java:332)
>         at org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:378)
>         at org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:444)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:1121)
>         at org.apache.geode.distributed.internal.ClusterDistributionManager.access$000(ClusterDistributionManager.java:109)
>         at org.apache.geode.distributed.internal.ClusterDistributionManager$8$1.run(ClusterDistributionManager.java:945)
>         at java.lang.Thread.run(Thread.java:745)
> Locked synchronizers:
> java.util.concurrent.ThreadPoolExecutor$Worker@c84d7d4
> vm_6_bridge7_latvia_25133:ServerConnection on port 23931 Thread 10 ID=0x128(296) state=TIMED_WAITING
>         waiting to lock <ja...@226dbb4>
>         at sun.misc.Unsafe.park(Native Method)
>         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>         at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>         at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:61)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:715)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:790)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:766)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:853)
>         at org.apache.geode.internal.cache.partitioned.FetchKeysMessage$FetchKeysResponse.waitForKeys(FetchKeysMessage.java:541)
>         at org.apache.geode.internal.cache.PartitionedRegion.getBucketKeys(PartitionedRegion.java:4342)
>         at org.apache.geode.internal.cache.TXStateStub.getBucketKeys(TXStateStub.java:644)
>         at org.apache.geode.internal.cache.TXStateProxyImpl.getBucketKeys(TXStateProxyImpl.java:730)
>         at org.apache.geode.internal.cache.PartitionedRegion$KeysSet$KeysSetIterator.getNextBucketIter(PartitionedRegion.java:6066)
>         at org.apache.geode.internal.cache.PartitionedRegion$KeysSet$KeysSetIterator.hasNext(PartitionedRegion.java:6024)
>         at java.util.Collections$UnmodifiableCollection$1.hasNext(Collections.java:1041)
>         at org.apache.geode.internal.cache.tier.sockets.command.KeySet.fillAndSendKeySetResponseChunks(KeySet.java:168)
>         at org.apache.geode.internal.cache.tier.sockets.command.KeySet.cmdExecute(KeySet.java:126)
>         at org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:157)
>         at org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMsg(ServerConnection.java:869)
>         at org.apache.geode.internal.cache.tier.sockets.OriginalServerConnection.doOneMessage(OriginalServerConnection.java:77)
>         at org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1248)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl$4$1.run(AcceptorImpl.java:644)
>         at java.lang.Thread.run(Thread.java:745)
> Locked synchronizers:
> java.util.concurrent.ThreadPoolExecutor$Worker@3ca60534
> java.util.concurrent.locks.ReentrantLock$NonfairSync@453d49bb
> vm_0_bridge1_latvia_25064:PartitionedRegion Message Processor4 ID=0x2b8(696) state=WAITING
>         waiting to lock <ja...@33b1b785>
>         at sun.misc.Unsafe.park(Native Method)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>         at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
>         at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>         at org.apache.geode.internal.cache.TXManagerImpl.getLock(TXManagerImpl.java:921)
>         at org.apache.geode.internal.cache.TXManagerImpl.masqueradeAs(TXManagerImpl.java:881)
>         at org.apache.geode.internal.cache.partitioned.PartitionMessage.process(PartitionMessage.java:332)
>         at org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:378)
>         at org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:444)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:1121)
>         at org.apache.geode.distributed.internal.ClusterDistributionManager.access$000(ClusterDistributionManager.java:109)
>         at org.apache.geode.distributed.internal.ClusterDistributionManager$8$1.run(ClusterDistributionManager.java:945)
>         at java.lang.Thread.run(Thread.java:745)
> Locked synchronizers:
> java.util.concurrent.ThreadPoolExecutor$Worker@71b1b4c5
> vm_0_bridge1_latvia_25064:ServerConnection on port 24946 Thread 0 ID=0x29b(667) state=TIMED_WAITING
>         waiting to lock <ja...@41e6d28f>
>         at sun.misc.Unsafe.park(Native Method)
>         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>         at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>         at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:61)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:715)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:790)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:766)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:853)
>         at org.apache.geode.internal.cache.partitioned.FetchKeysMessage$FetchKeysResponse.waitForKeys(FetchKeysMessage.java:541)
>         at org.apache.geode.internal.cache.PartitionedRegion.getBucketKeys(PartitionedRegion.java:4342)
>         at org.apache.geode.internal.cache.TXState.getBucketKeys(TXState.java:1852)
>         at org.apache.geode.internal.cache.TXStateProxyImpl.getBucketKeys(TXStateProxyImpl.java:730)
>         at org.apache.geode.internal.cache.PartitionedRegion$KeysSet$KeysSetIterator.getNextBucketIter(PartitionedRegion.java:6066)
>         at org.apache.geode.internal.cache.PartitionedRegion$KeysSet$KeysSetIterator.hasNext(PartitionedRegion.java:6024)
>         at java.util.Collections$UnmodifiableCollection$1.hasNext(Collections.java:1041)
>         at org.apache.geode.internal.cache.tier.sockets.command.KeySet.fillAndSendKeySetResponseChunks(KeySet.java:168)
>         at org.apache.geode.internal.cache.tier.sockets.command.KeySet.cmdExecute(KeySet.java:126)
>         at org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:157)
>         at org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMsg(ServerConnection.java:869)
>         at org.apache.geode.internal.cache.tier.sockets.OriginalServerConnection.doOneMessage(OriginalServerConnection.java:77)
>         at org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1248)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl$4$1.run(AcceptorImpl.java:644)
>         at java.lang.Thread.run(Thread.java:745)
> Locked synchronizers:
> java.util.concurrent.locks.ReentrantLock$NonfairSync@33b1b785
> java.util.concurrent.ThreadPoolExecutor$Worker@51e84752
> {noformat}
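
The lock cycle described above can be reproduced outside Geode with plain java.util.concurrent primitives. The following is only a minimal sketch under stated assumptions, not the Geode implementation: Node, receiveFetchKeys, keySetWhileHoldingLock and runScenario are hypothetical names, the ReentrantLock stands in for the lock taken in TXManagerImpl.getLock, and the single-thread executor stands in for the PartitionedRegion Message Processor pool. Unlike the real hang, the message processor here gives up after a timeout so that the demo terminates.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a server node; not a Geode class.
class Node {
  // Plays the role of the transactional lock (assumption for illustration).
  final ReentrantLock txLock = new ReentrantLock();
  // Plays the role of the message processor thread pool.
  final ExecutorService messageProcessor = Executors.newSingleThreadExecutor();

  // Handles a fetch-keys style message: the tx lock is needed before replying.
  Future<Boolean> receiveFetchKeys(long waitMillis) {
    Callable<Boolean> task = () -> {
      // In the reported hang this wait is unbounded; the timeout only ends the demo.
      if (!txLock.tryLock(waitMillis, TimeUnit.MILLISECONDS)) {
        return false;
      }
      try {
        return true; // a real handler would collect and return the keys here
      } finally {
        txLock.unlock();
      }
    };
    return messageProcessor.submit(task);
  }
}

class TxLockDeadlockSketch {
  // Simulates a server connection thread serving keySet inside the transaction.
  // Returns true if the remote node answered the fetch-keys request.
  static boolean keySetWhileHoldingLock(Node local, Node remote,
      CyclicBarrier barrier, long waitMillis) throws Exception {
    local.txLock.lock();
    try {
      barrier.await(); // make sure both nodes hold their own lock first
      boolean answered = remote.receiveFetchKeys(waitMillis).get();
      // Keep holding the lock until both requests have resolved, so the
      // outcome is deterministic; in the real hang nobody gets this far.
      barrier.await();
      return answered;
    } finally {
      local.txLock.unlock();
    }
  }

  // Runs both sides concurrently: index 0 is the tx host, index 1 the failover node.
  static boolean[] runScenario(long waitMillis) throws Exception {
    Node txHost = new Node();
    Node failover = new Node();
    CyclicBarrier barrier = new CyclicBarrier(2);
    ExecutorService serverConnections = Executors.newFixedThreadPool(2);
    try {
      Callable<Boolean> onTxHost =
          () -> keySetWhileHoldingLock(txHost, failover, barrier, waitMillis);
      Callable<Boolean> onFailover =
          () -> keySetWhileHoldingLock(failover, txHost, barrier, waitMillis);
      Future<Boolean> a = serverConnections.submit(onTxHost);
      Future<Boolean> b = serverConnections.submit(onFailover);
      return new boolean[] {a.get(), b.get()};
    } finally {
      serverConnections.shutdown();
      txHost.messageProcessor.shutdown();
      failover.messageProcessor.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    boolean[] answered = runScenario(200);
    System.out.println("tx host request answered: " + answered[0]);
    System.out.println("failover request answered: " + answered[1]);
  }
}
```

In this sketch neither request should be answered, because each message processor waits for a lock that the other node's server connection thread holds until its own request completes. This mirrors the two thread pairs in the dump above, where each ServerConnection thread waits in waitForKeys while owning the ReentrantLock that the local message processor thread is parked on.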


