You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@geode.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2018/05/23 20:18:00 UTC

[jira] [Commented] (GEODE-5186) set operation in a client transaction could cause the transaction to hang

    [ https://issues.apache.org/jira/browse/GEODE-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487971#comment-16487971 ] 

ASF subversion and git services commented on GEODE-5186:
--------------------------------------------------------

Commit d77744a4c87ddcc157584119c93fbd300410d442 in geode's branch refs/heads/develop from [~demery]
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=d77744a ]

GEODE-5186: set operation in a client transaction could cause the transaction to hang (#1967)

GEODE-5186: Do not retry set operation on partitioned region if in a transaction.
FetchKeyMessage will be sent to peer members as non transactional if it is sent from the transaction host member.


> set operation in a client transaction could cause the transaction to hang
> -------------------------------------------------------------------------
>
>                 Key: GEODE-5186
>                 URL: https://issues.apache.org/jira/browse/GEODE-5186
>             Project: Geode
>          Issue Type: Bug
>          Components: transactions
>    Affects Versions: 1.1.0, 1.1.1, 1.2.0, 1.3.0, 1.2.1, 1.4.0, 1.5.0, 1.6.0
>            Reporter: Eric Shu
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> During an entry operation in a client transaction, server connection could be lost. In this case, client will failover to another server and try to resume the transaction and retry the operation if the original transaction host node is found. 
> If this operation happens to be a keySet operation (or other set operations) on a partitioned region, the transaction could hang due to a deadlock.
> The scenario is the original tx host node holds its transactional lock when sending fetchKey request to other nodes hosting the partitioned region data. The node on which the client transaction failed over, will hold its transactional lock while sending the FetchKey message to transaction hosting node.
> These two FetchKeyMessage will not be able to be processed as processing these tx message requires to hold the lock. But the locks are already been held by the nodes handing the client message of the transaction.
> {noformat}
> vm_6_bridge7_latvia_25133:PartitionedRegion Message Processor10 ID=0xe2(226) state=WAITING
>         waiting to lock <ja...@453d49bb>
>         at sun.misc.Unsafe.park(Native Method)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>         at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
>         at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>         at org.apache.geode.internal.cache.TXManagerImpl.getLock(TXManagerImpl.java:921)
>         at org.apache.geode.internal.cache.TXManagerImpl.masqueradeAs(TXManagerImpl.java:881)
>         at org.apache.geode.internal.cache.partitioned.PartitionMessage.process(PartitionMessage.java:332)
>         at org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:378)
>         at org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:444)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:1121)
>         at org.apache.geode.distributed.internal.ClusterDistributionManager.access$000(ClusterDistributionManager.java:109)
>         at org.apache.geode.distributed.internal.ClusterDistributionManager$8$1.run(ClusterDistributionManager.java:945)
>         at java.lang.Thread.run(Thread.java:745)
> Locked synchronizers:
> java.util.concurrent.ThreadPoolExecutor$Worker@c84d7d4
> vm_6_bridge7_latvia_25133:ServerConnection on port 23931 Thread 10 ID=0x128(296) state=TIMED_WAITING
>         waiting to lock <ja...@226dbb4>
>         at sun.misc.Unsafe.park(Native Method)
>         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>         at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>         at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:61)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:715)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:790)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:766)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:853)
>         at org.apache.geode.internal.cache.partitioned.FetchKeysMessage$FetchKeysResponse.waitForKeys(FetchKeysMessage.java:541)
>         at org.apache.geode.internal.cache.PartitionedRegion.getBucketKeys(PartitionedRegion.java:4342)
>         at org.apache.geode.internal.cache.TXStateStub.getBucketKeys(TXStateStub.java:644)
>         at org.apache.geode.internal.cache.TXStateProxyImpl.getBucketKeys(TXStateProxyImpl.java:730)
>         at org.apache.geode.internal.cache.PartitionedRegion$KeysSet$KeysSetIterator.getNextBucketIter(PartitionedRegion.java:6066)
>         at org.apache.geode.internal.cache.PartitionedRegion$KeysSet$KeysSetIterator.hasNext(PartitionedRegion.java:6024)
>         at java.util.Collections$UnmodifiableCollection$1.hasNext(Collections.java:1041)
>         at org.apache.geode.internal.cache.tier.sockets.command.KeySet.fillAndSendKeySetResponseChunks(KeySet.java:168)
>         at org.apache.geode.internal.cache.tier.sockets.command.KeySet.cmdExecute(KeySet.java:126)
>         at org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:157)
>         at org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMsg(ServerConnection.java:869)
>         at org.apache.geode.internal.cache.tier.sockets.OriginalServerConnection.doOneMessage(OriginalServerConnection.java:77)
>         at org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1248)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl$4$1.run(AcceptorImpl.java:644)
>         at java.lang.Thread.run(Thread.java:745)
> Locked synchronizers:
> java.util.concurrent.ThreadPoolExecutor$Worker@3ca60534
> java.util.concurrent.locks.ReentrantLock$NonfairSync@453d49bb
> vm_0_bridge1_latvia_25064:PartitionedRegion Message Processor4 ID=0x2b8(696) state=WAITING
>         waiting to lock <ja...@33b1b785>
>         at sun.misc.Unsafe.park(Native Method)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>         at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
>         at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>         at org.apache.geode.internal.cache.TXManagerImpl.getLock(TXManagerImpl.java:921)
>         at org.apache.geode.internal.cache.TXManagerImpl.masqueradeAs(TXManagerImpl.java:881)
>         at org.apache.geode.internal.cache.partitioned.PartitionMessage.process(PartitionMessage.java:332)
>         at org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:378)
>         at org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:444)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:1121)
>         at org.apache.geode.distributed.internal.ClusterDistributionManager.access$000(ClusterDistributionManager.java:109)
>         at org.apache.geode.distributed.internal.ClusterDistributionManager$8$1.run(ClusterDistributionManager.java:945)
>         at java.lang.Thread.run(Thread.java:745)
> Locked synchronizers:
> java.util.concurrent.ThreadPoolExecutor$Worker@71b1b4c5
> vm_0_bridge1_latvia_25064:ServerConnection on port 24946 Thread 0 ID=0x29b(667) state=TIMED_WAITING
>         waiting to lock <ja...@41e6d28f>
>         at sun.misc.Unsafe.park(Native Method)
>         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>         at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>         at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:61)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:715)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:790)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:766)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:853)
>         at org.apache.geode.internal.cache.partitioned.FetchKeysMessage$FetchKeysResponse.waitForKeys(FetchKeysMessage.java:541)
>         at org.apache.geode.internal.cache.PartitionedRegion.getBucketKeys(PartitionedRegion.java:4342)
>         at org.apache.geode.internal.cache.TXState.getBucketKeys(TXState.java:1852)
>         at org.apache.geode.internal.cache.TXStateProxyImpl.getBucketKeys(TXStateProxyImpl.java:730)
>         at org.apache.geode.internal.cache.PartitionedRegion$KeysSet$KeysSetIterator.getNextBucketIter(PartitionedRegion.java:6066)
>         at org.apache.geode.internal.cache.PartitionedRegion$KeysSet$KeysSetIterator.hasNext(PartitionedRegion.java:6024)
>         at java.util.Collections$UnmodifiableCollection$1.hasNext(Collections.java:1041)
>         at org.apache.geode.internal.cache.tier.sockets.command.KeySet.fillAndSendKeySetResponseChunks(KeySet.java:168)
>         at org.apache.geode.internal.cache.tier.sockets.command.KeySet.cmdExecute(KeySet.java:126)
>         at org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:157)
>         at org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMsg(ServerConnection.java:869)
>         at org.apache.geode.internal.cache.tier.sockets.OriginalServerConnection.doOneMessage(OriginalServerConnection.java:77)
>         at org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1248)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl$4$1.run(AcceptorImpl.java:644)
>         at java.lang.Thread.run(Thread.java:745)
> Locked synchronizers:
> java.util.concurrent.locks.ReentrantLock$NonfairSync@33b1b785
> java.util.concurrent.ThreadPoolExecutor$Worker@51e84752
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)