Posted to issues@ignite.apache.org by "Anton Vinogradov (JIRA)" <ji...@apache.org> on 2018/08/10 13:42:00 UTC

[jira] [Commented] (IGNITE-9053) testReentrantLockConstantTopologyChangeNonFailoverSafe can hang in case of broken tx

    [ https://issues.apache.org/jira/browse/IGNITE-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16576299#comment-16576299 ] 

Anton Vinogradov commented on IGNITE-9053:
------------------------------------------

Looks like we have a deadlock here.

The first thread waits for an ack:

{noformat}
"sys-stripe-4-#226264%partitioned.GridCachePartitionedDataStructuresFailoverSelfTest1%" #250984 prio=5 os_prio=0 tid=0x00007f273c018000 nid=0x2de6 waiting on condition [0x00007f274aeee000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
	at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
	at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
	at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.addNotification(GridContinuousProcessor.java:1168)
	at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.onEntryUpdate(CacheContinuousQueryHandler.java:890)
	at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.access$600(CacheContinuousQueryHandler.java:85)
	at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler$2.onEntryUpdated(CacheContinuousQueryHandler.java:430)
	at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:400)
	at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerSet(GridCacheMapEntry.java:1079)
	at org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter.userCommit(IgniteTxLocalAdapter.java:652)
	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocalAdapter.localFinish(GridDhtTxLocalAdapter.java:795)
	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.localFinish(GridDhtTxLocal.java:583)
	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.finishTx(GridDhtTxLocal.java:464)
	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.commitDhtLocalAsync(GridDhtTxLocal.java:505)
	at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.finishDhtLocal(IgniteTxHandler.java:942)
	at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.finish(IgniteTxHandler.java:821)
	at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxFinishRequest(IgniteTxHandler.java:777)
	at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$200(IgniteTxHandler.java:99)
	at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$3.apply(IgniteTxHandler.java:191)
	at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$3.apply(IgniteTxHandler.java:189)
	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1056)
	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:581)
	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:380)
	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:306)
	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:101)
	at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:295)
	at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
	at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
	at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:125)
	at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1091)
	at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:496)
	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
	at java.lang.Thread.run(Thread.java:745)
{noformat}

It sent the event, and the node received it but failed after that.

The future (fut) can be completed only by org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.DiscoveryListener on EVT_NODE_FAILED,

but EVT_NODE_FAILED can't be handled there, because the previous listener is still blocked in removeExplicitNodeLocks, waiting for the entry lock :(

{noformat}
"disco-event-worker-#226410%partitioned.GridCachePartitionedDataStructuresFailoverSelfTest1%" #251148 prio=5 os_prio=0 tid=0x00007f273c226000 nid=0x2e88 waiting on condition [0x00007f2672d97000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000094e6f670> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
	at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.lockEntry(GridCacheMapEntry.java:4324)
	at org.apache.ignite.internal.processors.cache.distributed.GridDistributedCacheEntry.removeExplicitNodeLocks(GridDistributedCacheEntry.java:266)
	at org.apache.ignite.internal.processors.cache.GridCacheMvccManager.removeExplicitNodeLocks(GridCacheMvccManager.java:361)
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onNodeLeft(GridDhtPartitionsExchangeFuture.java:3726)
	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.notifyNodeFail(GridCachePartitionExchangeManager.java:303)
	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.onDiscoveryEvent(GridCachePartitionExchangeManager.java:548)
	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.access$200(GridCachePartitionExchangeManager.java:140)
	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$1.onEvent(GridCachePartitionExchangeManager.java:281)
	at org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager$DiscoveryListenerWrapper.onEvent(GridEventStorageManager.java:1434)
	at org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.notifyListeners(GridEventStorageManager.java:873)
	at org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.notifyListeners(GridEventStorageManager.java:858)
	at org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.record0(GridEventStorageManager.java:341)
	at org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.record(GridEventStorageManager.java:307)
	at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.recordEvent(GridDiscoveryManager.java:2712)
	at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.body0(GridDiscoveryManager.java:2929)
	at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.body(GridDiscoveryManager.java:2741)
	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
	at java.lang.Thread.run(Thread.java:745)
{noformat}
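
The cycle can be sketched outside Ignite in plain Java. This is only an illustration under assumptions, not the actual Ignite code: entryLock stands in for the cache entry lock, ackFut for the ack future, and the two Runnables for the discovery listeners that run one after another on the disco-event-worker thread. All names here are hypothetical. It also assumes the stripe thread holds the entry lock while it waits, which the dumps suggest but do not print explicitly. A timeout replaces the unbounded wait so that the sketch terminates.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;

public class ListenerOrderDeadlockSketch {
    /** Returns true if the simulated stripe thread timed out waiting for its ack. */
    public static boolean reproduces(long timeoutMs) throws Exception {
        ReentrantLock entryLock = new ReentrantLock();               // hypothetical stand-in for the entry lock
        CompletableFuture<Void> ackFut = new CompletableFuture<>();  // hypothetical stand-in for the ack future
        CountDownLatch lockHeld = new CountDownLatch(1);
        CompletableFuture<Boolean> stripeTimedOut = new CompletableFuture<>();

        // "Stripe" thread: takes the entry lock, then blocks waiting for the ack.
        Thread stripe = new Thread(() -> {
            entryLock.lock();
            try {
                lockHeld.countDown();
                // The code in the dump waits without a timeout; the timeout only lets the sketch finish.
                ackFut.get(timeoutMs, TimeUnit.MILLISECONDS);
                stripeTimedOut.complete(false);
            } catch (TimeoutException e) {
                stripeTimedOut.complete(true);
            } catch (Exception e) {
                stripeTimedOut.completeExceptionally(e);
            } finally {
                entryLock.unlock();
            }
        }, "stripe");

        // Node-failed listeners, invoked sequentially on a single thread.
        List<Runnable> listeners = Arrays.asList(
            () -> { entryLock.lock(); entryLock.unlock(); }, // first listener: needs the entry lock
            () -> ackFut.complete(null)                      // second listener: would release the waiter
        );
        Thread disco = new Thread(() -> listeners.forEach(Runnable::run), "disco-event-worker");

        stripe.start();
        lockHeld.await();   // make sure the entry lock is held before the event is delivered
        disco.start();

        boolean hung = stripeTimedOut.get();
        disco.join();
        return hung;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("stripe thread starved of ack: " + reproduces(500));
    }
}
```

In this sketch the second listener cannot run until the first one gets the lock, and the lock is released only once the wait ends, so without the timeout neither thread would make progress.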



> testReentrantLockConstantTopologyChangeNonFailoverSafe can hang in case of broken tx
> ------------------------------------------------------------------------------------
>
>                 Key: IGNITE-9053
>                 URL: https://issues.apache.org/jira/browse/IGNITE-9053
>             Project: Ignite
>          Issue Type: Bug
>          Components: data structures
>    Affects Versions: 2.5
>            Reporter: Anton Vinogradov
>            Assignee: Anton Vinogradov
>            Priority: Major
>              Labels: MakeTeamcityGreenAgain
>             Fix For: 2.7
>
>
> -GridCachePartitionedDataStructuresFailoverSelfTest#testReentrantLockConstantTopologyChangeNonFailoverSafe
> -GridCachePartitionedDataStructuresFailoverSelfTest#testCountDownLatchConstantTopologyChange 
> can hang in case of broken tx
> {noformat}
>  Pending transactions:
> [2018-07-15 14:13:41,210][WARN ][exchange-worker-#1596354%partitioned.GridCachePartitionedDataStructuresFailoverSelfTest1%][diagnostic] >>> [txVer=AffinityTopologyVersion [topVer=7, minorTopVer=0], exchWait=true, tx=GridDhtTxLocal [nearNodeId=1392b1bd-c807-4479-9bfe-fc9f70500000, nearFutId=14ffca0a461-999e75d0-a333-4bd6-a2a2-7f143d0af773, nearMiniId=1, nearFinFutId=null, nearFinMiniId=0, nearXidVer=GridCacheVersion [topVer=143133203, order=1531653200153, nodeOrder=1], super=GridDhtTxLocalAdapter [nearOnOriginatingNode=false, nearNodes=[], dhtNodes=[], explicitLock=false, super=IgniteTxLocalAdapter [completedBase=null, sndTransformedVals=false, depEnabled=false, txState=IgniteTxStateImpl [activeCacheIds=[1968300681], recovery=false, txMap=[IgniteTxEntry [key=KeyCacheObjectImpl [part=494, val=GridCacheInternalKeyImpl [name=structure, grpName=default-volatile-ds-group], hasValBytes=true], cacheId=1968300681, txKey=IgniteTxKey [key=KeyCacheObjectImpl [part=494, val=GridCacheInternalKeyImpl [name=structure, grpName=default-volatile-ds-group], hasValBytes=true], cacheId=1968300681], val=[op=NOOP, val=null], prevVal=[op=NOOP, val=null], oldVal=[op=NOOP, val=null], entryProcessorsCol=null, ttl=-1, conflictExpireTime=-1, conflictVer=null, explicitVer=null, dhtVer=null, filters=[], filtersPassed=false, filtersSet=false, entry=GridDhtCacheEntry [rdrs=[], part=494, super=GridDistributedCacheEntry [super=GridCacheMapEntry [key=KeyCacheObjectImpl [part=494, val=GridCacheInternalKeyImpl [name=structure, grpName=default-volatile-ds-group], hasValBytes=true], val=CacheObjectImpl [val=null, hasValBytes=true], ver=GridCacheVersion [topVer=143133201, order=1531653200154, nodeOrder=2], hash=2095426867, extras=GridCacheMvccEntryExtras [mvcc=GridCacheMvcc [locs=[GridCacheMvccCandidate [nodeId=1bf28b00-feed-412b-a20b-ca9fc1100001, ver=GridCacheVersion [topVer=143133203, order=1531653200157, nodeOrder=2], threadId=1947290, id=31143709, topVer=AffinityTopologyVersion [topVer=7, 
minorTopVer=0], reentry=null, otherNodeId=1392b1bd-c807-4479-9bfe-fc9f70500000, otherVer=GridCacheVersion [topVer=143133203, order=1531653200153, nodeOrder=1], mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, serOrder=null, key=KeyCacheObjectImpl [part=494, val=GridCacheInternalKeyImpl [name=structure, grpName=default-volatile-ds-group], hasValBytes=true], masks=local=1|owner=1|ready=1|reentry=0|used=0|tx=1|single_implicit=0|dht_local=1|near_local=0|removed=0|read=0, prevVer=null, nextVer=null]], rmts=null]], flags=2]]], prepared=0, locked=false, nodeId=null, locMapped=false, expiryPlc=null, transferExpiryPlc=false, flags=0, partUpdateCntr=0, serReadVer=null, xidVer=GridCacheVersion [topVer=143133203, order=1531653200157, nodeOrder=2]]]], super=IgniteTxAdapter [xidVer=GridCacheVersion [topVer=143133203, order=1531653200157, nodeOrder=2], writeVer=null, implicit=false, loc=true, threadId=1947290, startTime=1531653200578, nodeId=1bf28b00-feed-412b-a20b-ca9fc1100001, startVer=GridCacheVersion [topVer=143133203, order=1531653200157, nodeOrder=2], endVer=null, isolation=REPEATABLE_READ, concurrency=PESSIMISTIC, timeout=0, sysInvalidate=false, sys=true, plc=2, commitVer=null, finalizing=NONE, invalidParts=null, state=ACTIVE, timedOut=false, topVer=AffinityTopologyVersion [topVer=7, minorTopVer=0], duration=20632ms, onePhaseCommit=false], size=1]]]]
> {noformat}


