You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ignite.apache.org by "Anton Kalashnikov (Jira)" <ji...@apache.org> on 2021/01/25 12:29:00 UTC

[jira] [Created] (IGNITE-14055) Deadlock in timeoutObjectProcessor between 'send messag'e & 'handshake timeout'

Anton Kalashnikov created IGNITE-14055:
------------------------------------------

             Summary: Deadlock in timeoutObjectProcessor between 'send messag'e & 'handshake timeout'
                 Key: IGNITE-14055
                 URL: https://issues.apache.org/jira/browse/IGNITE-14055
             Project: Ignite
          Issue Type: Bug
            Reporter: Anton Kalashnikov
            Assignee: Anton Kalashnikov


Cluster hangs after jvm pauses on one of server nodes.
Scenario:
1. Start three server nodes with put operations using StartServerWithTxPuts.
2. Emulate jvm freezes on one server node by running the attached script:
{{*sh freeze.sh  <jvm process pid>*}}
3. Wait until the script has finished.

Result:
The cluster hangs on tx put operations.

The first server node continuously prints:
{{{noformat}}}
{{}}{{[2020-11-03 09:36:01,719][INFO ][grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57714][2020-11-03 09:36:01,720][INFO ][grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3][2020-11-03 09:36:01,922][INFO ][grid-nio-worker-tcp-comm-0-#23%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57716][2020-11-03 09:36:01,922][INFO ][grid-nio-worker-tcp-comm-0-#23%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3][2020-11-03 09:36:02,124][INFO ][grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57718][2020-11-03 09:36:02,125][INFO ][grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3][2020-11-03 09:36:02,326][INFO ][grid-nio-worker-tcp-comm-2-#25%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57720][2020-11-03 09:36:02,327][INFO ][grid-nio-worker-tcp-comm-2-#25%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3][2020-11-03 09:36:02,528][INFO ][grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57722][2020-11-03 09:36:02,529][INFO ][grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3]}}
{{{noformat}}}{{}}
The second node prints long running transactions in prepared state ignoring the default tx timeout:

{{{noformat}}}
{{1}}{{[2020-11-03 09:36:46,199][WARN ][sys-#83%56b4f715-82d6-4d63-ba99-441ffcd673b4%][diagnostic] >>> Future [startTime=09:33:08.496, curTime=09:36:46.181, fut=GridNearTxFinishFuture [futId=425decc8571-4ce98554-8c56-4daf-a7a9-5b9bff52fa08, tx=GridNearTxLocal [mappings=IgniteTxMappingsSingleImpl [mapping=GridDistributedTxMapping [entries=LinkedHashSet [IgniteTxEntry [txKey=IgniteTxKey [key=KeyCacheObjectImpl [part=833, val=833, hasValBytes=true], cacheId=-923393186], val=TxEntryValueHolder [val=CacheObjectByteArrayImpl [arrLen=1048576], op=CREATE], prevVal=TxEntryValueHolder [val=null, op=NOOP], oldVal=TxEntryValueHolder [val=null, op=NOOP], entryProcessorsCol=null, ttl=-1, conflictExpireTime=-1, conflictVer=null, explicitVer=null, dhtVer=null, filters=CacheEntryPredicate[] [], filtersPassed=false, filtersSet=true, entry=GridDhtDetachedCacheEntry [super=GridDistributedCacheEntry [super=GridCacheMapEntry [key=KeyCacheObjectImpl [part=833, val=833, hasValBytes=true], val=null, ver=GridCacheVersion [topVer=0, order=0, nodeOrder=0], hash=833, extras=null, flags=0]]], prepared=0, locked=false, nodeId=07583a9d-36c8-4100-a69c-8cbd26ca82c9, locMapped=false, expiryPlc=null, transferExpiryPlc=false, flags=0, partUpdateCntr=0, serReadVer=null, xidVer=GridCacheVersion [topVer=215865159, order=1604385188157, nodeOrder=2]]], explicitLock=false, queryUpdate=false, dhtVer=null, last=false, nearEntries=0, clientFirst=false, node=07583a9d-36c8-4100-a69c-8cbd26ca82c9]], nearLocallyMapped=false, colocatedLocallyMapped=false, needCheckBackup=null, hasRemoteLocks=false, trackTimeout=false, systemTime=39312675, systemStartTime=0, prepareStartTime=340253826207680, prepareTime=39265601, commitOrRollbackStartTime=340253865473498, commitOrRollbackTime=46297, lb=null, mvccOp=false, qryId=-1, crdVer=0, thread=pool-2-thread-2, mappings=IgniteTxMappingsSingleImpl [mapping=GridDistributedTxMapping [entries=LinkedHashSet [IgniteTxEntry [txKey=IgniteTxKey [key=KeyCacheObjectImpl [part=833, val=833, hasValBytes=true], cacheId=-923393186], val=TxEntryValueHolder [val=CacheObjectByteArrayImpl [arrLen=1048576], op=CREATE], prevVal=TxEntryValueHolder [val=null, op=NOOP], oldVal=TxEntryValueHolder [val=null, op=NOOP], entryProcessorsCol=null, ttl=-1, conflictExpireTime=-1, conflictVer=null, explicitVer=null, dhtVer=null, filters=CacheEntryPredicate[] [], filtersPassed=false, filtersSet=true, entry=GridDhtDetachedCacheEntry [super=GridDistributedCacheEntry [super=GridCacheMapEntry [key=KeyCacheObjectImpl [part=833, val=833, hasValBytes=true], val=null, ver=GridCacheVersion [topVer=0, order=0, nodeOrder=0], hash=833, extras=null, flags=0]]], prepared=0, locked=false, nodeId=07583a9d-36c8-4100-a69c-8cbd26ca82c9, locMapped=false, expiryPlc=null, transferExpiryPlc=false, flags=0, partUpdateCntr=0, serReadVer=null, xidVer=GridCacheVersion [topVer=215865159, order=1604385188157, nodeOrder=2]]], explicitLock=false, queryUpdate=false, dhtVer=null, last=false, nearEntries=0, clientFirst=false, node=07583a9d-36c8-4100-a69c-8cbd26ca82c9]], super=GridDhtTxLocalAdapter [nearOnOriginatingNode=false, nearNodes=KeySetView [], dhtNodes=KeySetView [], explicitLock=false, super=IgniteTxLocalAdapter [completedBase=null, sndTransformedVals=false, depEnabled=false, txState=IgniteTxImplicitSingleStateImpl [init=true, recovery=false, useMvccCaching=false], super=IgniteTxAdapter [xidVer=GridCacheVersion [topVer=215865159, order=1604385188157, nodeOrder=2], writeVer=null, implicit=true, loc=true, threadId=123, startTime=1604385188496, nodeId=61733533-7c90-4383-ab98-93f41d7059f4, isolation=READ_COMMITTED, concurrency=OPTIMISTIC, timeout=10000, sysInvalidate=false, sys=false, plc=2, commitVer=GridCacheVersion [topVer=215865159, order=1604385188157, nodeOrder=2], finalizing=NONE, invalidParts=null, state=COMMITTED, timedOut=false, topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1], mvccSnapshot=null, skipCompletedVers=false, parentTx=null, duration=217695ms, onePhaseCommit=false], size=1]]], commit=true, mappings=IgniteTxMappingsSingleImpl [mapping=GridDistributedTxMapping [entries=LinkedHashSet [IgniteTxEntry [txKey=IgniteTxKey [key=KeyCacheObjectImpl [part=833, val=833, hasValBytes=true], cacheId=-923393186], val=TxEntryValueHolder [val=CacheObjectByteArrayImpl [arrLen=1048576], op=CREATE], prevVal=TxEntryValueHolder [val=null, op=NOOP], oldVal=TxEntryValueHolder [val=null, op=NOOP], entryProcessorsCol=null, ttl=-1, conflictExpireTime=-1, conflictVer=null, explicitVer=null, dhtVer=null, filters=CacheEntryPredicate[] [], filtersPassed=false, filtersSet=true, entry=GridDhtDetachedCacheEntry [super=GridDistributedCacheEntry [super=GridCacheMapEntry [key=KeyCacheObjectImpl [part=833, val=833, hasValBytes=true], val=null, ver=GridCacheVersion [topVer=0, order=0, nodeOrder=0], hash=833, extras=null, flags=0]]], prepared=0, locked=false, nodeId=07583a9d-36c8-4100-a69c-8cbd26ca82c9, locMapped=false, expiryPlc=null, transferExpiryPlc=false, flags=0, partUpdateCntr=0, serReadVer=null, xidVer=GridCacheVersion [topVer=215865159, order=1604385188157, nodeOrder=2]]], explicitLock=false, queryUpdate=false, dhtVer=null, last=false, nearEntries=0, clientFirst=false, node=07583a9d-36c8-4100-a69c-8cbd26ca82c9]], trackable=true, finishOnePhaseCalled=false, innerFuts=TransformCollectionView [FinishFuture[node=07583a9d-36c8-4100-a69c-8cbd26ca82c9, loc=false, done=false]], super=GridCompoundIdentityFuture [super=GridCompoundFuture [rdc=AlwaysTrueReducer [], initFlag=1, lsnrCalls=0, done=false, cancelled=false, err=null, futs=TransformCollectionView [false]]]]]}}
{{{noformat}}}{{}}
The third node indicated starvation in the striped pool:
{{{noformat}}}
{{}}{{[2020-11-03 09:39:25,852][WARN ][grid-timeout-worker-#22%6d42f793-e8d7-4fc1-812a-5336f850d9ff%][G] >>> Possible starvation in striped pool.Thread name: sys-stripe-4-#5%6d42f793-e8d7-4fc1-812a-5336f850d9ff%Queue: []Deadlock: falseCompleted: 359Thread [name="sys-stripe-4-#5%6d42f793-e8d7-4fc1-812a-5336f850d9ff%", id=16, state=TIMED_WAITING, blockCnt=33, waitCnt=1229]        at sun.misc.Unsafe.park(Native Method)        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:338)        at o.a.i.i.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:227)        at o.a.i.i.util.future.GridFutureAdapter.get(GridFutureAdapter.java:168)        at o.a.i.spi.communication.tcp.internal.ConnectionClientPool.reserveClient(ConnectionClientPool.java:290)        at o.a.i.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1146)        at o.a.i.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1095)        at o.a.i.i.managers.communication.GridIoManager.send(GridIoManager.java:1812)        at o.a.i.i.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:1927)        at o.a.i.i.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1257)        at o.a.i.i.processors.cache.distributed.dht.GridDhtTxFinishFuture.finish(GridDhtTxFinishFuture.java:528)        at o.a.i.i.processors.cache.distributed.dht.GridDhtTxFinishFuture.finish(GridDhtTxFinishFuture.java:329)        at o.a.i.i.processors.cache.distributed.dht.GridDhtTxLocal.finishTx(GridDhtTxLocal.java:504)        at o.a.i.i.processors.cache.distributed.dht.GridDhtTxLocal.rollbackDhtLocalAsync(GridDhtTxLocal.java:586)        at o.a.i.i.processors.cache.transactions.IgniteTxHandler.finishDhtLocal(IgniteTxHandler.java:1102)        at o.a.i.i.processors.cache.transactions.IgniteTxHandler.finish(IgniteTxHandler.java:970)        at o.a.i.i.processors.cache.transactions.IgniteTxHandler.processNearTxFinishRequest(IgniteTxHandler.java:925)        at o.a.i.i.processors.cache.transactions.IgniteTxHandler.access$200(IgniteTxHandler.java:134)        at o.a.i.i.processors.cache.transactions.IgniteTxHandler$3.apply(IgniteTxHandler.java:231)        at o.a.i.i.processors.cache.transactions.IgniteTxHandler$3.apply(IgniteTxHandler.java:229)        at o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1142)        at o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)        at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392)        at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318)        at o.a.i.i.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)        at o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308)        at o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1721)        at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1328)        at o.a.i.i.managers.communication.GridIoManager.access$4600(GridIoManager.java:157)        at o.a.i.i.managers.communication.GridIoManager$8.execute(GridIoManager.java:1213)        at o.a.i.i.managers.communication.TraceRunnable.run(TraceRunnable.java:54)        at o.a.i.i.util.StripedExecutor$Stripe.body(StripedExecutor.java:559)        at o.a.i.i.util.worker.GridWorker.run(GridWorker.java:119)        at java.lang.Thread.run(Thread.java:748)}}
{{{noformat}}}
{{}}
{{The request is sent from the timeout object processor. During this request, the connection hangs. To resolve the hanging, the handshake timeout object should be executed but it is impossible due to it awaits the finishing of the initial request.}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)