Posted to dev@ignite.apache.org by "Ilya Lantukh (JIRA)" <ji...@apache.org> on 2018/08/08 13:41:00 UTC

[jira] [Created] (IGNITE-9236) Handshake timeout never completes in some tests (GridCacheReplicatedFailoverSelfTest in particular)

Ilya Lantukh created IGNITE-9236:
------------------------------------

             Summary: Handshake timeout never completes in some tests (GridCacheReplicatedFailoverSelfTest in particular)
                 Key: IGNITE-9236
                 URL: https://issues.apache.org/jira/browse/IGNITE-9236
             Project: Ignite
          Issue Type: Bug
            Reporter: Ilya Lantukh
            Assignee: Ilya Lantukh


In GridCacheReplicatedFailoverSelfTest, one thread tries to establish a TCP connection and hangs forever in the handshake while holding the lock on RebalanceFuture:
{code}
[11:51:55] :	 [Step 3/4]     Locked synchronizers:
[11:51:55] :	 [Step 3/4]         java.util.concurrent.ThreadPoolExecutor$Worker@5b17b883
[11:51:55] :	 [Step 3/4] Thread [name="sys-#68921%new-node-topology-change-thread-1%", id=77410, state=RUNNABLE, blockCnt=3, waitCnt=0]
[11:51:55] :	 [Step 3/4]         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
[11:51:55] :	 [Step 3/4]         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
[11:51:55] :	 [Step 3/4]         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
[11:51:55] :	 [Step 3/4]         at sun.nio.ch.IOUtil.read(IOUtil.java:197)
[11:51:55] :	 [Step 3/4]         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
[11:51:55] :	 [Step 3/4]         - locked java.lang.Object@23aaa756
[11:51:55] :	 [Step 3/4]         at o.a.i.spi.communication.tcp.TcpCommunicationSpi.safeTcpHandshake(TcpCommunicationSpi.java:3647)
[11:51:55] :	 [Step 3/4]         at o.a.i.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3293)
[11:51:55] :	 [Step 3/4]         at o.a.i.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2967)
[11:51:55] :	 [Step 3/4]         at o.a.i.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2850)
[11:51:55] :	 [Step 3/4]         at o.a.i.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2693)
[11:51:55] :	 [Step 3/4]         at o.a.i.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2652)
[11:51:55] :	 [Step 3/4]         at o.a.i.i.managers.communication.GridIoManager.send(GridIoManager.java:1643)
[11:51:55] :	 [Step 3/4]         at o.a.i.i.managers.communication.GridIoManager.sendOrderedMessage(GridIoManager.java:1750)
[11:51:55] :	 [Step 3/4]         at o.a.i.i.processors.cache.GridCacheIoManager.sendOrderedMessage(GridCacheIoManager.java:1231)
[11:51:55] :	 [Step 3/4]         at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$RebalanceFuture.cleanupRemoteContexts(GridDhtPartitionDemander.java:1111)
[11:51:55] :	 [Step 3/4]         at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$RebalanceFuture.cancel(GridDhtPartitionDemander.java:1041)
[11:51:55] :	 [Step 3/4]         - locked o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$RebalanceFuture@7e28f150
[11:51:55] :	 [Step 3/4]         at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander.lambda$null$2(GridDhtPartitionDemander.java:534)
[11:51:55] :	 [Step 3/4]         at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$$Lambda$41/603501511.run(Unknown Source)
[11:51:55] :	 [Step 3/4]         at o.a.i.i.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6800)
[11:51:55] :	 [Step 3/4]         at o.a.i.i.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:827)
[11:51:55] :	 [Step 3/4]         at o.a.i.i.util.worker.GridWorker.run(GridWorker.java:110)
[11:51:55] :	 [Step 3/4]         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[11:51:55] :	 [Step 3/4]         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[11:51:55] :	 [Step 3/4]         at java.lang.Thread.run(Thread.java:748)
{code}

Because of that, the exchange worker hangs forever while trying to acquire the same lock:
{code}
[11:51:55] :	 [Step 3/4] Thread [name="exchange-worker-#68894%new-node-topology-change-thread-1%", id=77379, state=BLOCKED, blockCnt=11, waitCnt=7]
[11:51:55] :	 [Step 3/4]     Lock [object=o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$RebalanceFuture@7e28f150, ownerName=sys-#68921%new-node-topology-change-thread-1%, ownerId=77410]
[11:51:55] :	 [Step 3/4]         at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$RebalanceFuture.cancel(GridDhtPartitionDemander.java:1033)
[11:51:55] :	 [Step 3/4]         at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander.addAssignments(GridDhtPartitionDemander.java:302)
[11:51:55] :	 [Step 3/4]         at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPreloader.addAssignments(GridDhtPreloader.java:441)
[11:51:55] :	 [Step 3/4]         at o.a.i.i.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2659)
[11:51:55] :	 [Step 3/4]         at o.a.i.i.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2377)
[11:51:55] :	 [Step 3/4]         at o.a.i.i.util.worker.GridWorker.run(GridWorker.java:110)
[11:51:55] :	 [Step 3/4]         at java.lang.Thread.run(Thread.java:748)
{code}
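
The interaction between the two threads above can be reproduced in isolation. The following is a minimal sketch, not Ignite code: the class, field, and thread names are hypothetical, and a CountDownLatch stands in for the socket read that never returns (so the holder shows up as WAITING here rather than RUNNABLE in a native read). It only illustrates that a thread entering a monitor held by a stuck thread is reported as BLOCKED, as the exchange worker is in the dump.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical stand-in for the two threads in the dumps; none of these names come from Ignite. */
class MonitorHangDemo {
    /** Plays the role of the RebalanceFuture monitor. */
    private static final Object futureMonitor = new Object();

    /**
     * Starts a holder thread that blocks inside the monitor, then a waiter thread
     * that tries to enter the same monitor. Returns the waiter's state observed
     * while the holder is still stuck.
     */
    static Thread.State observeWaiterState() {
        CountDownLatch holderInside = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1); // stands in for the read that never returns

        Thread holder = new Thread(() -> {
            synchronized (futureMonitor) {
                holderInside.countDown();
                try {
                    release.await(); // the "handshake" blocks while the monitor is held
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "sys-holder");

        Thread waiter = new Thread(() -> {
            synchronized (futureMonitor) {
                // Reached only after the holder leaves the monitor.
            }
        }, "exchange-waiter");

        try {
            holder.start();
            holderInside.await();
            waiter.start();

            // Poll until the waiter is parked on the monitor (bounded so the demo cannot hang).
            long deadline = System.nanoTime() + 5_000_000_000L;
            while (waiter.getState() != Thread.State.BLOCKED && System.nanoTime() < deadline)
                Thread.sleep(10);

            Thread.State observed = waiter.getState();

            release.countDown(); // unlike the real bug, let both threads finish
            holder.join();
            waiter.join();

            return observed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("waiter state while holder is stuck: " + observeWaiterState());
    }
}
```

In the real dump nothing ever plays the role of release.countDown(), so the waiter stays BLOCKED indefinitely.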

The timeout is explicitly set to Integer.MAX_VALUE in the GridCacheAbstractSelfTest.getConfiguration(...) method, so the handshake never times out.
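
To illustrate why the timeout value matters: a blocking SocketChannel.read() is not bounded by SO_TIMEOUT, so a common way to limit it is a watchdog that closes the channel once the timeout elapses. The sketch below shows that general technique with plain JDK classes. It is not the TcpCommunicationSpi implementation, and every name in it is hypothetical. With a small timeout the stuck read is aborted; with Integer.MAX_VALUE, if interpreted as milliseconds, the watchdog would fire only after roughly 24.8 days, which in a test run is the same as never.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical demo of bounding a blocking handshake read with a close-on-timeout watchdog. */
class HandshakeWatchdogDemo {
    /**
     * Connects to a local listener that never writes anything, then performs a
     * blocking read. A watchdog closes the channel after timeoutMs.
     * Returns true if the read was aborted because the channel was closed.
     */
    static boolean readAbortedByWatchdog(long timeoutMs) {
        ScheduledExecutorService watchdog = Executors.newSingleThreadScheduledExecutor();

        try (ServerSocketChannel silentPeer = ServerSocketChannel.open()) {
            // The OS completes the connection from the listen backlog; nobody ever writes to it.
            silentPeer.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));

            try (SocketChannel ch = SocketChannel.open(silentPeer.getLocalAddress())) {
                watchdog.schedule(() -> {
                    try {
                        ch.close(); // wakes up the thread blocked in read()
                    } catch (IOException ignored) {
                        // Nothing useful to do in a demo.
                    }
                }, timeoutMs, TimeUnit.MILLISECONDS);

                try {
                    ch.read(ByteBuffer.allocate(8)); // blocks: the peer stays silent
                    return false;
                } catch (ClosedChannelException e) {
                    // Includes AsynchronousCloseException, thrown when the close happens mid-read.
                    return true;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            watchdog.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("aborted after 200 ms: " + readAbortedByWatchdog(200));
    }
}
```

Calling readAbortedByWatchdog(Integer.MAX_VALUE) would leave the calling thread stuck in read(), which mirrors the hang described in this issue.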


