Posted to issues@ignite.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/08/08 13:51:00 UTC

[jira] [Commented] (IGNITE-9236) Handshake timeout never completes in some tests (GridCacheReplicatedFailoverSelfTest in particular)

    [ https://issues.apache.org/jira/browse/IGNITE-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16573252#comment-16573252 ] 

ASF GitHub Bot commented on IGNITE-9236:
----------------------------------------

GitHub user ilantukh opened a pull request:

    https://github.com/apache/ignite/pull/4499

    IGNITE-9236 : Removed setting failureDetectionTimeout to Integer.MAX_VALUE in tests.

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gridgain/apache-ignite ignite-9236

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/ignite/pull/4499.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4499
    
----
commit 1106650a21386f24cc5b1cc9087883779385a47e
Author: Ilya Lantukh <il...@...>
Date:   2018-08-08T13:49:42Z

    IGNITE-9236 : Removed setting failureDetectionTimeout to Integer.MAX_VALUE in tests.

----


> Handshake timeout never completes in some tests (GridCacheReplicatedFailoverSelfTest in particular)
> ---------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-9236
>                 URL: https://issues.apache.org/jira/browse/IGNITE-9236
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Ilya Lantukh
>            Assignee: Ilya Lantukh
>            Priority: Major
>              Labels: MakeTeamcityGreenAgain
>
> In GridCacheReplicatedFailoverSelfTest, one thread tries to establish a TCP connection and hangs on the handshake forever while holding the lock on RebalanceFuture:
> {code}
> [11:51:55] :	 [Step 3/4]     Locked synchronizers:
> [11:51:55] :	 [Step 3/4]         java.util.concurrent.ThreadPoolExecutor$Worker@5b17b883
> [11:51:55] :	 [Step 3/4] Thread [name="sys-#68921%new-node-topology-change-thread-1%", id=77410, state=RUNNABLE, blockCnt=3, waitCnt=0]
> [11:51:55] :	 [Step 3/4]         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> [11:51:55] :	 [Step 3/4]         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> [11:51:55] :	 [Step 3/4]         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> [11:51:55] :	 [Step 3/4]         at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> [11:51:55] :	 [Step 3/4]         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
> [11:51:55] :	 [Step 3/4]         - locked java.lang.Object@23aaa756
> [11:51:55] :	 [Step 3/4]         at o.a.i.spi.communication.tcp.TcpCommunicationSpi.safeTcpHandshake(TcpCommunicationSpi.java:3647)
> [11:51:55] :	 [Step 3/4]         at o.a.i.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3293)
> [11:51:55] :	 [Step 3/4]         at o.a.i.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2967)
> [11:51:55] :	 [Step 3/4]         at o.a.i.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2850)
> [11:51:55] :	 [Step 3/4]         at o.a.i.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2693)
> [11:51:55] :	 [Step 3/4]         at o.a.i.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2652)
> [11:51:55] :	 [Step 3/4]         at o.a.i.i.managers.communication.GridIoManager.send(GridIoManager.java:1643)
> [11:51:55] :	 [Step 3/4]         at o.a.i.i.managers.communication.GridIoManager.sendOrderedMessage(GridIoManager.java:1750)
> [11:51:55] :	 [Step 3/4]         at o.a.i.i.processors.cache.GridCacheIoManager.sendOrderedMessage(GridCacheIoManager.java:1231)
> [11:51:55] :	 [Step 3/4]         at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$RebalanceFuture.cleanupRemoteContexts(GridDhtPartitionDemander.java:1111)
> [11:51:55] :	 [Step 3/4]         at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$RebalanceFuture.cancel(GridDhtPartitionDemander.java:1041)
> [11:51:55] :	 [Step 3/4]         - locked o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$RebalanceFuture@7e28f150
> [11:51:55] :	 [Step 3/4]         at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander.lambda$null$2(GridDhtPartitionDemander.java:534)
> [11:51:55] :	 [Step 3/4]         at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$$Lambda$41/603501511.run(Unknown Source)
> [11:51:55] :	 [Step 3/4]         at o.a.i.i.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6800)
> [11:51:55] :	 [Step 3/4]         at o.a.i.i.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:827)
> [11:51:55] :	 [Step 3/4]         at o.a.i.i.util.worker.GridWorker.run(GridWorker.java:110)
> [11:51:55] :	 [Step 3/4]         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [11:51:55] :	 [Step 3/4]         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [11:51:55] :	 [Step 3/4]         at java.lang.Thread.run(Thread.java:748)
> {code}
> As a result, the exchange worker hangs forever while trying to acquire that lock:
> {code}
> [11:51:55] :	 [Step 3/4] Thread [name="exchange-worker-#68894%new-node-topology-change-thread-1%", id=77379, state=BLOCKED, blockCnt=11, waitCnt=7]
> [11:51:55] :	 [Step 3/4]     Lock [object=o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$RebalanceFuture@7e28f150, ownerName=sys-#68921%new-node-topology-change-thread-1%, ownerId=77410]
> [11:51:55] :	 [Step 3/4]         at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$RebalanceFuture.cancel(GridDhtPartitionDemander.java:1033)
> [11:51:55] :	 [Step 3/4]         at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander.addAssignments(GridDhtPartitionDemander.java:302)
> [11:51:55] :	 [Step 3/4]         at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPreloader.addAssignments(GridDhtPreloader.java:441)
> [11:51:55] :	 [Step 3/4]         at o.a.i.i.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2659)
> [11:51:55] :	 [Step 3/4]         at o.a.i.i.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2377)
> [11:51:55] :	 [Step 3/4]         at o.a.i.i.util.worker.GridWorker.run(GridWorker.java:110)
> [11:51:55] :	 [Step 3/4]         at java.lang.Thread.run(Thread.java:748)
> {code}
> The handshake never times out because failureDetectionTimeout is explicitly set to Integer.MAX_VALUE in the GridCacheAbstractSelfTest.getConfiguration(...) method.
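
The failure mode in the two thread dumps above (a thread blocked in a socket read while holding a monitor that another thread needs) can be reproduced with plain JDK sockets. The following is a minimal sketch, not Ignite code: the class name HandshakeTimeoutSketch, the method secondThreadGetsLock and the local lock object are hypothetical stand-ins for TcpCommunicationSpi.safeTcpHandshake and the RebalanceFuture monitor, and the read timeout is applied with Socket.setSoTimeout rather than through Ignite's failureDetectionTimeout. It assumes a loopback peer that accepts the TCP connection but never sends any handshake bytes.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class HandshakeTimeoutSketch {
    /**
     * Starts a "sender" thread that takes a lock and then blocks reading a
     * handshake reply, followed by an "exchange" thread that needs the same lock.
     * Returns true if the exchange thread obtains the lock within waitMs.
     */
    static boolean secondThreadGetsLock(int readTimeoutMs, long waitMs) throws Exception {
        // Hypothetical stand-in for the RebalanceFuture monitor.
        final Object lock = new Object();

        // The peer never calls accept(): the kernel completes the TCP connection
        // from the backlog, but no handshake bytes are ever sent back.
        try (ServerSocket silentPeer = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            CountDownLatch lockHeld = new CountDownLatch(1);
            CountDownLatch acquired = new CountDownLatch(1);

            Thread sender = new Thread(() -> {
                synchronized (lock) {
                    lockHeld.countDown();
                    try (Socket sock = new Socket(silentPeer.getInetAddress(), silentPeer.getLocalPort())) {
                        // Bounds the blocking read below, in milliseconds.
                        sock.setSoTimeout(readTimeoutMs);
                        sock.getInputStream().read();
                    }
                    catch (IOException e) {
                        // SocketTimeoutException lands here: give up and release the lock.
                    }
                }
            });

            Thread exchange = new Thread(() -> {
                synchronized (lock) {
                    acquired.countDown();
                }
            });

            // Daemon threads so a stuck reader does not keep the JVM alive.
            sender.setDaemon(true);
            exchange.setDaemon(true);

            sender.start();
            lockHeld.await(); // the sender now owns the lock
            exchange.start();

            return acquired.await(waitMs, TimeUnit.MILLISECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("200 ms read timeout, lock acquired: "
            + secondThreadGetsLock(200, 5_000));
        System.out.println("Integer.MAX_VALUE read timeout, lock acquired: "
            + secondThreadGetsLock(Integer.MAX_VALUE, 500));
    }
}
```

With a 200 ms read timeout the sender is expected to give up and release the lock, so the second thread proceeds. With Integer.MAX_VALUE milliseconds (roughly 24.8 days) the read is effectively unbounded and the second thread stays blocked, which mirrors the BLOCKED exchange worker in the dump.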


