You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ignite.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/09/01 10:28:00 UTC

[jira] [Commented] (IGNITE-6071) Client may detect necessity for reconnect for too long

    [ https://issues.apache.org/jira/browse/IGNITE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150343#comment-16150343 ] 

ASF GitHub Bot commented on IGNITE-6071:
----------------------------------------

GitHub user alamar opened a pull request:

    https://github.com/apache/ignite/pull/2575

    IGNITE-6071 Finite number of attempts in reserveClient()

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gridgain/apache-ignite ignite-6071

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/ignite/pull/2575.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2575
    
----
commit 1e8e082ced9fc99b20867d05ede08e881a0c503b
Author: Ilya Kasnacheev <il...@gmail.com>
Date:   2017-09-01T10:24:52Z

    IGNITE-6071 Finite number of attempts in reserveClient()

----


> Client may detect necessity for reconnect for too long
> ------------------------------------------------------
>
>                 Key: IGNITE-6071
>                 URL: https://issues.apache.org/jira/browse/IGNITE-6071
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.1
>            Reporter: Yakov Zhdanov
>            Assignee: Ilya Kasnacheev
>
> There was a GC pause on client that caused servers to drop client due to inability to establish TCP communication connection. Then it took some time for client to detect that it has been dropped. During that time client many times attempted to connect to server which can be seen in the logs. After client detected its drop and reconnected servers fired node added event and no log flood can be found any more.
> We need to find out why client was reconnecting via communication and did not detect the drop for such a long time.
> I hope this can be reproduced in test:
> * start 2 servers
> * start client
> * suspend all client threads with Thread.suspend() - just filter threads of current JVM by name and suspend ones belonging to the client.
> {noformat}
> [10:12:24,785][WARNING][disco-event-worker-#71%null%][GridDiscoveryManager] Node FAILED: TcpDiscoveryNode [id=dd71479c-41ba-443e-b25c-3803a2a94f4f, addrs=[10.44.3.14, 127.0.0.1], sockAddrs=[/127.0.0.1:0, XXX.com/10.44.3.14:0], discPort=0, order=2, intOrder=2, lastExchangeTime=1502269008673, loc=false, ver=2.1.1#20170618-sha1:09ce29e0, isClient=true]
> [10:12:24,785][INFO][disco-event-worker-#71%null%][GridDiscoveryManager] Topology snapshot [ver=5, servers=2, clients=1, CPUs=144, heap=76.0GB]
> [10:12:24,794][INFO][exchange-worker-#72%null%][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=5, minorTopVer=0], crd=false, evt=12, node=TcpDiscoveryNode [id=98c1fdf7-09db-4fa0-bb01-8ca7f046643d, addrs=[10.44.3.11, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, XXX.com/10.44.3.11:47500], discPort=47500, order=3, intOrder=3, lastExchangeTime=1502269944782, loc=true, ver=2.1.1#20170618-sha1:09ce29e0, isClient=false], evtNode=TcpDiscoveryNode [id=98c1fdf7-09db-4fa0-bb01-8ca7f046643d, addrs=[10.44.3.11, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, XXX.com/10.44.3.11:47500], discPort=47500, order=3, intOrder=3, lastExchangeTime=1502269944782, loc=true, ver=2.1.1#20170618-sha1:09ce29e0, isClient=false], customEvt=null]
> [10:12:24,813][INFO][exchange-worker-#72%null%][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=5, minorTopVer=0], crd=false]
> [10:12:24,819][INFO][exchange-worker-#72%null%][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=5, minorTopVer=0], evt=NODE_FAILED, node=dd71479c-41ba-443e-b25c-3803a2a94f4f]
> [10:12:28,344][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52474]
> [10:12:28,348][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52482]
> [10:12:28,356][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52506]
> [10:12:28,362][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52522]
> [10:12:28,368][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52538]
> [10:12:28,374][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52554]
> [10:12:28,380][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52570]
> [10:12:28,386][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52586]
> [10:12:28,392][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52602]
> [10:12:28,397][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52618]
> [10:12:28,402][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52634]
> [10:12:28,407][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52650]
> [10:12:28,412][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52666]
> ...
> [10:18:32,684][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:43604]
> [10:18:32,690][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:43620]
> [10:18:32,695][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:43636]
> [10:18:42,831][INFO][disco-event-worker-#71%null%][GridDiscoveryManager] Added new node to topology: TcpDiscoveryNode [id=2e80b0f0-21db-451d-a264-34ba16e00ffa, addrs=[10.44.3.14, 127.0.0.1], sockAddrs=[/127.0.0.1:0, gbrdsr000002837.intranet.barcapint.com/10.44.3.14:0], discPort=0, order=6, intOrder=5, lastExchangeTime=1502270322805, loc=false, ver=2.1.1#20170618-sha1:09ce29e0, isClient=true]
> [10:18:42,832][INFO][disco-event-worker-#71%null%][GridDiscoveryManager] Topology snapshot [ver=6, servers=2, clients=2, CPUs=144, heap=90.0GB]
> [10:18:42,833][INFO][exchange-worker-#72%null%][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], crd=false, evt=10, node=TcpDiscoveryNode [id=98c1fdf7-09db-4fa0-bb01-8ca7f046643d, addrs=[10.44.3.11, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, XXX.com/10.44.3.11:47500], discPort=47500, order=3, intOrder=3, lastExchangeTime=1502270322815, loc=true, ver=2.1.1#20170618-sha1:09ce29e0, isClient=false], evtNode=TcpDiscoveryNode [id=98c1fdf7-09db-4fa0-bb01-8ca7f046643d, addrs=[10.44.3.11, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, XXX.com/10.44.3.11:47500], discPort=47500, order=3, intOrder=3, lastExchangeTime=1502270322815, loc=true, ver=2.1.1#20170618-sha1:09ce29e0, isClient=false], customEvt=null]
> [10:18:42,851][INFO][exchange-worker-#72%null%][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], crd=false]
> [10:18:42,855][INFO][exchange-worker-#72%null%][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=6, minorTopVer=0], evt=NODE_JOINED, node=2e80b0f0-21db-451d-a264-34ba16e00ffa]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)