You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@ignite.apache.org by "Yakov Zhdanov (JIRA)" <ji...@apache.org> on 2017/08/15 16:40:00 UTC

[jira] [Created] (IGNITE-6071) Client may detect necessity for reconnect for too long

Yakov Zhdanov created IGNITE-6071:
-------------------------------------

             Summary: Client may detect necessity for reconnect for too long
                 Key: IGNITE-6071
                 URL: https://issues.apache.org/jira/browse/IGNITE-6071
             Project: Ignite
          Issue Type: Bug
    Affects Versions: 2.1
            Reporter: Yakov Zhdanov


There was a GC pause on client that caused servers to drop client due to inability to establish TCP communication connection. Then it took some time for client to detect that it has been dropped. During that time client many times attempted to connect to server which can be seen in the logs. After client detected its drop and reconnected servers fired node added event and no log flood can be found any more.

We need to find out why client was reconnecting via communication and did not detect the drop for such a long time.

I hope this can be reproduced in test:
* start 2 servers
* start client
* suspend all client threads with Thread.suspend() - just filter threads of current JVM by name and suspend ones belonging to the client.

{noformat}
[10:12:24,785][WARNING][disco-event-worker-#71%null%][GridDiscoveryManager] Node FAILED: TcpDiscoveryNode [id=dd71479c-41ba-443e-b25c-3803a2a94f4f, addrs=[10.44.3.14, 127.0.0.1], sockAddrs=[/127.0.0.1:0, XXX.com/10.44.3.14:0], discPort=0, order=2, intOrder=2, lastExchangeTime=1502269008673, loc=false, ver=2.1.1#20170618-sha1:09ce29e0, isClient=true]
[10:12:24,785][INFO][disco-event-worker-#71%null%][GridDiscoveryManager] Topology snapshot [ver=5, servers=2, clients=1, CPUs=144, heap=76.0GB]
[10:12:24,794][INFO][exchange-worker-#72%null%][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=5, minorTopVer=0], crd=false, evt=12, node=TcpDiscoveryNode [id=98c1fdf7-09db-4fa0-bb01-8ca7f046643d, addrs=[10.44.3.11, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, XXX.com/10.44.3.11:47500], discPort=47500, order=3, intOrder=3, lastExchangeTime=1502269944782, loc=true, ver=2.1.1#20170618-sha1:09ce29e0, isClient=false], evtNode=TcpDiscoveryNode [id=98c1fdf7-09db-4fa0-bb01-8ca7f046643d, addrs=[10.44.3.11, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, XXX.com/10.44.3.11:47500], discPort=47500, order=3, intOrder=3, lastExchangeTime=1502269944782, loc=true, ver=2.1.1#20170618-sha1:09ce29e0, isClient=false], customEvt=null]
[10:12:24,813][INFO][exchange-worker-#72%null%][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=5, minorTopVer=0], crd=false]
[10:12:24,819][INFO][exchange-worker-#72%null%][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=5, minorTopVer=0], evt=NODE_FAILED, node=dd71479c-41ba-443e-b25c-3803a2a94f4f]
[10:12:28,344][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52474]
[10:12:28,348][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52482]
[10:12:28,356][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52506]
[10:12:28,362][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52522]
[10:12:28,368][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52538]
[10:12:28,374][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52554]
[10:12:28,380][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52570]
[10:12:28,386][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52586]
[10:12:28,392][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52602]
[10:12:28,397][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52618]
[10:12:28,402][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52634]
[10:12:28,407][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52650]
[10:12:28,412][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52666]

...

[10:18:32,684][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:43604]
[10:18:32,690][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:43620]
[10:18:32,695][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:43636]
[10:18:42,831][INFO][disco-event-worker-#71%null%][GridDiscoveryManager] Added new node to topology: TcpDiscoveryNode [id=2e80b0f0-21db-451d-a264-34ba16e00ffa, addrs=[10.44.3.14, 127.0.0.1], sockAddrs=[/127.0.0.1:0, gbrdsr000002837.intranet.barcapint.com/10.44.3.14:0], discPort=0, order=6, intOrder=5, lastExchangeTime=1502270322805, loc=false, ver=2.1.1#20170618-sha1:09ce29e0, isClient=true]
[10:18:42,832][INFO][disco-event-worker-#71%null%][GridDiscoveryManager] Topology snapshot [ver=6, servers=2, clients=2, CPUs=144, heap=90.0GB]
[10:18:42,833][INFO][exchange-worker-#72%null%][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], crd=false, evt=10, node=TcpDiscoveryNode [id=98c1fdf7-09db-4fa0-bb01-8ca7f046643d, addrs=[10.44.3.11, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, XXX.com/10.44.3.11:47500], discPort=47500, order=3, intOrder=3, lastExchangeTime=1502270322815, loc=true, ver=2.1.1#20170618-sha1:09ce29e0, isClient=false], evtNode=TcpDiscoveryNode [id=98c1fdf7-09db-4fa0-bb01-8ca7f046643d, addrs=[10.44.3.11, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, XXX.com/10.44.3.11:47500], discPort=47500, order=3, intOrder=3, lastExchangeTime=1502270322815, loc=true, ver=2.1.1#20170618-sha1:09ce29e0, isClient=false], customEvt=null]
[10:18:42,851][INFO][exchange-worker-#72%null%][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], crd=false]
[10:18:42,855][INFO][exchange-worker-#72%null%][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=6, minorTopVer=0], evt=NODE_JOINED, node=2e80b0f0-21db-451d-a264-34ba16e00ffa]
{noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)