Posted to issues@ignite.apache.org by "Ilya Kasnacheev (Jira)" <ji...@apache.org> on 2021/03/30 12:12:00 UTC

[jira] [Commented] (IGNITE-14445) "Remote node does not observe current" after failure by not receiving metrics from client

    [ https://issues.apache.org/jira/browse/IGNITE-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311465#comment-17311465 ] 

Ilya Kasnacheev commented on IGNITE-14445:
------------------------------------------

I have created a reproducer for this behavior that simulates client segmentation caused by missed metrics updates. Apply the attached patch and run any test that uses client nodes, for example:

mvn surefire:test -DIGNITE_STOP_NODE=100 -Dtest=IgniteCacheManyClientsTest\#testManyClientsClientDiscovery -pl :ignite-core

You can pass different values of IGNITE_STOP_NODE, such as 0, 20, 50 or 100, to segment a client node at various stages of its life cycle.
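
To try all of these stages in one go, a small wrapper loop along the following lines may be convenient. This is only a sketch: the function name run_segmentation_tests and the MVN override variable are made up for illustration and are not part of the patch. It assumes the patch reads IGNITE_STOP_NODE exactly as in the command above.

```shell
#!/bin/sh
# Sketch: run the same client test once per IGNITE_STOP_NODE value.
# MVN is a hypothetical override (not part of the patch); set MVN=echo
# to print the commands instead of running them.
MVN="${MVN:-mvn}"

run_segmentation_tests() {
    for stage in 0 20 50 100; do
        # Same command as above, with only the stop stage varied.
        "$MVN" surefire:test \
            -DIGNITE_STOP_NODE="$stage" \
            '-Dtest=IgniteCacheManyClientsTest#testManyClientsClientDiscovery' \
            -pl :ignite-core \
            || echo "run with IGNITE_STOP_NODE=$stage failed" >&2
    done
}
```

Call run_segmentation_tests from the Ignite source root after applying the patch. A failing run is reported on stderr and the loop continues with the next value.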

> "Remote node does not observe current" after failure by not receiving metrics from client
> -----------------------------------------------------------------------------------------
>
>                 Key: IGNITE-14445
>                 URL: https://issues.apache.org/jira/browse/IGNITE-14445
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.9.1
>            Reporter: Ilya Kasnacheev
>            Priority: Major
>         Attachments: ignite-server-impl.patch
>
>
> A server node might fail a client node due to pauses in the network connection:
> [15:07:16,330][WARNING][tcp-disco-msg-worker-[11cf0c06 10.212.120.71:57500 crd]-#2%hh_DynamicGrid_v2%][TcpDiscoverySpi] Failing client node due to not receiving metrics updates from client node within 'IgniteConfiguration.clientFailureDetectionTimeout' (consider increasing configuration property) [timeout=120000, node=TcpDiscoveryNode [id=9dbcfb86-a60e-4382-904f-57bffbe18c5c,consistentId=73B5811B-9644-48FD-A533-B4609FDAD591, addrs=ArrayList [10.212.120.190], sockAddrs=HashSet [VWNV02AX07080.HH.com/10.212.120.190:0], discPort=0, order=488, intOrder=248, lastExchangeTime=1612397142960, loc=false, ver=2.8.1#20200521-sha1:86422096, isClient=true]]
> The client node then never realizes that it has been dropped by the cluster and keeps trying to connect endlessly. I'm not sure what discovery does on the client node:
> {code}
> [15:07:42,689][SEVERE][Thread-219][TcpCommunicationSpi] Failed to send message to remote node [node=TcpDiscoveryNode [id=83fd7c70-839d-46ca-969f-bbb9661d6ab2, consistentId=127.1.1.1:57500, addrs=ArrayList [127.1.1.1], sockAddrs=HashSet [test.com/127.1.1.1:57500], discPort=57500, order=1, intOrder=1, lastExchangeTime=1612397256785, loc=false, ver=2.8.1#20200521-sha1:86422096, isClient=false], msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearAtomicFullUpdateRequest [keys=ArrayList [UserKeyCacheObjectImpl [part=292, val=TestModel:TEST|bbf4da4d-c3d7-4b46-98b6-0de70c30f668, hasValBytes=true]], conflictTtls=null, conflictExpireTimes=null, expiryPlc=org.apache.ignite.internal.processors.platform.cache.expiry.PlatformExpiryPolicy@3fb1b76e, initSize=1, filter=null, parent=GridNearAtomicAbstractUpdateRequest [res=null, flags=keepBinary]]]]
> class org.apache.ignite.internal.cluster.ClusterTopologyCheckedException: Remote node does not observe current node in topology : 83fd7c70-839d-46ca-969f-bbb9661d6ab2
> at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3622)
> at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3458)
> at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3198)
> at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3078)
> at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2918)
> at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2877)
> at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2035)
> at org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2132)
> at org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1257)
> at org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1296)
> at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.sendSingleRequest(GridNearAtomicAbstractUpdateFuture.java:312)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)