Posted to issues@ignite.apache.org by "Andrey Gura (JIRA)" <ji...@apache.org> on 2016/12/07 00:22:59 UTC

[jira] [Comment Edited] (IGNITE-4003) Slow or faulty client can stall the whole cluster.

    [ https://issues.apache.org/jira/browse/IGNITE-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15727182#comment-15727182 ] 

Andrey Gura edited comment on IGNITE-4003 at 12/7/16 12:22 AM:
---------------------------------------------------------------

Establishing outgoing connections has been reimplemented asynchronously, so the user thread is no longer blocked. Most of the changes are in the {{TcpCommunicationSpi}} and {{GridNioServer}} classes.
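
To illustrate the general idea of a non-blocking outgoing connect, here is a minimal sketch that uses only standard {{java.nio}}. It is not the code from the PR. The class {{AsyncConnector}} is hypothetical. The caller gets a {{CompletableFuture}} immediately, and a single selector thread finishes the connect when {{OP_CONNECT}} fires.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.Channel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical sketch: one I/O thread completes connects, so callers never block on them. */
class AsyncConnector implements AutoCloseable {
    private final Selector selector;
    private final Queue<Runnable> requests = new ConcurrentLinkedQueue<>();
    private final Thread ioThread;
    private volatile boolean running = true;

    AsyncConnector() throws IOException {
        selector = Selector.open();
        ioThread = new Thread(this::loop, "async-connector");
        ioThread.setDaemon(true);
        ioThread.start();
    }

    /** Called from a user thread: queues the request and returns at once. */
    CompletableFuture<SocketChannel> connect(InetSocketAddress addr) {
        CompletableFuture<SocketChannel> fut = new CompletableFuture<>();
        requests.add(() -> {
            SocketChannel ch = null;
            try {
                ch = SocketChannel.open();
                ch.configureBlocking(false);
                if (ch.connect(addr))
                    fut.complete(ch); // established immediately (common on loopback)
                else
                    ch.register(selector, SelectionKey.OP_CONNECT, fut); // finish later on OP_CONNECT
            } catch (IOException e) {
                closeQuietly(ch);
                fut.completeExceptionally(e);
            }
        });
        selector.wakeup(); // make the I/O thread pick up the new request
        return fut;
    }

    private void loop() {
        while (running) {
            try {
                selector.select();
            } catch (IOException e) {
                return;
            }
            Runnable r;
            while ((r = requests.poll()) != null)
                r.run(); // register newly requested connects on the I/O thread
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                SocketChannel ch = (SocketChannel) key.channel();
                @SuppressWarnings("unchecked")
                CompletableFuture<SocketChannel> fut = (CompletableFuture<SocketChannel>) key.attachment();
                try {
                    if (key.isConnectable() && ch.finishConnect()) {
                        key.cancel(); // a real server would switch the key to OP_READ here
                        fut.complete(ch);
                    }
                } catch (IOException e) { // e.g. connection refused
                    key.cancel();
                    closeQuietly(ch);
                    fut.completeExceptionally(e);
                }
            }
        }
    }

    @Override
    public void close() throws IOException, InterruptedException {
        running = false;
        selector.wakeup();
        ioThread.join();
        selector.close(); // channels still pending are not closed by this sketch
    }

    private static void closeQuietly(Channel ch) {
        if (ch == null)
            return;
        try {
            ch.close();
        } catch (IOException ignored) {
            // best effort
        }
    }
}
```

The sketch has no connect timeout. An attempt to an unreachable address, such as the Docker subnet in this issue, would simply stay pending, but only its future waits, not the calling thread.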

The handshake functionality (the {{TcpCommunicationSpi.safeHandshake()}} method) was kept for shared-memory (shmem) IPC but rewritten for real network communication. In the latter case, it was moved to the server listener's {{onMessage}} and {{onFirstMessage}} methods.
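
Moving the handshake into the listener means no thread sits waiting for the handshake reply; the reply is processed whenever the NIO thread delivers it. The following is a minimal, hypothetical sketch of that pattern. {{CommSession}}, {{CommMsg}} and {{HandshakeListener}} are illustrative names, not Ignite classes, and the way {{onMessage}} delegates to {{onFirstMessage}} is an assumption. The node ID check mirrors the "Remote node ID is not as expected" failure in the issue's stack trace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

/** Hypothetical per-connection state kept by the server listener. */
final class CommSession {
    final UUID expectedNodeId; // node this session was opened to
    boolean handshakeDone;
    boolean closed;
    final List<String> delivered = new ArrayList<>();

    CommSession(UUID expectedNodeId) {
        this.expectedNodeId = expectedNodeId;
    }
}

/** Hypothetical wire message: a handshake carries the sender's node ID, a payload carries data. */
final class CommMsg {
    final UUID nodeId;
    final String payload;

    private CommMsg(UUID nodeId, String payload) {
        this.nodeId = nodeId;
        this.payload = payload;
    }

    static CommMsg handshake(UUID nodeId) { return new CommMsg(nodeId, null); }

    static CommMsg payload(String data) { return new CommMsg(null, data); }
}

/** Called by the (assumed single) NIO thread serving the session; nothing here blocks. */
final class HandshakeListener {
    void onMessage(CommSession ses, CommMsg msg) {
        if (ses.closed)
            return; // drop anything arriving on a rejected session
        if (!ses.handshakeDone) {
            onFirstMessage(ses, msg);
            return;
        }
        ses.delivered.add(msg.payload); // regular message processing
    }

    private void onFirstMessage(CommSession ses, CommMsg msg) {
        // Reject the session if the peer is not the node we meant to reach.
        if (msg.nodeId == null || !msg.nodeId.equals(ses.expectedNodeId)) {
            ses.closed = true; // a real server would close the NIO session and fail pending sends
            return;
        }
        ses.handshakeDone = true;
    }
}
```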

Also, the communication worker now does additional work to manage sessions asynchronously.
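
The comment does not spell out what the worker's extra duties are. Purely as an illustration, the sketch below assumes one plausible duty: tracking in-flight connection attempts and failing those that never complete, so callers get a failed future instead of a stalled thread. {{PendingSessions}}, {{request}}, {{onConnected}} and {{expireStale}} are hypothetical names.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical bookkeeping a background worker could do for in-flight connection attempts. */
final class PendingSessions<S> implements AutoCloseable {
    private static final class Pending<T> {
        final CompletableFuture<T> fut = new CompletableFuture<>();
        final long startNanos = System.nanoTime();
    }

    private final Map<UUID, Pending<S>> pending = new ConcurrentHashMap<>();
    private final long timeoutNanos;
    private final ScheduledExecutorService worker = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "comm-worker");
        t.setDaemon(true);
        return t;
    });

    PendingSessions(long timeoutMs, long checkPeriodMs) {
        timeoutNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        worker.scheduleWithFixedDelay(this::expireStale, checkPeriodMs, checkPeriodMs, TimeUnit.MILLISECONDS);
    }

    /** User thread: get (or share) the future for a session to nodeId; never blocks. */
    CompletableFuture<S> request(UUID nodeId) {
        return pending.computeIfAbsent(nodeId, id -> new Pending<>()).fut;
    }

    /** I/O thread: the handshake finished, hand the session to everyone waiting for it. */
    void onConnected(UUID nodeId, S ses) {
        Pending<S> p = pending.remove(nodeId);
        if (p != null)
            p.fut.complete(ses);
    }

    /** Worker thread: fail attempts that took too long, e.g. to a client on an unreachable subnet. */
    private void expireStale() {
        long now = System.nanoTime();
        pending.forEach((id, p) -> {
            // remove(id, p) ensures only one of onConnected/expireStale completes the future
            if (now - p.startNanos > timeoutNanos && pending.remove(id, p))
                p.fut.completeExceptionally(new TimeoutException("Failed to connect to node: " + id));
        });
    }

    @Override
    public void close() {
        worker.shutdownNow();
    }
}
```

Sharing one future per node through {{computeIfAbsent}} means concurrent senders to the same node wait on a single attempt rather than each opening their own connection.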

See [PR #1318|https://github.com/apache/ignite/pull/1318]

> Slow or faulty client can stall the whole cluster.
> --------------------------------------------------
>
>                 Key: IGNITE-4003
>                 URL: https://issues.apache.org/jira/browse/IGNITE-4003
>             Project: Ignite
>          Issue Type: Bug
>          Components: cache, general
>    Affects Versions: 1.7
>            Reporter: Vladimir Ozerov
>            Assignee: Andrey Gura
>            Priority: Critical
>             Fix For: 2.0
>
>
> Steps to reproduce:
> 1) Start two server nodes and put some data into the cache.
> 2) Start a client from a Docker subnet that is not visible from the outside. The client will join the cluster.
> 3) Try to put something into the cache, or start another node to force a rebalance.
> The cluster gets stuck at this point. Root cause: the servers constantly try to establish an outgoing connection to the client but fail, because the Docker subnet is not visible from the outside. This may stop virtually all cluster operations.
> Typical stack trace:
> {code}
> org.apache.ignite.IgniteCheckedException: Failed to send message (node may have left the grid or TCP connection cannot be established due to firewall issues) [node=TcpDiscoveryNode [id=a15d74c2-1ec2-4349-9640-aeacd70d8714, addrs=[127.0.0.1, 172.17.0.6], sockAddrs=[/127.0.0.1:0, /127.0.0.1:0, /172.17.0.6:0], discPort=0, order=7241, intOrder=3707, lastExchangeTime=1474096941045, loc=false, ver=1.5.23#20160526-sha1:259146da, isClient=true], topic=T4 [topic=TOPIC_CACHE, id1=949732fd-1360-3a58-8d9e-0ff6ea6182cc, id2=a15d74c2-1ec2-4349-9640-aeacd70d8714, id3=2], msg=GridContinuousMessage [type=MSG_EVT_NOTIFICATION, routineId=7e13c48e-6933-48b2-9f15-8d92007930db, data=null, futId=null], policy=2]
> 	at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1129) [ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.managers.communication.GridIoManager.sendOrderedMessage(GridIoManager.java:1347) [ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1227) ~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1198) ~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1180) ~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendNotification(GridContinuousProcessor.java:841) ~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.addNotification(GridContinuousProcessor.java:800) ~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.onEntryUpdate(CacheContinuousQueryHandler.java:787) [ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.access$700(CacheContinuousQueryHandler.java:91) [ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler$1.onEntryUpdated(CacheContinuousQueryHandler.java:412) [ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:343) [ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:250) [ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.initialValue(GridCacheMapEntry.java:3476) [ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtForceKeysFuture$MiniFuture.onResult(GridDhtForceKeysFuture.java:548) [ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtForceKeysFuture.onResult(GridDhtForceKeysFuture.java:207) [ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.processForceKeyResponse(GridDhtPreloader.java:636) [ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.access$1000(GridDhtPreloader.java:81) [ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.onMessage(GridDhtPreloader.java:202) [ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.onMessage(GridDhtPreloader.java:200) [ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$MessageHandler.apply(GridDhtPreloader.java:877) [ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$MessageHandler.apply(GridDhtPreloader.java:859) [ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:582) [ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:280) [ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:204) [ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$000(GridCacheIoManager.java:80) [ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:163) [ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1058) [ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:836) [ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.managers.communication.GridIoManager.access$1700(GridIoManager.java:104) [ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.managers.communication.GridIoManager$5.run(GridIoManager.java:799) [ignite-core-1.5.23.jar:1.5.23]
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_51]
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_51]
> 	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_51]
> Caused by: org.apache.ignite.spi.IgniteSpiException: Failed to send message to remote node: TcpDiscoveryNode [id=a15d74c2-1ec2-4349-9640-aeacd70d8714, addrs=[127.0.0.1, 172.17.0.6], sockAddrs=[/127.0.0.1:0, /127.0.0.1:0, /172.17.0.6:0], discPort=0, order=7241, intOrder=3707, lastExchangeTime=1474096941045, loc=false, ver=1.5.23#20160526-sha1:259146da, isClient=true]
> 	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1986) ~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1926) ~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1124) [ignite-core-1.5.23.jar:1.5.23]
> 	... 32 common frames omitted
> Caused by: org.apache.ignite.IgniteCheckedException: Failed to connect to node (is node still alive?). Make sure that each GridComputeTask and GridCacheTransaction has a timeout set in order to prevent parties from waiting forever in case of network issues [nodeId=a15d74c2-1ec2-4349-9640-aeacd70d8714, addrs=[/172.17.0.6:47100, /127.0.0.1:47100]]
> 	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2489) ~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2130) ~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2024) ~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1960) ~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1926) ~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1124) [ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.managers.communication.GridIoManager.sendOrderedMessage(GridIoManager.java:1347) [ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1227) ~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1198) ~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1180) ~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendNotification(GridContinuousProcessor.java:841) ~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.addNotification(GridContinuousProcessor.java:800) ~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.onEntryUpdate(CacheContinuousQueryHandler.java:787) [ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.access$700(CacheContinuousQueryHandler.java:91) [ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler$1.onEntryUpdated(CacheContinuousQueryHandler.java:412) [ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:343) [ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:250) [ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.initialValue(GridCacheMapEntry.java:3476) [ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture$MiniFuture.onResult(GridDhtLockFuture.java:1213) ~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.onResult(GridDhtLockFuture.java:529) ~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter.processDhtLockResponse(GridDhtTransactionalCacheAdapter.java:639) ~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter.access$100(GridDhtTransactionalCacheAdapter.java:89) ~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter$5.apply(GridDhtTransactionalCacheAdapter.java:151) ~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter$5.apply(GridDhtTransactionalCacheAdapter.java:149) ~[ignite-core-1.5.23.jar:1.5.23]
> 	... 12 common frames omitted
> 	Suppressed: org.apache.ignite.IgniteCheckedException: Failed to connect to address: /172.17.0.6:47100
> 		at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2494) ~[ignite-core-1.5.23.jar:1.5.23]
> 		... 35 common frames omitted
> 	Caused by: java.net.SocketTimeoutException: null
> 		at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:118)
> 		at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2353)
> 		... 35 common frames omitted
> 	Suppressed: org.apache.ignite.IgniteCheckedException: Failed to connect to address: /127.0.0.1:47100
> 		at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2494) ~[ignite-core-1.5.23.jar:1.5.23]
> 		... 35 common frames omitted
> 	Caused by: org.apache.ignite.IgniteCheckedException: Remote node ID is not as expected [expected=a15d74c2-1ec2-4349-9640-aeacd70d8714, rcvd=48cccf25-7c29-4048-bd52-704acdb552e6]
> 		at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:2604)
> 		at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2361)
> 		... 35 common frames omitted
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)