Posted to issues@ignite.apache.org by "Vladislav Pyatkov (JIRA)" <ji...@apache.org> on 2016/12/26 11:30:58 UTC
[jira] [Updated] (IGNITE-4491) Commutation loss between two nodes leads to hang whole cluster
[ https://issues.apache.org/jira/browse/IGNITE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vladislav Pyatkov updated IGNITE-4491:
--------------------------------------
Summary: Commutation loss between two nodes leads to hang whole cluster (was: Commutation loss between two nodes leads to hang whole cluster.)
> Commutation loss between two nodes leads to hang whole cluster
> --------------------------------------------------------------
>
> Key: IGNITE-4491
> URL: https://issues.apache.org/jira/browse/IGNITE-4491
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 1.8
> Reporter: Vladislav Pyatkov
> Priority: Critical
> Attachments: Segmentation.7z
>
>
> Reproduction steps:
> 1) Start nodes:
> {noformat}
> DC1                 DC2
> 1 (10.116.172.1)    8 (10.116.64.11)
> 2 (10.116.172.2)    7 (10.116.64.12)
> 3 (10.116.172.3)    6 (10.116.64.13)
> 4 (10.116.172.4)    5 (10.116.64.14)
> {noformat}
> Each node has a client running on the same host as the server (see the source in the attachment).
> 2) Drop the connection
> Between nodes 1 and 8:
> {noformat}
> 1 (10.116.172.1) 8 (10.116.64.11)
> {noformat}
> Drop all inbound and outbound traffic.
> Run on 10.116.172.1:
> {code}
> iptables -A INPUT -s 10.116.64.11 -j DROP
> iptables -A OUTPUT -d 10.116.64.11 -j DROP
> {code}
> Between nodes 4 and 5:
> {noformat}
> 4 (10.116.172.4) 5 (10.116.64.14)
> {noformat}
> Run on 10.116.172.4:
> {code}
> iptables -A INPUT -s 10.116.64.14 -j DROP
> iptables -A OUTPUT -d 10.116.64.14 -j DROP
> {code}
> 3) Stop the grid after several seconds.
> In the logs you can see which nodes were segmented after the traffic was dropped (note which clients were not segmented):
> {noformat}
> [12:04:33,914][INFO][disco-event-worker-#211%null%][GridDiscoveryManager] Topology snapshot [ver=18, servers=6, clients=8, CPUs=456, heap=68.0GB]
> {noformat}
> All operations stopped at the same time.
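To compare server and client counts across many log files, the topology snapshot lines can be reduced to a few fields. This is a minimal sketch, not something from the attachment: `topology_summary` is a hypothetical helper, and it assumes every snapshot line has the same layout as the one quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): reads Ignite log lines
# on stdin and prints "time ver=N servers=N clients=N" for each line that
# contains a topology snapshot in the format quoted in the report.
topology_summary() {
    sed -n 's/^\[\([0-9:,]*\)].*Topology snapshot \[ver=\([0-9]*\), servers=\([0-9]*\), clients=\([0-9]*\),.*/\1 ver=\2 servers=\3 clients=\4/p'
}

# Example using the line quoted in the report.
echo '[12:04:33,914][INFO][disco-event-worker-#211%null%][GridDiscoveryManager] Topology snapshot [ver=18, servers=6, clients=8, CPUs=456, heap=68.0GB]' | topology_summary
# prints: 12:04:33,914 ver=18 servers=6 clients=8
```

With eight servers and eight clients started, a line showing servers=6 and clients=8 indicates that two servers left the topology while all clients remained.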
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)