Posted to issues@ignite.apache.org by "Vladislav Pyatkov (JIRA)" <ji...@apache.org> on 2016/12/26 11:30:58 UTC

[jira] [Updated] (IGNITE-4491) Commutation loss between two nodes leads to hang whole cluster

     [ https://issues.apache.org/jira/browse/IGNITE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladislav Pyatkov updated IGNITE-4491:
--------------------------------------
    Summary: Commutation loss between two nodes leads to hang whole cluster  (was: Commutation loss between two nodes leads to hang whole cluster.)

> Commutation loss between two nodes leads to hang whole cluster
> --------------------------------------------------------------
>
>                 Key: IGNITE-4491
>                 URL: https://issues.apache.org/jira/browse/IGNITE-4491
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 1.8
>            Reporter: Vladislav Pyatkov
>            Priority: Critical
>         Attachments: Segmentation.7z
>
>
> Reproduction steps:
> 1) Start nodes:
> {noformat}
> DC1                       DC2
> 1 (10.116.172.1)      8 (10.116.64.11)
> 2 (10.116.172.2)      7 (10.116.64.12)
> 3 (10.116.172.3)      6 (10.116.64.13)
> 4 (10.116.172.4)      5 (10.116.64.14)
> {noformat}
> Each node has a client that runs on the same host as the server (see the source in the attachment).
> 2) Drop connection
> Between 1-8,
> {noformat}
> 1 (10.116.172.1)      8 (10.116.64.11)
> {noformat}
> Drop all input and output traffic
> Invoke from 10.116.172.1
> {code}
> iptables -A INPUT -s 10.116.64.11 -j DROP
> iptables -A OUTPUT -d 10.116.64.11 -j DROP
> {code}
> Between 4-5,
> {noformat}
> 4 (10.116.172.4)      5 (10.116.64.14)
> {noformat}
> Invoke from 10.116.172.4
> {code}
> iptables -A INPUT -s 10.116.64.14 -j DROP
> iptables -A OUTPUT -d 10.116.64.14 -j DROP
> {code}
> 3) Stop the grid, after several seconds
> If you look into the logs, you can find which nodes were segmented after the traffic was dropped (note that the clients were not segmented):
> {noformat}
> [12:04:33,914][INFO][disco-event-worker-#211%null%][GridDiscoveryManager] Topology snapshot [ver=18, servers=6, clients=8, CPUs=456, heap=68.0GB]
> {noformat}
> And all operations stopped at the same time.
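
The reproduction steps above can be sketched as a pair of small shell helpers. This is only an illustration and not part of the original report: the function names partition_rules and snapshot_counts are hypothetical. The first helper prints the same two DROP rules used in step 2 (or, with -D instead of -A, the commands that delete them again); the second pulls the version, server count and client count out of a "Topology snapshot" log line like the one quoted above. It assumes the log line keeps the field order shown in the report.

```shell
#!/bin/sh
# Hypothetical helpers for the reproduction steps; names are illustrative only.

# Print the iptables commands that add (A) or delete (D) the DROP rules
# for one peer. The commands are printed, not executed, so this is a dry run;
# pipe the output to `sh` as root on the node to actually apply them.
partition_rules() {
    action=$1   # A = append the rules, D = delete the same rules again
    peer=$2     # IP address of the remote node to cut off
    echo "iptables -$action INPUT -s $peer -j DROP"
    echo "iptables -$action OUTPUT -d $peer -j DROP"
}

# Read log lines on stdin and print ver/servers/clients for every
# "Topology snapshot" line. Assumes the field order seen in the report.
snapshot_counts() {
    sed -n 's/.*Topology snapshot \[ver=\([0-9]*\), servers=\([0-9]*\), clients=\([0-9]*\),.*/ver=\1 servers=\2 clients=\3/p'
}
```

For example, `partition_rules A 10.116.64.11` prints the two commands from step 2 for node 1, and `partition_rules D 10.116.64.11 | sh` (run as root) would remove them after the test. Feeding a node log to `snapshot_counts` makes it easy to compare the server and client counts before and after the traffic is dropped.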



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)