Posted to user@ignite.apache.org by Lokesh Sharma <lo...@gmail.com> on 2018/08/28 14:39:08 UTC

What happens during network failure?

What is the expected behaviour of Ignite when a node is unreachable from
other nodes of the cluster due to *network failure* (and not due to node
failure), and after a while the node is again accessible from the cluster?
In other words, one node gets dropped from the cluster due to network
failure and then joins the cluster again without rebooting.

Re: What happens during network failure?

Posted by luqmanahmad <lu...@gmail.com>.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: What happens during network failure?

Posted by Lokesh Sharma <lo...@gmail.com>.
Luqman, I understand what you mean now. I need to use a
SegmentationResolver, such as the one implemented in your plugin, which
will generate EVT_NODE_SEGMENTED and in turn restart the nodes that are in
invalid segments.

Many thanks for writing the plugin :D

Re: What happens during network failure?

Posted by Lokesh Sharma <lo...@gmail.com>.
Luqman, if I subscribe to EVT_NODE_FAILED, how do I figure out which node
to restart? For example in case of 2 nodes separating from each other, both
would receive that event.
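
One common way to answer this kind of question is a quorum rule: each node counts how many of the expected nodes it can still reach, and only the side that holds a majority keeps running. The sketch below is plain Java and does not use any Ignite API. The class `QuorumCheck`, the method `inValidSegment`, and the tie-breaker node are all hypothetical names introduced for illustration. In Ignite the equivalent decision would presumably live inside a segmentation resolver.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper, not part of Ignite: decides whether this node sits in the surviving segment. */
final class QuorumCheck {
    /**
     * @param reachable  IDs of the nodes this node can still reach, including itself
     * @param expected   IDs of all nodes in the configured topology
     * @param tieBreaker ID of the node whose side wins when the split is exactly even
     * @return true if this node should keep running, false if it should stop or restart
     */
    static boolean inValidSegment(Set<String> reachable, Set<String> expected, String tieBreaker) {
        int seen = 0;
        for (String id : expected) {
            if (reachable.contains(id)) seen++;
        }
        int total = expected.size();
        if (2 * seen > total) return true;      // strict majority: valid segment
        if (2 * seen < total) return false;     // strict minority: invalid segment
        return reachable.contains(tieBreaker);  // even split: the tie-breaker's side wins
    }

    public static void main(String[] args) {
        // The two-node case from the question: each node sees only itself.
        Set<String> expected = new HashSet<>(Arrays.asList("node-a", "node-b"));
        Set<String> seenByA = new HashSet<>(Arrays.asList("node-a"));
        Set<String> seenByB = new HashSet<>(Arrays.asList("node-b"));
        System.out.println("node-a keeps running: " + inValidSegment(seenByA, expected, "node-a"));
        System.out.println("node-b keeps running: " + inValidSegment(seenByB, expected, "node-a"));
    }
}
```

With two nodes neither side can hold a strict majority, so the outcome depends entirely on the tie-breaker. In this sketch node-a stays up and node-b is the one to restart, even if node-a is the machine that actually lost its network link.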

Re: What happens during network failure?

Posted by luqmanahmad <lu...@gmail.com>.
Lokesh, looking at the javadocs of [1], you can subscribe to EVT_NODE_FAILED
and EVT_NODE_SEGMENTED events.

In my personal experience, SegmentationPolicy.NOOP is required only in very
rare cases. My approach would be to STOP the node in most cases when
segmentation happens, or you can RESTART depending on your use case.

[1]  EVT_NODE_FAILED
<https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/events/EventType.html#EVT_NODE_FAILED>  




Re: What happens during network failure?

Posted by Lokesh Sharma <lo...@gmail.com>.
Thanks for the docs.

The docs say "EVT_NODE_SEGMENTED" is

Generated when node determines that it runs in invalid network segment.


But when I disconnected one node from a cluster of two nodes, neither node
generated that event. Both raised only the "EVT_NODE_FAILED" event.

Re: What happens during network failure?

Posted by luqmanahmad <lu...@gmail.com>.
Lokesh, see [1] for how a node will act when segmentation occurs. You have
to register for EVT_NODE_SEGMENTED events. The Ignite plugins contain all
this information as well.

[1]  SegmentationPolicy
<https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/plugin/segmentation/SegmentationPolicy.html>  





Re: What happens during network failure?

Posted by Lokesh Sharma <lo...@gmail.com>.
Thanks Luqman

This is what I'm looking for. I'm stuck on one thing though. What's the
difference between EVT_NODE_FAILED and EVT_NODE_SEGMENTED? I want to restart
the detached node. I created two nodes to experiment. I detached them and
both nodes received EVT_NODE_FAILED, but neither received
EVT_NODE_SEGMENTED.

Re: What happens during network failure?

Posted by luqmanahmad <lu...@gmail.com>.
Lokesh, this is a pure split-brain scenario. See [1] for how you can
overcome this problem.

[1]  Ignite plugins <https://github.com/luqmanahmad/ignite-plugins>  





Re: What happens during network failure?

Posted by Lokesh Sharma <lo...@gmail.com>.
> I mainly want to know whether the detached node would join the cluster
> automatically after the network is back or not?

I tested such a situation and found that the detached node does not rejoin
the cluster. Instead it produces the following errors:

2018-08-29 12:35:25.625 ERROR 3129 --- [5%igniteserver%] .s.d.t.i.m.TcpDiscoveryMulticastIpFinder : Failed to request nodes addresses.

java.net.SocketException: Cannot assign requested address (Error setting socket option)
    at java.net.PlainDatagramSocketImpl.socketSetOption0(Native Method) ~[na:1.8.0_151]
    at java.net.PlainDatagramSocketImpl.socketSetOption(PlainDatagramSocketImpl.java:74) ~[na:1.8.0_151]
    at java.net.AbstractPlainDatagramSocketImpl.setOption(AbstractPlainDatagramSocketImpl.java:309) ~[na:1.8.0_151]
    at java.net.MulticastSocket.setInterface(MulticastSocket.java:471) ~[na:1.8.0_151]
    at org.apache.ignite.spi.discovery.tcp.ipfinder.multicast.TcpDiscoveryMulticastIpFinder.requestAddresses(TcpDiscoveryMulticastIpFinder.java:565) [ignite-core-2.7.0-SNAPSHOT.jar!/:2.7.0-SNAPSHOT]
    at org.apache.ignite.spi.discovery.tcp.ipfinder.multicast.TcpDiscoveryMulticastIpFinder.access$700(TcpDiscoveryMulticastIpFinder.java:80) [ignite-core-2.7.0-SNAPSHOT.jar!/:2.7.0-SNAPSHOT]
    at org.apache.ignite.spi.discovery.tcp.ipfinder.multicast.TcpDiscoveryMulticastIpFinder$AddressReceiver.body(TcpDiscoveryMulticastIpFinder.java:780) [ignite-core-2.7.0-SNAPSHOT.jar!/:2.7.0-SNAPSHOT]
    at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) [ignite-core-2.7.0-SNAPSHOT.jar!/:2.7.0-SNAPSHOT]

2018-08-29 12:35:25.827 ERROR 3129 --- [5%igniteserver%] .s.d.t.i.m.TcpDiscoveryMulticastIpFinder : Failed to request nodes addresses.

java.net.SocketException: Cannot assign requested address (Error setting socket option)
    at java.net.PlainDatagramSocketImpl.socketSetOption0(Native Method) ~[na:1.8.0_151]
    at java.net.PlainDatagramSocketImpl.socketSetOption(PlainDatagramSocketImpl.java:74) ~[na:1.8.0_151]
    at java.net.AbstractPlainDatagramSocketImpl.setOption(AbstractPlainDatagramSocketImpl.java:309) ~[na:1.8.0_151]
    at java.net.MulticastSocket.setInterface(MulticastSocket.java:471) ~[na:1.8.0_151]
    at org.apache.ignite.spi.discovery.tcp.ipfinder.multicast.TcpDiscoveryMulticastIpFinder.requestAddresses(TcpDiscoveryMulticastIpFinder.java:565) [ignite-core-2.7.0-SNAPSHOT.jar!/:2.7.0-SNAPSHOT]
    at org.apache.ignite.spi.discovery.tcp.ipfinder.multicast.TcpDiscoveryMulticastIpFinder.access$700(TcpDiscoveryMulticastIpFinder.java:80) [ignite-core-2.7.0-SNAPSHOT.jar!/:2.7.0-SNAPSHOT]
    at org.apache.ignite.spi.discovery.tcp.ipfinder.multicast.TcpDiscoveryMulticastIpFinder$AddressReceiver.body(TcpDiscoveryMulticastIpFinder.java:780) [ignite-core-2.7.0-SNAPSHOT.jar!/:2.7.0-SNAPSHOT]
    at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) [ignite-core-2.7.0-SNAPSHOT.jar!/:2.7.0-SNAPSHOT]


Any idea how to fix this? I also tried with the flag
"-Djava.net.preferIPv4Stack=true" but it didn't make a difference.
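
The trace above fails inside MulticastSocket.setInterface with "Cannot assign requested address". One plausible reading, offered here as an assumption rather than a confirmed diagnosis, is that the local address the multicast finder tries to bind is no longer assigned to any interface that is up after the network drop. The stand-alone sketch below uses only the JDK to check that for a given address. The class `LocalAddressCheck` and its `Status` values are hypothetical names, not part of Ignite.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;

/** Hypothetical diagnostic, not part of Ignite: reports whether a local IP is usable for multicast. */
final class LocalAddressCheck {
    enum Status { NO_INTERFACE, INTERFACE_DOWN, NO_MULTICAST, OK, LOOKUP_FAILED }

    /** @param ipLiteral a literal IP address such as "192.168.1.10" (a literal needs no DNS lookup) */
    static Status check(String ipLiteral) {
        try {
            InetAddress addr = InetAddress.getByName(ipLiteral);
            // Returns null when no local interface currently holds this address.
            NetworkInterface nic = NetworkInterface.getByInetAddress(addr);
            if (nic == null) return Status.NO_INTERFACE;
            if (!nic.isUp()) return Status.INTERFACE_DOWN;
            if (!nic.supportsMulticast()) return Status.NO_MULTICAST;
            return Status.OK;
        } catch (UnknownHostException | SocketException e) {
            return Status.LOOKUP_FAILED;
        }
    }

    public static void main(String[] args) {
        // Pass the address the node is configured to use; loopback is only a default.
        String ip = args.length > 0 ? args[0] : "127.0.0.1";
        System.out.println(ip + " -> " + check(ip));
    }
}
```

Running this on the detached node while the network is down and again after it returns would show whether the address disappears from the interface list, which would be consistent with the repeated SocketException.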

Re: What happens during network failure?

Posted by Lokesh Sharma <lo...@gmail.com>.
I mainly want to know whether the detached node would join the cluster
automatically after the network is back or not?
