Posted to user@ignite.apache.org by Actarus <ma...@grassvalley.com> on 2020/06/16 13:17:03 UTC

How to fix Ignite node segmentation without restart

Hello,

I'm running Apache Ignite (2.4.0) embedded in a Java application that runs
in a master/slave architecture, so there are only ever two nodes in the
grid, in FULL_SYNC, REPLICATED mode. Only the master application writes to
the grid; the slave only reads from it, when it gets promoted to master on
a failover.
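For reference, the setup described above can be sketched roughly as follows (a minimal, untested sketch; the cache name "state" and the class name are illustrative placeholders, not the actual application code):

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.CacheWriteSynchronizationMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class TwoNodeGrid {
    public static void main(String[] args) {
        // REPLICATED + FULL_SYNC: every write completes on both nodes
        // before the write call returns, so the slave is always current.
        CacheConfiguration<String, Object> cacheCfg =
            new CacheConfiguration<>("state"); // placeholder cache name
        cacheCfg.setCacheMode(CacheMode.REPLICATED);
        cacheCfg.setWriteSynchronizationMode(
            CacheWriteSynchronizationMode.FULL_SYNC);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setCacheConfiguration(cacheCfg);

        Ignite ignite = Ignition.start(cfg);
    }
}
```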

In such an architecture, network segmentation issues mean something
different. Typically, segmentation is handled by restarting the node that
experienced the issue. In this scenario, however, if the master is
segmented I do not want to restart it, and I cannot do a failover because a
network issue has just happened and the stand-by may be invalid. The fix is
to always restart the slave.

However, I notice that regardless of handling the EVT_NODE_SEGMENTED event,
adding a SegmentationProcess, running with SegmentationPolicy.NOOP, and
having a segmentation plugin that always returns true/OK, the node running
in the master process always remains in the segmented state, and it is
impossible for it to re-join a cluster after restarting the slave node.
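For concreteness, the segmentation wiring above looks roughly like this (a sketch assuming the standard Ignite 2.x public API, not the exact application code; note that segmentation events are disabled by default and must be enabled explicitly):

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.events.EventType;
import org.apache.ignite.plugin.segmentation.SegmentationPolicy;

public class SegmentationWiring {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();
        // NOOP: do not stop or restart the local node when it is
        // declared segmented.
        cfg.setSegmentationPolicy(SegmentationPolicy.NOOP);
        // Events are disabled by default; enable this one explicitly.
        cfg.setIncludeEventTypes(EventType.EVT_NODE_SEGMENTED);

        Ignite ignite = Ignition.start(cfg);
        ignite.events().localListen(evt -> {
            // React to segmentation (e.g. raise an alarm) instead of
            // failing over.
            System.err.println("Node segmented: " + evt.message());
            return true; // keep the listener registered
        }, EventType.EVT_NODE_SEGMENTED);
    }
}
```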

Is there some mechanism I can use to tell the node within my master process
to completely ignore segmentation? Or to tell it that everything is fine,
so that discovery can still happen after I restart the slave node?
Currently I use port 4444 with TcpDiscoverySpi and hard-coded addresses
(the master and slave IP addresses). When the master node is segmented
(simulated by breaking the network from the command line), it appears
there's no way for discovery to recover: port 4444 is shut down, and the
slave node always comes up blind to the master.
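The discovery configuration in question is along these lines (a sketch; the IP addresses are placeholders for the real hard-coded master and slave addresses):

```java
import java.util.Arrays;

import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

public class DiscoveryWiring {
    static IgniteConfiguration discoveryConfig() {
        TcpDiscoverySpi spi = new TcpDiscoverySpi();
        spi.setLocalPort(4444); // non-default discovery port used here

        TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
        // Placeholder addresses standing in for the hard-coded
        // master and slave IPs.
        ipFinder.setAddresses(
            Arrays.asList("192.168.1.10:4444", "192.168.1.11:4444"));
        spi.setIpFinder(ipFinder);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDiscoverySpi(spi);
        return cfg;
    }
}
```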

I would appreciate any insights on this issue. Thank you.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: How to fix Ignite node segmentation without restart

Posted by Evgenii Zhuravlev <e....@gmail.com>.
> I do not want to restart it and I cannot do a failover because a network
> issue just happened and the stand-by may be invalid. The fix is to always
> restart the slave.
You can enable CacheWriteSynchronizationMode.FULL_SYNC, and there will be
no difference between primary and backup partitions. In this case, you can
just restart your master node - the backup node will have valid data.

There is no way to join nodes after segmentation without restarting one of
the nodes.
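If restarting the whole application process is the concern: since Ignite is embedded, the node itself can be restarted in-process by stopping the Ignite instance and starting a fresh one. A rough sketch, assuming a hypothetical buildConfiguration() factory (an IgniteConfiguration instance must not be reused across restarts, so it is rebuilt each time):

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class InProcessRestart {
    // Hypothetical factory that rebuilds the node configuration from
    // scratch; fill in the real discovery and cache settings here.
    static IgniteConfiguration buildConfiguration() {
        return new IgniteConfiguration();
    }

    static Ignite restartNode(Ignite current) {
        // true = cancel in-flight operations instead of waiting for them.
        Ignition.stop(current.name(), true);
        return Ignition.start(buildConfiguration());
    }
}
```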

Evgenii



Tue, 16 Jun 2020 at 06:26, Actarus <ma...@grassvalley.com>:
