Posted to user@ignite.apache.org by "privacyfirst@codesandnotes.be" <pr...@codesandnotes.be> on 2021/10/28 13:15:49 UTC

Reliability of Apache Ignite as a Multi-Tier Storage

Dear,

One of the "Core Features" listed by ignite.apache.org is the capability 
of Ignite to be a Multi-Tier Storage. However, unless I have 
misunderstood something, I am worried that this storage is not reliable...

I currently have an application that uses an Ignite cluster as a DB. The 
cluster contains two nodes at the moment: the second node backs up the 
first. Each Ignite node is on a VPS server at OVH.

Lately OVH had a series of issues which apparently brought down the 
communication between those VPS servers. The consequence was that the 
Ignite nodes couldn't talk to each other and therefore split, each node 
upgrading to a new Baseline Topology and each one seeing the other node 
as being offline.

Restarting the nodes would result in an error on one of them:
Caused by: class org.apache.ignite.spi.IgniteSpiException: 
BaselineTopology of joining node (72fbc939-bf09-42cb-a7e4-12896046cfc0) 
is not compatible with BaselineTopology in the cluster. Branching 
history of cluster BlT ([1060612220]) doesn't contain branching point 
hash of joining node BlT (173037243). Consider cleaning persistent 
storage of the node and adding it to the cluster again.
         at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.checkFailedError(TcpDiscoverySpi.java:2052)
         at org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:1197)
         at org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:472)
         at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:2154)
         at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:278)


At this point, the only option I found was to destroy my backup node 
(deleting the "/work" folder), restart it and add it back to the cluster.

Obviously this becomes a real problem when scaling up and having one's 
data distributed among multiple Ignite nodes. If a major network issue 
occurs (such as days of network outage) Ignite nodes might (will?) end 
up in the same state as my backup node in my example, therefore losing 
data.

So, is my theory above correct or have I misunderstood Ignite's 
capabilities as a Multi-Tier storage solution?
Is a cluster node able to re-join a cluster it's been disconnected from 
for a significant amount of time?
And if a node is disconnected and starts giving a "BaselineTopology of 
joining node is not compatible with BaselineTopology in the cluster" 
error, then how can I recover my data?

Thanks for your help,

Diego


Re: Reliability of Apache Ignite as a Multi-Tier Storage

Posted by Stephen Darlington <st...@gridgain.com>.
As far as I know, “multi-tier” refers to the ability to store some data in memory, the rest on disk.

You experienced “split brain,” which is a difficult problem to solve in any distributed system. From your description my guess is that you’ve enabled baseline auto-adjust, which is generally not a good idea when you have persistence turned on.

With a catastrophic network failure like that, I would expect that you would need to restart some nodes. With a correctly configured cluster, you shouldn’t need to “destroy” a node.
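
If baseline auto-adjust is indeed the cause (that part is a guess based on the description), one way to check and switch it off is Ignite's control script. This is a minimal sketch, not a verified fix for this cluster: it assumes Ignite 2.8 or later, that it is run from the bin directory of the Ignite installation on one of the nodes, and that the cluster is reachable on the default connector port.

```shell
# Print the cluster state, the current baseline nodes and, on recent
# versions, whether baseline auto-adjust is enabled.
./control.sh --baseline

# Disable automatic baseline adjustment so that a node dropping out of the
# topology does not change the baseline on its own. --yes skips the
# interactive confirmation prompt.
./control.sh --baseline auto_adjust disable --yes
```

With auto-adjust disabled, the baseline only changes when an administrator changes it explicitly, so a node that was merely cut off by a network outage should still match the baseline when it comes back.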
