Posted to user@ignite.apache.org by "Mitchell Rathbun (BLOOMBERG/ 731 LEX)" <mr...@bloomberg.net> on 2020/02/05 16:10:22 UTC

Issue with BaselineTopology Branching History

We have recently encountered the following:

Caused by: org.apache.ignite.spi.IgniteSpiException: BaselineTopology of joining node (404d8988-6c2d-4612-ab17-fde635b9da8f) is not compatible with BaselineTopology in the cluster.
Branching history of cluster BlT ([-205608975, 383765073, 1797002251, -1091313502]) doesn't contain branching point hash of joining node BlT (-1295062797). Consider cleaning persistent storage of the node and adding it to the cluster again.
at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.checkFailedError(TcpDiscoverySpi.java:1946) ~[stormjar.jar:?]
at org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:969) ~[stormjar.jar:?]
at org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:391) ~[stormjar.jar:?]
at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:2020) ~[stormjar.jar:?]
at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:297) ~[stormjar.jar:?]
... 41 more

We were running a cluster with 4 nodes. Each node in the cluster has a couple of LOCAL caches; there are currently no replicated/partitioned caches. According to https://cwiki.apache.org/confluence/display/IGNITE/Automatic+activation+design+-+draft, this can happen when "there are different versions of the same data". However, since we have only LOCAL caches, I'm not sure how that could happen. So, a couple of questions:

1. Why does this happen for our use case? How is the "branching point hash" of a node calculated?

2. Is there any documentation that talks about BaselineTopology in depth, including versioning/branching history?

3. As I mentioned, we are currently relying on LOCAL caches. We do this because we don't currently need the caches to be distributed across processes, but we still want the off-heap/persistence functionality, and we may eventually have client nodes for a given server node as well. https://cwiki.apache.org/confluence/display/IGNITE/Apache+Ignite+3.0+Wishlist shows that there are plans to remove LOCAL caches in Ignite 3.0. Since they are being deprecated, is there an equivalent way to achieve isolated caches with PARTITIONED/REPLICATED caches? If the number of partitions is 1 and the number of backups is 0, is this the same thing?

Re: Issue with BaselineTopology Branching History

Posted by akurbanov <an...@gmail.com>.

Hi Mitchell,

I'm not sure whether versioning/branching history is documented anywhere;
it looks like it is worth covering.

Branching point hash = sum of the hash codes of the BLT nodes' consistent IDs
(computed as a long).

Each time the baseline topology changes, the previous hash is added to the
branching history and the BLT id is incremented.
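
For illustration only, the hash described above could be computed roughly as
follows. This is a minimal sketch based on the description in this message,
not the actual Ignite implementation; the class name, method name, and
consistent IDs are all made up.

```java
import java.util.List;

// Hypothetical sketch, not Ignite code: sums the hash codes of the
// consistent IDs of the baseline nodes into a long.
class BranchingHashSketch {
    static long branchingPointHash(List<?> consistentIds) {
        long hash = 0L;
        for (Object id : consistentIds)
            hash += id.hashCode(); // int hash code is widened to long before adding
        return hash;
    }

    public static void main(String[] args) {
        // Made-up consistent IDs for a 4-node baseline.
        List<String> baseline = List.of("node-1", "node-2", "node-3", "node-4");
        System.out.println(branchingPointHash(baseline));
    }
}
```

In this sketch the result depends only on which consistent IDs are in the
baseline, not on their order.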

The joining node is rejected in any of the following cases (most of them result
from baseline changes made while the node was not part of the cluster):

1. The joining node has a greater BLT id than the cluster.

2. The cluster BLT id equals the joining node's BLT id, but the two are not
compatible, meaning that the cluster's branching history does not contain the
joining node's current BLT hash.

3. The joining node has a lesser BLT id than the cluster, and the branching
history for the current id does not contain the BLT hash of the joining node.
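
The three cases above can be sketched as a small check. This is a simplified
model under stated assumptions, not Ignite's validation code: the cluster is
reduced to a BLT id plus one set of branching hashes, and the class, enum, and
method names are hypothetical.

```java
import java.util.Set;

// Hypothetical model of the three rejection cases; not Ignite code.
class JoinCheckSketch {
    enum Verdict {
        ACCEPT,
        REJECT_NEWER_ID,              // case 1
        REJECT_SAME_ID_INCOMPATIBLE,  // case 2
        REJECT_OLDER_ID_UNKNOWN_HASH  // case 3
    }

    // Assumption: a single set of hashes stands in for the cluster's branching history.
    static Verdict check(int clusterId, Set<Long> clusterHistory, int joinId, long joinHash) {
        if (joinId > clusterId)
            return Verdict.REJECT_NEWER_ID;
        if (clusterHistory.contains(joinHash))
            return Verdict.ACCEPT;
        return joinId == clusterId
            ? Verdict.REJECT_SAME_ID_INCOMPATIBLE
            : Verdict.REJECT_OLDER_ID_UNKNOWN_HASH;
    }

    public static void main(String[] args) {
        // Hashes are taken from the error message in the original post;
        // the BLT ids (4 and 3) are invented for the example.
        Set<Long> history = Set.of(-205608975L, 383765073L, 1797002251L, -1091313502L);
        System.out.println(check(4, history, 3, -1295062797L));
    }
}
```

With the hashes from the reported error, the joining node's hash is absent
from the cluster history, so this model rejects the node whichever id it
carries.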

A PARTITIONED cache with a node filter is an alternative to a LOCAL cache.

Best regards,
Anton



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Issue with BaselineTopology Branching History

Posted by rakshita04 <ra...@siemens.com>.

What if my second node is replaced at runtime, for example because of a
hardware failure?
Is there a way to start the new node first and somehow delete the baseline
history of the first node, so that the older node can then join the new one?
I am asking because this scenario can occur in our software, and we
cannot control whether the new node or the older node starts first.
Is there a way to make this scenario work?




Re: Issue with BaselineTopology Branching History

Posted by andrei <ae...@gmail.com>.

To avoid this problem, do not change the baseline after each server node
restart. The baseline topology will wait for this node.

BR,
Andrei

On 12/12/2020 4:44 PM, rakshita04 wrote:
> If by any chance someone messes up this sequence, Ignite sometimes
> throws an error, which is fine because we can act on it, but sometimes
> it gets stuck and makes our process hang as well.
> Is there a way to make the new node throw an error or exception after
> a certain time instead of getting stuck?
>
> Regards,
> Rakshita

Re: Issue with BaselineTopology Branching History

Posted by rakshita04 <ra...@siemens.com>.

If by any chance someone messes up this sequence, Ignite sometimes
throws an error, which is fine because we can act on it, but sometimes
it gets stuck and makes our process hang as well.
Is there a way to make the new node throw an error or exception after
a certain time instead of getting stuck?

Regards,
Rakshita 




Re: Issue with BaselineTopology Branching History

Posted by andrei <ae...@gmail.com>.

No, you cannot start a new node first, because it will have a new
baseline that differs from that of the old nodes. Please start the old
nodes first and then add the new node using the API mentioned above.

On 12/8/2020 3:59 PM, rakshita04 wrote:
> If I want to add a fresh node to the cluster, is it possible to start
> the fresh node first and then start the older node?
> How do I make sure that the fresh node has its persistence intact?

Re: Issue with BaselineTopology Branching History

Posted by rakshita04 <ra...@siemens.com>.

If I want to add a fresh node to the cluster, is it possible to start
the fresh node first and then start the older node?
How do I make sure that the fresh node has its persistence intact?


