
Posted to issues@ignite.apache.org by "Vyacheslav Koptilin (Jira)" <ji...@apache.org> on 2020/06/29 08:41:00 UTC

[jira] [Created] (IGNITE-13193) Implement fallback to full partition rebalancing in case historical supplier failed to read all necessary data updates from WAL

Vyacheslav Koptilin created IGNITE-13193:
--------------------------------------------

             Summary: Implement fallback to full partition rebalancing in case historical supplier failed to read all necessary data updates from WAL
                 Key: IGNITE-13193
                 URL: https://issues.apache.org/jira/browse/IGNITE-13193
             Project: Ignite
          Issue Type: Improvement
    Affects Versions: 2.8.1
            Reporter: Vyacheslav Koptilin
            Assignee: Vyacheslav Koptilin


Historical rebalance may fail for several reasons:
1) The WAL on the supplier node is corrupted. In the current implementation, the supplier triggers a failure handler.
2) After iterating over the WAL, the demander node has not received all the updates needed to bring a MOVING partition up to date (the resulting update counter does not converge with the expected update counter of the OWNING partition). In the current implementation, the demander silently ignores the missing updates.
This behavior negatively affects the stability of the cluster: an unusable state of the historical WAL is not a reason to fail a supplier node.
A better way to handle these scenarios is to:
 - either try to rebalance the partition historically from another supplier,
 - or use full rebalancing for the affected partition.

Once a supplier fails to provide data from part of its WAL, the corresponding sequence of checkpoints should be marked as inapplicable for historical rebalance in order to prevent further errors.
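
The proposed handling could be sketched roughly as below. This is only an illustration of the decision logic, not Ignite code: the class RebalanceFallback, its methods, and the node names are all hypothetical. It also simplifies two things: failures are tracked per supplier rather than per checkpoint sequence, and "converged" is assumed to mean that the demander reached at least the owner's update counter.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the proposed fallback; none of these names are Ignite APIs. */
final class RebalanceFallback {
    enum Mode { HISTORICAL, FULL }

    /** Outcome for one MOVING partition: the mode to use and, for HISTORICAL, the supplier. */
    static final class Plan {
        final Mode mode;
        final String supplier; // null for FULL: no specific historical supplier is chosen

        Plan(Mode mode, String supplier) {
            this.mode = mode;
            this.supplier = supplier;
        }
    }

    /** Suppliers whose WAL already proved unusable for this partition (simplification). */
    private final Set<String> inapplicable = new HashSet<>();

    /** Assumption: "converged" means the demander reached at least the owner's counter. */
    static boolean converged(long reachedCntr, long expectedCntr) {
        return reachedCntr >= expectedCntr;
    }

    /** Records a failed historical attempt so that this supplier is not retried. */
    void markInapplicable(String supplier) {
        inapplicable.add(supplier);
    }

    /** Picks the first historical supplier that has not failed, or falls back to FULL. */
    Plan next(List<String> historicalCandidates) {
        for (String candidate : historicalCandidates) {
            if (!inapplicable.contains(candidate))
                return new Plan(Mode.HISTORICAL, candidate);
        }

        return new Plan(Mode.FULL, null);
    }
}
```

In this sketch, a demander would call converged() after the WAL iteration finishes; if it returns false (or the supplier reports a WAL read error), the demander calls markInapplicable() for that supplier and then next() to choose between another historical supplier and a full rebalance of the partition.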



--
This message was sent by Atlassian Jira
(v8.3.4#803005)