Posted to issues@ignite.apache.org by "Anton Vinogradov (Jira)" <ji...@apache.org> on 2022/10/03 19:40:00 UTC
[jira] [Updated] (IGNITE-17793) Historical rebalance must use HWM instead of LWM to seek the proper checkpoint

     [ https://issues.apache.org/jira/browse/IGNITE-17793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anton Vinogradov updated IGNITE-17793:
--------------------------------------
    Description: 
Currently, historical rebalance in {{CheckpointHistory#searchEarliestWalPointer}} seeks the newest checkpoint whose counter is less than the lowest entry that has to be rebalanced.

Unfortunately, we may have more than one checkpoint with the same counter, and in that case it's impossible to use the newest one as the rebalance start point.

For example, we have a partition with LWM=100, some gaps and HWM=200.
The checkpoint will have counter == 100.
Then we may close some gaps, excluding 101 (to keep LWM == 100).
And again, the checkpoint will have counter == 100.
The newest checkpoint (marked with counter 100) will not contain all committed entries with counter > 100.
Then we close the rest of the gaps to make historical rebalance possible.
After the rebalance finishes, we'll see the warning "Some partition entries were missed during historical rebalance" and an inconsistent cluster state.

A possible solution is to use HWM instead of LWM during the search.
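
The scenario above can be replayed on a toy model. The sketch below is not Ignite code: the {{Checkpoint}} record, the list-based WAL and the helper names are hypothetical, and it assumes that an earlier checkpoint, taken before any out-of-order update was written, is still in the history. It only illustrates why searching by LWM (low watermark) can pick a start point that misses entries, while searching by HWM (high watermark) does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical checkpoint record: WAL position plus partition watermarks."""
    wal_index: int  # number of WAL records written before this checkpoint
    lwm: int        # highest counter with no gaps below it
    hwm: int        # highest counter applied so far


def watermarks(applied):
    """Compute (LWM, HWM) for a set of applied update counters starting at 1."""
    lwm = 0
    while lwm + 1 in applied:
        lwm += 1
    return lwm, max(applied)


def take_checkpoint(wal):
    lwm, hwm = watermarks(set(wal))
    return Checkpoint(wal_index=len(wal), lwm=lwm, hwm=hwm)


def newest_checkpoint(checkpoints, first_needed, key):
    """Newest checkpoint whose key(cp) is less than the first counter to rebalance."""
    candidates = [cp for cp in checkpoints if key(cp) < first_needed]
    return max(candidates, key=lambda cp: cp.wal_index, default=None)


def replayed(wal, checkpoint):
    """Counters seen when the WAL is replayed from the checkpoint onwards."""
    return set(wal[checkpoint.wal_index:])


# The WAL is modelled as the list of update counters in the order they were applied.
wal = list(range(1, 101))        # updates 1..100, no gaps
cp0 = take_checkpoint(wal)       # assumed earlier checkpoint: LWM=100, HWM=100
wal += range(150, 201)           # out-of-order updates: gaps 101..149, HWM=200
cp1 = take_checkpoint(wal)       # counter (LWM) == 100
wal += range(102, 150)           # close some gaps, excluding 101
cp2 = take_checkpoint(wal)       # counter (LWM) is still 100
wal += [101]                     # close the last gap

history = [cp0, cp1, cp2]
first_needed = 101               # lowest entry that has to be rebalanced
needed = set(range(first_needed, 201))

by_lwm = newest_checkpoint(history, first_needed, key=lambda cp: cp.lwm)
by_hwm = newest_checkpoint(history, first_needed, key=lambda cp: cp.hwm)

missed_lwm = needed - replayed(wal, by_lwm)
missed_hwm = needed - replayed(wal, by_hwm)

print("LWM search starts at WAL index", by_lwm.wal_index, "and misses", len(missed_lwm), "entries")
print("HWM search starts at WAL index", by_hwm.wal_index, "and misses", len(missed_hwm), "entries")
```

In this model all three checkpoints carry LWM == 100, so the LWM-based search returns the newest one and the replay only sees entry 101. The HWM-based search skips every checkpoint taken after an update above 100 had already been written.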

> Historical rebalance must use HWM instead of LWM to seek the proper checkpoint
> ------------------------------------------------------------------------------
>
>                 Key: IGNITE-17793
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17793
>             Project: Ignite
>          Issue Type: Sub-task
>            Reporter: Anton Vinogradov
>            Priority: Major
>              Labels: iep-31, ise


