Posted to issues@geode.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2022/09/12 08:07:00 UTC

[jira] [Commented] (GEODE-10409) Rebalance Model Missing Collocated Regions At Server Startup

    [ https://issues.apache.org/jira/browse/GEODE-10409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17602982#comment-17602982 ] 

ASF subversion and git services commented on GEODE-10409:
---------------------------------------------------------

Commit 0852113f1b8086203ffdd99bae1afa250c2eaa3e in geode's branch refs/heads/develop from WeijieEST
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=0852113f1b ]

GEODE-10409: Fix rebalance load model missing collocated regions at s… (#7839)

* GEODE-10409: Fix rebalance load model missing collocated regions at server startup

Assume regions A1 and A2 are collocated with region A, which is the leader region. When rebalance runs at startup,
it happens after collocation of all three regions has completed, which generally occurs in region A2.
When the rebalance load model is calculated from the view of region A2, only the leader region A and A2 itself
are added to the model. This commit fixes the issue so that A1 is also added to the model.

* add test cases to test rebalance model and remove the static mock

* change test case to avoid changing existing methods for testing

* improve test case
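The commit message above describes the bug in terms of which regions end up in the load model. The following is a minimal Java sketch, assuming a simplified representation in which each region name maps to the region it is collocated with (null for the leader). The class `CollocatedModelSketch` and its methods are hypothetical and are not Geode's actual rebalance code. The sketch contrasts building the model from the leader plus the current region with building it from the leader plus every region collocated with it.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch, NOT Geode's implementation: a collocation group is a map
 * from each region name to the region it is collocated with (null = leader).
 */
public class CollocatedModelSketch {

    // Behaviour described in GEODE-10409: starting from the region that completes
    // collocation last, only the leader and that region itself are added.
    static Set<String> buggyModelRegions(Map<String, String> colocatedWith, String current) {
        Set<String> regions = new LinkedHashSet<>();
        regions.add(findLeader(colocatedWith, current));
        regions.add(current);
        return regions;
    }

    // Fixed behaviour: add the leader and every region collocated with it,
    // regardless of which region the model is built from.
    static Set<String> fixedModelRegions(Map<String, String> colocatedWith, String current) {
        String leader = findLeader(colocatedWith, current);
        Set<String> regions = new LinkedHashSet<>();
        regions.add(leader);
        collectChildren(colocatedWith, leader, regions);
        return regions;
    }

    // Follow the collocated-with links up to the region that has no parent.
    static String findLeader(Map<String, String> colocatedWith, String region) {
        String r = region;
        while (colocatedWith.get(r) != null) {
            r = colocatedWith.get(r);
        }
        return r;
    }

    // Depth-first walk adding every region whose parent is already in the group.
    static void collectChildren(Map<String, String> colocatedWith, String parent, Set<String> out) {
        for (Map.Entry<String, String> e : colocatedWith.entrySet()) {
            if (parent.equals(e.getValue()) && out.add(e.getKey())) {
                collectChildren(colocatedWith, e.getKey(), out);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> colocatedWith = new LinkedHashMap<>();
        colocatedWith.put("A", null);   // leader
        colocatedWith.put("A1", "A");
        colocatedWith.put("A2", "A");
        System.out.println("buggy: " + buggyModelRegions(colocatedWith, "A2")); // [A, A2]
        System.out.println("fixed: " + fixedModelRegions(colocatedWith, "A2")); // [A, A1, A2]
    }
}
```

Walking down from the leader rather than up from the current region makes the result independent of which region happens to finish collocation last, which is the inconsistency the commit addresses.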

> Rebalance Model Missing Collocated Regions At Server Startup
> ------------------------------------------------------------
>
>                 Key: GEODE-10409
>                 URL: https://issues.apache.org/jira/browse/GEODE-10409
>             Project: Geode
>          Issue Type: Bug
>            Reporter: Weijie Xu
>            Assignee: Weijie Xu
>            Priority: Major
>              Labels: needsTriage, pull-request-available
>         Attachments: server2.log, test.tar.gz
>
>
> Following steps reproduce the issue:
> Run the start.gfsh in the attached example, which configures a Geode system with a partitioned region, a gateway sender, and a region collocated with the partitioned region. So there are three regions in total: the leader region, the collocated region, and the queue region.
> Then run the example code, which loads ~400M of data and five times that amount of events into the system.
> Then stop one of the servers and revoke its disk file.
> Then start the server, which will trigger a bucket recovery.
> Lines 596, 598, and 5958 of the attached log show that the queue region is not included in the rebalance model, in neither the data size column nor the max size column.
> Then do a manual rebalance after the server is up; this time the log shows the queue region added to the model (lines 6010, 6012, 6014, and 6028).
>  
> This inconsistent behavior leads to two negative results:
> 1) Rebalance gives different results during the server startup phase and when triggered manually. The startup rebalance reports that everything is OK and the rebalance finished, but a manually triggered rebalance reports insufficient space, because it includes in the model the queue region, which holds five times as much data as the leader region.
> 2) A mismatch between the rebalance model and the data actually being rebalanced (the queue region's data is in fact rebalanced, even though the region is not included in the model during the server startup phase).
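The first negative result comes down to which regions are summed when checking capacity. The deliberately simplified, hypothetical Java sketch below is not Geode's rebalance algorithm. The class name, the size units, the 1500-unit capacity limit, and the collocated region being the same size as the leader are all made-up assumptions. It shows how leaving a queue region five times the size of the leader out of the model can flip the verdict.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration (not Geode code); all sizes are made-up units. */
public class ModelCapacitySketch {

    // Returns true if the summed data size of the modelled regions fits in capacity.
    static boolean fits(Map<String, Long> regionSizes, Iterable<String> modelled, long capacity) {
        long total = 0;
        for (String name : modelled) {
            total += regionSizes.getOrDefault(name, 0L);
        }
        return total <= capacity;
    }

    public static void main(String[] args) {
        long leader = 400;                  // hypothetical leader-region size
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("leader", leader);
        sizes.put("collocated", leader);    // assumption: same size as the leader
        sizes.put("queue", 5 * leader);     // per the issue: ~5x the leader's data
        long capacity = 1500;               // hypothetical per-member limit

        // Startup model omits the queue region; manual model includes all three.
        boolean startup = fits(sizes, List.of("leader", "collocated"), capacity);
        boolean manual = fits(sizes, sizes.keySet(), capacity);
        System.out.println("startup model fits: " + startup); // true  (800 <= 1500)
        System.out.println("manual model fits:  " + manual);  // false (2800 > 1500)
    }
}
```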



--
This message was sent by Atlassian Jira
(v8.20.10#820010)