Posted to issues@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2011/01/05 22:38:45 UTC

[jira] Created: (HBASE-3422) Balancer will willing try to rebalance thousands of regions in one go; needs an upper bound added.

Balancer will willing try to rebalance thousands of regions in one go; needs an upper bound added.
--------------------------------------------------------------------------------------------------

                 Key: HBASE-3422
                 URL: https://issues.apache.org/jira/browse/HBASE-3422
             Project: HBase
          Issue Type: Improvement
    Affects Versions: 0.90.0
            Reporter: stack


See HBASE-3420.  Therein, a wonky cluster had 5k regions on one server and < 1k on others.  Balancer ran and wanted to redistribute 3k+ all in one go.  Madness.

If there is a lot of rebalancing to be done, it should be done somewhat piecemeal.  At a minimum, we need an upper bound on the number of regions to rebalance at a time.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HBASE-3422) Balancer will willing try to rebalance thousands of regions in one go; needs an upper bound added.

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008124#comment-13008124 ] 

Ted Yu commented on HBASE-3422:
-------------------------------

How about introducing hbase.balancer.maxregions.perround ?
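
A per-round cap of the kind proposed here could look roughly like the sketch below. The class and method names are hypothetical, the plan type is left generic as a stand-in for RegionPlan, and treating a non-positive value as "no limit" is an assumption of this sketch, not something decided in this issue.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: caps how many planned region moves run in one balancer round. */
final class PerRoundCap {
  private PerRoundCap() {}

  /**
   * Returns at most maxRegionsPerRound plans taken from the front of the list.
   * A non-positive cap means "no limit" (an assumption of this sketch).
   */
  static <P> List<P> limit(List<P> plans, int maxRegionsPerRound) {
    if (maxRegionsPerRound <= 0 || plans.size() <= maxRegionsPerRound) {
      return new ArrayList<>(plans);
    }
    // Keep the first N plans; the rest are left for later balancer runs.
    return new ArrayList<>(plans.subList(0, maxRegionsPerRound));
  }
}
```

With the 3k+ moves from the description and a cap of 100, one round would execute only the first 100 plans and leave the rest to subsequent rounds.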



[jira] Commented: (HBASE-3422) Balancer will try to rebalance thousands of regions in one go; needs an upper bound added.

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008673#comment-13008673 ] 

Ted Yu commented on HBASE-3422:
-------------------------------

Summary of chat with Stack on IRC (see http://pastebin.com/4uK9M1Z7):
Since it is difficult to estimate the appropriate number of regions to balance in one invocation of balance(), I resorted to respecting hbase.balancer.period instead.
Another option would be to limit the execution time of balance() to a certain percentage of hbase.balancer.period.
But that would introduce another parameter, which would complicate our scenario.



[jira] Updated: (HBASE-3422) Balancer will willing try to rebalance thousands of regions in one go; needs an upper bound added.

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-3422:
--------------------------

    Attachment: hbase-3422.txt

First attempt at using heuristics to decide whether executing the next RegionPlan would make a single balance() call too long.
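
For readers without the attachment, the following is a minimal sketch of one such heuristic, not the code in hbase-3422.txt. It assumes the time budget is passed in by the caller (for example the value of hbase.balancer.period) and uses the average execution time of the plans run so far to predict the next one. All class, method, and parameter names are hypothetical; the clock is injectable only so the loop can be tested deterministically.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

/** Hypothetical sketch of a time-bounded plan executor. */
final class TimeBoundedExecutor {
  private TimeBoundedExecutor() {}

  /**
   * Executes plans in order and stops early when the average time per plan so far
   * suggests that the next plan would finish after the cutoff.
   *
   * @param plans        moves to execute (stand-in for RegionPlan)
   * @param executeOne   performs one move (stand-in for assignmentManager.balance(plan))
   * @param clockMillis  time source in milliseconds
   * @param maxRunMillis time budget for this call
   * @return number of plans actually executed
   */
  static <P> int run(List<P> plans, Consumer<P> executeOne,
                     LongSupplier clockMillis, long maxRunMillis) {
    long cutoff = clockMillis.getAsLong() + maxRunMillis;
    long totalExecMillis = 0;
    int executed = 0;
    for (P plan : plans) {
      long start = clockMillis.getAsLong();
      executeOne.accept(plan);
      totalExecMillis += clockMillis.getAsLong() - start;
      executed++;
      long avgMillis = totalExecMillis / executed;
      // Heuristic: if one more average-length move would overrun the cutoff,
      // stop and leave the remaining plans for the next balancer run.
      if (executed < plans.size() && clockMillis.getAsLong() + avgMillis > cutoff) {
        break;
      }
    }
    return executed;
  }
}
```

Note that the check happens after each move, so at least one plan is always executed and a single slow move can still overrun the budget.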


[jira] Commented: (HBASE-3422) Balancer will try to rebalance thousands of regions in one go; needs an upper bound added.

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008702#comment-13008702 ] 

Ted Yu commented on HBASE-3422:
-------------------------------

Related unit tests: TestMasterObserver, TestMultiParallel, TestLoadBalancer and TestRegionRebalancing all pass.


[jira] Resolved: (HBASE-3422) Balancer will try to rebalance thousands of regions in one go; needs an upper bound added.

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-3422.
--------------------------

       Resolution: Fixed
    Fix Version/s: 0.92.0
     Hadoop Flags: [Reviewed]

Let's try it, Ted.  On commit I added the ability to set in config an explicit limit on how long the balancer will run; by default this limit is not specified.  I also added logging (DEBUG) for when the balancer is cut off because it ran out of time.

Thanks for the patch Ted.  Committed to TRUNK.
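
A rough sketch of how such an optional limit might be resolved is shown below. Only hbase.balancer.period is a key named in this thread; the key used for the explicit limit is a placeholder, and both the 300000 ms default and the fallback to the balancer period when no limit is set are assumptions of this sketch. It reads from java.util.Properties rather than the HBase configuration classes to stay self-contained.

```java
import java.util.Properties;

/** Hypothetical config lookup for the balancer's maximum run time. */
final class BalancerLimits {
  static final String PERIOD_KEY = "hbase.balancer.period";
  // Placeholder name for the explicit limit; the real key is not given in this thread.
  static final String MAX_RUN_KEY = "example.balancer.max.run.millis";
  // Assumed default period for this sketch.
  static final long DEFAULT_PERIOD_MILLIS = 300_000L;

  private BalancerLimits() {}

  /** Returns the explicit limit if configured, otherwise the balancer period. */
  static long maxRunMillis(Properties conf) {
    long period = Long.parseLong(
        conf.getProperty(PERIOD_KEY, Long.toString(DEFAULT_PERIOD_MILLIS)).trim());
    String explicit = conf.getProperty(MAX_RUN_KEY);
    return explicit == null ? period : Long.parseLong(explicit.trim());
  }
}
```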


[jira] Assigned: (HBASE-3422) Balancer will willing try to rebalance thousands of regions in one go; needs an upper bound added.

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu reassigned HBASE-3422:
-----------------------------

    Assignee: Ted Yu


[jira] Commented: (HBASE-3422) Balancer will willing try to rebalance thousands of regions in one go; needs an upper bound added.

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008131#comment-13008131 ] 

stack commented on HBASE-3422:
------------------------------

Sounds good Ted.  Should not apply to the bulk assign on startup though.  Good stuff.


[jira] Commented: (HBASE-3422) Balancer will willing try to rebalance thousands of regions in one go; needs an upper bound added.

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008447#comment-13008447 ] 

Ted Yu commented on HBASE-3422:
-------------------------------

Currently it is possible for one HMaster.balance() call to last longer than hbase.balancer.period.

We should limit the execution time of HMaster.balance() to hbase.balancer.period.
Is this equivalent to introducing hbase.balancer.maxregions.perround?


[jira] Updated: (HBASE-3422) Balancer will try to rebalance thousands of regions in one go; needs an upper bound added.

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-3422:
--------------------------

    Component/s: master
        Summary: Balancer will try to rebalance thousands of regions in one go; needs an upper bound added.  (was: Balancer will willing try to rebalance thousands of regions in one go; needs an upper bound added.)


[jira] Commented: (HBASE-3422) Balancer will willing try to rebalance thousands of regions in one go; needs an upper bound added.

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008286#comment-13008286 ] 

stack commented on HBASE-3422:
------------------------------

@Ted I like that idea.


[jira] Work started: (HBASE-3422) Balancer will willing try to rebalance thousands of regions in one go; needs an upper bound added.

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HBASE-3422 started by Ted Yu.


[jira] Commented: (HBASE-3422) Balancer will try to rebalance thousands of regions in one go; needs an upper bound added.

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008890#comment-13008890 ] 

Hudson commented on HBASE-3422:
-------------------------------

Integrated in HBase-TRUNK #1798 (See [https://hudson.apache.org/hudson/job/HBase-TRUNK/1798/])
    HBASE-3422 Balancer will try to rebalance thousands of regions in one go; needs an upper bound added



[jira] Commented: (HBASE-3422) Balancer will willing try to rebalance thousands of regions in one go; needs an upper bound added.

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008132#comment-13008132 ] 

Ted Yu commented on HBASE-3422:
-------------------------------

In terms of putting an upper bound on the time each call to HMaster.balance() takes, I think the master should track a metric for how long plan execution takes.
Here is the related code:
{code}
      // Ask the balancer for the region moves, then execute each plan in turn.
      List<RegionPlan> plans = this.balancer.balanceCluster(assignments);
      if (plans != null && !plans.isEmpty()) {
        for (RegionPlan plan: plans) {
          LOG.info("balance " + plan);
          this.assignmentManager.balance(plan);
        }
      }
{code}
If the metric is collected for assignmentManager.balance() calls, balancer.balanceCluster() can make use of it to adjust the maximum number of regions assigned in one round.
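
One way such a metric could feed back into the per-round maximum is sketched below. The class and its methods are hypothetical and do not exist in HBase; the sketch assumes a simple running average of per-plan execution time and derives the cap as the number of average-length moves that fit into one balancer period.

```java
/** Hypothetical metric: running average of how long one region move takes. */
final class PlanTimeMetric {
  private long totalMillis;
  private long samples;

  /** Records the duration of one executed move. */
  void record(long elapsedMillis) {
    totalMillis += elapsedMillis;
    samples++;
  }

  /**
   * Suggests how many moves fit into one balancer period, based on the observed
   * average. Falls back to fallbackMax until a non-zero duration has been recorded.
   */
  int maxRegionsPerRound(long periodMillis, int fallbackMax) {
    if (samples == 0 || totalMillis == 0) {
      return fallbackMax;
    }
    long avgMillis = Math.max(1, totalMillis / samples);
    long fit = periodMillis / avgMillis;
    // Always allow at least one move, and keep the result within int range.
    return (int) Math.max(1, Math.min(Integer.MAX_VALUE, fit));
  }
}
```

In the loop above, the caller would time each assignmentManager.balance(plan) call, pass the elapsed time to record(), and consult maxRegionsPerRound() before the next round.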
