Posted to dev@hbase.apache.org by "Jonathan Gray (JIRA)" <ji...@apache.org> on 2009/08/03 23:11:14 UTC

[jira] Created: (HBASE-1737) Regions unbalanced when adding new node

Regions unbalanced when adding new node
---------------------------------------

                 Key: HBASE-1737
                 URL: https://issues.apache.org/jira/browse/HBASE-1737
             Project: Hadoop HBase
          Issue Type: Bug
    Affects Versions: 0.20.0
            Reporter: Jonathan Gray
            Priority: Blocker
             Fix For: 0.20.0


When adding a new RegionServer to a cluster, the new RS will receive some regions but not enough to actually be considered balanced.

To recreate, just take an RS offline, allow regions to be reassigned, and then bring it back up.

Master will get itself into a broken, stuck state where it continuously outputs a line like this:

{noformat}
2009-08-03 12:54:57,812 DEBUG org.apache.hadoop.hbase.master.RegionManager: Server dn4,60020,1249329081079 will be unloaded for balance. Server load: 341 avg: 318.0, regions can be moved: 55
{noformat}

This line is output every 3 seconds and never stops until another RS joins/leaves the cluster.

Making this a blocker because when your new RS only gets some regions (in my case, about half as many as it should have), then all new regions will be assigned to that RS.  This basically destroys any possibility for good load distribution with new data.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Reopened: (HBASE-1737) Regions unbalanced when adding new node

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray reopened HBASE-1737:
----------------------------------


The commit of HBASE-1743 undid this patch; it needs to be reapplied.


[jira] Commented: (HBASE-1737) Regions unbalanced when adding new node

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738624#action_12738624 ] 

stack commented on HBASE-1737:
------------------------------

+1 on patch.


[jira] Updated: (HBASE-1737) Regions unbalanced when adding new node

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray updated HBASE-1737:
---------------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

Committed.  Thanks for the review, stack.


[jira] Updated: (HBASE-1737) Regions unbalanced when adding new node

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray updated HBASE-1737:
---------------------------------

    Attachment: HBASE-1737-v1.patch

Fixes small bug in RegionManager.

When looking at the most loaded server, we check whether there is another server that is underloaded.  If we decide we should unassign regions from the loaded server (at which point numRegionsToClose = 0), we then compute how many to unassign.  However, we never set numRegionsToClose to the number determined for reassignment, so it stays at 0 and thus 0 regions are reassigned.

Also has a few small formatting changes and an extra variable in log line.
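
The bug described above can be illustrated with a minimal, self-contained sketch. This is not the code from RegionManager or from the attached patch; the class name, method names, parameters, and the formula for how many regions to move are all hypothetical, chosen only to show the shape of the mistake (a count is computed but never stored back into numRegionsToClose). The input values are taken from the log line in the issue description.

```java
public class BalanceSketch {

    // Hypothetical buggy shape: the amount to unassign is computed,
    // but numRegionsToClose is never updated, so it stays at 0.
    public static int regionsToCloseBuggy(int serverLoad, double avgLoad, int movable) {
        int numRegionsToClose = 0;
        if (serverLoad > avgLoad && movable > 0) {
            // Assumed formula: bring the server down to the average,
            // capped by the number of regions that can be moved.
            int computed = Math.min(serverLoad - (int) Math.ceil(avgLoad), movable);
            // BUG: 'computed' is dropped here instead of being assigned.
        }
        return numRegionsToClose;
    }

    // Hypothetical fixed shape: the computed amount is stored back.
    public static int regionsToCloseFixed(int serverLoad, double avgLoad, int movable) {
        int numRegionsToClose = 0;
        if (serverLoad > avgLoad && movable > 0) {
            numRegionsToClose = Math.min(serverLoad - (int) Math.ceil(avgLoad), movable);
        }
        return numRegionsToClose;
    }

    public static void main(String[] args) {
        // Values from the log line: load 341, avg 318.0, 55 regions movable.
        System.out.println(regionsToCloseBuggy(341, 318.0, 55)); // prints 0
        System.out.println(regionsToCloseFixed(341, 318.0, 55)); // prints 23
    }
}
```

Under these assumptions the buggy variant decides to rebalance on every pass yet closes nothing, which would be consistent with the same DEBUG line repeating indefinitely.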


[jira] Updated: (HBASE-1737) Regions unbalanced when adding new node

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray updated HBASE-1737:
---------------------------------

    Assignee: Jonathan Gray
      Status: Patch Available  (was: Open)

The newly added regionserver does not reach exactly the same level as the others, but with this patch all RS loads are within slop.
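
As a rough sketch of what "within slop" could mean, the check below treats slop as a fractional tolerance around the average load. The class and method names, the exact bounds, and the 0.1 value are assumptions for illustration only; they are not taken from the patch or from the HBase configuration defaults.

```java
public class SlopSketch {

    // Hypothetical check: a server counts as balanced when its region
    // count lies within avgLoad * (1 +/- slop), rounded outward.
    public static boolean withinSlop(int load, double avgLoad, double slop) {
        double lower = Math.floor(avgLoad * (1 - slop));
        double upper = Math.ceil(avgLoad * (1 + slop));
        return load >= lower && load <= upper;
    }

    public static void main(String[] args) {
        double avg = 318.0;  // average from the log line in the description
        double slop = 0.1;   // illustrative value, an assumption
        System.out.println(withinSlop(300, avg, slop)); // prints true
        System.out.println(withinSlop(160, avg, slop)); // prints false
    }
}
```

With these illustrative numbers, a server holding about half the average (160 regions) falls outside the tolerance, while one holding 300 regions is close enough to count as balanced even though it is not at the exact average.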


[jira] Commented: (HBASE-1737) Regions unbalanced when adding new node

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738675#action_12738675 ] 

stack commented on HBASE-1737:
------------------------------

Backported to 0.20 branch.


[jira] Resolved: (HBASE-1737) Regions unbalanced when adding new node

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray resolved HBASE-1737.
----------------------------------

       Resolution: Fixed
    Fix Version/s: 0.21.0

Recommitted to trunk and branch.

This patch was not included in RC2.  I'm going to -1 RC2 as I consider this a blocker.


[jira] Commented: (HBASE-1737) Regions unbalanced when adding new node

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738618#action_12738618 ] 

Jonathan Gray commented on HBASE-1737:
--------------------------------------

The errant section of code was introduced with HBASE-1017.
