Posted to dev@hbase.apache.org by "Jonathan Gray (JIRA)" <ji...@apache.org> on 2008/11/22 04:12:44 UTC

[jira] Created: (HBASE-1017) Region balancing does not bring newly added node within acceptable range

Region balancing does not bring newly added node within acceptable range
------------------------------------------------------------------------

                 Key: HBASE-1017
                 URL: https://issues.apache.org/jira/browse/HBASE-1017
             Project: Hadoop HBase
          Issue Type: Improvement
    Affects Versions: 0.19.0
            Reporter: Jonathan Gray
            Priority: Minor
             Fix For: 0.20.0


With a 10-node cluster, only 9 nodes were online.  With about 215 total regions, each of the 9 had around 24 regions (average load is 24).  Slop is 10%, so 22 to 26 is the acceptable range.
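The acceptable range is the average load $\bar{L}$ widened by the slop $s$ on both sides. With the figures above:

```latex
\bigl[(1 - s)\,\bar{L},\ (1 + s)\,\bar{L}\bigr]
  = \bigl[0.9 \times 24,\ 1.1 \times 24\bigr]
  = \bigl[21.6,\ 26.4\bigr]
  \approx [22,\ 26] \text{ regions}
```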

When the 10th node started up, the master log showed:

{code}
2008-11-21 15:57:51,521 INFO org.apache.hadoop.hbase.master.ServerManager: Received start message from: 72.34.249.210:60020
2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager: Server 72.34.249.219:60020 is overloaded. Server load: 25 avg: 22.0, slop: 0.1
2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager: Choosing to reassign 3 regions. mostLoadedRegions has 10 regions in it.
2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region streamitems,^@^@^@^@^AH�;,1225411051632
2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region streamitems,^@^@^@^@^@�Ý,1225411056686
2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region groups,,1222913580957
2008-11-21 15:57:53,975 DEBUG org.apache.hadoop.hbase.master.RegionManager: Server 72.34.249.213:60020 is overloaded. Server load: 25 avg: 22.0, slop: 0.1
2008-11-21 15:57:53,975 DEBUG org.apache.hadoop.hbase.master.RegionManager: Choosing to reassign 3 regions. mostLoadedRegions has 10 regions in it.
2008-11-21 15:57:53,976 DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region upgrade,,1226892014784
2008-11-21 15:57:53,976 DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region streamitems,^@^@^@^@^@3^Z�,1225411056701
2008-11-21 15:57:53,976 DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region streamitems,^@^@^@^@^@         ^L,1225411049042
{code}

The new regionserver received only 6 regions.  This happened because, when the 10th node came in, the average load dropped to 22.  The two servers with 25 regions (acceptable when the average was 24, but not anymore) therefore reassigned 3 of their regions each to bring themselves back down to the average.  Unfortunately, all other servers remained within the 10% slop (20 to 24), so they were not overloaded and did not reassign any regions.  It was only chance that even 6 regions got reassigned: had there been exactly 24 on each server, none would have been assigned to the new node.
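A minimal Java sketch of the balancing rule as this report describes it. A server sheds regions only when its load exceeds avg × (1 + slop), and then only down to the average. With avg 22 the upper bound is 22 × 1.1 = 24.2, so a server at 24 stays put. The sketch makes two assumptions. First, the average is rounded up to a whole number, because the log shows `avg: 22.0` for about 215 regions on 10 servers. Second, the nine starting loads are made up, apart from the two 25-region servers. The class and method names are hypothetical and are not the RegionManager code.

```java
import java.util.Arrays;

// Hypothetical model of the pre-patch rule described in this report; not HBase code.
final class SlopBalanceSketch {

    // Returns how many regions existing servers close for reassignment when one empty server joins.
    static int regionsShedOnJoin(int[] existingLoads, double slop) {
        int total = Arrays.stream(existingLoads).sum();
        int servers = existingLoads.length + 1; // count the new, empty server
        // Assumption: the average is rounded up (215 / 10 = 21.5 is logged as 22.0).
        int avg = (int) Math.ceil((double) total / servers);
        double upperBound = avg * (1 + slop);
        int shed = 0;
        for (int load : existingLoads) {
            // Only servers above the upper bound shed, and only down to the average.
            if (load > upperBound) {
                shed += load - avg;
            }
        }
        return shed;
    }

    public static void main(String[] args) {
        int[] observed = {25, 25, 24, 24, 24, 24, 23, 23, 23}; // 215 regions, illustrative
        int[] uniform = new int[9];
        Arrays.fill(uniform, 24);                               // 216 regions
        System.out.println(regionsShedOnJoin(observed, 0.1)); // prints 6
        System.out.println(regionsShedOnJoin(uniform, 0.1));  // prints 0
    }
}
```

Under this rule, the number of regions that move depends entirely on which servers happen to sit above 24.2, not on how empty the new server is.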

This will behave even worse on larger clusters, where adding a new node has little impact on the average load per server.
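The scaling problem can be made concrete with a simplified model. Assume N existing servers each carry L regions, the slop is s, the average is exact (unrounded), and servers shed only when they are overloaded, as described in this report. After one empty server joins, the new average and the overload condition are:

```latex
\bar{L}' = \frac{N L}{N + 1},
\qquad
L > (1 + s)\,\bar{L}'
  \iff N + 1 > (1 + s)\,N
  \iff N < \frac{1}{s}
```

With s = 0.1, once the cluster already has N ≥ 10 evenly loaded servers, no server crosses the upper bound, and the new node receives nothing. Growing the cluster only makes N larger. Rounding the average, which the log's `avg: 22.0` suggests the master does, shifts the exact threshold but not the trend.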



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-1017) Region balancing does not bring newly added node within acceptable range

Posted by "Evgeny Ryabitskiy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Evgeny Ryabitskiy updated HBASE-1017:
-------------------------------------

    Attachment: loadbalance2.0.patch

loadbalance2.0.patch is for my mega cool low-centralised load balance algorithm...
but it is still a prototype, just to show my new ideas :)
It is independent of the other patches here.

The idea:

 * Region servers know better which regions to unassign and can make their own decisions about it.
 * For such decisions, the HRS will use a LoadBalancer thread.
 * To make such decisions, the HRS needs to know the current load situation in the cluster (LoadMetrics).
 * The HRS reads the LoadMetrics record from ZK.
 * If the HRS can't get the LoadMetrics record, it makes a LoadBalance Slip.
 * If the HRS finds out that it is overloaded, it closes some regions.

 * The Master can periodically put an updated LoadMetrics record in ZK.
 * The LoadMetrics record contains: avgLoad, maxLoad, upLoadBound, lowLoadBound, uderloadinFactor.
 * LoadMetrics is a class with those attributes; it can be serialised to bytes and read back from bytes.
 * The LoadMetrics record is the data of a special ephemeral zNode in ZK, created by the Master.
 * The Master still assigns closed regions to HRSs, so balancing is half-centralised (unassignment is distributed and assignment is centralised).
 * In the future, the Master will use a flag in LoadMetrics to stop unassigning if there are too many closed regions.
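A minimal Java sketch of the LoadMetrics record as the list describes it. It covers the five attributes, a byte round-trip so the record can be stored as znode data, and the region-server-side overload check. Several details are assumptions for illustration: the field types, the byte layout, and the `regionsToClose` rule, which sheds down to the rounded-up average once the load is above `upLoadBound`. loadbalance2.0.patch may handle all of these differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch of the LoadMetrics record; field types and byte layout are assumptions.
final class LoadMetrics {
    final double avgLoad;
    final int maxLoad;
    final double upLoadBound;
    final double lowLoadBound;
    final double uderloadinFactor; // attribute name kept as written in the comment

    LoadMetrics(double avgLoad, int maxLoad, double upLoadBound,
                double lowLoadBound, double uderloadinFactor) {
        this.avgLoad = avgLoad;
        this.maxLoad = maxLoad;
        this.upLoadBound = upLoadBound;
        this.lowLoadBound = lowLoadBound;
        this.uderloadinFactor = uderloadinFactor;
    }

    // Serialise in a fixed order: double, int, double, double, double (36 bytes).
    byte[] toBytes() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeDouble(avgLoad);
            out.writeInt(maxLoad);
            out.writeDouble(upLoadBound);
            out.writeDouble(lowLoadBound);
            out.writeDouble(uderloadinFactor);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Read back a record written by toBytes(); arguments are evaluated left to right.
    static LoadMetrics fromBytes(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return new LoadMetrics(in.readDouble(), in.readInt(), in.readDouble(),
                                   in.readDouble(), in.readDouble());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Region-server side (assumed rule): act only above upLoadBound,
    // then close enough regions to get back down to the rounded-up average.
    int regionsToClose(int myLoad) {
        if (myLoad <= upLoadBound) {
            return 0;
        }
        return Math.max(0, myLoad - (int) Math.ceil(avgLoad));
    }
}
```

On the Master side, the bytes from `toBytes()` would become the data of the ephemeral znode, for example through ZooKeeper's `create(path, data, acl, CreateMode.EPHEMERAL)`. A region server would read them back with `getData` and call `fromBytes`. Because the znode is ephemeral, it disappears when the Master's session ends, which is one way a region server could fail to get the record.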



> Region balancing does not bring newly added node within acceptable range
> ------------------------------------------------------------------------
>
>                 Key: HBASE-1017
>                 URL: https://issues.apache.org/jira/browse/HBASE-1017
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.19.0
>            Reporter: Jonathan Gray
>            Assignee: Evgeny Ryabitskiy
>            Priority: Minor
>             Fix For: 0.20.0
>
>         Attachments: HBASE-1017_v1.patch, HBASE-1017_v10.patch, HBASE-1017_v2.patch, HBASE-1017_v4.patch, HBASE-1017_v5.patch, HBASE-1017_v6.patch, HBASE-1017_v7.patch, HBASE-1017_v8.patch, HBASE-1017_v9.patch, loadbalance2.0.patch
>
>


[jira] Updated: (HBASE-1017) Region balancing does not bring newly added node within acceptable range

Posted by "Evgeny Ryabitskiy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Evgeny Ryabitskiy updated HBASE-1017:
-------------------------------------

    Status: Patch Available  (was: In Progress)

>         Attachments: HBASE-1017_v1.patch, HBASE-1017_v10.patch, HBASE-1017_v11_FINAL.patch, HBASE-1017_v2.patch, HBASE-1017_v4.patch, HBASE-1017_v5.patch, HBASE-1017_v6.patch, HBASE-1017_v7.patch, HBASE-1017_v8.patch, HBASE-1017_v9.patch, loadbalance2.0.patch


[jira] Updated: (HBASE-1017) Region balancing does not bring newly added node within acceptable range

Posted by "Evgeny Ryabitskiy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Evgeny Ryabitskiy updated HBASE-1017:
-------------------------------------

    Attachment: HBASE-1017_v9.patch

>         Attachments: HBASE-1017_v1.patch, HBASE-1017_v2.patch, HBASE-1017_v4.patch, HBASE-1017_v5.patch, HBASE-1017_v6.patch, HBASE-1017_v7.patch, HBASE-1017_v8.patch, HBASE-1017_v9.patch


[jira] Updated: (HBASE-1017) Region balancing does not bring newly added node within acceptable range

Posted by "Evgeny Ryabitskiy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Evgeny Ryabitskiy updated HBASE-1017:
-------------------------------------

    Status: Open  (was: Patch Available)

>         Attachments: HBASE-1017_v1.patch, HBASE-1017_v10.patch, HBASE-1017_v2.patch, HBASE-1017_v4.patch, HBASE-1017_v5.patch, HBASE-1017_v6.patch, HBASE-1017_v7.patch, HBASE-1017_v8.patch, HBASE-1017_v9.patch


[jira] Updated: (HBASE-1017) Region balancing does not bring newly added node within acceptable range

Posted by "Evgeny Ryabitskiy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Evgeny Ryabitskiy updated HBASE-1017:
-------------------------------------

    Attachment: HBASE-1017_v6.patch

>         Attachments: HBASE-1017_v1.patch, HBASE-1017_v2.patch, HBASE-1017_v4.patch, HBASE-1017_v5.patch, HBASE-1017_v6.patch


[jira] Updated: (HBASE-1017) Region balancing does not bring newly added node within acceptable range

Posted by "Evgeny Ryabitskiy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Evgeny Ryabitskiy updated HBASE-1017:
-------------------------------------

    Attachment: HBASE-1017_v5.patch

Added a new assertion to the TestRegionRebalancing scenario.
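The comment does not show the new assertion itself. The following is a hypothetical illustration only. A rebalancing test could check that every server, including a newly added one, ends up inside the slop range around the average. The class name is an assumption. So is the choice to round the bounds outward, which avoids rejecting integer loads over fractions of a region.

```java
import java.util.Arrays;

// Hypothetical balance check; not the actual assertion from HBASE-1017_v5.patch.
final class BalanceCheck {

    // True when every server's region count lies within the slop range around the mean.
    static boolean allWithinSlop(int[] loads, double slop) {
        double avg = Arrays.stream(loads).average().orElse(0.0);
        double low = Math.floor(avg * (1 - slop));  // round the lower bound down
        double high = Math.ceil(avg * (1 + slop));  // round the upper bound up
        for (int load : loads) {
            if (load < low || load > high) {
                return false;
            }
        }
        return true;
    }
}
```

For the report's scenario, the post-join state with an almost empty 10th server would fail this check, while an even spread such as five servers at 22 and five at 21 would pass.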

>         Attachments: HBASE-1017_v1.patch, HBASE-1017_v2.patch, HBASE-1017_v4.patch, HBASE-1017_v5.patch


[jira] Updated: (HBASE-1017) Region balancing does not bring newly added node within acceptable range

Posted by "Evgeny Ryabitskiy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Evgeny Ryabitskiy updated HBASE-1017:
-------------------------------------

    Attachment:     (was: hbase1017_v3.patch)

>         Attachments: HBASE-1017_v1.patch, HBASE-1017_v2.patch, HBASE-1017_v4.patch


[jira] Work started: (HBASE-1017) Region balancing does not bring newly added node within acceptable range

Posted by "Evgeny Ryabitskiy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HBASE-1017 started by Evgeny Ryabitskiy.

>         Attachments: HBASE-1017_v1.patch, HBASE-1017_v10.patch, HBASE-1017_v2.patch, HBASE-1017_v4.patch, HBASE-1017_v5.patch, HBASE-1017_v6.patch, HBASE-1017_v7.patch, HBASE-1017_v8.patch, HBASE-1017_v9.patch


[jira] Updated: (HBASE-1017) Region balancing does not bring newly added node within acceptable range

Posted by "Evgeny Ryabitskiy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Evgeny Ryabitskiy updated HBASE-1017:
-------------------------------------

    Attachment: HBASE-1017_v4.patch

> Region balancing does not bring newly added node within acceptable range
> ------------------------------------------------------------------------
>
>                 Key: HBASE-1017
>                 URL: https://issues.apache.org/jira/browse/HBASE-1017
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.19.0
>            Reporter: Jonathan Gray
>            Assignee: Evgeny Ryabitskiy
>            Priority: Minor
>             Fix For: 0.20.0
>
>         Attachments: HBASE-1017_v1.patch, HBASE-1017_v2.patch, HBASE-1017_v4.patch, hbase1017_v3.patch


[jira] Updated: (HBASE-1017) Region balancing does not bring newly added node within acceptable range

Posted by "Evgeny Ryabitskiy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Evgeny Ryabitskiy updated HBASE-1017:
-------------------------------------

    Status: Patch Available  (was: Open)

> Region balancing does not bring newly added node within acceptable range
> ------------------------------------------------------------------------
>
>                 Key: HBASE-1017
>                 URL: https://issues.apache.org/jira/browse/HBASE-1017
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.19.0
>            Reporter: Jonathan Gray
>            Assignee: Evgeny Ryabitskiy
>            Priority: Minor
>             Fix For: 0.20.0
>
>         Attachments: HBASE-1017_v1.patch, HBASE-1017_v10.patch, HBASE-1017_v2.patch, HBASE-1017_v4.patch, HBASE-1017_v5.patch, HBASE-1017_v6.patch, HBASE-1017_v7.patch, HBASE-1017_v8.patch, HBASE-1017_v9.patch


[jira] Commented: (HBASE-1017) Region balancing does not bring newly added node within acceptable range

Posted by "Evgeny Ryabitskiy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699325#action_12699325 ] 

Evgeny Ryabitskiy commented on HBASE-1017:
------------------------------------------

About refactoring

ServerManager has these mappings:
 - serverName -> serverInfo,
 - serverAddr -> serverInfo,
 - serverName -> load,
 - load -> serverName

1) serverName -> load is not necessary if you have serverName -> serverInfo, since the load is available through the serverInfo
2) All mappings are encapsulated in a ServersInfo class (an inner class of ServerManager)
3) ServersInfo has operations for adding, updating and removing the information of an HRS
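The consolidated holder described in points 1) to 3) might look roughly like the sketch below. It is a hypothetical, simplified illustration: the ServerInfo type here carries only a name, an address and a region count, and all method names are invented; the ServersInfo class in the patch may differ. One benefit of funneling every update through a single class is that the maps cannot drift out of sync; the sketch uses one lock for that.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical, simplified stand-in for the ServersInfo holder; not the patch code. */
final class ServersInfo {

    /** Simplified server record; a real server info object carries much more. */
    static final class ServerInfo {
        final String name;
        final String address;
        final int load; // number of regions the server is carrying

        ServerInfo(String name, String address, int load) {
            this.name = name;
            this.address = address;
            this.load = load;
        }
    }

    private final Map<String, ServerInfo> byName = new HashMap<>();
    private final Map<String, ServerInfo> byAddress = new HashMap<>();
    // load -> names of servers with that load, sorted so the lowest and highest
    // loads sit at the two ends. There is no separate name -> load map: the load
    // is read from ServerInfo.
    private final SortedMap<Integer, Set<String>> byLoad = new TreeMap<>();

    /** Adds a server, or replaces its previous entry, updating every mapping. */
    synchronized void addOrUpdate(ServerInfo info) {
        remove(info.name);
        byName.put(info.name, info);
        byAddress.put(info.address, info);
        byLoad.computeIfAbsent(info.load, k -> new TreeSet<>()).add(info.name);
    }

    /** Removes a server from every mapping; does nothing if it is unknown. */
    synchronized void remove(String name) {
        ServerInfo old = byName.remove(name);
        if (old == null) {
            return;
        }
        byAddress.remove(old.address);
        Set<String> names = byLoad.get(old.load);
        names.remove(name);
        if (names.isEmpty()) {
            byLoad.remove(old.load);
        }
    }

    synchronized ServerInfo getByName(String name) {
        return byName.get(name);
    }

    synchronized ServerInfo getByAddress(String address) {
        return byAddress.get(address);
    }

    /** Lowest region count; throws NoSuchElementException if no server is registered. */
    synchronized int lowestLoad() {
        return byLoad.firstKey();
    }

    /** Highest region count; throws NoSuchElementException if no server is registered. */
    synchronized int highestLoad() {
        return byLoad.lastKey();
    }
}
```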


About the Load Balancing Algorithm

Previous check: if an HRS's load is more than avgLoadPlusSlop, the HRS is overloaded, so close some of its regions (numToClose = currentRegions - avgLoad).

Added check: if this HRS is the most loaded one and the lowest loaded HRSs have a load less than avgLoadMinusSlop, close some regions on the most loaded HRS (numToClose = min(currentRegions - avgLoad, (avgLoadMinusSlop - lowestLoad) * numLowestLoadedHRS)), i.e. never close more regions than the lowest loaded HRSs need to reach avgLoadMinusSlop.
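A sketch of the two checks, evaluated for one server against a snapshot of every server's region count. Assumptions: the average is the plain mean of the snapshot, fractional results are rounded up, and the class and method names are hypothetical; this is not the code from the patch. The example uses a hypothetical 21-server cluster (20 servers at 24 regions plus one new, empty server), the larger-cluster case the report worries about.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of the previous and added balancing checks; not the patch code. */
final class SlopBalanceSketch {

    /** Number of regions the server with the given load should close in this snapshot. */
    static int numToClose(int load, List<Integer> allLoads, double slop) {
        double avgLoad = allLoads.stream().mapToInt(Integer::intValue).average().orElse(0);
        double avgLoadPlusSlop = avgLoad * (1 + slop);
        double avgLoadMinusSlop = avgLoad * (1 - slop);

        // Previous check: an overloaded server sheds down to the average.
        if (load > avgLoadPlusSlop) {
            return (int) Math.ceil(load - avgLoad);
        }

        // Added check: the most loaded server also sheds regions when the lowest
        // loaded servers are below avgLoadMinusSlop, but no more than they need
        // to reach avgLoadMinusSlop.
        int highestLoad = Collections.max(allLoads);
        int lowestLoad = Collections.min(allLoads);
        if (load == highestLoad && lowestLoad < avgLoadMinusSlop) {
            long numLowestLoaded = allLoads.stream().filter(l -> l == lowestLoad).count();
            double wanted = Math.min(load - avgLoad,
                    (avgLoadMinusSlop - lowestLoad) * numLowestLoaded);
            // Rounded up (an assumption) so a qualifying server moves at least one region.
            return (int) Math.ceil(wanted);
        }
        return 0;
    }

    public static void main(String[] args) {
        // Hypothetical 21-server cluster: 20 servers at 24 regions plus one new, empty server.
        List<Integer> loads = new ArrayList<>(Collections.nCopies(20, 24));
        loads.add(0);
        System.out.println(numToClose(24, loads, 0.1)); // 2
        System.out.println(numToClose(0, loads, 0.1));  // 0
    }
}
```

Under the previous check alone, a server at 24 would close nothing in this snapshot, because 24 is below avgLoadPlusSlop (about 25.1). With the added check, each most loaded server gives up 2 regions, capped by how far the empty server sits below avgLoadMinusSlop (about 20.6).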



Changes to the JUnit test for region balancing:

An assert checks that the loads of all HRSs are within the slop range after rebalancing.

The number of HRSs was raised from 4 to 10.
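The invariant the updated test asserts could be written as a plain predicate like the one below. This is a hypothetical helper, not the JUnit code from the patch, and it assumes the slop range is [avg * (1 - slop), avg * (1 + slop)] with no rounding.

```java
import java.util.List;

/** Hypothetical helper expressing the "all loads within slop" invariant; not the patch's test. */
final class SlopRangeCheck {

    /** True when every server's region count lies within [avg * (1 - slop), avg * (1 + slop)]. */
    static boolean allWithinSlop(List<Integer> loads, double slop) {
        double avg = loads.stream().mapToInt(Integer::intValue).average().orElse(0);
        double low = avg * (1 - slop);
        double high = avg * (1 + slop);
        return loads.stream().allMatch(l -> l >= low && l <= high);
    }

    public static void main(String[] args) {
        // Balanced 10-server snapshot: average 21.8, range roughly 19.6 to 24.0.
        System.out.println(allWithinSlop(List.of(22, 22, 22, 22, 22, 22, 22, 22, 22, 20), 0.1)); // true
        // A new, empty server next to nine servers at 24 falls outside the range.
        System.out.println(allWithinSlop(List.of(24, 24, 24, 24, 24, 24, 24, 24, 24, 0), 0.1));  // false
    }
}
```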

> Region balancing does not bring newly added node within acceptable range
> ------------------------------------------------------------------------
>
>                 Key: HBASE-1017
>                 URL: https://issues.apache.org/jira/browse/HBASE-1017
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.19.0
>            Reporter: Jonathan Gray
>            Assignee: Evgeny Ryabitskiy
>            Priority: Minor
>             Fix For: 0.20.0
>
>         Attachments: HBASE-1017_v1.patch, HBASE-1017_v2.patch, HBASE-1017_v4.patch, HBASE-1017_v5.patch, HBASE-1017_v6.patch, HBASE-1017_v7.patch, HBASE-1017_v8.patch, HBASE-1017_v9.patch


[jira] Updated: (HBASE-1017) Region balancing does not bring newly added node within acceptable range

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1017:
-------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Tested the latest version of the patch.  Had to do it on a quiesced cluster because under load the region count is all over the place.  Also, when killing servers, I didn't kill the regionserver hosting meta because that makes a mess of the counts too.

But, after killing a regionserver not hosting catalog regions, balance came back promptly.  After adding in a new node, balance again came back quickly.   Did this a few times.  Had enough regions that I should have seen Jon's original issue if it had not been fixed.

Thanks for the patch Evgeny.

> Region balancing does not bring newly added node within acceptable range
> ------------------------------------------------------------------------
>
>                 Key: HBASE-1017
>                 URL: https://issues.apache.org/jira/browse/HBASE-1017
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.19.0
>            Reporter: Jonathan Gray
>            Assignee: Evgeny Ryabitskiy
>            Priority: Minor
>             Fix For: 0.20.0
>
>         Attachments: HBASE-1017_v1.patch, HBASE-1017_v10.patch, HBASE-1017_v11_FINAL.patch, HBASE-1017_v12_FINAL.patch, HBASE-1017_v2.patch, HBASE-1017_v4.patch, HBASE-1017_v5.patch, HBASE-1017_v6.patch, HBASE-1017_v7.patch, HBASE-1017_v8.patch, HBASE-1017_v9.patch, loadbalance2.0.patch


[jira] Updated: (HBASE-1017) Region balancing does not bring newly added node within acceptable range

Posted by "Evgeny Ryabitskiy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Evgeny Ryabitskiy updated HBASE-1017:
-------------------------------------

    Attachment: hbase1017_v3.patch

Extracted balancing into a LoadBalancer class + more refactoring + sync with SVN

> Region balancing does not bring newly added node within acceptable range
> ------------------------------------------------------------------------
>
>                 Key: HBASE-1017
>                 URL: https://issues.apache.org/jira/browse/HBASE-1017
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.19.0
>            Reporter: Jonathan Gray
>            Assignee: Evgeny Ryabitskiy
>            Priority: Minor
>             Fix For: 0.20.0
>
>         Attachments: HBASE-1017_v1.patch, HBASE-1017_v2.patch, hbase1017_v3.patch


[jira] Updated: (HBASE-1017) Region balancing does not bring newly added node within acceptable range

Posted by "Evgeny Ryabitskiy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Evgeny Ryabitskiy updated HBASE-1017:
-------------------------------------

    Attachment: HBASE-1017_v1.patch

First version of this logic. It's an outline that can be improved.

> Region balancing does not bring newly added node within acceptable range
> ------------------------------------------------------------------------
>
>                 Key: HBASE-1017
>                 URL: https://issues.apache.org/jira/browse/HBASE-1017
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.19.0
>            Reporter: Jonathan Gray
>            Assignee: Evgeny Ryabitskiy
>            Priority: Minor
>             Fix For: 0.20.0
>
>         Attachments: HBASE-1017_v1.patch


[jira] Updated: (HBASE-1017) Region balancing does not bring newly added node within acceptable range

Posted by "Evgeny Ryabitskiy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Evgeny Ryabitskiy updated HBASE-1017:
-------------------------------------

    Attachment: HBASE-1017_v7.patch

Last tested version. Should do everything :)

> Region balancing does not bring newly added node within acceptable range
> ------------------------------------------------------------------------
>
>                 Key: HBASE-1017
>                 URL: https://issues.apache.org/jira/browse/HBASE-1017
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.19.0
>            Reporter: Jonathan Gray
>            Assignee: Evgeny Ryabitskiy
>            Priority: Minor
>             Fix For: 0.20.0
>
>         Attachments: HBASE-1017_v1.patch, HBASE-1017_v2.patch, HBASE-1017_v4.patch, HBASE-1017_v5.patch, HBASE-1017_v6.patch, HBASE-1017_v7.patch


[jira] Updated: (HBASE-1017) Region balancing does not bring newly added node within acceptable range

Posted by "Evgeny Ryabitskiy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Evgeny Ryabitskiy updated HBASE-1017:
-------------------------------------

    Attachment: HBASE-1017_v8.patch

> Region balancing does not bring newly added node within acceptable range
> ------------------------------------------------------------------------
>
>                 Key: HBASE-1017
>                 URL: https://issues.apache.org/jira/browse/HBASE-1017
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.19.0
>            Reporter: Jonathan Gray
>            Assignee: Evgeny Ryabitskiy
>            Priority: Minor
>             Fix For: 0.20.0
>
>         Attachments: HBASE-1017_v1.patch, HBASE-1017_v2.patch, HBASE-1017_v4.patch, HBASE-1017_v5.patch, HBASE-1017_v6.patch, HBASE-1017_v7.patch, HBASE-1017_v8.patch
>
>
> With a 10 node cluster, there were only 9 online nodes.  With about 215 total regions, each of the 9 had around 24 regions (average load is 24).  Slop is 10% so 22 to 26 is the acceptable range.
> Starting up the 10th node, master log showed:
> {code}
> 2008-11-21 15:57:51,521 INFO org.apache.hadoop.hbase.master.ServerManager: Received start message from: 72.34.249.210:60020
> 2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager: Server 72.34.249.219:60020 is overloaded. Server load: 25 avg: 22.0, slop: 0.1
> 2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager: Choosing to reassign 3 regions. mostLoadedRegions has 10 regions in it.
> 2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region streamitems,^@^@^@^@^AH�;,1225411051632
> 2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region streamitems,^@^@^@^@^@�Ý,1225411056686
> 2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region groups,,1222913580957
> 2008-11-21 15:57:53,975 DEBUG org.apache.hadoop.hbase.master.RegionManager: Server 72.34.249.213:60020 is overloaded. Server load: 25 avg: 22.0, slop: 0.1
> 2008-11-21 15:57:53,975 DEBUG org.apache.hadoop.hbase.master.RegionManager: Choosing to reassign 3 regions. mostLoadedRegions has 10 regions in it.
> 2008-11-21 15:57:53,976 DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region upgrade,,1226892014784
> 2008-11-21 15:57:53,976 DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region streamitems,^@^@^@^@^@3^Z�,1225411056701
> 2008-11-21 15:57:53,976 DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region streamitems,^@^@^@^@^@         ^L,1225411049042
> {code}
> The new regionserver received only 6 regions.  This happened because when the 10th came in, average load dropped to 22.  This caused two servers with 25 regions (acceptable when avg was 24 but not now) to reassign 3 of their regions each to bring them back down to the average.  Unfortunately all other servers remained within the 10% slop (20 to 24) so they were not overloaded and thus did not reassign off any regions.  It was only chance that made even 6 of the regions get reassigned as there could have been exactly 24 on each server, in which case none would have been assigned to the new node.
> This will behave worse on larger clusters when adding a new node has little impact on the avg load/server.
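The arithmetic in the report can be reproduced with a small sketch of the "shed down to the average" rule it describes. This is a minimal illustration, not the actual RegionManager code. The nine per-server counts are hypothetical (chosen so they total 215 with two servers at 25), and rounding the average is an assumption made to match the "avg: 22.0" in the log.

```java
import java.util.Arrays;

public class SlopSketch {
    // Hypothetical region counts for the nine original servers (they total 215)
    static final int[] LOADS = {25, 25, 24, 24, 24, 24, 23, 23, 23};
    static final double SLOP = 0.1;

    // A server above avg * (1 + slop) sheds regions until it is back at the average
    static int regionsToShed(int load, double avg, double slop) {
        return load > avg * (1 + slop) ? (int) Math.ceil(load - avg) : 0;
    }

    // Regions freed for a newly joined, empty server; only overloaded servers give any up
    static int regionsMovedToNewServer(int[] loads, double slop) {
        int total = Arrays.stream(loads).sum();
        // Assumption: the average is rounded, mirroring "avg: 22.0" in the log (215 / 10 = 21.5)
        double avg = Math.round((double) total / (loads.length + 1));
        int moved = 0;
        for (int load : loads) {
            moved += regionsToShed(load, avg, slop);
        }
        return moved;
    }

    public static void main(String[] args) {
        System.out.println(regionsMovedToNewServer(LOADS, SLOP)); // prints 6
    }
}
```

With nine servers at exactly 24 regions the same function returns 0: the overload threshold is 22 * 1.1 = 24.2, so no server sheds anything and the new node receives nothing, which is the worst case the report describes.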

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-1017) Region balancing does not bring newly added node within acceptable range

Posted by "Evgeny Ryabitskiy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Evgeny Ryabitskiy updated HBASE-1017:
-------------------------------------

    Attachment:     (was: HBASE-1017_v10.patch)

>         Attachments: HBASE-1017_v1.patch, HBASE-1017_v10.patch, HBASE-1017_v2.patch, HBASE-1017_v4.patch, HBASE-1017_v5.patch, HBASE-1017_v6.patch, HBASE-1017_v7.patch, HBASE-1017_v8.patch, HBASE-1017_v9.patch


[jira] Commented: (HBASE-1017) Region balancing does not bring newly added node within acceptable range

Posted by "Evgeny Ryabitskiy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711199#action_12711199 ] 

Evgeny Ryabitskiy commented on HBASE-1017:
------------------------------------------

Thanks for reviewing my patch!

 * patch regenerated, ^Ms removed
 * yes, getLoadToServers changed from public to default (package-private) visibility
 * added more detailed documentation for the load balancer

Yes, my fault... sorry :(
While backing out the ServerManager refactoring I forgot about several changes that are still necessary there (which is why I started that refactoring in the first place).

ServerManager changes:

 * getAverageLoad now returns the exact average load (previously it was rounded)

 * stale entries are cleaned out of the loadToServers mapping: if no server has a given load, there is no entry for that load key. Previously an entry remained for every old load value, mapped to an empty (size == 0) set of servers.

The first change is needed for more accurate balancing. The second is needed because the new load balancer relies on the loadToServers mapping, and stale entries for old loads caused it to make wrong decisions.

It should now work in HBASE-1017_v12_FINAL.patch.
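The two ServerManager changes described in this comment can be illustrated with a small, self-contained sketch. The class and method names (LoadIndex, updateLoad, averageLoad, loads) are hypothetical and this is not the code in the patch. It only shows a load-to-servers index that drops a load key as soon as its server set becomes empty, plus an unrounded average.

```java
import java.util.*;

/** Hypothetical sketch: servers indexed by region count, with no empty buckets left behind. */
public class LoadIndex {
    private final SortedMap<Integer, Set<String>> loadToServers = new TreeMap<>();
    private final Map<String, Integer> serverToLoad = new HashMap<>();

    public void updateLoad(String server, int newLoad) {
        Integer oldLoad = serverToLoad.put(server, newLoad);
        if (oldLoad != null) {
            Set<String> servers = loadToServers.get(oldLoad);
            servers.remove(server);
            // Drop the key once no server has this load, so no stale empty entries remain
            if (servers.isEmpty()) {
                loadToServers.remove(oldLoad);
            }
        }
        loadToServers.computeIfAbsent(newLoad, k -> new HashSet<>()).add(server);
    }

    /** Exact (unrounded) average number of regions per server. */
    public double averageLoad() {
        if (serverToLoad.isEmpty()) {
            return 0.0;
        }
        long total = 0;
        for (int load : serverToLoad.values()) {
            total += load;
        }
        return (double) total / serverToLoad.size();
    }

    public SortedMap<Integer, Set<String>> loads() {
        return Collections.unmodifiableSortedMap(loadToServers);
    }

    public static void main(String[] args) {
        LoadIndex index = new LoadIndex();
        index.updateLoad("rs1", 21);
        index.updateLoad("rs2", 20);
        index.updateLoad("rs1", 19); // rs1 leaves load 21, so key 21 disappears
        System.out.println(index.loads().keySet()); // prints [19, 20]
        System.out.println(index.averageLoad());    // prints 19.5
    }
}
```

Without the removal step, key 21 would linger with an empty server set, and any logic that walks the map from the highest load downward would first see a "load" that no server actually has.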

>         Attachments: HBASE-1017_v1.patch, HBASE-1017_v10.patch, HBASE-1017_v11_FINAL.patch, HBASE-1017_v12_FINAL.patch, HBASE-1017_v2.patch, HBASE-1017_v4.patch, HBASE-1017_v5.patch, HBASE-1017_v6.patch, HBASE-1017_v7.patch, HBASE-1017_v8.patch, HBASE-1017_v9.patch, loadbalance2.0.patch


[jira] Updated: (HBASE-1017) Region balancing does not bring newly added node within acceptable range

Posted by "Evgeny Ryabitskiy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Evgeny Ryabitskiy updated HBASE-1017:
-------------------------------------

    Attachment: HBASE-1017_v12_FINAL.patch

>         Attachments: HBASE-1017_v1.patch, HBASE-1017_v10.patch, HBASE-1017_v11_FINAL.patch, HBASE-1017_v12_FINAL.patch, HBASE-1017_v2.patch, HBASE-1017_v4.patch, HBASE-1017_v5.patch, HBASE-1017_v6.patch, HBASE-1017_v7.patch, HBASE-1017_v8.patch, HBASE-1017_v9.patch, loadbalance2.0.patch


[jira] Updated: (HBASE-1017) Region balancing does not bring newly added node within acceptable range

Posted by "Evgeny Ryabitskiy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Evgeny Ryabitskiy updated HBASE-1017:
-------------------------------------

    Comment: was deleted

(was: regenerated patch)

>         Attachments: HBASE-1017_v1.patch, HBASE-1017_v10.patch, HBASE-1017_v2.patch, HBASE-1017_v4.patch, HBASE-1017_v5.patch, HBASE-1017_v6.patch, HBASE-1017_v7.patch, HBASE-1017_v8.patch, HBASE-1017_v9.patch


[jira] Commented: (HBASE-1017) Region balancing does not bring newly added node within acceptable range

Posted by "Evgeny Ryabitskiy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699327#action_12699327 ] 

Evgeny Ryabitskiy commented on HBASE-1017:
------------------------------------------

Patch is ready for review.
All JUnit tests pass.

It still needs testing on a real cluster. Can anyone help with that?

>         Attachments: HBASE-1017_v1.patch, HBASE-1017_v2.patch, HBASE-1017_v4.patch, HBASE-1017_v5.patch, HBASE-1017_v6.patch, HBASE-1017_v7.patch, HBASE-1017_v8.patch, HBASE-1017_v9.patch


[jira] Assigned: (HBASE-1017) Region balancing does not bring newly added node within acceptable range

Posted by "Evgeny Ryabitskiy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Evgeny Ryabitskiy reassigned HBASE-1017:
----------------------------------------

    Assignee: Evgeny Ryabitskiy



[jira] Updated: (HBASE-1017) Region balancing does not bring newly added node within acceptable range

Posted by "Evgeny Ryabitskiy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Evgeny Ryabitskiy updated HBASE-1017:
-------------------------------------

    Attachment: HBASE-1017_v10.patch

>         Attachments: HBASE-1017_v1.patch, HBASE-1017_v10.patch, HBASE-1017_v2.patch, HBASE-1017_v4.patch, HBASE-1017_v5.patch, HBASE-1017_v6.patch, HBASE-1017_v7.patch, HBASE-1017_v8.patch, HBASE-1017_v9.patch


[jira] Updated: (HBASE-1017) Region balancing does not bring newly added node within acceptable range

Posted by "Evgeny Ryabitskiy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Evgeny Ryabitskiy updated HBASE-1017:
-------------------------------------

    Attachment: HBASE-1017_v10.patch

regenerated patch

>         Attachments: HBASE-1017_v1.patch, HBASE-1017_v10.patch, HBASE-1017_v2.patch, HBASE-1017_v4.patch, HBASE-1017_v5.patch, HBASE-1017_v6.patch, HBASE-1017_v7.patch, HBASE-1017_v8.patch, HBASE-1017_v9.patch
>
>
> With a 10 node cluster, there were only 9 online nodes.  With about 215 total regions, each of the 9 had around 24 regions (average load is 24).  Slop is 10% so 22 to 26 is the acceptable range.
> Starting up the 10th node, master log showed:
> {code}
> 2008-11-21 15:57:51,521 INFO org.apache.hadoop.hbase.master.ServerManager: Received start message from: 72.34.249.210:60020
> 2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager: Server 72.34.249.219:60020 is overloaded. Server load: 25 avg: 22.0, slop: 0.1
> 2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager: Choosing to reassign 3 regions. mostLoadedRegions has 10 regions in it.
> 2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region streamitems,^@^@^@^@^AH�;,1225411051632
> 2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region streamitems,^@^@^@^@^@�Ý,1225411056686
> 2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region groups,,1222913580957
> 2008-11-21 15:57:53,975 DEBUG org.apache.hadoop.hbase.master.RegionManager: Server 72.34.249.213:60020 is overloaded. Server load: 25 avg: 22.0, slop: 0.1
> 2008-11-21 15:57:53,975 DEBUG org.apache.hadoop.hbase.master.RegionManager: Choosing to reassign 3 regions. mostLoadedRegions has 10 regions in it.
> 2008-11-21 15:57:53,976 DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region upgrade,,1226892014784
> 2008-11-21 15:57:53,976 DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region streamitems,^@^@^@^@^@3^Z�,1225411056701
> 2008-11-21 15:57:53,976 DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region streamitems,^@^@^@^@^@         ^L,1225411049042
> {code}
> The new regionserver received only 6 regions.  This happened because when the 10th came in, the average load dropped to 22.  This caused the two servers with 25 regions (acceptable when the average was 24, but not now) to reassign 3 of their regions each to bring them back down to the average.  Unfortunately, all other servers remained within the 10% slop (20 to 24), so they were not overloaded and did not reassign any regions.  It was only chance that even 6 regions got reassigned: had there been exactly 24 on each server, none would have been assigned to the new node.
> This will behave worse on larger clusters, where adding a new node has little impact on the average load per server.
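The overload check described above can be reproduced in a few lines. This is a minimal sketch, assuming the rule is exactly what the log suggests: a server is overloaded only when its load exceeds avg * (1 + slop), and an overloaded server sheds regions down to the average. It is not the RegionManager source, and the class and method names are hypothetical. The per-server loads are also made up (nine servers, 215 regions in total, two of them at 25) to match the numbers in the report. The average is taken from the log (22.0) rather than recomputed, since the rounding the master applied is not shown.

```java
/**
 * Hypothetical sketch of the slop check described in HBASE-1017 (not the
 * actual RegionManager code). A server is overloaded only when its load is
 * above avg * (1 + slop); an overloaded server sheds regions back down to the
 * average. Servers inside the slop band shed nothing.
 */
class SlopRuleSketch {

    /** True when the load is above the upper edge of the slop band. */
    static boolean isOverloaded(int load, double avg, double slop) {
        return load > avg * (1 + slop);
    }

    /** Regions an overloaded server gives up to get back to the average; 0 otherwise. */
    static int regionsToShed(int load, double avg, double slop) {
        return isOverloaded(load, avg, slop) ? load - (int) Math.round(avg) : 0;
    }

    /** Regions a new, empty server would receive: only what overloaded servers shed. */
    static int regionsForNewServer(int[] existingLoads, double avg, double slop) {
        int moved = 0;
        for (int load : existingLoads) {
            moved += regionsToShed(load, avg, slop);
        }
        return moved;
    }

    public static void main(String[] args) {
        double slop = 0.1;
        double avg = 22.0; // average the master logged after the 10th server joined

        // Nine existing servers, 215 regions in total, two of them at 25.
        int[] observed = {25, 25, 24, 24, 24, 24, 24, 23, 22};
        System.out.println(regionsForNewServer(observed, avg, slop)); // 6

        // The case from the report: every existing server at exactly 24.
        int[] flat = {24, 24, 24, 24, 24, 24, 24, 24, 24};
        System.out.println(regionsForNewServer(flat, avg, slop)); // 0
    }
}
```

With the logged average of 22.0 the upper edge of the band is 24.2, so only the two servers at 25 shed regions (3 each) and the new server receives 6; with every server at 24 nothing is shed at all, because the rule never looks at how underloaded the newcomer is.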

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1017) Region balancing does not bring newly added node within acceptable range

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12710503#action_12710503 ] 

stack commented on HBASE-1017:
------------------------------

I took a look at this patch:

+ Remove the ^Ms (carriage returns).
+ getLoadToServers in ServerManager doesn't need to be public, right?
+ The test looks good, and I like making a class to encapsulate the load balancing logic.  I'd suggest adding javadoc to the load balancer explaining how it works.

I tried the code.  I loaded up a bunch of regions, then shut the cluster down and restarted it.  All regions came up balanced after a little while.  I then tried adding a server to the cluster, which seems to be what Jon was doing above, but it never got any regions:

aa0-000-12.u.powerset.com:60031	1242680796620	requests=0, regions=0, usedHeap=27, maxHeap=1244
aa0-000-13.u.powerset.com:60031	1242680136542	requests=0, regions=21, usedHeap=158, maxHeap=1244
aa0-000-14.u.powerset.com:60031	1242680136673	requests=0, regions=20, usedHeap=71, maxHeap=1244
aa0-000-15.u.powerset.com:60031	1242680136162	requests=0, regions=19, usedHeap=106, maxHeap=1244

It stayed at zero.  Wasn't this patch supposed to address that?
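A balancer that also looks at the least loaded server, and not only at overloaded ones, would pull regions onto an empty newcomer. The sketch below is a hypothetical greedy approach, not the algorithm in any of the attached patches; the class and method names are made up for illustration. It repeatedly moves one region from the most loaded server to the least loaded one until no two servers differ by more than one region. Fed the region counts from the test above, it moves 15 regions onto the new server.

```java
import java.util.Arrays;

/**
 * Hypothetical greedy balancer sketch (not the HBASE-1017 patch algorithm):
 * moves one region at a time from the most loaded server to the least loaded
 * one until every pair of servers differs by at most one region. An empty
 * newcomer is the least loaded server, so it keeps receiving regions until it
 * is within one region of every other server.
 */
class GreedyBalancerSketch {

    /** Balances the loads in place and returns the number of single-region moves made. */
    static int balance(int[] loads) {
        if (loads.length < 2) {
            return 0; // nothing to balance
        }
        int moves = 0;
        while (true) {
            int maxIdx = 0;
            int minIdx = 0;
            for (int i = 1; i < loads.length; i++) {
                if (loads[i] > loads[maxIdx]) maxIdx = i;
                if (loads[i] < loads[minIdx]) minIdx = i;
            }
            if (loads[maxIdx] - loads[minIdx] <= 1) {
                return moves; // balanced: spread is at most one region
            }
            loads[maxIdx]--; // take a region from the most loaded server
            loads[minIdx]++; // and give it to the least loaded one
            moves++;
        }
    }

    public static void main(String[] args) {
        // Region counts from the test above: the new server first, then the three old ones.
        int[] loads = {0, 21, 20, 19};
        int moves = balance(loads);
        System.out.println(moves + " moves -> " + Arrays.toString(loads)); // 15 moves -> [15, 15, 15, 15]
    }
}
```

Each move strictly lowers the spread of the loads, so the loop always terminates. The cost is one scan of all servers per region moved, which is fine for a sketch; a production balancer would likely also cap how many regions it moves per round.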



> Region balancing does not bring newly added node within acceptable range
> ------------------------------------------------------------------------
>
>                 Key: HBASE-1017
>                 URL: https://issues.apache.org/jira/browse/HBASE-1017
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.19.0
>            Reporter: Jonathan Gray
>            Assignee: Evgeny Ryabitskiy
>            Priority: Minor
>             Fix For: 0.20.0
>
>         Attachments: HBASE-1017_v1.patch, HBASE-1017_v10.patch, HBASE-1017_v11_FINAL.patch, HBASE-1017_v2.patch, HBASE-1017_v4.patch, HBASE-1017_v5.patch, HBASE-1017_v6.patch, HBASE-1017_v7.patch, HBASE-1017_v8.patch, HBASE-1017_v9.patch, loadbalance2.0.patch
>
>


[jira] Updated: (HBASE-1017) Region balancing does not bring newly added node within acceptable range

Posted by "Evgeny Ryabitskiy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Evgeny Ryabitskiy updated HBASE-1017:
-------------------------------------

    Attachment: HBASE-1017_v2.patch

Same algorithm, plus some code reorganisation and some refactoring of ServerManager.

> Region balancing does not bring newly added node within acceptable range
> ------------------------------------------------------------------------
>
>                 Key: HBASE-1017
>                 URL: https://issues.apache.org/jira/browse/HBASE-1017
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.19.0
>            Reporter: Jonathan Gray
>            Assignee: Evgeny Ryabitskiy
>            Priority: Minor
>             Fix For: 0.20.0
>
>         Attachments: HBASE-1017_v1.patch, HBASE-1017_v2.patch
>
>


[jira] Updated: (HBASE-1017) Region balancing does not bring newly added node within acceptable range

Posted by "Evgeny Ryabitskiy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Evgeny Ryabitskiy updated HBASE-1017:
-------------------------------------

    Attachment: HBASE-1017_v11_FINAL.patch

Without the refactoring of ServerManager.
Final version.
It may not seem like a small change, but I don't see how to make it smaller.
