Posted to user@hbase.apache.org by David Koch <og...@googlemail.com> on 2012/09/04 16:56:01 UTC

Fixing badly distributed table manually.

Hello,

A couple of questions regarding balancing of a table's data in HBase.

a) What is the easiest way to get an overview of how a table is distributed
across the regions of a cluster? I guess I could search .META. but I haven't
figured out how to use filters from the shell.
b) What constitutes a "badly distributed" table and how can I re-balance
manually?
c) Is b) needed at all? I know that HBase does its balancing automatically
behind the scenes.

As for a) I tried running this script:

https://github.com/Mendeley/hbase-scripts/blob/master/list_regions.rb

like so:

hbase org.jruby.Main ./list_regions.rb <_my_table>

but I get

ArgumentError: wrong number of arguments (1 for 2)
  (root) at ./list_regions.rb:60

If someone more proficient notices an obvious fix, I'd be glad to hear
about it.

Why do I ask? I have the impression that one of the tables on our HBase
cluster is not well distributed. When running a MapReduce job on this
table, the load average on a single node is very high, whereas all other
nodes are almost idle. It is the only table where this behavior is
observed. Other MapReduce jobs result in slightly elevated load averages
on several machines.

Thank you,

/David

RE: Fixing badly distributed table manually.

Posted by Pablo Musa <pa...@psafe.com>.
> a) What is the easiest way to get an overview of how a table is distributed across the regions of a cluster?

I usually check via the web interface (host:60010).
Click on a table and scroll down; you will see the region count for this table across the cluster.
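
You can also get this from the shell without any filter: scan .META. with the table name as the start row (a sketch, with 'my_table' standing in for your table; ',' sorts just before '-', so the stop row brackets exactly one table's rows):

  hbase> scan '.META.', {STARTROW => 'my_table,', STOPROW => 'my_table-', COLUMNS => 'info:server'}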

> b) What constitutes a "badly distributed" table and how can I re-balance manually?

I think the answer to this question is manual splits. There is a chapter in the book about it.
I am looking forward to an answer from the experienced guys ;)
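
For moving regions by hand, the shell also has move and balancer commands (names as of 0.92; see help 'move'), e.g.:

  hbase> move 'ENCODED_REGION_NAME'                              # master picks a random target RS
  hbase> move 'ENCODED_REGION_NAME', 'host,60020,1331234567890'  # or name the target explicitly
  hbase> balancer                                                # trigger a balancing run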

> c) Is b) needed at all? I know that HBase does its balancing automatically behind the scenes.

From my experience, yes. HBase does not balance as much as I need. In the worst case I have
a difference of 16 regions (32 against 48) on a 10-machine cluster.

Hoping for a great answer so I don't have to do manual splits ;)

Regards,
Pablo


Re: Fixing badly distributed table manually.

Posted by Ted Yu <yu...@gmail.com>.
Can you tell us the version of HBase you're using?

The following feature (per-table region balancing) isn't in 0.92.x:
https://issues.apache.org/jira/browse/HBASE-3373

On table.jsp page, you should see region count per region server.
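
If you would rather script it, something along these lines should print each region of a table with its hosting server (an untested sketch against the 0.92 client API; on older clients getRegionsInfo() is the rough equivalent):

  include Java
  import org.apache.hadoop.hbase.HBaseConfiguration
  import org.apache.hadoop.hbase.client.HTable

  # Usage: hbase org.jruby.Main list_regions.rb <table>
  table = HTable.new(HBaseConfiguration.create, ARGV[0])

  # getRegionLocations returns a NavigableMap<HRegionInfo, ServerName>
  table.getRegionLocations.each do |region, server|
    puts "#{server.getHostname}\t#{region.getRegionNameAsString}"
  end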

Cheers


Re: Fixing badly distributed table manually.

Posted by Vincent Barat <vi...@gmail.com>.
Hi,

Balancing regions between RSs is handled correctly by HBase: I mean
that your RSs always manage the same number of regions (the balancer
takes care of it).

Unfortunately, balancing all the regions of one particular table
across the RSs of your cluster is not always easy, since HBase (as
of 0.90.3), when it splits a region, always creates the new one
on the same RS. This means that if you start with a one-region
table and then insert lots of data into it, new regions
will always be created on the same RS (and if your insert is a M/R job,
you saturate this RS). Eventually the balancer will
decide to move some of these regions to other RSs, limiting the
issue, but this is not controllable.

Here at Capptain, we solved this problem by developing a dedicated
Python script, based on the HBase shell, that entirely
balances the regions of all tables across all RSs. It ensures that
each table's regions are uniformly deployed on all RSs of the cluster,
with a minimum number of region transitions.

It is fast, and even though it can trigger a lot of region transitions,
there is very little impact at runtime and it can be run safely.

If you are interested, just let me know, I can share it.

Regards,


Re: Fixing badly distributed table manually.

Posted by Vincent Barat <vi...@gmail.com>.
Hi,

Sorry for not responding: I'm not on the list very often.

It seems to be of interest to some of you, so we will publish the
script on GitHub, so that everybody can test and improve it.
More info later...
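
Until then, the high-level logic is essentially a round-robin reassignment. In JRuby terms (this is not the actual Python script, and the method names are from the 0.92 client API, so treat it as an untested sketch) it boils down to:

  include Java
  import org.apache.hadoop.hbase.HBaseConfiguration
  import org.apache.hadoop.hbase.client.HBaseAdmin
  import org.apache.hadoop.hbase.client.HTable
  import org.apache.hadoop.hbase.util.Bytes

  conf  = HBaseConfiguration.create
  admin = HBaseAdmin.new(conf)
  table = HTable.new(conf, 'my_table')   # 'my_table' is a placeholder

  servers = admin.getClusterStatus.getServers.to_a  # live region servers (ServerName)
  regions = table.getRegionLocations.keySet.to_a    # the table's HRegionInfos

  # Deal the table's regions out round-robin: at most one move per region.
  regions.each_with_index do |region, i|
    target = servers[i % servers.size]
    admin.move(Bytes.toBytes(region.getEncodedName),
               Bytes.toBytes(target.getServerName))
  end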

Regards,


Re: Fixing badly distributed table manually.

Posted by anil gupta <an...@gmail.com>.
Hi Vincent,

I don't know Python but I am interested in learning about your solution. It
would be great if you could also share the logic for balancing the cluster.

Thanks,
Anil Gupta


Re: Fixing badly distributed table manually.

Posted by Mohit Anchlia <mo...@gmail.com>.
On Mon, Dec 24, 2012 at 8:27 AM, Ivan Balashov <ib...@gmail.com> wrote:

> Vincent Barat <vb...@...> writes:
>
> > Here at Capptain, we solved this problem by developing a dedicated
> > Python script, based on the HBase shell, that entirely
> > balances the regions of all tables across all RSs.
> >
Is it possible to describe, at a high level, the logic of what you did?


Re: Fixing badly distributed table manually.

Posted by Ivan Balashov <ib...@gmail.com>.
Vincent Barat <vb...@...> writes:

> Here at Capptain, we solved this problem by developing a dedicated
> Python script, based on the HBase shell, that entirely
> balances the regions of all tables across all RSs. It ensures that
> each table's regions are uniformly deployed on all RSs of the cluster,
> with a minimum number of region transitions.
>
> If you are interested, just let me know, I can share it.
>

Vincent,

I would very much like to see, and possibly use, the script that you
mentioned. We've just run into the same issue (after the table
was truncated it was re-created with only 1 region, and
after data loading and manual splits we ended up with all
regions on the same RS).

If you could share the script, it would be really appreciated,
and I believe not only by me.

Thanks,
Ivan

Re: Fixing badly distributed table manually.

Posted by David Koch <og...@googlemail.com>.
Hello,

I also found this fairly recent script, which can be used with Gnuplot
to get a visual representation of the data distribution across nodes:

http://bobcopeland.com/blog/2012/04/graphing-hbase-splits/

Again, my JRuby skills are non-existent, so just blindly running the script
against HBase 0.92 results in:

NoMethodError: private method `load' called for
#<Java::OrgApacheHadoopHbase::ServerName:0x4f5264db>
    main at region_hist.rb:23
    call at org/jruby/RubyProc.java:270
    call at org/jruby/RubyProc.java:220
    each at
file:/usr/lib/hbase/lib/jruby-complete-1.6.5.jar!/builtin/java/java.util.rb:7
    main at region_hist.rb:19
  (root) at region_hist.rb:37

Maybe it has to do with the author's remark:

[Note: I've been advised (thanks ntelford!) that HServerInfo is gone in
newer releases and you now need to get HServerLoad via
ClusterStatus.getLoad(server_name).]

What are the changes that need to be made to the script to get it to run on
HBase 0.92?
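
My guess, untested, is that the server loop has to go through ClusterStatus instead of HServerInfo, something like:

  include Java
  import org.apache.hadoop.hbase.HBaseConfiguration
  import org.apache.hadoop.hbase.client.HBaseAdmin

  admin  = HBaseAdmin.new(HBaseConfiguration.create)
  status = admin.getClusterStatus

  status.getServers.each do |server_name|        # Collection<ServerName>
    server_load = status.getLoad(server_name)    # HServerLoad replaces HServerInfo
    server_load.getRegionsLoad.each do |name, region_load|
      puts "#{server_name.getHostname} #{region_load.getNameAsString}"
    end
  end

Can anyone confirm?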

Thank you,

/David

Re: Fixing badly distributed table manually.

Posted by David Koch <og...@googlemail.com>.
Hello,

Thank you for your replies. We are using CDH4 HBase 0.92. Good call on the
web interface; the port is blocked, so I never really got a chance to test
it. As far as manual re-balancing is concerned, I will check the book.

/David

Re: Fixing badly distributed table manually.

Posted by Guillaume Gardey <gu...@mendeley.com>.
Hello,

> a) What is the easiest way to get an overview of how a table is distributed
> across the regions of a cluster? I guess I could search .META. but I haven't
> figured out how to use filters from the shell.
> b) What constitutes a "badly distributed" table and how can I re-balance
> manually?
> c) Is b) needed at all? I know that HBase does its balancing automatically
> behind the scenes.

I have found that http://bobcopeland.com/blog/2012/04/graphing-hbase-splits/ is a good source of information and tools for looking at region balancing in the cluster and investigating it.

> As for a) I tried running this script:
> 
> https://github.com/Mendeley/hbase-scripts/blob/master/list_regions.rb
> 
> like so:
> 
> hbase org.jruby.Main ./list_regions.rb <_my_table>
> 
> but I get
> 
> ArgumentError: wrong number of arguments (1 for 2)
>  (root) at ./list_regions.rb:60
> 
> If someone more proficient notices an obvious fix, I'd be glad to hear
> about it.

Concerning https://github.com/Mendeley/hbase-scripts , I am afraid this is a repository that is no longer maintained and was written for old releases of HBase (CDH2, I believe). There's no plan to upgrade it to newer releases.

Cheers
---
Guillaume