Posted to user@hbase.apache.org by Daniel Ploeg <dp...@gmail.com> on 2008/10/15 23:48:11 UTC

HBase and hadoop cluster rebalance

Hi all,

I performed a cluster rebalance on my test cluster yesterday (5 regionserver
/ datanode machines, each with approx 400GB - total approx 2TB HDFS) and I
would like to know if others on the mailing list have seen similar results
to what I've seen.

I had a single table with a single column family and loaded it up so that it
just about filled the entire cluster. Actually, one or two of the nodes had
run out of space, yet the fifth machine only had 50% of its disks utilised
(which is why I thought a rebalance was in order). There are a total of 1475
regions in the cluster. Prior to starting the rebalance the cluster only had
about 250GB left at its disposal. After the rebalance I now have almost
800GB free.

Furthermore, I was performing read tests prior to the rebalance and getting
a response time of approx 500ms per row (each row has 10000 column instances
of the column family which were deserialised as part of the test). After the
rebalance my read times reduced to around 340ms.
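For what it's worth, the two latencies quoted above work out to roughly a
32% reduction; a quick check of the arithmetic:

```python
# Per-row read latency from the post, before and after the HDFS rebalance.
before_ms = 500
after_ms = 340

# Relative reduction in latency.
improvement = (before_ms - after_ms) / before_ms
print(f"latency reduced by {improvement:.0%}")  # -> latency reduced by 32%

# The same change expressed as a speedup factor.
speedup = before_ms / after_ms
print(f"speedup: {speedup:.2f}x")  # -> speedup: 1.47x
```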

Has anybody experienced something like this, or can anyone explain why I
would see such a benefit? Does anybody regularly run a cluster rebalance on
the hadoop cluster running hbase?
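For anyone wanting to try the same thing, the rebalance described above is
done with the HDFS balancer tool; a minimal sketch, assuming Hadoop 0.16 or
later with a standard install layout and HADOOP_HOME set:

```shell
# Run the HDFS balancer. -threshold 10 keeps moving blocks until every
# datanode's disk utilisation is within 10 percentage points of the
# cluster-wide average.
${HADOOP_HOME}/bin/hadoop balancer -threshold 10
```

The balancer can be stopped at any time; blocks already moved stay where
they are, so it is safe to run it in the background while the cluster is
serving requests.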

Thanks,
Daniel

Re: HBase and hadoop cluster rebalance

Posted by Slava Gorelik <sl...@gmail.com>.
No :-)
My question is:
I defined a Hadoop cluster with 7 datanodes and one namenode. The cluster
capacity (from the Hadoop web admin page) is about 700GB. From this I
understand that the default usage for datanode disk space is 100GB per
datanode. Please correct me if I am wrong.

Best Regards.
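One way to see where that 700GB figure comes from is to dump the
per-datanode numbers directly; a sketch, assuming the hadoop script is on
the PATH (the exact output format varies by Hadoop version):

```shell
# Print configured capacity, DFS used and remaining space, both for the
# cluster as a whole and broken down per datanode.
hadoop dfsadmin -report
```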

On Fri, Oct 17, 2008 at 1:03 AM, stack <st...@duboce.net> wrote:

> Are you asking about the below Slava?
>
> <property>
>  <name>dfs.block.size</name>
>  <value>67108864</value>
>  <description>The default block size for new files.</description>
> </property>
>
> I do not know of a 100GB configuration in hadoop/hbase.
>
> If that is what you mean: when configuring for hbase, you need to add
> the configuration to hbase-site.xml, or add a hadoop-site.xml with the
> appropriate setting under your hbase conf directory.  See
> http://wiki.apache.org/hadoop/Hbase/FAQ#12 for some discussion.
>
> St.Ack
>
>
>
> Slava Gorelik wrote:
>
>> Hi. Small question, a little bit off topic.
>> How can I change the default 100GB datanode size to be something else?
>>
>> Best Regards.
>>
>> On Thu, Oct 16, 2008 at 10:41 PM, stack <st...@duboce.net> wrote:
>>
>>> Daniel Ploeg wrote:
>>>
>>>> Hi all,
>>>>
>>>> I performed a cluster rebalance on my test cluster yesterday
>>>> (5 regionserver / datanodes each with approx 400GB - total approx
>>>> 2TB HDFS) and I would like to know if others on the mailing list
>>>> have seen similar results to what I've seen.
>>>
>>> I talked to the lads running hbase here at powerset.  They believe
>>> they have seen something similar when they grow the cluster by some
>>> significant percentage (20-30%).  The addition of new machines
>>> brings on a rebalancing and thereafter hbase runs "faster".
>>>
>>>> I had a single table with a single column family and loaded it up
>>>> so that it just about filled the entire cluster. Actually one or
>>>> two of the nodes had run out of space, yet the fifth machine only
>>>> had 50% of its disks utilised (which is why I thought a rebalance
>>>> was in order). There are a total of 1475 regions in the cluster.
>>>> Prior to starting the rebalance the cluster only had about 250GB
>>>> left at its disposal. After the rebalance I now have almost 800GB
>>>> free.
>>>
>>> If 1475 regions, update to 0.18.1 (coming soon).
>>>
>>>> Furthermore, I was performing read tests prior to the rebalance and
>>>> getting a response time of approx 500ms per row (each row has 10000
>>>> column instances of the column family which were deserialised as
>>>> part of the test). After the rebalance my read times reduced to
>>>> around 340ms.
>>>
>>> If you could have fewer columns in a column family, you'll get a bit
>>> better performance: HBASE-867.
>>>
>>> Good on you Daniel,
>>> St.Ack

Re: HBase and hadoop cluster rebalance

Posted by stack <st...@duboce.net>.
Are you asking about the below Slava?

<property>
  <name>dfs.block.size</name>
  <value>67108864</value>
  <description>The default block size for new files.</description>
</property>

I do not know of a 100GB configuration in hadoop/hbase.

If that is what you mean: when configuring for hbase, you need to add
the configuration to hbase-site.xml, or add a hadoop-site.xml with the
appropriate setting under your hbase conf directory.  See
http://wiki.apache.org/hadoop/Hbase/FAQ#12 for some discussion.
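As a sketch of the second option, a hadoop-site.xml dropped under the hbase
conf directory might look like the following. The 134217728 value (128 *
1024 * 1024 bytes, i.e. 128MB) is purely illustrative; the shipped default
shown above is 67108864 (64MB):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Override the default HDFS block size, in bytes, for files
       written by hbase: 134217728 = 128 * 1024 * 1024 = 128MB. -->
  <property>
    <name>dfs.block.size</name>
    <value>134217728</value>
  </property>
</configuration>
```

Note this only affects newly written files; existing files keep the block
size they were written with.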

St.Ack


Slava Gorelik wrote:
> Hi. Small question, a little bit off topic.
> How can I change the default 100GB datanode size to be something else?
>
> Best Regards.
>
> On Thu, Oct 16, 2008 at 10:41 PM, stack <st...@duboce.net> wrote:
>
>> Daniel Ploeg wrote:
>>
>>> Hi all,
>>>
>>> I performed a cluster rebalance on my test cluster yesterday
>>> (5 regionserver / datanodes each with approx 400GB - total approx
>>> 2TB HDFS) and I would like to know if others on the mailing list
>>> have seen similar results to what I've seen.
>>
>> I talked to the lads running hbase here at powerset.  They believe
>> they have seen something similar when they grow the cluster by some
>> significant percentage (20-30%).  The addition of new machines brings
>> on a rebalancing and thereafter hbase runs "faster".
>>
>>> I had a single table with a single column family and loaded it up so
>>> that it just about filled the entire cluster. Actually one or two of
>>> the nodes had run out of space, yet the fifth machine only had 50%
>>> of its disks utilised (which is why I thought a rebalance was in
>>> order). There are a total of 1475 regions in the cluster. Prior to
>>> starting the rebalance the cluster only had about 250GB left at its
>>> disposal. After the rebalance I now have almost 800GB free.
>>
>> If 1475 regions, update to 0.18.1 (coming soon).
>>
>>> Furthermore, I was performing read tests prior to the rebalance and
>>> getting a response time of approx 500ms per row (each row has 10000
>>> column instances of the column family which were deserialised as
>>> part of the test). After the rebalance my read times reduced to
>>> around 340ms.
>>
>> If you could have fewer columns in a column family, you'll get a bit
>> better performance: HBASE-867.
>>
>> Good on you Daniel,
>> St.Ack


Re: HBase and hadoop cluster rebalance

Posted by Slava Gorelik <sl...@gmail.com>.
Hi. Small question, a little bit off topic.
How can I change the default 100GB datanode size to be something else?

Best Regards.

On Thu, Oct 16, 2008 at 10:41 PM, stack <st...@duboce.net> wrote:

> Daniel Ploeg wrote:
>
>> Hi all,
>>
>> I performed a cluster rebalance on my test cluster yesterday
>> (5 regionserver / datanodes each with approx 400GB - total approx 2TB
>> HDFS) and I would like to know if others on the mailing list have
>> seen similar results to what I've seen.
>
> I talked to the lads running hbase here at powerset.  They believe they
> have seen something similar when they grow the cluster by some
> significant percentage (20-30%).  The addition of new machines brings
> on a rebalancing and thereafter hbase runs "faster".
>
>> I had a single table with a single column family and loaded it up so
>> that it just about filled the entire cluster. Actually one or two of
>> the nodes had run out of space, yet the fifth machine only had 50% of
>> its disks utilised (which is why I thought a rebalance was in order).
>> There are a total of 1475 regions in the cluster. Prior to starting
>> the rebalance the cluster only had about 250GB left at its disposal.
>> After the rebalance I now have almost 800GB free.
>
> If 1475 regions, update to 0.18.1 (coming soon).
>
>> Furthermore, I was performing read tests prior to the rebalance and
>> getting a response time of approx 500ms per row (each row has 10000
>> column instances of the column family which were deserialised as part
>> of the test). After the rebalance my read times reduced to around
>> 340ms.
>
> If you could have fewer columns in a column family, you'll get a bit
> better performance: HBASE-867.
>
> Good on you Daniel,
> St.Ack

Re: HBase and hadoop cluster rebalance

Posted by stack <st...@duboce.net>.
Daniel Ploeg wrote:
> Hi all,
>
> I performed a cluster rebalance on my test cluster yesterday (5 regionserver
> / datanodes each with approx 400GB - total approx 2TB HDFS) and I would like
> to know if the mailing lists have seen similar results to what I've seen.
>   

I talked to the lads running hbase here at powerset.  They believe they 
have seen something similar when they grow the cluster by some 
significant percentage (20-30%).  The addition of new machines brings on 
a rebalancing and thereafter hbase runs "faster".

> I had a single table with a single column family and loaded it up so that it
> just about filled the entire cluster. Actually one or two of the nodes had
> run out of space, yet the fifth machine only had 50% of its disks utilised
> (which is why I thought a rebalance was in order). There are a total of 1475
> regions in the cluster. Prior to starting the rebalance the cluster only had
> about 250GB left at its disposal. After the rebalance I now have almost
> 800GB free.
>   

If 1475 regions, update to 0.18.1 (coming soon).

> Furthermore, I was performing read tests prior to the rebalance and getting
> a response time of approx 500ms per row (each row has 10000 column instances
> of the column family which were deserialised as part of the test). After the
> rebalance my read times reduced to around 340ms.
>
>   
If you could have fewer columns in a column family, you'll get a bit 
better performance: HBASE-867.

Good on you Daniel,
St.Ack