Posted to user@hbase.apache.org by David Koch <og...@googlemail.com> on 2012/11/03 16:12:54 UTC

HBase scan performance decreases over time.

Hello,

Every now and then we need to flatten our cluster and re-import all data
from log files (changes in data format, etc.). Afterwards we notice a
significant increase in scan performance. As data is added and shuffled
around between region servers, performance goes down again over time (say,
over a couple of weeks). Are there any routine operations that one should run
manually, or settings to activate in the HBase configuration, to keep the
data well distributed? We use HBase 0.92 as part of a Cloudera CDH4 cluster.

Thank you,

/David

Re: HBase scan performance decreases over time.

Posted by Michael Segel <mi...@hotmail.com>.

hdfs-site.xml

It's an HDFS setting that may affect the balancing of HBase as well.
(I'm sure someone can give a better answer by looking at the code.)


On Nov 5, 2012, at 12:14 PM, Asaf Mesika <as...@gmail.com> wrote:

> Where is this setting located?

Re: HBase scan performance decreases over time.

Posted by Leonid Fedotov <lf...@hortonworks.com>.

There is a property, dfs.balance.bandwidthPerSec, in hdfs-site.xml:


<property>
  <name>dfs.balance.bandwidthPerSec</name>
  <value>6250000</value>
  <description>
    Specifies the maximum amount of bandwidth that each datanode
    can utilize for the balancing purpose in term of
    the number of bytes per second.
  </description>
</property>
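
For reference, the value is in bytes per second, so 6250000 works out to 50 Mbit/s, or 5% of the raw rate of a 1 GbE link. The small Python sketch below shows that conversion. The function name and the choice of expressing the cap as a percentage of link speed are illustrative assumptions, not anything defined by HDFS, and the 5% figure is just the one that reproduces the value above, not a recommendation.

```python
def balancer_bandwidth_bytes(link_mbit_per_s: int, percent: int) -> int:
    """Return `percent` of a link's raw rate, expressed in bytes per second.

    Hypothetical helper for picking a dfs.balance.bandwidthPerSec value.
    Integer arithmetic is used so the result is exact.
    """
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    link_bits_per_s = link_mbit_per_s * 1_000_000
    # 8 bits per byte, and percent is a fraction of 100.
    return link_bits_per_s * percent // (8 * 100)


# 5% of a 1 GbE link (1000 Mbit/s) gives the value shown in the property above.
print(balancer_bandwidth_bytes(1000, 5))    # 6250000
# The full raw rate of 1 GbE, for comparison.
print(balancer_bandwidth_bytes(1000, 100))  # 125000000
```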


Thank you!

Sincerely,
Leonid Fedotov


On Nov 5, 2012, at 10:14 AM, Asaf Mesika wrote:

> Where is this setting located?

Re: HBase scan performance decreases over time.

Posted by Asaf Mesika <as...@gmail.com>.

Where is this setting located?

On Nov 5, 2012, at 15:05, Michael Segel <mi...@hotmail.com> wrote:

> There's an HDFS bandwidth setting which is set to 10 MB/s.
>
> That is far too low even for 1 GbE.
>
> Have you modified this setting yet?
>
> -Mike

Re: HBase scan performance decreases over time.

Posted by Michael Segel <mi...@hotmail.com>.

There's an HDFS bandwidth setting which is set to 10 MB/s.

That is far too low even for 1 GbE.

Have you modified this setting yet? 

-Mike
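
To put the two rates side by side, here is a rough back-of-the-envelope sketch in Python. It takes the ~30 GB daily push mentioned elsewhere in this thread as the amount of data and compares a 10 MB/s cap with the raw rate of a 1 GbE link (125 MB/s). The helper name and the decimal units are assumptions for illustration. The cap applies per datanode and only to balancing traffic, and real moves are spread over several nodes, so treat the output as an order-of-magnitude comparison rather than a prediction.

```python
GB = 1_000_000_000  # decimal gigabyte, in bytes
MB = 1_000_000      # decimal megabyte, in bytes


def minutes_to_move(data_bytes: int, bytes_per_s: int) -> float:
    """Minutes needed to move `data_bytes` at a sustained `bytes_per_s`."""
    return data_bytes / bytes_per_s / 60


daily_push = 30 * GB    # figure taken from the thread
capped_rate = 10 * MB   # the 10 MB/s setting discussed above
gbe_raw_rate = 125 * MB # 1 Gbit/s divided by 8 bits per byte

print(minutes_to_move(daily_push, capped_rate))   # 50.0
print(minutes_to_move(daily_push, gbe_raw_rate))  # 4.0
```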

On Nov 3, 2012, at 2:50 PM, David Koch <og...@googlemail.com> wrote:

> Hello Ted,
> 
> We never initiate major compaction manually. I have not looked at the I/O
> balance between nodes in detail. We have noticed that after running for a
> couple of weeks HBase seems to spend hours pushing blocks between nodes in
> order to optimize things. We add data daily in one ~30 GB push to several
> tables. Sometimes nodes get added to the running system.
> 
> Where can I get more information on how to carry out performance-related
> HBase administrative tasks?
> 
> Thank you,
> 
> /David

Re: HBase scan performance decreases over time.

Posted by Ted Yu <yu...@gmail.com>.

Have you looked at http://hbase.apache.org/book.html#performance ?

Thanks

On Sat, Nov 3, 2012 at 12:50 PM, David Koch <og...@googlemail.com> wrote:

> Hello Ted,
>
> We never initiate major compaction manually. I have not looked at the I/O
> balance between nodes in detail. We have noticed that after running for a
> couple of weeks HBase seems to spend hours pushing blocks between nodes in
> order to optimize things. We add data daily in one ~30 GB push to several
> tables. Sometimes nodes get added to the running system.
>
> Where can I get more information on how to carry out performance-related
> HBase administrative tasks?
>
> Thank you,
>
> /David

Re: HBase scan performance decreases over time.

Posted by David Koch <og...@googlemail.com>.

Hello Ted,

We never initiate major compaction manually. I have not looked at the I/O
balance between nodes in detail. We have noticed that after running for a
couple of weeks HBase seems to spend hours pushing blocks between nodes in
order to optimize things. We add data daily in one ~30 GB push to several
tables. Sometimes nodes get added to the running system.

Where can I get more information on how to carry out performance-related
HBase administrative tasks?

Thank you,

/David

On Sat, Nov 3, 2012 at 4:42 PM, Ted Yu <yu...@gmail.com> wrote:

> Can you tell us how often you run major compaction after the import?
> Have you noticed imbalanced read/write requests in the cluster, that is,
> a subset of region servers receiving the bulk of the writes?
>
> We do some manual movement of regions when the above happens.
>
> Cheers

Re: HBase scan performance decreases over time.

Posted by Ted Yu <yu...@gmail.com>.

Can you tell us how often you run major compaction after the import?
Have you noticed imbalanced read/write requests in the cluster, that is,
a subset of region servers receiving the bulk of the writes?

We do some manual movement of regions when the above happens.

Cheers
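
One simple way to spot the kind of imbalance described above is to compare each region server's request count against the median across the cluster. The Python sketch below does that. The server names, the counts, and the "more than twice the median" threshold are all made up for illustration; in practice the counts would come from whatever per-server request metrics your cluster exposes.

```python
from statistics import median


def hot_servers(requests_by_server: dict[str, int], factor: float = 2.0) -> list[str]:
    """Return the servers whose request count exceeds `factor` times the median."""
    if not requests_by_server:
        return []
    threshold = factor * median(requests_by_server.values())
    return sorted(
        server for server, count in requests_by_server.items() if count > threshold
    )


# Hypothetical request counts per region server (not taken from any real cluster).
sample = {"rs1": 1200, "rs2": 1100, "rs3": 9800, "rs4": 1300}

# The median is 1250, so the threshold is 2500 and only rs3 exceeds it.
print(hot_servers(sample))  # ['rs3']
```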

On Sat, Nov 3, 2012 at 8:12 AM, David Koch <og...@googlemail.com> wrote:

> Hello,
>
> Every now and then we need to flatten our cluster and re-import all data
> from log files (changes in data format, etc.). Afterwards we notice a
> significant increase in scan performance. As data is added and shuffled
> around between region servers, performance goes down again over time (say,
> over a couple of weeks). Are there any routine operations that one should run
> manually, or settings to activate in the HBase configuration, to keep the
> data well distributed? We use HBase 0.92 as part of a Cloudera CDH4 cluster.
>
> Thank you,
>
> /David
>