Posted to user@hbase.apache.org by Jonathan Bishop <jb...@gmail.com> on 2012/09/26 08:04:29 UTC

Tuning HBase for random reads

Hi,

I am running hbase-0.92.1 and have set up a cluster of 10 machines. Scan
performance seems great, 30K-100K rows per second, but random row reads
run at only about 100 rows/second.

My rows are not very big, just a few columns of 4-100 bytes each, but my
table has around 18M rows.

I am pre-splitting my table and using hashing to randomize the row keys, so
I see a nice even load on the region servers.

Any suggestions on things I should try?

Thanks,

Jon

Re: Tuning HBase for random reads

Posted by Stack <st...@duboce.net>.
On Wed, Sep 26, 2012 at 9:05 AM, Jonathan Bishop <jb...@gmail.com> wrote:
> I am using block size in HDFS of 64MB - the default I believe. I'll try
> something smaller, say 16MB or even 4MB.
>
> I'll also give bloom filters a try, but I don't believe that will help
> because I have so few columns. Isn't bloom filtering for quick reject for
> large number of columns in a row?
>
> Thanks for the suggestions everyone.
>

You've had a look at this section of the refguide:
http://hbase.apache.org/book.html#performance ?
St.Ack

Re: Tuning HBase for random reads

Posted by Kevin O'dell <ke...@cloudera.com>.
Jonathan,

 hbase(main):002:0> describe 'states'
DESCRIPTION                                                              ENABLED
 {NAME => 'states', FAMILIES => [{NAME => 'cf', BLOOMFILTER => 'NONE',   true
 REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
 MIN_VERSIONS => '0', TTL => '2147483647', BLOCKSIZE => '65536',
 IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}

BLOCKSIZE => '65536' <--- This is the block size I am referring to.

I would recommend checking page 330 of HBase: The Definitive Guide for
further tuning of that value.
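For reference, a sketch of how that value can be changed from the HBase shell (the 16KB figure is only an illustration, and on 0.92 the table has to be disabled before it can be altered):

```
hbase> disable 'states'
hbase> alter 'states', {NAME => 'cf', BLOCKSIZE => '16384'}
hbase> enable 'states'
```

Keep in mind the trade-off: smaller blocks mean faster point lookups but a larger block index held in memory, so don't go too small.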

On Wed, Sep 26, 2012 at 3:28 PM, Stack <st...@duboce.net> wrote:

> On Wed, Sep 26, 2012 at 12:01 PM, Jonathan Bishop <jb...@gmail.com>
> wrote:
> > Kevin,
> >
> > So, setting HBase block size is which configuration?
> >
> > Just tried the hadoop shortcircuit option and I see it does improve the
> > performance, perhaps twice as fast, although it is hard to tell whether
> > this was due to some other load on the network/machines changing.
> >
>
> You don't have ganglia or opentsdb or SPM setup on your cluster?  If
> you do, does a study of these cluster graphs not enlighten?
> St.Ack
>



-- 
Kevin O'Dell
Customer Operations Engineer, Cloudera

Re: Tuning HBase for random reads

Posted by Stack <st...@duboce.net>.
On Wed, Sep 26, 2012 at 12:01 PM, Jonathan Bishop <jb...@gmail.com> wrote:
> Kevin,
>
> So, setting HBase block size is which configuration?
>
> Just tried the hadoop shortcircuit option and I see it does improve the
> performance, perhaps twice as fast, although it is hard to tell whether
> this was due to some other load on the network/machines changing.
>

You don't have ganglia or opentsdb or SPM setup on your cluster?  If
you do, does a study of these cluster graphs not enlighten?
St.Ack

Re: Tuning HBase for random reads

Posted by Jonathan Bishop <jb...@gmail.com>.
Kevin,

So, setting HBase block size is which configuration?

Just tried the Hadoop shortcircuit option, and it does seem to improve
performance, perhaps twice as fast, although it is hard to tell whether
the gain came from other load on the network/machines changing.

Jon

Re: Tuning HBase for random reads

Posted by Kevin O'dell <ke...@cloudera.com>.
Jon,

  I am referring to your HBase block size, NOT your HDFS block size.  You
will want to leave the HDFS block size at 64MB, or maybe even use 128MB if
this is an all-HBase cluster.  Can you do a describe on the table in
question and post it here?

On Wed, Sep 26, 2012 at 12:05 PM, Jonathan Bishop <jb...@gmail.com> wrote:

> I am using block size in HDFS of 64MB - the default I believe. I'll try
> something smaller, say 16MB or even 4MB.
>
> I'll also give bloom filters a try, but I don't believe that will help
> because I have so few columns. Isn't bloom filtering for quick reject for
> large number of columns in a row?
>
> Thanks for the suggestions everyone.
>
> Jon
>
> On Wed, Sep 26, 2012 at 6:06 AM, Kevin O'dell <kevin.odell@cloudera.com
> >wrote:
>
> > What is your block size you are using?  Typically a smaller block size
> can
> > help with random reads, but will have a longer create time.\
> > -Kevin
> >
> > On Wed, Sep 26, 2012 at 2:18 AM, Anoop Sam John <an...@huawei.com>
> > wrote:
> >
> > > Can you try with bloom filters? This can help in get()
> > > -Anoop-
> > > ________________________________________
> > > From: Jonathan Bishop [jbishop.rwc@gmail.com]
> > > Sent: Wednesday, September 26, 2012 11:34 AM
> > > To: user@hbase.apache.org
> > > Subject: Tuning HBase for random reads
> > >
> > > Hi,
> > >
> > > I am running hbase-0.92.1 and have set up a cluster of 10 machines.
> Scans
> > > performance seems great, 30K-100K rows per second, but random row reads
> > are
> > > only about 100 rows/second.
> > >
> > > My rows are not very big, just a few columns with between 4-100 bytes,
> > but
> > > my table is around 18M rows.
> > >
> > > I am pre-splitting my table and using hashing to randomize the row
> keys,
> > so
> > > I see a nice even load on the region servers.
> > >
> > > Any suggestion on things I should try?
> > >
> > > Thanks,
> > >
> > > Jon
> > >
> >
> >
> >
> > --
> > Kevin O'Dell
> > Customer Operations Engineer, Cloudera
> >
>



-- 
Kevin O'Dell
Customer Operations Engineer, Cloudera

Re: Tuning HBase for random reads

Posted by Paul Mackles <pm...@adobe.com>.
Though I haven't personally tried it yet, I have been told that enabling
the shortcircuit for local client reads is very effective at speeding up
random reads in HBase. More here:

https://issues.apache.org/jira/browse/HDFS-2246

We are using the Cloudera package, which includes this patch in cdh3u3
and later.

Paul
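For anyone trying it, the settings from that patch era look roughly like the following in hdfs-site.xml (property names are taken from HDFS-2246; double-check them against your CDH release notes):

```xml
<!-- hdfs-site.xml on the datanodes and on the HBase region servers -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <!-- the OS user allowed to open block files directly,
       i.e. the user the region servers run as -->
  <name>dfs.block.local-path-access.user</name>
  <value>hbase</value>
</property>
```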

On 9/26/12 12:05 PM, "Jonathan Bishop" <jb...@gmail.com> wrote:

>I am using block size in HDFS of 64MB - the default I believe. I'll try
>something smaller, say 16MB or even 4MB.
>
>I'll also give bloom filters a try, but I don't believe that will help
>because I have so few columns. Isn't bloom filtering for quick reject for
>large number of columns in a row?
>
>Thanks for the suggestions everyone.
>
>Jon
>
>On Wed, Sep 26, 2012 at 6:06 AM, Kevin O'dell
><ke...@cloudera.com>wrote:
>
>> What is your block size you are using?  Typically a smaller block size
>>can
>> help with random reads, but will have a longer create time.\
>> -Kevin
>>
>> On Wed, Sep 26, 2012 at 2:18 AM, Anoop Sam John <an...@huawei.com>
>> wrote:
>>
>> > Can you try with bloom filters? This can help in get()
>> > -Anoop-
>> > ________________________________________
>> > From: Jonathan Bishop [jbishop.rwc@gmail.com]
>> > Sent: Wednesday, September 26, 2012 11:34 AM
>> > To: user@hbase.apache.org
>> > Subject: Tuning HBase for random reads
>> >
>> > Hi,
>> >
>> > I am running hbase-0.92.1 and have set up a cluster of 10 machines.
>>Scans
>> > performance seems great, 30K-100K rows per second, but random row
>>reads
>> are
>> > only about 100 rows/second.
>> >
>> > My rows are not very big, just a few columns with between 4-100 bytes,
>> but
>> > my table is around 18M rows.
>> >
>> > I am pre-splitting my table and using hashing to randomize the row
>>keys,
>> so
>> > I see a nice even load on the region servers.
>> >
>> > Any suggestion on things I should try?
>> >
>> > Thanks,
>> >
>> > Jon
>> >
>>
>>
>>
>> --
>> Kevin O'Dell
>> Customer Operations Engineer, Cloudera
>>


Re: Tuning HBase for random reads

Posted by Jonathan Bishop <jb...@gmail.com>.
I am using the HDFS default block size of 64MB, I believe. I'll try
something smaller, say 16MB or even 4MB.

I'll also give bloom filters a try, but I don't believe they will help
because I have so few columns. Isn't bloom filtering meant for quickly
rejecting lookups in rows with a large number of columns?

Thanks for the suggestions everyone.

Jon

On Wed, Sep 26, 2012 at 6:06 AM, Kevin O'dell <ke...@cloudera.com> wrote:

> What is your block size you are using?  Typically a smaller block size can
> help with random reads, but will have a longer create time.\
> -Kevin
>
> On Wed, Sep 26, 2012 at 2:18 AM, Anoop Sam John <an...@huawei.com>
> wrote:
>
> > Can you try with bloom filters? This can help in get()
> > -Anoop-
> > ________________________________________
> > From: Jonathan Bishop [jbishop.rwc@gmail.com]
> > Sent: Wednesday, September 26, 2012 11:34 AM
> > To: user@hbase.apache.org
> > Subject: Tuning HBase for random reads
> >
> > Hi,
> >
> > I am running hbase-0.92.1 and have set up a cluster of 10 machines. Scans
> > performance seems great, 30K-100K rows per second, but random row reads
> are
> > only about 100 rows/second.
> >
> > My rows are not very big, just a few columns with between 4-100 bytes,
> but
> > my table is around 18M rows.
> >
> > I am pre-splitting my table and using hashing to randomize the row keys,
> so
> > I see a nice even load on the region servers.
> >
> > Any suggestion on things I should try?
> >
> > Thanks,
> >
> > Jon
> >
>
>
>
> --
> Kevin O'Dell
> Customer Operations Engineer, Cloudera
>

Re: Tuning HBase for random reads

Posted by Kevin O'dell <ke...@cloudera.com>.
What block size are you using?  Typically a smaller block size can help
with random reads, but will have a longer create time.
-Kevin

On Wed, Sep 26, 2012 at 2:18 AM, Anoop Sam John <an...@huawei.com> wrote:

> Can you try with bloom filters? This can help in get()
> -Anoop-
> ________________________________________
> From: Jonathan Bishop [jbishop.rwc@gmail.com]
> Sent: Wednesday, September 26, 2012 11:34 AM
> To: user@hbase.apache.org
> Subject: Tuning HBase for random reads
>
> Hi,
>
> I am running hbase-0.92.1 and have set up a cluster of 10 machines. Scans
> performance seems great, 30K-100K rows per second, but random row reads are
> only about 100 rows/second.
>
> My rows are not very big, just a few columns with between 4-100 bytes, but
> my table is around 18M rows.
>
> I am pre-splitting my table and using hashing to randomize the row keys, so
> I see a nice even load on the region servers.
>
> Any suggestion on things I should try?
>
> Thanks,
>
> Jon
>



-- 
Kevin O'Dell
Customer Operations Engineer, Cloudera

RE: Tuning HBase for random reads

Posted by Anoop Sam John <an...@huawei.com>.
Can you try bloom filters? They can help with get().
-Anoop-
________________________________________
From: Jonathan Bishop [jbishop.rwc@gmail.com]
Sent: Wednesday, September 26, 2012 11:34 AM
To: user@hbase.apache.org
Subject: Tuning HBase for random reads

Hi,

I am running hbase-0.92.1 and have set up a cluster of 10 machines. Scans
performance seems great, 30K-100K rows per second, but random row reads are
only about 100 rows/second.

My rows are not very big, just a few columns with between 4-100 bytes, but
my table is around 18M rows.

I am pre-splitting my table and using hashing to randomize the row keys, so
I see a nice even load on the region servers.

Any suggestion on things I should try?

Thanks,

Jon
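The quick-reject idea behind bloom filters can be sketched in a few lines of Python. This is a toy model, not HBase's HFile bloom implementation; note that HBase's ROW blooms key on row keys, so they can help gets even when rows have few columns:

```python
import hashlib

class BloomFilter:
    """Toy bloom filter illustrating the quick-reject idea only."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, key):
        # Derive num_hashes independent bit positions from the key.
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = True

    def might_contain(self, key):
        # False => key is definitely absent; True => key *may* be present.
        return all(self.bits[p] for p in self._positions(key))

bf = BloomFilter()
for row in ("row-0001", "row-0002", "row-0003"):
    bf.add(row)

print(bf.might_contain("row-0002"))   # True: keys that were added always pass
print(bf.might_contain("row-9999"))   # almost certainly False (quick reject)
```

The point for random reads: a ROW bloom lets a region server skip whole store files whose filter answers "definitely absent", instead of seeking into every file that might hold the key.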