Posted to dev@hbase.apache.org by Karthik Pattabiraman <pk...@yahoo-inc.com> on 2008/04/22 16:30:42 UTC

HBase Performance question

Hi,

    I am evaluating HBase for a serving system. The requirements are 
fairly simple. Each record comprises a key and a value (size ~4k).

    I set up a small cluster consisting of two boxes and the number of 
records inserted into the table is close to 65K.

    I then ran a Tomcat server on one of the boxes (the one where the 
master is running). Tomcat establishes a connection to HBase at startup 
and then queries HBase for the record on each request.

    The benchmarks were not good. I ran them for 30 minutes with 20 
clients (talking to Tomcat) and the average response time was 51 ms. 
When I increased the number of clients to 50, the average response time 
rose to 110 ms. To make sure Tomcat is not the bottleneck, I logged the 
time taken for each HBase request and found that it tracks the benchmark 
results (the time increased when I increased the number of clients).
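
A rough sanity check on these two data points, added as an illustration. If each benchmark client sends its next request as soon as the previous one returns (an assumption; the post does not say whether the clients pause between requests), Little's law gives throughput as the number of clients divided by the average response time. The sketch below only does that arithmetic; the class and method names are made up for the example.

```java
public class ClosedLoopThroughput {
    // Little's law for a closed loop with no think time (assumed):
    // throughput = concurrent clients / mean response time.
    static double requestsPerSecond(int clients, double avgResponseMs) {
        return clients / (avgResponseMs / 1000.0);
    }

    public static void main(String[] args) {
        // The two measurements reported in the post.
        double at20 = requestsPerSecond(20, 51.0);
        double at50 = requestsPerSecond(50, 110.0);

        System.out.printf("20 clients: about %.0f requests/s%n", at20);
        System.out.printf("50 clients: about %.0f requests/s%n", at50);
        System.out.printf("clients grew %.1fx, throughput grew %.2fx%n",
                50 / 20.0, at50 / at20);
    }
}
```

Under that assumption the two runs work out to roughly 392 and 455 requests per second. Going from 20 to 50 clients added only about 16% more throughput while the response time roughly doubled, which is the pattern expected when requests queue behind a server that is already close to its maximum rate.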

    Any idea as to why this would happen given that the number of 
records is not huge? (FYI: both boxes act as region servers; the box 
where Tomcat runs also runs the HBase master and the DFS namenode.)

    Could it be due to the way I have set up HBase?

    Any help would be appreciated.

thanks

karthik

RE: HBase Performance question

Posted by Jim Kellerman <ji...@powerset.com>.
> -----Original Message-----
> From: Karthik Pattabiraman [mailto:pkarthik@yahoo-inc.com]
> Sent: Tuesday, April 22, 2008 7:31 AM
> To: hbase-dev@hadoop.apache.org
> Subject: HBase Performance question
>
> Hi,
>
>     I am evaluating HBase for a serving system. The
> requirements are fairly simple. Each record comprises a key
> and a value (size ~4k).
>
>     I set up a small cluster consisting of two boxes and the
> number of records inserted into the table is close to 65K.
>
>    I then ran a Tomcat server on one of the boxes (the one where
> the master is running). Tomcat establishes a connection to HBase
> at startup and then queries HBase for the record on each request.
>
>     The benchmarks were not good. I ran them for 30 minutes with
> 20 clients (talking to Tomcat) and the average response time was
> 51 ms. When I increased the number of clients to 50, the average
> response time rose to 110 ms. To make sure Tomcat is not the
> bottleneck, I logged the time taken for each HBase request and
> found that it tracks the benchmark results (the time increased
> when I increased the number of clients).
>
>     Any idea as to why this would happen given that the
> number of records is not huge? (FYI: both boxes act as region
> servers; the box where Tomcat runs also runs the HBase master
> and the DFS namenode.)

If you are doing single-record reads (either sequential or random),
HBase currently does not perform that well. See
http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation (this
page has not been updated recently; as you can see, the numbers
are improving but there is a long way to go). Scanning a row range
is by far the highest-performing operation.

HBase is currently best suited for batch-oriented map/reduce type
operations. There are several installations that use it successfully
for this purpose.

The focus for HBase has been, and will remain until at least release
0.2.0 (due in about one month), reliability and robustness
(i.e., make it work, and then make it work fast).

The release that follows 0.2.0 will focus on performance issues. We
know of two areas where HBase spends most of its time:
- in RPC calls (we have not yet broken this down into marshalling,
  unmarshalling, or the introspection used to make the RPC)
- in the Hadoop FileSystem abstraction (even on local disk, it is
  not as fast as we'd like)

Hope that helps.

>     Could it be due to the way I have set up HBase?
>
>     Any help would be appreciated.
>
> thanks
>
> karthik

Re: HBase Performance question

Posted by Karthik Pattabiraman <pk...@yahoo-inc.com>.
Replies inline.

stack wrote:
> What Jim said and then....
>
> For sure you are not creating a new HTable per request?

No, I am not. The table is created up front and the records are inserted
using a map/reduce job. This is for reads only.

> How many regions in your table?

2.

> Which version of HBase?

hbase-0.1.

> Why do you have Tomcat in the picture?  Because clients are doing HTTP?

Yes, the clients make HTTP requests only. I also tested random reads
through the REST API and got similar results.

> The time for sure is being spent making the request out to HBase
> and not spent transforming HBase results in and out of HTTP?

That is something I have to check again. But given that each record has
just one column and the value size does not exceed 5k, I would not expect
it to be an issue.
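
One way to check this, sketched under assumptions: time the HBase lookup and the HTTP rendering as separate phases inside the request handler and compare the means. PhaseTimer, the phase names, and the stand-in lambdas are hypothetical and written in modern Java; a real handler would wrap its actual HBase call and its response-writing code instead.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

public class PhaseTimer {
    /** Call count and total elapsed nanoseconds for one named phase. */
    private static final class Stats {
        final LongAdder calls = new LongAdder();
        final LongAdder nanos = new LongAdder();
    }

    private final Map<String, Stats> phases = new ConcurrentHashMap<>();

    /** Runs the work and charges its wall-clock time to the named phase. */
    public <T> T time(String phase, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            long elapsed = System.nanoTime() - start;
            Stats s = phases.computeIfAbsent(phase, k -> new Stats());
            s.calls.increment();
            s.nanos.add(elapsed);
        }
    }

    /** Number of timed calls recorded for the phase (0 if never seen). */
    public long calls(String phase) {
        Stats s = phases.get(phase);
        return s == null ? 0 : s.calls.sum();
    }

    /** Mean time per call in milliseconds (0.0 if never seen). */
    public double meanMillis(String phase) {
        Stats s = phases.get(phase);
        long n = (s == null) ? 0 : s.calls.sum();
        return n == 0 ? 0.0 : s.nanos.sum() / 1_000_000.0 / n;
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();
        // Stand-in for the HBase read: returns a 4 KB value.
        byte[] value = timer.time("hbase", () -> new byte[4096]);
        // Stand-in for turning the value into an HTTP response body.
        String body = timer.time("render", () -> "bytes=" + value.length);
        System.out.println(body);
        for (String phase : new String[] {"hbase", "render"}) {
            System.out.printf("%s: %d calls, mean %.3f ms%n",
                    phase, timer.calls(phase), timer.meanMillis(phase));
        }
    }
}
```

The counters use LongAdder so that many request threads can record timings without contending on a single lock. If the "hbase" mean grows with the number of clients while "render" stays flat, the time is going to the HBase request rather than to the HTTP transformation.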

thanks

karthik

Re: HBase Performance question

Posted by stack <st...@duboce.net>.
What Jim said and then....

For sure you are not creating a new HTable per request?  How many 
regions in your table?  Which version of HBase?  Why do you have Tomcat 
in the picture?  Because clients are doing HTTP?   The time for sure is 
being spent making the request out to HBase and not spent transforming 
HBase results in and out of HTTP?
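
To illustrate the first question: a minimal sketch of opening the table handle once at startup and reusing it for every request, rather than constructing one per request. RecordTable is a hypothetical stand-in interface, not the HBase HTable API, and the sketch says nothing about whether a single handle is safe to share across request threads in a given HBase version.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class SharedTableHandle {
    /** Hypothetical stand-in for a table client; not an HBase type. */
    interface RecordTable {
        byte[] get(String key);
    }

    private final RecordTable table;

    // Open the table once (for example from a servlet's init method)
    // and keep the handle for the lifetime of the service.
    SharedTableHandle(Supplier<RecordTable> opener) {
        this.table = opener.get();
    }

    // Each request reuses the existing handle; nothing is opened here.
    byte[] handleRequest(String key) {
        return table.get(key);
    }

    public static void main(String[] args) {
        AtomicInteger opens = new AtomicInteger();
        SharedTableHandle service = new SharedTableHandle(() -> {
            opens.incrementAndGet(); // count how often the table is opened
            return key -> key.getBytes(StandardCharsets.UTF_8);
        });
        for (int i = 0; i < 1000; i++) {
            service.handleRequest("row-" + i);
        }
        System.out.println("table opens: " + opens.get());
    }
}
```

The counter is only there to make the point visible: however many requests are served, the opener runs a single time.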

Thanks,
St.Ack

