Posted to user@hbase.apache.org by Harold Lim <ro...@yahoo.com> on 2011/05/29 22:23:50 UTC

How to improve random read latency?

Are there any configurations I need to set to improve read latency? I'm running HBase on 10 EC2 m1.large instances (7.5GB RAM each).

Also, as the size of the data gets bigger, is it normal to see higher read latency?

I'm testing out the YCSB benchmark workload.

With a data size of ~40-50GB (~200+ regions), I get around 10-20ms read latency and can push the throughput of a read-only workload to 3000+ operations per second.

However, with a data size of 200GB (~1k+ regions), the lowest latency I can get is 30+ms (at 100 operations per second), and I can't push the throughput beyond 400+ operations per second (at 110+ms latency).

I tried increasing hbase.hregion.max.filesize to 2GB to reduce the number of regions, but that seemed to make things worse.


I also tried increasing the heap size to 4GB, setting hbase.regionserver.handler.count = 100, and setting vm.swappiness = 0. However, none of these improved performance.
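
A minimal sanity-check sketch, assuming the stock Java client is on the classpath: dump what the client-side Configuration actually picks up from hbase-site.xml for the settings above. The heap size lives in hbase-env.sh and vm.swappiness is an OS sysctl, so neither will show up here, and the region servers read their own copy of the file, so deployed values can still differ from what this prints.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class CheckConf {
      public static void main(String[] args) {
        // Loads hbase-default.xml plus whatever hbase-site.xml is on the classpath.
        Configuration conf = HBaseConfiguration.create();

        // The fallback (-1) is only returned if the key is missing entirely.
        System.out.println("hbase.hregion.max.filesize = "
            + conf.getLong("hbase.hregion.max.filesize", -1L));
        System.out.println("hbase.regionserver.handler.count = "
            + conf.getInt("hbase.regionserver.handler.count", -1));
      }
    }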


I'm also sure that the YCSB benchmark driver is not the bottleneck, because its CPU utilization is low.
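
One rough way to cross-check the reported latencies outside the YCSB driver is a timed loop of random Gets with the plain Java client. This is only a sketch: "usertable" is YCSB's default table name, but the key format and key range below are guesses and have to match what the load phase actually wrote.

    import java.util.Random;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class RandomGetCheck {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "usertable");
        Random rnd = new Random();
        int reads = 1000;
        int misses = 0;
        long start = System.currentTimeMillis();
        for (int i = 0; i < reads; i++) {
          // Assumed key layout: "user" + number, as written by the load phase.
          Get get = new Get(Bytes.toBytes("user" + rnd.nextInt(1000000)));
          Result r = table.get(get);
          if (r.isEmpty()) {
            misses++;  // key did not exist; too many misses skew the average
          }
        }
        long elapsedMs = System.currentTimeMillis() - start;
        System.out.println("avg latency: " + (elapsedMs / (double) reads)
            + " ms, misses: " + misses);
        table.close();
      }
    }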

Thanks,
Harold

Re: How to improve random read latency?

Posted by Stack <st...@duboce.net>.
See http://hbase.apache.org/book.html#performance and the notes over
in the other thread, "How to improve HBase throughput with YCSB?"
St.Ack

On Sun, May 29, 2011 at 2:28 PM, Sean Bigdatafun
<se...@gmail.com> wrote:
> For pure random read, I do not think there exists a good way to improve
> latency. Essentially, every single read would need to go through disk seek.
> The latency definitely has something to do with server (HBase server/HDFS
> client) rather than client (YCSB)
>
>
> On Sun, May 29, 2011 at 1:23 PM, Harold Lim <ro...@yahoo.com> wrote:
>
>> Are there any configurations that I need to set to improve read latency?
>> I'm running HBase on 10 ec2 m1.large instances (7.5GB RAM).
>>
>> Also, as the size of the data gets bigger is it normal to get higher
>> latency for reads?
>>
>> I'm testing out the YCSB benchmark workload.
>>
>> With a data size of ~40-50GB (~200+ regions), I can get around 10-20ms and
>> I can push the throughput of a read-only workload to around 3000+ operations
>> per second.
>>
>> However, with a data size of 200GB (~1k+ regions), the smallest latency I
>> can get is 30+ms (with 100 operations per second) and I can't get the
>> throughput to go beyond 400+ operations per second (110+ms latency).
>>
>> I tried increasing the hbase.hregion.max.filesize to 2GB to reduce the
>> number of regions and it seems to make it worse.
>>
>>
>> I also tried increasing the heap size to 4GB,
>>  hbase.regionserver.handler.count = 100, and vm.swappiness = 0. However, it
>> still didn't improve the performance.
>>
>>
>> I'm also sure that the YCSB client benchmark driver is not becoming the
>> bottleneck because the CPU utilization is low.
>>
>>
>>
>>
>>
>>
>>
>> Thanks,
>> Harold
>>
>
>
>
> --
> --Sean
>

Re: How to improve random read latency?

Posted by Sean Bigdatafun <se...@gmail.com>.
For pure random reads, I do not think there is a good way to improve
latency: essentially, every single read has to go through a disk seek.
The latency definitely has something to do with the server (HBase
server/HDFS client) rather than the client (YCSB).
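
To put a rough number on that, a back-of-envelope sketch (the ~10 ms per seek and the number of data disks D per node are assumptions, and RPC/HDFS overhead is ignored): with N region servers serving fully uncached random reads, the aggregate ceiling is roughly

    \text{reads/s} \;\approx\; \frac{N \cdot D}{t_{\text{seek}}} \;=\; \frac{10 \cdot D}{0.010\ \text{s}} \;=\; 1000\, D

so with one or two local disks per node, the 10-node cluster tops out in the low thousands of reads per second once the working set no longer fits in the block cache.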


On Sun, May 29, 2011 at 1:23 PM, Harold Lim <ro...@yahoo.com> wrote:

> Are there any configurations that I need to set to improve read latency?
> I'm running HBase on 10 ec2 m1.large instances (7.5GB RAM).
>
> Also, as the size of the data gets bigger is it normal to get higher
> latency for reads?
>
> I'm testing out the YCSB benchmark workload.
>
> With a data size of ~40-50GB (~200+ regions), I can get around 10-20ms and
> I can push the throughput of a read-only workload to around 3000+ operations
> per second.
>
> However, with a data size of 200GB (~1k+ regions), the smallest latency I
> can get is 30+ms (with 100 operations per second) and I can't get the
> throughput to go beyond 400+ operations per second (110+ms latency).
>
> I tried increasing the hbase.hregion.max.filesize to 2GB to reduce the
> number of regions and it seems to make it worse.
>
>
> I also tried increasing the heap size to 4GB,
>  hbase.regionserver.handler.count = 100, and vm.swappiness = 0. However, it
> still didn't improve the performance.
>
>
> I'm also sure that the YCSB client benchmark driver is not becoming the
> bottleneck because the CPU utilization is low.
>
>
>
>
>
>
>
> Thanks,
> Harold
>



-- 
--Sean