Posted to user@hbase.apache.org by "Murali Krishna. P" <mu...@yahoo.com> on 2009/08/18 17:35:15 UTC

HBase-0.20.0 randomRead

Hi all,
 (I saw a related thread on performance, but I'm starting a different one because my setup is slightly different.)

I have a one-node setup with hbase-0.20 (alpha). It has around 11 million rows in ~250 regions, each row with a ~20-byte key and a ~4 KB value.
Since my primary concern is randomRead, I modified the PerformanceEvaluation code to read from this particular table. The randomRead test gave the following result.

09/08/18 08:20:41 INFO hbase.PerformanceEvaluation: client-1 Finished randomRead in 2813ms at offset 10000 for 10000 rows
09/08/18 08:20:41 INFO hbase.PerformanceEvaluation: Finished 1 in 2813ms writing 10000 rows
09/08/18 08:20:41 INFO hbase.PerformanceEvaluation: client-0 Finished randomRead in 2867ms at offset 0 for 10000 rows
09/08/18 08:20:41 INFO hbase.PerformanceEvaluation: Finished 0 in 2867ms writing 10000 rows

 
So it looks like it is taking around 280 ms per record. Looking at the latest HBase performance claims, I was expecting it to be below 10 ms. Am I doing something basically wrong, given such a huge difference :( ? Please help me fix the latency.

The machine config is:
Processors:    2 x Xeon L5420 2.50GHz (8 cores)
Memory:        13.7GB
12 Disks of 1TB each.

Let me know if you need any more details.

Thanks,
Murali Krishna
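[For reference, a minimal sketch of the measurement methodology the PerformanceEvaluation randomRead test uses: time N random point lookups per client and divide elapsed time by row count. The class and method names here are placeholders, and an in-memory map stands in for the table so the sketch runs anywhere; against a real cluster the lookup would be an HBase client Get instead.]

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// randomRead-style benchmark sketch: each "client" performs `rows`
// random lookups inside its own key window and reports elapsed time.
public class RandomReadSketch {

    // Returns total elapsed milliseconds for `rows` random gets
    // starting at `offset`.
    static long runClient(Map<String, byte[]> table, int offset, int rows, long seed) {
        Random rand = new Random(seed);
        long start = System.nanoTime();
        for (int i = 0; i < rows; i++) {
            // Random key inside this client's window, zero-padded
            // the way PE builds its row names.
            String key = String.format("%010d", offset + rand.nextInt(rows));
            byte[] value = table.get(key); // against a cluster: HTable.get(new Get(...))
            if (value == null) throw new IllegalStateException("missing row " + key);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Populate a stand-in table: 20,000 rows of ~4 KB values,
        // matching the two-client runs in the log lines above.
        int totalRows = 20_000;
        Map<String, byte[]> table = new HashMap<>();
        for (int i = 0; i < totalRows; i++) {
            table.put(String.format("%010d", i), new byte[4096]);
        }
        long ms0 = runClient(table, 0, 10_000, 42L);
        long ms1 = runClient(table, 10_000, 10_000, 43L);
        // Per-row latency is total time divided by row count -- the
        // arithmetic this thread turns on (e.g. 2867 ms / 10000 rows).
        System.out.printf("client-0: %d ms (%.4f ms/row), client-1: %d ms%n",
                ms0, ms0 / 10_000.0, ms1);
    }
}
```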

Re: HBase-0.20.0 randomRead

Posted by Jonathan Gray <jl...@streamy.com>.
Yup, what you experienced is a known issue with RC1 that is fixed in RC2.

Murali Krishna. P wrote:
> Hi Jonathan,
>     I am using RC1, and the issue happens when I upload as a mapred job. With --nomapred, it worked fine. Currently I am uploading 100 million rows; I will definitely upgrade to RC2 once it finishes.
> 
> Thanks,
> Murali Krishna
> 
> 
> 
> 
> ________________________________
> From: Jonathan Gray <jl...@streamy.com>
> To: hbase-user@hadoop.apache.org
> Sent: Wednesday, 19 August, 2009 10:20:18 PM
> Subject: Re: HBase-0.20.0 randomRead
> 
> Murali,
> 
> Which version of HBase are you running?
> 
> There was a fix that was just committed a few days ago for a bug that 
> manifested as null/empty HRI.
> 
> It has been fixed in RC2, so I recommend upgrading to that and trying 
> your upload again.
> 
> JG
> 
> Murali Krishna. P wrote:
>> Thanks for the clarification. I changed ROW_LENGTH as you suggested and used a SequenceWrite + randomRead combination to benchmark. The initial result was impressive, even though I would like to see the last column improved.
>>
>> randomRead (average latency per row; each client reads 10000 rows)
>> ==========
>> totalrows      5 clients    50 clients    100 clients    1000 clients
>> 800k           0.4 ms       3.5 ms        6.5 ms         55 ms
>> 2.3m           0.45 ms      3.5 ms        6.6 ms         56 ms
>>
>> The only change in the config was that the handler count was increased to 1000. I think there are some parameters that can be tweaked to improve this further?
>>
>> My goal is to test it for 10 million rows on this box. For some reason the sequenceWrite job with 5,000,000 rows + 2 clients failed, with the following exception:
>>
>> 09/08/19 00:34:07 INFO mapred.LocalJobRunner: 2000000/2050000/2500000
>> 09/08/19 00:50:38 WARN mapred.LocalJobRunner: job_local_0001
>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server Some server for region , row '0002076131', but failed after 11 attempts.
>> Exceptions:
>> java.io.IOException: HRegionInfo was null or empty in .META.
>> java.io.IOException: HRegionInfo was null or empty in .META.
>> java.io.IOException: HRegionInfo was null or empty in .META.
>> java.io.IOException: HRegionInfo was null or empty in .META.
>> java.io.IOException: HRegionInfo was null or empty in .META.
>> java.io.IOException: HRegionInfo was null or empty in .META.
>> java.io.IOException: HRegionInfo was null or empty in .META.
>> java.io.IOException: HRegionInfo was null or empty in .META.
>> java.io.IOException: HRegionInfo was null or empty in .META.
>> java.io.IOException: HRegionInfo was null or empty in .META.
>>
>>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocationForRowWithRetries(HConnectionManager.java:995)
>>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1064)
>>         at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:584)
>>         at org.apache.hadoop.hbase.client.HTable.put(HTable.java:450)
>>         at org.evaluation.hbase.PerformanceEvaluation$SequentialWriteTest.testRow(PerformanceEvaluation.java:736)
>>         at org.evaluation.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:571)
>>         at org.evaluation.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:804)
>>         at org.evaluation.hbase.PerformanceEvaluation$EvaluationMapTask.map(PerformanceEvaluation.java:350)
>>         at org.evaluation.hbase.PerformanceEvaluation$EvaluationMapTask.map(PerformanceEvaluation.java:326)
>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:518)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:303)
>>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:176)
>>
>>
>>  
>>
>> From region server log:-
>> 2009-08-19 00:47:22,740 WARN org.apache.hadoop.hbase.regionserver.Store: Not in setorg.apache.hadoop.hbase.regionserver.StoreScanner@1e458ae5
>> 2009-08-19 00:48:22,741 WARN org.apache.hadoop.hbase.regionserver.Store: Not in setorg.apache.hadoop.hbase.regionserver.StoreScanner@4b28029c
>> 2009-08-19 00:49:22,743 WARN org.apache.hadoop.hbase.regionserver.Store: Not in setorg.apache.hadoop.hbase.regionserver.StoreScanner@33fb11e0
>> 2009-08-19 00:50:22,745 WARN org.apache.hadoop.hbase.regionserver.Store: Not in setorg.apache.hadoop.hbase.regionserver.StoreScanner@6ceccc3b
>> 2009-08-19 00:51:22,746 WARN org.apache.hadoop.hbase.regionserver.Store: Not in setorg.apache.hadoop.hbase.regionserver.StoreScanner@44a3b5b4
>>
>> Thanks,
>> Murali Krishna
>>
>>
>>
>>
>> ________________________________
>> From: Jonathan Gray <jl...@streamy.com>
>> To: hbase-user@hadoop.apache.org
>> Sent: Wednesday, 19 August, 2009 12:26:55 AM
>> Subject: Re: HBase-0.20.0 randomRead
>>
>> With all that memory, you're likely seeing such good performance because 
>> of filesystem caching.  As you say, 2ms is extraordinarily fast for a 
>> disk read, but since your rows are relatively small, you are loading up 
>> all that data into memory (not only the fs cache, but also hbase's block 
>> cache which makes it even faster).
>>
>> JG
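[A back-of-envelope check supports this. Taking the numbers from the thread (8 GB region server heap, ~4 KB values, two clients each reading within a 10,000-row window) and assuming HBase 0.20's default block-cache fraction of 0.2 for hfile.block.cache.size — an assumption, the thread doesn't state it — the benchmark's working set is a tiny fraction of the available cache:]

```java
// Back-of-envelope cache sizing for the benchmark in this thread.
// All inputs are taken or assumed from the discussion, not measured.
public class CacheSizing {

    static long workingSetBytes(int clients, int rowsPerClient, int valueBytes) {
        return (long) clients * rowsPerClient * valueBytes;
    }

    public static void main(String[] args) {
        long heap = 8L * 1024 * 1024 * 1024;  // 8 GB region server heap
        double blockCacheFraction = 0.2;      // assumed default for hfile.block.cache.size
        long blockCache = (long) (heap * blockCacheFraction);

        // Two PE clients, 10,000 rows each, ~4 KB per value.
        long workingSet = workingSetBytes(2, 10_000, 4096);

        System.out.printf("block cache ~%d MB, working set ~%d MB%n",
                blockCache >> 20, workingSet >> 20);
        // A working set of under 100 MB fits many times over in a block
        // cache of well over a gigabyte, never mind the OS page cache,
        // so sub-millisecond reads are cache hits, not disk seeks.
    }
}
```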
>>
>> Jean-Daniel Cryans wrote:
>>> Well, it seems there's something wrong with the way you modified PE. It
>>> is not really testing your table unless the row keys are built the
>>> same way as TestTable's are; to me it seems that you are testing only
>>> 20000 rows, so caching is easy. A better test would be to use PE
>>> as it currently is, but with ROW_LENGTH = 4k.
>>>
>>> WRT Jetty, make sure you optimized it with
>>> http://jetty.mortbay.org/jetty5/doc/optimization.html
>>>
>>> J-D
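[The key-construction point matters: PE generates row names as zero-padded integers (the row '0002076131' in the failure later in the thread has exactly that shape), so a modified reader has to build keys the same way or its random gets will miss. A minimal sketch of that formatting — inferred from the row name, not copied from the PE source:]

```java
import java.util.Random;

// PE-style row keys: a random integer in [0, totalRows), zero-padded
// to 10 digits. A reader that formats keys any other way will look up
// rows that were never written.
public class RowKeys {

    static String randomRowKey(Random rand, int totalRows) {
        return String.format("%010d", rand.nextInt(totalRows));
    }

    public static void main(String[] args) {
        Random rand = new Random(42L);
        for (int i = 0; i < 3; i++) {
            // Keys drawn from an 11-million-row table, as in the thread.
            System.out.println(randomRowKey(rand, 11_000_000));
        }
    }
}
```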
>>>
>>> On Tue, Aug 18, 2009 at 12:08 PM, Murali Krishna.
>>> P<mu...@yahoo.com> wrote:
>>>> Ahh, my mistake, I just took it as seconds.
>>>>
>>>> Now I wonder whether it can really be that fast? Won't it take at least 2 ms for a disk read? (I have given 8 GB of heap space to the RegionServer; is it caching that much?) Has anyone seen these kinds of numbers?
>>>>
>>>>
>>>> Actually, my initial problem was that I have a Jetty server in front of this HBase to serve this 4 KB value, and when benchmarked it took 200+ milliseconds per record with 100 clients. That is why I decided to benchmark without Jetty first.
>>>>
>>>> Thanks,
>>>> Murali Krishna
>>>>
>>>>
>>>>
>>>>
>>>> ________________________________
>>>> From: Jean-Daniel Cryans <jd...@apache.org>
>>>> To: hbase-user@hadoop.apache.org
>>>> Sent: Tuesday, 18 August, 2009 9:13:40 PM
>>>> Subject: Re: HBase-0.20.0 randomRead
>>>>
>>>> Murali,
>>>>
>>>> I'm not reading the same thing as you.
>>>>
>>>> client-0 Finished randomRead in 2867ms at offset 0 for 10000 rows
>>>>
>>>> That means 2867 / 10000 = 0.2867ms per row. It's kinda fast.
>>>>
>>>> J-D
>>>>

Re: HBase-0.20.0 randomRead

Posted by "Murali Krishna. P" <mu...@yahoo.com>.
Hi Jonathan,
    I am using RC1, and the issue happens when I upload as a mapred job. With --nomapred, it worked fine. Currently I am uploading 100 million rows; I will definitely upgrade to RC2 once it finishes.

Thanks,
Murali Krishna





Re: HBase-0.20.0 randomRead

Posted by Jonathan Gray <jl...@streamy.com>.
Murali,

Which version of HBase are you running?

There was a fix that was just committed a few days ago for a bug that 
manifested as null/empty HRI.

It has been fixed in RC2, so I recommend upgrading to that and trying 
your upload again.

JG


Re: HBase-0.20.0 randomRead

Posted by "Murali Krishna. P" <mu...@yahoo.com>.
Thanks for the clarification. I changed ROW_LENGTH as you suggested and used a SequenceWrite + randomRead combination to benchmark. The initial result was impressive, even though I would like to see the last column improved.

randomRead (average latency per row; each client reads 10000 rows)
==========
totalrows      5 clients    50 clients    100 clients    1000 clients
800k           0.4 ms       3.5 ms        6.5 ms         55 ms
2.3m           0.45 ms      3.5 ms        6.6 ms         56 ms

The only change in the config was that the handler count was increased to 1000. I think there are some parameters that can be tweaked to improve this further?
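[For anyone reproducing this, the handler count referred to above is the hbase.regionserver.handler.count property in hbase-site.xml; the 1000 is the value used in this thread, while the stock 0.20 default is much lower:]

```xml
<!-- hbase-site.xml: RPC handler threads per region server. -->
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>1000</value>
</property>
```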

My goal is to test it for 10 million rows on this box. For some reason the sequenceWrite job with 5,000,000 rows + 2 clients failed, with the following exception:

09/08/19 00:34:07 INFO mapred.LocalJobRunner: 2000000/2050000/2500000
09/08/19 00:50:38 WARN mapred.LocalJobRunner: job_local_0001
org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server Some server for region , row '0002076131', but failed after 11 attempts.
Exceptions:
java.io.IOException: HRegionInfo was null or empty in .META.
java.io.IOException: HRegionInfo was null or empty in .META.
java.io.IOException: HRegionInfo was null or empty in .META.
java.io.IOException: HRegionInfo was null or empty in .META.
java.io.IOException: HRegionInfo was null or empty in .META.
java.io.IOException: HRegionInfo was null or empty in .META.
java.io.IOException: HRegionInfo was null or empty in .META.
java.io.IOException: HRegionInfo was null or empty in .META.
java.io.IOException: HRegionInfo was null or empty in .META.
java.io.IOException: HRegionInfo was null or empty in .META.

        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocationForRowWithRetries(HConnectionManager.java:995)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1064)
        at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:584)
        at org.apache.hadoop.hbase.client.HTable.put(HTable.java:450)
        at org.evaluation.hbase.PerformanceEvaluation$SequentialWriteTest.testRow(PerformanceEvaluation.java:736)
        at org.evaluation.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:571)
        at org.evaluation.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:804)
        at org.evaluation.hbase.PerformanceEvaluation$EvaluationMapTask.map(PerformanceEvaluation.java:350)
        at org.evaluation.hbase.PerformanceEvaluation$EvaluationMapTask.map(PerformanceEvaluation.java:326)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:518)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:303)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:176)


 

From region server log:-
2009-08-19 00:47:22,740 WARN org.apache.hadoop.hbase.regionserver.Store: Not in setorg.apache.hadoop.hbase.regionserver.StoreScanner@1e458ae5
2009-08-19 00:48:22,741 WARN org.apache.hadoop.hbase.regionserver.Store: Not in setorg.apache.hadoop.hbase.regionserver.StoreScanner@4b28029c
2009-08-19 00:49:22,743 WARN org.apache.hadoop.hbase.regionserver.Store: Not in setorg.apache.hadoop.hbase.regionserver.StoreScanner@33fb11e0
2009-08-19 00:50:22,745 WARN org.apache.hadoop.hbase.regionserver.Store: Not in setorg.apache.hadoop.hbase.regionserver.StoreScanner@6ceccc3b
2009-08-19 00:51:22,746 WARN org.apache.hadoop.hbase.regionserver.Store: Not in setorg.apache.hadoop.hbase.regionserver.StoreScanner@44a3b5b4

Thanks,
Murali Krishna




________________________________
From: Jonathan Gray <jl...@streamy.com>
To: hbase-user@hadoop.apache.org
Sent: Wednesday, 19 August, 2009 12:26:55 AM
Subject: Re: HBase-0.20.0 randomRead

With all that memory, you're likely seeing such good performance because 
of filesystem caching.  As you say, 2ms is extraordinarily fast for a 
disk read, but since your rows are relatively small, you are loading up 
all that data into memory (not only the fs cache, but also hbase's block 
cache which makes it even faster).

JG

Jean-Daniel Cryans wrote:
> Well it seems there's something wrong with the way you modified PE. It
> is not really testing your table unless the row keys are built the
> same way as TestTable is, to me it seems that you are testing on only
> 20000 rows so caching is easy. A better test would just be to use PE
> the way it currently is but with ROW_LENGTH = 4k.
> 
> WRT Jetty, make sure you optimized it with
> http://jetty.mortbay.org/jetty5/doc/optimization.html
> 
> J-D
> 
> On Tue, Aug 18, 2009 at 12:08 PM, Murali Krishna.
> P<mu...@yahoo.com> wrote:
>> Ahh, mistake, I just took it as seconds.
>>
>> Now I wonder whether it can really do that fast ?? wont it take atleast 2ms for disk read? ( I have given 8G heapspace for RegionServer, is it caching so much?). Has anyone seen these kind of numbers ?
>>
>>
>> Actually, my initial problem was that I have a jetty infront of this hbase to serve this 4k value and when bench marked, it took 200+milliseconds for each record with 100 clients. That is why decided to benchmark without jetty first.
>>
>> Thanks,
>> Murali Krishna
>>
>>
>>
>>
>> ________________________________
>> From: Jean-Daniel Cryans <jd...@apache.org>
>> To: hbase-user@hadoop.apache.org
>> Sent: Tuesday, 18 August, 2009 9:13:40 PM
>> Subject: Re: HBase-0.20.0 randomRead
>>
>> Murali,
>>
>> I'm not reading the same thing as you.
>>
>> client-0 Finished randomRead in 2867ms at offset 0 for 10000 rows
>>
>> That means 2867 / 10000 = 0.2867ms per row. It's kinda fast.
>>
>> J-D
>>
>> On Tue, Aug 18, 2009 at 11:35 AM, Murali Krishna.
>> P<mu...@yahoo.com> wrote:
>>> Hi all,
>>>  (Saw a related thread on performance, but starting a different one because my setup is slightly different).
>>>
>>> I have a one-node setup with hbase-0.20 (alpha). It has around 11 million rows in ~250 regions. Each row has a ~20-byte key and a ~4k value.
>>> Since my primary concern is randomRead, I modified the PerformanceEvaluation code to read from this particular table. The randomRead test gave the following result.
>>>
>>> 09/08/18 08:20:41 INFO hbase.PerformanceEvaluation: client-1 Finished randomRead in 2813ms at offset 10000 for 10000 rows
>>> 09/08/18 08:20:41 INFO hbase.PerformanceEvaluation: Finished 1 in 2813ms writing 10000 rows
>>> 09/08/18 08:20:41 INFO hbase.PerformanceEvaluation: client-0 Finished randomRead in 2867ms at offset 0 for 10000 rows
>>> 09/08/18 08:20:41 INFO hbase.PerformanceEvaluation: Finished 0 in 2867ms writing 10000 rows
>>>
>>>
>>> So it looks like it is taking around 280ms per record. Looking at the latest HBase performance claims, I was expecting it to be below 10ms. Am I doing something basically wrong to cause such a huge difference :( ? Please help me fix the latency.
>>>
>>> The machine config is:
>>> Processors:    2 x Xeon L5420 2.50GHz (8 cores)
>>> Memory:        13.7GB
>>> 12 Disks of 1TB each.
>>>
>>> Let me know if you need any more details.
>>>
>>> Thanks,
>>> Murali Krishna
> 

Re: HBase-0.20.0 randomRead

Posted by Jonathan Gray <jl...@streamy.com>.
With all that memory, you're likely seeing such good performance because 
of filesystem caching.  As you say, 2ms is extraordinarily fast for a 
disk read, but since your rows are relatively small, you are loading up 
all that data into memory (not only the fs cache, but also hbase's block 
cache which makes it even faster).

JG
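[Editor's note: the block cache Jonathan describes is sized as a fraction of the RegionServer heap via hfile.block.cache.size in hbase-site.xml. A minimal sketch; 0.2 mirrors the 0.20-era default, but check the defaults shipped with your build:]

```xml
<!-- hbase-site.xml: fraction of the RegionServer heap reserved
     for the HFile block cache (assumed default of 0.2) -->
<property>
  <name>hfile.block.cache.size</name>
  <value>0.2</value>
</property>
```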

Jean-Daniel Cryans wrote:
> Well, it seems there's something wrong with the way you modified PE. It
> is not really testing your table unless the row keys are built the same
> way as TestTable's are; to me it seems that you are testing only 20000
> rows, so caching is easy. A better test would be to use PE as it
> currently is, but with ROW_LENGTH = 4k.
> 
> WRT Jetty, make sure you optimized it with
> http://jetty.mortbay.org/jetty5/doc/optimization.html
> 
> J-D
> 
> On Tue, Aug 18, 2009 at 12:08 PM, Murali Krishna.
> P<mu...@yahoo.com> wrote:
>> Ah, my mistake, I just read it as seconds.
>>
>> Now I wonder whether it can really be that fast? Won't it take at least 2ms for a disk read? (I have given the RegionServer 8G of heap space; is it caching that much?) Has anyone seen these kinds of numbers?
>>
>>
>> Actually, my initial problem was that I have a Jetty in front of this HBase to serve the 4k values, and when benchmarked it took 200+ milliseconds per record with 100 clients. That is why I decided to benchmark without Jetty first.
>>
>> Thanks,
>> Murali Krishna
>>
>>
>>
>>
>> ________________________________
>> From: Jean-Daniel Cryans <jd...@apache.org>
>> To: hbase-user@hadoop.apache.org
>> Sent: Tuesday, 18 August, 2009 9:13:40 PM
>> Subject: Re: HBase-0.20.0 randomRead
>>
>> Murali,
>>
>> I'm not reading the same thing as you.
>>
>> client-0 Finished randomRead in 2867ms at offset 0 for 10000 rows
>>
>> That means 2867 / 10000 = 0.2867ms per row. It's kinda fast.
>>
>> J-D
>>
>> On Tue, Aug 18, 2009 at 11:35 AM, Murali Krishna.
>> P<mu...@yahoo.com> wrote:
>>> Hi all,
>>>  (Saw a related thread on performance, but starting a different one because my setup is slightly different).
>>>
>>> I have a one-node setup with hbase-0.20 (alpha). It has around 11 million rows in ~250 regions. Each row has a ~20-byte key and a ~4k value.
>>> Since my primary concern is randomRead, I modified the PerformanceEvaluation code to read from this particular table. The randomRead test gave the following result.
>>>
>>> 09/08/18 08:20:41 INFO hbase.PerformanceEvaluation: client-1 Finished randomRead in 2813ms at offset 10000 for 10000 rows
>>> 09/08/18 08:20:41 INFO hbase.PerformanceEvaluation: Finished 1 in 2813ms writing 10000 rows
>>> 09/08/18 08:20:41 INFO hbase.PerformanceEvaluation: client-0 Finished randomRead in 2867ms at offset 0 for 10000 rows
>>> 09/08/18 08:20:41 INFO hbase.PerformanceEvaluation: Finished 0 in 2867ms writing 10000 rows
>>>
>>>
>>> So it looks like it is taking around 280ms per record. Looking at the latest HBase performance claims, I was expecting it to be below 10ms. Am I doing something basically wrong to cause such a huge difference :( ? Please help me fix the latency.
>>>
>>> The machine config is:
>>> Processors:    2 x Xeon L5420 2.50GHz (8 cores)
>>> Memory:        13.7GB
>>> 12 Disks of 1TB each.
>>>
>>> Let me know if you need any more details.
>>>
>>> Thanks,
>>> Murali Krishna
> 

Re: HBase-0.20.0 randomRead

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Well, it seems there's something wrong with the way you modified PE. It
is not really testing your table unless the row keys are built the same
way as TestTable's are; to me it seems that you are testing only 20000
rows, so caching is easy. A better test would be to use PE as it
currently is, but with ROW_LENGTH = 4k.

WRT Jetty, make sure you optimized it with
http://jetty.mortbay.org/jetty5/doc/optimization.html

J-D

On Tue, Aug 18, 2009 at 12:08 PM, Murali Krishna.
P<mu...@yahoo.com> wrote:
> Ah, my mistake, I just read it as seconds.
>
> Now I wonder whether it can really be that fast? Won't it take at least 2ms for a disk read? (I have given the RegionServer 8G of heap space; is it caching that much?) Has anyone seen these kinds of numbers?
>
>
> Actually, my initial problem was that I have a Jetty in front of this HBase to serve the 4k values, and when benchmarked it took 200+ milliseconds per record with 100 clients. That is why I decided to benchmark without Jetty first.
>
> Thanks,
> Murali Krishna
>
>
>
>
> ________________________________
> From: Jean-Daniel Cryans <jd...@apache.org>
> To: hbase-user@hadoop.apache.org
> Sent: Tuesday, 18 August, 2009 9:13:40 PM
> Subject: Re: HBase-0.20.0 randomRead
>
> Murali,
>
> I'm not reading the same thing as you.
>
> client-0 Finished randomRead in 2867ms at offset 0 for 10000 rows
>
> That means 2867 / 10000 = 0.2867ms per row. It's kinda fast.
>
> J-D
>
> On Tue, Aug 18, 2009 at 11:35 AM, Murali Krishna.
> P<mu...@yahoo.com> wrote:
>> Hi all,
>>  (Saw a related thread on performance, but starting a different one because my setup is slightly different).
>>
>> I have a one-node setup with hbase-0.20 (alpha). It has around 11 million rows in ~250 regions. Each row has a ~20-byte key and a ~4k value.
>> Since my primary concern is randomRead, I modified the PerformanceEvaluation code to read from this particular table. The randomRead test gave the following result.
>>
>> 09/08/18 08:20:41 INFO hbase.PerformanceEvaluation: client-1 Finished randomRead in 2813ms at offset 10000 for 10000 rows
>> 09/08/18 08:20:41 INFO hbase.PerformanceEvaluation: Finished 1 in 2813ms writing 10000 rows
>> 09/08/18 08:20:41 INFO hbase.PerformanceEvaluation: client-0 Finished randomRead in 2867ms at offset 0 for 10000 rows
>> 09/08/18 08:20:41 INFO hbase.PerformanceEvaluation: Finished 0 in 2867ms writing 10000 rows
>>
>>
>> So it looks like it is taking around 280ms per record. Looking at the latest HBase performance claims, I was expecting it to be below 10ms. Am I doing something basically wrong to cause such a huge difference :( ? Please help me fix the latency.
>>
>> The machine config is:
>> Processors:    2 x Xeon L5420 2.50GHz (8 cores)
>> Memory:        13.7GB
>> 12 Disks of 1TB each.
>>
>> Let me know if you need any more details.
>>
>> Thanks,
>> Murali Krishna
>

Re: HBase-0.20.0 randomRead

Posted by "Murali Krishna. P" <mu...@yahoo.com>.
Ah, my mistake, I just read it as seconds.

Now I wonder whether it can really be that fast? Won't it take at least 2ms for a disk read? (I have given the RegionServer 8G of heap space; is it caching that much?) Has anyone seen these kinds of numbers?


Actually, my initial problem was that I have a Jetty in front of this HBase to serve the 4k values, and when benchmarked it took 200+ milliseconds per record with 100 clients. That is why I decided to benchmark without Jetty first.

Thanks,
Murali Krishna
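
[Editor's note: the 8G heap Murali mentions would typically be set in conf/hbase-env.sh. A sketch only; the value is in MB, and the exact knob may differ by version:]

```sh
# conf/hbase-env.sh: maximum heap for HBase daemons, in MB (assumed setting)
export HBASE_HEAPSIZE=8000
```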




________________________________
From: Jean-Daniel Cryans <jd...@apache.org>
To: hbase-user@hadoop.apache.org
Sent: Tuesday, 18 August, 2009 9:13:40 PM
Subject: Re: HBase-0.20.0 randomRead

Murali,

I'm not reading the same thing as you.

client-0 Finished randomRead in 2867ms at offset 0 for 10000 rows

That means 2867 / 10000 = 0.2867ms per row. It's kinda fast.

J-D

On Tue, Aug 18, 2009 at 11:35 AM, Murali Krishna.
P<mu...@yahoo.com> wrote:
> Hi all,
>  (Saw a related thread on performance, but starting a different one because my setup is slightly different).
>
> I have a one-node setup with hbase-0.20 (alpha). It has around 11 million rows in ~250 regions. Each row has a ~20-byte key and a ~4k value.
> Since my primary concern is randomRead, I modified the PerformanceEvaluation code to read from this particular table. The randomRead test gave the following result.
>
> 09/08/18 08:20:41 INFO hbase.PerformanceEvaluation: client-1 Finished randomRead in 2813ms at offset 10000 for 10000 rows
> 09/08/18 08:20:41 INFO hbase.PerformanceEvaluation: Finished 1 in 2813ms writing 10000 rows
> 09/08/18 08:20:41 INFO hbase.PerformanceEvaluation: client-0 Finished randomRead in 2867ms at offset 0 for 10000 rows
> 09/08/18 08:20:41 INFO hbase.PerformanceEvaluation: Finished 0 in 2867ms writing 10000 rows
>
>
> So it looks like it is taking around 280ms per record. Looking at the latest HBase performance claims, I was expecting it to be below 10ms. Am I doing something basically wrong to cause such a huge difference :( ? Please help me fix the latency.
>
> The machine config is:
> Processors:    2 x Xeon L5420 2.50GHz (8 cores)
> Memory:        13.7GB
> 12 Disks of 1TB each.
>
> Let me know if you need any more details.
>
> Thanks,
> Murali Krishna

Re: HBase-0.20.0 randomRead

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Murali,

I'm not reading the same thing as you.

client-0 Finished randomRead in 2867ms at offset 0 for 10000 rows

That means 2867 / 10000 = 0.2867ms per row. It's kinda fast.

J-D
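
[Editor's note: J-D's arithmetic can be sanity-checked with a quick sketch; the numbers are copied from the PE log lines quoted below:]

```python
# Per-row latency and implied throughput from the PerformanceEvaluation output:
# "client-0 Finished randomRead in 2867ms at offset 0 for 10000 rows"
elapsed_ms = 2867
rows = 10000

per_row_ms = elapsed_ms / rows            # total time divided by row count
rows_per_sec = rows / (elapsed_ms / 1000.0)

print(f"{per_row_ms:.4f} ms/row")         # 0.2867 ms/row, not 280 ms
print(f"{rows_per_sec:.0f} rows/sec")
```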

On Tue, Aug 18, 2009 at 11:35 AM, Murali Krishna.
P<mu...@yahoo.com> wrote:
> Hi all,
>  (Saw a related thread on performance, but starting a different one because my setup is slightly different).
>
> I have a one-node setup with hbase-0.20 (alpha). It has around 11 million rows in ~250 regions. Each row has a ~20-byte key and a ~4k value.
> Since my primary concern is randomRead, I modified the PerformanceEvaluation code to read from this particular table. The randomRead test gave the following result.
>
> 09/08/18 08:20:41 INFO hbase.PerformanceEvaluation: client-1 Finished randomRead in 2813ms at offset 10000 for 10000 rows
> 09/08/18 08:20:41 INFO hbase.PerformanceEvaluation: Finished 1 in 2813ms writing 10000 rows
> 09/08/18 08:20:41 INFO hbase.PerformanceEvaluation: client-0 Finished randomRead in 2867ms at offset 0 for 10000 rows
> 09/08/18 08:20:41 INFO hbase.PerformanceEvaluation: Finished 0 in 2867ms writing 10000 rows
>
>
> So it looks like it is taking around 280ms per record. Looking at the latest HBase performance claims, I was expecting it to be below 10ms. Am I doing something basically wrong to cause such a huge difference :( ? Please help me fix the latency.
>
> The machine config is:
> Processors:    2 x Xeon L5420 2.50GHz (8 cores)
> Memory:        13.7GB
> 12 Disks of 1TB each.
>
> Let me know if you need any more details.
>
> Thanks,
> Murali Krishna