Posted to user@hbase.apache.org by acure <cu...@xg.pl> on 2008/10/27 23:58:27 UTC

performance conclusions and questions

I have a few questions about performance. First, my test hardware:
       AMD Athlon XP 2500+
       1G RAM
        Ubuntu 8.04
        HBase 0.18, running out of a temp dir (no Hadoop/HDFS)
       HBASE_HEAPSIZE=500
        Sun Java 1.6

I created a simple table with two column families, "name:" and "lastname:";
RowGets and Scanners are executed against these two column families.
I put in 50,000 elements, each with a random GUID rowId.
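
For concreteness, here is a minimal sketch of how such a setup might look
against the 0.18-era client API; the table name "people" and the value
payloads are illustrative, not taken from the original test code:

    import java.util.UUID;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.io.BatchUpdate;
    import org.apache.hadoop.hbase.util.Bytes;

    public class LoadTest {
      public static void main(String[] args) throws Exception {
        HBaseConfiguration conf = new HBaseConfiguration();

        // Create the table with the two column families from the test.
        HBaseAdmin admin = new HBaseAdmin(conf);
        HTableDescriptor desc = new HTableDescriptor("people");
        desc.addFamily(new HColumnDescriptor("name:"));
        desc.addFamily(new HColumnDescriptor("lastname:"));
        admin.createTable(desc);

        // Load 50,000 rows keyed by random GUIDs.
        HTable table = new HTable(conf, "people");
        for (int i = 0; i < 50000; i++) {
          BatchUpdate update = new BatchUpdate(UUID.randomUUID().toString());
          update.put("name:", Bytes.toBytes("name-" + i));
          update.put("lastname:", Bytes.toBytes("lastname-" + i));
          table.commit(update);
        }
      }
    }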


Test 1 - SCANNER READS
Create a scanner and iterate from the first element to the last.
  - 50,000 elements
        total time: 19,604 ms [average over 100 runs]
        average 0.392 ms per row

  - 200,000 elements
        total time: 80,844 ms [average over 100 runs]
        average 0.404 ms per row
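
Continuing the sketch above, the scan loop presumably looks something like
this (the timing scaffolding and method name are mine; it also needs
java.io.IOException, org.apache.hadoop.hbase.client.Scanner and
org.apache.hadoop.hbase.io.RowResult imported):

    // Scan both column families from the first row to the last.
    static void scanAll(HTable table) throws IOException {
      long start = System.currentTimeMillis();
      Scanner scanner = table.getScanner(new String[] { "name:", "lastname:" });
      try {
        int count = 0;
        for (RowResult row : scanner) {  // Scanner is Iterable<RowResult>
          count++;                       // a real test would also read the cells
        }
        System.out.println(count + " rows scanned in "
            + (System.currentTimeMillis() - start) + " ms");
      } finally {
        scanner.close();
      }
    }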

Test 2 - RANDOM READS
Create a scanner and iterate from the first element to the last, putting
every rowId into a list; then start a timer and call "getRow" for each
element of the list in reverse order.

 - 50,000 elements
        total time: 76,801 ms [average over 100 runs]
        average 1.53 ms per row

 - 200,000 elements
        total time: 333,565 ms [average over 100 runs]
        average 1.66 ms per row
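
And the random-read pass, again as a hedged sketch continuing the code
above; getRow(String) is the 0.18-era point lookup, and the helper name is
mine:

    // Walk the previously collected row keys in reverse order,
    // doing one point lookup ("getRow") per key.
    static void randomReads(HTable table, List<String> rowIds)
        throws IOException {
      long start = System.currentTimeMillis();
      for (int i = rowIds.size() - 1; i >= 0; i--) {
        RowResult row = table.getRow(rowIds.get(i));
      }
      System.out.println(rowIds.size() + " gets in "
          + (System.currentTimeMillis() - start) + " ms");
    }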

Results:
     random reads are about 4 times slower than scanner reads.
     Do you agree with this experiment? Maybe you know how to make HBase faster?

Questions:
a) How fast should it be?
b) How much faster than random reads should scanner reads be?
c) How will parallel multithreaded reads affect performance?
d) How should it work with 100 threads (and 120 RPC handlers in
hbase-site.xml)? Do you have any experience? A sketch of that setting
follows below.
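
For reference, the RPC handler count mentioned in d) is set in
hbase-site.xml via the hbase.regionserver.handler.count property, so a
120-handler setup would presumably look like:

    <property>
      <name>hbase.regionserver.handler.count</name>
      <value>120</value>
    </property>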
  
Tomorrow I will run this test with 10,000,000 rows, plus the multithreaded
killer test :).
   Antoni

Re: performance conclusions and questions

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Acure,

I tried to answer all your questions. My major comment: try the
Performance Evaluation, and try to get a more realistic setup; that kind
of machine really skews the numbers because of thread starvation and
swapping. But you are on a good track, and coordinating your efforts with
ours should prove fruitful.

J-D


Do you agree with this experiment?

I basically agree with the numbers, though they are not always well
represented. First, you should have a look at the Performance Evaluation
that we keep updated in order to compare against Bigtable. We use the same
tests as much as possible. See:
http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation
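
The PE tool ships with HBase and is launched from the shell; a typical
single-client invocation (see the wiki page above for the full list of
commands and flags, which vary a bit by version) looks roughly like:

    bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation randomRead 1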

Second, 50,000 rows may not be enough; even 200,000 may not be, depending
on the size of each row. How many regions does that give you? Since a
region is HBase's basic unit of distribution, this affects all your
numbers.

Third, the fact that you ran them 100 times is good, very good.

Maybe you know how to make HBase faster?

Faster machine and more machines ;)  Well, with only 1G of RAM I'm sure
you have a lot of swapping, and with only 1 CPU, some thread starvation.
Also, not using HDFS, although it gives better numbers, means that you
cannot scale your test and still make good comparisons. Finally, you
should really try HBase TRUNK, because we have 2x, 3x, and 4x
optimizations depending on the test :P

Questions:
a) How fast should it be?

See the Performance Evaluation (PE), but be aware that with this kind of
machine you can't expect much.

b) How much faster than random reads should scanner reads be?

Scanners are much faster than random reads; see the PE numbers.

c) How will parallel multithreaded reads affect performance?

On a single machine like that... well, if the client is another machine,
it can scale nearly linearly (though less and less as you add new
clients).
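
A note for anyone wiring up such a multithreaded client test: HTable
instances are not thread-safe in this era of the API, so each worker
thread should get its own. A rough sketch reusing the "people" table and
rowIds list from the earlier snippets (thread count and key slicing are
illustrative; uses java.util.concurrent):

    static void parallelReads(final HBaseConfiguration conf,
        final List<String> rowIds, int nThreads) throws Exception {
      ExecutorService pool = Executors.newFixedThreadPool(nThreads);
      final int slice = rowIds.size() / nThreads;
      for (int t = 0; t < nThreads; t++) {
        final int from = t * slice;
        pool.submit(new Runnable() {
          public void run() {
            try {
              // One HTable per thread: instances are not thread-safe.
              HTable table = new HTable(conf, "people");
              for (int i = from; i < from + slice; i++) {
                table.getRow(rowIds.get(i));
              }
            } catch (IOException e) {
              e.printStackTrace();
            }
          }
        });
      }
      pool.shutdown();
      pool.awaitTermination(1, TimeUnit.HOURS);
    }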

d) How should it work with 100 threads (and 120 RPC handlers in
hbase-site.xml)? Do you have any experience?

Not me, sorry... But maybe you could try that with PE? That would be very
nice.

Tomorrow I will run this test with 10,000,000 rows, plus the multithreaded
killer test :).

