Posted to user@hbase.apache.org by stack <st...@duboce.net> on 2008/09/02 22:39:22 UTC
Re: Multi get/put

Ning Li wrote:
> Some follow-up on the performance issues:
>   
>>> PERFORMANCE ISSUES
>>> Our preliminary performance experiments show that the performance
>>> of building an index is quite reasonable. However, the performance of
>>> random reads in HDFS is so poor that the search performance is
>>> dramatically worse than that on local file systems.
>>>
>>>       
>> What do you mean by 'dramatic' in the above?  This is a sweet feature.  That
>> it's slow on first implementation is OK.  Are you thinking it's so slow that
>> it's not functional?
>>     
>
> On local FS, real disk IO is expensive. Lucene relies on FS cache to
> provide high search performance on local FS. Because of this, the
> following comparisons are based on warm test results.
>
> The comparison is between the local FS and a one-node HDFS. HDFS
> provides high sequential read performance but poor random read
> performance mainly because of socket overhead when data is warm.
>
> On HDFS 0.17.1, the search performance is more than an order of
> magnitude slower than that on a local FS. Even when reusing the socket
> connection, the search performance is still about an order of
> magnitude slower.
>
> Since this is caused by the socket overhead in HDFS, you see similar
> results with random reads on a map file. I used HBase's
> MapFilePerformanceEvaluation. The random read performance is a bit
> less than 7 times lower than that on a local FS. This is a bit better
> than the search performance probably because a random read on a map
> file is several almost-sequential reads on the data file in HDFS.
>
> Given the above, would the search performance be acceptable?
>   
I think search performance an order of magnitude slower than local FS
is OK for now.  Slow search will be just one more reason why random-read
performance needs to be improved.
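
For anyone who wants a feel for the local-FS baseline in the comparison
above, here is a minimal sketch in Python. It is not Ning's benchmark and
not HBase's MapFilePerformanceEvaluation; the function names, file size,
and block size are all made up for illustration. It only times sequential
vs. random block reads on a warm local temp file, so it says nothing about
HDFS socket overhead by itself.

```python
import os
import random
import tempfile
import time


def time_reads(path, block_size, offsets):
    """Read one block at each offset; return (elapsed_seconds, bytes_read)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        for off in offsets:
            f.seek(off)
            total += len(f.read(block_size))
    return time.perf_counter() - start, total


def compare_reads(file_size=8 * 1024 * 1024, block_size=4096, seed=0):
    """Time sequential vs. random block reads over the same warm local file.

    All sizes here are arbitrary illustration values, not figures from
    this thread.
    """
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(os.urandom(file_size))
        sequential = list(range(0, file_size, block_size))
        shuffled = sequential[:]
        random.Random(seed).shuffle(shuffled)
        # Warm-up pass so both timed passes are likely served from the FS cache.
        time_reads(path, block_size, sequential)
        seq_time, seq_bytes = time_reads(path, block_size, sequential)
        rnd_time, rnd_bytes = time_reads(path, block_size, shuffled)
    finally:
        os.remove(path)
    return {"sequential": (seq_time, seq_bytes), "random": (rnd_time, rnd_bytes)}


if __name__ == "__main__":
    for name, (elapsed, nbytes) in compare_reads().items():
        print(f"{name}: {nbytes} bytes in {elapsed:.4f}s")
```

Both passes read exactly the same blocks, so any gap between the two
timings comes from access order alone. Running the same access pattern
against a file in HDFS would need an HDFS client and is left out here.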

> PS: I saw on http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation
> that the random read performance on a map file improved quite a bit
> from 0.17.1 to 0.18.0. Any insight?
>   
I chatted w/ some of the fellas; they said that they've started to worry
about performance and have been making improvements slowly.  Let me try
to get some more specifics.  Will be back if I learn anything.

St.Ack