Posted to user@hbase.apache.org by "Juan P." <go...@gmail.com> on 2012/10/02 01:01:07 UTC

HBase vs. HDFS

Hi guys,
I'm trying to get familiar with HBase, and one thing I noticed is that
reads seem to be very slow. I just tried doing a "scan 'my_table'" to get 120K
records and it took about 50 seconds to print it all out.

In contrast, "hadoop fs -cat my_file.csv", where my_file.csv has 120K lines,
completed in under a second.

Is that possible? Am I missing something about HBase reads?

Thanks,
Joni

Re: HBase vs. HDFS

Posted by Andrew Purtell <ap...@apache.org>.

On Tue, Oct 2, 2012 at 9:05 AM, lars hofhansl <lh...@yahoo.com> wrote:
> You probably executed 120k next() RPCs against your server, unless you enabled scanner caching.
> (On a related note, we should probably not default this to 1, but to something more sensible, like 10 or 100.)

We use 100.

-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet
Hein (via Tom White)

Re: HBase vs. HDFS

Posted by lars hofhansl <lh...@yahoo.com>.

You probably executed 120k next() RPCs against your server, unless you enabled scanner caching.
(On a related note, we should probably not default this to 1, but to something more sensible, like 10 or 100.)
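
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It is not HBase code: the function name next_rpc_count is made up for illustration, and it assumes the simplest possible model, in which each next() RPC returns up to "caching" rows and nothing else (scanner open/close calls, region boundaries) is counted.

```python
import math


def next_rpc_count(total_rows: int, caching: int) -> int:
    """Round trips needed to fetch total_rows when each RPC returns up to `caching` rows.

    Simplified model (an assumption): ignores scanner open/close calls
    and any extra calls at region boundaries.
    """
    if caching < 1:
        raise ValueError("caching must be at least 1")
    return math.ceil(total_rows / caching)


ROWS = 120_000  # row count from the original question

for caching in (1, 10, 100):
    print(f"caching={caching:>3} -> {next_rpc_count(ROWS, caching)} next() RPCs")
# caching=  1 -> 120000 next() RPCs
# caching= 10 -> 12000 next() RPCs
# caching=100 -> 1200 next() RPCs
```

In the Java client the corresponding knobs are Scan.setCaching(int) per scan, or the hbase.client.scanner.caching property for a client-wide default.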

-- Lars

----- Original Message -----
From: Juan P. <go...@gmail.com>
To: user@hbase.apache.org
Cc: 
Sent: Monday, October 1, 2012 4:01 PM
Subject: HBase vs. HDFS

Hi guys,
I'm trying to get familiarized with HBase and one thing I noticed is that
reads seem to be very slow. I just tried doing a "scan 'my_table'" to get 120K
records and it took about 50 seconds to print it all out.

In contrast "hadoop fs -cat my_file.csv" where my_file.csv has 120K lines
completed in under a second.

Is that possible? Am I missing something about HBase reads?

Thanks,
Joni

Re: HBase vs. HDFS

Posted by techbuddy <te...@gmail.com>.

How did you verify that all the rows indeed reside on the same region server? 




Re: HBase vs. HDFS

Posted by "Juan P." <go...@gmail.com>.

yes

On Mon, Oct 1, 2012 at 8:05 PM, Mohit Anchlia <mo...@gmail.com> wrote:

> Are these 120K rows from a single region server?

Re: HBase vs. HDFS

Posted by Mohit Anchlia <mo...@gmail.com>.

Are these 120K rows from a single region server?

On Mon, Oct 1, 2012 at 4:01 PM, Juan P. <go...@gmail.com> wrote:

> Hi guys,
> I'm trying to get familiarized with HBase and one thing I noticed is that
> reads seem to be very slow. I just tried doing a "scan 'my_table'" to get 120K
> records and it took about 50 seconds to print it all out.
>
> In contrast "hadoop fs -cat my_file.csv" where my_file.csv has 120K lines
> completed in under a second.
>
> Is that possible? Am I missing something about HBase reads?
>
> Thanks,
> Joni
>

Re: HBase vs. HDFS

Posted by Doug Meil <do...@explorysmedical.com>.

If you take HBase out of it and think of it in terms of two programs, one
of which opens a file and writes the output to another file, and another
which actually processes each row and then writes out results, the second
one is going to be slower because it's doing more, ceteris paribus.  HBase
is like the second program in your test.




On 10/2/12 8:46 AM, "gordoslocos" <go...@gmail.com> wrote:

>Thank you all! Setting a cache size helped a great deal. It's still
>slower though.
>
>I think it might be possible that the overhead of processing the data
>from the table might be the cause.
>
>I guess if HBase adds an indirection to the HDFS then it makes sense that
>it'd be slower, right?

Re: HBase vs. HDFS

Posted by gordoslocos <go...@gmail.com>.

Thank you all! Setting a cache size helped a great deal. It's still slower though.

I think the overhead of processing the data from the table might be the cause.

I guess if HBase adds a layer of indirection on top of HDFS then it makes sense that it'd be slower, right?

On 02/10/2012, at 09:28, Doug Meil <do...@explorysmedical.com> wrote:

> 
> Hi there, 
> 
> Another thing to consider on top of the scan-caching is that HBase is
> doing more in the process of scanning the table.  See...
> 
> http://hbase.apache.org/book.html#conceptual.view
> 
> http://hbase.apache.org/book.html#regions.arch
> 
> 
> ... Specifically, processing the KeyValues, potentially merging rows between
> StoreFiles, checking for un-flushed updates in the MemStore per CF.

Re: HBase vs. HDFS

Posted by Doug Meil <do...@explorysmedical.com>.

Hi there, 

Another thing to consider on top of the scan-caching is that HBase is
doing more in the process of scanning the table.  See...

http://hbase.apache.org/book.html#conceptual.view

http://hbase.apache.org/book.html#regions.arch

... Specifically, processing the KeyValues, potentially merging rows between
StoreFiles, checking for un-flushed updates in the MemStore per CF.
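
A toy Python sketch of the merge step may make that extra work more concrete. This is not how HBase is implemented; it only illustrates the idea of combining several sorted sources during a scan. The function name scan_merged and the (row_key, -timestamp, value) tuple layout are hypothetical, and the sketch assumes one column per row and ignores delete markers, multiple versions, and column families.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def scan_merged(*sources):
    """Merge sorted cell sources and yield the newest cell for each row.

    Each source is a list of (row_key, -timestamp, value) tuples sorted
    ascending, so for a given row the newest cell sorts first.
    (Hypothetical layout, chosen only for this illustration.)
    """
    merged = heapq.merge(*sources)  # lazy k-way merge of the sorted inputs
    for row_key, cells in groupby(merged, key=itemgetter(0)):
        _, neg_ts, value = next(cells)  # first cell in the group is the newest
        yield row_key, -neg_ts, value


# Two on-disk files plus un-flushed in-memory updates (made-up sample data).
store_file_1 = [("row1", -1, "a"), ("row3", -1, "c")]
store_file_2 = [("row2", -2, "b"), ("row3", -5, "c2")]
memstore = [("row1", -9, "a-new")]

for cell in scan_merged(store_file_1, store_file_2, memstore):
    print(cell)
# ('row1', 9, 'a-new')
# ('row2', 2, 'b')
# ('row3', 5, 'c2')
```

A plain "hadoop fs -cat" does none of this: it streams bytes from one file, whereas even this simplified scan has to compare keys across every source for each row it returns.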

On 10/1/12 8:54 PM, "Doug Meil" <do...@explorysmedical.com> wrote:

>
>Hi there-
>
>Might want to start with this...
>
>http://hbase.apache.org/book.html#perf.reading
>
>... if you're using default scan caching (which is 1) that would explain a
>lot.

Re: HBase vs. HDFS

Posted by Doug Meil <do...@explorysmedical.com>.

Hi there-

Might want to start with this...

http://hbase.apache.org/book.html#perf.reading

... if you're using default scan caching (which is 1) that would explain a
lot.




On 10/1/12 7:01 PM, "Juan P." <go...@gmail.com> wrote:

>Hi guys,
>I'm trying to get familiarized with HBase and one thing I noticed is that
>reads seem to be very slow. I just tried doing a "scan 'my_table'" to get
>120K
>records and it took about 50 seconds to print it all out.
>
>In contrast "hadoop fs -cat my_file.csv" where my_file.csv has 120K lines
>completed in under a second.
>
>Is that possible? Am I missing something about HBase reads?
>
>Thanks,
>Joni