Posted to user@hbase.apache.org by alokob <al...@gmail.com> on 2015/02/04 08:09:08 UTC

Hbase multiget vs scan with RowFilter

Hello ,

I have a use case where I need to get a set of records from HBase. The keys
of the records to be fetched are known, so I wanted to know whether a
multi-get or a scan with a RowFilter will give better performance. The number
of records to be fetched at a time is 500.

I am using HBase 0.98; please suggest an approach. I also wanted to know
whether there is any limit on the number of filters we can apply to a scan,
i.e. whether we can have 500 filters on one Scan object. Thanks for any help.



--
View this message in context: http://apache-hbase.679495.n3.nabble.com/Hbase-multiget-vs-scan-with-RowFilter-tp4068066.html
Sent from the HBase User mailing list archive at Nabble.com.

Re: Hbase multiget vs scan with RowFilter

Posted by alokob <al...@gmail.com>.
Thanks, Ted.

Yes, I am going to try both options.

Re: Hbase multiget vs scan with RowFilter

Posted by Ted Yu <yu...@gmail.com>.
alokob:
As Lars mentioned, for the scan-with-RowFilter approach, set the start and
stop rows properly so that the number of rows to scan is limited.

You could try both approaches on sample data to find out which one is
faster.
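To put a rough number on "limited", Lars's estimate quoted in this reply (scanning a row is probably 100-1000x faster than a single Get) implies a break-even: a scan over R rows costs about R/100 to R/1000 Get-equivalents, so it only beats 500 Gets when R is below roughly 50,000 to 500,000 rows. The Java sketch below just works through that arithmetic; `BreakEven` is a hypothetical helper (not part of HBase) and the speedup factor is an assumption taken from Lars's estimate, not a measurement.

```java
/**
 * Rough break-even between a MultiGet of N keys and a bounded Scan.
 * Assumption: scanning one row costs about 1/speedup of a single Get,
 * with speedup taken from Lars's "probably 100-1000x" estimate.
 * BreakEven is a hypothetical helper, not part of HBase.
 */
final class BreakEven {

    /** Scan range size (rows) at which the scan costs about as much as `gets` Gets. */
    static long breakEvenRows(long gets, long scanSpeedup) {
        // A scan over R rows costs ~R / scanSpeedup Get-equivalents,
        // which equals `gets` Gets when R = gets * scanSpeedup.
        return gets * scanSpeedup;
    }

    /** True if a scan over `rowsInRange` rows is estimated to be cheaper than `gets` Gets. */
    static boolean scanLikelyFaster(long gets, long rowsInRange, long scanSpeedup) {
        return rowsInRange < breakEvenRows(gets, scanSpeedup);
    }

    public static void main(String[] args) {
        long gets = 500; // keys fetched per request in this thread
        for (long speedup : new long[] {100, 1000}) {
            System.out.printf("speedup %d: scan cheaper below ~%d rows%n",
                    speedup, breakEvenRows(gets, speedup));
        }
        // A range spanning 5 million rows is far above either threshold.
        System.out.println(scanLikelyFaster(gets, 5_000_000L, 1000)); // false
    }
}
```

The thresholds are only as good as the 100-1000x estimate; the real crossover should come from testing on sample data.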

Cheers

On Wed, Feb 4, 2015 at 10:07 PM, lars hofhansl <la...@apache.org> wrote:

> It depends.
> A scan will always scan all rows between the passed start and stop rows
> (or all rows when none were passed). A Filter can filter rows out, but
> they will all be read in. A MultiGet does a seek (in a sense) for each Get.
> So if the set of Gets in the MultiGet is very small compared to the total
> number of rows you need to scan, you'll be better off with that. If you can
> limit the set of rows to scan (i.e. there is a start and stop row not too
> far apart), a scan is faster.
> The numbers depend on many variables, but scanning a row as part of a larger
> set is probably 100-1000x faster than fetching a single row with a Get.
> -- Lars
>
>       From: alokob <al...@gmail.com>
>  To: user@hbase.apache.org
>  Sent: Wednesday, February 4, 2015 8:53 PM
>  Subject: Re: Hbase multiget vs scan with RowFilter
>
> Thanks Ted for your reply.
>
> I was under impression that scan with RowFilter would give better
> performance as in case of multi-get each Get would be treated as an
> independent scan. Or you mean to say that in case scan it would be full
> table scan but in case of multi-get it would be return once we get the row
> without continuing to scan , so multi-get would be efficient.

Re: Hbase multiget vs scan with RowFilter

Posted by alokob <al...@gmail.com>.
Thanks, Lars.

The total number of records we expect is in the range of 5-10 million, and
since the records we want to fetch from HBase come from a search result,
there is no notion of a start or end row. So based on your note, I assume
the scan approach will always be slower than MultiGet.

Or maybe I can sort the search-result keys and then set the first key as the
start row and the last key as the stop row.
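The sort-then-bound idea can be sketched in Java as below. This is a hypothetical helper (`ScanBounds` is not an HBase class) and it rests on two assumptions: row keys sort lexicographically as unsigned bytes, and a scan's stop row is exclusive, so a single 0x00 byte is appended to the largest key to keep that key inside the range.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Derives scan bounds from an unordered set of row keys (e.g. search results).
 * Assumptions: row keys sort as unsigned bytes, and the scan's stop row is
 * exclusive. ScanBounds is a hypothetical helper, not an HBase class.
 */
final class ScanBounds {
    final byte[] startRow; // inclusive: smallest key
    final byte[] stopRow;  // exclusive: largest key followed by one 0x00 byte

    private ScanBounds(byte[] startRow, byte[] stopRow) {
        this.startRow = startRow;
        this.stopRow = stopRow;
    }

    static ScanBounds fromKeys(List<byte[]> keys) {
        if (keys.isEmpty()) {
            throw new IllegalArgumentException("need at least one key");
        }
        List<byte[]> sorted = new ArrayList<>(keys);
        sorted.sort(Arrays::compareUnsigned); // lexicographic, unsigned bytes (Java 9+)
        byte[] first = sorted.get(0);
        byte[] last = sorted.get(sorted.size() - 1);
        // copyOf pads with 0x00, giving the smallest key strictly greater than `last`,
        // so an exclusive stop row still includes `last` itself.
        byte[] stop = Arrays.copyOf(last, last.length + 1);
        return new ScanBounds(first.clone(), stop);
    }

    public static void main(String[] args) {
        List<byte[]> keys = new ArrayList<>();
        for (String k : new String[] {"row-0913", "row-0007", "row-0042"}) {
            keys.add(k.getBytes(StandardCharsets.UTF_8));
        }
        ScanBounds b = fromKeys(keys);
        System.out.println(new String(b.startRow, StandardCharsets.UTF_8)); // row-0007
        System.out.println(b.stopRow.length); // 9: "row-0913" plus a trailing 0x00
    }
}
```

This only bounds the scan: every row between the smallest and largest key is still read, so if the 500 search-result keys are spread across the 5-10 million rows, the range may still cover most of the table.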

Re: Hbase multiget vs scan with RowFilter

Posted by lars hofhansl <la...@apache.org>.
It depends.
A scan will always scan all rows between the passed start and stop rows (or all rows when none were passed). A Filter can filter rows out, but they will all be read in. A MultiGet does a seek (in a sense) for each Get.
So if the set of Gets in the MultiGet is very small compared to the total number of rows you need to scan, you'll be better off with that. If you can limit the set of rows to scan (i.e. there is a start and stop row not too far apart), a scan is faster.
The numbers depend on many variables, but scanning a row as part of a larger set is probably 100-1000x faster than fetching a single row with a Get.
-- Lars
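The difference Lars describes can be illustrated with a toy model in Java. It is not HBase code: a `TreeMap` stands in for a table's sorted row keys, `AccessPathModel` is a hypothetical name, and it only counts work (point lookups vs. rows read) rather than measuring real I/O. It shows that a row filter does not reduce the number of rows a scan reads; only the start/stop range does.

```java
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

/**
 * Toy model (not HBase code) of the two access paths described above.
 * A TreeMap stands in for a table's sorted row keys; only work is counted.
 */
final class AccessPathModel {

    /** MultiGet: one point lookup ("seek") per requested key. */
    static int multiGetSeeks(NavigableMap<String, String> table, Set<String> keys) {
        int seeks = 0;
        for (String key : keys) {
            table.get(key); // jumps straight to the row; no neighbouring rows are read
            seeks++;
        }
        return seeks;
    }

    /**
     * Scan with a row filter over [start, stop): returns {rowsRead, rowsReturned}.
     * Every row in the range is read; the filter only decides what is returned.
     */
    static int[] scan(NavigableMap<String, String> table, String start, String stop,
                      Set<String> wanted) {
        int read = 0;
        int returned = 0;
        for (String row : table.subMap(start, true, stop, false).keySet()) {
            read++;
            if (wanted.contains(row)) {
                returned++;
            }
        }
        return new int[] {read, returned};
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < 10_000; i++) {
            table.put(String.format("row-%05d", i), "value");
        }
        Set<String> scattered = Set.of("row-00010", "row-05000", "row-09990");
        System.out.println(multiGetSeeks(table, scattered)); // 3
        // Stop row is exclusive, so append "\0" to include the last wanted key.
        int[] s = scan(table, "row-00010", "row-09990\0", scattered);
        System.out.println(s[0] + " rows read, " + s[1] + " returned"); // 9981 rows read, 3 returned
    }
}
```

With three scattered keys, the MultiGet does 3 lookups while the bounded scan reads 9,981 rows to return the same 3; if the wanted keys were adjacent, the scan would read only those 3 rows.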

      From: alokob <al...@gmail.com>
 To: user@hbase.apache.org 
 Sent: Wednesday, February 4, 2015 8:53 PM
 Subject: Re: Hbase multiget vs scan with RowFilter
   
Thanks, Ted, for your reply.

I was under the impression that a scan with a RowFilter would give better
performance, since in a multi-get each Get would be treated as an
independent scan. Or do you mean that a scan would be a full table scan,
whereas a multi-get returns as soon as each row is found without
continuing to scan, so the multi-get would be more efficient?

Re: Hbase multiget vs scan with RowFilter

Posted by alokob <al...@gmail.com>.
Thanks, Ted, for your reply.

I was under the impression that a scan with a RowFilter would give better
performance, since in a multi-get each Get would be treated as an
independent scan. Or do you mean that a scan would be a full table scan,
whereas a multi-get returns as soon as each row is found without
continuing to scan, so the multi-get would be more efficient?

Re: Hbase multiget vs scan with RowFilter

Posted by Ted Yu <yu...@gmail.com>.
From your description, the multi-get approach seems better.

bq. we can have 500 filters on a scan object

This is supported (through FilterList). However, it may not be
efficient.
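Ted's caution about efficiency can be quantified with a rough cost model. The Java sketch below is not HBase's `FilterList` implementation: `FilterListCost` is a hypothetical name, each row-key filter is modelled as a plain string equality check OR-ed together (in the spirit of a MUST_PASS_ONE list), and it assumes every filter in the list is tested against every scanned row. Under that assumption the work grows as (rows scanned) × (number of filters).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/**
 * Toy cost model (not HBase code) for a scan with an OR-list of row-key
 * equality filters, in the spirit of FilterList with MUST_PASS_ONE.
 * Assumption: every filter in the list is evaluated against every scanned row.
 */
final class FilterListCost {

    /** Runs the OR-list over the rows and returns {comparisons, rowsKept}. */
    static long[] evaluate(List<String> rows, List<String> wantedKeys) {
        List<Predicate<String>> filters = new ArrayList<>();
        for (String key : wantedKeys) {
            filters.add(key::equals); // one equality "row filter" per wanted key
        }
        long comparisons = 0;
        long kept = 0;
        for (String row : rows) {
            boolean pass = false;
            for (Predicate<String> f : filters) {
                comparisons++;
                pass |= f.test(row); // no short-circuit, per the assumption above
            }
            if (pass) {
                kept++;
            }
        }
        return new long[] {comparisons, kept};
    }

    /** Closed form of the same count: rows scanned times filters in the list. */
    static long estimatedComparisons(long rowsScanned, long filterCount) {
        return rowsScanned * filterCount;
    }

    public static void main(String[] args) {
        // 500 filters over a 5-million-row range, as discussed in this thread.
        System.out.println(estimatedComparisons(5_000_000L, 500L)); // 2500000000
    }
}
```

Even if an implementation did stop at the first matching filter, rows that match none of the filters (almost all of them here) would still be checked against all 500, so the estimate of about 2.5 billion comparisons for a 5-million-row range barely changes, and that is on top of reading the rows themselves.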

Cheers

On Tue, Feb 3, 2015 at 11:09 PM, alokob <al...@gmail.com> wrote:

>
> Hello ,
>
> I have a use case where I need to get a set of records from HBase. The keys
> of the records to be fetched are known, so I wanted to know whether a
> multi-get or a scan with a RowFilter will give better performance. The number
> of records to be fetched at a time is 500.
>
> I am using HBase 0.98; please suggest an approach. I also wanted to know
> whether there is any limit on the number of filters we can apply to a scan,
> i.e. whether we can have 500 filters on one Scan object. Thanks for any help.