Posted to dev@hbase.apache.org by Matus Zamborsky <za...@b-ideas.eu> on 2009/08/20 14:38:53 UTC

real prefix filter

Hello,
I am scanning an HBase table with Scan, using PrefixFilter. As I
understand it, the scan goes through the whole table and runs the filter
on every row. But why does it not stop after finding the first row
without the desired prefix? Once a row does not match the prefix, the
filter should return true from filterAllRemaining.
Combined with specifying the start row in the Scan object, one could
then very quickly filter only the rows with the desired prefix.

I am using hbase 0.20 from trunk.

Regards

Matus Zamborsky

Re: real prefix filter

Posted by Jonathan Gray <jl...@streamy.com>.
Thanks Ryan.

Do note, this will not give you proper behavior on a Get, only a Scan. 
I don't just mean the prefix+while, I also mean using the 
WhileMatchFilter at all on a Get.

Filters are allowed in Gets because you can also filter on columns and
values; filtering on row keys does not make sense there.  If you need an
"early-out" filter with a Get, you most likely need to use a Scan instead.

This kind of confusion/inconsistency is more reason to re-implement Gets 
as optimized Scans...

JG


Re: real prefix filter

Posted by Ryan Rawson <ry...@gmail.com>.
The expected idiom is like so:
      scanSpec.setFilter(new WhileMatchFilter(
          new PrefixFilter(prefix)));

This is common for most filters: rather than encoding the 'stop once
past' type of logic in each filter, it is embedded in WhileMatchFilter,
and all others are wrapped with it where necessary.

-ryan
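
To illustrate why the wrapper gives the early-out, here is a minimal
sketch that models the idea in plain Java. It deliberately does not use
the HBase classes: RowFilter, prefixFilter, whileMatch, scan and
sampleTable are hypothetical names invented for this illustration, and a
sorted TreeMap stands in for the table. The real filter interfaces have
more methods than shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

final class WhileMatchSketch {
    /** Hypothetical stand-in for a row filter; NOT the HBase Filter interface. */
    interface RowFilter {
        boolean filterRow(String rowKey);                       // true = exclude this row
        default boolean filterAllRemaining() { return false; }  // true = stop the scan
    }

    /** Excludes rows that lack the prefix, but never asks the scan to stop. */
    static RowFilter prefixFilter(String prefix) {
        return rowKey -> !rowKey.startsWith(prefix);
    }

    /** Wraps a filter and requests a stop once the inner filter excludes a row. */
    static RowFilter whileMatch(RowFilter inner) {
        return new RowFilter() {
            private boolean stopped = false;

            @Override public boolean filterRow(String rowKey) {
                boolean excluded = inner.filterRow(rowKey);
                if (excluded) stopped = true;
                return excluded;
            }

            @Override public boolean filterAllRemaining() { return stopped; }
        };
    }

    /** Number of rows examined by the most recent call to scan(). */
    static int rowsVisited;

    /** Walks rows in sorted order from startRow, honouring filterAllRemaining. */
    static List<String> scan(NavigableMap<String, String> table,
                             String startRow, RowFilter filter) {
        rowsVisited = 0;
        List<String> matched = new ArrayList<>();
        for (String rowKey : table.tailMap(startRow, true).keySet()) {
            if (filter.filterAllRemaining()) break;
            rowsVisited++;
            if (!filter.filterRow(rowKey)) matched.add(rowKey);
        }
        return matched;
    }

    /** Made-up sample rows, sorted by key as a table would be. */
    static NavigableMap<String, String> sampleTable() {
        NavigableMap<String, String> table = new TreeMap<>();
        for (String key : new String[] {
                "user0-z", "user1-a", "user1-b", "user2-a", "user2-b", "user3-a"}) {
            table.put(key, "value");
        }
        return table;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = sampleTable();
        System.out.println(scan(table, "user1", prefixFilter("user1"))
                + " visited=" + rowsVisited);
        System.out.println(scan(table, "user1", whileMatch(prefixFilter("user1")))
                + " visited=" + rowsVisited);
    }
}
```

With the sample rows above, both scans return the same two rows, but the
unwrapped filter examines five rows while the wrapped one examines three:
the two matches plus the first row past the prefix.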


Re: real prefix filter

Posted by Jonathan Gray <jl...@streamy.com>.
It should, perhaps, stop once you pass the prefix.  I actually thought 
it did, but you and the code say otherwise.  Doing the early-out with a 
Get is actually not possible, so this may be why it is not implemented 
as such.

However, a Scan can take both a startRow and a stopRow.  So you can use 
that to early-out instead.

Given that filters now work with Gets, you cannot actually implement the 
early-out within the filter.  You'll have to use start/stop rows.  One 
could argue a prefix filter may not make much sense on a Get (since you 
must explicitly specify the row), so if you'd like to raise that issue and 
see if we could integrate an early-out in the filter, please file a JIRA.

JG
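
For the start/stop row approach, the stop row for a given prefix can be
derived from the prefix itself. The following is a minimal sketch,
assuming row keys sort as unsigned bytes in lexicographic order, that the
stop row is exclusive, and that an empty stop row means "scan to the end
of the table". PrefixStopRow and stopRowForPrefix are hypothetical names
used only for illustration.

```java
import java.util.Arrays;

final class PrefixStopRow {
    /**
     * Returns the smallest key that sorts after every key starting with
     * the given prefix, or an empty array if no such key exists.
     */
    static byte[] stopRowForPrefix(byte[] prefix) {
        // Trailing 0xFF bytes cannot be incremented, so drop them first.
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            // Empty prefix or all 0xFF: there is no upper bound.
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // bump the last remaining byte by one
        return stop;
    }

    public static void main(String[] args) {
        byte[] stop = stopRowForPrefix(new byte[] {'a', 'b', 'c'});
        System.out.println(Arrays.toString(stop));
    }
}
```

The prefix itself would then serve as the start row and the computed
value as the stop row of the Scan.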
