Posted to dev@hbase.apache.org by Matus Zamborsky <za...@b-ideas.eu> on 2009/08/20 14:38:53 UTC
real prefix filter
Hello,
I am scanning an HBase table with a Scan and I am using PrefixFilter. As I
understand it, the scan goes through the whole table and runs the filter on
every row. But why does it not stop after reaching the first row without the
desired prefix? Once a row no longer matches the prefix, the filter should
return true from filterAllRemaining.
Combined with the ability to specify the start row in the Scan object, this
would make it very fast to retrieve only the rows with the desired prefix.
I am using HBase 0.20 from trunk.
Regards
Matus Zamborsky
Re: real prefix filter
Posted by Jonathan Gray <jl...@streamy.com>.
Thanks Ryan.
Do note, this will not give you proper behavior on a Get, only on a Scan.
I don't just mean the prefix+while combination; I also mean using
WhileMatchFilter at all on a Get.
Filters are allowed in Gets because you can also filter on columns and
values; filtering on row keys doesn't make sense there. If you need an
"early-out" filter with a Get, you most likely need to use a Scan instead.
This kind of confusion/inconsistency is more reason to re-implement Gets
as optimized Scans...
JG
Ryan Rawson wrote:
> The expected idiom is like so:
> scanSpec.setFilter(new WhileMatchFilter(
>     new PrefixFilter(prefix)));
>
> This is common for most filters: rather than each filter encoding the
> 'stop once past' type of logic itself, that logic is embedded in the
> WhileMatchFilter, and all others are wrapped with it where necessary.
>
> -ryan
Re: real prefix filter
Posted by Ryan Rawson <ry...@gmail.com>.
The expected idiom is like so:
scanSpec.setFilter(new WhileMatchFilter(
    new PrefixFilter(prefix)));
This is common for most filters: rather than each filter encoding the
'stop once past' type of logic itself, that logic is embedded in the
WhileMatchFilter, and all others are wrapped with it where necessary.
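To make the effect of the wrapping concrete, here is a minimal sketch in plain Java that does not use the HBase API at all. It models the table as a sorted set of string row keys and the filter as a predicate; the class WhileMatchSketch, its scan method, and the whileMatch flag are hypothetical names made up for this illustration. The flag stands in for wrapping the filter in a WhileMatchFilter: on the first non-matching row the scan stops, which is roughly what filterAllRemaining returning true achieves. Java string ordering is used in place of HBase's byte ordering, which is a simplification that only holds for ASCII keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;
import java.util.function.Predicate;

// Hypothetical stand-in for a scan over sorted row keys; not HBase code.
final class WhileMatchSketch {

    /** Rows that passed the filter, plus how many rows the scan examined. */
    static final class Result {
        final List<String> rows = new ArrayList<>();
        int examined = 0;
    }

    static Result scan(TreeSet<String> table, String startRow,
                       Predicate<String> filter, boolean whileMatch) {
        Result result = new Result();
        // Start at startRow (inclusive) and walk the keys in sorted order.
        for (String row : table.tailSet(startRow, true)) {
            result.examined++;
            if (filter.test(row)) {
                result.rows.add(row);
            } else if (whileMatch) {
                // First miss ends the scan, as a while-match wrapper would.
                break;
            }
            // Without the wrapper, non-matching rows are skipped one by one.
        }
        return result;
    }

    public static void main(String[] args) {
        TreeSet<String> table = new TreeSet<>(
                Arrays.asList("a1", "b1", "b2", "c1", "c2", "d1"));
        Predicate<String> prefix = row -> row.startsWith("b");

        Result plain = scan(table, "b", prefix, false);
        Result wrapped = scan(table, "b", prefix, true);

        System.out.println("plain:   " + plain.rows + " examined=" + plain.examined);
        System.out.println("wrapped: " + wrapped.rows + " examined=" + wrapped.examined);
    }
}
```

Both runs return the same rows; the wrapped one simply examines fewer of them, and the saving grows with the number of rows that sort after the prefix.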
-ryan
Re: real prefix filter
Posted by Jonathan Gray <jl...@streamy.com>.
It should, perhaps, stop once you pass the prefix. I actually thought
it did, but you and the code say otherwise. Doing the early-out with a
Get is actually not possible, so this may be why it is not implemented
as such.
However, a Scan can take both a startRow and a stopRow. So you can use
that to early-out instead.
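As a rough illustration of the start/stop row approach, the sketch below derives an exclusive stop row from a prefix using only the Java standard library. The class PrefixStopRow and the method stopRowForPrefix are hypothetical names, not part of HBase. It assumes row keys compare as unsigned bytes in lexicographic order, that the stop row is exclusive, and that an empty stop row means "no upper bound". The prefix itself would serve as the start row.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper; not part of the HBase API.
final class PrefixStopRow {

    /**
     * Returns the smallest key that sorts after every key starting with
     * the given prefix, or an empty array if no such key exists.
     */
    static byte[] stopRowForPrefix(byte[] prefix) {
        // Trailing 0xFF bytes cannot be incremented, so drop them first.
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            // Empty or all-0xFF prefix: nothing sorts after it, so scan to the end.
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // bump the last remaining byte by one
        return stop;
    }

    public static void main(String[] args) {
        byte[] prefix = "user123_".getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRowForPrefix(prefix);
        System.out.println("start row: " + Arrays.toString(prefix));
        System.out.println("stop row:  " + Arrays.toString(stop));
    }
}
```

With the prefix as startRow and the computed value as stopRow, the scan is bounded to the matching range without relying on any filter to end it early.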
Given that filters now work with Gets, you cannot actually implement the
early-out within the filter. You'll have to use start/stop rows. One
could argue a prefix filter may not make much sense on a Get (since you
must explicitly specify row), so if you'd like to raise that issue and
see if we could integrate an early-out in the filter, please file a JIRA.
JG