Posted to issues@hbase.apache.org by "James Taylor (JIRA)" <ji...@apache.org> on 2013/03/08 03:50:13 UTC
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

    [ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13596725#comment-13596725 ] 

James Taylor commented on HBASE-6618:
-------------------------------------

Would it be possible to generalize this a bit further to handle variable-length key parts, assuming you know the terminator (both Phoenix and Orderly use a 0-byte terminator)? With the work you've already done here to support ranges, you could support many skip-scan scenarios for multi-part row keys.
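
To make the terminator idea concrete, here is a minimal sketch of a two-part row key: a variable-length text part ended by a 0 byte, followed by a fixed 4-byte int. The class and method names are hypothetical (not HBase, Phoenix or Orderly APIs). It assumes the text never contains the 0 byte and uses a plain big-endian int, which sorts correctly only for non-negative values; the real Phoenix and Orderly encodings may differ in detail.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating a 0-terminated variable-length key part.
final class CompositeKey {
    static final byte TERMINATOR = 0;

    // Encodes [text][int] as: UTF-8 bytes, a 0-byte terminator, then 4 big-endian bytes.
    static byte[] encode(String text, int value) {
        byte[] t = text.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(t.length + 1 + 4); // big-endian by default
        buf.put(t).put(TERMINATOR).putInt(value);
        return buf.array();
    }

    // The first 0 byte ends the variable-length part (assumes the text has no 0 bytes).
    static int terminatorIndex(byte[] rowKey) {
        for (int i = 0; i < rowKey.length; i++) {
            if (rowKey[i] == TERMINATOR) {
                return i;
            }
        }
        throw new IllegalArgumentException("no terminator in row key");
    }

    static String textPart(byte[] rowKey) {
        return new String(rowKey, 0, terminatorIndex(rowKey), StandardCharsets.UTF_8);
    }

    static int intPart(byte[] rowKey) {
        // wrap(array, offset, length) positions the buffer at offset, so getInt() reads there.
        return ByteBuffer.wrap(rowKey, terminatorIndex(rowKey) + 1, 4).getInt();
    }
}
```

Because 0 sorts below every other byte, a key such as 'foo' + {0} + ... orders before 'foobar' + {0} + ..., so the terminator preserves the text part's sort order under HBase's unsigned byte comparison.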

Take, for example, a fuzzy row filter expressed for a two-part row key of VARCHAR + byte[4], like this:

    'foo%'       (this would be for the VARCHAR key part: 'foo' followed by zero or more characters)
    [4000-6000)  (this would be for the INT key part: 4000 inclusive to 6000 exclusive)
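
A sketch of how a filter might test one row key against those two key-part constraints. The `KeyPartConstraints` class is hypothetical; it assumes the row key layout is UTF-8 text, a 0 terminator, then a 4-byte big-endian int, and it treats 'foo%' as a plain prefix test.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical constraint pair (not an HBase class): text LIKE 'prefix%'
// on the variable-length part and lower <= value < upper on the int part.
final class KeyPartConstraints {
    private final String prefix;
    private final int lower; // inclusive
    private final int upper; // exclusive

    KeyPartConstraints(String prefix, int lower, int upper) {
        this.prefix = prefix;
        this.lower = lower;
        this.upper = upper;
    }

    // Builds a row key as <UTF-8 text><0x00><4-byte big-endian int>.
    static byte[] rowKey(String text, int value) {
        byte[] t = text.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(t.length + 5).put(t).put((byte) 0).putInt(value).array();
    }

    boolean matches(byte[] rowKey) {
        int term = 0;
        while (term < rowKey.length && rowKey[term] != 0) {
            term++; // scan for the terminator that ends the text part
        }
        if (term + 5 != rowKey.length) {
            return false; // no terminator, or the int part is not exactly 4 bytes
        }
        String text = new String(rowKey, 0, term, StandardCharsets.UTF_8);
        int value = ByteBuffer.wrap(rowKey, term + 1, 4).getInt();
        return text.startsWith(prefix) && value >= lower && value < upper;
    }
}
```

A matching check alone only lets the filter include or exclude rows one at a time; the skip hints described next are what let the scan jump over non-matching stretches.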
    
In this case (as you've already pointed out), you can use the first row key as your guide. Let's say the first row key is ['foobar'][1000]. You could form a skip hint of ['foobar'][4000] (i.e. 'foobar' + new byte[] {0} + Bytes.toBytes(4000), the 4-byte big-endian encoding of 4000).
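Then you'd let all rows pass until you reached ['foobar'][6000] (or the first row past it), at which point you'd form your next skip hint, this time jumping past the remaining 'foobar' rows to the next VARCHAR value.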
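
The two hint cases above, plus the cases where the VARCHAR part sorts before or after every 'foo%' value, can be sketched as one function. This is a hypothetical helper, not the patch attached to this issue; it assumes the layout <text bytes><0x00><4-byte big-endian int>, that the text never contains a 0 byte, and that keys are compared as unsigned bytes. When the int part is past the range, it seeks to the text bytes plus {1}, which sorts after every row sharing the current text because a text byte can never be 0.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical skip-scan helper for: text LIKE 'prefix%' AND lower <= int < upper.
final class PrefixRangeSkipHint {
    private final byte[] prefix;
    private final int lower; // inclusive
    private final int upper; // exclusive

    PrefixRangeSkipHint(String prefix, int lower, int upper) {
        this.prefix = prefix.getBytes(StandardCharsets.UTF_8);
        this.lower = lower;
        this.upper = upper;
    }

    static byte[] rowKey(byte[] text, int value) {
        return ByteBuffer.allocate(text.length + 5).put(text).put((byte) 0).putInt(value).array();
    }

    // Returns rowKey itself if it matches, a greater key to seek to if it does not,
    // or null if no later row can match (the scan can stop). Assumes a well-formed key.
    byte[] nextHint(byte[] rowKey) {
        int term = 0;
        while (rowKey[term] != 0) {
            term++;
        }
        byte[] text = Arrays.copyOf(rowKey, term);
        int value = ByteBuffer.wrap(rowKey, term + 1, 4).getInt();

        if (startsWith(text, prefix)) {
            if (value < lower) {
                return rowKey(text, lower);   // e.g. ['foobar'][1000] -> ['foobar'][4000]
            }
            if (value >= upper) {
                byte[] next = Arrays.copyOf(text, term + 1); // pads with a 0 byte
                next[term] = 1;               // smallest key after every row with this text
                return next;
            }
            return rowKey;                    // in range: let the row pass
        }
        if (compareUnsigned(text, prefix) < 0) {
            return rowKey(prefix, lower);     // jump to the first possible 'foo%' row
        }
        return null;                          // past every 'foo%' row
    }

    private static boolean startsWith(byte[] a, byte[] p) {
        return a.length >= p.length && Arrays.equals(Arrays.copyOf(a, p.length), p);
    }

    private static int compareUnsigned(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }
}
```

Every hint returned is strictly greater than the current row key, so a scanner seeking to it always makes progress.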

> Implement FuzzyRowFilter with ranges support
> --------------------------------------------
>
>                 Key: HBASE-6618
>                 URL: https://issues.apache.org/jira/browse/HBASE-6618
>             Project: HBase
>          Issue Type: New Feature
>          Components: Filters
>            Reporter: Alex Baranau
>            Assignee: Alex Baranau
>            Priority: Minor
>         Attachments: HBASE-6618_2.path, HBASE-6618_3.path, HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch
>
>
> Apart from the current ability to specify a fuzzy row filter, e.g. ????_0004 for the <userId_actionId> format (where 0004 is the actionId), it would be great to also be able to specify a "fuzzy range", e.g. ????_0004, ..., ????_0099.
> See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65
> Note: it is currently possible to provide multiple fuzzy row rules to the existing FuzzyRowFilter, but when the range is big (contains thousands of values) this is not efficient.
> The filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from a regex row filter).
> While such functionality may seem like a proper fit for a custom filter (i.e. one not included in the standard filter set), it looks like the filter may be very reusable. We can judge based on the implementation that will hopefully be added.
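
A minimal sketch of a single "fuzzy range" rule with fast-forwarding, for the <userId_actionId> example. It is hypothetical (not the code attached to this issue) and handles only the simple layout where the first fuzzyLen bytes are fuzzy and the remaining fixed-length bytes must fall in an inclusive range such as "_0004" to "_0099". Fixed-width digit strings compare the same lexicographically as numerically, so unsigned byte comparison is enough here.

```java
import java.util.Arrays;

// Hypothetical sketch: fixed-length keys of <fuzzy prefix><suffix in [lower, upper]>.
final class FuzzyRangeFastForward {
    private final int fuzzyLen;
    private final byte[] lower; // inclusive
    private final byte[] upper; // inclusive

    FuzzyRangeFastForward(int fuzzyLen, byte[] lower, byte[] upper) {
        this.fuzzyLen = fuzzyLen;
        this.lower = lower.clone();
        this.upper = upper.clone();
    }

    // Returns row itself if it matches, otherwise the smallest matching key greater
    // than row (the seek hint), or null if none exists.
    // Assumes row.length == fuzzyLen + lower.length.
    byte[] nextHint(byte[] row) {
        byte[] prefix = Arrays.copyOf(row, fuzzyLen);
        byte[] suffix = Arrays.copyOfRange(row, fuzzyLen, row.length);
        if (compareUnsigned(suffix, lower) < 0) {
            return concat(prefix, lower);     // same fuzzy prefix, jump to range start
        }
        if (compareUnsigned(suffix, upper) <= 0) {
            return row;                       // in range: let the row pass
        }
        // Past the range for this prefix: add 1 to the prefix as an unsigned
        // big-endian number, then jump to the range start under the new prefix.
        for (int i = fuzzyLen - 1; i >= 0; i--) {
            if (prefix[i] != (byte) 0xFF) {
                prefix[i]++;
                return concat(prefix, lower);
            }
            prefix[i] = 0;                    // carry into the next byte to the left
        }
        return null;                          // prefix was all 0xFF: the scan can stop
    }

    private static int compareUnsigned(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    private static byte[] concat(byte[] a, byte[] b) {
        byte[] out = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }
}
```

One such rule covers ????_0004 through ????_0099, where the existing filter would need 96 separate rules (one per actionId). Interleaving fuzzy and fixed bytes arbitrarily, as a general mask allows, needs more careful bookkeeping than this prefix/suffix split.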
