Posted to user@hbase.apache.org by Liam Slusser <ls...@gmail.com> on 2014/04/25 03:24:08 UTC

FuzzyRowFilter weird results

Hey All -

I'm having some strange results using FuzzyRowFilter.  I'm programming in
jython for that extra bit of adventure.

My HBase key looks something like [random 10 bytes][servicetype
12 bytes][timestamp 10 bytes] = 32 bytes total.  An example key:
e23d4ac4b90002000100011398388474

So the following code will find the above key:

filter = FuzzyRowFilter([ Pair(array('b',
"e\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00"),
array('b',
[0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]))])

But I'm only able to match at the beginning of the key, never the middle or
at the end.

This will not find the above key:

filter = FuzzyRowFilter([ Pair(array('b',
"e\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x004"),
array('b',
[0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0]))])

Am I doing something wrong?  Is there a better way to search for keys?
 Really I'm going to want to search on the 12-byte service-type.
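
For the service-type search, here is a rough sketch of how the key and mask could be built. It assumes the fixed 10/12/10 layout described above and the mask convention used in the filters above (0 = byte must match, 1 = byte may be anything). The names service_type_pattern and fuzzy_match are made up for illustration, and fuzzy_match is only a plain-Python stand-in for the mask semantics, not HBase's implementation.

```python
# Key layout from the post: [random 10][servicetype 12][timestamp 10] = 32 bytes.
RANDOM_LEN, SERVICE_LEN, TIMESTAMP_LEN = 10, 12, 10


def service_type_pattern(service_type):
    """Return (key, mask) that fixes only the 12-byte service-type segment."""
    if len(service_type) != SERVICE_LEN:
        raise ValueError("service type must be %d bytes" % SERVICE_LEN)
    # "\x00" is a single NUL byte, used as filler at the "don't care" positions.
    key = "\x00" * RANDOM_LEN + service_type + "\x00" * TIMESTAMP_LEN
    # Mask: 0 = byte must match, 1 = byte may be anything.
    mask = [1] * RANDOM_LEN + [0] * SERVICE_LEN + [1] * TIMESTAMP_LEN
    return key, mask


def fuzzy_match(row, key, mask):
    """Illustration only: assumes fixed-length keys, not HBase's actual code."""
    return len(row) == len(key) and all(
        m == 1 or r == k for r, k, m in zip(row, key, mask))


key, mask = service_type_pattern("000200010001")
print(fuzzy_match("e23d4ac4b90002000100011398388474", key, mask))
```

In Jython the pair would presumably be passed the same way as in the filters above, i.e. Pair(array('b', key), array('b', mask)).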

Here is the full jython code:

from array import array
from org.apache.hadoop.hbase.util import Pair
from org.apache.hadoop.hbase import HBaseConfiguration
from org.apache.hadoop.hbase.client import HBaseAdmin, HTable, Scan
from org.apache.hadoop.hbase.filter import FuzzyRowFilter

conf = HBaseConfiguration()
filter = FuzzyRowFilter([ Pair(array('b',
"e\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00"),
array('b',
[0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1])) ])

scan = Scan()
scan.setFilter(filter)
table = HTable(conf,'mytable')
s = table.getScanner(scan)

while True:
    r = s.next()
    if not r:
        break
    else:
        print r

Re: FuzzyRowFilter weird results

Posted by Liam Slusser <ls...@gmail.com>.
I've figured out my problem: in Python (or Jython, as this is), you don't
need to escape the \.  So what is written as \\x00 in Java is just \x00 in
Jython/Python.  Oops!  Basically I was adding a whole bunch of extra bytes
for the \ that shouldn't be there, so the fixed bytes after the first
position never lined up with the key.
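
A quick way to see the difference in plain Python. This is a sketch that assumes the two patterns from the original post contain 31 and 30 escapes respectively, as their 32-entry masks suggest, and that the filter compares pattern and key position by position:

```python
# Doubly escaped, as in the original post: each "\\x00" is four characters.
escaped_twice = "e" + "\\x00" * 31
# Singly escaped: each "\x00" is one NUL character.
escaped_once = "e" + "\x00" * 31

print(len(escaped_twice))  # 125
print(len(escaped_once))   # 32

# The non-matching pattern from the post, rebuilt the same way.
bad = "e" + "\\x00" * 30 + "4"
print(repr(bad[31]))  # '0', not the intended '4'
```

With the extra backslash, position 31 of the pattern holds a stray '0' from one of the escapes instead of the '4' that was meant to line up with the last byte of the key.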

thanks!
liam




Re: FuzzyRowFilter weird results

Posted by Liam Slusser <ls...@gmail.com>.
I'm running CDH4.6.0 with HBase 0.94.15-cdh4.6.0.  I was wondering, does
the key need to be serialized?  Currently my keys are strings, not raw
bytes.

thanks,
liam



Re: FuzzyRowFilter weird results

Posted by Ted Yu <yu...@gmail.com>.
Which HBase version are you using?

Cheers

