Posted to user@hbase.apache.org by Chris Bates <ch...@gmail.com> on 2010/01/24 04:49:53 UTC

help with filters

Hi all,

I'm trying to do an AND operation and I'm not sure if I did the filtering
correctly because HBase is hanging on me.

What I want is this:

I have two qualifiers, theme and IP, to my column user.  I'd like to print
out all matches (or maybe just 10) where the row has both of them in it.  My
impression is that this is what HBase would excel at, because the dataset is
VERY sparse, meaning that out of 1000-10,000 rows, maybe just 1 or 2 will
have BOTH an IP and a theme in it.  Most of the time it's just one or the
other.

So this is my code to make that query, but as I said, it's hanging.
http://pastebin.com/m7fcef49

If I comment out the filters, the query runs just fine and will print null
wherever the value is not present.

Re: help with filters

Posted by Stack <st...@duboce.net>.
On Tue, Jan 26, 2010 at 5:24 PM, Chris Bates
<ch...@gmail.com> wrote:
> So are you saying that MUST_PASS_ALL might be flawed (for a FilterList of
> two QualifierFilters)?  If so, I can dig into the source and see if I can
> find anything.

Yes.  I'm not up to much on filters.  My experience is no one cares
about them much, not until they need them.  The unit tests around
filters are pretty good because of past experience where fellas would
put them together in exotic combos triggering not-so-exotic bugs.
There is a must pass all test so it basically works (it seems).  I'm
just saying, something about your particular combination is triggering
either a bug -- most likely -- or your expectation is off (unlikely).

I'm suggesting you put up a little harness and try and figure what's
up, if you have the time.  Make a patch if you can (sounds like you
have one patch in you, an improvement to the qualifiers filter so
other fellas don't pass full column name as you did).

...
>
> I haven't actually run the JUnit tests because I haven't dealt with JUnit
> before

It's not too tough. On command line:

> ant test

That'll run them all.

You are likely interested in one test only so do this:

> ant test -Dtestcase=TESTNAME

For example, to run the unit test org.apache.hadoop.hbase.TestHBase, you do:

> ant test -Dtestcase=TestHBase

.. you pass the class name only minus the package.

St.Ack



> ... but if you suggest I run those I can do that as well. I was hoping
> someone could submit a working implementation of MPALL with 2
> QualifierFilters. I thought that might have been a pretty common operation.

Re: help with filters

Posted by Chris Bates <ch...@gmail.com>.
So are you saying that MUST_PASS_ALL might be flawed (for a FilterList of
two QualifierFilters)?  If so, I can dig into the source and see if I can
find anything.

Or are you saying that my data profile is wrong? If so, can you (or someone
else) suggest one that works?

I tried this:
hbase(main):032:0> scan 'testTable3'
ROW                          COLUMN+CELL

 row1                        column=col1:qualifier-1,
timestamp=1264554774915, value=some_col1_qual1_value
 row1                        column=col1:qualifier-2,
timestamp=1264554866041, value=some_col1_qual2_value

And that doesn't work with the two QualifierFilters.


I haven't actually run the JUnit tests because I haven't dealt with JUnit
before, but if you suggest I run those I can do that as well. I was hoping
someone could submit a working implementation of MPALL with 2
QualifierFilters. I thought that might have been a pretty common operation.


On Tue, Jan 26, 2010 at 8:01 PM, Stack <st...@duboce.net> wrote:

> On Tue, Jan 26, 2010 at 4:51 PM, Chris Bates
> <ch...@gmail.com> wrote:
> >
>
> Must pass all "works" because there's a unit test that asserts so?
> I'm not sure what it is about your data profile that is messing with
> this functionality.  Its something involved where my guess is the only
> way to figure it is to set up some kinda harness and step through the
> debugger.  Any chance of your having a go at that Chris?
>
> Thanks,
> St.Ack

Re: help with filters

Posted by Stack <st...@duboce.net>.
On Tue, Jan 26, 2010 at 4:51 PM, Chris Bates
<ch...@gmail.com> wrote:
>

Must pass all "works" because there's a unit test that asserts so?
I'm not sure what it is about your data profile that is messing with
this functionality.  It's something involved where my guess is the only
way to figure it is to set up some kinda harness and step through the
debugger.  Any chance of your having a go at that Chris?

Thanks,
St.Ack


> Second, I'm still not able to get the AND operation working.
>
> To illustrate:
>
> hbase(main):010:0> scan 'testTable', {COLUMNS=>["user:theme",
> "user:REMOTE_ADDR"]}
> ROW                          COLUMN+CELL
>
>  row1                        column=user:REMOTE_ADDR,
> timestamp=1264464021672, value=172.16.1.3
>  row1                        column=user:theme, timestamp=1264464041857,
> value=Frost
>  row2                        column=user:theme, timestamp=1264464058064,
> value=Sunshine
>  row3                        column=user:REMOTE_ADDR,
> timestamp=1264464083332, value=172.16.0.06
>
> With MUST_PASS_ALL enabled...
>
> If I comment out the REMOTE_ADDR filter, I get:
> IP: null Theme: Frost
> IP: null Theme: Sunshine
>
> If I comment out the theme filter, I get the reverse.
> IP: 172.16.1.3 Theme: null
> IP: 172.16.0.06 Theme: null
>
> If I leave both in, I get __nothing__, when I want:
> IP: 172.16.1.3 Theme: Frost
>
> I thought this might be due to HBase not being able to do an AND operation
> on Qualifiers of the same column, so I created another testTable2 with two
> different columns:
>
> hbase(main):024:0> scan 'testTable2'
> ROW                          COLUMN+CELL
>
>  row1                        column=addr:REMOTE_ADDR,
> timestamp=1264552425218, value=172.16.1.3
>  row1                        column=user:theme, timestamp=1264552375737,
> value=Frost
>  row2                        column=user:theme, timestamp=1264552505491,
> value=Sunshine
>  row3                        column=addr:REMOTE_ADDR,
> timestamp=1264552538651, value=172.16.0.36
>
> But nothing changed.
>
>
> Any other thoughts?  The only solution I can see to get this done is to
> implement a row counter for each column+qualifier and then store the results
> that meet criteria that I expect, but I was hoping a native filter would do
> the job.

Re: help with filters

Posted by Chris Bates <ch...@gmail.com>.
So there were a couple of points of confusion: one that I was able to solve,
the other I'm still struggling with.

First, after looking at the TestFilter and TestFilterList code (thanks for
pointing that out), the QualifierFilter seems like it should only take the
column qualifier in the BinaryComparator.  I was previously passing
"user:theme" when it should have just been "theme".  Filters worked after
this change.
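
To illustrate the first point, here is a minimal plain-Java sketch (not HBase code) of an exact byte comparison against a cell's qualifier. The class and method names are hypothetical, and it assumes the comparator only ever sees the qualifier bytes, never the family prefix, which is what the fix above suggests.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for an equality check on a cell's qualifier bytes.
// This is a model for illustration, not the HBase QualifierFilter itself.
final class QualifierMatchSketch {

    // Returns true only when the wanted bytes equal the cell's qualifier exactly.
    static boolean qualifierEquals(String cellQualifier, String wanted) {
        byte[] a = cellQualifier.getBytes(StandardCharsets.UTF_8);
        byte[] b = wanted.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(a, b);
    }

    public static void main(String[] args) {
        // The cell user:theme has family "user" and qualifier "theme".
        System.out.println(qualifierEquals("theme", "theme"));      // true
        System.out.println(qualifierEquals("theme", "user:theme")); // false
    }
}
```

Under that assumption, passing "user:theme" can never match, because the family prefix is not part of the bytes being compared.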

Second, I'm still not able to get the AND operation working.

To illustrate:

hbase(main):010:0> scan 'testTable', {COLUMNS=>["user:theme",
"user:REMOTE_ADDR"]}
ROW                          COLUMN+CELL

 row1                        column=user:REMOTE_ADDR,
timestamp=1264464021672, value=172.16.1.3
 row1                        column=user:theme, timestamp=1264464041857,
value=Frost
 row2                        column=user:theme, timestamp=1264464058064,
value=Sunshine
 row3                        column=user:REMOTE_ADDR,
timestamp=1264464083332, value=172.16.0.06

With MUST_PASS_ALL enabled...

If I comment out the REMOTE_ADDR filter, I get:
IP: null Theme: Frost
IP: null Theme: Sunshine

If I comment out the theme filter, I get the reverse.
IP: 172.16.1.3 Theme: null
IP: 172.16.0.06 Theme: null

If I leave both in, I get __nothing__, when I want:
IP: 172.16.1.3 Theme: Frost
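
One possible reading of this result, which the thread does not confirm: if every filter in the list is evaluated against one cell at a time, then no single cell can have a qualifier equal to both theme and REMOTE_ADDR, so every cell is dropped. The sketch below models that assumption in plain Java. The class and method names are hypothetical and this is not the FilterList implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of "all filters must pass" applied cell by cell.
final class PerCellAndSketch {

    // Keeps only the qualifiers that every predicate accepts.
    static List<String> keepIfAllPass(List<String> qualifiers,
                                      List<Predicate<String>> filters) {
        List<String> kept = new ArrayList<>();
        for (String q : qualifiers) {
            boolean all = true;
            for (Predicate<String> f : filters) {
                if (!f.test(q)) {
                    all = false;
                    break;
                }
            }
            if (all) {
                kept.add(q);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // row1 from testTable has two cells in the same family.
        List<String> row1 = Arrays.asList("REMOTE_ADDR", "theme");
        Predicate<String> isTheme = "theme"::equals;
        Predicate<String> isAddr = "REMOTE_ADDR"::equals;

        // One filter: the matching cell survives.
        System.out.println(keepIfAllPass(row1, Collections.singletonList(isTheme))); // [theme]
        // Both filters: no cell can satisfy both, so nothing survives.
        System.out.println(keepIfAllPass(row1, Arrays.asList(isTheme, isAddr)));     // []
    }
}
```

If that model is right, it would also explain why moving REMOTE_ADDR to a different column family changes nothing, since the per-cell comparison is still on the qualifier.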

I thought this might be due to HBase not being able to do an AND operation
on Qualifiers of the same column, so I created another testTable2 with two
different columns:

hbase(main):024:0> scan 'testTable2'
ROW                          COLUMN+CELL

 row1                        column=addr:REMOTE_ADDR,
timestamp=1264552425218, value=172.16.1.3
 row1                        column=user:theme, timestamp=1264552375737,
value=Frost
 row2                        column=user:theme, timestamp=1264552505491,
value=Sunshine
 row3                        column=addr:REMOTE_ADDR,
timestamp=1264552538651, value=172.16.0.36

But nothing changed.


Any other thoughts?  The only solution I can see to get this done is to
implement a row counter for each column+qualifier and then store the results
that meet criteria that I expect, but I was hoping a native filter would do
the job.
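
The client-side fallback described above could look roughly like the following sketch. It is plain Java over an in-memory map that stands in for scan results; the class, method, and map layout are hypothetical, and a real version would read the cells from an HBase scanner instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical client-side check: keep rows that carry both columns.
final class BothColumnsSketch {

    // rows maps row key -> (column name -> value), standing in for scan results.
    static List<String> rowsWithBoth(Map<String, Map<String, String>> rows,
                                     String ipColumn, String themeColumn) {
        List<String> out = new ArrayList<>();
        for (Map<String, String> cells : rows.values()) {
            if (cells.containsKey(ipColumn) && cells.containsKey(themeColumn)) {
                out.add("IP: " + cells.get(ipColumn) + " Theme: " + cells.get(themeColumn));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Same data as the testTable scan shown earlier in this message.
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        Map<String, String> row1 = new HashMap<>();
        row1.put("user:REMOTE_ADDR", "172.16.1.3");
        row1.put("user:theme", "Frost");
        Map<String, String> row2 = new HashMap<>();
        row2.put("user:theme", "Sunshine");
        Map<String, String> row3 = new HashMap<>();
        row3.put("user:REMOTE_ADDR", "172.16.0.06");
        rows.put("row1", row1);
        rows.put("row2", row2);
        rows.put("row3", row3);

        // Prints [IP: 172.16.1.3 Theme: Frost]
        System.out.println(rowsWithBoth(rows, "user:REMOTE_ADDR", "user:theme"));
    }
}
```

The trade-off is that every candidate row still travels to the client, which is the cost a server-side filter was meant to avoid.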


On Mon, Jan 25, 2010 at 8:43 PM, Stack <st...@duboce.net> wrote:

> See the TestFilterList under unit tests, src/test.  Can you mess
> around with it using your data and see if it tells you anything?
> There's a testMPALL in there.   Might give you a clue (Your code looks
> fine)
>
> St.Ack

Re: help with filters

Posted by Stack <st...@duboce.net>.
See the TestFilterList under unit tests, src/test.  Can you mess
around with it using your data and see if it tells you anything?
There's a testMPALL in there.   Might give you a clue (Your code looks
fine)

St.Ack

On Mon, Jan 25, 2010 at 4:25 PM, Chris Bates
<ch...@gmail.com> wrote:
> thanks stack. i upgraded to the RC3 0.20.3.
>
> I was still getting the hanging, so I decided to create a real simple table
> to try to see if I can get the logic working:
>
> hbase(main):031:0> scan 'testTable'
> ROW                          COLUMN+CELL
>
>  row1                        column=user:REMOTE_ADDR,
> timestamp=1264464021672, value=172.16.1.3
>  row1                        column=user:theme, timestamp=1264464041857,
> value=Frost
>  row2                        column=user:theme, timestamp=1264464058064,
> value=Sunshine
>  row3                        column=user:REMOTE_ADDR,
> timestamp=1264464083332, value=172.16.0.06
>
> Without the filter (http://pastebin.com/m20ba0d2d) this is my output
> client-side:
> IP: 172.16.1.3
> Theme: Frost
> IP: null
> Theme: Sunshine
> IP: 172.16.0.06
> Theme: null
>
> If I uncomment the setFilter, I get nothing.  I'm expecting to get the first
> two lines (row1).  Thus I don't believe my filters are setup correctly, but
> I'm unsure where the error would be.
>
> Does anyone have any thoughts or examples?
>
> Thanks!

Re: help with filters

Posted by Chris Bates <ch...@gmail.com>.
Thanks Stack. I upgraded to 0.20.3 RC3.

I was still getting the hanging, so I decided to create a real simple table
to try to see if I can get the logic working:

hbase(main):031:0> scan 'testTable'
ROW                          COLUMN+CELL

 row1                        column=user:REMOTE_ADDR,
timestamp=1264464021672, value=172.16.1.3
 row1                        column=user:theme, timestamp=1264464041857,
value=Frost
 row2                        column=user:theme, timestamp=1264464058064,
value=Sunshine
 row3                        column=user:REMOTE_ADDR,
timestamp=1264464083332, value=172.16.0.06

Without the filter (http://pastebin.com/m20ba0d2d) this is my output
client-side:
IP: 172.16.1.3
Theme: Frost
IP: null
Theme: Sunshine
IP: 172.16.0.06
Theme: null

If I uncomment the setFilter, I get nothing.  I'm expecting to get the first
two lines (row1).  Thus I don't believe my filters are set up correctly, but
I'm unsure where the error would be.

Does anyone have any thoughts or examples?

Thanks!


On Mon, Jan 25, 2010 at 1:45 PM, Stack <st...@duboce.net> wrote:

> Check out the CHANGES in 0.20.2 and even in 0.20.3RC3:
>
> http://svn.apache.org/viewvc/hadoop/hbase/branches/0.20/CHANGES.txt?view=log
> .
>  I believe what your issue fixed.
> St.Ack

Re: help with filters

Posted by Stack <st...@duboce.net>.
Check out the CHANGES in 0.20.2 and even in 0.20.3RC3:
http://svn.apache.org/viewvc/hadoop/hbase/branches/0.20/CHANGES.txt?view=log
I believe your issue is fixed there.
St.Ack

On Mon, Jan 25, 2010 at 10:36 AM, Chris Bates
<ch...@gmail.com> wrote:
> 0.20.1

Re: help with filters

Posted by Chris Bates <ch...@gmail.com>.
0.20.1

On Mon, Jan 25, 2010 at 1:31 PM, Stack <st...@duboce.net> wrote:

> What version of HBase?
> St.Ack

Re: help with filters

Posted by Stack <st...@duboce.net>.
What version of HBase?
St.Ack

On Sat, Jan 23, 2010 at 7:49 PM, Chris Bates
<ch...@gmail.com> wrote:
> Hi all,
>
> I'm trying to do an AND operation and I'm not sure if I did the filtering
> correctly because HBase is hanging on me.
>
> What I want is this:
>
> I have two qualifiers, theme and IP, to my column user.  I'd like to print
> out all matches (or maybe just 10) where the row has both of them in it.  My
> impression is that this is what HBase would excel at, because the dataset is
> VERY sparse, meaning that out of 1000-10,000 rows, maybe just 1 or 2 will
> have BOTH an IP and a theme in it.  Most of the time its just one or the
> other.
>
> So this is my code to make that query, but as I said, its hanging.
> http://pastebin.com/m7fcef49
>
> If I comment out the filters, the query runs just fine and will print null
> wherever the value is not present.
>