Posted to user@hbase.apache.org by Shahab Yunus <sh...@gmail.com> on 2014/11/18 06:22:50 UTC

Hierarchy of filters and filters list

Hi,

I have data where each row has a start and end time stored in UTC (long). The
table is created through Phoenix and the columns have type UNSIGNED_DATE
(which according to the Phoenix docs
<http://phoenix.apache.org/language/datatypes.html#unsigned_date_type> uses
HBase's Bytes.toBytes(long) underneath, giving an 8-byte long). I am storing
data in this table using the regular Bytes.toBytes from the HBase API as well.
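
As background for the range filters discussed in this thread, a byte-wise comparator only gives numeric ordering if the stored bytes sort the same way as the numbers. The sketch below is plain Java with no HBase dependency; `encode` is a stand-in that assumes Bytes.toBytes(long) yields 8 big-endian bytes, and the class name, helper names, and sample timestamps are hypothetical.

```java
import java.nio.ByteBuffer;

class LongByteOrder {
    // Stand-in for the assumed encoding: 8 bytes, big-endian
    // (ByteBuffer writes big-endian by default).
    static byte[] encode(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // Lexicographic comparison that treats every byte as unsigned.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        long sd = 1416268800000L; // hypothetical search start (epoch millis)
        long s = 1416290570000L;  // hypothetical row start (epoch millis)

        // For non-negative values, byte order agrees with numeric order.
        System.out.println(compareUnsigned(encode(sd), encode(s)) < 0);

        // A negative long has its top bit set, so it sorts after every
        // non-negative value when compared byte-wise.
        System.out.println(compareUnsigned(encode(-1L), encode(0L)) > 0);
    }
}
```

Under these assumptions, byte-wise range comparisons on the start and end columns behave like numeric comparisons as long as every stored timestamp is non-negative.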

Now I want to query data given a time range, and get all rows lying within
or overlapping the search range. Pretty standard scenario.

For this I create a set of FilterLists. A hierarchy of FilterLists and
filters, in fact.

The search criteria time range is denoted by *sd* and *ed*.

Each row's date columns are denoted as *s* and *e* (signifying the start
and end datetimes).

These 4 filterLists are created as per the logic given below:

filterListLeft (must pass all) contains 2 filters: (sd <= s and ed >= s)

filterListRight (must pass all) contains 2 filters: (sd <= e and ed >= e)

filterListOverlap (must pass all) contains 2 filters: (sd <= s and ed >= e)

filterListWithin (must pass all) contains 2 filters: (sd >= s and ed <= e)
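
The four conditions above can be written out as plain boolean predicates. The sketch below (hypothetical class and method names, no HBase involved) also checks by brute force that, assuming s <= e and sd <= ed, accepting a row when any of the four holds is the same as the two-comparison overlap test s <= ed and e >= sd. It illustrates the logic only and does not explain the scan behaviour described in this post.

```java
class IntervalOverlap {
    // The four conditions from the post; s/e are the row's start and end,
    // sd/ed are the bounds of the search range.
    static boolean left(long s, long e, long sd, long ed) {
        return sd <= s && ed >= s;
    }

    static boolean right(long s, long e, long sd, long ed) {
        return sd <= e && ed >= e;
    }

    static boolean overlap(long s, long e, long sd, long ed) {
        return sd <= s && ed >= e;
    }

    static boolean within(long s, long e, long sd, long ed) {
        return sd >= s && ed <= e;
    }

    // "Must pass one" over the four "must pass all" groups.
    static boolean anyOfFour(long s, long e, long sd, long ed) {
        return left(s, e, sd, ed) || right(s, e, sd, ed)
            || overlap(s, e, sd, ed) || within(s, e, sd, ed);
    }

    // Standard test for two closed intervals sharing at least one point.
    static boolean twoComparisons(long s, long e, long sd, long ed) {
        return s <= ed && e >= sd;
    }

    // Exhaustively compares both formulations on a small grid,
    // restricted to well-formed intervals (s <= e and sd <= ed).
    static boolean agreeOnGrid(int max) {
        for (long s = 0; s <= max; s++) {
            for (long e = s; e <= max; e++) {
                for (long sd = 0; sd <= max; sd++) {
                    for (long ed = sd; ed <= max; ed++) {
                        if (anyOfFour(s, e, sd, ed) != twoComparisons(s, e, sd, ed)) {
                            return false;
                        }
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreeOnGrid(8));
    }
}
```

If the equivalence is acceptable for the data at hand, a single must-pass-all list with two filters (s <= ed and e >= sd) would express the same query with a flatter hierarchy.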


Then I add these 4 filterLists to another filterList, which is must pass
one. I realize that some records might satisfy more than one of the filter
lists above, but that is OK.

parentFilterList = new FilterList(must pass one);
parentFilterList.addFilter(filterListLeft);
parentFilterList.addFilter(filterListRight);
parentFilterList.addFilter(filterListOverlap);
parentFilterList.addFilter(filterListWithin);

Note that all filters have setFilterIfMissing = true.

Then I pass parentFilterList to the scanner.

So it is like: (A and B) or (B and C) or (D and E) or (F and G)
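
One way to pin down which rows the hierarchy should return is a small reference model that can be run against sample rows and compared with the actual scan output. The sketch below is plain Java and is not HBase's FilterList or SingleColumnValueFilter: it models only the boolean structure (must pass all, must pass one, and "a row without the column fails"), and every name in it, including the column names "s" and "e", is hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class FilterHierarchyModel {
    enum Op { LESS_OR_EQUAL, GREATER_OR_EQUAL }

    // Assumed storage format: 8-byte big-endian long.
    static byte[] enc(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // Byte-wise comparison treating bytes as unsigned.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // Leaf filter: "column OP constant". A row lacking the column fails,
    // which models filterIfMissing = true.
    static Predicate<Map<String, byte[]>> column(String name, Op op, long constant) {
        byte[] c = enc(constant);
        return cells -> {
            byte[] v = cells.get(name);
            if (v == null) {
                return false;
            }
            int r = compareBytes(v, c);
            return op == Op.LESS_OR_EQUAL ? r <= 0 : r >= 0;
        };
    }

    static <T> Predicate<T> mustPassAll(List<Predicate<T>> filters) {
        return t -> filters.stream().allMatch(f -> f.test(t));
    }

    static <T> Predicate<T> mustPassOne(List<Predicate<T>> filters) {
        return t -> filters.stream().anyMatch(f -> f.test(t));
    }

    // The hierarchy from the post: four must-pass-all pairs under one must-pass-one.
    static Predicate<Map<String, byte[]>> parent(long sd, long ed) {
        Predicate<Map<String, byte[]>> left = mustPassAll(Arrays.asList(
            column("s", Op.GREATER_OR_EQUAL, sd), column("s", Op.LESS_OR_EQUAL, ed)));
        Predicate<Map<String, byte[]>> right = mustPassAll(Arrays.asList(
            column("e", Op.GREATER_OR_EQUAL, sd), column("e", Op.LESS_OR_EQUAL, ed)));
        Predicate<Map<String, byte[]>> overlap = mustPassAll(Arrays.asList(
            column("s", Op.GREATER_OR_EQUAL, sd), column("e", Op.LESS_OR_EQUAL, ed)));
        Predicate<Map<String, byte[]>> within = mustPassAll(Arrays.asList(
            column("s", Op.LESS_OR_EQUAL, sd), column("e", Op.GREATER_OR_EQUAL, ed)));
        return mustPassOne(Arrays.asList(left, right, overlap, within));
    }

    static Map<String, byte[]> row(long s, long e) {
        Map<String, byte[]> cells = new HashMap<>();
        cells.put("s", enc(s));
        cells.put("e", enc(e));
        return cells;
    }

    public static void main(String[] args) {
        Predicate<Map<String, byte[]>> p = parent(10, 20);
        System.out.println("starts inside range: " + p.test(row(12, 30)));
        System.out.println("ends inside range:   " + p.test(row(5, 15)));
        System.out.println("spans whole range:   " + p.test(row(5, 30)));
        System.out.println("disjoint:            " + p.test(row(1, 5)));
    }
}
```

If rows that this model accepts are missing from the real scan, the discrepancy would point at how the nested filters are evaluated or at the stored bytes, rather than at the boolean logic itself.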

But what is happening is that I only get data back for the records matching
filterListWithin. No records that satisfy the other 3 filterList criteria
come back. The data exists and is in a valid form for the other scenarios. I
can also view it through the Phoenix UI tools.

What am I missing? Could this be a Phoenix issue?

Thanks, as always.

Regards,
Shahab

Re: Hierarchy of filters and filters list

Posted by Ted Yu <yu...@gmail.com>.
See TestSingleColumnValueFilter which tests SingleColumnValueFilter without
spinning up minicluster.

TestHRegion#testIndexesScanWithOneDeletedRow() is an example where
FilterList is involved.

Cheers



On Tue, Nov 18, 2014 at 8:51 AM, Shahab Yunus <sh...@gmail.com>
wrote:

> I don't have a unit test for HBase right now. If you can provide me a
> sample or directions, then I can try.
>
> Regards,
> Shahab

Re: Hierarchy of filters and filters list

Posted by Shahab Yunus <sh...@gmail.com>.
I don't have a unit test for HBase right now. If you can provide me a
sample or directions, then I can try.

Regards,
Shahab

On Tue, Nov 18, 2014 at 11:24 AM, Ted Yu <yu...@gmail.com> wrote:

> Are you able to reproduce this using a unit test?
>
> I will take a closer look.
>
> Thanks

Re: Hierarchy of filters and filters list

Posted by Ted Yu <yu...@gmail.com>.
Are you able to reproduce this using a unit test?

I will take a closer look. 

Thanks 

On Nov 18, 2014, at 8:06 AM, Shahab Yunus <sh...@gmail.com> wrote:

> You mean if used independently? Yes, they do.
> 
> Regards,
> Shahab

Re: Hierarchy of filters and filters list

Posted by Shahab Yunus <sh...@gmail.com>.
You mean if used independently? Yes, they do.

Regards,
Shahab

On Tue, Nov 18, 2014 at 10:51 AM, Ted Yu <yu...@gmail.com> wrote:

> Have you verified that at least one of the following (when used alone)
> returns data?
> (A and B), (B and C), (D and E)
>
> Thanks

Re: Hierarchy of filters and filters list

Posted by Ted Yu <yu...@gmail.com>.
Have you verified that at least one of the following (when used alone)
returns data?
(A and B), (B and C), (D and E)

Thanks

On Mon, Nov 17, 2014 at 9:27 PM, Shahab Yunus <sh...@gmail.com>
wrote:

> I missed a couple of things.
>
> 1- I am using SingleColumnValueFilter, and the comparator passed into it
> is BinaryComparator.
>
> 2- CDH 5.1.0
> (HBase is 0.98.1-cdh5.1.0)
>
> Regards,
> Shahab

Re: Hierarchy of filters and filters list

Posted by Shahab Yunus <sh...@gmail.com>.
I missed a couple of things.

1- I am using SingleColumnValueFilter, and the comparator passed into it
is BinaryComparator.

2- CDH 5.1.0
(HBase is 0.98.1-cdh5.1.0)

Regards,
Shahab
