Posted to user@hbase.apache.org by Shumin Wu <sh...@gmail.com> on 2012/10/11 01:24:24 UTC

Re: Temporal in Hbase?

How could I have missed this reply!

Hi Anoop,

First, thank you for your reply to my question, and I apologize for not
following up promptly. I have put out a million fires and have now come back
to this issue. Here are my thoughts. Yes, a FilterList with MUST_PASS_ALL
works fine for a simple temporal clause.

However, I have a use case like this: I need to find all data whose time
range overlaps a given time range. Some data are valid until now; they have
an open-ended end time, marked as end_time = null in our database.

To express it formally, for a given time range [const_st, const_et], where
const_st represents the constant start time and const_et the constant end
time, my task is to find all data rows with start_time and end_time
satisfying this expression:

start_time < const_et and end_time >= const_st or end_time is null.
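
For illustration, the expression above can be sketched in plain Java. This is only a model of the predicate, not HBase code. It assumes the intended grouping is start_time < const_et AND (end_time >= const_st OR end_time is null), and the class and method names (TemporalOverlap, overlaps) are hypothetical.

```java
// Plain-Java model of the overlap predicate; it does not use the HBase API.
final class TemporalOverlap {
    /**
     * Returns true when a row's validity range overlaps the query range.
     * A null endTime stands for an open-ended row ("valid up to now").
     */
    static boolean overlaps(long startTime, Long endTime, long constSt, long constEt) {
        // Assumed grouping: start_time < const_et AND (end_time IS NULL OR end_time >= const_st)
        return startTime < constEt && (endTime == null || endTime >= constSt);
    }

    public static void main(String[] args) {
        System.out.println(overlaps(1800L, 1801L, 1700L, 1900L)); // closed range inside the query range
        System.out.println(overlaps(1800L, null, 1850L, 1900L));  // open-ended row
        System.out.println(overlaps(1800L, 1801L, 1850L, 1900L)); // row ended before the query range starts
    }
}
```

With this grouping, an open-ended row matches whenever its start_time is before const_et.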


In a FilterList, I can choose either MUST_PASS_ALL or MUST_PASS_ONE, but
neither is applicable to this use case.

It would be nice if there were a temporal filter that allowed me to select
data valid between [const_st, const_et] (with a null end_time automatically
interpreted as valid up to now).

My domain is not a traditional Internet area, but I am sure folks in the
clickstream business have a similar need, and I am wondering how they solve
this problem.

Temporal queries are commonly supported in traditional databases, so maybe
HBase can offer the same? I guess the current version does not have this
support, and I would need to write a custom filter myself. I could be wrong.
Please enlighten me.


Shumin Wu





On Mon, Sep 17, 2012 at 8:16 PM, Anoop Sam John <an...@huawei.com> wrote:

> Hi,
> start_time and end_time are two qualifiers in your table.
> You can use a FilterList with MUST_PASS_ALL (AND condition).
> Add a SingleColumnValueFilter for each qualifier with the value and
> condition.
>
> -Anoop-
>
> ________________________________________
> From: Shumin Wu [shumin.wu@gmail.com]
> Sent: Monday, September 17, 2012 9:58 PM
> To: user@hbase.apache.org
> Subject: Temporal in Hbase?
>
> Hi,
>
> I have a use case to "filter" out rows using an "as of" predicate. For
> example, given a specific time point T, I would like to find all rows where
> start_time<=T<=end_time.
>
> Example hbase table schema:
> row_key, col_A, col_B, start_time, end_time
>
> I am wondering if there is any existing filter that allows me to do this.
>
> If not, I guess I would have to write my own custom filter.
>
> Thanks,
>
> Shumin Wu
>

RE: Temporal in Hbase?

Posted by "Ramkrishna.S.Vasudevan" <ra...@huawei.com>.
So if the column does not contain a value, that is, if the end_time qualifier
is null, the row should still be retrieved, right?

As far as I understand temporal databases (I am not very familiar with them;
I just read through the wiki to learn what temporal means), the concept is
related to multiversioning of the same row, so the same row will have
multiple versions.

Suppose
row_key  col_A  col_B  start_time  end_time
row1     xxx    yyy    1800        1801
row1     xxx    yyy    1800        (empty)

With versioning, if the row1 version with an empty end_time is inserted, that
version will show up first. If your number of versions is 1, the scan will
retrieve only the latest value.
SingleColumnValueFilter has a property that is set with setFilterIfMissing().
If the specified column is not found in a row, setting this property to true
filters out the row. If you still want the row, the property should be
false. The default is false.
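
The behaviour described above can be modelled in plain Java as a rough sketch. It does not call HBase; the class MissingColumnModel and the method passesGreaterOrEqual are hypothetical names, and the row is simplified to a map from qualifier to a long value.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of a single-column check with a filterIfMissing switch.
final class MissingColumnModel {
    /** Models one "column >= value" check; returns true when the row is kept. */
    static boolean passesGreaterOrEqual(Map<String, Long> row, String column,
                                        long value, boolean filterIfMissing) {
        Long cell = row.get(column);
        if (cell == null) {
            // Column absent: keep the row unless filterIfMissing is true.
            return !filterIfMissing;
        }
        return cell >= value;
    }

    public static void main(String[] args) {
        Map<String, Long> openEnded = new HashMap<>();
        openEnded.put("start_time", 1800L); // no end_time cell at all

        System.out.println(passesGreaterOrEqual(openEnded, "end_time", 1850L, false)); // row kept
        System.out.println(passesGreaterOrEqual(openEnded, "end_time", 1850L, true));  // row dropped
    }
}
```

In this model, a row without an end_time cell passes the end_time check as long as filterIfMissing stays false, which is the default mentioned above.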

So if my query is start_time>1800 and end_time<1900 with MUST_PASS_ALL,
then with setFilterIfMissing(false) we can get the latest row, which has an
empty end_time.
Does this answer your question?

Regards
Ram


RE: Temporal in Hbase?

Posted by "Ramkrishna.S.Vasudevan" <ra...@huawei.com>.
Oops! Anoop has just replied with an answer similar to mine :)

Regards
Ram


RE: Temporal in Hbase?

Posted by Anoop Sam John <an...@huawei.com>.
Hi Shumin,

>start_time < const_et and end_time >= const_st or end_time is null.
Your problem was the part "end_time >= const_st or end_time is null".

You can use a FilterList with the MUST_PASS_ALL (AND) condition only. It can
contain one SingleColumnValueFilter corresponding to start_time with your
condition and value. Does "end_time is null" mean that no KV has been added
to the row for this column, that is, the column value is missing for the row?
In that case, you can use a second SingleColumnValueFilter on end_time with
your condition and the value const_st, and call setFilterIfMissing(false) on
it. This means that if the column is absent for a row, that row won't get
filtered out. This is what you want, right? (The default value for
filterIfMissing is already false.)

FYI, a FilterList can contain another FilterList. So if you have a query
like col1=? AND ( col2=? OR col2=? ), you can use nested FilterLists: one
inner FilterList with MUST_PASS_ONE for col2, and an outer FilterList with
MUST_PASS_ALL that contains the inner FilterList and the
SingleColumnValueFilter for col1. I hope I understood your problem and am
giving the answer you are looking for :)
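
As a rough illustration of the nesting described above, here is a plain-Java model built on java.util.function.Predicate. It is not HBase code: mustPassAll, mustPassOne, columnEquals, and the sample values "a", "x", "y" are all hypothetical stand-ins for the AND list, the OR list, the single-column check, and the query parameters.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Plain-Java model of an outer AND list containing an inner OR list.
final class NestedFilterModel {
    /** AND: every filter in the list must accept the row. */
    static <T> Predicate<T> mustPassAll(List<Predicate<T>> filters) {
        return row -> filters.stream().allMatch(f -> f.test(row));
    }

    /** OR: at least one filter in the list must accept the row. */
    static <T> Predicate<T> mustPassOne(List<Predicate<T>> filters) {
        return row -> filters.stream().anyMatch(f -> f.test(row));
    }

    /** Single-column equality check on a row modelled as a map. */
    static Predicate<Map<String, String>> columnEquals(String column, String value) {
        return row -> value.equals(row.get(column));
    }

    /** Builds col1 = "a" AND (col2 = "x" OR col2 = "y"). */
    static Predicate<Map<String, String>> example() {
        Predicate<Map<String, String>> inner =
                mustPassOne(Arrays.asList(columnEquals("col2", "x"), columnEquals("col2", "y")));
        return mustPassAll(Arrays.asList(columnEquals("col1", "a"), inner));
    }

    public static void main(String[] args) {
        Map<String, String> row = new HashMap<>();
        row.put("col1", "a");
        row.put("col2", "y");
        System.out.println(example().test(row)); // col1 matches and one col2 alternative matches
    }
}
```

The inner list is just another filter from the outer list's point of view, which is why the two operators can be combined in one query.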

-Anoop-

Re: Temporal in Hbase?

Posted by Shumin Wu <sh...@gmail.com>.
Anoop and Ramkrishna,

Your answers combined solved my problem! I tried the approach this morning.
Without writing my own custom filter, only 20+ lines of code completed my
mission! Thanks for your help!

Anoop: "FYI a FilterList can contain another filter list
So if you have a query like  col1=? AND ( col2=? OR col2=? ) you can use
FilterList.. One inner filter list with MUST_PASS_ONE for col2 and an outer
FL with MUST_PASS_ALL which contains the inner FL and SCVF for col1..
 Hope I understood your problem and giving the answer which you are looking
for   :)"

Ramkrishna: "On the SingleColumnValueFilter we have a property called
setFilterIfMissing().
If the specified value is not found setting this property will filter out
the row.  If you still want the value the property should be false.  Default
is false."

Shumin
