Posted to user@hbase.apache.org by Jerry Lam <ch...@gmail.com> on 2012/07/26 19:43:56 UTC

Query a version of a column efficiently

Hi HBase guru:

I need some advice on a problem that I'm facing with HBase. How can I
efficiently query a version of a column when I don't know exactly which
version I'm looking for?
For instance, I want to query a column for a timestamp that is less than or
equal to N. If version = N is available, return it to me. Otherwise, I want
the version that is closest to N (ordering by descending timestamp). That
is, if version = N - 1 exists, I want it to be returned.

I looked into the TimeRange query, but it doesn't seem to provide these
semantics naturally. Note that I don't know which version is closest to N,
so the range would have to be setTimeRange(0, N+1). Do I need to implement
a filter to do that, or is it already available?

Any help will be appreciated.

Best Regards,

Jerry
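
The lookup described above amounts to a "floor" query over a cell's version timestamps: the newest version whose timestamp is at or before N. As a plain-Java illustration of that semantic only (this is not HBase code, and the class and method names are hypothetical), a sorted map expresses it directly:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the versions of one cell, keyed by timestamp.
public class VersionFloor {
    private final TreeMap<Long, String> versions = new TreeMap<>();

    public void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // Returns the value of the newest version with timestamp <= n,
    // or null if every stored version is newer than n.
    public String atOrBefore(long n) {
        Map.Entry<Long, String> entry = versions.floorEntry(n);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        VersionFloor cell = new VersionFloor();
        for (long ts : new long[] {0, 1, 3, 6, 10}) {
            cell.put(ts, "v" + ts);
        }
        // No version at 5, so the lookup falls back to the next older one.
        System.out.println(cell.atOrBefore(5));
        // A version exists at exactly 6, so that one is returned.
        System.out.println(cell.atOrBefore(6));
    }
}
```

Any server-side answer (a time range plus a version limit, or a custom filter) would need to reproduce this behaviour without shipping every version to the client.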

Re: Query a version of a column efficiently

Posted by Tom Brown <to...@gmail.com>.
Somebody will correct me if I'm wrong, but I think that for your
example, you should use setTimeRange(0, 6) and setMaxVersions(1). It's
my understanding that those settings will give you the 1 latest
version from all applicable versions (0 <= timestamp < 6; the upper
bound is exclusive, so N + 1 is needed to include N).

Since it's pretty easy to set the timestamp of a row when you update
it, try it, and see if it's what you want.

--Tom
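
As far as I understand the client API, setTimeRange(minStamp, maxStamp) treats the range as half-open, [minStamp, maxStamp), and versions come back newest first, so a version limit of 1 keeps the newest one in range. The following is a plain-Java model of that rule under those assumptions, not HBase's implementation; TimeRangeModel and select are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java model of "time range + max versions"; NOT the HBase implementation.
public class TimeRangeModel {
    // Keeps timestamps with minStamp <= ts < maxStamp (upper bound exclusive),
    // orders them newest first, and returns at most maxVersions of them.
    public static List<Long> select(long[] timestamps, long minStamp,
                                    long maxStamp, int maxVersions) {
        List<Long> inRange = new ArrayList<>();
        for (long ts : timestamps) {
            if (ts >= minStamp && ts < maxStamp) {
                inRange.add(ts);
            }
        }
        inRange.sort(Comparator.reverseOrder());
        int keep = Math.min(maxVersions, inRange.size());
        return new ArrayList<>(inRange.subList(0, keep));
    }

    public static void main(String[] args) {
        long[] versions = {0, 1, 3, 6, 10};
        System.out.println(select(versions, 0, 6, 1));              // [3]  (N = 5, range [0, 6))
        System.out.println(select(versions, 0, 7, 1));              // [6]  (N = 6, range [0, 7))
        System.out.println(select(versions, 5, Long.MAX_VALUE, 1)); // [10] (newest in range)
    }
}
```

Under this model, a range that starts at N and is open at the top yields the newest version overall, while a range of [0, N + 1) with a limit of one version yields the newest version at or before N.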

On Thu, Jul 26, 2012 at 3:40 PM, Jerry Lam <ch...@gmail.com> wrote:
> Hi St.Ack:
>
> Let say there are 5 versions for a column A with timestamp = [0, 1, 3, 6,
> 10].
> I want to execute an efficient query that returns one version of the column
> that has a timestamp that is equal to 5 or less. So in this case, it should
> return the value of the column A with timestamp = 3.
>
> Using the setTimeRange(5,  Long.MAX_VALUE) with setMaxVersion = 1, my guess
> is that it will return the version 6 not version 3. Correct me if I'm
> wrong.
>
> Best Regards,
>
> Jerry
>
> On Thu, Jul 26, 2012 at 5:13 PM, Stack <st...@duboce.net> wrote:
>
>> On Thu, Jul 26, 2012 at 7:43 PM, Jerry Lam <ch...@gmail.com> wrote:
>> > I need some advises on a problem that I'm facing using HBase. How can I
>> > efficiently query a version of a column when I don't know exactly the
>> > version I'm looking for?
>> > For instance, I want to query a column with timestamp that is less or
>> equal
>> > to N, if version = N is available, return it to me. Otherwise, I want the
>> > version that is closest to the version N (order by descending of
>> > timestamp). That is if version = N - 1 exists, I want it to be returned.
>> >
>>
>> Have you tried a timerange w/ minStamp of N and maxStamp of
>> HConstants#LATEST_TIMESTAMP Long.MAX_VALUE) returning one version only
>> (setMaxVersion(1))?
>>
>> St.Ack
>>

Re: Query a version of a column efficiently

Posted by Jerry Lam <ch...@gmail.com>.
Thanks Suraj. I looked at the code, but the logic does not look
self-contained, particularly the part where HBase searches for a
specific version using TimeRange.

Best Regards,

Jerry

On Mon, Jul 30, 2012 at 12:53 PM, Suraj Varma <sv...@gmail.com> wrote:

> You may need to setup your Eclipse workspace and search using
> references etc.To get started, this is one class that uses TimeRange
> based matching ...
> org.apache.hadoop.hbase.regionserver.ScanQueryMatcher
> Also - Get is internally implemented as a Scan over a single row.
>
> Hope this gets you started.
> --Suraj
>
> On Thu, Jul 26, 2012 at 4:34 PM, Jerry Lam <ch...@gmail.com> wrote:
> > Hi St.Ack:
> >
> > Can you tell me which source code is responsible for the logic. The
> source code in the get and scan doesnt provide an indication of how the
> setTimeRange works.
> >
> > Best Regards,
> >
> > Jerry
> >
> > Sent from my iPad (sorry for spelling mistakes)
> >
> > On 2012-07-26, at 18:30, Stack <st...@duboce.net> wrote:
> >
> >> On Thu, Jul 26, 2012 at 11:40 PM, Jerry Lam <ch...@gmail.com>
> wrote:
> >>> Hi St.Ack:
> >>>
> >>> Let say there are 5 versions for a column A with timestamp = [0, 1, 3,
> 6,
> >>> 10].
> >>> I want to execute an efficient query that returns one version of the
> column
> >>> that has a timestamp that is equal to 5 or less. So in this case, it
> should
> >>> return the value of the column A with timestamp = 3.
> >>>
> >>> Using the setTimeRange(5,  Long.MAX_VALUE) with setMaxVersion = 1, my
> guess
> >>> is that it will return the version 6 not version 3. Correct me if I'm
> >>> wrong.
> >>>
> >>
> >> What Tom says, try it.  IIUC, it'll give you your 3.  It won't give
> >> you 6 since that is outside of the timerange (try 0 instead of
> >> MAX_VALUE; I may have misled w/ MAX_VALUE... it might work but would
> >> have to check code).
> >>
> >> St.Ack
>

Re: Query a version of a column efficiently

Posted by Suraj Varma <sv...@gmail.com>.
You may need to set up your Eclipse workspace and search using
references, etc. To get started, this is one class that uses
TimeRange-based matching:
org.apache.hadoop.hbase.regionserver.ScanQueryMatcher
Also, Get is internally implemented as a Scan over a single row.

Hope this gets you started.
--Suraj

On Thu, Jul 26, 2012 at 4:34 PM, Jerry Lam <ch...@gmail.com> wrote:
> Hi St.Ack:
>
> Can you tell me which source code is responsible for the logic. The source code in the get and scan doesnt provide an indication of how the setTimeRange works.
>
> Best Regards,
>
> Jerry
>
> Sent from my iPad (sorry for spelling mistakes)
>
> On 2012-07-26, at 18:30, Stack <st...@duboce.net> wrote:
>
>> On Thu, Jul 26, 2012 at 11:40 PM, Jerry Lam <ch...@gmail.com> wrote:
>>> Hi St.Ack:
>>>
>>> Let say there are 5 versions for a column A with timestamp = [0, 1, 3, 6,
>>> 10].
>>> I want to execute an efficient query that returns one version of the column
>>> that has a timestamp that is equal to 5 or less. So in this case, it should
>>> return the value of the column A with timestamp = 3.
>>>
>>> Using the setTimeRange(5,  Long.MAX_VALUE) with setMaxVersion = 1, my guess
>>> is that it will return the version 6 not version 3. Correct me if I'm
>>> wrong.
>>>
>>
>> What Tom says, try it.  IIUC, it'll give you your 3.  It won't give
>> you 6 since that is outside of the timerange (try 0 instead of
>> MAX_VALUE; I may have misled w/ MAX_VALUE... it might work but would
>> have to check code).
>>
>> St.Ack

Re: Query a version of a column efficiently

Posted by Jerry Lam <ch...@gmail.com>.
Hi St.Ack:

Can you tell me which source code is responsible for this logic? The source code of Get and Scan doesn't indicate how setTimeRange works.

Best Regards,

Jerry 

On 2012-07-26, at 18:30, Stack <st...@duboce.net> wrote:

> On Thu, Jul 26, 2012 at 11:40 PM, Jerry Lam <ch...@gmail.com> wrote:
>> Hi St.Ack:
>> 
>> Let say there are 5 versions for a column A with timestamp = [0, 1, 3, 6,
>> 10].
>> I want to execute an efficient query that returns one version of the column
>> that has a timestamp that is equal to 5 or less. So in this case, it should
>> return the value of the column A with timestamp = 3.
>> 
>> Using the setTimeRange(5,  Long.MAX_VALUE) with setMaxVersion = 1, my guess
>> is that it will return the version 6 not version 3. Correct me if I'm
>> wrong.
>> 
> 
> What Tom says, try it.  IIUC, it'll give you your 3.  It won't give
> you 6 since that is outside of the timerange (try 0 instead of
> MAX_VALUE; I may have misled w/ MAX_VALUE... it might work but would
> have to check code).
> 
> St.Ack

Re: Query a version of a column efficiently

Posted by Stack <st...@duboce.net>.
On Thu, Jul 26, 2012 at 11:40 PM, Jerry Lam <ch...@gmail.com> wrote:
> Hi St.Ack:
>
> Let say there are 5 versions for a column A with timestamp = [0, 1, 3, 6,
> 10].
> I want to execute an efficient query that returns one version of the column
> that has a timestamp that is equal to 5 or less. So in this case, it should
> return the value of the column A with timestamp = 3.
>
> Using the setTimeRange(5,  Long.MAX_VALUE) with setMaxVersion = 1, my guess
> is that it will return the version 6 not version 3. Correct me if I'm
> wrong.
>

What Tom says, try it.  IIUC, it'll give you your 3.  It won't give
you 6 since that is outside of the timerange (try 0 instead of
MAX_VALUE; I may have misled w/ MAX_VALUE... it might work but would
have to check code).

St.Ack

Re: Query a version of a column efficiently

Posted by Jerry Lam <ch...@gmail.com>.
Hi St.Ack:

Let's say there are 5 versions of a column A with timestamps = [0, 1, 3, 6,
10].
I want to execute an efficient query that returns one version of the column
whose timestamp is equal to 5 or less. So in this case, it should
return the value of column A with timestamp = 3.

Using setTimeRange(5, Long.MAX_VALUE) with setMaxVersions(1), my guess
is that it will return version 6, not version 3. Correct me if I'm
wrong.

Best Regards,

Jerry

On Thu, Jul 26, 2012 at 5:13 PM, Stack <st...@duboce.net> wrote:

> On Thu, Jul 26, 2012 at 7:43 PM, Jerry Lam <ch...@gmail.com> wrote:
> > I need some advises on a problem that I'm facing using HBase. How can I
> > efficiently query a version of a column when I don't know exactly the
> > version I'm looking for?
> > For instance, I want to query a column with timestamp that is less or
> equal
> > to N, if version = N is available, return it to me. Otherwise, I want the
> > version that is closest to the version N (order by descending of
> > timestamp). That is if version = N - 1 exists, I want it to be returned.
> >
>
> Have you tried a timerange w/ minStamp of N and maxStamp of
> HConstants#LATEST_TIMESTAMP Long.MAX_VALUE) returning one version only
> (setMaxVersion(1))?
>
> St.Ack
>

Re: Query a version of a column efficiently

Posted by Stack <st...@duboce.net>.
On Thu, Jul 26, 2012 at 7:43 PM, Jerry Lam <ch...@gmail.com> wrote:
> I need some advises on a problem that I'm facing using HBase. How can I
> efficiently query a version of a column when I don't know exactly the
> version I'm looking for?
> For instance, I want to query a column with timestamp that is less or equal
> to N, if version = N is available, return it to me. Otherwise, I want the
> version that is closest to the version N (order by descending of
> timestamp). That is if version = N - 1 exists, I want it to be returned.
>

Have you tried a timerange w/ minStamp of N and maxStamp of
HConstants#LATEST_TIMESTAMP (Long.MAX_VALUE) returning one version only
(setMaxVersions(1))?

St.Ack