Posted to user@hbase.apache.org by Shengjie Min <ke...@gmail.com> on 2012/12/20 15:55:46 UTC

HBASE - select distinct query against the rowkey

I have an HBase table called "users"; its rowkey consists of three parts:

   1. userid
   2. messageid
   3. timestamp

The rowkey looks like: ${userid}_${messageid}_${timestamp}

Given that I can hash the userid to make that field fixed-length, is
there any way I can run something like this SQL query:

select distinct(userid) from users
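A minimal sketch of what this query amounts to, using a sorted Python list as a stand-in for the table (HBase keeps rows sorted by rowkey bytes). It assumes, beyond what the post says, that the userid is hashed with MD5 into a 32-character hex prefix and that messageid and timestamp are zero-padded to 10 and 13 digits; `make_rowkey` and `distinct_userids` are hypothetical helpers. Because the prefix has a fixed length, each user can be read off the first 32 characters of a key, but every row still has to be visited.

```python
import hashlib

USERID_LEN = 32  # length of an MD5 hex digest (assumed choice of hash)


def make_rowkey(userid: str, messageid: int, timestamp: int) -> str:
    """Build ${hash(userid)}_${messageid}_${timestamp} with fixed-width fields."""
    user_hash = hashlib.md5(userid.encode("utf-8")).hexdigest()
    return f"{user_hash}_{messageid:010d}_{timestamp:013d}"


def distinct_userids(sorted_rowkeys):
    """Emulate `select distinct(userid)` with a full scan of the sorted keys.

    The hashed userid is the fixed-length prefix, and equal prefixes are
    adjacent in sorted order, so comparing with the last one seen suffices.
    """
    result = []
    for key in sorted_rowkeys:
        prefix = key[:USERID_LEN]
        if not result or result[-1] != prefix:
            result.append(prefix)
    return result


rows = sorted([
    make_rowkey("alice", 1, 1355998546000),
    make_rowkey("bob", 2, 1355998547000),
    make_rowkey("alice", 3, 1355998548000),
])
print(len(distinct_userids(rows)))  # 2
```

Note that the result is a list of hashes, not the original userids, since an MD5 digest cannot be reversed.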

If the rowkey doesn't allow me to query like this, does that mean I need to
create a separate table that just contains all the user ids? I guess if I do
that, inserting a record won't be atomic anymore, because I am dealing
with two tables without a transaction.
-- 
All the best,
Shengjie Min

Re: HBASE - select distinct query against the rowkey

Posted by Michael Segel <mi...@hotmail.com>.

There is no concept of transactions in the NoSQL world, at least not in HBase.

All writes are atomic at the row level. Note that you *could* hold a lock; however, it's not really a good idea for a client to hold a lock.

I don't know if it's really a problem, though...
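HBase guarantees atomicity for a single-row write, so updating both the messages table and a separate user-id table means two independent puts. A minimal sketch of one way to live with that, which is an assumption and not something proposed in the thread: write the user-id row first and retry the whole sequence on failure. Re-putting the same user-id row is idempotent, so the worst leftover state is a user id with no messages yet. `FlakyTable` and `insert_message` are hypothetical in-memory stand-ins, not HBase client classes.

```python
class FlakyTable:
    """In-memory stand-in for an HBase table whose next put() can fail once."""

    def __init__(self, fail_next=False):
        self.rows = {}
        self.fail_next = fail_next

    def put(self, rowkey, value):
        if self.fail_next:
            self.fail_next = False
            raise IOError("simulated region server failure")
        self.rows[rowkey] = value  # a single-row put is all-or-nothing


def insert_message(users, messages, userid, rowkey, body, retries=3):
    """Write the distinct-user row first, then the message row.

    Re-putting the same user-id row is harmless, so the whole sequence
    can simply be retried if either put fails.
    """
    for _ in range(retries):
        try:
            users.put(userid, b"")  # idempotent: same key, same value
            messages.put(rowkey, body)
            return True
        except IOError:
            continue
    return False


users = FlakyTable()
messages = FlakyTable(fail_next=True)  # the first message put will fail
ok = insert_message(users, messages, "alice", "alice_0000000001_1355998546000", b"hi")
print(ok, len(users.rows), len(messages.rows))  # True 1 1
```

A process crash between the two puts can still leave an orphaned user id, which only matters if the distinct-user list must never include users without messages.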

HTH 

-Mike

On Dec 20, 2012, at 10:08 AM, Shengjie Min <ke...@gmail.com> wrote:


Re: HBASE - select distinct query against the rowkey

Posted by Shengjie Min <ke...@gmail.com>.

Thanks Michael,

> Not sure why you have timestamp in the key... assuming that message id
> would be incremented therefore rows would be in time order anyways.

I will need to run queries like "give me the messages from timestamp1 to
timestamp2".
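With the key layout from the original post, a time-range query can only be narrowed to one user's rows; the timestamp, being the last key component, has to be checked row by row. A minimal sketch, assuming the query is per user and using a sorted Python list in place of the table. The start and stop keys play the role of a scan's start row and (exclusive) stop row; `scan_user_time_range` is a hypothetical helper.

```python
import bisect


def scan_user_time_range(sorted_rowkeys, userid, t1, t2):
    """Return keys of `userid` whose trailing timestamp lies in [t1, t2].

    Key layout: ${userid}_${messageid}_${timestamp}. Because userid comes
    first, one user's rows are contiguous and can be range-scanned; because
    timestamp comes last, it must still be filtered row by row.
    """
    start = userid + "_"
    stop = userid + chr(ord("_") + 1)  # first key past the "userid_" prefix
    lo = bisect.bisect_left(sorted_rowkeys, start)
    hi = bisect.bisect_left(sorted_rowkeys, stop)  # stop row is exclusive
    hits = []
    for key in sorted_rowkeys[lo:hi]:
        ts = int(key.rsplit("_", 1)[1])  # timestamp is the last component
        if t1 <= ts <= t2:
            hits.append(key)
    return hits


keys = sorted([
    "alice_0000000001_1000",
    "alice_0000000002_2000",
    "alice_0000000003_3000",
    "alicex_0000000005_2000",
    "bob_0000000004_2500",
])
print(scan_user_time_range(keys, "alice", 1500, 3000))
# ['alice_0000000002_2000', 'alice_0000000003_3000']
```

The scan still reads every row for that user, because the timestamp cannot bound the key range when it sits after the messageid.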

> You will want to use a separate table.
That's what I thought as well. Without a separate table, I will end up
doing a full table scan. But what about atomicity? What if a write
succeeds on one table and fails on the other? HBase has no concept of
transactions in this case.

Shengjie


On 20 December 2012 15:59, Michael Segel <mi...@hotmail.com> wrote:



-- 
All the best,
Shengjie Min

Re: HBASE - select distinct query against the rowkey

Posted by Michael Segel <mi...@hotmail.com>.

Not sure why you have the timestamp in the key... assuming the message id is incremented, the rows would already be in time order anyway.
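This point relies on HBase ordering row keys lexicographically by byte, which matches numeric (and therefore time) order only if the message id is written at a fixed width. A small illustration, assuming decimal string ids; the 10-digit width in the hypothetical `padded` helper is arbitrary.

```python
def padded(messageid: int, width: int = 10) -> str:
    """Zero-pad so that string (byte) order equals numeric order."""
    return f"{messageid:0{width}d}"


ids = [9, 10, 100, 2]
# Unpadded strings sort by byte, not by value:
print(sorted(str(i) for i in ids))     # ['10', '100', '2', '9']
# Fixed-width strings sort in numeric order:
print(sorted(padded(i) for i in ids))  # ['0000000002', '0000000009', '0000000010', '0000000100']
```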

But to answer your question... 
You will want to use a separate table.

In both cases you will end up doing a full table scan; however, a table of distinct users would have far fewer rows than your users table.
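A minimal sketch of the separate-table approach, with Python dicts standing in for the two HBase tables (an assumption for illustration; `build_tables` and `distinct_users` are hypothetical names). Every insert also puts the userid into the users table; since re-putting an existing key just overwrites it, that table ends up with one row per user, and the distinct query becomes a scan over that much smaller table.

```python
def build_tables(events):
    """events: iterable of (userid, messageid, timestamp) tuples.

    Returns (messages, users): one row per message in `messages`,
    one row per distinct userid in `users`.
    """
    messages, users = {}, {}
    for userid, messageid, ts in events:
        messages[f"{userid}_{messageid:010d}_{ts:013d}"] = b""
        users[userid] = b""  # re-putting an existing user id just overwrites it
    return messages, users


def distinct_users(users):
    """`select distinct(userid)` becomes a full scan of the small users table."""
    return sorted(users)


events = [("alice", i, 1355998546000 + i) for i in range(1000)]
events.append(("bob", 1000, 1355998547000))
messages, users = build_tables(events)
print(len(messages), len(users))  # 1001 2
print(distinct_users(users))      # ['alice', 'bob']
```

Here a full scan of `users` reads 2 rows instead of the 1,001 rows in `messages`.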

HTH

-Mike

On Dec 20, 2012, at 8:55 AM, Shengjie Min <ke...@gmail.com> wrote:
