You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hbase.apache.org by Bui Ngoc Son <ge...@gmail.com> on 2011/04/20 08:53:31 UTC

HBase Schema: how to sort row by last update?

Hi everybody,

I am desiging a two-tiers comment system like facebook: the system 
inlcuded main comments and each main comment has a various number of 
sub-comments. My schema is as follow:

table comments
     family "data":
         "data:content" - content of main comment
         "data:uid" - uid of user who posted this comment
         "data:sub_count" - the number of its sub-comment.
     family "sub_content"
         "sub_content:<cid>": content of the sub-comment which has id=cid
     family "sub_uid"
         "sub_uid:<cid>": uid of user who posted the sub-comment which 
has id=cid

So, each sub-comment has two columns in 2 families with an unique cid. 
As well, each main comment has an unique id that is decreased by time 
(to make sure that newer main comments are selected first)

However, my boss requires that main comments have to be ordered by last 
update first. It is mean that, each time a main comment have a new 
sub-comment, it becomes "newer" than the others, and i have to sort the 
main comments by the time of their last sub-comments.

How can i design the comment schema to meet my boss's requirement? can i 
keep the current schema and use some extra-indexing table? or i have to 
re-design the schema?

Thanks in advance

Regards,

Eddie Bui

Re: HBase Schema: how to sort row by last update?

Posted by Bui Ngoc Son <ge...@gmail.com>.
Hi Stack,

Thanks for your reply!

Two major queries in my system are "select" queries:

1. Select main comments by page, n items per page, ordered by last 
update DESC. The last update of a main comment is the timestamp of its 
newest sub-comment (if it doesn't have any sub-comment, last update is 
the timestamp of main comment).

2. For each main comment, select 2 newest sub-comments, and select all 
sub-comments.

With my current schema, the second query can be done easily, but the 
first cannot.

I'm thinking about the second index table with this schema:

table "lastpost_index"
     row key: <reversed_timestamp_of_last_post>_<id_of_main_comment>
     no need any family.

Each time a main comment has new sub comment, i insert a new row into 
"lastpost_index" table.
Scanning "lastpost_index" table and parsing row keys i could have a list 
of <id_of_main_comment> order by lastpost DESC. However, there is one 
problem:
Some main comments are "busy" than others, it means that they have more 
sub-comment and are updated more regularly than others. For example:

table "lastpost_index":
key: 123456789_10
key: 123456799_11
key: 123456899_11
key: 123456999_9

the main comment with id 11 has two newer sub-comments than main comment 
9. When i scan the table, the main comment 11 appear twice.

So, it's difficult for me to get exactly n newest distinct main comment 
using lastpost_index.

Regards,

Eddie Bui

On 4/20/2011 11:30 PM, Stack wrote:
> It looks like you keep up a second index table ordered by last post
> (row key will have a reversed timestamp of the post's date).
>
> What are the queries you are going to make against the table?
>
> St.Ack
>
> On Tue, Apr 19, 2011 at 11:53 PM, Bui Ngoc Son<ge...@gmail.com>  wrote:
>> Hi everybody,
>>
>> I am desiging a two-tiers comment system like facebook: the system inlcuded
>> main comments and each main comment has a various number of sub-comments. My
>> schema is as follow:
>>
>> table comments
>>     family "data":
>>         "data:content" - content of main comment
>>         "data:uid" - uid of user who posted this comment
>>         "data:sub_count" - the number of its sub-comment.
>>     family "sub_content"
>>         "sub_content:<cid>": content of the sub-comment which has id=cid
>>     family "sub_uid"
>>         "sub_uid:<cid>": uid of user who posted the sub-comment which has
>> id=cid
>>
>> So, each sub-comment has two columns in 2 families with an unique cid. As
>> well, each main comment has an unique id that is decreased by time (to make
>> sure that newer main comments are selected first)
>>
>> However, my boss requires that main comments have to be ordered by last
>> update first. It is mean that, each time a main comment have a new
>> sub-comment, it becomes "newer" than the others, and i have to sort the main
>> comments by the time of their last sub-comments.
>>
>> How can i design the comment schema to meet my boss's requirement? can i
>> keep the current schema and use some extra-indexing table? or i have to
>> re-design the schema?
>>
>> Thanks in advance
>>
>> Regards,
>>
>> Eddie Bui
>>


Re: HBase Schema: how to sort row by last update?

Posted by Stack <st...@duboce.net>.
It looks like you keep up a second index table ordered by last post
(row key will have a reversed timestamp of the post's date).

What are the queries you are going to make against the table?

St.Ack

On Tue, Apr 19, 2011 at 11:53 PM, Bui Ngoc Son <ge...@gmail.com> wrote:
> Hi everybody,
>
> I am desiging a two-tiers comment system like facebook: the system inlcuded
> main comments and each main comment has a various number of sub-comments. My
> schema is as follow:
>
> table comments
>    family "data":
>        "data:content" - content of main comment
>        "data:uid" - uid of user who posted this comment
>        "data:sub_count" - the number of its sub-comment.
>    family "sub_content"
>        "sub_content:<cid>": content of the sub-comment which has id=cid
>    family "sub_uid"
>        "sub_uid:<cid>": uid of user who posted the sub-comment which has
> id=cid
>
> So, each sub-comment has two columns in 2 families with an unique cid. As
> well, each main comment has an unique id that is decreased by time (to make
> sure that newer main comments are selected first)
>
> However, my boss requires that main comments have to be ordered by last
> update first. It is mean that, each time a main comment have a new
> sub-comment, it becomes "newer" than the others, and i have to sort the main
> comments by the time of their last sub-comments.
>
> How can i design the comment schema to meet my boss's requirement? can i
> keep the current schema and use some extra-indexing table? or i have to
> re-design the schema?
>
> Thanks in advance
>
> Regards,
>
> Eddie Bui
>