Posted to user@hbase.apache.org by Bui Ngoc Son <ge...@gmail.com> on 2011/02/07 12:25:18 UTC

Select first n columns of a column family

Hi everyone,

I am going to implement a Facebook-like comment system (with main 
comments and their sub-comments) using HBase 0.90. I designed the 
storage of comments as follows:

- The row key is composite: each row key is a composition of user id 
and timestamp (in reverse order). With this row key, I can easily get 
the top n newest main comments of a specific user.

- Each row has two column families: "data" and "sub_comment"
+ The "data" family stores information about the main comment, such as comment 
content, user id of poster, time of post...
+ The "sub_comment" family, as its name suggests, stores all the sub-comments. The 
number of columns in this family equals the number of sub-comments, and 
column names follow this format: "sub_comment:timestamp" (timestamp in 
reverse order), and each value is the content of a sub-comment.
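
To make the row key design above concrete, here is a minimal Python sketch of why a reversed timestamp puts the newest rows first. It only models HBase's byte-wise row ordering with a sorted list and does not call any HBase API. The zero-byte separator, the 8-byte big-endian encoding, and the helper names row_key and newest_rows are assumptions for illustration, not details from the post.

```python
import struct

MAX_LONG = 2**63 - 1  # same value as Java's Long.MAX_VALUE


def row_key(user_id: str, ts_millis: int) -> bytes:
    """Build user id + separator + reversed timestamp (hypothetical layout).

    Assumes user ids never contain a zero byte and timestamps are >= 0.
    """
    reversed_ts = MAX_LONG - ts_millis
    return user_id.encode("utf-8") + b"\x00" + struct.pack(">q", reversed_ts)


def newest_rows(sorted_keys, user_id: str, n: int):
    """Model a prefix scan: take the first n keys that start with the user prefix."""
    prefix = user_id.encode("utf-8") + b"\x00"
    return [k for k in sorted_keys if k.startswith(prefix)][:n]


# A sorted list stands in for the table's byte-ordered row keys.
keys = sorted(
    row_key(user, ts)
    for user, ts in [("alice", 1000), ("alice", 3000), ("bob", 2000), ("alice", 2000)]
)
top2 = newest_rows(keys, "alice", 2)
```

Because a larger timestamp yields a smaller reversed value, the first keys under a user's prefix are that user's most recent comments.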

So, my question is: is there any way to get the first n columns (ordered 
by timestamp) of the "sub_comment" family? I searched around Google and 
this mailing list, but could not find this information.
I know that I can use different versions of a column to solve this 
problem, but for some reasons, I do not want to use multiple versions of 
a column.
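
The operation being asked for can be modelled in a few lines of Python. This is only a sketch of the semantics (columns within a family come back sorted by qualifier bytes, so reversed-timestamp qualifiers put the newest sub-comment first), not HBase client code. The dict standing in for the family and the names sub_comment_qualifier and first_n_columns are hypothetical. In the Java client, a filter such as ColumnPaginationFilter(limit, offset) may cover this case; whether it is available depends on the HBase version in use.

```python
import struct

MAX_LONG = 2**63 - 1  # same value as Java's Long.MAX_VALUE


def sub_comment_qualifier(ts_millis: int) -> bytes:
    """Encode a reversed timestamp as 8 big-endian bytes (assumed encoding)."""
    return struct.pack(">q", MAX_LONG - ts_millis)


def first_n_columns(family: dict, n: int):
    """Return the values of the first n columns in qualifier byte order."""
    return [family[q] for q in sorted(family)[:n]]


# A dict stands in for one row's "sub_comment" family: qualifier -> value.
sub_comment = {
    sub_comment_qualifier(1000): "oldest reply",
    sub_comment_qualifier(3000): "newest reply",
    sub_comment_qualifier(2000): "middle reply",
}
latest_two = first_n_columns(sub_comment, 2)
```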

Thanks in advance

Eddie Bui,

Re: Select first n columns of a column family

Posted by Stack <st...@duboce.net>.
Using the timestamp as the column qualifier in sub_comment seems
redundant since HBase already sorts the versions of a cell in reverse
timestamp order.  Why not instead name your column
sub_comment:data_comment_id and then just do a Get with maximum
versions set to the number of results you want? (You can even supply
a TimeRange to the Get and it should do the right thing.)

St.Ack
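
As a rough illustration of the suggestion above, the following Python sketch models what a Get with a maximum-versions limit and an optional time range returns for a single column. It is a stand-in for the behaviour, not the client API; in the Java client the corresponding calls are Get.setMaxVersions and Get.setTimeRange. The function name get_versions and the (timestamp, value) tuple layout are assumptions. The sketch also assumes the column family is configured to retain at least as many versions as are requested, since older versions beyond that limit are not kept.

```python
def get_versions(cells, max_versions, time_range=None):
    """Return up to max_versions cells of one column, newest first.

    cells      -- list of (timestamp, value) tuples for a single column
    time_range -- optional (min_ts, max_ts); min is inclusive, max is exclusive
    """
    if time_range is not None:
        min_ts, max_ts = time_range
        cells = [c for c in cells if min_ts <= c[0] < max_ts]
    # Versions are served in descending timestamp order.
    return sorted(cells, key=lambda c: c[0], reverse=True)[:max_versions]


# Three sub-comments stored as versions of one column.
versions = [(100, "first"), (300, "third"), (200, "second")]
newest_two = get_versions(versions, 2)
before_300 = get_versions(versions, 5, time_range=(100, 300))
```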

RE: How long can I have a table open?

Posted by Jonathan Gray <jg...@fb.com>.
What's the nature of the stability problems?  Are these issues seen server-side or just client-side?

Does restarting your long-running clients fix the issues?

> -----Original Message-----
> From: Peter Haidinyak [mailto:phaidinyak@local.com]
> Sent: Monday, February 07, 2011 11:29 AM
> To: user@hbase.apache.org
> Subject: RE: How long can I have a table open?
> 
> Thanks, I have been having some real stability problems with my cluster and
> I'm trying to narrow down the possible problems.
> 
> -Pete
> 
> -----Original Message-----
> From: Jonathan Gray [mailto:jgray@fb.com]
> Sent: Monday, February 07, 2011 11:27 AM
> To: user@hbase.apache.org
> Subject: RE: How long can I have a table open?
> 
> There is not really a limit on that.  Underneath, the client will deal with
> following regions around if they move, a new master if there is a failover,
> etc...
> 
> The only thing that cannot be left open indefinitely is a scanner (they have a
> server-side lease and expire if left idle).
> 
> JG
> 
> > -----Original Message-----
> > From: Peter Haidinyak [mailto:phaidinyak@local.com]
> > Sent: Monday, February 07, 2011 10:07 AM
> > To: user@hbase.apache.org
> > Subject: How long can I have a table open?
> >
> > During my import process I create a connection to a table. How long
> > can I leave that connection open without using it before HBase will try to
> > close it?
> >
> > Hadoop 0.20.2+737
> > HBase 0.89.20100924+28
> >
> > Thanks
> >
> > -Pete

RE: How long can I have a table open?

Posted by Peter Haidinyak <ph...@local.com>.
Thanks, I have been having some real stability problems with my cluster and I'm trying to narrow down the possible problems.

-Pete

-----Original Message-----
From: Jonathan Gray [mailto:jgray@fb.com] 
Sent: Monday, February 07, 2011 11:27 AM
To: user@hbase.apache.org
Subject: RE: How long can I have a table open?

There is not really a limit on that.  Underneath, the client will deal with following regions around if they move, a new master if there is a failover, etc...

The only thing that cannot be left open indefinitely is a scanner (they have a server-side lease and expire if left idle).

JG

> -----Original Message-----
> From: Peter Haidinyak [mailto:phaidinyak@local.com]
> Sent: Monday, February 07, 2011 10:07 AM
> To: user@hbase.apache.org
> Subject: How long can I have a table open?
> 
> During my import process I create a connection to a table. How long can I
> leave that connection open without using it before HBase will try to close it?
> 
> Hadoop 0.20.2+737
> HBase 0.89.20100924+28
> 
> Thanks
> 
> -Pete

RE: How long can I have a table open?

Posted by Jonathan Gray <jg...@fb.com>.
There is not really a limit on that.  Underneath, the client will deal with following regions around if they move, a new master if there is a failover, etc...

The only thing that cannot be left open indefinitely is a scanner (they have a server-side lease and expire if left idle).

JG
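
The scanner lease mentioned above can be pictured with a small Python model: the server records when the scanner was last used and refuses further calls once it has been idle longer than the lease period. This is only a sketch of the idea, not HBase code. The class name ScannerLease, the 60-second timeout, and the explicit now arguments (used instead of a real clock so the behaviour is easy to follow) are all assumptions for illustration.

```python
class ScannerLease:
    """Toy model of a server-side lease that expires when left idle."""

    def __init__(self, timeout_s: float, now: float):
        self.timeout_s = timeout_s
        self.last_used = now

    def expired(self, now: float) -> bool:
        """True once the idle time exceeds the lease period."""
        return now - self.last_used > self.timeout_s

    def touch(self, now: float) -> None:
        """Renew the lease on use, or fail if it has already expired."""
        if self.expired(now):
            raise TimeoutError("scanner lease expired")
        self.last_used = now


lease = ScannerLease(timeout_s=60, now=0)
lease.touch(now=30)                       # used within the period, lease renewed
still_valid = not lease.expired(now=80)   # idle for 50 s
timed_out = lease.expired(now=91)         # idle for 61 s
```

A table handle has no such lease in this model, which matches the point that only scanners need to be used or reopened regularly.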

> -----Original Message-----
> From: Peter Haidinyak [mailto:phaidinyak@local.com]
> Sent: Monday, February 07, 2011 10:07 AM
> To: user@hbase.apache.org
> Subject: How long can I have a table open?
> 
> During my import process I create a connection to a table. How long can I
> leave that connection open without using it before HBase will try to close it?
> 
> Hadoop 0.20.2+737
> HBase 0.89.20100924+28
> 
> Thanks
> 
> -Pete

How long can I have a table open?

Posted by Peter Haidinyak <ph...@local.com>.
During my import process I create a connection to a table. How long can I leave that connection open without using it before HBase will try to close it? 

Hadoop 0.20.2+737
HBase 0.89.20100924+28

Thanks

-Pete

Re: Select first n columns of a column family

Posted by Ted Yu <yu...@gmail.com>.
We use multiple versions of a column in our application. It is quite
convenient.

In your current approach, how many columns do you specify when you create
the table?
