Posted to user@hbase.apache.org by Dingding Ye <ye...@gmail.com> on 2008/10/18 17:28:42 UTC

Possible to set the results' sort method?

Hi.

By default, the results returned by getRow, a Scanner, etc. are ordered by key.

Is it possible to choose another sort order, such as by timestamp?

Thanks for the help.

Best regards.

sishen

Re: Possible to set the results' sort method?

Posted by Andrew Purtell <ap...@apache.org>.

THBase and ITHBase are officially supported components in 0.20.0 and will remain so in 0.21.0. They live in contrib/ because they are not core functionality, but that placement should not be construed as second-class status.

   - Andy




________________________________
From: Keith Thomas <ke...@gmail.com>
To: hbase-user@hadoop.apache.org
Sent: Thursday, September 10, 2009 3:01:29 PM
Subject: Re: Possible to set the results' sort method?


Brilliant! That works perfectly. Where do I vote for this to be included in
0.21.0? :-)



Re: Possible to set the results' sort method?

Posted by Keith Thomas <ke...@gmail.com>.

Brilliant! That works perfectly. Where do I vote for this to be included in
0.21.0? :-)


Clint Morgan-3 wrote:
> 
> Perhaps you would be interested in the tableindexed package. (It's in
> transactional contrib; see the doc in o.a.h.h.client.tableindexed, or look
> at the tests.)
> 
> It will allow you to get a scanner whose results are ordered by a column's
> values (if you have an index on that column).
> 
> -clint
> 
-- 
View this message in context: http://www.nabble.com/Possible-to-set-the-results%27-sort-method--tp20047852p25391773.html
Sent from the HBase User mailing list archive at Nabble.com.

Re: Possible to set the results' sort method?

Posted by Clint Morgan <cl...@troove.net>.

Perhaps you would be interested in the tableindexed package. (It's in
transactional contrib; see the doc in o.a.h.h.client.tableindexed, or look
at the tests.)

It will allow you to get a scanner whose results are ordered by a column's
values (if you have an index on that column).
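
The idea behind a scanner ordered by a column's values can be sketched without the tableindexed API itself. What follows is only a conceptual illustration in plain Java, not the package's actual classes: the names ColumnIndex, put, and rowKeysInValueOrder are hypothetical, and it assumes column values never contain the NUL separator character. It keeps a second sorted map keyed by column value followed by row key, so iterating it yields row keys in column-value order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Conceptual sketch only; this is NOT the tableindexed API.
class ColumnIndex {
    // Index "table": key = column value + NUL + row key, value = row key.
    // Appending the row key keeps rows with equal column values distinct.
    private final TreeMap<String, String> index = new TreeMap<>();

    // Hypothetical hook, called whenever a row's indexed column is written.
    // A real index would also have to remove the entry for the old value.
    void put(String rowKey, String columnValue) {
        index.put(columnValue + '\u0000' + rowKey, rowKey);
    }

    // Row keys ordered by the indexed column's value, then by row key.
    List<String> rowKeysInValueOrder() {
        return new ArrayList<>(index.values());
    }

    public static void main(String[] args) {
        ColumnIndex idx = new ColumnIndex();
        idx.put("row1", "pear");
        idx.put("row2", "apple");
        idx.put("row3", "mango");
        System.out.println(idx.rowKeysInValueOrder()); // [row2, row3, row1]
    }
}
```

The cost of this design is an extra write per indexed column on every update, in exchange for sorted reads without a client-side sort.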

-clint


Re: Possible to set the results' sort method?

Posted by Keith Thomas <ke...@gmail.com>.



stack-3 wrote:
> 
> On Wed, Sep 9, 2009 at 7:52 PM, Keith Thomas <ke...@gmail.com>
> wrote:
> 
>>
>> I think I'm looking at the same problem with HBase as Dingding Ye. I need
>> to be able to retrieve a list of rows sorted by data in a column and I'm not
>> sure how to go about it without resorting to performing the sort on the
>> client which feels like I'm just giving up.
>>
>>
> 
> s3> You want to sort rows in the table by other than the row key or is it
> s3> just that you want to sort the content of a row by other than its column
> s3> name?
> 
> I want to sort by the content of a column in each row.
> 
> s3> How big is the set you want to look at?  Is it full table or some
> s3> subset of rows?
> 
> I am writing the data access layer, not the app itself. I have to conform
> to a certain API. It is up to the application itself to use certain
> limits, although I may impose configurable limits in my layer just to be
> conservative in this brave new world I am exploring. Ideally I'd like to be
> able to do both, i.e. retrieve a full table or a subset. I think that once
> I've written the full-table support I'll worry about collecting just a
> subset.
> 
> 
>> My current thinking is to create a map class that outputs key/value pairs
>> where the key is the field I want to sort upon and the value is the row key.
>> This way I will get nice sorted input going into my reduce class. I guess
>> I would have to have only one reduce class instance.
>>
> 
> s3> Why one reduce?  Write your own partitioner and impose a total order?
> 
> Thanks, I will read up on this, thanks for the direction.
> 
> 
> 
>>
>> However, I am unclear how I can return the row keys and the families with
>> their column data to the client from the reduce class. All the examples I
>> have found so far write the results to files/tables whereas I want to
>> return objects to a client.
>>
> 
> s3> Yeah.... bit tough making your client into a reduce sink (Can be done,
> s3> it just has to be available to the full cluster)
> 
> I guess the thing I'm definitely completely stuck upon is how to get
> something like Result back to the client when I'm writing my own map/reduce
> classes.
> 
>>
>> In the Hadoop Javadocs I notice a bunch of Comparators but as yet I've
>> not figured out their purpose. If I spend the cycles understanding the
>> purpose of these Comparators are they likely to be of help to me in
>> formulating an alternate/better approach to that described above?
>>
> 
> 
> s3> In HBase all is lexicographically ordered.  Tables are ordered by
> s3> rows.  Row content is ordered by columns.
> 
> Thanks
> 
> St.Ack
> 
> 
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Possible-to-set-the-results%27-sort-method--tp20047852p25376341.html
>> Sent from the HBase User mailing list archive at Nabble.com.
>>
>>
> 
> 


Re: Possible to set the results' sort method?

Posted by stack <st...@duboce.net>.

On Wed, Sep 9, 2009 at 7:52 PM, Keith Thomas <ke...@gmail.com> wrote:

>
> I think I'm looking at the same problem with HBase as Dingding Ye. I need
> to be able to retrieve a list of rows sorted by data in a column and I'm not
> sure how to go about it without resorting to performing the sort on the
> client which feels like I'm just giving up.
>
>

Do you want to sort the rows in the table by something other than the row
key, or do you just want to sort the content of a row by something other
than its column name?

How big is the set you want to look at?  Is it the full table or some subset
of rows?

> My current thinking is to create a map class that outputs key/value pairs
> where the key is the field I want to sort upon and the value is the row key.
> This way I will get nice sorted input going into my reduce class. I guess I
> would have to have only one reduce class instance.
>

Why only one reducer?  You could write your own partitioner and impose a
total order across reducers.
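
One way to read this suggestion: give each reducer a contiguous key range, so that reducer 0's output followed by reducer 1's, and so on, is globally sorted. The sketch below shows only the partitioning rule in plain Java. The class name RangePartitioner and the hand-picked split points are assumptions for illustration; in a real job the same rule would sit inside a Hadoop Partitioner, with split points chosen by sampling the data.

```java
import java.util.Arrays;

// Illustrative range partitioner; not a Hadoop class.
class RangePartitioner {
    // Sorted boundaries: keys below splits[0] go to partition 0,
    // keys in [splits[0], splits[1]) go to partition 1, and so on.
    private final String[] splits;

    RangePartitioner(String... sortedSplits) {
        this.splits = sortedSplits.clone();
    }

    int numPartitions() {
        return splits.length + 1;
    }

    int partitionFor(String key) {
        int pos = Arrays.binarySearch(splits, key);
        // Exact hit at index i: the key starts partition i + 1.
        // Miss: binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        RangePartitioner p = new RangePartitioner("h", "p");
        for (String key : new String[] {"apple", "h", "mango", "zebra"}) {
            System.out.println(key + " -> " + p.partitionFor(key));
        }
    }
}
```

Because the partition number never decreases as the key increases, each reducer sees a disjoint, ordered slice of the key space.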

>
> However, I am unclear how I can return the row keys and the families with
> their column data to the client from the reduce class. All the examples I
> have found so far write the results to files/tables whereas I want to
> return
> objects to a client.
>

Yeah, it's a bit tough making your client into a reduce sink. (It can be
done; it just has to be available to the full cluster.)

>
> In the Hadoop Javadocs I notice a bunch of Comparators but as yet I've not
> figured out their purpose. If I spend the cycles understanding the purpose
> of these Comparators are they likely to be of help to me in formulating an
> alternate/better approach to that described above?
>

In HBase everything is lexicographically ordered.  Tables are ordered by row
key.  Row content is ordered by column name.
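
A practical consequence of byte-wise lexicographic ordering is that numeric row keys do not sort numerically unless they are padded to a fixed width. A small plain-Java illustration follows; no HBase classes are involved, and the class name LexOrder and the key names are made up for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class LexOrder {
    // Sorts keys the way a byte-ordered store would: by unsigned byte
    // comparison, with a shorter key sorting before any key it prefixes.
    static List<String> sortAsBytes(String... keys) {
        return Arrays.stream(keys)
            .map(k -> k.getBytes(StandardCharsets.UTF_8))
            .sorted((a, b) -> Arrays.compareUnsigned(a, b))
            .map(b -> new String(b, StandardCharsets.UTF_8))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Unpadded: "row10" lands between "row1" and "row2".
        System.out.println(sortAsBytes("row2", "row10", "row1"));
        // Zero-padded: byte order now matches numeric order.
        System.out.println(sortAsBytes("row02", "row10", "row01"));
    }
}
```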

St.Ack


Re: Possible to set the results' sort method?

Posted by Keith Thomas <ke...@gmail.com>.

I think I'm looking at the same problem with HBase as Dingding Ye. I need to
be able to retrieve a list of rows sorted by data in a column and I'm not
sure how to go about it without resorting to performing the sort on the
client which feels like I'm just giving up.

My current thinking is to create a map class that outputs key/value pairs
where the key is the field I want to sort upon and the value is the row key.
This way I will get nice sorted input going into my reduce class. I guess I
would have to have only one reduce class instance.

However, I am unclear how I can return the row keys and the families with
their column data to the client from the reduce class. All the examples I
have found so far write the results to files/tables whereas I want to return
objects to a client. 

In the Hadoop Javadocs I notice a bunch of Comparators but as yet I've not
figured out their purpose. If I spend the cycles understanding the purpose of
these Comparators, are they likely to be of help to me in formulating an
alternate/better approach to that described above?


Re: Possible to set the results' sort method?

Posted by Dingding Ye <ye...@gmail.com>.

J-D

Yes. Must I get the values first and then sort them? Is there a more direct
way to do that?

Thanks.

sishen

On Sun, Oct 19, 2008 at 10:34 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:

> sishen,
>
> Do you mean that you want a way to do something like RowResult.values()
> that
> returns cells sorted by timestamps (or something else)?
>
> J-D

Re: Possible to set the results' sort method?

Posted by Michael Stack <st...@duboce.net>.

(Going further down the path I believe J-D was heading.) No, it's not
possible to change the sort order server-side, not without subclassing the
regionserver class. But client-side you might be able to give the RowResult
to a new SortedMap that has a different comparator, e.g. one that sorts the
results by timestamp.
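
A minimal sketch of that client-side approach, under stated assumptions: the record Cell and the plain Map below are stand-ins written for this example, not HBase classes, and the class name TimestampSort is hypothetical. The comparator looks each column's timestamp up in the original row, so the re-sorted map is only valid while that row is left unmodified and only for columns present in it.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class TimestampSort {
    // Hypothetical stand-in for a cell: a value plus its timestamp.
    record Cell(String value, long timestamp) {}

    // Copies a row into a SortedMap ordered by cell timestamp (oldest
    // first), breaking ties by column name so no entry is dropped.
    static SortedMap<String, Cell> byTimestamp(Map<String, Cell> row) {
        Comparator<String> cmp = Comparator
            .comparingLong((String col) -> row.get(col).timestamp())
            .thenComparing(Comparator.naturalOrder());
        SortedMap<String, Cell> sorted = new TreeMap<>(cmp);
        sorted.putAll(row);
        return sorted;
    }

    public static void main(String[] args) {
        Map<String, Cell> row = new TreeMap<>(); // ordered by column name
        row.put("info:a", new Cell("x", 300L));
        row.put("info:b", new Cell("y", 200L));
        row.put("info:c", new Cell("z", 100L));
        System.out.println(byTimestamp(row).keySet()); // [info:c, info:b, info:a]
    }
}
```

For newest-first order, the timestamp comparison can be negated or the resulting map iterated in reverse.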

St.Ack

Jean-Daniel Cryans wrote:
> sishen,
>
> Do you mean that you want a way to do something like RowResult.values() that
> returns cells sorted by timestamps (or something else)?
>
> J-D
>

Re: Possible to set the results' sort method?

Posted by Jean-Daniel Cryans <jd...@apache.org>.

sishen,

Do you mean that you want a way to do something like RowResult.values() that
returns cells sorted by timestamps (or something else)?

J-D

On Sat, Oct 18, 2008 at 11:28 AM, Dingding Ye <ye...@gmail.com> wrote:

> Hi.
>
> By default, the results returned by getRow, a Scanner, etc. are ordered by key.
>
> Is it possible to choose another sort order, such as by timestamp?
>
> Thanks for the help.
>
> Best regards.
>
> sishen
>