Posted to user@hbase.apache.org by Vikram Singh Chandel <vi...@gmail.com> on 2014/04/29 11:13:10 UTC

How to implement sorting in HBase scans for a particular column

Hi

We have a requirement to return scan results sorted on a particular column.

e.g. *Get details of authors sorted by their publication count. Limit: 1000*

*The row key is an MD5 hash of the author ID.*

The sample dataset has 8.2 million rows covering 3 years of data (the actual
dataset covers 30 years).

We are currently looking into implementing a *comparator* and sorting the
values. But for this we first have to store all 8.2 million records in a
map/list and then sort them, and this approach is neither memory-efficient
nor time-efficient.

Is there any solution by which this kind of request can be fulfilled in
real time?
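
Editor's note on the memory concern above: a limit of 1000 does not require holding all 8.2 million rows at once. One common technique (not proposed anywhere in this thread) is a bounded min-heap that keeps only the current top N while streaming over the rows. The sketch below is plain Java with no HBase dependency; the names TopAuthors, AuthorCount and topN are hypothetical, and the iterator stands in for whatever produces rows from a scan. It addresses memory only: every row is still read once, so it does not by itself make the query real time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

final class TopAuthors {

    /** Hypothetical record: one author row reduced to the fields needed for ranking. */
    static final class AuthorCount {
        public final String authorId;
        public final long publicationCount;

        public AuthorCount(String authorId, long publicationCount) {
            this.authorId = authorId;
            this.publicationCount = publicationCount;
        }
    }

    private static final Comparator<AuthorCount> BY_COUNT =
            Comparator.comparingLong((AuthorCount a) -> a.publicationCount);

    /**
     * Reads the rows once and returns the `limit` rows with the highest
     * publication counts, largest first. At most `limit` rows are held in memory.
     */
    static List<AuthorCount> topN(Iterator<AuthorCount> rows, int limit) {
        if (limit <= 0) {
            return new ArrayList<>();
        }
        // Min-heap: the smallest retained count sits at the head, so it is cheap to evict.
        PriorityQueue<AuthorCount> heap = new PriorityQueue<>(BY_COUNT);
        while (rows.hasNext()) {
            AuthorCount row = rows.next();
            if (heap.size() < limit) {
                heap.add(row);
            } else if (row.publicationCount > heap.peek().publicationCount) {
                heap.poll();   // drop the current smallest
                heap.add(row); // keep the larger newcomer
            }
        }
        List<AuthorCount> result = new ArrayList<>(heap);
        result.sort(BY_COUNT.reversed()); // largest count first
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample rows, for illustration only.
        List<AuthorCount> sample = Arrays.asList(
                new AuthorCount("a", 3),
                new AuthorCount("b", 10),
                new AuthorCount("c", 7));
        for (AuthorCount ac : topN(sample.iterator(), 2)) {
            System.out.println(ac.authorId + " " + ac.publicationCount);
        }
    }
}
```

With N = 1000 the heap stays tiny regardless of table size, and each row costs at most O(log N) work.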



-- 
*Regards*

*VIKRAM SINGH CHANDEL*

Please do not print this email unless it is absolutely necessary. Reduce.
Reuse. Recycle. Save our planet.

Re: How to implement sorting in HBase scans for a particular column

Posted by Vikram Singh Chandel <vi...@gmail.com>.
Hi James,
Thanks a lot for the reply. We will give it a try and let you know about
our progress.





Re: How to implement sorting in HBase scans for a particular column

Posted by James Taylor <jt...@salesforce.com>.
Hi Vikram,
I see you sent a question to the Phoenix mailing list back in December about
how to use Phoenix 2.1.2 with Hadoop 2 for HBase 0.94. It looks like you were
having trouble building Phoenix with the hadoop2 profile. In our 3.0/4.0
releases we bundle the Phoenix jars pre-built for both hadoop1 and hadoop2,
so there's nothing you need to do.

Did you have any other issues?

Regarding sorting rows, Apache Phoenix handles this for you when you do an
ORDER BY:
CREATE TABLE names(id VARCHAR NOT NULL PRIMARY KEY,
    name VARCHAR, age INTEGER);
// populate the table
SELECT * FROM names ORDER BY age;
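
Editor's note: applied to the original question (authors by publication count, limit 1000), the ORDER BY approach might look like the following JDBC sketch. The table name authors and the columns author_id and publication_count are hypothetical, since the thread does not give the real schema. The Connection is assumed to come from the Phoenix JDBC driver (typically DriverManager.getConnection with a jdbc:phoenix:<zookeeper quorum> URL); only standard java.sql calls are used here.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class PhoenixTopAuthors {

    /** Builds the query; table and column names are assumptions, not from the thread. */
    static String topAuthorsSql(int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        return "SELECT author_id, publication_count FROM authors"
                + " ORDER BY publication_count DESC LIMIT " + limit;
    }

    /** Runs the query on an already-open connection and returns author ids, highest count first. */
    static List<String> fetchTopAuthorIds(Connection conn, int limit) throws SQLException {
        List<String> ids = new ArrayList<>();
        try (PreparedStatement stmt = conn.prepareStatement(topAuthorsSql(limit));
             ResultSet rs = stmt.executeQuery()) {
            while (rs.next()) {
                ids.add(rs.getString(1)); // column 1 is author_id
            }
        }
        return ids;
    }
}
```

The sort and the limit are expressed in the query, so the client receives at most 1000 rows rather than the full table.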

Thanks,
James



Re: How to implement sorting in HBase scans for a particular column

Posted by Vikram Singh Chandel <vi...@gmail.com>.
Yes, we looked at it back in November-December 2013, but it had a lot of
issues at the time, so we decided not to use it. We built our solution
design on HBase alone, so we are looking for a better solution.

Thanks



Re: How to implement sorting in HBase scans for a particular column

Posted by Ted Yu <yu...@gmail.com>.
Have you looked at Apache Phoenix?

Cheers
